The Schema Analysis function in SQL Central gives a useful set of recommendations. One of these identifies sub-optimal column ordering, saying that the order of the columns in a table affects performance (the full recommendation text is quoted in the answer below).
This inevitably happens in a database whose structure evolves over time, but unless I'm missing something, there is no easy (or even feasible) way of addressing it in a large schema with hundreds of tables, thousands of columns, foreign keys, and so on. Since column order doesn't matter for other purposes, a really useful feature would be an option on dbunload / UNLOAD to optimize the column order automatically during the unload: presumably the primary key would come first, then columns involved in constraints, and then everything else in ascending size. While an automatically calculated order might not be entirely optimal, it is likely to be better than a more or less random one! Better ideas eagerly requested!
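For illustration only, here is a minimal sketch of what that proposed ordering might look like for a single table. All table and column names are invented; this is not output from any actual tool:

CREATE TABLE country (
    -- minimal referenced table so the example is self-contained
    country_id  INTEGER PRIMARY KEY );

-- Historical order: customer_id and the wide notes column were there
-- from the start, and country_id was bolted on later as the schema evolved.
CREATE TABLE customer (
    customer_id INTEGER       PRIMARY KEY,
    name        VARCHAR(100)  NOT NULL,
    notes       LONG VARCHAR,
    country_id  INTEGER       NOT NULL REFERENCES country );

-- Proposed automatic order: primary key first, then columns involved in
-- constraints (the foreign key), then the remaining columns in ascending
-- declared size, with wide/LONG columns last.
CREATE TABLE customer_reordered (
    customer_id INTEGER       PRIMARY KEY,
    country_id  INTEGER       NOT NULL REFERENCES country,
    name        VARCHAR(100)  NOT NULL,
    notes       LONG VARCHAR );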
IMO this exhortation is an incorrect description of a third-order effect on runtime performance. It is not discussing first-order set-level retrieval performance, but the scalar analysis of columns within a single row:

"The order of the columns in a table affects performance. Columns in a row are accessed sequentially in the order of their creation. For example, to access columns at the end of a row, the database server traverses the columns that appear earlier in the row. You should order your columns so that narrow and/or frequently accessed columns are placed before seldom accessed and/or wider columns in the table. Wide columns are columns greater than 15 bytes in size, or LONG data types (for example, LONG VARCHAR), or columns defined as XML. Primary key columns are always stored at the beginning of the row."

In other words, the engine has to "step over" column values 1 to 10 in each row in order to reach column value 11. If you are really worried about those calculations, you shouldn't expect that 55+1 is going to be faster than 55+100 after you move all the short values to the front... because it isn't... addition performance is not value dependent. Note that the engine doesn't have to step over the data byte by byte, in case that's what you were thinking :)

Instead, you should move all the MOST-FREQUENTLY-REFERENCED columns to the front, so the engine has to "step over" fewer columns (fewer calculations)... but I doubt that you will see any improvement, EVEN IF YOU CAN identify the most-frequently-referenced columns (go ahead, try :)

The mention of LONG columns is particularly misleading, since most of their data isn't even stored in the row.

Suggestion: Try a benchmark test... betcha won't be able to measure the effect :)
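A minimal sketch of such a benchmark, assuming invented table and column names; the timing here simply relies on the elapsed time reported by the client (e.g. Interactive SQL), and sa_rowgenerator is used to generate the rows:

-- Two tables with identical data but opposite column order: in "wide_first"
-- the engine must step over the two wide filler columns to reach the
-- frequently read value; in "narrow_first" it does not.
CREATE TABLE wide_first (
    pk        INTEGER PRIMARY KEY,
    filler1   CHAR(200) NOT NULL,
    filler2   CHAR(200) NOT NULL,
    hot_value INTEGER   NOT NULL );

CREATE TABLE narrow_first (
    pk        INTEGER PRIMARY KEY,
    hot_value INTEGER   NOT NULL,
    filler1   CHAR(200) NOT NULL,
    filler2   CHAR(200) NOT NULL );

-- Populate both tables with the same 100,000 rows.
INSERT INTO wide_first
SELECT row_num, 'x', 'y', row_num FROM sa_rowgenerator( 1, 100000 );

INSERT INTO narrow_first
SELECT row_num, row_num, 'x', 'y' FROM sa_rowgenerator( 1, 100000 );
COMMIT;

-- Compare the elapsed time the client reports for each scan; the claim
-- is that any difference will be hard to measure.
SELECT SUM( hot_value ) FROM wide_first;
SELECT SUM( hot_value ) FROM narrow_first;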
I think that is what I was thinking it could be :}
(23 Jun '20, 11:27)
Justin Willey
I was about to second that suggestion.
However, while I would also suggest putting PK columns first, and possibly going on with UNIQUE KEY columns, NOT NULL FK columns and then NULLable FK columns, strictly sorting all "normal columns" by size (what size? declared vs. real (average) size?) would seem rather "unsorted" to me. I guess I'd then prefer to stick with the historical order. :)
In other words: I'd either re-sort the columns myself (in a "logical" order) or leave them as-is. That being said, I guess I have rarely designed tables where the PK was not the first column(s). And if so, I would certainly try to correct that as soon as possible by manually re-creating the table (via INSERT WITH AUTO NAME SELECT... — see the sketch below).
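A rough sketch of that manual re-creation, with invented table and column names; foreign keys, indexes, triggers and permissions pointing at the old table would of course have to be re-created as well:

-- Hypothetical existing table whose column order has "evolved":
-- the wide notes column ended up in the middle.
CREATE TABLE sales (
    sales_id   INTEGER       PRIMARY KEY,
    notes      LONG VARCHAR,
    order_date DATE          NOT NULL,
    amount     DECIMAL(12,2) NOT NULL );

-- New table with the desired column order (PK first, wide column last).
CREATE TABLE sales_new (
    sales_id   INTEGER       PRIMARY KEY,
    order_date DATE          NOT NULL,
    amount     DECIMAL(12,2) NOT NULL,
    notes      LONG VARCHAR );

-- WITH AUTO NAME matches select-list items to columns by name,
-- so the old column order does not matter when copying the data.
INSERT INTO sales_new WITH AUTO NAME
SELECT * FROM sales;
COMMIT;

-- Swap the tables once the copy has been verified.
DROP TABLE sales;
ALTER TABLE sales_new RENAME sales;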
I certainly can't think of an example where the primary key isn't the first column, but some of the other stuff does get added a lot later!