Update: Nick's comment is appreciated... is it the official response from Engineering? What is the Sybase Central 16 "Fragmentation" tab showing? The following screen shot from Sybase Central in SQL Anywhere 16.0.0.1915 says the rroad_group_2_property_pivot table has "1,224 table fragments" (see the bottom right corner). What does that mean, and where does it come from? In other words, does it affect query performance, and is there a system procedure I can call to get that number, plus the data used to display the pretty picture? Testing indicates it does not have anything to do with "file fragmentation": SELECT DB_PROPERTY ( 'DBFileFragments' ); DB_PROPERTY('DBFileFragments') '2' Testing also indicates it does not have anything to do with "table fragmentation": SELECT * FROM sa_table_fragmentation ( 'rroad_group_2_property_pivot', 'DBA' ); TableName, rows, row_segments, segs_per_row 'rroad_group_2_property_pivot', 1110934, 1110934, 1.0 Is "1,224 table fragments" saying that the table data for rroad_group_2_property_pivot is being stored in 1,224 non-contiguous blocks of pages within the database file (they are non-contiguous with respect to pages containing data for other tables, NOT that they are non-contiguous with respect to the disk drive... DBFileFragments = 2)? If so, why should I care about that? If I should care about it, how can I change the number? The REORGANIZE TABLE statement does not appear to directly affect that number; in fact, REORGANIZE TABLE can make Sybase Central's "table fragments" go UP as well as down. If NOT, then be aware that the Sybase Central Fragmentation display is frightening the smallfolk... it is frightening ME :) |
Here are my responses to your questions: Q: What is the Sybase Central 16 "Fragmentation" tab showing? It’s showing a page map of the dbspace containing the selected table or index. Basically, it shows which pages are used for the selected object, and provides a visual display of the level of page fragmentation. Selecting a table in the list shows its table and extension pages in the page map. In addition, it shows index pages for any indexes on the table that reside in the same dbspace as the table. Selecting an index in the list shows its index pages in the page map. Q: The following screen shot from Sybase Central in SQL Anywhere 16.0.0.1915 says the rroad_group_2_property_pivot table has "1,224 table fragments" (see the bottom right corner). What does that mean, and where does it come from? It means that the pages for the table’s data are spread across 1,224 non-contiguous blocks within the dbspace. In contrast, if all pages for the table’s data were contiguous within the dbspace, then there would be 1 table fragment. This information comes from the table_page_list column in SYSTAB. Similarly, the extension fragments information comes from the ext_page_list column in SYSTAB. The index fragments information comes from SYSPHYSIDX.allocation_bitmap. Q: In other words, does it affect query performance, and is there a system procedure I can call to get that number, plus the data used to display the pretty picture? It can affect query performance. If there are a lot of small fragments (say, one or two pages in size) and the fragments are spread widely across the dbspace, then the server requires more 64K reads (and potentially more drive head movement) to do a sequential scan of the table. There is no system procedure that returns this information. However, Sybase Central uses the following function to count fragments:
When a table is selected, Sybase Central executes the following SQL to obtain the page map and fragmentation information for the table (and any indexes on the table that reside in the same dbspace as the table):
When an index is selected, Sybase Central executes the following SQL to obtain the page map and fragmentation information for the index:
where index-category is defined as 1 for primary keys, 2 for foreign keys, and 3 for secondary indexes (unique constraints and indexes). Q: Testing indicates it does not have anything to do with "file fragmentation": SELECT DB_PROPERTY ( 'DBFileFragments' ); DB_PROPERTY('DBFileFragments') '2' True. The page map doesn’t indicate the fragmentation of the dbspace file on disk, so for a given object, the fragment count within the dbspace is equivalent to the fragment count on the disk only if the dbspace file is contiguous on the disk. Q: Testing also indicates it does not have anything to do with "table fragmentation": SELECT * FROM sa_table_fragmentation ( 'rroad_group_2_property_pivot', 'DBA' ); TableName, rows, row_segments, segs_per_row 'rroad_group_2_property_pivot', 1110934, 1110934, 1.0 True. This procedure displays information on row-splits. In this case rows and row_segments are equal if no rows are split across pages; otherwise, the number of row_segments will be greater than the number of rows. This info is unrelated to the number of table fragments. Q: Is "1,224 table fragments" saying that the table data for rroad_group_2_property_pivot is being stored in 1,224 non-contiguous blocks of pages within the database file (they are non-contiguous with respect to pages containing data for other tables, NOT that they are non-contiguous with respect to the disk drive... DBFileFragments = 2)? Exactly. Q: If so, why should I care about that? The number of table fragments is only useful if you also take into account the number of table pages. Consider the ratio of fragments/pages: If it is high (say, approaching 1.0) and the fragments are widely spaced across the dbspace, then the server requires more reads when doing a sequential scan. Q: If I should care about it, how can I change the number? The REORGANIZE TABLE statement does not appear to directly affect that number; in fact, REORGANIZE TABLE can make Sybase Central's "table fragments" go UP as well as down. If a table is suffering from excessive fragmentation within the dbspace, then REORGANIZE TABLE will remedy that (and also reorder the rows by the table’s primary key or clustered index). But REORGANIZE TABLE won’t reduce the number of fragments if there is little or no improvement to be gained. The number of fragments can increase by a relatively small amount as a result of a reorganization, but that isn’t a cause for concern if the fragments are closely spaced together (because the number of 64K reads won’t change). You really need to take into account the number of table pages, the number of table fragments, and how closely the fragments are to each other when determining whether to do a REORGANIZE TABLE. |
The above answers by Nick and Rich are accurate. To restate:
The grouping of table pages can affect the performance for sequential scans of the table. The following is a sketch of how you can estimate the cost of scanning a table sequentially with the current layout of pages or after all gaps are removed:
The F_CostDTT() approximates the disk transfer time using the defaults for a hard drive. You could update the constants using the sa_get_dtt() procedure or you could extend the function to do that properly. The real DTT linearly approximates between band positions but this sketch doesn't take that into account. The P_CostSequentialScan() takes a table bitmap and estimates the cost of scanning it if all the pages are perfectly contiguous and with the current layout. An example of using the procedure:
For example, one of my tables has a number of gaps between pages and the current estimated cost is 155ms and if it could be completely contiguous this would be 24ms. There is another important benefit to REORGANIZE TABLE that is not currently captured by the Fragmentation tab: how closely the base table row order matches the clustered index for the table. The catalog contains estimates for each index of how closely the index matches the table order. You can use a query such as the following to report the "percent clustered" for each index:
Here the percent_clustered column shows what happens when scanning the index from beginning to end. The "percent_clustered" rows are either on the same page as the previous row or the immediately following page; the remainder of rows are some larger distance apart (2 or more pages). As to row splits: In case a row split is a necessity because a row is that large that it simply won't fit on a single page (say, during its insert it has more than 4K (minus overhead) data in a database with 4K pages), will it occupy two (or more) continuous pages (still one base page and further extension pages) and therefore will be stored in an optimal fashion? If my assumtion is correct: Is there any means to distinguish such necessary row splits from those that are caused by inserting "small" data and then later "fill the real values" by updates? - I guess sa_table_fragmentation() would simply return a higher segs_per_row value in both cases...
(06 Nov '14, 03:30)
Volker Barth
Replies hidden
1
Volker, I must say I miss your presence at the summit. It is good to have your counterpoints in the discussion. The sa_table_fragmentation() can tell you how many rows per segment but not WHY there is a split. If it is due to rows longer than a page, then reorganizing the table will not improve matters. If it is due to inserting short rows and later updating to larger values, then it would be improved by reorganization. You can use values in SYS.SYSTAB to try to distinguish these two cases, but these values are only describing the entire table and won't really help if you have a lot of nicely behaved data and a small amount of different data. Consider: select table_name, "count", "count"/table_page_count as rows_per_table_page, table_page_count, ext_page_count from sys.systab where table_name in ('TRowFragments','TLongRows'); Here we estimate the number of rows per base table page. If the rows were large when inserted (large with respect to page size), the rows per page will be low, perhaps as low as one if most of the table rows are large. On the other hand, with rows that grow after insertion, the rows per page will be higher (perhaps 30 or so). The sa_table_fragmentation() tells you how many row segments need to be read to completely read a row.
(06 Nov '14, 08:22)
Ivan T. Bowman
|
I'd vote up your interim update if I could:)
Reorganize table seems to use the first empty space it finds in the database file. Which means, that a fragmented empty space will not help you here.
There are 2 different types of fragmentation vying for your attention on this same pane and that is always a recipe for confusion and maybe a bit of fear.
The reorganize table/index and the sa_table_fragmentation() output is about row fragmentation and that can be important to performance.
In contrast the graph shows a page map in the dbspace of the table or index chosen. As such, at most, it is going to show table page-level fragmentation and more akin to a file fragmentation you would get from a file system defragger utility. Due to caching, checkpointing and other sever features this level may not be as important to performance unless you are regularily doing a table scan in a cold cache scenario. It may also contribute to longer then expected checkpoint times. Startup times may also be affected by this level in some cases (during cache warming or queries frequently ran at startup).
There is additional information inside the bars of this graph that may help in some extreme scenarios ... but that does tend to distract from other lines of investigation around row fragementation, index densities, etc.
This is just a comment ... do feel free to express your opinions about other ways of doing things here ...
> 2 different types of fragmentation
As far as I can tell, the "other" type of fragmentation is not discussed at all, anywhere, in any of the documentation.
OTOH, I haven't read the docs cover-to-cover since V6, so it is with great anticipation that I await correction :)
So is it right to say that the way to address this (dbspace) type of fragmentation is with an unload / reload (preferably into a pre-built un-(at disk level)-fragmented dbspace file)?
According to Breck that is going to be revealed 12/06...
In my humble understanding, a rebuild should decrease page-level fragmentation - e.g. the typical separation between
steps during a rebuild should lead to separated ranges for data and index pages, as Ivan has explained here... It should also (like a REORGANIZE) lead to improved row storage and therefore to lesser extension pages, so I'd guess it would help both kind of fragmentations...
Disk-level fragmentation could usually be decreased with tools like SysInternals's contig even fpr running databases.
As stated, in my humble understanding:)