SAP SQL Anywhere

Question

Update: Nick's comment is appreciated... is it the official response from Engineering?

What is the Sybase Central 16 "Fragmentation" tab showing?

The following screen shot from Sybase Central in SQL Anywhere 16.0.0.1915 says the rroad_group_2_property_pivot table has "1,224 table fragments" (see the bottom right corner).

What does that mean, and where does it come from?

In other words, does it affect query performance, and is there a system procedure I can call to get that number, plus the data used to display the pretty picture?

Testing indicates it does not have anything to do with "file fragmentation":

SELECT DB_PROPERTY ( 'DBFileFragments' );

DB_PROPERTY('DBFileFragments')
'2'

Testing also indicates it does not have anything to do with "table fragmentation":

SELECT * FROM sa_table_fragmentation ( 'rroad_group_2_property_pivot', 'DBA' );

TableName,                       rows,     row_segments,  segs_per_row
'rroad_group_2_property_pivot',  1110934,  1110934,       1.0

Is "1,224 table fragments" saying that the table data for rroad_group_2_property_pivot is being stored in 1,224 non-contiguous blocks of pages within the database file (they are non-contiguous with respect to pages containing data for other tables, NOT that they are non-contiguous with respect to the disk drive... DBFileFragments = 2)?

If so, why should I care about that?

If I should care about it, how can I change the number? The REORGANIZE TABLE statement does not appear to directly affect that number; in fact, REORGANIZE TABLE can make Sybase Central's "table fragments" go UP as well as down.

If NOT, then be aware that the Sybase Central Fragmentation display is frightening the smallfolk... it is frightening ME :)

alt text

Answer 1

Here are my responses to your questions:

Q: What is the Sybase Central 16 "Fragmentation" tab showing?

It’s showing a page map of the dbspace containing the selected table or index. Basically, it shows which pages are used for the selected object, and provides a visual display of the level of page fragmentation.

Selecting a table in the list shows its table and extension pages in the page map. In addition, it shows index pages for any indexes on the table that reside in the same dbspace as the table. Selecting an index in the list shows its index pages in the page map.

Q: The following screen shot from Sybase Central in SQL Anywhere 16.0.0.1915 says the rroad_group_2_property_pivot table has "1,224 table fragments" (see the bottom right corner). What does that mean, and where does it come from?

It means that the pages for the table’s data are spread across 1,224 non-contiguous blocks within the dbspace. In contrast, if all pages for the table’s data were contiguous within the dbspace, then there would be 1 table fragment.

This information comes from the table_page_list column in SYSTAB. Similarly, the extension fragments information comes from the ext_page_list column in SYSTAB. The index fragments information comes from SYSPHYSIDX.allocation_bitmap.

Q: In other words, does it affect query performance, and is there a system procedure I can call to get that number, plus the data used to display the pretty picture?

It can affect query performance. If there are a lot of small fragments (say, one or two pages in size) and the fragments are spread widely across the dbspace, then the server requires more 64K reads (and potentially more drive head movement) to do a sequential scan of the table.

There is no system procedure that returns this information. However, Sybase Central uses the following function to count fragments:

CREATE TEMPORARY FUNCTION count_fragments( bitmap LONG VARBIT )
RETURNS INTEGER
BEGIN
    DECLARE num_fragments INTEGER;
    SELECT SUM( is_first_bit ) INTO num_fragments
    FROM (
    SELECT bitnum, IF bitnum = MIN( bitnum ) OVER ( ORDER BY bitnum RANGE
    BETWEEN 1 PRECEDING AND CURRENT ROW ) THEN 1 ELSE 0 ENDIF AS is_first_bit
    FROM dbo.sa_get_bits( bitmap )
    ) DT;
    RETURN num_fragments;
END;

When a table is selected, Sybase Central executes the following SQL to obtain the page map and fragmentation information for the table (and any indexes on the table that reside in the same dbspace as the table):

WITH MergedIndexBitmaps AS (
SELECT T.table_id, BIT_OR( P.allocation_bitmap ) AS idxs_bitmap
FROM SYS.SYSPHYSIDX P
JOIN SYS.SYSIDX I ON I.table_id = P.table_id AND I.phys_index_id = P.phys_index_id
JOIN SYS.SYSTAB T ON T.table_id = I.table_id AND T.dbspace_id = I.dbspace_id
GROUP BY T.table_id )
SELECT LOCATE( T.tab_page_list, '1', 1 ) AS tab_first_page,
LOCATE( T.tab_page_list, '1', -1 ) AS tab_last_page,
COMPRESS( SUBSTR( T.tab_page_list, tab_first_page, tab_last_page-tab_first_page+1 ), 
'gzip' ) AS tab_bitmap_zip,
count_fragments( T.tab_page_list ),
LOCATE( T.ext_page_list, '1', 1 ) AS ext_first_page,
LOCATE( T.ext_page_list, '1', -1 ) AS ext_last_page,
IF T.ext_page_count > 0 THEN
COMPRESS( SUBSTR( T.ext_page_list, ext_first_page, ext_last_page-ext_first_page+1 ), 
'gzip' )
ENDIF AS ext_bitmap_zip,
count_fragments( T.ext_page_list ),
LOCATE( M.idxs_bitmap, '1', 1 ) AS idxs_first_page,
LOCATE( M.idxs_bitmap, '1', -1 ) AS idxs_last_page,
IF COUNT_SET_BITS( M.idxs_bitmap ) > 0 THEN
COMPRESS( SUBSTR( M.idxs_bitmap, idxs_first_page, idxs_last_page-idxs_first_page+1 ), 
'gzip' )
ENDIF AS idxs_bitmap_zip
FROM SYS.SYSTAB T
JOIN SYS.SYSUSER U ON U.user_id = T.creator
LEFT OUTER JOIN MergedIndexBitmaps M ON M.table_id = T.table_id
WHERE T.table_name = '<table-name>' AND U.user_name = '<table-owner>';

When an index is selected, Sybase Central executes the following SQL to obtain the page map and fragmentation information for the index:

SELECT LOCATE( P.allocation_bitmap, '1', 1 ) AS idx_first_page,
LOCATE( P.allocation_bitmap, '1', -1 ) AS idx_last_page,
COMPRESS( SUBSTR( P.allocation_bitmap, idx_first_page, idx_last_page-idx_first_page+1 ), 
'gzip' ) AS idx_bitmap_zip,
count_fragments( P.allocation_bitmap )
FROM SYS.SYSPHYSIDX P
JOIN SYS.SYSIDX I ON I.table_id = P.table_id AND I.phys_index_id = P.phys_index_id
JOIN SYS.SYSTAB T ON T.table_id = I.table_id
JOIN SYS.SYSUSER U ON U.user_id = T.creator
WHERE I.index_name = '<index-name>' AND I.index_category = '<index-category>'
AND T.table_name = '<table-name>' AND U.user_name = '<table-owner>';

where index-category is defined as 1 for primary keys, 2 for foreign keys, and 3 for secondary indexes (unique constraints and indexes).

Q: Testing indicates it does not have anything to do with "file fragmentation": SELECT DB_PROPERTY ( 'DBFileFragments' );

DB_PROPERTY('DBFileFragments') '2'

True. The page map doesn’t indicate the fragmentation of the dbspace file on disk, so for a given object, the fragment count within the dbspace is equivalent to the fragment count on the disk only if the dbspace file is contiguous on the disk.

Q: Testing also indicates it does not have anything to do with "table fragmentation": SELECT * FROM sa_table_fragmentation ( 'rroad_group_2_property_pivot', 'DBA' );

TableName, rows, row_segments, segs_per_row 'rroad_group_2_property_pivot', 1110934, 1110934, 1.0

True. This procedure displays information on row-splits. In this case rows and row_segments are equal if no rows are split across pages; otherwise, the number of row_segments will be greater than the number of rows. This info is unrelated to the number of table fragments.

Q: Is "1,224 table fragments" saying that the table data for rroad_group_2_property_pivot is being stored in 1,224 non-contiguous blocks of pages within the database file (they are non-contiguous with respect to pages containing data for other tables, NOT that they are non-contiguous with respect to the disk drive... DBFileFragments = 2)?

Exactly.

Q: If so, why should I care about that?

The number of table fragments is only useful if you also take into account the number of table pages. Consider the ratio of fragments/pages: If it is high (say, approaching 1.0) and the fragments are widely spaced across the dbspace, then the server requires more reads when doing a sequential scan.

Q: If I should care about it, how can I change the number? The REORGANIZE TABLE statement does not appear to directly affect that number; in fact, REORGANIZE TABLE can make Sybase Central's "table fragments" go UP as well as down.

If a table is suffering from excessive fragmentation within the dbspace, then REORGANIZE TABLE will remedy that (and also reorder the rows by the table’s primary key or clustered index). But REORGANIZE TABLE won’t reduce the number of fragments if there is little or no improvement to be gained. The number of fragments can increase by a relatively small amount as a result of a reorganization, but that isn’t a cause for concern if the fragments are closely spaced together (because the number of 64K reads won’t change).

You really need to take into account the number of table pages, the number of table fragments, and how closely the fragments are to each other when determining whether to do a REORGANIZE TABLE.

Answer 2

The above answers by Nick and Rich are accurate. To restate:

File level fragmentation. The database file is not stored contiguously. If you scan the file from beginning to end, the query optimizer will assume it is working sequentially but there will actually be F=DB_PROPERTY('DBFileFragments') jumps of some unknown distance. If F is small with respect to the entire file size (say 1-2%) then there is little expected impact. However, as F increases, the sequential scans take increasingly long, affecting table scans, backups, and other large operations.
Row splits. The sa_table_fragmentation() procedure scans a table to find out how many rows are split across pages. When rows are split and a query references the columns not on the home page, there is a penalty for a random I/O to read the extension pages. The number of row splits for a table is not easily estimated from values currently stored in the system catalog.
Fragmentation of table and index pages within the database file. The Fragmentation tab is showing the layout of the table and index pages within the database file and the number of discontinuous blocks of pages. There is much less importance for index pages to be grouped together because a) often they are accessed randomly and b) often they are retained in cache anyhow. For indexes that are used in index-only-retrieval, better performance will be found if the index pages are grouped close to each other with few gaps or at least few large gaps. The REORGANIZE TABLE does not typically improve the grouping of index pages.

The grouping of table pages can affect the performance for sequential scans of the table. The following is a sketch of how you can estimate the cost of scanning a table sequentially with the current layout of pages or after all gaps are removed:

create temporary function F_CostDTT(in @dist int)
returns int
begin
    set F_CostDTT = case when @dist <= 1 then 122 when @dist <= 40 then 209 when @dist <= 80 then 263 when @dist <= 200 then 582 when @dist <= 400 then 697 when @dist <= 800 then 701 else 3876 end ;
end;

create temporary procedure P_CostSequentialScan(in @bits long varbit)
result(est_seconds_if_all_gaps_removed double, est_seconds_current_layout double)
begin
    select sum(F_CostDTT(1))/1e6 as est_seconds_if_all_gaps_removed, sum(F_CostDTT(dist))/1e6 as est_seconds_current_layout
    from (
        select bitnum, coalesce(bitnum - min(bitnum) over ( order by bitnum rows between 1 preceding and 1 preceding ), 1 ) as dist
        from sa_get_bits(@bits)
    ) D;
end;

The F_CostDTT() approximates the disk transfer time using the defaults for a hard drive. You could update the constants using the sa_get_dtt() procedure or you could extend the function to do that properly. The real DTT linearly approximates between band positions but this sketch doesn't take that into account.

The P_CostSequentialScan() takes a table bitmap and estimates the cost of scanning it if all the pages are perfectly contiguous and with the current layout. An example of using the procedure:

    select T.table_name, C.*
from sys.systab T cross apply P_CostSequentialScan(tab_page_list) C
order by T."Count" desc

For example, one of my tables has a number of gaps between pages and the current estimated cost is 155ms and if it could be completely contiguous this would be 24ms.

There is another important benefit to REORGANIZE TABLE that is not currently captured by the Fragmentation tab: how closely the base table row order matches the clustered index for the table. The catalog contains estimates for each index of how closely the index matches the table order. You can use a query such as the following to report the "percent clustered" for each index:

select table_name, index_name, TCard, percent_clustered
from (
select T.table_name, I.index_name
    , case when T.clustered_index_id = I.index_id then 'Y' else 'N' end case as Clustered
    , T."count" as TCard, T.table_page_count, T.ext_page_count
    , PI.key_value_count, PI.leaf_page_count, PI.depth
    , (100.*(1-PI.rand_transitions/greater(TCard,1))) as percent_clustered
    , PI.seq_transitions, PI.rand_transitions, (100.*PI.rand_distance/greater(rand_transitions,1)) avg_rand_distance
from sys.systab T join sys.sysidx I on T.table_id = I.table_id 
    join sys.sysphysidx PI on I.table_id = PI.table_id and I.phys_index_id = PI.phys_index_id
where clustered='Y'
) D

Here the percent_clustered column shows what happens when scanning the index from beginning to end. The "percent_clustered" rows are either on the same page as the previous row or the immediately following page; the remainder of rows are some larger distance apart (2 or more pages).

SAP SQL Anywhere

What is the Sybase Central 16 "Fragmentation" tab showing?

Follow this question

Related questions