Update: Nick's comment is appreciated... is it the official response from Engineering?


What is the Sybase Central 16 "Fragmentation" tab showing?

The following screen shot from Sybase Central in SQL Anywhere 16.0.0.1915 says the rroad_group_2_property_pivot table has "1,224 table fragments" (see the bottom right corner).

What does that mean, and where does it come from?

In other words, does it affect query performance, and is there a system procedure I can call to get that number, plus the data used to display the pretty picture?

Testing indicates it does not have anything to do with "file fragmentation":

SELECT DB_PROPERTY ( 'DBFileFragments' );

DB_PROPERTY('DBFileFragments')
'2'

Testing also indicates it does not have anything to do with "table fragmentation":

SELECT * FROM sa_table_fragmentation ( 'rroad_group_2_property_pivot', 'DBA' );

TableName,                       rows,     row_segments,  segs_per_row
'rroad_group_2_property_pivot',  1110934,  1110934,       1.0

Is "1,224 table fragments" saying that the table data for rroad_group_2_property_pivot is being stored in 1,224 non-contiguous blocks of pages within the database file (they are non-contiguous with respect to pages containing data for other tables, NOT that they are non-contiguous with respect to the disk drive... DBFileFragments = 2)?

If so, why should I care about that?

If I should care about it, how can I change the number? The REORGANIZE TABLE statement does not appear to directly affect that number; in fact, REORGANIZE TABLE can make Sybase Central's "table fragments" go UP as well as down.

If NOT, then be aware that the Sybase Central Fragmentation display is frightening the smallfolk... it is frightening ME :)

alt text

asked 26 Sep '14, 07:54

Breck%20Carter's gravatar image

Breck Carter
26.2k430599862
accept rate: 20%

edited 10 Oct '14, 17:53

I'd vote up your interim update if I could:)

(30 Sep '14, 03:17) Volker Barth

Reorganize table seems to use the first empty space it finds in the database file. Which means, that a fragmented empty space will not help you here.

(30 Sep '14, 06:51) Martin
2

There are 2 different types of fragmentation vying for your attention on this same pane and that is always a recipe for confusion and maybe a bit of fear.

The reorganize table/index and the sa_table_fragmentation() output is about row fragmentation and that can be important to performance.

In contrast the graph shows a page map in the dbspace of the table or index chosen. As such, at most, it is going to show table page-level fragmentation and more akin to a file fragmentation you would get from a file system defragger utility. Due to caching, checkpointing and other sever features this level may not be as important to performance unless you are regularily doing a table scan in a cold cache scenario. It may also contribute to longer then expected checkpoint times. Startup times may also be affected by this level in some cases (during cache warming or queries frequently ran at startup).

There is additional information inside the bars of this graph that may help in some extreme scenarios ... but that does tend to distract from other lines of investigation around row fragementation, index densities, etc.

This is just a comment ... do feel free to express your opinions about other ways of doing things here ...

(30 Sep '14, 13:25) Nick Elson S...
Replies hidden

> 2 different types of fragmentation

As far as I can tell, the "other" type of fragmentation is not discussed at all, anywhere, in any of the documentation.

OTOH, I haven't read the docs cover-to-cover since V6, so it is with great anticipation that I await correction :)

alt text

(01 Oct '14, 17:24) Breck Carter

So is it right to say that the way to address this (dbspace) type of fragmentation is with an unload / reload (preferably into a pre-built un-(at disk level)-fragmented dbspace file)?

(07 Oct '14, 09:03) Justin Willey

According to Breck that is going to be revealed 12/06...

(07 Oct '14, 09:59) Volker Barth

In my humble understanding, a rebuild should decrease page-level fragmentation - e.g. the typical separation between

  • CREATE TABLE...,
  • LOAD TABLE... and
  • ALTER TABLE ADD FOREIGN KEY... / CREATE INDEX

steps during a rebuild should lead to separated ranges for data and index pages, as Ivan has explained here... It should also (like a REORGANIZE) lead to improved row storage and therefore to lesser extension pages, so I'd guess it would help both kind of fragmentations...

Disk-level fragmentation could usually be decreased with tools like SysInternals's contig even fpr running databases.

As stated, in my humble understanding:)

(07 Oct '14, 10:29) Volker Barth
showing 3 of 7 show all flat view

Here are my responses to your questions:

Q: What is the Sybase Central 16 "Fragmentation" tab showing?

It’s showing a page map of the dbspace containing the selected table or index. Basically, it shows which pages are used for the selected object, and provides a visual display of the level of page fragmentation.

Selecting a table in the list shows its table and extension pages in the page map. In addition, it shows index pages for any indexes on the table that reside in the same dbspace as the table. Selecting an index in the list shows its index pages in the page map.

Q: The following screen shot from Sybase Central in SQL Anywhere 16.0.0.1915 says the rroad_group_2_property_pivot table has "1,224 table fragments" (see the bottom right corner). What does that mean, and where does it come from?

It means that the pages for the table’s data are spread across 1,224 non-contiguous blocks within the dbspace. In contrast, if all pages for the table’s data were contiguous within the dbspace, then there would be 1 table fragment.

This information comes from the table_page_list column in SYSTAB. Similarly, the extension fragments information comes from the ext_page_list column in SYSTAB. The index fragments information comes from SYSPHYSIDX.allocation_bitmap.

Q: In other words, does it affect query performance, and is there a system procedure I can call to get that number, plus the data used to display the pretty picture?

It can affect query performance. If there are a lot of small fragments (say, one or two pages in size) and the fragments are spread widely across the dbspace, then the server requires more 64K reads (and potentially more drive head movement) to do a sequential scan of the table.

There is no system procedure that returns this information. However, Sybase Central uses the following function to count fragments:

CREATE TEMPORARY FUNCTION count_fragments( bitmap LONG VARBIT )
RETURNS INTEGER
BEGIN
    DECLARE num_fragments INTEGER;
    SELECT SUM( is_first_bit ) INTO num_fragments
    FROM (
    SELECT bitnum, IF bitnum = MIN( bitnum ) OVER ( ORDER BY bitnum RANGE
    BETWEEN 1 PRECEDING AND CURRENT ROW ) THEN 1 ELSE 0 ENDIF AS is_first_bit
    FROM dbo.sa_get_bits( bitmap )
    ) DT;
    RETURN num_fragments;
END;

When a table is selected, Sybase Central executes the following SQL to obtain the page map and fragmentation information for the table (and any indexes on the table that reside in the same dbspace as the table):

WITH MergedIndexBitmaps AS (
SELECT T.table_id, BIT_OR( P.allocation_bitmap ) AS idxs_bitmap
FROM SYS.SYSPHYSIDX P
JOIN SYS.SYSIDX I ON I.table_id = P.table_id AND I.phys_index_id = P.phys_index_id
JOIN SYS.SYSTAB T ON T.table_id = I.table_id AND T.dbspace_id = I.dbspace_id
GROUP BY T.table_id )
SELECT LOCATE( T.tab_page_list, '1', 1 ) AS tab_first_page,
LOCATE( T.tab_page_list, '1', -1 ) AS tab_last_page,
COMPRESS( SUBSTR( T.tab_page_list, tab_first_page, tab_last_page-tab_first_page+1 ), 
'gzip' ) AS tab_bitmap_zip,
count_fragments( T.tab_page_list ),
LOCATE( T.ext_page_list, '1', 1 ) AS ext_first_page,
LOCATE( T.ext_page_list, '1', -1 ) AS ext_last_page,
IF T.ext_page_count > 0 THEN
COMPRESS( SUBSTR( T.ext_page_list, ext_first_page, ext_last_page-ext_first_page+1 ), 
'gzip' )
ENDIF AS ext_bitmap_zip,
count_fragments( T.ext_page_list ),
LOCATE( M.idxs_bitmap, '1', 1 ) AS idxs_first_page,
LOCATE( M.idxs_bitmap, '1', -1 ) AS idxs_last_page,
IF COUNT_SET_BITS( M.idxs_bitmap ) > 0 THEN
COMPRESS( SUBSTR( M.idxs_bitmap, idxs_first_page, idxs_last_page-idxs_first_page+1 ), 
'gzip' )
ENDIF AS idxs_bitmap_zip
FROM SYS.SYSTAB T
JOIN SYS.SYSUSER U ON U.user_id = T.creator
LEFT OUTER JOIN MergedIndexBitmaps M ON M.table_id = T.table_id
WHERE T.table_name = '<table-name>' AND U.user_name = '<table-owner>';

When an index is selected, Sybase Central executes the following SQL to obtain the page map and fragmentation information for the index:

SELECT LOCATE( P.allocation_bitmap, '1', 1 ) AS idx_first_page,
LOCATE( P.allocation_bitmap, '1', -1 ) AS idx_last_page,
COMPRESS( SUBSTR( P.allocation_bitmap, idx_first_page, idx_last_page-idx_first_page+1 ), 
'gzip' ) AS idx_bitmap_zip,
count_fragments( P.allocation_bitmap )
FROM SYS.SYSPHYSIDX P
JOIN SYS.SYSIDX I ON I.table_id = P.table_id AND I.phys_index_id = P.phys_index_id
JOIN SYS.SYSTAB T ON T.table_id = I.table_id
JOIN SYS.SYSUSER U ON U.user_id = T.creator
WHERE I.index_name = '<index-name>' AND I.index_category = '<index-category>'
AND T.table_name = '<table-name>' AND U.user_name = '<table-owner>';

where index-category is defined as 1 for primary keys, 2 for foreign keys, and 3 for secondary indexes (unique constraints and indexes).

Q: Testing indicates it does not have anything to do with "file fragmentation": SELECT DB_PROPERTY ( 'DBFileFragments' );

DB_PROPERTY('DBFileFragments') '2'

True. The page map doesn’t indicate the fragmentation of the dbspace file on disk, so for a given object, the fragment count within the dbspace is equivalent to the fragment count on the disk only if the dbspace file is contiguous on the disk.

Q: Testing also indicates it does not have anything to do with "table fragmentation": SELECT * FROM sa_table_fragmentation ( 'rroad_group_2_property_pivot', 'DBA' );

TableName, rows, row_segments, segs_per_row 'rroad_group_2_property_pivot', 1110934, 1110934, 1.0

True. This procedure displays information on row-splits. In this case rows and row_segments are equal if no rows are split across pages; otherwise, the number of row_segments will be greater than the number of rows. This info is unrelated to the number of table fragments.

Q: Is "1,224 table fragments" saying that the table data for rroad_group_2_property_pivot is being stored in 1,224 non-contiguous blocks of pages within the database file (they are non-contiguous with respect to pages containing data for other tables, NOT that they are non-contiguous with respect to the disk drive... DBFileFragments = 2)?

Exactly.

Q: If so, why should I care about that?

The number of table fragments is only useful if you also take into account the number of table pages. Consider the ratio of fragments/pages: If it is high (say, approaching 1.0) and the fragments are widely spaced across the dbspace, then the server requires more reads when doing a sequential scan.

Q: If I should care about it, how can I change the number? The REORGANIZE TABLE statement does not appear to directly affect that number; in fact, REORGANIZE TABLE can make Sybase Central's "table fragments" go UP as well as down.

If a table is suffering from excessive fragmentation within the dbspace, then REORGANIZE TABLE will remedy that (and also reorder the rows by the table’s primary key or clustered index). But REORGANIZE TABLE won’t reduce the number of fragments if there is little or no improvement to be gained. The number of fragments can increase by a relatively small amount as a result of a reorganization, but that isn’t a cause for concern if the fragments are closely spaced together (because the number of 64K reads won’t change).

You really need to take into account the number of table pages, the number of table fragments, and how closely the fragments are to each other when determining whether to do a REORGANIZE TABLE.

permanent link

answered 15 Oct '14, 15:48

Rich%20Jones's gravatar image

Rich Jones
33646
accept rate: 66%

The above answers by Nick and Rich are accurate. To restate:

  1. File level fragmentation. The database file is not stored contiguously. If you scan the file from beginning to end, the query optimizer will assume it is working sequentially but there will actually be F=DB_PROPERTY('DBFileFragments') jumps of some unknown distance. If F is small with respect to the entire file size (say 1-2%) then there is little expected impact. However, as F increases, the sequential scans take increasingly long, affecting table scans, backups, and other large operations.
  2. Row splits. The sa_table_fragmentation() procedure scans a table to find out how many rows are split across pages. When rows are split and a query references the columns not on the home page, there is a penalty for a random I/O to read the extension pages. The number of row splits for a table is not easily estimated from values currently stored in the system catalog.
  3. Fragmentation of table and index pages within the database file. The Fragmentation tab is showing the layout of the table and index pages within the database file and the number of discontinuous blocks of pages. There is much less importance for index pages to be grouped together because a) often they are accessed randomly and b) often they are retained in cache anyhow. For indexes that are used in index-only-retrieval, better performance will be found if the index pages are grouped close to each other with few gaps or at least few large gaps. The REORGANIZE TABLE does not typically improve the grouping of index pages.

The grouping of table pages can affect the performance for sequential scans of the table. The following is a sketch of how you can estimate the cost of scanning a table sequentially with the current layout of pages or after all gaps are removed:

create temporary function F_CostDTT(in @dist int)
returns int
begin
    set F_CostDTT = case when @dist <= 1 then 122 when @dist <= 40 then 209 when @dist <= 80 then 263 when @dist <= 200 then 582 when @dist <= 400 then 697 when @dist <= 800 then 701 else 3876 end ;
end;

create temporary procedure P_CostSequentialScan(in @bits long varbit)
result(est_seconds_if_all_gaps_removed double, est_seconds_current_layout double)
begin
    select sum(F_CostDTT(1))/1e6 as est_seconds_if_all_gaps_removed, sum(F_CostDTT(dist))/1e6 as est_seconds_current_layout
    from (
        select bitnum, coalesce(bitnum - min(bitnum) over ( order by bitnum rows between 1 preceding and 1 preceding ), 1 ) as dist
        from sa_get_bits(@bits)
    ) D;
end;

The F_CostDTT() approximates the disk transfer time using the defaults for a hard drive. You could update the constants using the sa_get_dtt() procedure or you could extend the function to do that properly. The real DTT linearly approximates between band positions but this sketch doesn't take that into account.

The P_CostSequentialScan() takes a table bitmap and estimates the cost of scanning it if all the pages are perfectly contiguous and with the current layout. An example of using the procedure:

    select T.table_name, C.*
from sys.systab T cross apply P_CostSequentialScan(tab_page_list) C
order by T."Count" desc

For example, one of my tables has a number of gaps between pages and the current estimated cost is 155ms and if it could be completely contiguous this would be 24ms.

There is another important benefit to REORGANIZE TABLE that is not currently captured by the Fragmentation tab: how closely the base table row order matches the clustered index for the table. The catalog contains estimates for each index of how closely the index matches the table order. You can use a query such as the following to report the "percent clustered" for each index:

select table_name, index_name, TCard, percent_clustered
from (
select T.table_name, I.index_name
    , case when T.clustered_index_id = I.index_id then 'Y' else 'N' end case as Clustered
    , T."count" as TCard, T.table_page_count, T.ext_page_count
    , PI.key_value_count, PI.leaf_page_count, PI.depth
    , (100.*(1-PI.rand_transitions/greater(TCard,1))) as percent_clustered
    , PI.seq_transitions, PI.rand_transitions, (100.*PI.rand_distance/greater(rand_transitions,1)) avg_rand_distance
from sys.systab T join sys.sysidx I on T.table_id = I.table_id 
    join sys.sysphysidx PI on I.table_id = PI.table_id and I.phys_index_id = PI.phys_index_id
where clustered='Y'
) D

Here the percent_clustered column shows what happens when scanning the index from beginning to end. The "percent_clustered" rows are either on the same page as the previous row or the immediately following page; the remainder of rows are some larger distance apart (2 or more pages).

permanent link

answered 05 Nov '14, 22:24

Ivan%20T.%20Bowman's gravatar image

Ivan T. Bowman
2.8k22732
accept rate: 39%

As to row splits:

In case a row split is a necessity because a row is that large that it simply won't fit on a single page (say, during its insert it has more than 4K (minus overhead) data in a database with 4K pages), will it occupy two (or more) continuous pages (still one base page and further extension pages) and therefore will be stored in an optimal fashion?

If my assumtion is correct: Is there any means to distinguish such necessary row splits from those that are caused by inserting "small" data and then later "fill the real values" by updates? - I guess sa_table_fragmentation() would simply return a higher segs_per_row value in both cases...

(06 Nov '14, 03:30) Volker Barth
Replies hidden
1

Volker, I must say I miss your presence at the summit. It is good to have your counterpoints in the discussion.

The sa_table_fragmentation() can tell you how many rows per segment but not WHY there is a split. If it is due to rows longer than a page, then reorganizing the table will not improve matters. If it is due to inserting short rows and later updating to larger values, then it would be improved by reorganization.

You can use values in SYS.SYSTAB to try to distinguish these two cases, but these values are only describing the entire table and won't really help if you have a lot of nicely behaved data and a small amount of different data.

Consider: select table_name, "count", "count"/table_page_count as rows_per_table_page, table_page_count, ext_page_count from sys.systab where table_name in ('TRowFragments','TLongRows');

Here we estimate the number of rows per base table page. If the rows were large when inserted (large with respect to page size), the rows per page will be low, perhaps as low as one if most of the table rows are large. On the other hand, with rows that grow after insertion, the rows per page will be higher (perhaps 30 or so). The sa_table_fragmentation() tells you how many row segments need to be read to completely read a row.

(06 Nov '14, 08:22) Ivan T. Bowman
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×242

question asked: 26 Sep '14, 07:54

question was seen: 3,141 times

last updated: 06 Nov '14, 08:22