The forum will be down for scheduled maintenance on Saturday, March 4 beginning at 10am EST. Actual downtime is unknown but may be several hours.

We currently use SQL Anywhere 12 in our consumer product.

With the absence of a "auto-shrink" feature in SQL Anywhere 12, in order to keep the databases manageable we automatically rebuild them (using dbunload via dbtools). However, we only do this if there appears to be a great amount of free/dirty pages in the database.

For example, if DB_PROPERTY('FreePages') + (DB_PROPERTY('CleanablePagesAdded') - DB_PROPERTY('CleanablePagesCleaned')) is above some amount on unload, then we will rebuild the database.

The problem is, the aforementioned properties don't seem to be accurate when we go to unload the database -- even after running the database cleaner. However, if the database is reloaded then these properties reflect the actual number of free pages.

To reproduce this (on a much smaller scale):

import sqlanydb
import os

def get_page_count(connection):
    """
    Gets the number of free and dirty pages.
    """
    cursor = connection.cursor()
    cursor.execute("SELECT CAST(DB_PROPERTY('FreePages') AS INT), CAST((DB_PROPERTY('CleanablePagesAdded')-DB_PROPERTY('CleanablePagesCleaned')) AS INT);")
    free, dirty = cursor.fetchone()
    cursor.close()

    return free, dirty

def run_cleaner(connection):
    """
    Forces the database cleaner to run
    """
    cursor = connection.cursor()
    cursor.execute('CALL sa_clean_database();')
    cursor.close()

def fill_with_data(connection):
    """
    Adds a bunch of data to the pad table.
    """
    cursor = connection.cursor()
    cursor.execute('BEGIN TRANSACTION;')
    cursor.execute('CREATE TABLE "Pad" ("foo" BINARY(1024), "bar" BINARY(1024));')

    for x in xrange(0, 1024):
        cursor.execute("INSERT INTO Pad (foo, bar) VALUES(CAST(newid()+newid()+newid()+newid() AS BINARY(1024)), CAST(newid()+newid()+newid()+newid() AS BINARY(1024)));")

    cursor.execute('COMMIT;')
    cursor.close()

def empty_data(connection):
    """
    Empty the data from the pad table.
    """
    cursor = connection.cursor()
    cursor.execute('BEGIN TRANSACTION;')
    cursor.execute('DELETE FROM Pad;')
    cursor.execute('COMMIT;')
    cursor.close()

def make_db():
    """
    Create the test database.
    """
    path = os.path.join(os.path.abspath(os.path.curdir), 'example.db')

    if os.path.exists(path):
        os.remove(path)

    connection = sqlanydb.connect(uid='DBA', pwd='sql', dbn='utility_db')

    cursor = connection.cursor()
    cursor.execute("CREATE DATABASE '%s';" % path)
    connection.close()

    return path

def main():
    path = make_db()
    connection = sqlanydb.connect(uid='DBA', pwd='sql', dbf=path)

    # Add a bunch of data
    fill_with_data(connection)
    print('After add: %d free pages, %d dirty pages' % get_page_count(connection))

    # Remove it
    empty_data(connection)
    print('After remove: %d free pages, %d dirty pages' % get_page_count(connection))

    # Run the cleaner
    run_cleaner(connection)
    print('After cleaner: %d free pages, %d dirty pages' % get_page_count(connection))

    connection.close()

    # Load the same database again
    connection2 = sqlanydb.connect(uid='DBA', pwd='sql', dbf=path)
    print('After reload: %d free pages, %d dirty pages' % get_page_count(connection2))
    connection2.close()

if __name__ == '__main__':
    main()

Which outputs:

$ python freepages-unload.py
After add: 4 free pages, 38 dirty pages
After remove: 4 free pages, 124 dirty pages
After cleaner: 90 free pages, 0 dirty pages
After reload: 97 free pages, 0 dirty pages

So the question is, why is the number of free pages different after a reload? Is it the cache? And is there any way to get an accurate reading? Some of these questions are also raised on the (very good) SQL Anywhere blog

asked 27 Feb '14, 19:28

Zachary%20Sims's gravatar image

Zachary Sims
21115
accept rate: 0%

Comment Text Removed

Why do you feel it is necessary to shrink the database file? When new rows are inserted, it is an expensive operation to grow the physical file as opposed to inserting the rows in an existing page.

...that came across a bit harsh :)

Reorgs (reloads) are often done for performance reasons, but they are often followed by ALTER DATABASE commands to add free space to optimize inserts.

Perhaps in your case, the number of rows grows for a short time, then shrinks and stays low for a long time... in that case shrinking the file would be a good idea... it would speed up a backup, for example.

(01 Mar '14, 13:45) Breck Carter

We understand the performance implications of "trimming" pages only to have them re-allocated. However, the main reason is portability of the database, which is important for how we're using SQL Anywhere (in an end-user desktop app.). We're looking at ways of minimizing the page growth (as mentioned int the SQL Anywhere Docs). But there are still situations where shrinking the file is important, e.g. if large chunks of data are deleted.

(02 Mar '14, 17:25) Zachary Sims

I believe that these properties are not updated in the system catalog until a checkpoint of the database. Try issuing an explicit CHECKPOINT before querying the properties.

permanent link

answered 27 Feb '14, 19:32

Mark%20Culp's gravatar image

Mark Culp
22.5k9129264
accept rate: 40%

That works -- thanks for the speedy reply.

New output:

$ python freepages-unload.py
After add: 12 free pages, 0 dirty pages
After remove: 12 free pages, 86 dirty pages
After cleaner: 98 free pages, 0 dirty pages
After reload: 97 free pages, 0 dirty pages
(27 Feb '14, 19:37) Zachary Sims

Does the cleaner break the rule that row positions are immutable? i.e., the page on which a row resides remains unchanged after insert.

(27 Feb '14, 21:11) Breck Carter

Is the database cleaner run as part of a CHECKPOINT?

It seems the CHECKPOINT helps in some cases but if (most) of a database is deleted and CHECKPOINT'd then the values still seem to be stale until the next reload?

(28 Feb '14, 00:02) Zachary Sims
Replies hidden
1

What makes you think the values are "stale"? A reload (a.k.a. reorg) absolutely positively changes everything about how rows are stored in pages, as opposed to the documentation about the cleaner which is vague about when it runs, what it actually does, and how thoroughly it does its job (well, the docs are fine, the cleaner's behavior is supposed to be vague :)

AFAIK the cleaner doesn't do everything a reorg does, otherwise it would be called "reorg" instead of "cleaner"... and a reorg definitely changes the number of free pages, it's why people run a reorg (or one of the reasons).

Note that two rules apply to rows: First, a row is not moved from the page it is originally inserted on. Second, a page can only hold rows from one table, not multiple tables. A reorg breaks the first rule, of course (that's why it's called "reorg")... and I am not sure if the cleaner can break the rule as well (hence my question above).

A re-reading of the cleaner docs reveals this phrase "reorganizing database pages for more efficient access"... does that apply to data pages, or index pages, or both? I am guessing that your testing proves the cleaner does NOT do everything a reorg (reload) does... perhaps you could call sa_clean_database() and then CHECKPOINT to see if forcing a long cleanup period causes it to match the effects of a reorg.

Example: Two pages contain one non-deleted row each, with lots of empty space on each page. At a minimum, a reorg will move one of those rows to a different page, thus creating one free page. Will the cleaner do that? Or does it just create a free page if all the rows on that page have been deleted?

FWIW suggestions to "run a CHECKPOINT" are often made because some statistical columns in the system catalog tables can become out-of-date and a CHECKPOINT will force them to be updated; e.g., the number of rows in SYSTAB.count. I believe the motivation behind letting these statistics become out of date is "nobody wants to spend the money updating the system catalog tables every single time a single row is inserted or deleted or etcetera"... which is true as far as I'm concerned :)

(28 Feb '14, 10:41) Breck Carter
2

The primary purpose of the cleaner is to 'clean' pages of rows which have been deleted. I.e. when a row is deleted it may not actually be removed from the table, but rather just marked as deleted... and it is the cleaner's role to find those rows (we keep track of where they are) and physically removed them from the table pages.

The cleaner never moves row "heads" (the piece that starts a row). AFAIK the reference to "reorganizing database pages for more efficient access" refers to moving data around within a database page, not between (from one to another) database pages.

(28 Feb '14, 21:25) Mark Culp

Growth in the number of free pages versus total pages is a first-order symptom of database file bloat, and it's easy to measure (FreePages versus FileSize).

Growth in the number of sparsely-filled pages is a second order symptom, harder to measure (see this answer describing sa_index_levels, sa_index_density, sa_table_fragmentation and sa_table_stats).

Growth in number of cleanable pages (which BTW is not the same as dirty pages) is probably a third- or fourth-order symptom, not worth the effort.

Suggestion: Just measure free pages versus total pages, see if it works well over time.

If it doesn't, then try adding off-peak REORGANIZE statements ahead of the free page versus total page check... the REORGANIZE operations will increase the number of free pages.

If you discover that measuring the number of cleanable pages is important, I will eat Surströmming :)

alt text

permanent link

answered 01 Mar '14, 14:15

Breck%20Carter's gravatar image

Breck Carter
26.8k422580826
accept rate: 20%

edited 01 Mar '14, 14:20

Thanks for the reply. I agree that the "larger than expected" file sizes are symptoms that directly relate to how we're using SQL Anywhere, e.g. we currently don't do what's recommended on http://dcx.sybase.com/index.html#1201/en/dbadmin/da-unexpected.html

Forgive my re-definition of dirty pages (cleanable pages, rather than those that need to be flushed). REORGANIZE isn't a cheap operation? The stat we're after is "how big will the database be after a rebuild," something that FreePages vs FileSize doesn't seem to tell us (even with a CHECKPOINT). And a reload certainly doesn't do a REORGANIZE?

(02 Mar '14, 18:23) Zachary Sims
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×38

question asked: 27 Feb '14, 19:28

question was seen: 931 times

last updated: 02 Mar '14, 18:23