In our product, we are running a database on SQL Anywhere 12.0.1.3895. This database receives changes from another database through SQL Remote replication every 30 seconds, so the data is more or less permanently changing. Another process deletes data that is detected to be ‘old’, so the database is not growing in size. The database under test is about 20 GB.

In our test environment we continuously execute various queries against the database. After several days, we found that the queries were getting slower and slower: the same queries that took a few seconds at the beginning take several minutes or even hours! After we re-create (or even just drop) the statistics for the affected tables, the queries are fast again.

What’s wrong? We assumed that the statistics are maintained by the engine itself, since this is described as a new feature of SQL Anywhere 12: self-healing statistics management (http://download.sybase.com/presentation/TW2011/DATSQL1.pdf).

How can we keep the performance long term? Are there any special settings we need to configure? There were similar threads in this forum, but without any final solution: http://sqlanywhere-forum.sap.com/questions/13709/low-performance-with-wrong-statistics and http://sqlanywhere-forum.sap.com/questions/14029/query-plan-changes-radically

We have saved a database with such corrupt statistics (slow queries), so we can provide query plans, dumps of SYSCOLSTATS (taken after the slowdown occurred), and so on. But the main question is: how do the statistics become corrupt?

Thanks for any ideas.
Have you thought about fiddling around with the update_statistics and collect_statistics_on_dml_updates options?
As to the docs, both options are ON by default and should only be set to OFF in particular cases - and please note, personally I'm not at all pretending to understand this documented difference between them.
However, this might be a chance to prevent the system from undesired statistics updates, and it might be helpful if you find out that your SQL Remote generated "DML flood" (which will consist of many, many single-row DML operations, AFAIK) is inappropriate as a basis for collecting stats. AFAIK, both options can be set for individual users and/or connections, to allow further fine-tuning. Note: I'm much more wild-guessing than Breck claims to be...
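For instance, a rough sketch of such an experiment might look like this (the remote user name is just a placeholder, and please double-check against the docs whether these are really the options you want to touch):

-- switch statistics collection off for the current connection only
SET TEMPORARY OPTION collect_statistics_on_dml_updates = 'Off';
SET TEMPORARY OPTION update_statistics = 'Off';

-- or switch it off just for the user ID that SQL Remote connects with
SET OPTION "my_remote_user".collect_statistics_on_dml_updates = 'Off';
SET OPTION "my_remote_user".update_statistics = 'Off';

That way the replication-driven "DML flood" would no longer feed the histograms, while normal connections could keep the defaults.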
If you have read through this thread already: http://sqlanywhere-forum.sap.com/questions/13709/low-performance-with-wrong-statistics
then have you diagnosed what specifically is "wrong" with the statistics between these two cases? Is your statistical histogram skewed to a few buckets so that the cardinality estimates are off? Or are the stored statistical values simply not reflective of the actual row counts in the database?
Knowing what is "wrong" first would help us to explain "how" they got that way in the database.
You have a "high throughput database" which historically have been difficult from a performance and tuning point of view. This old Techwave presentation may or may not provide some ideas: Achieving Higher Throughput with Your SQL Anywhere Database
Your "main question is: How do the statistics become corrupt?" is wayyyy above my pay grade, as are all the implications of that question (is it a bug? will it be fixed? how can I keep it from happening?)
However, you did ask for "any ideas"... how about a BFW? (brute force workaround)
...code a scheduled EVENT that executes CREATE STATISTICS or DROP STATISTICS or whatever other command(s) that make the problem go away, every night at 2 AM or whatever time seems appropriate. Make sure to include some record-keeping inside the event so you can answer the question "did the event run OK last night?" since scheduled events have been known to [cough] up and not run.
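For example, a bare-bones sketch of such an event might look like this (the table name is a placeholder, and whether CREATE STATISTICS is really the command that makes your problem go away is exactly what you'd have to verify first):

CREATE TABLE stats_refresh_log (
   run_at  TIMESTAMP DEFAULT CURRENT TIMESTAMP,
   remark  LONG VARCHAR );

CREATE EVENT nightly_stats_refresh
SCHEDULE nightly_stats_refresh START TIME '2:00AM' EVERY 24 HOURS
HANDLER
BEGIN
   -- record-keeping so you can answer "did the event run OK last night?"
   INSERT INTO stats_refresh_log ( remark ) VALUES ( 'started' );
   COMMIT;
   CREATE STATISTICS my_big_table;   -- repeat for other heavily-updated tables
   INSERT INTO stats_refresh_log ( remark ) VALUES ( 'finished' );
   COMMIT;
END;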
You might also consider running REORGANIZE statements on one or more tables and/or indexes.
Thanks for your attention. It is really hard for me to diagnose the statistics. In the histograms I see only non-string columns, but our tables contain many string columns. All the PKs and FKs are UUIDs of type NCHAR (is this a problem?). And I cannot interpret most of the SYSCOLSTAT columns at all, e.g. the frequencies. So comparing the actual row counts to the statistical values would only be possible for some columns.
But even just dropping the statistics for a table (after which the row counts definitely differ from the statistics) increases the performance significantly.
Is there anything I can trace during runtime to see if there is a specific event that corrupts the statistics?
What do you mean by 'histogram skewed to a few buckets' and 'cardinality estimates are off'?
Thanks for the Techwave presentation. This is more or less about increasing update/insert/delete performance.
We already thought about the BFW, but got some hints (also read here in the forum: http://sqlanywhere-forum.sap.com/questions/12887/is-create-statistics-safe-to-run-with-activity-in-the-database) that there is a significant impact on other activity in the database, so we were careful about implementing that permanently. You should know that we are not administering one specific database: the database is embedded in our product and is running in many, many hospitals, collecting clinical data in the delivery ward. Therefore we are looking for a solution that prevents the problem up front. But, I know, the BFW might be our only chance.
Is the impact of DROP STATISTICS smaller than that of CREATE STATISTICS? And if we drop the statistics every night anyway, why not just switch the generation of statistics off entirely? Is that possible?
Thinking out loud here... statistics aren't the problem, slow-running queries are. Bad analogy: if you're having a heart attack, you don't want advice about eating habits and exercise. So... studying the actual plans being used in a sample Bad Database and a sample Good Database might yield ideas about forcing a good plan to be used all the time. Studying plans is a non-trivial exercise because they are determined dynamically and any little thing (a butterfly flapping) can invalidate a test if you are not careful setting it up... the only analogy I can think of is debugging time-dependent hardware interrupt handling where actually LOOKING at the system behavior CHANGES the system behavior.
Try this google search:
graphical plan site:sqlanywhere.blogspot.com
If you do the work to obtain valid good and bad graphical plans, ones you have proven to accurately represent good and bad behavior, folks here would be happy to look at them. The words "valid" and "accurately" are important because the world is awash in crappy plans that don't represent reality, and those are a waste of everyone's time. It's almost like nobody teaches the Experimental Method any more... no, wait, they don't :)
...end rant :)
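FWIW, one low-tech way to capture a plan for posting is the GRAPHICAL_PLAN() function from dbisql; IIRC the second argument asks for detailed (actual) execution statistics rather than just optimizer estimates:

SELECT GRAPHICAL_PLAN(
   'SELECT ... your slow query goes here ...',
   2 );
-- save the returned XML as a .saplan file and open it in the dbisql plan viewer

Run it once against the Good Database and once against the Bad Database, on the same query, and compare.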
Here's a WAG: Are you being killed by plan caching?
To disable: SET OPTION PUBLIC.MAX_PLANS_CACHED = '0';
FWIW that particular Dead Chicken is not currently in use for Foxhound (an embedded system somewhat similar to yours except for the part about yours being life critical).
Here's a Dead Chicken that IS in permanent use in Foxhound, except during the single-connection data upgrade process:
SET OPTION PUBLIC.MAX_QUERY_TASKS = '1';
...because I finally got sick of random-total-performance-degradation and decided the minor benefits of intra-query parallelism in a multi-connection environment were not worth the strife.
re "significant impact to other activity in the database"... I read that thread differently, that it's OK to run during a quiet time, and having spent too much time in hospitals I know they have quiet times. One possibility might be a process (repeated EVENT) that watches for a quiet time by computing changes in various performance statistics (CPU, disk, whatever) and deciding "now's the time". I'm a Certified Master Of SQL Anywhere Events... seriously... Foxhound consists almost entirely of EVENT blocks... so help is available :)
Statistics for what are essentially randomly generated strings will not be very meaningful in terms of data distribution for narrowing down the number of values we'll need to scan through.
See sa_get_histogram() and dbhist.

DROP STATISTICS removes the statistical distributions and rebuilds the distribution ranges (but does not populate the ranges). It may be the simple case that you are resetting the histogram ranges, and thus resetting the estimated predicate selectivity.

I don't think there is something specific to track down - and even by your own problem description, it takes "several days" for this problem to appear. It sounds like the aggregate of the operations over time is what is causing the skew.
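For example, to dump the histogram for a single column (table and column names are placeholders):

SELECT * FROM sa_get_histogram( 'my_uuid_column', 'my_big_table' );
-- each returned row describes one histogram bucket and its estimated frequency

The dbhist utility charts the same information in an Excel sheet, which can be easier to eyeball.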
Here's a visual sample of what I mean by 'histogram skewed to a few buckets'. I have the same 10 random data values from 1-100 listed in the picture, but by dividing up the histogram range differently, I get two radically different histograms with different selectivity estimates for specific values. (Where the selectivity is the number of times a predicate is expected to appear in a single bucket).
If I searched for the value '18' (... WHERE col = 18), the selectivity in the histogram is the same (1/1) and (1/1), but the value '61' (... WHERE col = 61) has very different histogram estimates: (1/1) in the first case, but (1/9) in the second.

This difference in the statistical distributions provides the optimizer with different estimates for the cardinalities of joins (where we are expecting to pull back more/less rows than anticipated because of the estimate, thus changing the underlying access plan of the query).

By performing a DROP STATISTICS, you are manually resetting the histogram ranges in the above diagram.

I see a broken link instead of the image in your comment.
... actually, we have broken permissions. The image is now there, but nobody can see it. I have messaged our forum administrator about this issue - hopefully it will be resolved shortly.
I still can't see the picture. But I think I understand what you are trying to explain: during the ongoing statistics updates, the number of buckets is not changed, only the high/low values and selectivities of the given buckets, right?
Would it be sufficient to just drop the statistics periodically, without creating them manually? When will they be created again? Or when can I see something again in SYSCOLSTAT?
Thanks for the hints and the motivation. I will try to get some more details. Will play with some options.
Is there a chance this behavior is better in SQL Anywhere 16? I mean, the statistics governor was new in 12.
Correct. (And this is more evident now that my Excel picture is there ... although my text example makes no sense now, since my random data was reset by the time I captured my picture... whoops).
The high/low values for the buckets aren't changed that often in the actual database statistics - DROP/CREATE STATISTICS forces this operation. This is why you are more likely to see "skew" on tables that are constantly being updated.
If that is what routinely 'solves the issue' for you, and you can verify between a 'fast' and a 'slow' graphical plan that the selectivity of the sargable predicates on your queries is the only thing that's changing between these conditions, then yes this may be an option for you. It would be best to confirm that this is indeed happening first though - I haven't heard of such evidence in this thread yet.
If you only 'DROP' statistics and do not 'CREATE' statistics, the statistics are populated as we execute SELECT/INSERT/UPDATE/DELETE operations over the existing values. Initial operations are possibly going to be slower, since we'll assume we'll need to scan larger amounts of rows without the statistics to help estimate the work.
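If you want to watch the statistics reappear in the catalog, something along these lines should work (the table name is a placeholder, and I'm assuming the usual SYSTAB/SYSTABCOL joins):

SELECT t.table_name, tc.column_name, cs.*
  FROM SYS.SYSCOLSTAT cs
       JOIN SYS.SYSTAB    t  ON t.table_id   = cs.table_id
       JOIN SYS.SYSTABCOL tc ON tc.table_id  = cs.table_id
                            AND tc.column_id = cs.column_id
 WHERE t.table_name = 'my_big_table';

An empty result right after the DROP STATISTICS, with rows reappearing as the workload runs, would confirm the behavior described above.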
Thanks for the explanations and confirmations. I'm working on getting some comparable plans.
But, still, is there a way to see whether the statistics governor has already detected the wrong estimates?
I wonder why it is not cleaned up automatically.
SELECT CONNECTION_PROPERTY ( 'update_statistics' ); returns 'On'.
Help says that the statistics governor checks every 30 minutes, when the statistics are flushed.
Is there a way to detect that the governor is running?
Help also says that the governor stops maintenance on statistics that are hard to fix ...
Is there a way to get some more info about the current state of the governor?
Or am I wrong if I expect that this should be detected by the governor?