In our product, we are running a database on SQL Anywhere 12.0.1.3895. This database receives changes from another database through SQL Remote replication every 30 seconds, so the data is more or less permanently changing. Another process deletes data that is detected to be ‘old’, so the database is not growing in size. The database under test is about 20 GB.

In our test environment we continuously execute various queries against the database. After several days, we found that the queries were getting slower and slower: the same queries that took a few seconds at the beginning take several minutes or even hours! After we re-create (or even just drop) the statistics for the affected tables, the queries are fast again.

What’s wrong? We assumed that the statistics are maintained by the engine itself, since this is described as a new feature of SQL Anywhere 12: self-healing statistics management (http://download.sybase.com/presentation/TW2011/DATSQL1.pdf).

How can we keep the performance long term? Are there any special settings we need to configure? There were similar threads in this forum, but without any final solution: http://sqlanywhere-forum.sap.com/questions/13709/low-performance-with-wrong-statistics and http://sqlanywhere-forum.sap.com/questions/14029/query-plan-changes-radically

We have saved a database with such corrupt statistics (slow queries), so we can provide query plans, dumps of SYSCOLSTATS (taken after the slowdown occurred), and so on. But the main question is: how do the statistics become corrupt?

Thanks for any ideas.
Have you thought about fiddling around with the update_statistics and collect_statistics_on_dml_updates options?
As to the docs, both options are ON by default and should only be set to OFF in particular cases - and please note, personally I'm not at all pretending to understand this documented difference between them.
However, this might be a chance to prevent the system from undesired statistics updates, and it might be helpful if you find out that your SQL Remote generated "DML flood" (which will consist of many, many single-row DML operations, AFAIK) is inappropriate as a basis for collecting stats. AFAIK, both options can be set for individual users and/or connections, to allow further fine-tuning. Note: I'm much more wild-guessing than Breck claims to be...
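For instance, a rough sketch of such an experiment might look like this (the remote user name is just a placeholder, and please double-check against the docs whether these are really the options you want to touch):

-- switch statistics collection off for the current connection only
SET TEMPORARY OPTION collect_statistics_on_dml_updates = 'Off';
SET TEMPORARY OPTION update_statistics = 'Off';

-- or switch it off just for the user ID that SQL Remote connects with
SET OPTION "my_remote_user".collect_statistics_on_dml_updates = 'Off';
SET OPTION "my_remote_user".update_statistics = 'Off';

That way the replication-driven "DML flood" would no longer feed the histograms, while normal connections could keep the defaults.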
If you have read through this thread already: http://sqlanywhere-forum.sap.com/questions/13709/low-performance-with-wrong-statistics
then have you diagnosed what specifically is "wrong" with the statistics between these two cases? Is your statistical histogram skewed to a few buckets so that the cardinality estimates are off? Or are the stored statistical values simply not reflective of the actual row counts in the database?
Knowing what is "wrong" first would help us to explain "how" they got that way in the database.
You have a "high throughput database" which historically have been difficult from a performance and tuning point of view. This old Techwave presentation may or may not provide some ideas: Achieving Higher Throughput with Your SQL Anywhere Database
Your "main question is: How do the statistics become corrupt?" is wayyyy above my pay grade, as are all the implications of that question (is it a bug? will it be fixed? how can I keep it from happening?)
However, you did ask for "any ideas"... how about a BFW? (brute force workaround)
...code a scheduled EVENT that executes CREATE STATISTICS or DROP STATISTICS or whatever other command(s) that make the problem go away, every night at 2 AM or whatever time seems appropriate. Make sure to include some record-keeping inside the event so you can answer the question "did the event run OK last night?" since scheduled events have been known to [cough] up and not run.
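For example, a bare-bones sketch of such an event might look like this (the table name is a placeholder, and whether CREATE STATISTICS is really the command that makes your problem go away is exactly what you'd have to verify first):

CREATE TABLE stats_refresh_log (
   run_at  TIMESTAMP DEFAULT CURRENT TIMESTAMP,
   remark  LONG VARCHAR );

CREATE EVENT nightly_stats_refresh
SCHEDULE nightly_stats_refresh START TIME '2:00AM' EVERY 24 HOURS
HANDLER
BEGIN
   -- record-keeping so you can answer "did the event run OK last night?"
   INSERT INTO stats_refresh_log ( remark ) VALUES ( 'started' );
   COMMIT;
   CREATE STATISTICS my_big_table;   -- repeat for other heavily-updated tables
   INSERT INTO stats_refresh_log ( remark ) VALUES ( 'finished' );
   COMMIT;
END;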
You might also consider running REORGANIZE statements on one or more tables and/or indexes.
Thanks for your attention. It is really hard for me to diagnose the statistics. In the histograms I see only non-string columns, but our tables contain many string columns. All the PKs and FKs are UUIDs of type NCHAR (is this a problem?). And I cannot interpret most of the SYSCOLSTAT columns at all, e.g. the frequencies. So comparing the actual row counts to the statistical values would only be possible for some columns.
But even just dropping the statistics for a table (after which the row counts definitely differ from the statistics) increases the performance significantly.
Is there anything I can trace during runtime to see if there is a specific event that corrupts the statistics?
What do you mean by 'histogram skewed to a few buckets' and 'cardinality estimates are off'?
Thanks for the Techwave presentation. This is more or less about increasing update/insert/delete performance.
We already thought about the BFW, but got some hints (also read here in the forum: http://sqlanywhere-forum.sap.com/questions/12887/is-create-statistics-safe-to-run-with-activity-in-the-database) that there is a significant impact on other activity in the database, so we were careful about implementing that permanently. You should know that we are not administering one specific database: the database is embedded in our product and is running in many, many hospitals, collecting clinical data in the delivery ward. Therefore we are looking for a solution that prevents the problem up front. But, I know, the BFW might be our only chance.
Is the impact of DROP STATISTICS smaller than that of CREATE STATISTICS? And if we drop the statistics every night anyway, why not just switch the generation of statistics off entirely? Is that possible?
Thinking out loud here... statistics aren't the problem, slow-running queries are. Bad analogy: if you're having a heart attack, you don't want advice about eating habits and exercise. So... studying the actual plans being used in a sample Bad Database and a sample Good Database might yield ideas about forcing a good plan to be used all the time. Studying plans is a non-trivial exercise because they are determined dynamically and any little thing (a butterfly flapping) can invalidate a test if you are not careful setting it up... the only analogy I can think of is debugging time-dependent hardware interrupt handling where actually LOOKING at the system behavior CHANGES the system behavior.
Try this google search:
graphical plan site:sqlanywhere.blogspot.com
If you do the work to obtain valid good and bad graphical plans, ones you have proven to accurately represent good and bad behavior, folks here would be happy to look at them. The words "valid" and "accurately" are important because the world is awash in crappy plans that don't represent reality, and those are a waste of everyone's time. It's almost like nobody teaches the Experimental Method any more... no, wait, they don't :)
...end rant :)
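FWIW, one low-tech way to capture a plan for posting is the GRAPHICAL_PLAN() function from dbisql; IIRC the second argument asks for detailed (actual) execution statistics rather than just optimizer estimates:

SELECT GRAPHICAL_PLAN(
   'SELECT ... your slow query goes here ...',
   2 );
-- save the returned XML as a .saplan file and open it in the dbisql plan viewer

Run it once against the Good Database and once against the Bad Database, on the same query, and compare.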
Here's a WAG: Are you being killed by plan caching?
To disable: SET OPTION PUBLIC.MAX_PLANS_CACHED = '0';
FWIW that particular Dead Chicken is not currently in use for Foxhound (an embedded system somewhat similar to yours except for the part about yours being life critical).
Here's a Dead Chicken that IS in permanent use in Foxhound, except during the single-connection data upgrade process:
SET OPTION PUBLIC.MAX_QUERY_TASKS = '1';
...because I finally got sick of random-total-performance-degradation and decided the minor benefits of intra-query parallelism in a multi-connection environment were not worth the strife.
re "significant impact to other activity in the database"... I read that thread differently, that it's OK to run during a quiet time, and having spent too much time in hospitals I know they have quiet times. One possibility might be a process (repeated EVENT) that watches for a quiet time by computing changes in various performance statistics (CPU, disk, whatever) and deciding "now's the time". I'm a Certified Master Of SQL Anywhere Events... seriously... Foxhound consists almost entirely of EVENT blocks... so help is available :)
Statistics for what are essentially randomly generated strings will not be very meaningful in terms of data distribution for narrowing down the number of values we'll need to scan through.
See sa_get_histogram() and dbhist.

DROP STATISTICS removes the statistical distributions and rebuilds the distribution ranges (but does not populate the ranges). It may be the simple case that you are resetting the histogram ranges, and thus resetting the estimated predicate selectivity.

I don't think there is something specific to track down - and even by your own problem description, it takes "several days" for this problem to appear. It sounds like the aggregate of the operations over time is what is causing the skew.
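For example, to dump the histogram for a single column (table and column names are placeholders):

SELECT * FROM sa_get_histogram( 'my_uuid_column', 'my_big_table' );
-- each returned row describes one histogram bucket and its estimated frequency

The dbhist utility charts the same information in an Excel sheet, which can be easier to eyeball.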
Here's a visual sample of what I mean by 'histogram skewed to a few buckets'. I have the same 10 random data values from 1-100 listed in the picture, but by dividing up the histogram range differently, I get two radically different histograms with different selectivity estimates for specific values. (Where the selectivity is the number of times a predicate is expected to appear in a single bucket).
If I searched for the value '18' (... WHERE col = 18), the selectivity in the histogram is the same (1/1) and (1/1), but the value '61' (... WHERE col = 61) has very different histogram estimates: (1/1) in the first case, but (1/9) in the second.

This difference in the statistical distributions provides the optimizer with different estimates for the cardinalities of joins (where we are expecting to pull back more/less rows than anticipated because of the estimate, thus changing the underlying access plan of the query).

By performing a DROP STATISTICS, you are manually resetting the histogram ranges in the above diagram.

I see a broken link instead of the image in your comment.
... actually, we have broken permissions. The image is now there, but nobody can see it. I have messaged our forum administrator about this issue - hopefully it will be resolved shortly.
I still can't see the picture. But I think I understand what you are trying to explain: during the ongoing statistics updates, the number of buckets is not changed, only the high/low values and selectivities of the given buckets, right?
Would it be sufficient to just drop the statistics periodically, without creating them manually? When will they be created again? Or when can I see something again in SYSCOLSTAT?
Thanks for the hints and the motivation. I will try to get some more details. Will play with some options.
Is there a chance this behavior is better in SQL Anywhere 16? I mean, the statistics governor was new in 12.
Correct. (And this is more evident now that my Excel picture is there ... although my text example makes no sense now, since my random data was reset by the time I captured my picture... whoops).
The high/low values for the buckets aren't changed that often in the actual database statistics - DROP/CREATE STATISTICS forces this operation. This is why you are more likely to see "skew" on tables that are constantly being updated.
If that is what routinely 'solves the issue' for you, and you can verify between a 'fast' and a 'slow' graphical plan that the selectivity of the sargable predicates on your queries is the only thing that's changing between these conditions, then yes this may be an option for you. It would be best to confirm that this is indeed happening first though - I haven't heard of such evidence in this thread yet.
If you only 'DROP' statistics and do not 'CREATE' statistics, the statistics are populated as we execute SELECT/INSERT/UPDATE/DELETE operations over the existing values. Initial operations are possibly going to be slower, since we'll assume we'll need to scan larger amounts of rows without the statistics to help estimate the work.
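If you want to watch the statistics reappear in the catalog, something along these lines should work (the table name is a placeholder, and I'm assuming the usual SYSTAB/SYSTABCOL joins):

SELECT t.table_name, tc.column_name, cs.*
  FROM SYS.SYSCOLSTAT cs
       JOIN SYS.SYSTAB    t  ON t.table_id   = cs.table_id
       JOIN SYS.SYSTABCOL tc ON tc.table_id  = cs.table_id
                            AND tc.column_id = cs.column_id
 WHERE t.table_name = 'my_big_table';

An empty result right after the DROP STATISTICS, with rows reappearing as the workload runs, would confirm the behavior described above.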
Thanks for the explanations and confirmations. I'm working on getting some comparable plans.
But, still, is there a way to see whether the statistics governor has already detected the wrong estimates?
I wonder why it is not cleaned up automatically.
SELECT CONNECTION_PROPERTY ( 'update_statistics' ); returns 'On'.
Help says that the statistics governor checks every 30 minutes, when the statistics are flushed.
Is there a way to detect that the governor is running?
Help also says that the governor stops maintenance on statistics that are hard to fix ...
Is there a way to get some more info about the current state of the governor?
Or am I wrong if I expect that this should be detected by the governor?