Are the statistics generated in the syscolstat table really needed by the database?

I'm having performance problems because of the statistics. When I periodically drop the statistics, performance improves significantly. Is there any way to avoid generating them?

Thanks, Leonardo.

asked 05 Oct '12, 09:26

LGregianin

edited 05 Oct '12, 09:27


I too would like to know if this is possible. I'm seeing the same issue after upgrading from version 9 to 12. If I drop statistics from the PK on a 37 million row table, performance goes through the roof. Response comes back in 2 seconds instead of 60+.

(05 Oct '12, 15:45) grant

Leonardo and Grant,

Would you please post the graphical plans with statistics from before (i.e., the slow plan) and after (i.e., the fast plan) you run the DROP STATISTICS statement? It would be useful to investigate these particular issues based on your plans.

With many thanks, A. Nica

(05 Oct '12, 20:25) Nica _SAP

Hi Nica,

Sybase 9.0.2 without drop statistics

Sybase 9.0.2 with dropped statistics

Test in Sybase 12.0.1

UPDATE:

Select:

select * from tbvenpedido pedido join tbvenitenspedido itens on 
(itens.empresa = pedido.empresa and itens.pedido = pedido.pedido)
order by 1, 2

More details:

  740,091 rows
  Database size: 1.5 GB
(08 Oct '12, 15:01) LGregianin
Replies hidden

Strange, both plans look identical. Are your charts correct?

(09 Oct '12, 02:51) Martin

Yes @Martin, the graphics are correct. The only difference between the two runs is the DROP STATISTICS statement (on all my tables).

(09 Oct '12, 08:17) LGregianin

Are the statistics generated in the syscolstat table really needed by the database?

Yes, very much so. Collecting a statistical distribution of values provides the database optimizer with a (rough) estimate of the amount of work it will need to perform to scan for sargable predicates in an index, and also helps with join strategy estimates inside the optimizer.

Is there any way to avoid generating them?

You can use ALTER STATISTICS to control the collection of statistics on particular columns/tables. Generally it is not recommended to turn this feature off.
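For illustration only (column and table names are borrowed from the query later in this thread, and the exact syntax should be verified against your version's documentation), disabling automatic statistics maintenance on one column might look like:

```sql
-- Hedged sketch: stop the engine from auto-updating statistics
-- on one column of tbvenpedido (names assumed from this thread).
ALTER STATISTICS ON tbvenpedido ( pedido ) AUTO UPDATE DISABLE;

-- Re-enable once the underlying issue is resolved:
ALTER STATISTICS ON tbvenpedido ( pedido ) AUTO UPDATE ENABLE;
```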

Statistics are automatically maintained by the database engine as SELECT/INSERT/UPDATE/DELETE operations are executed. Statistics can easily become 'incorrect' over time for two main reasons:

  1. The 'buckets/bins/discrete ranges' picked for the histogram ranges no longer reflect a "balance" of values across the table. As a result of this "unbalanced" histogram range division, values are not as unique in each bucket (many or most values are stored in one or very few buckets), and the optimizer has to scan through more values found in the "full" bucket (resulting in low index selectivity in the optimizer plan).

  2. The 'selectivity' values stored in the ranges are incorrect (i.e. the number of values that really appear in the table does not reflect the percentage of values stored in the histogram bucket range).

For older versions of SQL Anywhere, the only way to correct this information is to manually issue the CREATE STATISTICS command when you know the information is incorrect. (Either slowly over time, or perhaps after large data operations).
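As a hedged sketch (table and column names are taken from the query in this thread; check the exact syntax for your version), the manual refresh might look like:

```sql
-- Rebuild histograms for the join columns after a large data load:
CREATE STATISTICS tbvenpedido ( empresa, pedido );

-- Or discard them entirely, as done in this thread:
DROP STATISTICS tbvenpedido;
```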

You can check what the current values are in the histogram data for your table by using the dbhist utility or the sa_get_histogram() stored procedure.
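Assuming the documented calling convention (column name first, then table name; verify for your version), a quick check via the stored procedure might look like:

```sql
-- Hedged sketch: inspect the current histogram for the 'pedido'
-- column of tbvenpedido (argument order assumed: column, table).
CALL sa_get_histogram( 'pedido', 'tbvenpedido' );
```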


The new statistics governor in SQL Anywhere 12 is intended to help "automate" the check for these two issues - when it detects an issue it will try to issue a "CREATE STATISTICS" statement for you, as needed. Since this feature did not exist in older versions, it may be required to review the statistical data as part of your migration strategy.

permanent link

answered 05 Oct '12, 16:23

Jeff Albion

edited 05 Oct '12, 16:38

Jeff Albion, if the generated statistics are only used to improve database performance, why does this misbehaviour happen? Could it be caused by incorrect normalization of the database?

(09 Oct '12, 15:35) Walmir Taques
Replies hidden

I'm not 100% sure about your question, but if you're implying that the above described behaviour is possibly a "bug", I would instead suggest this is more of a known product limitation that we're always trying to improve upon.

As much as we try to mathematically keep track of accurate information, as I mentioned, over time this can become "skewed" (due to rounding issues of the actual percentage values themselves, plus the fact that we do not automatically re-divide up the histogram ranges without CREATE STATISTICS being run).

Since we don't know that the statistical information is really different than what appears for all of the values stored in the table (until a manual SQL request comes in to re-evaluate the statistics), the database doesn't have a chance to correct itself.

The newer versions of SQL Anywhere now try to limit this behaviour by doing this dynamic monitoring of the statistics for you, rather than assuming that the user will be constantly monitoring the statistics themselves and adjusting manually as necessary (as in older SQL Anywhere versions).


Aside: "Normalizing a database" generally implies something about the way entities are arranged in a schema (from a theoretical perspective) and is not related to database statistics (used for optimization at runtime).

(10 Oct '12, 17:23) Jeff Albion

Hi @Jeff Albion, I did not mean to imply anything; I just had a small doubt about whether the normalization of a database could influence its performance with respect to statistics. Now it's clearer. Sorry, and thank you!

(11 Oct '12, 13:51) Walmir Taques

Leonardo,

Please post the plan files (or send them to me directly at anisoara.nica@sap.com). In dbisql:

  1. Call sa_flush_cache() before each execution.
  2. In the 'Plan Viewer', set the option 'Detailed and node statistics'.
  3. Click 'Get Plan'.
  4. Save the plan to a file using 'Save As...'.
  5. Repeat 1-4 for each combination {without drop stats, drop stats} X {with order by, without order by}.

Thanks Ani

permanent link

answered 09 Oct '12, 13:54

Nica _SAP

These are the results:

Before drop statistics without order by clause

Before drop statistics with order by clause

After drop statistics without order by clause

After drop statistics with order by clause

In my previous tests I was not running call sa_flush_cache(), which is why the results varied so much. I still do not understand why the SELECT returns so slowly.

permanent link

answered 10 Oct '12, 13:42

LGregianin

edited 10 Oct '12, 13:44

Can you please email or attach the query plans as per the instructions? The screen captures and PDFs do not contain all of the information, nor do they provide an easy mechanism to analyze the issue. To get the plan that we require:

  1. Go to Tools | Plan Viewer or Shift-F5
  2. Set the Statistics Level drop down to "Detailed and node statistics"
  3. Click Get Plan button
  4. Click Save As ... button and select a filename that identifies the plan iteration
  5. Post or email the .saplan file that is generated from each iteration.

Please ensure that sa_flush_cache() is called for each iteration as noted in Ani's previous post.
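Taken together, each timed iteration in dbisql would then look roughly like this (the query is reproduced from earlier in the thread):

```sql
-- Flush the cache so every run starts from the same (cold) state:
CALL sa_flush_cache();

-- Then capture the plan for the query under test:
select * from tbvenpedido pedido join tbvenitenspedido itens
  on (itens.empresa = pedido.empresa and itens.pedido = pedido.pedido)
order by 1, 2;
```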

(10 Oct '12, 14:25) Chris Keating