Please be aware that the content in SAP SQL Anywhere Forum will be migrated to the SAP Community in June and this forum will be retired.

Hi all,

a standard DBXTRACT reload.sql file contains a bunch of LOAD STATISTICS statements, typically based on a particular page size. AFAIK, they are used to refine the table statistics of the database that is to be built.

Two related questions:

  1. Are these statistics based on the particular data extract or on the consolidated database's data?
  2. If based on the consolidated's data, how senseful is this, as each remote will often only include a small part of all data and might have totally different variance? Or asked differently: How "bad" is it to omit these LOAD STATISTICS statements?

I'm just upgrading a custom extraction script, and the LOAD STATISTICS stuff is the only part that I cannot code myself:) So if it's not too useful, I'd like to omit it.

Regards Volker

(Currently using ASA 8.0.3.5574 but also testing with SA 11.0.1 and SA 12 beta.)

asked 08 Mar '10, 13:10

Volker%20Barth's gravatar image

Volker Barth
40.2k361550822
accept rate: 34%

Just to add: I'm aware that LOAD STATISTICS is very useful when reloading a database in order to rebuild it (or to extract a remote that should contain all data). But that's not what I'm up to, here.

(08 Mar '10, 13:12) Volker Barth

I just found the answer to a similar question on the replication NG, cf. the thread "news://forums.sybase.com:119/e4CnfkiDCHA.289@forums.sybase.com" by Rob Waywell:

The benefit of carrying over the statistics is that the optimizer will be "smart" right from the start. Actually, this makes much more sense for dbunload than for dbxtract since there is no guarantee that the distribution of data in the remote database will match the distribution that existed in the consolidated database.

Taking out the LOAD STATISTICS, means that the optimizer has to start from ground level to learn the distribution of the data. It may affect the initial performance of the database, but it won't affect the correctness of the result sets in any way.

Please correct me if this answer (originally for ASA 8.0.1.2600) is not correct anymore or if there's more to pay attention to.

permanent link

answered 08 Mar '10, 14:32

Volker%20Barth's gravatar image

Volker Barth
40.2k361550822
accept rate: 34%

Rob's response is just as applicable to recent SQL Anywhere releases as it was to 8.0.1.

(08 Mar '10, 16:29) Glenn Paulley

@Glenn: Thanks for the clarification. Then I should be fine without LOAD STATISTICS as long as I use LOAD TABLE with STATISTICS ON ALL COLUMNS (which seems to be the only option in pre 9.0.1 versions), correct? I.e. then LOAD TABLE should generate fitting statistics for the remote itself.

(08 Mar '10, 16:57) Volker Barth
1

Right - the new statistics generated as part of LOAD TABLE processing will be accurate, though not as detailed in those parts of the distribution that have been previously queried and, consequently, saved in the extant histogram.

(10 Mar '10, 17:10) Glenn Paulley

Just for the record: As a consequence, I dropped all LOAD STATISTICS statements from our customized reload script, which makes this script easier to handle in case of schema updates. My impression is that the LOAD TABLE generated stats are not worse. However, as most remotes are quite small (< 25 MB), the impact is small, too.

(12 Mar '10, 08:57) Volker Barth
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×103
×21
×14

question asked: 08 Mar '10, 13:10

question was seen: 2,193 times

last updated: 08 Mar '10, 14:32