I'm trying to get to grips with the very varied performance we are seeing when rebuilding large databases (i.e. 50 - 200 GB). We are finding order-of-magnitude differences, with apparently much less capable hardware outperforming the more capable, even when virtualisation is taken out of the equation.

Logically I would have thought that the speed of the process would be limited by one of processor, disk or memory; however, on the face of it we have systems where the processors are barely ticking over, the disk read / write speed is far below what can be seen in straightforward disk operations (i.e. copying files between the same disks, or a SQLA backup), and lots of RAM (i.e. more than the size of the database) is available. The problems seem more acute where SANs are involved - these are enterprise SANs (EMC etc, connected by Fibre Channel etc).

Can anyone suggest what metrics would be worth looking at, or give some insight into the internals of the process that might help? I can get detailed stats, but so far nothing obvious is identifying the bottleneck - I'm beginning to wonder about things like latency, block size etc. I notice that dbbackup now has a block size tuning facility.
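For concreteness, this is the kind of counter sampling I've been doing during a rebuild - just a sketch using the property functions, and the choice of properties is my guess at the relevant ones:

    -- sample server CPU and database disk activity during the rebuild
    SELECT PROPERTY( 'ProcessCPU' )       AS cpu_seconds_used,
           DB_PROPERTY( 'DiskRead' )      AS disk_reads,
           DB_PROPERTY( 'DiskWrite' )     AS disk_writes,
           PROPERTY( 'CurrentCacheSize' ) AS cache_size_kb;

Running that every few seconds and differencing the counters shows which of CPU, reads or writes is actually moving.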

v16.0.0.2213 with v16 and v11 format databases.

asked 18 Jan '16, 14:40 by Justin Willey

What exact command(s) are you using? Are you doing an unload followed by dbinit + reload, or are you doing a dbunload -an (or -ar)? Also, are you using -u (unordered unload) when unloading the data?
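For reference, the two shapes would look roughly like this (connection strings and paths are placeholders, not your actual values):

    :: one-step rebuild straight into a new database file
    dbunload -c "UID=dba;PWD=sql;DBF=D:\db\old.db" -an D:\db\new.db

    :: two-step: unordered unload (-u) to a directory ...
    dbunload -u -c "UID=dba;PWD=sql;DBF=D:\db\old.db" -r C:\unload\reload.sql C:\unload

    :: ... then create a fresh database and replay the reload script
    dbinit D:\db\new.db
    dbisql -c "UID=dba;PWD=sql;DBF=D:\db\new.db" C:\unload\reload.sql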

FWIW: When rebuilding large databases we typically see that the largest amount of time is spent rebuilding the indexes. Are there significant differences in the sizes of the tables or distribution of data (wrt index keys) across databases?

When rebuilding a database using a network disk (e.g. a SAN) you may get better performance if you start the database server with -u (use buffered disk I/O), because otherwise network latency can be a performance killer: the server will wait for each disk write to finish. (When -u is used the server only waits when a 'critical' write is performed - e.g. a commit or checkpoint - but those waits are much more expensive because the OS must flush its entire cache of disk buffers.)
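Concretely, that means starting the server along these lines (server name, cache size and paths are just examples):

    dbsrv16 -u -n IQX -c 40G F:\Databases\IQXReload\pears.db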

(18 Jan '16, 15:53) Mark Culp

Do these databases have identical schemas? Could one set of data be highly fragmented?

(19 Jan '16, 08:18) harncw

Why does the Help say this? "The -b option should not be used if you are using LOAD TABLE to bulk load data."

Is -b of no performance benefit?

(19 Jan '16, 08:54) Breck Carter

Hm, I would guess it's different (and somewhat contrary to the cited help text):

LOAD TABLE does not need the particular "-b" option because it

  • locks the whole table against other writers (so there is no need for "-b"'s exclusive connection),
  • does not write the inserted data to the transaction log by default (so "-b"'s side-effect of not using a transaction log is no advantage), and
  • does not fire triggers (whereas "-b" does not disable them - the docs state "Bulk operation mode doesn't disable the firing of triggers." - so that side-effect is not effective either).

I would guess that LOAD TABLE works well in standard engine mode, so the somewhat obscure -b mode is not needed. And the same would go for UNLOAD/RELOAD with the default internal import (-ii or -xi).
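(For clarity, "-b" means starting the engine in bulk operation mode, i.e. something like the line below - the database path is just an example - whereas LOAD TABLE needs no special server mode:)

    dbeng16 -b D:\db\demo.db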

(19 Jan '16, 10:07) Volker Barth

I'm not exactly sure why this statement has been made in the docs. In addition to Volker's response, the comment may have some history: it could have been put there initially to warn about using bulk load mode when replication or mirroring is involved. I.e. when bulk mode is used there is no recording of any changes in the transaction log, so if you are running in a replication or mirroring environment the necessary info will not exist in the log and therefore will not be replicated. However, if you use LOAD TABLE without -b in a mirroring configuration, LOAD TABLE will record the data in the transaction log (using WITH ROW LOGGING or WITH CONTENT LOGGING) and so is safe to use.
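For example, a load along these lines (table and file names are made up) stays safe in a mirroring setup because every row is recorded in the transaction log:

    LOAD TABLE dba.big_table
    FROM 'C:\\unload\\big_table.dat'
    WITH ROW LOGGING;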

(19 Jan '16, 10:41) Mark Culp

Hi Mark

Many thanks for coming back on this question.

We started off with the "all-in-one" approach, e.g.:

dbunload -v -c "START=dbsrv16 -x none -c 40G -ca 0;Server=IQX;DBN=IQX;UID=pears;PWD=xxxxx;DBF=D:\Databases\pears.db" -an "F:\Databases\IQXReload\pears.db" -o "C:\temp\unload.log"

We've tried quite a lot of variations, specifying more or fewer startup options, but it seems to make little difference. When we saw very slow performance (cf. a desktop machine) on a SAN-connected system, we tried breaking the process down to see if the issue was on the unload or reload side, or both. Both operations seem to be slower than we would hope.

In all the tests so far (fast and slow) we have been doing ordered unloads.

The unload part of the operation is about 20% of the total, so perhaps not the most crucial to concentrate on, but the difference between the SAN and local storage is very marked - at least 10x as fast on the desktop (HDD to SSD). It's noticeable that for at least some of the time the desktop seems to be writing to six or more unload files, whereas the SAN-connected server never manages more than 2 - most of the time just one.

On the load side, what you say about -u makes a lot of sense and we will be testing that tonight.

Certainly the index building etc is a big part of the reload time, as you say - at least half. I'm not sure how consistent the distribution between the various databases would be; there will certainly be some significant differences. What surprises us most, though, is that the server doesn't seem to be "trying" very hard. No indicators that we can see are more than ticking over, though something must be "maxed out"! Hence your latency comments strike a chord - I'll report back on how the load tests go with -u.

(19 Jan '16, 14:33) Justin Willey

Thanks for the suggestion. I've checked, and the schemas are very similar but not identical. The tables that hold the vast bulk of the data are identical.

Both databases have similar low levels of internal fragmentation.

(19 Jan '16, 14:59) Justin Willey

Pre-growing the new database file (via DBINIT or ALTER DATABASE) does not matter here, right?

(19 Jan '16, 16:25) Volker Barth

It does help a bit when you do the process in two steps, and interestingly the v16 dbunload does it automatically when you use the -an option for the direct re-build. A nice enhancement.
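(For the two-step case, pre-growing can be done on the new database before running the reload script - the size below is only illustrative:)

    -- pre-allocate space in the main dbspace so the reload
    -- doesn't have to grow the file in small increments
    ALTER DBSPACE system ADD 50 GB;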

(20 Jan '16, 06:45) Justin Willey

for "tonight" read "this weekend" :)

(22 Jan '16, 09:59) Justin Willey

We tried the -u setting but it didn't seem to make a significant difference. Both aspects of the unload / reload still appear to be going much more slowly than seems reasonable for the hardware spec.

(I'm going to try that on some other systems though, as it may be very helpful on ones with higher latency (i.e. iSCSI SANs) - but that's another issue.)

(25 Jan '16, 14:41) Justin Willey