When extracting data for a SQL Remote database, it is highly recommended to run the according queries with isolation level 3 when extracting from an active database - cf. this quote from the DBXTRACT doc page:

By default, the Extraction utility (dbxtract) runs at isolation level zero. If you are extracting a database from an active database server, you should run it at isolation level 3 to ensure that data in the extracted database is consistent with data on the database server. Running at isolation level 3 may hamper others' turnaround time on the database server because of the large number of locks required. It is recommended that you run the Extraction utility (dbxtract) when the database server is not busy, or run it against a copy of the database.

This is understandable: Say, the remote would need data from table_1 to table_100, and would query the data in ascending "table name order". Then isolation level 3 ensures that no other transaction can modify data in any of the tables in a way that would affect the extracted result sets once they have been queried:

  • I.e. a transaction may still modify table_50 while data for table_1 is extracted - and when table_50 will be extracted, the modified (and committed) data will be extracted.
  • On the other hand, once table_50 has been queried, it will be locked against further updates until the extraction is finished and the subscription is started (usually within the same transaction).

Obviously, isolation level 3 may limit oncurrent modifications.

Is there any other method to reliably extract data

  • without using isolation level 3 and
  • without having to use a "copy database to extract" - as discussed lately in this FAQ?

(Note: I'm gonna try an answer myself...)

asked 05 Sep '12, 17:08

Volker%20Barth's gravatar image

Volker Barth
29.5k291441646
accept rate: 32%

edited 10 Sep '12, 16:28


I have not tried this yet, and would like an expert like Reg to comment on that approach:

One might use isolation level "snapshot" to query the data (within one transaction).

This would ensure that the contents of table_1 to table_100 as of the state of the transaction's start are extracted, no matter whether another transaction would modify the data. I.e. even if another transaction would modify table_50 before the data of that table have been queried for extraction, the extraction would still get the same result set for table_50 as if it had been unchanged.

In contrast to using isolation level 3 (serializable) this won't put any locks on the queried tables.

However, there are two caveats:

  1. Using snapshot isolation itself has its cost, as it requires the storage of row versions.
  2. The subscription must be started here before extracting the data in order to log the concurrent data changes while the extraction takes place (whereas with isolation level 3, the subscriptions are started at the end of the extraction).
    In the small sample, this would lead to the update on table_50 to get sent to the freshly extracted remote (if it applies to that one), and that is necessary to prevent lost updates here.

If my understanding is correct, this might be a method to increase the throughput while extracting data on a busy system.

permanent link

answered 05 Sep '12, 17:22

Volker%20Barth's gravatar image

Volker Barth
29.5k291441646
accept rate: 32%

edited 05 Sep '12, 17:24

2

One possible issue I see is that there is still a window of opportunity between the time that you execute the START SUBSCRIPTION command (which commits) and the opening of the next transaction that selects all the data using snapshot isolation for another connection to insert/update/delete data that you are about to select for extraction (you did say it was a busy system). This means the change will already be in the newly extracted remote database and will also show up in the first message sent to the extracted remote, which might be an issue.

I'm still of the opinion that taking a copy of the consolidated database and extracting remote databases from that copy is still the best and most effective way to extract remote databases and ensure that the data is 100% in synch with the consolidated at the time of the extraction. If I were an administrator of a SQL Remote system, that is what I would be doing.

(06 Sep '12, 09:57) Reg Domaratzki
Replies hidden

Well, I would prefer to use REMOTE RESET for all subscriptions and then one COMMIT instead of several START SUBSCRIPTIONs but that won't make a difference in the end.

So, I see your good point, and I'm glad that I've asked:)

As a resume, I will conclude that only if one can reliably prevent that mentioned "window of opportunity" (which will obviously depend on the "business" of the system), the suggested approach would work...


But this seems difficult to guarantee or check: I don't think there is an easy way to "know" if no other transaction has COMMITed after one's own last COMMIT, as the own COMMIT will change properties like "Commit" or "CurrentRedoPos" itself...

(06 Sep '12, 11:35) Volker Barth

Well, one chance to minimize (note: not fully exclude) that window of opportunity could be to use a BEGIN SNAPSHOT statement immediately after COMMITing the REMOT RESET. That ought to be the "fastest way" to start a snapshot, methinks...

(06 Sep '12, 11:41) Volker Barth
1

Reg, your talk on that "window of opportunity" made me think in more detail:

I do see that extracting from a live database with isolation level 3 and using REMOTE RESET within the same transaction as the extraction queries does guarantee that no updates are lost and that no updates can occur in both the extracted data and the generated messages. - FWIW, that's the way we handle our extraction now, and it works well.

So I'm primarily asking to gain more insight on possible improvements:

When using the "extract from a database copy" approach, isn't there the same "window of opportunity" that another transaction changes data after the REMOT RESET and before the backup is complete, as the full backup will contain all transactions that are running until the last log page is written to the backup?

The following doc page that deals with that approach recommends to shutdown the database after starting the subscriptions - implying there are no other current transactions.

Start the subscriptions and then immediately shut down the consolidated database and the SQL Remote Message Agent (if it is running).
The subscriptions must be started at the same time that the consolidated database copy is made. Any operations that take place between copying the database and starting the subscriptions can be lost, and can lead to errors at remote databases.

However, that kind of offline backup seems rather unlikely to do on a busy system.

So I would conclude that the "extract from a database copy" approach has its limitations on a busy system, as well, as it could also introduce the situation that data changes are both contained in the database from which the data is extracted and in the first messages for the remote.

Or have I missed the point here?

(07 Sep '12, 03:44) Volker Barth

I would re-word the documentation to read :

"Shut down the consolidated database, start it up in local mode, start your subscriptions and then shut down the local consolidated database".

At this point, you take your backup of the database file, and then start the consolidated database up again. By guaranteeing that you are the only user on the system when you create new remote users and possibly reset existing remote users, you have completely closed the windows of opportunity.

I've made a comment on the DocCommentXchange link you referenced to this effect.

Obviously, this isn't a manual process, but something you have scripted in advance to reduce the downtime of the consolidated database.

dbstop -c eng=cons;uid=dba;pwd=sql
dbisql -c dbf=c:\cons.db;eng=not_cons;uid=dba;pwd=sql;start=dbeng12 read sql.sql
copy c:\cons.db d:\cons.db
dbsrv12 -n cons c:\cons.db
(07 Sep '12, 08:58) Reg Domaratzki
Replies hidden

That's a clear solution, and I appreciate the clarification in the docs.

However, having to stop the active database at all to re-extract databases is something we surely would like to prevent - after all, that stops all connections, loses the cache contents and the like.

That was one of the main reasons why we went from that approach to extracting with isolation level 3 several years ago. (Aside: This was only possible after we had given up using DBXTRACT in favour of a custom extraction process which only takes a few seconds for our - usually "small" - remotes).

So after this thorough discussion, I would think that if one really has to extract from a live database, the "snaphot isolation approach" might be the lesser evil if isolation level 3 isn't affordable in terms of blocking/concurrent access... just my personal conclusion, of course:)

(07 Sep '12, 09:42) Volker Barth

FWIW, this led to another question on how I can ensure (or at least check) whether another connection is doing COMMITs between my REMOTE RESET/COMMIT and BACKUP (or BEGIN SNAPSHOT) statements. - If this is possible, one could surely close that "window of opportunity" IMHO.

(10 Sep '12, 16:31) Volker Barth
showing 2 of 7 show all flat view
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×73
×10
×7
×5
×3

question asked: 05 Sep '12, 17:08

question was seen: 1,548 times

last updated: 10 Sep '12, 16:31