When control is returned to the caller after a CHECKPOINT statement is issued by an EVENT or other connection, is the checkpoint process complete?

Or is there a chance some thread(s) or fiber(s) or frammiejammie(s) or winkelfrotz(es) are still busy doing checkpoint-ey things?

I am thinking the answers are "no" and "yes" because if you put a bullet in the head of a server (taskkill) immediately after doing a sackload of updates and executing an explicit CHECKPOINT, the subsequent recovery on startup takes some time.

Database recovery in progress
    Last checkpoint at Wed Jul 20 2011 13:34
    Checkpoint log...
    ... recovery 25% complete
    ... recovery 47% complete
    ... recovery 73% complete
    ... recovery 97% complete
    Transaction log: foxhound1.log...
    Checkpointing...
Starting checkpoint of "f" (foxhound1.db) at Wed Jul 20 2011 13:43
Finished checkpoint of "f" (foxhound1.db) at Wed Jul 20 2011 13:43
Recovery complete

asked 20 Jul '11, 13:53

Breck%20Carter's gravatar image

Breck Carter
27.0k424582831
accept rate: 21%


When an explicit checkpoint statement completes, the checkpoint has completed and no other async stuff is queued up to be done afterwards. That doesn't mean that some other frammiejammie can't sneak in and modify the file immediately afterwards. Only on shutdown do we guarantee that the file is never modified after the checkpoint completes. In some older versions, it was pretty much guaranteed that the file would be modified immediately after the checkpoint completed if there were any transactions in progress; however, that particularly unfortunate behaviour went away in v10.

permanent link

answered 20 Jul '11, 14:08

John%20Smirnios's gravatar image

John Smirnios
8.7k377107
accept rate: 40%

Checkpoints which are done automatically by the server are done asynchronously as a background operation, but if you execute an explicit CHECKPOINT then it will be done synchronously - when the statement returns the checkpoint operation will be complete.

At the time that you see the "Finished Checkpoint of <dbname>" message in the console log the checkpoint operation is complete - all work associated with the checkpoint is complete.

Remember that during recovery the operations are:

  1. rollback database pages to the last checkpoint
  2. replay all operations in the transaction log that occurred after the checkpoint
  3. rollback all transactions that were not committed when the server went down.

Also note that even though you may have just done a checkpoint just before you killed the server, the server must still go through all pages that reside in the checkpoint log to see if there are any pages that must be rolled back as part of step 1 above. If the checkpoint log was large due to a lot of activity on the server prior to going down then the number of pages in the checkpoint log could be large, and thus step 1 may take some time to read all of those pages. I suspect you had a large checkpoint log.

permanent link

answered 20 Jul '11, 14:20

Mark%20Culp's gravatar image

Mark Culp
22.6k9129265
accept rate: 40%

edited 22 Jul '11, 08:46

2

With new enough database files, if the file is clean we don't traverse the entire checkpoint log. Instead, we verify up to 100 pages just as a sanity check to ensure none of them look like they need to be rolled back.

(20 Jul '11, 14:25) John Smirnios
Replies hidden

Thanks John... I had forgotten that you put in that optimization!

(20 Jul '11, 14:32) Mark Culp
Comment Text Removed

In this case, all database changes have been committed before the checkpoint... only one connection is doing any work (application database upgrade), and the last thing it did was commit, checkpoint, taskkill.

Complicating matters is... the connection doing the work is a DatabaseStart event... you probably didn't anticipate people running hour-long transactions and then shutting down the engine from inside a DatabaseStart event :)

Yes, I am planning to remove the taskkill because I've moved to 12.0.1 and the taskkill was introduced to stop 11.0.1 crashing and hanging on STOP ENGINE... but if it starts crashing and hanging again the taskkill will go back, hence my question about frammiejammies.

(20 Jul '11, 16:41) Breck Carter
Replies hidden
2

There are background tasks that clean up the database and can execute after the checkpoint before you get to your taskkill.

(20 Jul '11, 16:45) John Smirnios

Further complicating matters is... this is embarrassing, to have forgotten to mention it... the database recovery process is working without a log. The absolutely gigantic transaction log from the previous run is explicitly deleted before the server is restarted.

Am I just lucky to be able the start the server at all? FWIW all of this has been working OK for months, from a data consistency point of view.

(20 Jul '11, 16:46) Breck Carter
Replies hidden

Breck exploring the limits, again:)

(20 Jul '11, 16:58) Volker Barth

Are you saying you do a taskkill and then forcibly remove the log yourself? Are you using "-f" to start up afterwards? I would recommend only removing the log after a clean shutdown. Or perhaps there's a form of backup that does nothing other than truncate the log?

(21 Jul '11, 09:35) John Smirnios

Yes, yes, no, no. Taskill, DEL the log, dbsrv12 without -f, and no backup performed.

Hence the "Am I just lucky?" comment. I am certainly going to change that logic BUT I am curious about two things: How can it possibly work? and the earlier question, why does the recovery take so long?

Perhaps the answers are related... and metaphysical in nature :)

(21 Jul '11, 15:49) Breck Carter
1

Each dbspace plus the transaction log has what is called an "active" bit which means that it needs recovery. After your checkpoint, if the dbspace is modified without adding anything to the transaction log (eg. by the cleaner) then the system dbspace will be marked as "active" but the transaction log will not be. During recovery, we will need to rollback the system dbspace but probably won't complain about the missing log because it is not "active". That is probably why you are getting away with removing the log.

Why does it take so long? Well, how much time are we talking about here in absolute terms? Largely, all we do is a sequential scan of the checkpoint log and selectively throw some of those images into cache. Since you had just performed a checkpoint, I expect very few pages to be rolled back so recovery should take about as long as it takes to do a sequential scan of the checkpoint log.

(21 Jul '11, 21:14) John Smirnios
showing 3 of 9 show all flat view
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×101

question asked: 20 Jul '11, 13:53

question was seen: 1,780 times

last updated: 22 Jul '11, 08:46