I'm borrowing Calvin's previous title and tag as it fits so well.

August 18-2011:

Client calls in and says that the application is losing connection to the database and only a reboot of their server fixes it. I login remotely and find multiple entries in the Windows event log regarding SQL anywhere unable to read from file, Bad Block on disk, and LSI Controller error, most of which correspond to unplanned shutdown events.

I promptly make a backup image using the Sybase Central GUI, copy it to my computer and call both my point of contact and their IT guy.

A week or so later, same point of contact at the client emails me about an unrelated application question. I answer it and ask if anything has been done to the server -- "No everything has been fine since we talked to you."

September 08 2011:

Same client calls in this morning. The database service will not start.
After multiple conversations with the IT guy I determine that the following happened (according to him): He was not able to get onsite until the 20th of August. He powered off the server and booted into a command-line environment (either linux or dos) and ran chkdsk on the drive (which evidently is a single drive plugged into an LSI RAID controller). When the chkdsk came back clean, he copied the contents of the database folder to another computer, rebooted the computer and left. A week to 10 days went by with no problems until Tuesday when the server shut down again.

This time, while onsite, he unplugged the drive from the RAID controller and plugged it directly into the motherboard and ran chkdsk again, which came back clean again, started the server in this configuration and left. After 4 hours the server crashed again and the drive now reports failures during chkdsk. The drive is mountable and bootable, so most of the data was copied to a new drive, but one of the files that fails to copy with a read error is of course the database file.

So after evaluating all of this, knowing I have a good backup from the 18th, I try to start the daily backup image files created by an event each night. All of these files come back with "This file is not a valid database file."

I ended up starting the 08-20-2011 copy he had made manually as it was the latest available that would start.

So my questions:

  1. What would you have done given the scenario above?
  2. Why do I have backup files that are invalid?
  3. Can I make an invalid file valid, and if so how?
  4. How can I make sure that future backup files are valid?

This is the text of the backup event running on a database starting as a service using SQLA 12.0.1.3324 -- dest is a string set as the backup location + what day it is

    BACKUP DATABASE DIRECTORY dest TRANSACTION LOG TRUNCATE;

asked 08 Sep '11, 15:01

Siger%20Matt's gravatar image

Siger Matt
3.3k5672101
accept rate: 15%

edited 08 Sep '11, 15:19

To clarify - each of the backup files comes with a transaction log that is 0KB which as indicated below seems to indicate that the daily backups are not finishing. So as an add-on question, how do you diagnose a backup that does not finish?

(08 Sep '11, 15:57) Siger Matt

You didn't mention if you ran DBVALID on your backup on the 18th of August. If you did not, that would have been your first error - after seeing messages in the Windows event log related to disk I/O issues, it would have been prudent to not only take the backup but also validate its contents.

Point 2 - unless your performance requirements are high, I would enable CRC checking (page checksums) on all database pages so that (even) transient disk failures that occur on a page read from disk will cause an assertion. You should realize that most OS's and file systems can go with unreported file system and page corruption; see articles here and here.

Point 3 - You don't say how often you test your recovery strategy. It's fine to have a backup; it's another thing altogether to make sure that you can use it.

Point 4 - if you have all of the (untruncated) transaction logs, you can take an older (hopefully valid) backup and then apply all of the logs in order to get to the same state.

Point 5 - chkdsk doesn't check everything. Actually it checks very little. It does nothing to ensure that individual pages in a file are not corrupt from media failure.

Make copies of everything before you start any resolution.

Can you make the file valid? Possibly, but likely only with the help of technical support. To get the data out yourself, you'll need to be able to get to the point that a server can start the database. Once you get there, you can try to unload as much of the data that you can; it may be, for example, that other corruption has occurred with indexes, so the data pages themselves are fine.

permanent link

answered 08 Sep '11, 15:32

Glenn%20Paulley's gravatar image

Glenn Paulley
10.8k576106
accept rate: 43%

edited 08 Sep '11, 15:34

1) I did not run the DBVALID on the 18th, but I have since and it is valid. 2) I will definitely look into this 3) You are absolutely correct. Unfortunately our strategy has been to make the backup files and has not included validation and testing. That will be fixed. 4) It seems from those daily backups we have no transaction logs as they are all 0kb which as others have indicated seems to mean the backup did not finish. 5) Absolutely.

(08 Sep '11, 16:02) Siger Matt

Regarding (1), I can't say too much. If the backups do not have an off-site (or at least separate medium) location, I would suggest putting one in place.

For (2), there are a set of bytes in the database file that can mention a database is valid. These, internally, are only modified for certain events (e.g. upgrade db). If these are not reading correctly due to physical corruption then the possibility of (3) may not even be a desirable option.

Lastly, for (4), you can run a validation on the backup after is has been completed. Keep in mind with my suggestion for (1) a valid backup will only remain valid if the medium it is stored upon does not have a failure.


As for recovery options, if you have all of the transaction logs available (from the last successful backup), you may try to translate them with dbtran and apply them to the database. You can also apply just the logs to the database, but I would recommend dbtran first (as the logs themselves may be corrupt given the failures you have already noted). And, be sure to make a copy of your backup before applying the dbtran or transaction logs!

Hope this helps!

permanent link

answered 08 Sep '11, 15:25

Tyson%20Lewis's gravatar image

Tyson Lewis
2.2k1641
accept rate: 22%

3

I would add that the likely cause of #2 is that the backup did not complete successfully. Backup images of database files are given a signature that identifies them as invalid when the backup starts. At the end of the backup process, that signature is rewritten to indicate that the file is valid.

(08 Sep '11, 15:34) John Smirnios

Yes I would agree that it seems the backups are not completing successfully as the transaction log file for each daily file is 0KB and empty. I mistakenly thought this was normal because of the TRUNCATE option but I have since learned that the TRUNCATE is for the active transaction log. Any ideas on diagnosing the backups not finishing issue?

(08 Sep '11, 15:56) Siger Matt
Replies hidden

Well, you describe low-level IO errors so it's possible that the backups are failing due to an IO error. You can try to run a backup manually to see if it completes. Does your event capture errors and record in the database whether the statement completely successfully?

(08 Sep '11, 16:02) John Smirnios

That is possible, but while the failures were occurring I was able to take the manual backup on the 18th. The event does not currently do anything like that. What are the best ways to both capture errors and record that a backup completed successfully?

(08 Sep '11, 16:09) Siger Matt

You may have a look at the maintenance plan facility: It does automatically create events for backup and/or validation. These events do contain exception handlers to report errors in the generated dbo.maint_plan_report table. Successful event runs are reported similarily.

This might be a good starting point to enhance your own event handlers.

Note that validation maintenance plans do validate the original database, not a copy of the backup - although the latter is much more reasonable as you have found out...

(09 Sep '11, 02:48) Volker Barth

One option when you run into the "This file is not a valid database file." is to use dbtran to convert the transaction log into a SQL command file which you could apply to your latest working database. But if your backup of the transaction log is 0 kB I guess your out of luck:-(

Best regards Ove Halseth

permanent link

answered 09 Sep '11, 05:16

Ove%20Halseth's gravatar image

Ove Halseth
181257
accept rate: 0%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×438
×84
×29
×10
×2

question asked: 08 Sep '11, 15:01

question was seen: 3,975 times

last updated: 09 Sep '11, 05:16