After 3 occurrences of "ERROR Assertion failed: 101412[LIVE] (11.0.1.2713) Page number on page does not match page requested" with 3 subsequent database unload/reloads (all of this in 2 weeks), I am convinced I'm staring down the barrel of a hardware problem on the database server. Any ideas on how to go about testing for this? Or proving it to the IT department whose virtual server my database is running on? |
If you have a support plan, you can open a technical support case and submit the database for review. If the page in the database is genuinely corrupted, the "type" of corruption that happened to the page can sometimes give clues as to what might have happened. Some real-life past examples from other customers:
Having the database reviewed for the type of corruption through a technical support plan is your best option for trying to determine "who is to blame." |
Showing that you have a hardware problem is difficult. I would recommend that you consult your hardware manufacturer's documentation to see if they have any hardware diagnostic tools. You may also find some generic tools (memory tests, disk tests, etc.) online by doing some googling. Your issue could also be caused by software errors or a misconfigured system. Ensure that your disk, file system and operating system are configured not to cache disk pages. For example, if you are using Linux you must ensure that the file system has write caching turned off.
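To make the "disk test" idea concrete, here is a minimal sketch of the kind of check the assertion itself performs: write pages that carry their own page number, read each one back, and compare. The 4096-byte page size, the file name and every function name are assumptions made for illustration; this is not how SQL Anywhere lays out its pages and it is no substitute for a vendor diagnostic.

```python
import os
import struct
import tempfile

PAGE_SIZE = 4096  # assumed page size for this sketch only


def write_stamped_pages(path, page_count):
    """Write page_count pages, each beginning with its own page number."""
    with open(path, "wb") as f:
        for page_no in range(page_count):
            header = struct.pack("<Q", page_no)
            f.write(header + bytes(PAGE_SIZE - len(header)))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device


def find_mismatched_pages(path, page_count):
    """Return page numbers whose stored stamp differs from the page requested."""
    bad = []
    with open(path, "rb") as f:
        for page_no in range(page_count):
            f.seek(page_no * PAGE_SIZE)
            data = f.read(PAGE_SIZE)
            if len(data) < 8 or struct.unpack("<Q", data[:8])[0] != page_no:
                bad.append(page_no)
    return bad


def run_check(directory, page_count=256):
    """Write, verify and clean up a stamped test file in the given directory."""
    path = os.path.join(directory, "stamp_test.bin")
    write_stamped_pages(path, page_count)
    try:
        return find_mismatched_pages(path, page_count)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Point this at a directory on the volume that holds the database file.
    print(run_check(tempfile.gettempdir()))
```

An empty list means every page came back with the stamp that was requested. Keep in mind that a read issued right after the write is likely to be served from the operating system's cache, so a clean pass proves very little; only repeated mismatches under load would be worth showing to anyone.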
Liam used the term "IT department" in a sentence, whereas your answer seems to assume a different sort of environment, you know, one where everyone pulls together in common cause :)
(11 Jul '13, 09:30)
Breck Carter
Breck has a point. When working with one's own IT team, resolution tends to be a smoother road. When dealing with a client, one always has to be careful how one approaches it. IT guys (as per your cartoon) can be quite defensive if a service provider even suggests there may be a problem on their hardware ;-)
(12 Jul '13, 05:57)
Liam
|
Just some feedback: After adding the -u option (http://dcx.sybase.com/index.html#1101/en/dbadmin_en11/u-database-dbengine.html) to disable direct I/O (thanks, Eric and Jeff), the client's database has been running smoothly for 3 weeks now. The IT department still disputes the contention that it's a hardware problem though ;-)
They may be correct, since a VM is software :) The Help topic doesn't say dbsrv11 -u explicitly "disables" anything; it says it changes behavior to go through the OS cache system, and it seems to imply you only need to do this if the database is running on an overcrowded computer... like a VM... which IMO is a bad thing to do with a busy database. What else is running on the same computer? Does IT understand that you don't get something for nothing, not even with a VM?
(30 Jul '13, 14:49)
Breck Carter
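For anyone scripting the server start, the change discussed above amounts to one extra switch on the dbsrv11 command line. A minimal sketch follows; the function name and the database file name are hypothetical, and a real start line would normally carry further options (server name, cache size and so on) that are not shown in this thread.

```python
def build_server_command(db_file, use_os_cache=True, executable="dbsrv11"):
    """Build the argument list for starting the server, optionally with -u."""
    cmd = [executable]
    if use_os_cache:
        cmd.append("-u")  # open database files through the OS disk cache
    cmd.append(db_file)
    return cmd


# Hypothetical database file name, for illustration only.
command = build_server_command("client.db")
```

The resulting list can be handed to `subprocess.Popen` or pasted into a service definition; keeping the switch behind a parameter makes it easy to rerun the same start line with and without -u when comparing behavior.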
|
If it has become a finger-pointing exercise, and you have no influence over the IT department, you may have to disprove whatever they are claiming as the cause. That may require repeated tests using a copy of the production database in the development environment that do not exhibit the same symptom. Always take the high road: you care only about the end user, and say so before they do :)
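The "repeated tests" can be as simple as running the same check many times against the copy and keeping a tally to show afterwards. Here is a rough sketch, assuming the check is an external command that signals failure through a non-zero exit code; the helper names are made up, and which command you run (for example the dbvalid utility with a connection string for the copied database) is your choice and an assumption on my part.

```python
import subprocess
import sys


def run_repeatedly(command, runs):
    """Run the command `runs` times and return the list of exit codes."""
    codes = []
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes.append(result.returncode)
    return codes


def summarize(codes):
    """Count total runs and runs that ended with a non-zero exit code."""
    failures = sum(1 for code in codes if code != 0)
    return {"runs": len(codes), "failures": failures}


if __name__ == "__main__":
    # Stand-in command that always succeeds; replace with your own check.
    check = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(summarize(run_repeatedly(check, 5)))
```

Running the identical tally in development and on the production VM gives you numbers to put side by side, which is harder to argue with than an opinion.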