We noticed during some testing that SQL Remote will send data that hasn't been checkpointed (not sure that's a word).

We did a disaster recovery senecio where we forced a crash of a database involved in replication during the application of replication messages. We recovered the database by unloading and reloading then setting the offsets appropriately. At this point the offsets were set to the end of the log file that was in service when the disaster occurred. There were many transactions in that log file which were not applied to the crashed database since it was not a clean shutdown.

When we ran SQL Remote in sent the transactions that were not applied to the remotes. The end result of course was the remote had data the consolidated did not have which of course is a problem.

How do you recover from this? I've asked a related question regarding applying log files to a rebuilt database.

Thanks

Jim

asked 10 Feb, 15:53

J%20Diaz's gravatar image

J Diaz
1.1k293858
accept rate: 10%

1

We noticed during some testing that SQL Remote will send data that hasn't been checkpointed (not sure that's a word).

SQL Remote will only send data that is committed, but whether the database has checkpointed and physically written the changed data to the physical file as opposed to the changed page only existing in memory is irrelevant. Once a transaction has been committed, it should be recoverable.

You really mean checkpoint, not commit?

We did a disaster recovery scenario where we forced a crash of a database involved in replication during the application of replication messages.

Can you describe what you mean when you say "we forced a crash of the database"? Did you just kill the dbsrv17 process? Did you pull the hard drive out of the machine with the active transaction log?

We recovered the database by unloading and reloading then setting the offsets appropriately. At this point the offsets were set to the end of the log file that was in service when the disaster occurred.

I would argue this is not recovery. Why were you not able to go to a backup and apply transaction logs up to and including the time of the crash?

Also, what offsets are you referring to? Are you manually manipulating offsets in the SYSREMOTEUSER table, or are you talking about using dblog -x/-z against the rebuilt database.

Thanks, Reg

(10 Feb, 16:17) Reg Domaratzki
Comment Text Removed

I do mean Checkpoint and not commit.

We had a consolidated database that was very close to the 1TB size limit for a single db space 4 K page size. I inserted a very large amount of data into one of it's remotes . We then ran SQL Remote on the consolidated and during the application of the messages the database attempted to expend itself beyond the 1TB limit. When this occurred the server crashed. Log snippet below

01/12 08:57:19. Fatal error: The database file "" has reached its maximum file size. 01/12 08:57:25. Fatal error: Internal error. 01/12 21:51:42. Cannot open transaction log file -- The system cannot find the file specified. 01/12 22:11:47. Fatal error: The database file "" has reached its maximum file size. 01/12 22:11:53. Fatal error: Internal error.

I renamed the log file and started the server with -f

I. 01/13 04:36:28. Forcing recovery without transaction log I. 01/13 04:36:28. Checkpointing... I. 01/13 04:36:28. Starting checkpoint of "" () at Wed Jan 13 2021 04:36 I. 01/13 04:36:28. Finished checkpoint of "" () at Wed Jan 13 2021 04:36 I. 01/13 04:36:28. Recovery complete I. 01/13 04:36:28. Database server shutdown automatically after log applied I. 01/13 04:36:30. Completing server shutdown I. 01/13 04:36:30. Database server stopped at Wed Jan 13 2021 04:36

I then manually unloaded and reloaded set offsets and ran SQL Remote in send. Applied these messages to another remote and was left with data in both remotes that didn't exist in the consolidated.

I then recovered the consolidated by translating the message files and applying manually.

What would have been nice is if SQL Remote didn't send the data and the consolidated database asked the initial remote to resend the messages.

(10 Feb, 17:08) J Diaz
Replies hidden

Using dbeng16 -f and dbremote (without -u) is a recipe for disaster.

I would have loved to have gotten my hands on the database file(s) and active transaction log after this fatal error, as well as the last backup and backed up transaction logs since the last full backup and figured how to properly recovery this database without using -f and ensuring that we recovery all committed transactions. If that's not possible because of the state of the database and/or logs after the fatal error, that's a problem worth fixing.

dbremote -u is your friend here. It was put in place to help ensure that dbremote would never send operation from a transaction log that you could possibly lose, since in theory an offline log can be moved offsite once it's created to protect against a site disaster.

Reg

(11 Feb, 09:42) Reg Domaratzki

Thanks very much and yes I am a believer in -u. Some day I plan on repeating this in SQLA 17 I'll let you know when I do.

(11 Feb, 16:23) J Diaz

Well, IMHO if you decide to

  • use the current log of a crashed database as offline log,
  • but do a restore of the crashed database file without that log (and apparently therefore lacking some transactions) and make a rebuild of the restored database and force the log offsets to "fit" and use that with SQL Remote,

it seems quite clear that there's a discrepancy between the data published by SQL Remote (based on the offline log) and the database contents itself, that the database itself can not notice...


FWIW, even using "DBREMOTE -u" (i.e. only using transactions from offline logs) would not prevent that situation in my understanding while using your steps. However, in that case you might have gone back to the last backup and then use the current log to bring the database up-to-date (unless that would fail again due to the file size limitations). With v17, you might use Point-in-time recovery here.

permanent link

answered 11 Feb, 06:35

Volker%20Barth's gravatar image

Volker Barth
37.4k343510776
accept rate: 34%

converted 11 Feb, 11:54

Reg%20Domaratzki's gravatar image

Reg Domaratzki
7.1k340110

Thanks for your comments.

You go that right the -u and the last backup most likely would have solved this issue.

In this case we would have started with the last backup applied all files BEFORE the crash, unloaded and reloaded set the offsets to the log file before the crash and place in operation. Theoretically this would have caused all remotes to resend because no fresh/new replication messages would apply.

We have changed our dbremote process to incremental backup, receive, incremental backup send, incremental backup with the -u option in dbremote. Important to note our consolidated does nothing but support replication including a couple of post receive hooks to process some incoming data.

(11 Feb, 09:02) J Diaz

I do recommend that SAP consider a couple of changes

1) Don't grow a DB Space beyond it's limits 2) Don't allow DBRemote to process data that has not been through a Checkpoint

permanent link

answered 11 Feb, 09:06

J%20Diaz's gravatar image

J Diaz
1.1k293858
accept rate: 10%

converted 11 Feb, 13:52

Hm, we have used SQL Remote for quite "immediate" responses between cons and remotes, so waiting for a checkpoint would have been undesired.

Can't comment on the file size limits, never been there :)

(11 Feb, 09:10) Volker Barth
  1. We can't grow a DB space beyond it's limit. Do you mean "check to see if growing the dbspace would exceed the maximum number of pages before attempting to grow" ?
  2. Like Volker, I think waiting for a checkpoint is poor idea. You can get very similar behaviour by running dbremote with the -u switch, which severely limits (if not eliminates) the possibly of losing a transaction log that dbremote has sent messages from.
(11 Feb, 09:25) Reg Domaratzki
Replies hidden

Well, a GrowDB system event could certainly warn and/or enforce reasonable actions when a critical threshold (say .95% of the maximum file size) has been used.

(11 Feb, 09:43) Volker Barth

Yes to your number 1, that's exactly what I was thinking. There are other technical issues, like what do you do with the data coming in that might need the space?

As I understand it -u has nothing to do with applying messages, perhaps checkpoint could be an option.

(11 Feb, 13:50) J Diaz

As stated here and on your related older question, I still guess a GrowDb event handler could already help to prevent a database halt once a critical size limit is hit. Would that not help here?

Note, if the engine's behaviour can be improved here, that's fine - but I would not expect changes for version 16, so I don't think you should wait on that...

(11 Feb, 15:50) Volker Barth
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×98
×27

question asked: 10 Feb, 15:53

question was seen: 152 times

last updated: 11 Feb, 16:23