I. 09/17 06:58:31. SQL Anywhere Network Server Version 12.0.1.3750
I. 09/17 06:59:58.     ... recovery 0% complete
I. 09/17 19:25:43.     ... recovery 50% complete
I. 09/18 08:42:31.     ... recovery 99% complete
I. 09/18 09:10:31. Recovery complete

This was a single log file (6,739,460,096 bytes), with one day of activity. I'm glad I wasn't in a hurry.

Jim

asked 18 Sep '13, 08:32

J Diaz

edited 14 Jun '14, 16:30

Breck Carter


That's a very big log file (or at least it seems so - you don't say how big the database is). Could there be unnecessary activity going on? If the size is anomalous, it might be worth translating the log into SQL and seeing what is in it (you need a fair amount of time and something that will read very big text files).
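If you do decide to inspect it, SQL Anywhere ships a log translation utility, dbtran, that turns a transaction log into a .sql file you can search (a sketch; the file names here are placeholders):

```
dbtran -y "mydatabase.log" "mydatabase.sql"
```

The -y switch overwrites the output file without prompting. Expect the translated output to be very large for a log this size.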

We had a case recently where a process was generating data that could safely be discarded if the database restarted (it related only to individual sessions and was deleted at logoff). We changed the tables involved to shared, global, non-transactional temporary tables. The process ran faster and the log file size fell by 90%.
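For anyone wanting to try the same trick, the declaration looks roughly like this (a sketch for SQL Anywhere 11 or later; the table and column names are invented):

```sql
-- SHARE BY ALL makes one set of rows visible to every connection;
-- NOT TRANSACTIONAL keeps changes out of the transaction log entirely,
-- so they cost nothing at recovery time (and cannot be rolled back).
CREATE GLOBAL TEMPORARY TABLE session_work (
    session_id INTEGER NOT NULL,
    payload    LONG VARCHAR,
    PRIMARY KEY ( session_id )
) SHARE BY ALL NOT TRANSACTIONAL;
```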

(18 Sep '13, 09:13) Justin Willey

Was that an ordinary recovery after a server crash where both the database and log files were OK, or was it the application of the log file to a restored database file?

If the latter, perhaps more frequent backups should be taken, and/or HA should be used.

If the former, then perhaps something exciting is going on, like massive transactions that deserve more frequent commits (e.g., deleting millions of rows in one statement, etc).
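One hedged sketch of the "more frequent commits" idea, as a Watcom-SQL loop (the table, column, and batch size are invented, and it assumes your version accepts TOP in a DELETE):

```sql
-- Delete in modest batches and commit after each one, so no single
-- transaction becomes large enough to dominate recovery.
BEGIN
    del_loop: LOOP
        DELETE TOP 100000 FROM big_table
         WHERE expired = 1
         ORDER BY row_id;
        IF @@ROWCOUNT = 0 THEN
            LEAVE del_loop;
        END IF;
        COMMIT;
    END LOOP del_loop;
END;
```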

Generally, recovery runs as fast or faster than the original operations, so if recovery runs "too long" the solution is to reduce the amount of work that needs to be recovered... which is what Justin said using fewer words :)

(18 Sep '13, 09:15) Breck Carter

Has the recovery taken place on the same machine (and configuration) as the original database operations?

Needing more time to recover than to generate the original operations seems weird when running on the same HW.

(18 Sep '13, 09:21) Volker Barth

"Needing more time to recover than to generate the original operations" ...how do you know that's true? The OP said "one day of activity".

(18 Sep '13, 09:36) Breck Carter

I don't know that, apparently. However, the recovery lasted more than 26 hours, and I guess one day of activity has 24 hours at most :)

(18 Sep '13, 09:47) Volker Barth

What was the line previous to the first "% complete" message? Was it "Transaction log..." or "Rollback log..."? If it was "Rollback log..." then you might be rolling back a massive transaction (or transactions).

Normal recovery only applies transactions since the last checkpoint. Have you done something to suppress checkpoints or make them extremely infrequent? For example, do you set -gc or -gr on the server command line?
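For reference, those switches appear on the server command line like this (the values and file name are purely illustrative; check the defaults for your version before changing anything):

```
dbsrv12 -gc 20 -gr 10 mydatabase.db
```

-gc sets the checkpoint time in minutes, and -gr the maximum recovery time the server aims for.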

Also, there are some cases where recovery can take much longer than the original statement. For example, if you have a large table without a primary key defined, then a "delete from T" goes pretty quickly, but recovery will take a long time since, on recovery, each delete is replayed as a single-row delete. Without a primary key, each row deletion requires a sequential table scan, and the whole delete becomes an O(n^2) algorithm.
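The cheap insurance against that worst case is simply to declare a key (hypothetical table and column names):

```sql
ALTER TABLE big_table ADD PRIMARY KEY ( row_id );
```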

Also, runtime modifications can be done in parallel on multiprocessor machines, but recovery is single-threaded. In this case, too, recovery can take longer than the original execution.

It is also worth verifying that you gave the server sufficient cache to perform the recovery. Usually the default cache settings work, but there are cases where they are not sufficient. If you captured the console log, you should be able to find out how much cache was used.
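If the default turns out to be too small, the initial cache size can be set explicitly when starting the server for recovery (illustrative value and file name):

```
dbsrv12 -c 4G mydatabase.db
```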

(18 Sep '13, 10:02) John Smirnios

From my comment above:

Needing more time to recover than to generate the original operations seems weird when running on the same HW.

OK, I stand corrected: It only seems weird, if one doesn't know the facts. John does, of course.

@John: That is excellent information, worthy to be part of the docs, I'd say...

(18 Sep '13, 11:59) Volker Barth

ha ha you got me, I didn't notice the day number.

(18 Sep '13, 14:15) Breck Carter

"worthy to be part of the docs" ...yes, indeed, since recovery time is ALWAYS inversely proportional to personal urgency (takes longest when you're in the most hurry).

(18 Sep '13, 14:17) Breck Carter

Re: "... worthy to be part of the docs ..." - well, I'm not sure, but John's information could be there already? It has certainly been mentioned before (in this forum, I think?) and in Glenn/Ivan's whitepapers (I think?). At one point I think SQLA even had a warning message in the log when it saw a table that did not have a primary key (or maybe I just dreamed that?). Whether it is there or not is moot if the information cannot be easily found in the documentation and/or the docs are not read. :-(

I have sent an email to the doc team to suggest some improvements.

(18 Sep '13, 14:22) Mark Culp

Sorry for the delay. The database is 250 GB. This is a full subscription to a SQL Remote consolidated database with approx. 130 remote sites, two of which are full subscriptions. This was performed as part of a disaster recovery test; all of the data being recovered was the result of replication, and no actual transactions were run on the database.

The hardware is different: the recovery was performed on a PowerEdge 2900, while the database normally runs on a much more capable virtual machine. That could account for the time, but the 2900 is also quite capable (15K 6 Gb/s SCSI, Windows 2003 R2, 4 GB memory). We run tests like this on this machine all the time, and I've never seen one take this long.

(18 Sep '13, 14:57) J Diaz

Kudos on actually testing recovery. More kudos on creating a recovery test that uncovers a problem before reality strikes. Please keep us informed of your progress in searching for a solution.

(18 Sep '13, 15:02) Breck Carter

Primary-key-less-tables was only one small part of John's comment... chances are the other valuable material is not yet in the Help.

(18 Sep '13, 15:05) Breck Carter

Agreed. Personally, I was not aware of the "on recovery, each delete is a single-row delete" and the "recovery is always single-threaded" issues...

And these may hurt folks who would certainly not forget to declare fitting PKs...

(18 Sep '13, 15:19) Volker Barth

I was including the part about the deletes in my statement... I've seen/heard of this class of problems in many customer performance issues and I would have thought that this was documented somewhere (e.g. in one or more performance whitepapers). As mentioned, I've passed on this thread and some suggestions to the doc team to have them improve the docs.

(18 Sep '13, 15:44) Mark Culp

Don't worry, you noticed it within the 24-hour grace period :)

Though it's somewhat embarrassing that the great keeper of DATEDIFF stumbles over such a detail...

(18 Sep '13, 16:39) Volker Barth

Hm, I'm not really understanding what role replication plays here - does the log contain uncommitted operations (though in that case, it shouldn't matter whether they result from local or replicated transactions)?

Or do you try to re-build the remote by re-syncing with the cons?

As to replication: AFAIK, the server's transaction log management needs to deal with publications, whereas subscriptions are handled by the SQL Remote message agent. And within a remote, there is usually only a "full subscription back to the cons", so there's usually no filtering on the message agent side, either.

(18 Sep '13, 16:46) Volker Barth

You're correct, replication has no effect. I mentioned it only because of the questions regarding size, and because the transactions in the log file should not be associated with any work tables (as Justin mentioned), since those are not replicated.

(18 Sep '13, 18:46) J Diaz

We have looked into the transactions in this log file and they all look normal. We do not have any tables without primary keys. We are chalking this up to a slower machine and the issue John raised about recovery being single-threaded.


answered 19 Sep '13, 14:27

J Diaz
