I've finally migrated an 8.0.3 consolidated database to 184.108.40.20624, and nearly all is well.
The one strange thing happens with two 220.127.116.1124 remotes that I use to test with.
They have twice failed to replicate with the above error, such as:
Then the cons answers with a resend, and the resent messages (comprising several files) are then applied without problem. This happens with both v12 remotes.
Strangely enough, I have "identical" remotes with 8.0.3 (i.e. with logically identical subscriptions) that apply the same messages without error (and without needing a resend).
AFAIK, the SQL Remote options are similar (e.g. -l is set to the same value - I'm using the DBTools API actually).
I've been using the v12 remotes for a while (since last November), and that kind of error has appeared now and then (and could obviously be resolved by a re-send in all cases). Therefore it does not seem to be bound to the cons having moved from v8 to v12.
Therefore I'd like to ask if there is a known problem with v12 and such errors...
Just to add:
Further analysis shows that this has not happened with 18.104.22.16852 and older versions, and it seems to happen when the remote gets more than one message with the same message header. In the samples, there were several "empty" messages with the following header:
This seems to be seen as an error with 22.214.171.12498 (and 126.96.36.19924) whereas older versions do apply messages with identical headers without problem, here from the log of the 8.0.3 remote:
Note 1: Sending messages with identical headers is very common in our setup, as we use SQL Remote in batch mode, and therefore each remote that has not yet "answered" will get a "liveness" message for each SQL Remote run in the cons.
Note 2: A remote with 10.0.1.4181 seems to have the same behaviour.
The problem of dbremote reporting "Deleting duplicate message" has been fixed in the following builds:
v188.8.131.5215, v184.108.40.20691, v220.127.116.1165
We strongly suspect but cannot prove that the "duplicate log operations" is the same problem, but have not been able to reproduce that problem here.
answered 02 Jun '11, 08:49
Further analysis shows that this might be bound to the number of worker threads for applying messages:
It appears that when -w is set to 0 (the default), messages with identical headers are applied correctly (each message is applied directly after it is received). In contrast, -w 1 (which our DBTools API setup uses) receives all messages first and then applies them asynchronously (which is desired, and the obvious effect of using a separate worker thread for applying).
This has worked fine with 18.104.22.16866 - 22.214.171.12401 and with 126.96.36.19952 (MR) - and has worked for years with v8.
However, the parallel application seems to lead to problems with 188.8.131.5298 and above, as the applying thread seems to treat messages with identical headers as wrong and then triggers the warning about duplicate messages.
10.0.1.4181 seems to have this issue, too.
When no separate worker thread is used (-w 0), each message is received and applied on its own (as long as the order is correct). As the applying code handles only one message at a time, it cannot notice that several messages carry identical headers, and therefore does not trigger a warning. As such, using -w 0 seems to be a valid workaround.
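To make the suspicion above concrete, here is a toy Python model of the two modes. It is only a sketch of my guess, not dbremote's actual logic: the Header fields, both function names and the warning text are made up for illustration, and the real header layout is not modelled.

```python
from collections import namedtuple

# Hypothetical header: the real SQL Remote header fields are NOT modelled here.
Header = namedtuple("Header", ["sender", "resend_offset", "log_offset"])


def apply_one_at_a_time(messages):
    """Toy model of -w 0: each message is applied right after it is received.

    The applier never sees more than one message, so it has nothing to
    compare a header against and raises no duplicate warning.
    """
    applied, warnings = [], []
    for header in messages:
        applied.append(header)
    return applied, warnings


def apply_as_batch(messages):
    """Toy model of the suspected -w 1 behaviour in the affected builds.

    All messages are queued first; the worker then sees the whole queue and
    (by assumption) rejects any header it has already seen.
    """
    queue = list(messages)  # receive everything before applying
    seen, applied, warnings = set(), [], []
    for header in queue:
        if header in seen:
            warnings.append("duplicate header: %r" % (header,))
            continue
        seen.add(header)
        applied.append(header)
    return applied, warnings


# Three "liveness" messages sent while the cons log offset did not move.
liveness = [Header("cons", 1000, 1000)] * 3
one_by_one = apply_one_at_a_time(liveness)
batched = apply_as_batch(liveness)
```

In this model the one-by-one path applies all three messages silently, while the batch path applies one and warns about the other two, which matches what the logs suggest but proves nothing about the real cause.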
So I would conclude that something has changed in the analysis of message headers in recent builds that causes this issue.
The issue should be reproducible with any SQL Remote setup of the affected builds if a cons sends several messages to the same remote (over several batch runs) while the log offset in the cons has not changed in between. That sends messages with identical headers (i.e. the unchanged log offset) to the remote.
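The reproduction scenario can be sketched in a few lines of Python. Again this is only an illustration under assumptions: the header is reduced to a hypothetical (last confirmed offset, current log offset) pair, and headers_for_runs is an invented helper, not anything from SQL Remote or DBTools.

```python
from collections import Counter


def headers_for_runs(offsets_at_each_run, last_confirmed):
    """Return one hypothetical header per batch run of the cons.

    Assumes the remote never confirms in between, so last_confirmed stays
    fixed; the header only changes when the cons log offset changes.
    """
    return [(last_confirmed, current) for current in offsets_at_each_run]


def repeated_headers(headers):
    """Return the headers that occur more than once, with their counts."""
    return {h: n for h, n in Counter(headers).items() if n > 1}


# Three batch runs with no new operations in the cons log in between.
idle_runs = headers_for_runs([500, 500, 500], last_confirmed=500)

# The log grows once after the first run, then stays idle again.
mixed_runs = headers_for_runs([500, 620, 620], last_confirmed=500)
```

With an idle cons, every run yields the same pair, so the remote ends up with several messages whose headers are indistinguishable. As soon as the offset moves, the header differs again.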