I am responsible for a SQL Anywhere MobiLink system that I did not write, so excuse my ignorance. This past weekend, a hard drive crash took down our main database, which has a set of tables synced to dozens of user phones -- an iPhone app is used to gather customer equipment data on-site. We use MobiLink to do the sync. We have a database backup at the end of every weekday, so we have not lost any data from the main server, but how can I get the phones to resume syncing? Some of them have synced up, but I am getting many errors (794 among others). Is there a way to have all the distributed users sync back to the master? I am willing to lose any changes on the phones to get things working again. This surely must be a scenario your product was designed to handle. Right?
Summarizing information from the comments...

A system timeline, with consolidated database states:

condb   S0..S1..S2..S3...XX...S2..S2..S4
-           ^       ^             ^
remote1 ....a.......c.............d
-               ^                     ^
remote2 ........b.....................e

Remote1 will fail to synchronize its changes.
Option #3 is the only sure way to achieve consistency. It's not automatic because it loses data. Back to the timeline, remote2 will synchronize successfully, because from its point of view everything is consistent. It did not sync within the window of loss.
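To make the timeline reasoning concrete, here is a minimal sketch in Python. It is a simplified model for illustration only, not how MobiLink actually tracks sync progress. The `Remote` class, the function name, and the numeric positions (character offsets read off the timeline, with the restore point at the first S2 and the crash at XX) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Remote:
    name: str
    # Positions on the consolidated timeline at which this remote synced
    # (hypothetical numbers taken from the diagram's character offsets).
    sync_points: List[int]


def remotes_needing_intervention(remotes, restore_point, crash_point):
    """Return names of remotes that synced inside the window of loss,
    i.e. after the restored backup was taken but no later than the crash."""
    return [
        r.name
        for r in remotes
        if any(restore_point < t <= crash_point for t in r.sync_points)
    ]


# Assumed positions: S2 (backup) = 8, XX (crash) = 17.
remotes = [
    Remote("remote1", [4, 12, 26]),  # a, c, d
    Remote("remote2", [8, 30]),      # b, e
]
affected = remotes_needing_intervention(remotes, restore_point=8, crash_point=17)
```

In this model remote1 is flagged because its sync at c falls between the backup and the crash, while remote2 is not, because its last pre-crash sync coincides with the restored state.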
So the Mobilink system has no way to recover and reset the sync? That would be an insane design and a worthless product. Our database system is being hit all the time from different apps and sources. Returning to a state that it was in several days ago at the exact moment of a disk crash is not possible.

Without knowing the specifics of what the MobiLink server is reporting, I am reluctant to advise next steps. Certainly, you can reset the remote if that is the appropriate action. See the ml_reset_sync_state system procedure. Please read and understand what this does before using it.

The point I was making with respect to a recovery is that if there have been synchronizations since the point in time of the database that has been put into the system, those remotes will no longer be able to sync without some intervention. This is not ideal.

I take it that the database that was put back into production was not restored to the point in time of the hard drive failure. If so, there will likely be remotes that cannot sync if they have sync'd at a time after that restore point, i.e., you have lost the sync status that synchronization uses to ensure data consistency between the remote and the consolidated. You can reset the sync state to force that remote to synchronize.
(12 Dec '17, 09:24)
Chris Keating
Your log suggests that the reset procedure in theory should work with the affected users.
(12 Dec '17, 10:07)
Chris Keating
Please hold off on the use of ml_reset_sync_state. We are looking at options on the remote.
(12 Dec '17, 10:36)
Chris Keating
Please note that this is really a salvage because data has been lost at the consolidated. The lost data might simply be related to the current sync state of the affected remotes, or it could extend to data that should be in the consolidated and may no longer exist on the remote. Given the flexibility of the scripts that are used for synchronization, MobiLink would not be able to determine what rows should look like; that is left to the consolidated and its recovery mechanisms. This is not a defect in design. It is reasonable that MobiLink expects that the consolidated database is capable of recovery with no loss of data. That did not happen in this case. You now need to decide:

1) Is there data on the remote that may be important to the application, and should efforts be made to get that data into the consolidated? If so, this may be a manual effort for the affected remotes.

-or-

2) Are you willing to lose that data? If so, options include resetting the remote status or recreating those remotes by sync'ing the database from an empty state. In that case, keep the existing remote, as you may be able to manually re-enter the information that exists only in the remote.

You may want to work with technical support to go through the details if you are not familiar with MobiLink.
(12 Dec '17, 13:36)
Chris Keating
The design is not insane and the product is not worthless. Just my €.02... Volker
(15 Dec '17, 21:01)
Volker DB-TecKy
Just to add: Understanding such a system also includes the knowledge that in the (hopefully rare) case where the mentioned requirement cannot be fulfilled (i.e. the consolidated database cannot be restored up to the point of the last sync), any of the remote databases that have sync'ed with the consolidated after that point may have lost data and/or may need to be resynchronized. That's what Chris and Tim have explained.
(17 Dec '17, 12:24)
Volker Barth
Welcome :-)
It might be helpful to include the errors from the MobiLink log file.
The -794 error reports that an error has occurred in the MobiLink server. As Tim has indicated, the MobiLink server log is important in this case. I am going to guess that the MobiLink server log will indicate that there is a progress mismatch. If that is the case, I am going to suggest that the database was not fully recovered.
Are you absolutely sure that the consolidated was recovered to the time that the hard drive crashed? What is the consolidated database? If not SQL Anywhere, how does that database ensure that the recovery is to the last committed transaction in the case of a hard drive crash? If SQL Anywhere, was the transaction log damaged because of the hard drive crash? If not, was it applied to the backup?
MobiLink synchronization is designed to handle cases where the database is fully recovered. Remotes will not sync if data that is relevant to the status of their synchronization has been lost.
Thanks for the warning :)...
If you are actually looking for help, please provide the information requested (the dbmlsrv -o diagnostic log).
FYI, one of the site admins forwarded snippets of the MobiLink log showing the errors. The errors include [-10400] Invalid sync sequence ID for remote ID, which is to be expected if there was lost data related to the sync progress. There were also some [10106] Unable to lock the remote ID errors, which suggest overlapping syncs from the same client.
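For anyone triaging a similar log, a quick tally of error codes can show whether the failures are mostly sequence mismatches or lock conflicts. The following Python is only a rough sketch. It assumes each log line carries the code in square brackets as quoted above; the real log layout (timestamps, prefixes) may differ, and the sample lines are just the two messages from this thread, not actual log output. The function name is made up for this example.

```python
import re
from collections import Counter

# Matches a bracketed numeric code such as [-10400] or [10106].
CODE_RE = re.compile(r"\[(-?\d+)\]")


def count_error_codes(lines):
    """Count how often each bracketed numeric code appears in the lines."""
    counts = Counter()
    for line in lines:
        for code in CODE_RE.findall(line):
            counts[code] += 1
    return counts


# Illustrative input only: message texts quoted in this thread.
sample = [
    "[-10400] Invalid sync sequence ID for remote ID",
    "[10106] Unable to lock the remote ID",
    "[-10400] Invalid sync sequence ID for remote ID",
]
counts = count_error_codes(sample)
```

To run it against a real file, pass the open file object (an iterable of lines) instead of `sample`.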