SAP SQL Anywhere

Question

Hey everyone,

We're currently struggling with some issues in our productive product(s) and a bit of help would be much appreciated.

The symptom

Seems quite simple at first : the synchronization starts successfully, the upload occurs, the download is prepared, it downloads and put rows into the transaction log and finally we're getting a

caught UljException : remoteID=37afd978-648e-11e6-8000-b1e0a102c04c, errorCode=-1305, params=86,404,0, sqlOffset=-1, message=UltraLiteJ Error[-1305]: MobiLink communication error – code: 86, parameter: 404, system code: 0com.ianywhere.ultralitejni16.implementation.JniException: UltraLiteJ Error[-1305]: MobiLink communication error – code: 86, parameter: 404, system code: 0

on one table (must say, this table can get pretty big after a few days).

The error occurs on several devices that until then successfully synchronized.

The context

We're using :

Ultralite on Android devices as the remote database
A reverse proxy
Mobilink 16
An Oracle database as the consolidated database

The product runs big amount of data and in our case, the device has come to a point where it needs to synchronize 100K+ rows on the table on which the error is generated.

We have a super strong Wifi connection. Thus the problem is not coming from a client losing connection.

The logs state that between the moment we start syncing a table and the moment the error pops is quite random. I've seen values as ~60 minutes, ~35 minutes and... 4 minutes.

And the question

Knowing that it's less likely the connection between the client and the mobilink server/reverse proxy that is faulty, what would cause a 404 error ?

I thank you in advance for your support,

Best regards.

Edit (2016-08-26): here's an extract of the logs

Mobilink server logs
------------------------------------
I. 2016-08-23 11:07:54. <4912875> Request from "UL 16.0.2193" for: remote ID: 37afd978-648e-11e6-8000-b1e0a102c04c, user name: damdoumdada, version: R12
I. 2016-08-23 11:07:56. <4912875> The current synchronization is using a connection with connection ID 'Unknown'
I. 2016-08-23 11:07:56. <4912875> User 'damdoumdada' has been successfully authenticated against LDAP server 'dimdoumdimdim'
I. 2016-08-23 11:07:56. <4912879> COMMIT Transaction: Begin synchronization
W. 2016-08-23 11:07:56. <4912875> [10017] The consolidated and remote databases have different timestamp precisions.  Consolidated database timestamps are precise to 6 digit(s) in the fractional second while the remote database timestamps are precise to 2 digit(s)
I. 2016-08-23 11:07:56. <4912875> COMMIT Transaction: Begin synchronization
I. 2016-08-23 11:07:56. <4912875> COMMIT Transaction: Upload
I. 2016-08-23 11:07:56. <4912875> COMMIT Transaction: Prepare for download
W. 2016-08-23 11:08:25. <4912875> [10116] The earliest transaction has been running for '1792' seconds
I. 2016-08-23 11:08:25. <4912875> Sending the download to the remote database
I. 2016-08-23 11:08:27. <4912875> COMMIT Transaction: Download
I. 2016-08-23 11:08:27. <4912875> COMMIT Transaction: End synchronization
E. 2016-08-23 11:10:48. <4912875> [-10279] Connection was dropped due to lack of network activity
I. 2016-08-23 11:10:48. <4912875> Synchronization complete
I. 2016-08-23 11:10:48. <4912875> PHASE: sync_request: 13
I. 2016-08-23 11:10:48. <4912875> PHASE: receive_upload: 2068
I. 2016-08-23 11:10:48. <4912875> PHASE: get_db_worker: 0
I. 2016-08-23 11:10:48. <4912875> PHASE: connect: 5
I. 2016-08-23 11:10:48. <4912875> PHASE: authenticate_user: 228
I. 2016-08-23 11:10:48. <4912875> PHASE: begin_sync: 27
I. 2016-08-23 11:10:48. <4912875> PHASE: apply_upload: 44
I. 2016-08-23 11:10:48. <4912875> PHASE: prepare_for_download: 24
I. 2016-08-23 11:10:48. <4912875> PHASE: fetch_download: 30615
I. 2016-08-23 11:10:48. <4912875> PHASE: end_sync: 14
I. 2016-08-23 11:10:48. <4912875> PHASE: send_download: 141472
I. 2016-08-23 11:10:48. <4912875> PHASE: wait_for_download_ack: 0
I. 2016-08-23 11:10:48. <4912875> PHASE: get_db_worker_for_download_ack: 0
I. 2016-08-23 11:10:48. <4912875> PHASE: connect_for_download_ack: 0
I. 2016-08-23 11:10:48. <4912875> PHASE: nonblocking_download_ack: 0

Near these logs, we can find :

E. 2016-08-23 11:08:46. <4913034> [-10117] Stream Error: Unable to continue unknown HTTP session
I. 2016-08-23 11:08:46. <Main> Synchronization connection with connection ID 'Unknown' has been disconnected
I. 2016-08-23 11:08:46. <Main> COMMIT Transaction: End connection

Answer 1

3

We have an ml-session-id header on every HTTP request that the server uses to associate the different HTTP requests of a sync together. A 404 probably means that the server received an HTTP request with an unknown session id. There're two likely causes of this:

The sync timed out and the client's last request didn't make it in time. If this is the cause then error -12079 will appear in the log before a "Unable to continue unknown HTTP session" error. The two errors will show up with different sync numbers so this won't be easy to spot in a busy server. This just indicates something went wrong on the network and there isn't really a lot you can do about this.
There was a network blip and when the client tried to reconnect your load balancer directed it to a different server in your farm. When using HTTP, if the client detect a network error before the timeout, it will automatically reconnect and resume the sync under the covers. If you have more than one ML server, I suspect this is what's causing the errors. It should be possible to configure your load balancer to forward all HTTP requests with the same remote id to the same server, but I don't know the details. I'll poke someone who would know and ask them to add a comment.

If you're going to be doing large downloads, you also might want to look into the restartable downloads feature. With that, the server will keep failed download around and they can be resumed even after a disconnect. The download information isn't shared across servers in a farm, so if you turn on resumable downloads with more than one server, you'll have to make sure that all requests from a remote go to the same server.

permanent link

answered 24 Aug '16, 09:38

Bill Somers
1.2k●8●20
accept rate: 40%

Hey Bill,

I've looked into the server logs and have spotted several "[-10117] Stream Error: Unable to continue unknown HTTP session". Is there any difference between the -10117 and the -12079 ?

(24 Aug '16, 10:27) yan

Replies hidden

If you are seeing this error message then the MobiLink connection to Oracle may be blocked by a lock held by some other connection, perhaps another MobiLink connection, perhaps not.

(I think the word "clocking" should be "blocking" in the "Probable cause" below)

Connection ID '%1' is currently blocked by connection ID '%2' for %3 seconds on %4

Error code 10117

Error constant CONN_BLOCKED_ON

Parameter 1 A MobiLink server database connection ID.

Parameter 2 Another database connection ID.

Parameter 3 Time in seconds.

Parameter 4 Table name or database operation that is currently blocked.

Probable cause

The MobiLink server will detect any its database operations that take longer than a given time and reports the connection ID that is currently clocking the MobiLink server connections. To avoid this problem, try to reduce the transaction open time.

(24 Aug '16, 11:33) Breck Carter

1

FWIW the MobiLink server log is the very first place to look for diagnostic information about MobiLink errors, because that's where you will find what you're looking for 99.99% of the time.

The other 0.01% of the time, helpful information will be found elsewhere; e.g., on the client side, Oracle database server, etcetera.

Of course, I am exaggerating. The real number is only 99.9% :)

(24 Aug '16, 11:37) Breck Carter

I don't see the Error code 10117 message in the mobilink server logs but the -10117.

Which makes me wondering because there is a high probability that the central database might take long to execute the query.

Is there any link between the two error codes ? :o

(26 Aug '16, 03:36) yan

Well, IMHO, both are totally different messages:

Each error has a numeric error code. Error codes are always negative. Warning codes are always positive. Zero means no error or warning.

-10117 is the error STREAM_ERROR
10117 is the warning CONN_BLOCKED_ON Breck has brought into the play.

As you are seeing a -10117 error but do not get the 10117 warning, I guess a blocked connection is not necessarily an issue here...

FWIW, I don't find anything about the -12079 error mentioned by Bill in the docs. Do you notice a -10279 error ("Connection was dropped due to lack of network activity") instead?

(26 Aug '16, 04:17) Volker Barth

Yes, totally ! And it is reported on the same synchronization id as the one who reported the initial problem.

Should I go with Bill's answer then ?

(26 Aug '16, 04:38) yan

Sorry, I just tried to clear up the error codes... - can't tell on the ML problem itself.

However, Bill – who is the expert – has raised the question whether you are using a ML server farm. If not, then in my limited understanding, point 2 from his answer would not apply here. So do you use several ML servers?

For further diagnosis, it might be helpful if you could include a ML server log snippet showing these errors and the lines before/after...

(26 Aug '16, 04:47) Volker Barth

We have two ML servers running. The Reverse Proxy is set to keep HTTP session sticky and finally we don't use the restartable feature. We tried though but have decided to get rid of it.

Finally, I've added some ML server logs in the initial post. :-)

(26 Aug '16, 05:11) yan

> Is there any link between the two error codes ? :o

Other than my stupidity, apparently not :)

(27 Aug '16, 00:12) Breck Carter

showing 1 of 9 show all flat view

SAP SQL Anywhere

Mobilink communication error 404

Follow this question

Related questions