Our HA configuration:
Server 1 (DC): primary database and arbiter
Server 2 (DRC): mirror database
We have a problem: when the network between DC and DRC is slow or unreliable, transactions on the primary database hang. All transactions from client applications to the primary database stay pending until the arbiter learns whether the mirror database is disconnected or connected.
I assume the primary database hangs because the arbiter needs to determine the status of the primary or the mirror.
We need your help configuring SQL Anywhere HA so that transactions on the primary database do not hang when network latency is high.
Thanks for your help.
asked 12 Apr '16, 09:05
Debugging slowdown between connected primary/mirror
There are two possible synchronous calls made between a connected primary and an asynchronous mirror that could cause transactions on the primary to slow down:
1.1) As Mark mentioned, every ~100 sends, the primary will wait for the mirror to reply that its buffers have been written to disk. This ensures the mirror's buffers aren't overloaded. If this is the case, there will be evidence in the mirror log files. I would suggest enabling mirror logging on the mirror server (specified with logfile=filename.mlog in the CREATE MIRROR SERVER statement - see http://dcx.sap.com/index.html#sa160/en/dbreference/alter-mirror-server-statement.html*d5e32538 ), and then looking for a line like:
MM/DD HH:MM:SS.SSS << LOG_PAGES,16,partner,first_page=1,need_ack=1
On an asynchronous mirror, most of these requests should have need_ack=0, but if you see need_ack=1, the primary will wait for a reply. You should look in the log for the next line that looks like this:
MM/DD HH:MM:SS.SSS >> SUCCESS,partner
This line means the reply was sent to the primary. If there is a large time gap between receipt of the request and the reply, then this will be delaying your primary. This slowdown could be caused by slow hardware. We recommend that the hardware on the mirror is at least as good as the hardware being used for the primary.
Alternatively, if the time gap isn't very large but you're still seeing long delays, you can also enable mirror logging on the primary to see how long it takes for these requests to get from one server to the other. This travel time could account for the delays as well. If this is the case, we recommend that you try to improve the network between the machines.
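To make the gap check in 1.1 less tedious, here is a minimal Python sketch that pairs each need_ack=1 request with the next SUCCESS reply and reports the time between them. It assumes the .mlog lines look exactly like the two sample lines shown above (timestamp, then << or >>, then a comma-separated message); the names ack_gaps and parse_ts are made up for this example and are not part of SQL Anywhere. Real log files may contain other line shapes, which this simply skips.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the sample lines in this answer:
#   MM/DD HH:MM:SS.SSS << LOG_PAGES,...,need_ack=1   (request received by the mirror)
#   MM/DD HH:MM:SS.SSS >> SUCCESS,partner            (reply sent back to the primary)
LINE_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (<<|>>) (.*)$")


def parse_ts(text):
    # The log carries no year, so a fixed leap year is prepended to make every MM/DD parse.
    return datetime.strptime("2000/" + text, "%Y/%m/%d %H:%M:%S.%f")


def ack_gaps(lines):
    """Return (request timestamp, seconds until the next SUCCESS reply) pairs."""
    gaps = []
    pending = None  # timestamp of the oldest need_ack=1 request still awaiting a reply
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a line in the assumed format
        ts, direction, body = m.groups()
        fields = body.split(",")
        if direction == "<<" and fields[0] == "LOG_PAGES" and "need_ack=1" in fields:
            if pending is None:
                pending = ts
        elif direction == ">>" and fields[0] == "SUCCESS" and pending is not None:
            delta = parse_ts(ts) - parse_ts(pending)
            gaps.append((pending, delta.total_seconds()))
            pending = None
    return gaps
```

You could feed it the mirror's log with something like `ack_gaps(open("mirror.mlog"))` (the file name is just a placeholder) and sort the result by the second element to find the slowest acknowledgements. Running the same script against the primary's .mlog would need the direction markers swapped, since the primary sends the request and receives the reply.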
1.2) The second possibility is that the mirror is running behind the primary in applying transactions. In this case, you should be able to see the following message in the console log (output from -o server.conslog on the server start command line):
Database "demo" mirroring: primary blocked for x seconds waiting for the mirror to catch up
The fix for this is to ensure that the hardware on the mirror is at least as good as the hardware being used for the primary. It's also possible that the network could be causing this problem. (As in 1.1, you could check for this by looking for DU_MIRROR_CATCHUP in the primary and mirror .mlog files to see how long each request takes.)
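If the console log is large, a short Python sketch like the following could pull out the "primary blocked" messages described in 1.2. The regular expression is based only on the sample message quoted above; the exact number format (whole or fractional seconds) is an assumption, and blocked_events is a name invented for this example.

```python
import re

# Assumed message shape, copied from the sample message in this answer.
# Whether the server prints whole or fractional seconds is a guess, so both are accepted.
BLOCKED_RE = re.compile(
    r'Database "([^"]+)" mirroring: primary blocked for (\d+(?:\.\d+)?) seconds'
)


def blocked_events(lines):
    """Return (database name, seconds blocked) for each matching console log line."""
    events = []
    for line in lines:
        m = BLOCKED_RE.search(line)
        if m:
            events.append((m.group(1), float(m.group(2))))
    return events
```

Calling `blocked_events(open("server.conslog"))` on the file named with -o would give a quick picture of how often, and for how long, the primary waited on the mirror.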
From this posting, though, it sounds like your problem occurs when the primary loses its connection to the mirror and needs to check with the arbiter whether to stay on as primary. I don't think the arbiter should be checking with the mirror as you've suggested, so I'm not actually sure what slowdown you could be seeing here. However, here are some ideas for improving your configuration:
2.1) Unless you have set the mirroring option auto_failover to ON, you don't actually have high availability with an asynchronous mirror. You could potentially change the mirror to be a read-only scale-out node (i.e. a copy node). This would prevent the primary from checking for quorum when the node disconnects; however, you may still encounter some of the network delays mentioned in 1.1.
2.2) In general, we recommend that the primary, mirror, and arbiter are all located on separate machines for optimal high availability. It sounds like in your situation the primary is experiencing high enough network latency to cause dropped connections. In this case, I wonder whether it would be better for the mirror to take over as primary, with the old primary rejoining once the network issues are resolved. For this to work properly, you will need to make the mirror synchronous and move the arbiter to another machine. If you want to have a preferred primary server, you could set the "preferred" option, as Volker has mentioned.
2.3) Perhaps one of your problems is that you're experiencing network drops when in reality there are just extended delays. In this case, you could try increasing the liveness timeout by adding "lto=timeout_value" to the connection strings provided in the CREATE MIRROR SERVER statements. See http://dcx.sap.com/index.html#sa160/en/dbadmin/livenesstimeout.html
2.4) As an alternative to using SQLA database mirroring, you could use live backups. http://dcx.sap.com/index.html#sa160/en/dbadmin/da-backup-dbs-4977640.html
answered 13 Apr '16, 11:45
(1) If the primary server cannot verify that it has quorum (i.e. confirm that it is still the primary, for which it needs to be able to communicate with the arbiter, the mirror, or both), then it will stop all COMMITs until it can get quorum again. This is working as intended. If the primary allowed COMMITs to complete without quorum, it would open up the possibility of lost transactions in the future.
(2) If quorum is not the problem, then the other thing that happens under the covers is that the primary will slow down (and stop/wait) if it gets "too far" ahead of the mirror. The primary needs to send the transaction information to the mirror while ensuring that the mirror does not get overloaded, so after every 100 packets (I think that is the number?) the primary sends a "did you get this?" request and waits for a response. If the mirror is behind and has a backlog of packets that it has not yet received, the response can be delayed, and you may see connections that are COMMITing on the primary "hang" until the response is received.
The only suggestion that I would make is to attempt to improve your network connection between DC and DRC.