I'm trying to investigate a server suffering very occasional freeze-ups where the whole database becomes unresponsive. The customer has been resolving the issue by restarting the db engine, but with over 100 users this is hardly a long term solution. Occasionally the database frees itself up after some minutes, before a restart has been initiated. Trying to investigate at the time the freeze-up happens isn't possible as the system is, well, frozen and you can't connect by any method. I have put on request level logging for SQL and BLOCKS in the hope that it may show if a long running process is blocking everything else. In the most recent freeze-up, the system was paused for around 13 minutes and then came back to life. This is reflected in the request log (viewed with sa_get_request_times() ) that shows a gap in activity at the relevant time (and lots of statements that all took approx 800,000 milliseconds - ie 13.5 minutes - to complete). What I can't see in that output is any data relating to blocking (or its absence) - the output from the procedure doesn't seem to have any relevant information. Nor does the only other procedure I am aware of for analysing request log data - sa_get_request_profile. So in summary - having enabled request level logging of BLOCKS - how do I use it? Many thanks. v10.0.1.4310 (yes I know - but I don't see anything in v16 either) |
After running the 1
Thanks Mikel - I've checked that output and the good news is that my issue isn't being caused by blocking - the bad news is I now have no clue as to what else might be doing it :( I've added a note to DCX about that table as it isn't mentioned at all in the help documentation.
(12 Jun '15, 15:25)
Justin Willey
Comment Text Removed
1
Are any of the statements that take 13 minutes to complete expected to take that long? If the multiprogramming level is low enough, all of the workers could have requests assigned to them (which would ‘block’ new requests until one of them completed). What is the -gn value currently set to? When the server is running normally, what does If you would like to troubleshoot the issue while the server is unresponsive, having a connection open with the dedicated_task option set to ‘On’ may help. How many logical processors are being used by SQL Anywhere? Version 10 is end of support, and there are some issues regarding intra-query parallelism that have been fixed in 12 & 16, but not 10. If this was the case, setting max_query_tasks=1 may help.
(15 Jun '15, 09:19)
Mikel Rychliski
Replies hidden
Comment Text Removed
...and please note that "ActiveReq" will not count worker threads used for proxy connections (or maybe for external procedures, don't know for v10 as that does not have external environments like v11 and above) - compare that other question. Of course that will not matter if it's not an issue with the MPL at all.
(15 Jun '15, 14:10)
Volker Barth
Thanks for following this up. Nothing should take anything like 13 minutes or even 13 seconds. There are 4 logical processors (it's a virtual machine), and there is no explicit -gn setting so it will presumably use the default of 20. Causal checking of normal usage puts active tasks between 2 & 7 (average 2.4). There aren't any remote servers in use. I've put an event in place to track the number of active tasks and I'll see if that sheds any light on the issue. (In the meantime I'll see what I can do to accelerate the plan for moving to v16!)
(17 Jun '15, 17:45)
Justin Willey
What I do see in the log is occasional "Internal warning: Request task x took y seconds", where Y can be up to 20 seconds. Is there any way of correlating these warnings with something in the request log?
(17 Jun '15, 17:47)
Justin Willey
UPDATE - logging for a day (where no freeze up happened) gave a max of 9 active tasks
(18 Jun '15, 07:39)
Justin Willey
Try the following Goggle search... "Internal warning" "Request task" What you will see is some finger-pointing at Queries From Hell... another reason to fire up Foxhound... or run Application Profiling if you have the stamina :)
(18 Jun '15, 08:29)
Breck Carter
When you say "it's a virtual machine" are you sure that SQL Anywhere is getting more than one processor? Are you personally in control of the system setup, or is it in the hands of a True Believer in The Church Of The Magical VM?
(18 Jun '15, 08:33)
Breck Carter
The latter I fear - though reality has begun to creep in :). It does seem though we are getting to use all the processors.
(18 Jun '15, 11:45)
Justin Willey
2
What I have seen so far for SQLA to become temporarily unresponsive was a full table scan on a huge table or the growth of the database file blocking until the growing finished...So in a VM environment I would have a look at IO.
(19 Jun '15, 06:08)
Martin
Will do, thank you for the suggestion.
(19 Jun '15, 07:17)
Justin Willey
2
It might be a good idea to check if the VM infrastructure logged any events at the time of freeze up. If someone takes a snapshot of the VM or the VM is moved between physical hosts (vMotion in VMWare), the database would be unresponsive.
(19 Jun '15, 09:50)
Mikel Rychliski
|
Version 9 had a history of non-responsiveness sometimes due to checkpoints and backups (and the checkpoints resulting from backups). If memory serves those issues went away with V10 but perhaps it's worth looking at the timing of checkpoints and non-responsiveness.
Does Foxhound show any interesting activity leading up to the periods of non-responsiveness?
I need to get 16 installed and then I can put FH on.