What my BOFHs observe daily:
Given the extremeness of the situation (100% CPU usage), I am hoping someone can fill me in on what we could possibly be doing to cause it. We have not been able to reproduce it yet, nor have we discovered any infinite loops in our code. Any pointers are welcome. I am basically a n00b when it comes to SQL Anywhere, so I guess I have to read up on application profiling. asked 25 Oct '11, 11:41 Rune
I had to defer the troubleshooting to my colleague, as I was otherwise engaged. Eventually he found a query that would run all CPUs flat out and never finish. He waited for 15 minutes before killing it. Then he looked at the query plan. In his words, "it was messy". The solution was to add an index to one of the temporary tables used. That whacked the query execution plan back into shape and we are back in business. The query now takes 20 seconds to run.

Not sure this deserves to be marked as 'an answer', but my colleague's workaround will hopefully see us through. answered 08 Nov '11, 09:41 Rune

IMHO, what "deserves to be marked as an (accepted) answer" should be answered by the one with the question - in short, by you :)
(08 Nov '11, 10:14)
Volker Barth
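For anyone following along, the fix described in the answer above would have looked roughly like this. This is a sketch only - the real table, column, and index names are not given in the thread, so everything here is invented:

```sql
-- Sketch only: table and column names are invented, not from the actual system.
-- A local temporary table that the problem query joined against:
DECLARE LOCAL TEMPORARY TABLE TempOrders (
    order_id  INTEGER,
    cust_id   INTEGER,
    amount    NUMERIC(12,2)
) NOT TRANSACTIONAL;

-- Adding an index on the join column gives the optimizer an alternative
-- to repeatedly scanning the temporary table:
CREATE INDEX ix_temporders_cust ON TempOrders ( cust_id );
```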
IMHO that's not a workaround - that's fixing the problem. This is exactly what indexes are for!
(10 Nov '11, 16:49)
carolstone
Yeah, indexes are nice and all, but the code in question ran just fine under SQL Anywhere 10. We think we tripped over a bug introduced in a later version of SQL Anywhere. Even if the lack of an index (on a temporary table, mind you) triggers a table scan, one would assume the query would finish eventually. Going flat out on all 24 CPU cores for a long time (on one occasion they waited for more than an hour before killing it) does not seem right to me.
(11 Nov '11, 01:38)
Rune
So then you might consider a bug report?
(11 Nov '11, 03:18)
Volker Barth
We will look into a reliable way of reproducing this eventually, yes.
(11 Nov '11, 04:48)
Rune
Please tell us the SQL Anywhere version number, e.g. 12.0.1.3436
Oops, I forgot to mention that.
But you guessed right: we are still on build 3436. I have not checked out the latest EBF yet.
Two suggestions: First, http://sqlanywhere.blogspot.com/2011/08/new-maxbps768-set-maxquerytasks-1.html
Second, run the Foxhound monitor against your server. Version 1.1 is available here http://risingroad.com/
...and version 1.2 is discussed here http://sqlanywhere.blogspot.com/2011/10/whats-cool-whats-new-in-foxhound-12.html
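On the first suggestion: the option named in the linked post can be set like this. A sketch only - see the post for why and when this is appropriate:

```sql
-- Capping intra-query parallelism can stop a single bad plan
-- from pinning every core. Server-wide:
SET OPTION PUBLIC.max_query_tasks = 1;

-- Or per-connection, for testing:
SET TEMPORARY OPTION max_query_tasks = 1;
```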
You could use ProcDump (from Microsoft TechNet/Sysinternals) to create a dump of the running process during the high CPU usage and provide it to Sybase support for analysis. It reminds me of one of my own questions:
a sequential scan blocks all new connections
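ProcDump can also be armed to trigger on sustained CPU automatically, so nobody has to catch the spin by hand. A sketch using flags from the Sysinternals documentation; the server executable name and paths here are placeholders and will vary by version:

```
rem Write a full memory dump (-ma) when CPU stays above 90% (-c 90)
rem for 10 consecutive seconds (-s 10), up to 2 dumps (-n 2):
procdump -ma -c 90 -s 10 -n 2 dbsrv12.exe c:\dumps\sa_highcpu.dmp
```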
I have not checked out the latest EBF yet.
It is definitely recommended to try this to see if this is a known (fixed) issue.
Given the extremeness of the situation (100% CPU usage), I am hoping that someone may fill me in on what we could be doing that could possibly cause this situation.
Does the server ever come out of the 100% CPU spin, or does it stay there indefinitely (or at least for a long period of time)? A "temporary" 100% CPU "spike" may not be unexpected (e.g. you have an intense CPU-bound, parallelized query).
Have you tried using the Diagnostic Tracing utility to determine which queries are being run during these 100% CPU times?
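If you haven't used Diagnostic Tracing before, the minimal incantation is roughly the following (a sketch; see the docs for choosing tracing levels and an analysis database):

```sql
-- Start tracing into the same database:
ATTACH TRACING TO LOCAL DATABASE;

-- ... let the workload run until the 100% CPU period occurs ...

-- Stop tracing and save the collected data for analysis:
DETACH TRACING WITH SAVE;
```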
This trick isn't likely to solve your problem, but could give you a place to start if (like me) you're too busy (or lazy) to really track it down properly. Sometimes you can get a sense of who's doing what if you connect using ISQL and run call sa_conn_info(). It's a relatively quick peek into what's going on.
The last time we had a very slow server it was duelling backups. The time before that, it was three different reporting queries calling the same monster sub-query at the same time. We were able to track both of these down pretty quickly starting with sa_conn_info(): rather than trying to get the database to tell us everything, we hunted down the people responsible for each connection and found out what those connections were actually doing.
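The quick-peek approach above, together with a companion procedure that reports each connection's most recently prepared statement, looks like this in ISQL:

```sql
-- Who is connected, and from where:
CALL sa_conn_info();

-- The most recently prepared statement for each connection,
-- which is often enough to identify the guilty query:
CALL sa_conn_activity();
```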
Good luck!