What my BOFHs observe daily:

  • All 16 CPU cores maxed out at 100%
  • It is possible to connect to the server instance using Sybase Central, but it takes forever and they have given up further diagnostics (easier to just turn the process 'off and on again')
  • Deadlock logging enabled, does not show anything according to one of my colleagues
  • SQL Anywhere 10 could "phreak out" as well I'm told, but would not max out all CPU cores (at most just one)

Given the extremeness of the situation (100% CPU usage), I am hoping that someone may fill me in on what we could be doing that could possibly cause this situation. We have not been able to reproduce this yet nor have we discovered any infinite loops in our code. Any pointers are welcome.

I am basically a n00b when it comes to SQL Anywhere, so I have to read up on application profiling I guess.

asked 25 Oct '11, 11:41

Rune's gravatar image

Rune
2365817
accept rate: 50%

retagged 11 Nov '11, 03:29

Daz%20Liquid's gravatar image

Daz Liquid
861182338

Please tell us the SQL Anywhere version number, e.g. 12.0.1.3436

(25 Oct '11, 12:02) Reimer Pods

Ooops, I forgot to mention that.

But you guessed right, we are still on build 3436. I have not checked out the latest EBF yet.

(25 Oct '11, 12:09) Rune

Two suggestions: First, http://sqlanywhere.blogspot.com/2011/08/new-maxbps768-set-maxquerytasks-1.html

Second, run the Foxhound monitor against your server. Version 1.1 is available here http://risingroad.com/

...and version 1.2 is discussed here http://sqlanywhere.blogspot.com/2011/10/whats-cool-whats-new-in-foxhound-12.html

(25 Oct '11, 14:34) Breck Carter

You could use ProcDump from technet to create a dump of the running process during the high CPU usage and provide it to Sybase support for analysis. It reminds me of one of my own questions:

a sequential scan blocks all new connections

(26 Oct '11, 03:03) Martin

I have not checked out the latest EBF yet.

It is definitely recommended to try this to see if this is a known (fixed) issue.

Given the extremeness of the situation (100% CPU usage), I am hoping that someone may fill me in on what we could be doing that could possibly cause this situation.

Does the server ever get out of the 100% CPU spin, or does it stay there forever / a long period of time? Having a "temporary" 100% CPU "spike" may not be unexpected (e.g. you have an intense CPU-bound, parallelized query).

Have you tried using the Diagnostic Tracing utility to determine which queries are being run during these 100% CPU times?

(26 Oct '11, 11:52) Jeff Albion
1

This trick isn't likely to solve your problem, but could give you a place to start if (like me) you're too busy (or lazy) to really track it down properly. Sometimes you can get a sense of who's doing what if you connect using ISQL and run call sa_conn_info(). It's a relatively quick peek into what's going on.

The last time we had a very slow server it was duelling backups. And the time before that it was three different reporting queries calling the same monster sub-query at the same time. We were able to track both of these down pretty quickly starting with sa_conn_info() because we could track down what the different connections to our server were actually doing by hunting down the people responsible for each of those connections, rather than trying to get the database to tell us everything.

Good luck!

(26 Oct '11, 16:38) carolstone
More comments hidden
showing 5 of 6 show all flat view

I had to defer the troubleshooting to my colleague, as I was otherwise engaged.

Eventually he found a query that would run all CPUs flat out and never finish. He waited for 15 minutes before killing it.

Then he looked at the query plan. In his words "it was messy".

The solution was to add an index to one of the temporary tables used. That whacked the query execution plan back into shape and we are back in business. The query now takes 20 seconds to run.

Not sure this deserves to be marked as 'an answer', but my colleague's workaround will hopefully see us through.

permanent link

answered 08 Nov '11, 09:41

Rune's gravatar image

Rune
2365817
accept rate: 50%

edited 08 Nov '11, 09:43

IMHO, what "deserves to be marked as an (accepted) answer" should be answered by the one with the question - in short by you:)

(08 Nov '11, 10:14) Volker Barth

IMHO that's not a workaround - that's fixing the problem. This is exactly what indexes are for!

(10 Nov '11, 16:49) carolstone

Yeah, indexes are nice and all, but the code in question ran just fine with SQL Anywhere 10. We think we tripped over a bug introduced with later versions of SQL Anywhere. Even if a lack of an index (for a temporary table mind you) triggers a table scan, one would assume the query would finish eventually. To go flat out on all 24 CPU cores for a long time (on one occasion they waited for more than an hour before killing it) does not seem right to me.

(11 Nov '11, 01:38) Rune
Replies hidden

So then you might consider a bug report?

(11 Nov '11, 03:18) Volker Barth

We will look into a reliable way of reproducing this eventually, yes.

(11 Nov '11, 04:48) Rune
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×409
×11

question asked: 25 Oct '11, 11:41

question was seen: 3,434 times

last updated: 11 Nov '11, 04:48