We've run into an issue all of the sudden where a database instance on one of our servers keeps going offline. We are unable to identify the cause of it but have noticed that when we kill off all the semaphores it continues to run without problems even though the load on the system increases. We are heading into our peak season in a few weeks and the load on the db server is going to increase dramatically. It would be nice to identify the cause of the problem before that happens.

We are planning to update it to ASA9 running in ASA7 compatibility mode and that in and of itself might fix the issue but identifying the cause now would be in our best interests.

Other details: Database server has 2 instances on it - the problem instance contains a 9.3 gb shared database - the other instance contains 4 small databases that have never gone offline.

kernel.shmmax = 268435456 kernel.sem = 250 32000 32 256

System is a VMware host with 2 cpus and 12 gb of RAM

asked 18 Jul '11, 17:03

Wen%20BV's gravatar image

Wen BV
16112
accept rate: 0%

edited 20 Sep '11, 17:32

Volker%20Barth's gravatar image

Volker Barth
40.1k361549819


If you delete all of the semaphores used by the database server (and/or clients) then there was a bug a long long time ago where the server would spin in a tight loop and consume lots of CPU. A change was made a long long time ago - Nov. 2003 - that (I believe) went into ASA 9.0.0 so that if the server's semaphore is deleted the server will now shutdown with an error stating such. A related issue that would cause the server to crash when the server's semaphore was deleted was fixed in ASA 9.0.1 build 2020 and 9.0.2 build 3120 (QTS 390341).

So I suspect that you have it this issue in ASA 7. When the semaphore is deleted it spins and consumes a lot of CPU but will continue to process requests (but slowly).

Please note that ASA 7 is long ago EOL'ed. ASA 9 is also EOL.

If you must stick to 9.0.x then I would recommend switching to the lastest available 9.0.2 EBF. Your problem will likely be resolved.... provided you don't delete the server's (and clients') semaphore. If the semaphore is being deleted by some process on your system then you need to try to track down which process is doing this. E.g. Check your other applications on the system that are using "ipcrm".

permanent link

answered 18 Jul '11, 18:04

Mark%20Culp's gravatar image

Mark Culp
24.9k10139297
accept rate: 41%

no we deleted the semaphores manually and it magically prevented the database from going offline. What we are seeing is the DB will just go offline at random times and we need to kill the dbsrv7 processes and restart the db service. At times it was happening as frequently as every 15 minutes but never with any observable pattern. eventually we wrote a script that would automatically do this for us every 30 minutes so we could get sleep at night but obviously that is a really bad thing to do. We then discovered that our steps were creating more and more semaphores without removing the old ones so by the end of the night we had some 250 semaphores and the db service would refuse to start. So we wrote a script that would kill the semaphores and then restart the service every 24 hrs. At one point I accidentally ran that script while the db was running and it killed the semaphores but the db server doesn't mark the database as being offline anymore. The script basically just runs ipcrm by the way...

We have already purchased licenses of ASA12 and are planning to migrate to it after our peak season ends in September. We have done the port to get our applications to work with ASA9 in ASA7 compatibility mode -

(18 Jul '11, 18:25) Wen BV
Replies hidden

Ok, so my answer only addressed the topic of your "work around". We need to get to the root cause of your original issue. Please explain what you mean by "going offline" - what does this mean? hangs? stops responding? shuts down?

When the server "goes offline" what messages, if any, are displayed in the console log? If you run truss against the database server process what activity does it show?

(18 Jul '11, 21:02) Mark Culp

uh what's truss? we don't have any deep sybase experience in house so we're winging it. By going offline I mean we have a shared database in the first instance of that server that is 9.4 gigs. It periodically goes offline so when we call the status of the database using our application we get the following response - only the first item - the big shared 9.4 gig db will say offline - we know this is happening first because we get Nagios warnings from our web server "epos1" letting us know that it is unable to make a connection - if we sit and wait eventually we start getting actual warnings that it isn't able to do anything - it queues up the requests and when they start timing out the world ends, the sky falls, death is imminent... and other colorful expressions to that effect. : ] That database services all of the transactions for about 70 universities in the US doing textbook and other merchandise purchases.

eposdb Mon Jul 18 20:35:31 MST 2011 db1-risk.sequoiap.net

INSTANCE e[32;1mdb1-riske[0m e[1mONLINEe[0m port 2639 --------------------------------------------- shared e[1monlinee[0m 9.4G

INSTANCE e[32;1minstance1e[0m e[1mONLINEe[0m port 2640 --------------------------------------------- epos_103 e[1monlinee[0m 213M epos_201 e[1monlinee[0m 135M epos_539 e[1monlinee[0m 111M shared e[1monlinee[0m 11M

(18 Jul '11, 23:38) Wen BV
Replies hidden

Sorry, I used Solaris' term ("truss") for Linux's term ("strace"). Run "strace -p <pid-of-server> -T -o <filename.txt>" to get a log of system calls that are being made by the database server.

So "offline" means that you cannot connect to the server. This can be caused by numerous things. How much CPU and I/O is being used by the server process when this happens? How are you connecting to the server: TCP/IP or shared memory? You may want to try leaving a dbisqlc connection connected to the server (use the "idle=0" connection parameter so the connection does not timeout) and then when the server goes "offline" see if this dbisqlc session can still process requests (E.g. try "select 1 from dummy"; if that works try "call sa_conn_list()" or "call sa_locks()"... but I'm not sure these procedures existed in v7?)

(19 Jul '11, 04:03) Mark Culp

FWIW, the dummy system table and sa_conn_info do exist in v5.5 - at least on Windows...

(19 Jul '11, 08:39) Volker Barth

CALL sa_conn_info() and CALL sa_locks() work in 7.0.3.2046 on Windows, but not sa_conn_list(). Also, SELECT * FROM procedure_call() does not work in SA7 so if you want a result set you have to use a cursor to CALL the sa_things.

(19 Jul '11, 08:52) Breck Carter

I guess someone porting from ASA 7 won't be used to the great v9 "from procedure" syntax. When testing with dbisqlc, just CALLing the procedures would do:)

(19 Jul '11, 09:35) Volker Barth

Perhaps someone on this forum might be interested in helping us with some consulting. If the upgrade to ASA9 next week doesn't fix the problem we are going to be in a lot of trouble. We just don't have this knowledge in house.

(26 Jul '11, 16:30) Wen BV
showing 3 of 8 show all flat view
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×108
×68
×13

question asked: 18 Jul '11, 17:03

question was seen: 3,501 times

last updated: 20 Sep '11, 17:32