This has been reported a few times, but no resolution has been posted, and I wondered whether anyone had got to the bottom of it.

We occasionally have database connections where the client application has been disconnected but the connection is still active on the server and cannot be dropped. Attempting to drop the connection either through Sybase Central or using DROP CONNECTION has no effect (and no error is reported). It makes no difference when the client machine is rebooted.

The connection appears in the list of active connections in Sybase Central and is consuming CPU time (slightly less than a full processor's worth - though it varies). The statement reported as the last statement is fairly straightforward and is being executed by the system for other connections thousands of times per hour without problem. The connection is not shown as being blocked.

The server version is 10.0.1.4239

Related questions:

http://sqlanywhere-forum.sap.com/questions/1772

http://sqlanywhere-forum.sap.com/questions/1368

asked 06 Jul '11, 07:12

Justin Willey

edited 06 Jul '11, 07:14


We have seen customer reports in the past (known to exist in 11.0.1) of the bug CR #594568 leaving 'dead connections' around that can no longer be dropped (e.g. we started a HashFilter/parallelized intra-query task and 'accidentally' left around some worker threads afterwards).

Notably, CR #594568 was only the first fix for a class of 'HashFilter' problems - there were additional issues (in particular, CR #620095) fixed on later builds/versions (including 10.0.1/12.0.1). I would recommend seeing if this 'zombie connection' behaviour still occurs after these fixes/builds:


Update (2012/03/30):

We have now diagnosed more instances of a 'hung connection' problem from other customer reports. The updated list for parallelized HashFilter problems (that may leave Exchange worker tasks 'abandoned' and 'undroppable') now includes the following issues/builds:

  1. CR #594568 - 11.0.1.2401
  2. CR #620095 - 10.0.1.4182, 11.0.1.2544, 12.0.1.3126
  3. CR #693560 - 10.0.1.4309, 11.0.1.2786, 12.0.1.3713
  4. CR #702733 - 11.0.1.2786, 12.0.1.3713

The list of associated parallelized 'HashFilter' issues (server hangs/crashes/incorrect results) includes:

  1. CR #663991 - 10.0.1.4205, 11.0.1.2584, 12.0.1.3320
  2. CR #674917 - 11.0.1.2631, 12.0.1.3388
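
The two build lists above can be checked mechanically against a given server version. The following is a minimal sketch, not an official tool: the table is transcribed from the lists, and it assumes that each listed build is the first one in its codeline to contain the fix and that later builds in the same codeline keep it. The names `FIX_BUILDS` and `fix_status` are made up for illustration.

```python
# CR number -> {codeline: build listed for the fix}, transcribed from the lists above.
FIX_BUILDS = {
    "594568": {"11.0.1": 2401},
    "620095": {"10.0.1": 4182, "11.0.1": 2544, "12.0.1": 3126},
    "693560": {"10.0.1": 4309, "11.0.1": 2786, "12.0.1": 3713},
    "702733": {"11.0.1": 2786, "12.0.1": 3713},
    "663991": {"10.0.1": 4205, "11.0.1": 2584, "12.0.1": 3320},
    "674917": {"11.0.1": 2631, "12.0.1": 3388},
}


def fix_status(server_version):
    """Map each CR to a status string for a version such as '10.0.1.4239'."""
    # Split 'major.minor.patch.build' into the codeline and the build number.
    codeline, _, build_text = server_version.rpartition(".")
    build = int(build_text)
    status = {}
    for cr, builds in FIX_BUILDS.items():
        if codeline not in builds:
            # Nothing is listed for this codeline; this alone does not
            # say whether the codeline is affected by the CR.
            status[cr] = "no fix listed for this codeline"
        elif build >= builds[codeline]:
            # Assumption: builds at or after the listed one contain the fix.
            status[cr] = "included"
        else:
            status[cr] = "needs build %d or later" % builds[codeline]
    return status
```

For the build in the question, `fix_status("10.0.1.4239")` reports CR #620095 and CR #663991 as included and CR #693560 as needing build 4309 or later.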

Note: CR #702733 was identified after the End-of-Life date for SQL Anywhere 10.0.x. As such, this fix was not backported to this codeline and it remains a possibility for 10.0.x builds to encounter undroppable internal connections in very rare circumstances. If this is a concern for you, please upgrade to a later version of SQL Anywhere (11.0.1 / 12.0.1), which includes this fix. Setting the connection option 'MAX_QUERY_TASKS' to '1' or restarting the database server is a workaround for the issue in 10.0.1.
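
As a rough illustration of the MAX_QUERY_TASKS workaround, the sketch below issues the option change through a Python DB-API connection. It assumes `conn` is an already-open connection to the SQL Anywhere server (for example one obtained from the sqlanydb driver); the helper name `limit_query_tasks` and its `public` flag are hypothetical. Setting the option for PUBLIC affects all users and needs sufficient authority, while the temporary form only affects the current connection.

```python
def limit_query_tasks(conn, public=False):
    """Set max_query_tasks to 1 to turn off intra-query parallelism.

    conn is assumed to be an open DB-API connection to SQL Anywhere.
    Returns the statement that was executed.
    """
    if public:
        # Applies to every user without an individual setting.
        statement = "SET OPTION PUBLIC.max_query_tasks = 1"
    else:
        # Applies only to this connection, until it disconnects.
        statement = "SET TEMPORARY OPTION max_query_tasks = 1"
    cursor = conn.cursor()
    try:
        cursor.execute(statement)
    finally:
        cursor.close()
    return statement
```

The temporary form is only useful on the connections that actually run the parallelized queries, so an application would call it right after connecting.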

answered 06 Jul '11, 13:14

Jeff Albion

edited 30 Mar '12, 14:10

Nevertheless, a great summary!

FWIW, the last CR notes a version 4.0 - is Watcom SQL (as product, not as the great SQL dialect) still under active development?

(06 Jul '11, 13:26) Volker Barth

The last reference was a recently fixed issue; the builds I have posted should be accurate. I believe the public facing system is still 'catching up' with the versions. Thanks for the report!

(06 Jul '11, 13:41) Jeff Albion

It's not directly reproducible but is occurring regularly. Potentially more helpfully: the server in question crashed with this issue on-going and as part of the investigation of that (probably unrelated) issue (case express #116801982) a full core dump and a request level log have been uploaded to support today.

(06 Jul '11, 14:35) Justin Willey

That second sentence could have been clearer - I meant the investigation of the crashing problem!

(06 Jul '11, 15:39) Justin Willey

If the server is also crashing, it sounds like the environment is possibly 'unstable'. There is likely another issue involved (that is responsible for the crash specifically) that may be contributing to the hanging connections (e.g. potentially something 'like' CR #674917). It could be the case that getting a connection into this state precipitates the crash, or...

I would be interested to see if this behaviour still happens after we figure out the cause of the crash internally - it looks like we're still currently investigating that issue.

(06 Jul '11, 16:40) Jeff Albion

Thanks for the updated info!

(30 Mar '12, 14:19) Justin Willey

Let me add my thanks, too, for taking the time to update two or three other discussions as well as this one.

(30 Mar '12, 15:38) Breck Carter

You're quite welcome - I've been trying to keep track of the reports for this issue and I think I've found all of the questions now. Feel free to link back to this thread if you find any more around.

If I hear of anything else regarding this situation, I will try to keep this answer updated. Please feel free to report future instances of any 'undroppable connections' as a new question after these builds have been released. (These builds have also been requested in QA.)

(30 Mar '12, 16:01) Jeff Albion

The 11.0.1 Windows / Linux EBFs have now been released for the CR #702733 issue:

11.0.1.2789 - Windows x86/x64 - http://downloads.sybase.com/swd/detail.do?baseprod=144&client=ianywhere&relid=15647

11.0.1.2794 - Linux x86/x64 - http://downloads.sybase.com/swd/detail.do?baseprod=144&client=ianywhere&relid=15643

The 12.0.1 EBFs are still progressing through QA.

(20 Apr '12, 13:55) Jeff Albion

My guess would be that the connection has a huge transaction to roll back. The communications link may be gone, but the server must actually roll back all uncommitted operations performed by that connection before the connection will disappear from the list reported by the server.

answered 06 Jul '11, 09:46

John Smirnios

...so this would be a "delayed" disconnect and not a failed one?

(06 Jul '11, 09:51) Volker Barth

That would make sense John - thanks. There shouldn't be any really big transactions but .... If this is what is happening, what would you expect the ReqType (Type of active request) to show - these connections are showing 'FETCH'? In case it helps ReqStatus is 'Executing'.

(06 Jul '11, 09:57) Justin Willey

As far as the server is concerned, the communication link and the internal representation of a database connection are distinct (though clearly related) entities. The internal representation of a database connection cannot go away until the rollback completes. I don't know for sure what information the connection info procedures return when the communication link has terminated and the internal representation of the connection is still being torn down, but I wouldn't be surprised if it was 'active'.

One thing that could be done while in the state described by Volker is to try to execute a checkpoint on another connection. Checkpoints cannot be performed while a rollback is in progress. If you can execute a checkpoint, my guess is wrong and perhaps there is a bug. If you cannot execute a checkpoint, you can't really tell for sure if there's a rollback in progress since other things can prevent checkpoints too but it's a good indicator.
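
The checkpoint test described above could be scripted roughly as follows. This is only a sketch under stated assumptions: `conn` is a separate, already-open Python DB-API connection with enough authority to run CHECKPOINT, the helper name `probe_checkpoint` and the timeout are hypothetical, and whether a checkpoint blocked by a rollback waits or returns an error is not specified in this thread, so both cases are handled.

```python
import threading


def probe_checkpoint(conn, timeout_s=30.0):
    """Try CHECKPOINT on a second connection and report the outcome.

    Returns "completed", "failed" or "timed_out". conn is assumed to be
    a DB-API connection used only by this probe.
    """
    result = {}

    def run():
        try:
            cursor = conn.cursor()
            cursor.execute("CHECKPOINT")
            cursor.close()
            result["outcome"] = "completed"
        except Exception as exc:
            # Keep the error for inspection; the probe itself just reports failure.
            result["outcome"] = "failed"
            result["error"] = exc

    # Run in a daemon thread so a checkpoint that never returns
    # cannot hang the calling script.
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return "timed_out"
    return result["outcome"]
```

Per the reasoning above, "completed" suggests that no rollback is in progress, while "failed" or "timed_out" is only an indicator, since other things can prevent checkpoints too.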

(06 Jul '11, 10:02) John Smirnios

Without looking to be certain, I would imagine that the last active request indicates the last operation performed over the communications link. The rollback on a disconnect is generated internally as part of the tear-down process.

(06 Jul '11, 10:05) John Smirnios

Thanks for that tip - it hadn't occurred to me. Checkpoints are happening regularly (about every 10 mins) as normal for the server, taking a few seconds (the connection has been in this state for a few hours now) - so does that suggest that there could be an issue? It's the only odd thing we've been able to establish so far about this server that is crashing regularly for no apparent reason. I'm just wondering if we could be running into some sort of resource leak.

(06 Jul '11, 10:27) Justin Willey

If you are doing checkpoints, the connection in question is not performing a rollback. There must be something else going on.

(06 Jul '11, 14:01) John Smirnios