I've been having some frustration with this over the last few weeks, I hope someone with better knowledge of what is happening can help shine a light on this.

We have been receiving 40W06 "All threads are blocked" errors for most of our queries that are being passed through an external environment. So far CLR, C_ODBC64, and C_ESQL64 have been tested and found to exhibit this issue. I am running 12.0.1.3942 on both the production and my local system. My local system does not exhibit these issues (although I found a bug with long binary). This error is not received when calling C functions without an external environment (which isn't an option with CLR).

For additional clarification, I launched a dbsrv12 instance from a shortcut on the production desktop and received an "All threads are blocked" error on the first attempt to call the external function:

  • select property('AutoMultiprogrammingLevel') => 1
  • select property('CurrentMultiprogrammingLevel') => 7
  • select property('MaxMultiprogrammingLevel') => 400
  • select property('ThreadDeadlocksAvoided') => 0
  • select property('ThreadDeadlocksReported') => 0
  • call sa_report_deadlocks() => no rows

Additional information collected from examining the server itself:

  • Process Monitor shows no attempt by either dbextclr12 or dbexternc12 to read the requested .DLL file
  • After an error message the server will restart dbexternc12, while dbextclr12 is not restarted.
  • ProcDump does not indicate any exceptions (1st chance or otherwise) generated from within dbexternc12
  • DbgView does not show any diagnostic information generated from within dbexternc12

I'm starting to run out of ideas on trying to figure out what's going on. I half expect the answer to be "upgrade to 16" or "wait for another EBF". If I had some idea of tracking down the underlying cause or getting some indicator of what is happening (other than an error message that is misguided at best) then I might be able to resolve at least a workaround to get this up and running again.

asked 28 May '15, 21:53

Erik%20Anderson's gravatar image

Erik Anderson
42181223
accept rate: 15%

Just to understand the situation:

For additional clarification, I launched a dbsrv12 instance from a shortcut on the production desktop and received an "All threads are blocked" error on the first attempt to call the external function.

So you have started a new server instance and are getting that error immediately when e.g. calling the external C function, right? Are there other "normal" connections or events active at that time? Or are there proxy connections (for remote data access), which will require their own threads, AFAIK? Does sa_conn_info() reveal something relevant here?

The current MPL seems rather low (since 20 is its default value), has it been set to that low value?

You have set the log_deadlocks option to 'On', right?


AFAIK, an external environment will require at least one worker thread within the database engine to communicate with the external environment (and probably one other for maintenance of the external environment), and a further one to handle the requests from that external process. AFAIK, only the connections handling requests are shown with sa_conn_info(), so there might be more workers in use than are shown...

Does the problem disappear when you increase the minimum MPL level or do not use automatic MPL?

(29 May '15, 05:25) Volker Barth
Replies hidden

> This error is not received when calling C functions without an external environment

I assume that means you DO NOT have the problem when using the old-school SQL Anywhere external call interface in which the external function shares the memory of the server, but you DO have the problem with the fancy new SQL Anywhere external environment support.

There may be logic flaw in the C code that escapes notice when the C code shares memory with the server, but results in the symptom when the C environment is separate from the server's memory space.

This logic flaw may involve SQL commands that are sent back to the server from the C code... perhaps multiple commands are using separate threads when they don't have to, and/or multiple threads are not being released when they should be.

If you post the C code, perhaps someone can see the problem.

(29 May '15, 08:00) Breck Carter
Replies hidden

A valuable hint, IMVHO! Given that, there also may be a problem simply with the number of threads used, as the old-school external interface will require less worker threads, AFAIK...

(29 May '15, 08:04) Volker Barth

In this situation the engine I launched myself is running as an unknown engine name. There may be one or two events firing (which show up as connections). The external call makes no database calls. There continues to be a "-gn 100" on the launch line. We have logged deadlocks on this database before and I set the option described in http://scn.sap.com/community/sql-anywhere/blog/2014/05/21/what-exactly-is-thread-deadlock with no effect.

I can try disabling auto-MP but this does feel like (*) There is no cause for the engine to run out of threads, (*) The engine did not react as though it is running out of threads while stating it could react, and (*) The engine does not have any record that it ever ran out of threads. I did get a traceback(*) of the error but it was simply a reference to the external call.

(29 May '15, 12:16) Erik Anderson

You are correct that the "old-school" call interface is working. This was actually uncovered when attempting to test some new DLL modifications, I ended up having to launch a second engine to test them without crashing the production engine.

Your reaction is similar to mine. It would make sense if ProcessExplorer showed the DLL loaded into the memory space of the external environment or ProcessMonitor showed any attempt to load the file. I have not yet tested with a garbage filename (as far as I remember) but unless the main process does a file check I doubt it would behave any differently. I'm tempted to start sticking ProcessMonitor on the main process to look for something, but I'm unsure I'm ready for that firehose by now (and I'm rather badly out of my league if I'm trying to debug a database engine I don't have source for)

The C++ external processes do not create any threads of their own and do not issue SQL commands back to the database. I have no idea how the C# is implemented internally; I'm not issuing sql commands in there but calling a library known to create two IO threads and I am returning resultsets.

I just realized yesterday that this is affecting all our external function access. We have scheduled events failing as far back as our console logs go back. Often times the errors do not happen for 3-4 hrs after the engine is started and there seems to be one period for a couple days last month that no errors were reported. Currently it is happening immediately on engine startup.

(29 May '15, 12:26) Erik Anderson

I can try disabling auto-MP but this does feel like (*) [...]

I certainly agree on your points, I've just made that "wild" suggestion to check whether the strange problem may be anyhow due to the number of available threads, simply as an external environment will require its own worker threads...

(29 May '15, 12:45) Volker Barth

Is it possible for you to debug the external DLL, or will it not be called at all?

Cf. Karim's answer to that older FAQ.

(Note: I'm not asking you to debug the database engine:))

(29 May '15, 12:48) Volker Barth

I've had ProcDump, ProcExp. ProcMon, and DbgView attached to the external environment. It launches, reads some registry values, and then exits -- no attempt to access or open the DLL file, no obvious errors, no exceptions thrown or caught.

This issue was triggered by me trying to debug an update to our SMTP client library. I ended up loading it "old-school" and sticking ProcDump on the database engine. And yes, it did crash a couple times.

The regex library and the ZK client have been untouched for multiple years and neither one of them are functional at this point.

(29 May '15, 12:58) Erik Anderson

Just a further (and obvious) hint:

May you be able to test a fresh EBF?

E.g. a current fix was made for the following (which is not fitting your situation, but who knows?):

 ================(Build #4197  - Engineering Case #775809)================

    In rare, timing dependent cases, the server could have crashed when making 
    a native external function call if a thread deadlock error occurred. This 
    has been fixed.

I don't have any further hints or clues, as you will have noticed.

(29 May '15, 13:27) Volker Barth
showing 2 of 9 show all flat view
Be the first one to answer this question!
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×404
×22
×16
×5
×4

question asked: 28 May '15, 21:53

question was seen: 765 times

last updated: 29 May '15, 13:29