Folks,

We have an intermittent hang when starting dbeng12 via the capi on OS X. It is not 100% repeatable, but it affects our automated integration tests fairly frequently.

It does not happen with the same tests running on Windows.

Is there any way to get any diagnostic information out of the dbeng12 process to try and narrow down why it is hanging?

Cheers, Dan

asked 27 Sep '12, 01:28

Dan Cleyne

edited 22 Apr '13, 13:15

Jeff Albion

When you start the process, do you include a console output file (dbeng12 -o console.txt)? If so, does the file get created at all, or does the process just fail to start entirely? If the file is created, what is reported before it "hangs"? If it doesn't get created, can you see the process attempting to start in "Activity Monitor"?

If we can't figure out the problem from that output, we can work directly with you to capture a core file of the process and bring it in for analysis to see if this would help explain what's happening inside the server: http://www.sybase.com/contactus/support/

(27 Sep '12, 12:46) Jeff Albion

Thanks Jeff,

Because we are starting the server via the capi we can't specify the -o argument to the executable. Following your suggestion though we've added the LOG=<file> to the connection string... and since then it hasn't hung... So I'm going to monitor it for now until I get another hang and then I'll see what is in the log output from dbeng12...

Cheers, Dan

(27 Sep '12, 21:00) Dan Cleyne

So, we finally seem to have reached the bottom of this issue and I thought I'd document it in case anyone else ran into it.

Basically, the issue is that the dynamic library load call we were using has some undesirable behaviour in the debugger when used with a particular flag. In the samples provided with the dbcapi the load library call is as follows:

handle = dlopen(name, RTLD_LAZY);

This is the recommended way of calling this function; however, there are some potential issues with debugging, as documented here: http://tldp.org/HOWTO/Program-Library-HOWTO/dl-libraries.html

So we changed the code to:

handle = dlopen(name, RTLD_NOW);

which gets us past the problems documented above.
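For anyone wanting to try the same change, here is a minimal sketch of the load call with error reporting. The wrapper name LoadLibraryNow is hypothetical and is not part of the dbcapi samples; only dlopen, dlerror and RTLD_NOW are real. With RTLD_NOW, all undefined symbols are resolved before dlopen returns, so a resolution problem should surface as a load failure here rather than later at the first call.

```cpp
#include <dlfcn.h>   // dlopen, dlerror
#include <cstdio>    // std::fprintf

// Hypothetical helper (not from the dbcapi samples): load a shared library
// with every symbol resolved up front instead of lazily on first use.
// Returns the dlopen handle, or nullptr on failure after printing the error.
void* LoadLibraryNow(const char* name)
{
    dlerror();  // clear any stale error state before the call
    void* handle = dlopen(name, RTLD_NOW);
    if (handle == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlopen(%s) failed: %s\n",
                     name ? name : "(main program)",
                     err ? err : "unknown error");
    }
    return handle;
}
```

The application would pass whatever library name it already passes to dlopen; nothing else about the call changes.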

Thanks to all for help/suggestions.

Cheers, Dan

answered 03 Dec '12, 20:53

Dan Cleyne

Ok, finally got one to hang, and unfortunately there was no log file created.

The process is stopped on: (marked with **)

libsystem_kernel.dylib`__wait4:
0x7fff8e46214c:  movl   $33554439, %eax
0x7fff8e462151:  movq   %rcx, %r10
0x7fff8e462154:  syscall
**0x7fff8e462156:  jae    0x7fff8e46215d            ; __wait4 + 17**
0x7fff8e462158:  jmpq   3743
0x7fff8e46215d:  ret    
0x7fff8e46215e:  nop    
0x7fff8e46215f:  nop

The assembly code that is running at the point it hangs is:

0x1079b9322:  jne    0x1079b9338               ; sqlany_new_connection_ex + 1416
0x1079b9324:  movq   8(%rdi), %rdi
0x1079b9328:  callq  0x1079becfd               ; symbol stub for: db_init
0x1079b932d:  testl  %eax, %eax
0x1079b932f:  je     0x1079b93a0               ; sqlany_new_connection_ex + 1520
0x1079b9331:  movl   $1, 28(%rbx)
0x1079b9338:  movq   (%rbx), %rax
0x1079b933b:  movq   144(%rax), %rdx
0x1079b9342:  movq   8(%rbx), %rdi
0x1079b9346:  xorl   %esi, %esi
0x1079b9348:  callq  0x1079bed03               ; symbol stub for: db_set_property
0x1079b934d:  movq   8(%rbx), %rdi
0x1079b9351:  movq   %r12, %rsi
0x1079b9354:  callq  0x1079bed09               ; symbol stub for: db_string_connect
**0x1079b9359:  movq   8(%rbx), %rdi**
0x1079b935d:  movl   12(%rdi), %eax
0x1079b9360:  movl   %eax, 32(%rbx)
0x1079b9363:  leaq   36(%rbx), %rsi
0x1079b9367:  movl   $256, %edx
0x1079b936c:  callq  0x1079bede7               ; symbol stub for: sqlerror_message

Looking in "Activity Monitor" I can see a dbeng12 process.

answered 28 Sep '12, 03:06

Dan Cleyne

I should add that we're using SQL Anywhere 12.0.1 for OS X.

(28 Sep '12, 03:30) Dan Cleyne

It seems like you were having some troubles passing in the "-o" parameter initially to the engine, but may have found a way around that. Is anything reported in the "-o" output from the database server when this happens?

(28 Sep '12, 13:44) Jeff Albion

No, the log file is never created.

(30 Sep '12, 18:35) Dan Cleyne

We managed to get this stack trace out of gdb:

0 0x00007fff8ba3f154 in __wait4 ()
1 0x000000010a531fdb in Java_com_sybase_asa_logon_ASAConnect_findServers ()
2 0x000000010a52cbb3 in Java_com_sybase_asa_logon_ASAConnect_findServers ()
3 0x000000010a51585e in Java_com_sybase_asa_logon_ASAConnect_findServers ()
4 0x000000010a516162 in Java_com_sybase_asa_logon_ASAConnect_findServers ()
5 0x000000010a5165fa in Java_com_sybase_asa_logon_ASAConnect_findServers ()
6 0x000000010a504532 in sqlerror_message ()
7 0x000000010a50487b in sqlerror_message ()
8 0x000000010a33b359 in sqlany_new_connection_ex ()

(01 Oct '12, 03:58) Dan Cleyne

Can you show us the connection code you're using and how you're attempting to start the database server? Does it start okay outside of the capi interface?

And I assume this is from a client stack trace in gdb? This stack trace suggests the client is trying to find a server, but can't.

(01 Oct '12, 12:01) Jeff Albion

I'm replying here to get some better text formatting... I can't figure out how to do the formatting in the comments

I should clarify a few things here. Firstly, this hang only manifests itself in integration testing, either in a debugger on a local machine or on the build server. It is particularly frequent in the debugger using Xcode 4.4. Outside of those environments it all works fine.

The code we use to start and connect to the server is as follows:

{
    _Logger->LogDebug( "Opening database with connection string: " + GetConnectionString() );

    // create a new sqlany connection object, and connect to our configured database.
    _Connection = _Api.sqlany_new_connection();

    if ( !_Api.sqlany_connect(_Connection, GetConnectionString().c_str() ) )
    {
        char err_msg [512];
        _Api.sqlany_error(_Connection, err_msg, sizeof(err_msg) );
        std::stringstream err;
        err << "sqlanywhere err: " << err_msg << std::endl;
        err << "when attempting to open: " << GetConnectionString();

        _Logger->LogError( err.str() );

        /* failed to connect */
        CatchError("Failed to connect");

        /* SQL Anywhere's API requires us to go through the full
         * disconnection process even if a connection attempt failed.
         */
        Close();
        return false;
    }
    return true;
}

An example of a connection string that we're using to start the server and connect to a database is as follows:

ENG=vartmptmp2792eUqhK;uid=DBA;pwd=sql;dbf=/var/tmp/tmp.279.2eUqhK;START=dbeng12 -ga -qi -n vartmptmp2792eUqhK -o /var/tmp/tmp.279.2eUqhK.log.txt

That connection string is for an integration test, which is why the database and server have a semi-random name to stop file and db server conflicts on the build server.
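To show how a string like the one above can be put together, here is a rough sketch that builds it from a single server name, so the ENG value and the -n switch cannot drift apart. BuildConnectionString is a hypothetical helper, not our actual GetConnectionString(), and the client-side LOG= file name is an assumption added for illustration. It also assumes the paths contain no spaces or semicolons.

```cpp
#include <string>

// Hypothetical helper: builds an autostart connection string for dbeng12.
// The server name is written once into ENG= and once into -n from the same
// variable, so the two values are always identical.
std::string BuildConnectionString(const std::string& serverName,
                                  const std::string& dbFile)
{
    // Server console log, passed to the engine via -o (as in the string above).
    const std::string serverLog = dbFile + ".log.txt";
    // Client-side connection log via LOG=; this file name is an assumption.
    const std::string clientLog = dbFile + ".client.log.txt";

    return "ENG=" + serverName +
           ";uid=DBA;pwd=sql"
           ";dbf=" + dbFile +
           ";LOG=" + clientLog +
           ";START=dbeng12 -ga -qi -n " + serverName +
           " -o " + serverLog;
}
```

Keeping the client log and the server log in separate files means the two processes never write to the same file.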

answered 01 Oct '12, 19:59

Dan Cleyne

I'd suggest adding the LOG=filename connection parameter to get some diagnostic output on the apparently failing attempts to connect to the engine. You could also add -z to the engine command line.

As Jeff has asked: Does the engine named vartmptmp2792eUqhK really start here? (Well, if not, -z won't help any further...)

FWIW, specifying both ENG and the -n in the START connection parameter is somewhat error-prone: if the two are not identical, you may start a different engine than the one you're trying to connect to... cf. Graeme's explanation here. Yes, I'm aware that you have set both to the same value; I'm just trying to give a hint.

(02 Oct '12, 03:29) Volker Barth

We've tried the LOG= connection parameter and it doesn't appear to produce a log file either. The way the code is written guarantees that the ENG and -n parameters are the same. However, I wonder if this could be part of the problem. If we have a race condition somewhere in thread startup, we might indeed end up with a server that we can't connect to despite the names being the same.

From memory, we tried the -n without the ENG= parameter and it wouldn't connect. According to that link you posted, it seems we tried it the wrong way round. If we specified ENG= and left the -n off, would we connect to the named server?

I also read somewhere that ENG= was deprecated. Is SERVER= the correct replacement?

Also of note is that if the application is compiled with gcc we don't seem to get this issue (well, it hasn't manifested yet). It seems to be under clang that we get it consistently. Sadly we can't switch to gcc because then our Objective-C code doesn't compile.

I'll give the LOG= and the -z another whirl.

Cheers, Dan

(02 Oct '12, 03:51) Dan Cleyne

...and with this connection string:

Server=vartmptmp2rYWlPv;uid=DBA;pwd=sql;dbf=/var/tmp/tmp.2.rYWlPv;LOG=/var/tmp/tmp.2.rYWlPv.log.txt;START=dbeng12 -ga -qi -z

I get this log file:

Tue Oct 02 2012 18:29:55
18:29:55 Attempting to connect using: UID=DBA;PWD=**;DBF=/var/tmp/tmp.2.rYWlPv;ServerName=vartmptmp2rYWlPv;START='dbeng12 -ga -qi -z';LOG=/var/tmp/tmp.2.rYWlPv.log.txt
18:29:55 Attempting to connect to a running server...
18:29:55 Attempting SharedMemory connection (no sasrv.ini cached address)
18:29:55 Failed to connect over SharedMemory
18:29:55 No server found, attempting to run START line...

...and the process has hung. There is a dbeng12 in the task list with the correct server name.

(02 Oct '12, 04:40) Dan Cleyne

Is this expected to work with shared memory?

Does it work if you connect via TCP/IP? (Add "LINKS=TCPIP" to the connection string and the "-x TCPIP" to the START command)?

(02 Oct '12, 04:48) Volker Barth
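The TCP/IP variant suggested above could be derived from an existing connection string along these lines. This is only a sketch: WithTcpip is a hypothetical helper, and it assumes that START=, when present, is the last parameter in the string, so the "-x TCPIP" switch can simply be appended to the end.

```cpp
#include <string>

// Hypothetical helper: returns a TCP/IP variant of a connection string.
// Assumption: if START= is present, it is the LAST parameter, so appending
// to the end of the string appends to the server command line.
std::string WithTcpip(const std::string& connStr)
{
    // Ask the client to connect over TCP/IP.
    std::string result = "LINKS=TCPIP;" + connStr;

    // If the string autostarts a server, tell that server to listen on TCP/IP.
    if (connStr.find("START=") != std::string::npos) {
        result += " -x TCPIP";
    }
    return result;
}
```

Running the same test once with the original string and once with the converted one would show whether the hang is specific to shared memory.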

We want it to work with shared memory. Most of the time, this does work exactly as we expect it to. This 'hang' only happens during integration testing and in the Xcode debugger (quite frequently in 4.4, not so much in 4.3). We really don't want to use TCP/IP because we're deploying SA12 as an embedded DB.

In the case of the log line above, I would expect it NOT to find the server because we've deliberately given it a unique name for this session only. So, that part I'm not worried about. The worry is that when it gets to the "attempting to run START line"... it never comes back...

(02 Oct '12, 05:04) Dan Cleyne

My read of the above information is that the database server process is starting up ("dbeng12 in the task list"), but the "-o" log is never created, meaning there is something happening on server start-up that we haven't been able to capture (particularly if the server normally starts okay).

Running a "dtruss -f" of the process using the SQL Anywhere C API, or capturing a core file of the engine process that started would be the next step. If you haven't already, I'd highly recommend opening a technical support case so that we can help you go over this information directly.

(06 Nov '12, 13:29) Jeff Albion