I am not looking to re-code my Java functions as Transact SQL ...

I only brought that code sample as a demonstration of the simplest Java code I've written and embedded in the database.

I have much more complicated code that cannot be rewritten ...

The crux of my question is in the overhead cost of using Java functions in version 12 (with an external JVM) versus version 9 (with an internal JVM).

How can I improve THAT ?

::Siger Matt Edit below - added code from previous question::

Just to keep the code from the other question on the same page:

CREATE FUNCTION "dbo"."GetNumerics"( in inputString varchar(500) ) 
returns varchar(500)
external name 'com.gemtrak.general.StandardFunctions.getNumerics (Ljava/lang/String;)Ljava/lang/String;' language java

package com.gemtrak.general;

public class StandardFunctions
{
    public static String getNumerics(String inputString)
    {
        StringBuffer input = new StringBuffer(inputString);
        StringBuffer output = new StringBuffer();
        int theLength = input.length();
        char c;

        for ( int i=0; i<theLength; ++i )
        {
            c = input.charAt(i);
            if ( c >= '0' && c <= '9' )
            {
                output.append(c);
            }
        }

        return output.toString();
    }
}

There is probably an END statement that is missing, but it was missing in the original, this is just copy/paste.

Called by:

select dbo.getnumerics(isnull(telephone1,'')) t1, * into #x from locinfo where loctype = 'C'

asked 18 Jan '12, 14:56

Frum%20Dude's gravatar image

Frum Dude
136339
accept rate: 0%

edited 20 Jan '12, 12:28

Siger%20Matt's gravatar image

Siger Matt
3.2k526998

For the record (in case anyone was wondering) the Java VM went from internal to external with SQL Anywhere 10.

(18 Jan '12, 16:50) Breck Carter
Replies hidden
1

Correct... and I suspect that is the cause of the performance difference - if the Java code does lots of calls back to the database/server to get/put data then the performance is not going to be as good when the Java VM is running externally since the cost of the call (to the server) is much greater.

(18 Jan '12, 19:11) Mark Culp
2

Actually, my test case (if you look at my earlier example) is simply extracting numeric digits from a string .. and the performance is 10 times slower... and that's on a server which is 30-40% faster than the original SQL 9 box! How can Sybase allow that to happen?

(18 Jan '12, 19:48) Frum Dude
1

10 times slower... A while back, I remember reading about a policy at Sybase which stated that no aspect of sqla performance would decline with a new release--performance would always be improved. Why was the jvm moved outside the database? Why would that slow performance? After all, a call to a java method is a call to a memory location, whether the jvm is considered to be internal or external to the database. true?

(19 Jan '12, 08:58) dejstone

I would venture to say that the main reason to use an EXTERNAL JVM allows flexibility in that as Java evolves, the database server can take immediate advantage of new Java features ... just like the difference between component stereos versus 'single box' solutions... But certainly that kind of performance hit is not acceptable for that flexibility!

(19 Jan '12, 09:49) Frum Dude
Replies hidden

Please show us the exact code that you ran to show the performance change... the Java plus the calling SQL code as well... thanks!

(I know you posted the Java code earlier but that's somewhere else...)

(19 Jan '12, 10:57) Breck Carter
4

EXTERNAL JVM allows flexibility in that as Java evolves

There are other reasons for this move: 32-bit database servers (still popular when 10.0.0 was released...) have a maximum of 4 GB of process space (usually closer to ~3.8GB). On Windows, they actually get 2GB of process space (unless the /3GB switch is specified...) - the 4GB total is split between kernel space (2GB) and user process space (2GB).

Forcing the JVM load in-process reduces the amount of memory further for the database cache (down from ~1.8 GB on 32-bit Windows), which can increase the temporary file usage (thus slowing performance of the database server, overall).


We would be interested in seeing your two performance tests (version 9 and version 12), but to also know more about the environment you're currently testing in: Which operating System is it currently running on? Is this a 32-bit or 64-bit environment (OS, database server, JVM)? How much memory does the system have, overall? Have you tried monitoring resources such as CPU / Memory / Disk via an "OS Performance monitor application", during your test so that you can see if there are obvious hardware bottlenecks that you are hitting in your test? Are you supplying any JVM parameters (such as -Xmx) to adjust the size of the JVM memory when the JVM is being launched?

(19 Jan '12, 12:41) Jeff Albion

...apparently, guesswork is alive and well as evidenced by all the replies made in the absence of any knowledge about how the Java function is called (from a SET statement in a loop, versus a single SELECT list, versus a WHERE clause, versus a subquery, versus...)

Just as a Vegas slot machine eventually rewards the player, this forum will eventually provide your answer!

(20 Jan '12, 09:24) Breck Carter

Given Siger's guesswork does fit (as I would guess, too), we would still have to know how many times the function is called in this sample, i.e. how many rows do fulfill the WHERE clause...

(20 Jan '12, 12:34) Volker Barth
showing 4 of 9 show all flat view

There were a number of very compelling reasons for us to move to a de-coupled Java VM beginning with SQL Anywhere version 10, some of which have been discussed above. To those, I would add that the reasons included the ability to run any version of the SUN (now Oracle) JRE to suit the needs of the application, and that, in the new architecture, any issue with the Java VM itself or with the Java function would not bring down or hang the server (or, minimally, a worker thread). Going to a de-coupled model, yet still supporting JDBC calls from the Java function back into the server, required us to develop some fairly sophisticated infrastructure inside the server kernel to support that; these identical mechanisms are used for CLR and other language environments as well. For Java procedures, the interface between the JVM and the server is JDBC and JDBC is a very heavyweight interface compared to the in-process, custom-made JRE we had built in ASA Version 6.

So the creation and deletion of that inter-process infrastructure is certainly more expensive than when the Java VM was inside the database server, there's absolutely no question about that. Indeed, the example Java program posted above is a perfect example to illustrate the expense of that overhead, as it would if similar logic was embedded in an SQL user-defined function (and even though the SQL UDF is actually executed "in process"); see my post on that subject for the gory details.

As Jeff mentioned, there may be some configuration settings that you can employ to make the performance marginally better. However, I would not expect hugely significant performance gains from doing so.

permanent link

answered 19 Jan '12, 14:43

Glenn%20Paulley's gravatar image

Glenn Paulley
10.7k571104
accept rate: 43%

...as it would if similar logic was embedded in an SQL user-defined function...

Are you saying the overhead of a SQL UDF is - in general - somewhat similar to that of an external Java call? (That would come as a surprise to me.)

Or does it just mean any UDF is "impossible to optimize" and will therefore have bad influence on the query optimizer's choice of optimal plans - particularly when the UDF/function call is part of a SELECT list or WHERE clause and gets called multiple times per query?

(20 Jan '12, 11:47) Volker Barth
Replies hidden

Both. The cost of invoking a SQL user-defined function is orders of magnitude greater than the cost of, for example, reading a row off of page of disk during a sequential scan. That's what my 2008 blog post meant to illustrate. The optimization point is valid, too - it is (virtually) impossible to determine the cost/selectivity of a predicate involving a user-defined function. The server tries to estimate it by memoizing statistics from prior executions, but significant variance in the function's parameters, or the function's execution, have the potential to significantly impact the real cost of the chosen execution plan.

(20 Jan '12, 14:23) Glenn Paulley

Ah, I see - and it seems very understandable that calling a function is much more overhead than reading a row.

However, I still would think that a SQL UDF (and stored procedure) should be significantly "cheaper" to call than any external function/procedure... At least that has been my experience over the years, and therefore I would generally prefer to write SQL functions over external ones if the SQL language features allow for that....

(23 Jan '12, 03:29) Volker Barth

I will have to disagree on the point "I would generally prefer to write SQL functions over external ones if the SQL language features allow for that...." ... The power and elegance of Java over Transact-SQL is such that I certainly prefer to write in Java ... and it's such a shame that we have been essentially locked out of writing a sizable amount of Java due to this major performance degradation ... I think that Sybase should take a hard look at how to get the performance to a 20-30% degradation and NOT a 90% degradation as I've experienced ...

(26 Jan '12, 08:47) Frum Dude
Replies hidden

Well, I was talking about that programmer-friendly Watcom-SQL, not the Transact-SQL dialect:) - and about code that can be handled with SQL in a reasonable and maintainable way (so, perhaps not too much string manipulation and the like...)

For the Java performance, I certainly see your point but that's not my playground...

(26 Jan '12, 09:42) Volker Barth
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×415
×249
×65

question asked: 18 Jan '12, 14:56

question was seen: 1,934 times

last updated: 26 Jan '12, 09:44