We are trying to upgrade our commercial application from SA9 to SA12. But there is serious performance problem. We are using query like: SELECT name, getPrice(productId, _priceType_) FROM products ORDER BY name priceType is provided by the application. The table is indexed by name. There is over 100k products. Function is evaluated in ~1ms, uses few other tables. In SA9 query was performed in <1s, the function was called only for fetched rows. User can see first screen of product list in 1s. In SA12 the function is called for each row in the table, and next during fetch again. So the query is executed in 2 minutes. User has to wait 2 minutes to view product list. This is independent of Optimization_goal option. Even for value-sensitive cursor type, where evaluating function for each row before fetch makes no sense, server performs evaluation. Is there any way to force SA12 to behave like SA9? I know we can found workaround for this case, but there is over 100 of such queries in the application. It is significant cost to change everything. |
In version 10.0, expression evaluation in SQLA changed to be generally more efficient. A consequence is that in some cases expressions are evaluated before they are needed -- this reduces the overhead needed to track which expressions are currently valid. For the case of absolute and relative fetches, the changes that generally improve performance cause a degradation as they lead to evaluations of potentially expensive UDF or subquery expressions that are not retrieved by the client. In the case that the Row_counts option is on, an absolute fetch is used to count the rows in the cursor unless the query optimizer can determine a good estimate of the number of rows. For example, "SELECT * FROM T" can usually estimate the number of rows returned based on the number of rows in T. Various features of the query (predicates in the WHERE clause, use of computed columns or joins) may affect whether the estimate is considered sufficiently accurate to avoid this fetch. In order to improve the performance in these cases, a server change has been made to avoid evaluating select expressions for rows skipped by a relative or absolute fetch (including those for the purpose of the Row_counts option). The change is expected to be available in a future SP with build number 4089 or higher (version 12) or 1858 or higher (version 16). The associated release notes may reference engineering case # 760144. Please see also this same answer posted to a related question. |
Glenn Paulley on UDFs (until recently he was head of the SQL Anywhere query engine development team): http://glennpaulley.ca/conestoga/2014/02/set-level-operations-do-matter/ |
What is your optimization goal setting set to? i.e. what is output of select connection_property( 'optimization_goal' ); If it is set to ALL-ROWS then the query engine may choose to materialize the entire result set before returning the first row back to the client. Try setting the optimization_goal to FIRST-ROW. The OP did say "This is independent of Optimization_goal option."
(10 Feb '14, 13:30)
Breck Carter
Replies hidden
Ok, missed that sentence. Ignore my answer.... it was a long shot guess ;-)
(10 Feb '14, 13:40)
Mark Culp
2
Are there any cursor gurus lurking about? ...this question cries out for an Alpha Plus to answer it :)
(11 Feb '14, 13:40)
Breck Carter
That's an attempt to "prefetch" those query processing gurus, isn't it? Can't tell whether it has been successful by now:(
(12 Feb '14, 11:50)
Volker Barth
|
Your overall symptoms (but not necessarily all the details) aren't too dis-similar form some of the difficulties we have been having in migrating from v10.0.1 to v16.0.0. Certainly they have revolved around unnecessary repeated execution of UDFs that doesn't happen in v10.0.1. We have been seeing steady incremental improvements with these issues over the last year and so I'd strongly recommend trying your application on a recent version of 16.0.0. I'm afraid I can't speak for v12 as while we encountered the same problems in v12.0.1 some time ago, we haven't been testing against any updates since v16 came out. |
I've found the reason of this strange behavior. Well, and that has not happened with SA9?
(27 Feb '14, 17:19)
Volker Barth
Comment Text Removed
Comment Text Removed
Comment Text Removed
Oops, I just accidentally deleted ODL_Sybase's comment when trying to delete my own! Sorry! ODL_Sybase's comment talked about how the symptom has returned, and it required ROW_COUNTS to be on to see the bad behavior.
(04 Mar '14, 16:36)
Breck Carter
ROW_COUNTS has a place of honor in the 2008 article How To Make SQL Anywhere Slow, and it has always been a bad idea, in SQL Anywhere 16, in ASA 9 (see below), even in SQL Anywhere 5 (see below). ===== ASA 9 ===== ASA Database Administration Guide Database Options Alphabetical list of options ROW_COUNTS option [database] Function Specifies whether the database will always count the number of rows in a query when it is opened. Allowed values ON, OFF Scope Can be set for an individual connection or for the PUBLIC group. Takes effect immediately. Default OFF Description If this option is set to OFF, the row count is usually only an estimate. If this option is set to ON, the row count is always accurate. Warning When ROW_COUNTS is set to ON, it may take significantly longer to execute queries. In fact, it will usually cause Adaptive Server Anywhere to execute the query twice, doubling the execution time. ===== SQL Anywhere 5 ===== ROW_COUNTS Specifies whether the database will always count the number of rows in a query when it is opened. If this option is off, the row count, the row count is usually only an estimate. If this option is on, the row count is always accurate but opening queries may take significantly longer.
(04 Mar '14, 16:37)
Breck Carter
The question "How do I [satisfy some requirement] without turning ROW_COUNTS on?" has been asked in the past... and it could be asked again if you don't want to search.
(04 Mar '14, 16:40)
Breck Carter
Replies hidden
Documentation say about "row_counts":
(04 Mar '14, 16:42)
ODL_Sybase
Yes, we probably can manually execute "select count(*)" or "select count(distinct ...)" when GROUP BY used.
(04 Mar '14, 16:49)
ODL_Sybase
By the way: cannot find this question. Can you post a link?
(04 Mar '14, 16:55)
ODL_Sybase
That's still a mystery to me, too - as you state, the bad performance has to do with computed columns, and computed columns should not be re-calculated just because they are queried, cf. this doc topic because that would make them simply superfluous...
(05 Mar '14, 03:23)
Volker Barth
Please post this as a new question; this thread is becoming unmanageable :)
(05 Mar '14, 09:03)
Breck Carter
Continued in http://sqlanywhere-forum.sap.com/questions/20587/sa12-performance-when-row_counts-option-turned-on
(05 Mar '14, 14:36)
ODL_Sybase
More comments hidden
|
Are you sure all 100 queries experience the same degradation of performance, or are you guessing at that?
User-defined functions have always been dangerous to performance, requiring care in usage, and this has increased over time. As a result function performance is sometimes a big problem when moving from version to version. The best approach is to look at the graphical plan with statistics to see what is going on. It may be possible to make changes (e.g., materialized views) that don't involve rewriting every query.
No, I wasn't checking all queries. But all tested work as described. The plan is very simple. If "order by" is by indexed column this is simple "scan table using index". Query inside the function is listed as subquery. Still I wonder: why evaluation of UDF is performed for each table row for value-sensitive cursor? The results cannot be used during fetch, so it is wasting time. Even if I turn off prefetching and reduce size of fetch buffers, still evaluation is performed. First version of our application was using ASA 5, and UDF was always evaluated during fetch, unless used for sorting or grouping.
FWIW: "Rows cannot be prefetched for value-sensitive cursors. This requirement may affect performance." http://dcx.sybase.com/index.html#1201/en/dbprogramming/keyset-cursor-sqlapp.html
What do you mean by "The results cannot be used during fetch"?
The rules for value-sensitive cursors may have (probably have?) changed over time, and they are VERY complex IMO: http://dcx.sybase.com/index.html#1201/en/dbprogramming/keyset-cursor-sqlapp.html
In particular, it seems that any changes made since the cursor was opened must be detected: "If a row was updated or may have been updated since the cursor was opened, SQL Anywhere returns a SQLE_ROW_UPDATED_WARNING when the row is fetched." ...that implies (to me, at least) the entire result set must be evaluated before the first fetch.
Perhaps someone who knows the internal workings can chime in :)
How simple is the function? Does it contain a single expression setting the return value?
Yes, you're right. Old values for value-sensitive cursor can be useful.
I was testing other cursor types too, but I will check again descriptions, maybe I can find any clues.
After check: sensitive cursor should be the best. But no success.
I was trying with DYNAMIC SCROLL cursor with DMRF=YES.
6 x if, 3 x select. 3 procedure calls (very simple procedure). Some assignments and single exit point (return local variable).
I guess Elmi asked whether the UDF may be inlined - and that seems not possible because of the fact that is does comprise much more than a single SELECT INTO or RETURN statement...
FWIW, here's a link to a detailed whitepaper by Ani Nica called Query Processing Based on SQL Anywhere 12.0.1 Architecture - it has some information on UDFs, too.
Yes, that is what I meant, Volker, and you are right that a UDF with an IF statement would not be inlined. I'll need to do some more testing.
Another wild guess: I would assume that this UDF is a deterministic one. Does it make a difference when you change it to "NOT DETERMINISTIC" (or vice versa)? - It may have an impact whether the function's return values will be cached internally or not (though in my understanding, this caching would take place "on the fly" when processing each separate function call and not by an intial run to materialize the whole request...)
FWIW: Sensitive (aka DYNAMIC SCROLL) cursors should not prefetch with the default setting of the PREFETCH option ("conditional"), so specifying DMRF=YES should not be necessary by default.
When using sensitive cursors, are you sure the database server could apply them as desired (i.e. there was no warning that a different cursor type than requested was used, I guess with SQLCODE 121 SQLE_CURSOR_OPTIONS_CHANGED)?
I'm asking as I agree on your understanding that a sensitive cursor should not allow the evaluation of all rows before any row is returned... - still puzzled...
I was checking both. Behavior is same
The function (as a matter of fact called procedure) can have side effects, there is CREATE VARIABLE statement. Actually not in the execution path.
Can it cause server to mark function as not deterministic, against explicit declaration?
There is no warning.
I would assume that the warning will only be displayed for already (pre-)fetched rows (so the cursor may internally buffer the previous row contents), thus not requiring a pre-evaluation of the whole result set. At least that's what the cited warning SQLE_ROW_UPDATED_WARNING (SQLCODE 104) does tell:
Do you mean the function is called twice for each row in the result set (once initially in the discussed undesired "full evaluation" just before the first fetch, and a second time during the real fetch of the according row)?
Can you add a MESSAGE statement within the function (say, noting the according productId) to document the funcion call sequence?
Yes, there are 2 passes, each in order defined in ORDER BY.
That means by index, but not by primary key
What API are you using? Could it be the query is executed twice, and the first run is just needed to describe the result set?
(Note: That's wild wild wild guessing... - just as DBISQL seems to perform a second query request to describe the needed result set... - I do not claim at all that this would apply here in any way)
For testing purposes I have prepared SQL procedure. In the procedure there is FOR statement. This is the easiest way to control cursor type and fetching.
In the production environment embedded SQL is used.
There is no difference.
Simple call in DBISQL works similarly. But I will try to investigate this a bit. Maybe this is really Describe? But: if so, how to avoid it?
Have you tried testing with dbisqlc instead of DBISQL? (I don't claim it will make a difference here, though AFAIK it tends to have less overhead...)
No difference
Description and preparation is rather fast. Execution itself is time consuming.
BTW: What isolation level are you using (as that has implications for cursors)?
Do you note differences in locking (via sa_locks or the like) between v9 and v12?