
Hi

I'm trying to optimize some reports on SQL Anywhere 12.0.1.3324. My test system is relatively powerful for SQL Anywhere - 24 CPU cores with 24 GB RAM. One of the main problems is that SQL Anywhere doesn't utilize the hardware resources, or does so in an inconsistent way. I have found that the same query may use parallelism (a Parallel Table Scan, for example) on one execution and not use it on the next. Note that I'm the only user in the system, there are no concurrent queries and the data is not changed at all. So my questions are:

  1. Is there any way to force or hint the optimizer to choose a parallel execution plan, so that the available hardware is utilized and the response time decreases?
  2. Is it possible to make a HASH GROUP BY run in parallel?
  3. How can I know why the optimizer chooses or does not choose parallel execution? What factors affect the optimizer's choice?

I have more questions about SQLA intra-query parallelism, but I don't want to ask too many questions in a single post. I can upload query plans if required.

Thanks in advance
Leonid Gvirtz
http://www.gvirtz-consulting.com

asked 25 Jan '12, 11:50


Leonid Gvirtz

A HASH GROUP BY can be done in parallel. However, the optimizer will weigh the benefits to determine whether the overhead of coordinating a parallel effort outweighs the time saved compared to a serial strategy.

A brief introduction to parallelism is available at: http://dcx.sybase.com/index.html#1201/en/dbusage/parallelism.html

Otherwise, you might post a graphical plan and see if someone can explain some of the optimizer's decisions.
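For reference, one way to capture such a plan is the GRAPHICAL_PLAN() function (a sketch - the query text is just a placeholder, and the exact statistics-level values should be checked against the docs):

    -- Returns the graphical plan as XML; a statistics level above 0
    -- executes the statement and collects actual execution statistics.
    SELECT GRAPHICAL_PLAN(
        'SELECT COUNT(*) FROM SomeTable WHERE SomeColumn = 4',  -- placeholder query
        2                                                       -- assumed: detailed statistics
    );

The Plan Viewer in dbisql can also save the same information as a .saplan file.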

(25 Jan '12, 14:23) Tyson Lewis

Some further hints:

(26 Jan '12, 03:28) Volker Barth

Thanks to Volker, now I'm able to upload the query plans:

join_6.saplan - refers to the first pair of tables. The execution is serial both before and after the restart of SQLA.

join_7_before_restart.saplan - the (partly) parallel execution plan for the second pair of tables before the restart of SQLA.

join_7_after_restart.saplan - the same query after the restart - the execution became fully serial and much slower.

(30 Jan '12, 05:40) Leonid Gvirtz

FWIW, the "thanks" should go to the forum community - your newly gained reputation points come from several users, of which I might be one ... I just wanted to inform you about that now-overcome limitation :)

(30 Jan '12, 06:53) Volker Barth

I'm new to this kind of forum, still learning :)

(30 Jan '12, 07:24) Leonid Gvirtz

Tried to upload the query plans - but received a message that uploads are available only to users with at least 100 reputation points. I see that I have only 51 :(.


answered 28 Jan '12, 02:51


Leonid Gvirtz


141 points - now you might try again :)

(29 Jan '12, 11:33) Volker Barth

Regarding 3.: The optimizer considers the current workload of the database server and will not use parallel execution if the workload is high. So the use of a parallel strategy for a query is non-deterministic.
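For example, the relevant load information and the per-connection cap on parallelism can be inspected like this (a sketch; as far as I know there is no hint to force a parallel plan - the max_query_tasks option only caps what the optimizer may use, it does not force parallelism):

    -- Server-wide CPU and load information
    SELECT PROPERTY( 'NumLogicalProcessorsUsed' ) AS cpus_used,
           PROPERTY( 'ActiveReq' )                AS active_requests,
           PROPERTY( 'UnschReq' )                 AS unscheduled_requests,
           PROPERTY( 'MultiProgrammingLevel' )    AS worker_pool_size;

    -- Per-connection cap on intra-query parallelism
    -- (0 = no explicit limit, 1 = intra-query parallelism disabled)
    SELECT CONNECTION_PROPERTY( 'max_query_tasks' ) AS max_query_tasks;
    SET TEMPORARY OPTION max_query_tasks = 0;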


answered 26 Jan '12, 04:07


Martin

edited 26 Jan '12, 06:54


Volker Barth


I agree with you that intra-query parallelism is non-deterministic, as it should be. The question is how non-deterministic it is, and at which point non-determinism becomes instability. On my test systems I'm the only user and the only connection to the database; the test servers (the strong system mentioned above and my laptop) are dedicated to SQL Anywhere, so the load from other processes is quite stable (zero). Also, I perform all my tests a number of times, so the effect of physical reads etc. should not play a role here.

So far, I have found that I can never know for sure whether a parallel strategy will be used by the optimizer or not, even though all relevant factors, at least those known to me, remain constant. One of the latest problematic scenarios:

I have two pairs of almost identical tables. The structure (columns, indexes, etc.) is exactly the same and the data is only slightly different. Actually, the data represents measurements of some kind from different days. The same query ran in parallel for one pair of tables and serially for the other. This behavior was quite consistent. I didn't manage to understand from the query plans why there is such a difference in the processing of such similar tables. The difference in response time is quite significant.

Then the SQLA server was restarted, with no relation to my tests. Suddenly, I discovered that the query that used to run in parallel now runs serially - and quite consistently! So I'm looking for a way to understand why. I'm going to try to attach the query plans shortly.

Of course, I understand that SQLA is probably not the best product for a data warehouse and Sybase IQ would most probably do this job much better. But that's what my customer has, and they want to try to get the most out of SQLA before considering other technologies. They have had a very positive experience with SQLA so far, by the way.

Thanks for your help

(28 Jan '12, 02:45) Leonid Gvirtz

The optimizer's cost model takes into account a variety of factors that include:

  • the number of concurrently executing requests
  • the number of available CPU cores at optimization time
  • the number of worker threads available
  • the estimated I/O and CPU cost

to determine the best cost-based strategy for any parallelizable query.

For the three query plans you included, all of the plans differ in the amount of table and/or index data that is in cache. With the plan "join_7_before_restart", all index and table pages are nearly 100% resident in the buffer pool at optimization time; the availability of idle CPU cores and of worker threads (from the current pool of 50) leads to the choice of a parallel plan.

The two other plans have significant portions of the data (either table or index) not resident in the buffer pool. This causes the optimizer to try to determine the most cost-efficient strategy for reading that much data; in the case of join_6, it's an index scan of the SU table, whereas in join_7_after_restart it's a sequential scan.

You may see quite different behaviour if you calibrate your disk subsystem for parallel reads, which should improve the optimizer's estimate of parallel read performance for those queries that involve large tables that are not fully resident in the buffer pool.

(31 Jan '12, 16:20) Glenn Paulley

I'm by no means a plan viewer expert, so treat this as just a few wild guesses (until the real experts like Glenn tell the truth):

  • Is there a particular reason to use a static cache size (-ca 0)?
  • The "after restart" plan seems to run against an empty cache, whereas the "before restart" one does not. Therefore it seems reasonable that the after-restart query has to do many more disk reads - particularly since your index (whose definition I do not know) may be able to satisfy many of the WHERE clause's predicates. It's an index-only retrieval, I would think.
  • What happens when you "preload" the index pages into the cache (by calculating the index's average value or the like - see the sketch below)?
  • As the WHERE clause seems to contain join conditions - does the behaviour differ when you code the query with INNER JOIN conditions?
  • Some testing with query hints (FORCE INDEX ...) may also be worthwhile (also sketched below).

Sorry if this sounds more like a lot of questions than like an answer...
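A sketch of what the "preload" and FORCE INDEX suggestions could look like (IDX_SU_START_TIME is a made-up index name, as I don't know the real index definition):

    -- Warm the cache by forcing one complete pass over the (hypothetical)
    -- index, so that its pages are resident before the real query runs:
    SELECT COUNT( * )
    FROM   SDR_USAGE_STAT_HRS_1_7 FORCE INDEX ( IDX_SU_START_TIME );

    -- The FORCE INDEX table hint goes directly into the FROM clause, e.g.:
    SELECT COUNT( * )
    FROM   SDR_USAGE_STAT_HRS_1_7 FORCE INDEX ( IDX_SU_START_TIME )
    WHERE  START_TIME BETWEEN '2012-01-06 15:00:00.000' AND '2012-01-07 03:00:00.000'
    AND    DEVICE_ID = 4;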

EDIT:

For the 2nd point, there's a strong hint in the 12.0.1 docs - look at the boldly emphasized part:

While there can be advantages to parallel query execution in such an environment, the optimizer I/O cost model for a single device makes it difficult for the optimizer to choose a parallel table or index scan unless the table data is fully resident in the cache. However, if you calibrate the disk subsystem using the ALTER DATABASE CALIBRATE PARALLEL READ statement, the optimizer can cost the benefits of parallel execution with greater accuracy. The optimizer is likely to choose execution plans with parallelism when the disk subsystem has multiple platters.
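For completeness, the calibration mentioned in the quote - and the statement to undo it if it turns out not to help - look like this (best run while the server has little other activity):

    -- Measure the I/O characteristics of the disk subsystem for parallel reads
    ALTER DATABASE CALIBRATE PARALLEL READ;

    -- Revert to the built-in default cost model calibration
    ALTER DATABASE RESTORE DEFAULT CALIBRATION;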


answered 30 Jan '12, 07:23


Volker Barth

edited 30 Jan '12, 08:22

Just another observation: all query estimates expect many more rows in the result set (3.7e+06) than are actually returned (just 1). This might prevent intra-query parallelism if the following doc quote holds here:

A query is more likely to use parallelism if the query processes a lot more rows than are returned.

The fact that your query uses a GROUP BY on TYPE_NAME, which is defined via a CASE expression, might make it more difficult for the optimizer to pre-calculate the result set's cardinality - apparently, a GROUP BY over a CASE expression with two branches can return two rows at most...

As TYPE_NAME simply depends on one column of one table, how would the query perform if you turn the definition of TYPE_NAME into a derived table (based on S) and join that derived table with SU? (See the sketch below the original query.)

    SELECT CASE WHEN S.SGSN IN ( -371269922, -555819298 ) THEN 'Home' ELSE 'Roaming' END TYPE_NAME,
           SUM( SU.OCTETS_IN ),
           SUM( SU.OCTETS_OUT ),
           SUM( SU.OCTETS_IN + SU.OCTETS_OUT )
    FROM   SDR_USAGE_STAT_HRS_1_7 SU,
           SDR_STAT_HRS_1_7 S
    WHERE  SU.START_TIME BETWEEN '2012-01-06 15:00:00.000' AND '2012-01-07 03:00:00.000'
    AND    S.SUBSCRIBER_ID = SU.SUBSCRIBER_ID
    AND    S.SESSION_KEY = SU.SESSION_KEY
    AND    S.SUBSESSION_KEY = SU.SUBSESSION_KEY
    AND    SU.DEVICE_ID = 4
    GROUP BY TYPE_NAME
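A sketch of the suggested rewrite (untested - it just illustrates the derived-table idea using the tables and columns from the query above):

    SELECT DT.TYPE_NAME,
           SUM( SU.OCTETS_IN ),
           SUM( SU.OCTETS_OUT ),
           SUM( SU.OCTETS_IN + SU.OCTETS_OUT )
    FROM   SDR_USAGE_STAT_HRS_1_7 SU
           INNER JOIN
           ( SELECT S.SUBSCRIBER_ID,
                    S.SESSION_KEY,
                    S.SUBSESSION_KEY,
                    CASE WHEN S.SGSN IN ( -371269922, -555819298 )
                         THEN 'Home' ELSE 'Roaming' END AS TYPE_NAME
             FROM   SDR_STAT_HRS_1_7 S ) DT
           ON  DT.SUBSCRIBER_ID  = SU.SUBSCRIBER_ID
           AND DT.SESSION_KEY    = SU.SESSION_KEY
           AND DT.SUBSESSION_KEY = SU.SUBSESSION_KEY
    WHERE  SU.START_TIME BETWEEN '2012-01-06 15:00:00.000' AND '2012-01-07 03:00:00.000'
    AND    SU.DEVICE_ID = 4
    GROUP BY DT.TYPE_NAME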

answered 30 Jan '12, 07:48


Volker Barth

edited 30 Jan '12, 07:48

I tried a number of things proposed by Volker and Glenn and made some progress, but unfortunately the situation didn't improve much.

First, I tried the calibration with ALTER DATABASE CALIBRATE PARALLEL READ. It didn't make any measurable difference to the query plan, so I decided to revert it with ALTER DATABASE RESTORE DEFAULT CALIBRATION.

Then I tried to figure out how to make the optimizer estimate fewer disk reads for the plan. After some experiments, I came to the solution that can be seen in the attached join_6_parallel.saplan: I just moved the CASE construction to an outer query and got a much more parallel execution plan - even the HASH GROUP BY is parallel now. But this is not the end of the story.
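In simplified form, the rewrite looks roughly like this (a sketch of the approach only, not the exact statement behind join_6_parallel.saplan):

    -- note: assumes OCTETS_IN / OCTETS_OUT are never NULL
    SELECT CASE WHEN DT.SGSN IN ( -371269922, -555819298 )
                THEN 'Home' ELSE 'Roaming' END AS TYPE_NAME,
           SUM( DT.OCTETS_IN ),
           SUM( DT.OCTETS_OUT ),
           SUM( DT.OCTETS_IN + DT.OCTETS_OUT )
    FROM ( SELECT S.SGSN,
                  SUM( SU.OCTETS_IN )  AS OCTETS_IN,
                  SUM( SU.OCTETS_OUT ) AS OCTETS_OUT
           FROM   SDR_USAGE_STAT_HRS_1_7 SU,
                  SDR_STAT_HRS_1_7 S
           WHERE  SU.START_TIME BETWEEN '2012-01-06 15:00:00.000' AND '2012-01-07 03:00:00.000'
           AND    S.SUBSCRIBER_ID = SU.SUBSCRIBER_ID
           AND    S.SESSION_KEY = SU.SESSION_KEY
           AND    S.SUBSESSION_KEY = SU.SUBSESSION_KEY
           AND    SU.DEVICE_ID = 4
           GROUP BY S.SGSN ) DT
    GROUP BY TYPE_NAME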

Then I tried to widen the range condition in the query and take the whole day instead of just 3 hours. The query plan looked pretty much the same (parallel), but when I ran the query I discovered that it runs in parallel only for the first 10 seconds or so and then becomes pretty much serial. Reviewing the detailed plan (see join_6_semi_parallel_whole_day.saplan) reveals that only the leftmost branch under the Exchange node actually did the job and the others didn't return any rows. I wonder what the reason for that can be. Does it mean that the optimizer changed its mind during the actual processing of the query? Is this normal/expected?

In the meantime, I'm considering a completely different approach - DIY (Do It Yourself) parallelism. If the query can easily be split into a number of parts that each return a relatively small result set, then it is possible to run the resulting sub-queries in parallel on separate connections and then combine the results. If the original query is wrapped in a stored procedure, as in my case, it is even possible to implement this approach transparently to the calling application by creating a set of events inside the procedure, triggering them (events work asynchronously) and then combining the results - see the sketch below. I tried DIY parallelism for a query like the one in join_6_semi_parallel_whole_day.saplan, but spanning 3 days - and the results are quite impressive compared to intra-query parallelism.
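A simplified sketch of the idea (the work table, event name, part boundaries and column types are made up for illustration; error handling and cleanup are omitted):

    -- Work table that collects the partial results of each sub-query
    CREATE TABLE diy_partial_results (
        part_no    INT         NOT NULL,
        type_name  VARCHAR(10) NOT NULL,
        octets_in  BIGINT,
        octets_out BIGINT
    );

    -- Each event handler runs asynchronously on its own connection
    CREATE EVENT diy_part_worker
    HANDLER
    BEGIN
        DECLARE ts_from TIMESTAMP;
        DECLARE ts_to   TIMESTAMP;
        DECLARE part_id INT;
        SET ts_from = CAST( EVENT_PARAMETER( 'ts_from' ) AS TIMESTAMP );
        SET ts_to   = CAST( EVENT_PARAMETER( 'ts_to' )   AS TIMESTAMP );
        SET part_id = CAST( EVENT_PARAMETER( 'part_no' ) AS INT );

        INSERT INTO diy_partial_results( part_no, type_name, octets_in, octets_out )
        SELECT part_id, dt.type_name, dt.octets_in, dt.octets_out
        FROM ( SELECT CASE WHEN S.SGSN IN ( -371269922, -555819298 )
                           THEN 'Home' ELSE 'Roaming' END AS type_name,
                      SUM( SU.OCTETS_IN )  AS octets_in,
                      SUM( SU.OCTETS_OUT ) AS octets_out
               FROM   SDR_USAGE_STAT_HRS_1_7 SU
                      JOIN SDR_STAT_HRS_1_7 S
                        ON  S.SUBSCRIBER_ID  = SU.SUBSCRIBER_ID
                        AND S.SESSION_KEY    = SU.SESSION_KEY
                        AND S.SUBSESSION_KEY = SU.SUBSESSION_KEY
               WHERE  SU.START_TIME BETWEEN ts_from AND ts_to
               AND    SU.DEVICE_ID = 4
               GROUP BY type_name ) dt;
        COMMIT;
    END;

    -- Split the time range into two halves and run both parts concurrently
    TRIGGER EVENT diy_part_worker ( ts_from = '2012-01-06 00:00:00.000', ts_to = '2012-01-06 11:59:59.999', part_no = '1' );
    TRIGGER EVENT diy_part_worker ( ts_from = '2012-01-06 12:00:00.000', ts_to = '2012-01-07 00:00:00.000', part_no = '2' );

    -- Crude synchronization: poll until both parts have reported
    BEGIN
        WHILE ( SELECT COUNT( DISTINCT part_no ) FROM diy_partial_results ) < 2 LOOP
            WAITFOR DELAY '00:00:01';
        END LOOP;
    END;

    -- Combine the partial results
    SELECT type_name,
           SUM( octets_in ),
           SUM( octets_out ),
           SUM( octets_in + octets_out )
    FROM   diy_partial_results
    GROUP BY type_name;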


answered 01 Feb '12, 06:27


Leonid Gvirtz

I have not checked the new plans - however, is the relevant data "cached" during your new tests?

Glenn seems to have confirmed my suspicion that the massive need for disk reads prevents parallel operation - so have you tried to "pre-cache" the relevant data? Or is this unrealistic, as the query would usually access data that is not cached?

(01 Feb '12, 07:05) Volker Barth

The behavior is consistent; I ran the queries a number of times. The data is cached reasonably well - the actual number of disk reads is much lower than the estimated one, and not very high in absolute terms. In real life we cannot expect the data to be perfectly cached. This is true even for most OLTP systems, and here I'm dealing with a kind of data warehouse application.

(01 Feb '12, 08:06) Leonid Gvirtz

Well, then I don't have any further suggestions - but the real experts may have...

(01 Feb '12, 08:19) Volker Barth