The 16.0.0.2127 query in question takes over 2 seconds to return the expected result of zero rows, but with an additional redundant BETWEEN predicate the response time is zero seconds to return zero rows.

Adding the BETWEEN predicate was equivalent to waving a dead chicken over the keyboard. There was no apparent reason to expect it to help, but since the BETWEEN predicate was an assertion of a true condition there was no reason to expect it to hurt.

There was no CLUSTERED index that might indicate BETWEEN would help.

The fast plan shows that sequential table scans were used for both tables, unlike the slow plan that shows indexes were used for everything.

In fact, the fast plan shows that no indexes were even considered, whereas the slow plan shows that SQL Anywhere worked very hard to pick the right indexes.

So... there's some magic afoot; here are the plans:

Fast plan

Slow Plan

In the absence of an actual explanation, I'm going to go with Dead Chicken: When nothing else seems to help, try adding a correct but redundant BETWEEN predicate! :)

-------------------------------------------------------------------------
CREATE OR REPLACE VARIABLE @menu_down_count                             UNSIGNED BIGINT  = 20;
CREATE OR REPLACE VARIABLE @connection_id_string                        VARCHAR ( 50 )   = '1-33137-20160118152015-061';
CREATE OR REPLACE VARIABLE @locator_number                              UNSIGNED BIGINT  = 227243; 
CREATE OR REPLACE VARIABLE @oldest_connection_history_sample_set_number UNSIGNED BIGINT  = 227243; 
CALL sa_flush_cache();

-------------------------------------------------------------------------
-- 1. Fast query (0 sec) contains a redundant BETWEEN CLAUSE.
SELECT TOP @menu_down_count
                     rroad_group_2_property_pivot.sample_set_number      AS locator_number,
                     rroad_group_1_property_pivot.sample_recorded_at     AS locator_timestamp,
                     CAST ( 1 AS INTEGER )                               AS artificial_row_type
                FROM rroad_group_2_property_pivot
                        INNER JOIN rroad_group_1_property_pivot
                                ON rroad_group_1_property_pivot.sample_set_number = rroad_group_2_property_pivot.sample_set_number
               WHERE rroad_group_2_property_pivot.connection_id_string = @connection_id_string
                 AND rroad_group_2_property_pivot.sample_set_number    < @locator_number
                 AND rroad_group_2_property_pivot.sample_set_number    
                        BETWEEN @oldest_connection_history_sample_set_number -- Performance: Optimization: This redundant predicate seems to help the "no rows" result case.
                            AND @locator_number
               ORDER BY rroad_group_1_property_pivot.sample_set_number DESC;

See plan: plan1_fast_with_redundant_BETWEEN.saplan
-------------------------------------------------------------------------
-- 2. Slow query (2.7 sec) contains a redundant BETWEEN CLAUSE.
SELECT TOP @menu_down_count
                     rroad_group_2_property_pivot.sample_set_number      AS locator_number,
                     rroad_group_1_property_pivot.sample_recorded_at     AS locator_timestamp,
                     CAST ( 1 AS INTEGER )                               AS artificial_row_type
                FROM rroad_group_2_property_pivot
                        INNER JOIN rroad_group_1_property_pivot
                                ON rroad_group_1_property_pivot.sample_set_number = rroad_group_2_property_pivot.sample_set_number
               WHERE rroad_group_2_property_pivot.connection_id_string = @connection_id_string
                 AND rroad_group_2_property_pivot.sample_set_number    < @locator_number
               ORDER BY rroad_group_1_property_pivot.sample_set_number DESC;

See plan: plan2_slow_without_redundant_BETWEEN.saplan
-------------------------------------------------------------------------
CREATE TABLE DBA.rroad_group_1_property_pivot ( --  256,931 rows
   sampling_id                                /*         X */ UNSIGNED INT NOT NULL,
   sample_set_number                          /* PK      X */ UNSIGNED BIGINT NOT NULL,
   sample_lost                                /*         X */ VARCHAR ( 1 ) NOT NULL,
   ... other columns omitted
   sample_recorded_at                                         TIMESTAMP NOT NULL,
   ... other columns omitted
   CONSTRAINT ASA277 PRIMARY KEY (
      sample_set_number )
 );
-- Parents of DBA.rroad_group_1_property_pivot -- none --
-- Children -- none --
CREATE INDEX ix_id_lost ON DBA.rroad_group_1_property_pivot (
   sampling_id,
   sample_lost );
CREATE INDEX ix_number_DESC_id_lost ON DBA.rroad_group_1_property_pivot (
   sample_set_number DESC,
   sampling_id,
   sample_lost );
-------------------------------------------------------------------------
CREATE TABLE DBA.rroad_group_2_property_pivot ( -- 1,967,439 rows
   sampling_id                                     UNSIGNED INT NOT NULL,
   sample_set_number               /* PK      X */ UNSIGNED BIGINT NOT NULL,
   connection_number               /* PK        */ BIGINT NOT NULL,
   connection_id_string            /*         X */ VARCHAR ( 50 ) NOT NULL 
      COMPUTE ( "STRING" (
         "sampling_id", '-', "connection_number", '-', "DATEFORMAT" ( "LoginTime",'YYYYMMDDHHNNSS-SSS' ) ) ),
   ... other columns omitted
   LoginTime                                       LONG VARCHAR NOT NULL DEFAULT '1900-01-01',
   ... other columns omitted
   CONSTRAINT ASA278 PRIMARY KEY (
      sample_set_number,
      connection_number )
 );
-- Parents of DBA.rroad_group_2_property_pivot -- none --
-- Children -- none --
CREATE INDEX xconnection_id_string ON DBA.rroad_group_2_property_pivot (
   connection_id_string,
   sample_set_number );
-------------------------------------------------------------------------

asked 27 Jan '16, 09:49

Breck%20Carter's gravatar image

Breck Carter
26.8k420580826
accept rate: 20%

edited 27 Jan '16, 09:50


In the fast plan with the sequential scans, the sequential scans actually aren't being run at all. The prefilter node is filtering with the predicate "FALSE", which means no rows will be requested from the nodes below. So the query will always return no rows.

The BETWEEN statement is evaluating to sample_set_number >= x AND sample_set_number <= x which gets reduced to sample_set_number = x

But the other predicate is sample_set_number < x.

Combining these two predicates results in FALSE because there is clearly no number that satisfies both of them.

To clarify: In the first example, we can determine there will be no rows in the result set without having to read a single page in the database. The actual contents of the table is irrelevant, it's similar to adding WHERE 1=2 to the query. In the second example, we have to actually search the index for rows with sample_set_number < n.

When there's a prefilter node in a plan, the engine is evaluating a predicate that has no relation to any table or view. Usually this is a contradiction (WHERE 1=2).

The optimizer governor probably stops further optimization of the plan because the estimated cost to execute it is so small.

permanent link

answered 27 Jan '16, 11:31

Mikel%20Rychliski's gravatar image

Mikel Rychliski
2.1k1641
accept rate: 30%

edited 27 Jan '16, 13:55

Ahhh... that would explain the "PreFilter" node in the plan diagram! ( doh! :)

In theory, code could be added to the application to test for combinations of host variables that would guarantee contradictions, and skip the query in those cases.

In practice, "if you can think of extra predicates, add extra predicates" might be a way to get the optimizer to do the heavy thinking :)

...especially when the query is taking forever to run.

(27 Jan '16, 14:25) Breck Carter

The PreFilter node disappears when the @locator_number value is changed to 227343 which is higher by 100 and results in 20 rows being returned.

The plan looks completely different from either of the previous two plans, and it runs like a rocket (effectively zero time).

Bottom line: It is a wonderful thing to have the plan dynamically constructed each time a query is executed!

Here is the third plan: Third plan

CREATE OR REPLACE VARIABLE @menu_down_count                             UNSIGNED BIGINT  = 20;
CREATE OR REPLACE VARIABLE @connection_id_string                        VARCHAR ( 50 )   = '1-33137-20160118152015-061';
CREATE OR REPLACE VARIABLE @locator_number                              UNSIGNED BIGINT  = 227343; 
CREATE OR REPLACE VARIABLE @oldest_connection_history_sample_set_number UNSIGNED BIGINT  = 227243; 
CALL sa_flush_cache();

SELECT TOP @menu_down_count
                     rroad_group_2_property_pivot.sample_set_number      AS locator_number,
                     rroad_group_1_property_pivot.sample_recorded_at     AS locator_timestamp,
                     CAST ( 1 AS INTEGER )                               AS artificial_row_type
                FROM rroad_group_2_property_pivot
                        INNER JOIN rroad_group_1_property_pivot
                                ON rroad_group_1_property_pivot.sample_set_number = rroad_group_2_property_pivot.sample_set_number
               WHERE rroad_group_2_property_pivot.connection_id_string = @connection_id_string
                 AND rroad_group_2_property_pivot.sample_set_number    < @locator_number
                 AND rroad_group_2_property_pivot.sample_set_number    
                        BETWEEN @oldest_connection_history_sample_set_number -- Performance: Optimization: This redundant predicate seems to help the "no rows" result case.
                            AND @locator_number
               ORDER BY rroad_group_1_property_pivot.sample_set_number DESC;
(27 Jan '16, 14:57) Breck Carter

If I understand correctly, there are no rows in the table with sample_set_number < 227243 (and this is known by the application?).

In plan2 we are filtering on: sample_set_number < 227243

In plan3 we are filtering on: 227243 < sample_set_number < 227343

plan2 is (erroneously?) expecting to find 88% of the rows below 227243 on group_1, so the plan is constructed in a different order.

Perhaps recreating the statistics will speed up the original query without the between statement?

(27 Jan '16, 15:18) Mikel Rychliski
Replies hidden

Yes, plans 1 and 2 correctly find no rows, except that plan 2 takes forever.

Recreating statistics is a non-starter because it is a high-throughput application with millions of rows regularly inserted and deleted.

I suppose the occasional Plan From Hell might be regarded as an aspect of The Dark Side Of Dynamic Plan Calculation, but these exceptions occur SO INFREQUENTLY that I am happy to deal with them... and "redundant predicate" is a welcome addition to The Refrigerator Of Solutions next to Dead Chicken :)

(27 Jan '16, 16:32) Breck Carter

I guess the issue is the predicate isn't truly redundant from SQL Anywhere's perspective. Even if the lowest number in the column is higher then 227242, the only way we would know this is from the column statistics. And we can't actually use that for the search because column statistics are non transactional (and in this case, very wrong). Even if that wasn't a problem, we can't construct an access plan assuming there are no rows below 227243 because that could change before we execute it.

You have an extra 'hint' of information that's being passed by the BETWEEN clause.

Ideally, the column statistics would be kept a little more accurate (I wonder how they got so skewed?). Then the optimizer would probably use plan3 for the no rows case as well.

(27 Jan '16, 17:05) Mikel Rychliski
1

OK, so maybe the predicate is not redundant from the optimizer's point of view. Since mere mortals have no way of predicting the optimizer's behavior, it reinforces this rule of thumb: "If you have a predicate, use a predicate."

In other words, just because the predicate is not required for the purposes of correct execution, is no reason not to include it. SQL Anywhere needs all the information it can get to pick a plan that is fast as well as correct.

The goal is not to create an elegant query, but to create a fast correct query.

(28 Jan '16, 08:11) Breck Carter
showing 3 of 6 show all flat view
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×239

question asked: 27 Jan '16, 09:49

question was seen: 358 times

last updated: 28 Jan '16, 08:12