For performance purposes (index usage) I am considering changing several queries from

SELECT ... ORDER BY table_name.column_name_1;

to

SELECT ... ORDER BY table_name.column_name_2;

The columnn_name_1 and 2 values in each result set are unique so the ORDER BY clauses are deterministic.

My first question is, "Are the two result sets returned in the same order?"

If the answer is "no" then the next question is, "Which rows are returned in different orders?"

The answer to the second question may help me figure out if the data can be "fixed" so the answer to question 1 becomes "yes", and I can make the change without affecting application correctness.

Bottom line: How do I write queries to answer questions 1 and 2?

asked 25 Sep '10, 10:29

Breck%20Carter's gravatar image

Breck Carter
32.5k5417261050
accept rate: 20%

edited 26 Sep '10, 10:52

I'm confused - why would you expect order by col1 and order by col2 to return rows in the same order, unless col1 and col2 have identical data?

(25 Sep '10, 13:33) Graeme Perrow

One is the INTEGER DEFAULT AUTOINCREMENT primary key, and the other is a TIMESTAMP which may or may not be monotonically ascending for each new row. Vastly different data, potentially the same order. I can only make one CLUSTERED. (and now that I think of it, there may be a clue to the solution in that "monotonically ascending" comment... and no, I haven't read Volker's answer yet)

(25 Sep '10, 22:31) Breck Carter

More: I want to change ORDER BY timestamp-column to ORDER BY autoincrement-column because currently there are a LOT of queries with each version of ORDER BY and I want to optimize them ALL by having the same single index help the ORDER BY.

(25 Sep '10, 22:36) Breck Carter

Here's a weird query that answers the question, "Show me the top 100 consecutive row pairs of table tab, ordered by column1, where the values of column2 in the row pair are out of order, and order the result set by the out-of-orderedness (the magnitude of the error)."

At least, it seems to work...

CREATE TABLE tab (
   column1  INTEGER NOT NULL PRIMARY KEY,
   column2  DATE NOT NULL );

INSERT tab VALUES ( 1, DATEADD ( DAY, 1, CURRENT DATE ) );
INSERT tab VALUES ( 2, DATEADD ( DAY, 2, CURRENT DATE ) );
INSERT tab VALUES ( 3, DATEADD ( DAY, 3, CURRENT DATE ) );
INSERT tab VALUES ( 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 5, DATEADD ( DAY, 5, CURRENT DATE ) );
INSERT tab VALUES ( 6, DATEADD ( DAY, 6, CURRENT DATE ) );
INSERT tab VALUES ( 7, DATEADD ( DAY, 7, CURRENT DATE ) );
INSERT tab VALUES ( 8, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 9, DATEADD ( DAY, 9, CURRENT DATE ) );
COMMIT;

WITH row_pair_view AS
   ( SELECT FIRST_VALUE ( column1  ) OVER row_pair_window  AS this_column1,
            LAST_VALUE  ( column1  ) OVER row_pair_window  AS next_column1,
            FIRST_VALUE ( column2 )  OVER row_pair_window  AS this_column2,
            LAST_VALUE  ( column2 )  OVER row_pair_window  AS next_column2,
            DATEDIFF ( DAY, 
                       this_column2, 
                       next_column2 ) AS day_difference
       FROM tab
     WINDOW row_pair_window 
         AS ( ORDER BY column1 
              ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) )
SELECT TOP 100 *
  FROM row_pair_view
 WHERE this_column2 > next_column2
 ORDER BY day_difference ASC;

this_column1,next_column1,this_column2,next_column2,day_difference
7,8,'2010-10-03','2010-09-27',-6
3,4,'2010-09-29','2010-09-27',-2

Alas, in the real world table, there are LOTS of row pairs that are out-of-order on column 2.

Sigh.

The good news is, that WINDOW query ran in 39 seconds on a 1.2 million row table in an already-heavily-loaded SQL Anywhere 11 database... sometimes, it just takes your breath away, that a query so WEIRD can run so quickly.

permanent link

answered 26 Sep '10, 10:37

Breck%20Carter's gravatar image

Breck Carter
32.5k5417261050
accept rate: 20%

Comment Text Removed

Ha, ha... in the real world, a (probable) hardware error caused out-of-order CURRENT TIMESTAMP values to be recorded. In other words, column2 is occasionally out of order but it should not be... it's data cleanup problem... otherwise the ORDER BY results are the same! Woohoo!

(26 Sep '10, 11:32) Breck Carter

Here is a variation on the WINDOW ... ROWS BETWEEN solution where the rows are "partitioned" by a third column (column0 in the code below).

In this variation, rows need to be ordered within each partition, not globally. There are two partitions: column0 = 1 and 2.

CREATE TABLE tab (
   column0  INTEGER NOT NULL,
   column1  INTEGER NOT NULL PRIMARY KEY,
   column2  DATE NOT NULL );

INSERT tab VALUES ( 1, 1, DATEADD ( DAY, 1, CURRENT DATE ) );
INSERT tab VALUES ( 2, 2, DATEADD ( DAY, 2, CURRENT DATE ) );
INSERT tab VALUES ( 1, 3, DATEADD ( DAY, 3, CURRENT DATE ) );
INSERT tab VALUES ( 2, 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 1, 5, DATEADD ( DAY, 5, CURRENT DATE ) );
INSERT tab VALUES ( 2, 6, DATEADD ( DAY, 6, CURRENT DATE ) );
INSERT tab VALUES ( 1, 7, DATEADD ( DAY, 7, CURRENT DATE ) );
INSERT tab VALUES ( 2, 8, DATEADD ( DAY, 6, CURRENT DATE ) ); -- ok ( really, it's ok now :)
INSERT tab VALUES ( 1, 9, DATEADD ( DAY, 9, CURRENT DATE ) );
COMMIT;

-- Solution 2 partitions by column0...

WITH row_pair_view AS
   ( SELECT FIRST_VALUE ( column1  ) OVER row_pair_window  AS this_column1,
            LAST_VALUE  ( column1  ) OVER row_pair_window  AS next_column1,
            FIRST_VALUE ( column2 )  OVER row_pair_window  AS this_column2,
            LAST_VALUE  ( column2 )  OVER row_pair_window  AS next_column2,
            DATEDIFF ( DAY, 
                       this_column2, 
                       next_column2 ) AS day_difference
       FROM tab
     WINDOW row_pair_window 
         AS ( PARTITION BY column0
              ORDER BY column1 
              ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) )
SELECT TOP 100 *
  FROM row_pair_view
 WHERE this_column2 > next_column2
 ORDER BY day_difference ASC;

this_column1,next_column1,this_column2,next_column2,day_difference
2,4,'2010-09-28','2010-09-27',-1

The previous solution doesn't look at column0 so it gives a false "hit":

WITH row_pair_view AS
   ( SELECT FIRST_VALUE ( column1  ) OVER row_pair_window  AS this_column1,
            LAST_VALUE  ( column1  ) OVER row_pair_window  AS next_column1,
            FIRST_VALUE ( column2 )  OVER row_pair_window  AS this_column2,
            LAST_VALUE  ( column2 )  OVER row_pair_window  AS next_column2,
            DATEDIFF ( DAY, 
                       this_column2, 
                       next_column2 ) AS day_difference
       FROM tab
     WINDOW row_pair_window 
         AS ( ORDER BY column1 
              ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) )
SELECT TOP 100 *
  FROM row_pair_view
 WHERE this_column2 > next_column2
 ORDER BY day_difference ASC;

this_column1,next_column1,this_column2,next_column2,day_difference
3,4,'2010-09-29','2010-09-27',-2
7,8,'2010-10-03','2010-10-02',-1
permanent link

answered 26 Sep '10, 11:54

Breck%20Carter's gravatar image

Breck Carter
32.5k5417261050
accept rate: 20%

While I agree with Graeme that I somewhat don't get the point:

If you want to find out if both sort orders are identical then how about using the row_number function and then use an EXCEPT query to find out if there are any rows with different row_number values. That would mean they are sorted differently.

SELECT col1, col2, ..., coln, row_number() over (order by col1) as sortorder1
FROM table1
EXCEPT
SELECT col1, col2, ..., coln, row_number() over (order by col2) as sortorder2
FROM table1
ORDER BY WhatYouLike

If that query returns an empty resultset, the sort order is identical (well, not necessarily in general, but for this particular result set).

It might also be possible to use two row_number functions in one single select and then do the comparison in the WHERE clause, such as

[RE-EDITED BY VOLKER as a derived query]

SELECT * FROM
  (SELECT col1, col2, ..., coln,
    row_number() over (order by col1) as sortorder1,
    row_number() over (order by col2) as sortorder2
    FROM table1) S
WHERE sortorder1 <> sortorder2

Please note: I haven't tested any of these queries and can't say if they are efficient in any way, but it might lead you on the right track:)


From Breck: A good start (answers the first question), but alas, either too much information or none at all...

CREATE TABLE tab (
   column1  INTEGER NOT NULL PRIMARY KEY,
   column2  DATE NOT NULL );

INSERT tab VALUES ( 1, DATEADD ( DAY, 1, CURRENT DATE ) );
INSERT tab VALUES ( 2, DATEADD ( DAY, 2, CURRENT DATE ) );
INSERT tab VALUES ( 3, DATEADD ( DAY, 3, CURRENT DATE ) );
INSERT tab VALUES ( 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 5, DATEADD ( DAY, 5, CURRENT DATE ) );
INSERT tab VALUES ( 6, DATEADD ( DAY, 6, CURRENT DATE ) );
INSERT tab VALUES ( 7, DATEADD ( DAY, 7, CURRENT DATE ) );
INSERT tab VALUES ( 8, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order
INSERT tab VALUES ( 9, DATEADD ( DAY, 9, CURRENT DATE ) );
COMMIT;

SELECT column1, column2, row_number() over (order by column1) as sortorder1
FROM tab
EXCEPT
SELECT column1, column2, row_number() over (order by column2) as sortorder2
FROM tab
ORDER BY column1;

column1,column2,sortorder1
2,'2010-09-28',2
3,'2010-09-29',3
4,'2010-09-27',4
5,'2010-10-01',5
6,'2010-10-02',6
7,'2010-10-03',7
8,'2010-09-27',8

SELECT column1, column2,
row_number() over (order by column1) as sortorder1,
row_number() over (order by column2) as sortorder2
FROM tab
WHERE sortorder1  sortorder2

Could not execute statement.
Window function used in predicate
SQLCODE=-964, ODBC 3 State="HY000"
Line 1, column 1

-- 2nd sample as corrected by Volker
SELECT * FROM
  (SELECT column1, column2,
    row_number() over (order by column1) as sortorder1,
    row_number() over (order by column2) as sortorder2
    FROM tab) S
WHERE sortorder1 <> sortorder2
ORDER BY column1

Leads to the same (and therefore verbose) output than the SELECT ... EXCEPT query.

permanent link

answered 25 Sep '10, 18:21

Volker%20Barth's gravatar image

Volker Barth
40.1k361549819
accept rate: 34%

edited 30 Sep '10, 07:45

Comment Text Removed

See my comment to Graeme about "the point"... it's an unnecessary restriction, IMO, only allowing one CLUSTERED index per table :)

(26 Sep '10, 09:12) Breck Carter

@Breck: However, according to a recent article on Glenn's blog, the query optimizer is able to check any index's "CLUSTERED" characteristics (defined or not) and make use of "nearly clustered" indizes, too. Cf. http://iablog.sybase.com/paulley/2010/09/analyzing-clustered-indexes/

(26 Sep '10, 13:14) Volker Barth
Comment Text Removed
Comment Text Removed
Comment Text Removed

@Breck: Have no SA engine ready available at the moment, but I guess my 2nd example should run when turned inside a derived query, cf. http://sqlanywhere-forum.sap.com/questions/1015

(26 Sep '10, 13:58) Volker Barth
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×275
×19
×6

question asked: 25 Sep '10, 10:29

question was seen: 3,172 times

last updated: 30 Sep '10, 07:45