For performance purposes (index usage) I am considering changing several queries from SELECT ... ORDER BY table_name.column_name_1; to SELECT ... ORDER BY table_name.column_name_2; The columnn_name_1 and 2 values in each result set are unique so the ORDER BY clauses are deterministic. My first question is, "Are the two result sets returned in the same order?" If the answer is "no" then the next question is, "Which rows are returned in different orders?" The answer to the second question may help me figure out if the data can be "fixed" so the answer to question 1 becomes "yes", and I can make the change without affecting application correctness. Bottom line: How do I write queries to answer questions 1 and 2? asked 25 Sep '10, 10:29 Breck Carter |
Here's a weird query that answers the question, "Show me the top 100 consecutive row pairs of table tab, ordered by column1, where the values of column2 in the row pair are out of order, and order the result set by the out-of-orderedness (the magnitude of the error)." At least, it seems to work... CREATE TABLE tab ( column1 INTEGER NOT NULL PRIMARY KEY, column2 DATE NOT NULL ); INSERT tab VALUES ( 1, DATEADD ( DAY, 1, CURRENT DATE ) ); INSERT tab VALUES ( 2, DATEADD ( DAY, 2, CURRENT DATE ) ); INSERT tab VALUES ( 3, DATEADD ( DAY, 3, CURRENT DATE ) ); INSERT tab VALUES ( 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 5, DATEADD ( DAY, 5, CURRENT DATE ) ); INSERT tab VALUES ( 6, DATEADD ( DAY, 6, CURRENT DATE ) ); INSERT tab VALUES ( 7, DATEADD ( DAY, 7, CURRENT DATE ) ); INSERT tab VALUES ( 8, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 9, DATEADD ( DAY, 9, CURRENT DATE ) ); COMMIT; WITH row_pair_view AS ( SELECT FIRST_VALUE ( column1 ) OVER row_pair_window AS this_column1, LAST_VALUE ( column1 ) OVER row_pair_window AS next_column1, FIRST_VALUE ( column2 ) OVER row_pair_window AS this_column2, LAST_VALUE ( column2 ) OVER row_pair_window AS next_column2, DATEDIFF ( DAY, this_column2, next_column2 ) AS day_difference FROM tab WINDOW row_pair_window AS ( ORDER BY column1 ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) ) SELECT TOP 100 * FROM row_pair_view WHERE this_column2 > next_column2 ORDER BY day_difference ASC; this_column1,next_column1,this_column2,next_column2,day_difference 7,8,'2010-10-03','2010-09-27',-6 3,4,'2010-09-29','2010-09-27',-2 Alas, in the real world table, there are LOTS of row pairs that are out-of-order on column 2. Sigh. The good news is, that WINDOW query ran in 39 seconds on a 1.2 million row table in an already-heavily-loaded SQL Anywhere 11 database... sometimes, it just takes your breath away, that a query so WEIRD can run so quickly. answered 26 Sep '10, 10:37 Breck Carter Comment Text Removed
Ha, ha... in the real world, a (probable) hardware error caused out-of-order CURRENT TIMESTAMP values to be recorded. In other words, column2 is occasionally out of order but it should not be... it's data cleanup problem... otherwise the ORDER BY results are the same! Woohoo! |
Here is a variation on the WINDOW ... ROWS BETWEEN solution where the rows are "partitioned" by a third column (column0 in the code below). In this variation, rows need to be ordered within each partition, not globally. There are two partitions: column0 = 1 and 2. CREATE TABLE tab ( column0 INTEGER NOT NULL, column1 INTEGER NOT NULL PRIMARY KEY, column2 DATE NOT NULL ); INSERT tab VALUES ( 1, 1, DATEADD ( DAY, 1, CURRENT DATE ) ); INSERT tab VALUES ( 2, 2, DATEADD ( DAY, 2, CURRENT DATE ) ); INSERT tab VALUES ( 1, 3, DATEADD ( DAY, 3, CURRENT DATE ) ); INSERT tab VALUES ( 2, 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 1, 5, DATEADD ( DAY, 5, CURRENT DATE ) ); INSERT tab VALUES ( 2, 6, DATEADD ( DAY, 6, CURRENT DATE ) ); INSERT tab VALUES ( 1, 7, DATEADD ( DAY, 7, CURRENT DATE ) ); INSERT tab VALUES ( 2, 8, DATEADD ( DAY, 6, CURRENT DATE ) ); -- ok ( really, it's ok now :) INSERT tab VALUES ( 1, 9, DATEADD ( DAY, 9, CURRENT DATE ) ); COMMIT; -- Solution 2 partitions by column0... WITH row_pair_view AS ( SELECT FIRST_VALUE ( column1 ) OVER row_pair_window AS this_column1, LAST_VALUE ( column1 ) OVER row_pair_window AS next_column1, FIRST_VALUE ( column2 ) OVER row_pair_window AS this_column2, LAST_VALUE ( column2 ) OVER row_pair_window AS next_column2, DATEDIFF ( DAY, this_column2, next_column2 ) AS day_difference FROM tab WINDOW row_pair_window AS ( PARTITION BY column0 ORDER BY column1 ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) ) SELECT TOP 100 * FROM row_pair_view WHERE this_column2 > next_column2 ORDER BY day_difference ASC; this_column1,next_column1,this_column2,next_column2,day_difference 2,4,'2010-09-28','2010-09-27',-1 The previous solution doesn't look at column0 so it gives a false "hit": WITH row_pair_view AS ( SELECT FIRST_VALUE ( column1 ) OVER row_pair_window AS this_column1, LAST_VALUE ( column1 ) OVER row_pair_window AS next_column1, FIRST_VALUE ( column2 ) OVER row_pair_window AS this_column2, LAST_VALUE ( column2 ) OVER row_pair_window AS next_column2, DATEDIFF ( DAY, this_column2, next_column2 ) AS day_difference FROM tab WINDOW row_pair_window AS ( ORDER BY column1 ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) ) SELECT TOP 100 * FROM row_pair_view WHERE this_column2 > next_column2 ORDER BY day_difference ASC; this_column1,next_column1,this_column2,next_column2,day_difference 3,4,'2010-09-29','2010-09-27',-2 7,8,'2010-10-03','2010-10-02',-1 answered 26 Sep '10, 11:54 Breck Carter |
While I agree with Graeme that I somewhat don't get the point: If you want to find out if both sort orders are identical then how about using the row_number function and then use an EXCEPT query to find out if there are any rows with different row_number values. That would mean they are sorted differently.
If that query returns an empty resultset, the sort order is identical (well, not necessarily in general, but for this particular result set). It might also be possible to use two row_number functions in one single select and then do the comparison in the WHERE clause, such as [RE-EDITED BY VOLKER as a derived query]
Please note: I haven't tested any of these queries and can't say if they are efficient in any way, but it might lead you on the right track:) From Breck: A good start (answers the first question), but alas, either too much information or none at all... CREATE TABLE tab ( column1 INTEGER NOT NULL PRIMARY KEY, column2 DATE NOT NULL ); INSERT tab VALUES ( 1, DATEADD ( DAY, 1, CURRENT DATE ) ); INSERT tab VALUES ( 2, DATEADD ( DAY, 2, CURRENT DATE ) ); INSERT tab VALUES ( 3, DATEADD ( DAY, 3, CURRENT DATE ) ); INSERT tab VALUES ( 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 5, DATEADD ( DAY, 5, CURRENT DATE ) ); INSERT tab VALUES ( 6, DATEADD ( DAY, 6, CURRENT DATE ) ); INSERT tab VALUES ( 7, DATEADD ( DAY, 7, CURRENT DATE ) ); INSERT tab VALUES ( 8, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 9, DATEADD ( DAY, 9, CURRENT DATE ) ); COMMIT; SELECT column1, column2, row_number() over (order by column1) as sortorder1 FROM tab EXCEPT SELECT column1, column2, row_number() over (order by column2) as sortorder2 FROM tab ORDER BY column1; column1,column2,sortorder1 2,'2010-09-28',2 3,'2010-09-29',3 4,'2010-09-27',4 5,'2010-10-01',5 6,'2010-10-02',6 7,'2010-10-03',7 8,'2010-09-27',8 SELECT column1, column2, row_number() over (order by column1) as sortorder1, row_number() over (order by column2) as sortorder2 FROM tab WHERE sortorder1 sortorder2 Could not execute statement. Window function used in predicate SQLCODE=-964, ODBC 3 State="HY000" Line 1, column 1
Leads to the same (and therefore verbose) output than the SELECT ... EXCEPT query. answered 25 Sep '10, 18:21 Volker Barth Comment Text Removed
See my comment to Graeme about "the point"... it's an unnecessary restriction, IMO, only allowing one CLUSTERED index per table :) @Breck: However, according to a recent article on Glenn's blog, the query optimizer is able to check any index's "CLUSTERED" characteristics (defined or not) and make use of "nearly clustered" indizes, too. Cf. http://iablog.sybase.com/paulley/2010/09/analyzing-clustered-indexes/ Comment Text Removed
Comment Text Removed
Comment Text Removed
@Breck: Have no SA engine ready available at the moment, but I guess my 2nd example should run when turned inside a derived query, cf. http://sqlanywhere-forum.sap.com/questions/1015 |
I'm confused - why would you expect
order by col1
andorder by col2
to return rows in the same order, unless col1 and col2 have identical data?One is the INTEGER DEFAULT AUTOINCREMENT primary key, and the other is a TIMESTAMP which may or may not be monotonically ascending for each new row. Vastly different data, potentially the same order. I can only make one CLUSTERED. (and now that I think of it, there may be a clue to the solution in that "monotonically ascending" comment... and no, I haven't read Volker's answer yet)
More: I want to change ORDER BY timestamp-column to ORDER BY autoincrement-column because currently there are a LOT of queries with each version of ORDER BY and I want to optimize them ALL by having the same single index help the ORDER BY.