For performance purposes (index usage) I am considering changing several queries from SELECT ... ORDER BY table_name.column_name_1; to SELECT ... ORDER BY table_name.column_name_2; The columnn_name_1 and 2 values in each result set are unique so the ORDER BY clauses are deterministic. My first question is, "Are the two result sets returned in the same order?" If the answer is "no" then the next question is, "Which rows are returned in different orders?" The answer to the second question may help me figure out if the data can be "fixed" so the answer to question 1 becomes "yes", and I can make the change without affecting application correctness. Bottom line: How do I write queries to answer questions 1 and 2? |
Here is a variation on the WINDOW ... ROWS BETWEEN solution where the rows are "partitioned" by a third column (column0 in the code below). In this variation, rows need to be ordered within each partition, not globally. There are two partitions: column0 = 1 and 2. CREATE TABLE tab ( column0 INTEGER NOT NULL, column1 INTEGER NOT NULL PRIMARY KEY, column2 DATE NOT NULL ); INSERT tab VALUES ( 1, 1, DATEADD ( DAY, 1, CURRENT DATE ) ); INSERT tab VALUES ( 2, 2, DATEADD ( DAY, 2, CURRENT DATE ) ); INSERT tab VALUES ( 1, 3, DATEADD ( DAY, 3, CURRENT DATE ) ); INSERT tab VALUES ( 2, 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 1, 5, DATEADD ( DAY, 5, CURRENT DATE ) ); INSERT tab VALUES ( 2, 6, DATEADD ( DAY, 6, CURRENT DATE ) ); INSERT tab VALUES ( 1, 7, DATEADD ( DAY, 7, CURRENT DATE ) ); INSERT tab VALUES ( 2, 8, DATEADD ( DAY, 6, CURRENT DATE ) ); -- ok ( really, it's ok now :) INSERT tab VALUES ( 1, 9, DATEADD ( DAY, 9, CURRENT DATE ) ); COMMIT; -- Solution 2 partitions by column0... WITH row_pair_view AS ( SELECT FIRST_VALUE ( column1 ) OVER row_pair_window AS this_column1, LAST_VALUE ( column1 ) OVER row_pair_window AS next_column1, FIRST_VALUE ( column2 ) OVER row_pair_window AS this_column2, LAST_VALUE ( column2 ) OVER row_pair_window AS next_column2, DATEDIFF ( DAY, this_column2, next_column2 ) AS day_difference FROM tab WINDOW row_pair_window AS ( PARTITION BY column0 ORDER BY column1 ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) ) SELECT TOP 100 * FROM row_pair_view WHERE this_column2 > next_column2 ORDER BY day_difference ASC; this_column1,next_column1,this_column2,next_column2,day_difference 2,4,'2010-09-28','2010-09-27',-1 The previous solution doesn't look at column0 so it gives a false "hit": WITH row_pair_view AS ( SELECT FIRST_VALUE ( column1 ) OVER row_pair_window AS this_column1, LAST_VALUE ( column1 ) OVER row_pair_window AS next_column1, FIRST_VALUE ( column2 ) OVER row_pair_window AS this_column2, LAST_VALUE ( column2 ) OVER row_pair_window AS next_column2, DATEDIFF ( DAY, this_column2, next_column2 ) AS day_difference FROM tab WINDOW row_pair_window AS ( ORDER BY column1 ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) ) SELECT TOP 100 * FROM row_pair_view WHERE this_column2 > next_column2 ORDER BY day_difference ASC; this_column1,next_column1,this_column2,next_column2,day_difference 3,4,'2010-09-29','2010-09-27',-2 7,8,'2010-10-03','2010-10-02',-1 |
While I agree with Graeme that I somewhat don't get the point: If you want to find out if both sort orders are identical then how about using the row_number function and then use an EXCEPT query to find out if there are any rows with different row_number values. That would mean they are sorted differently.
If that query returns an empty resultset, the sort order is identical (well, not necessarily in general, but for this particular result set). It might also be possible to use two row_number functions in one single select and then do the comparison in the WHERE clause, such as [RE-EDITED BY VOLKER as a derived query]
Please note: I haven't tested any of these queries and can't say if they are efficient in any way, but it might lead you on the right track:) From Breck: A good start (answers the first question), but alas, either too much information or none at all... CREATE TABLE tab ( column1 INTEGER NOT NULL PRIMARY KEY, column2 DATE NOT NULL ); INSERT tab VALUES ( 1, DATEADD ( DAY, 1, CURRENT DATE ) ); INSERT tab VALUES ( 2, DATEADD ( DAY, 2, CURRENT DATE ) ); INSERT tab VALUES ( 3, DATEADD ( DAY, 3, CURRENT DATE ) ); INSERT tab VALUES ( 4, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 5, DATEADD ( DAY, 5, CURRENT DATE ) ); INSERT tab VALUES ( 6, DATEADD ( DAY, 6, CURRENT DATE ) ); INSERT tab VALUES ( 7, DATEADD ( DAY, 7, CURRENT DATE ) ); INSERT tab VALUES ( 8, DATEADD ( DAY, 1, CURRENT DATE ) ); -- out of date order INSERT tab VALUES ( 9, DATEADD ( DAY, 9, CURRENT DATE ) ); COMMIT; SELECT column1, column2, row_number() over (order by column1) as sortorder1 FROM tab EXCEPT SELECT column1, column2, row_number() over (order by column2) as sortorder2 FROM tab ORDER BY column1; column1,column2,sortorder1 2,'2010-09-28',2 3,'2010-09-29',3 4,'2010-09-27',4 5,'2010-10-01',5 6,'2010-10-02',6 7,'2010-10-03',7 8,'2010-09-27',8 SELECT column1, column2, row_number() over (order by column1) as sortorder1, row_number() over (order by column2) as sortorder2 FROM tab WHERE sortorder1 sortorder2 Could not execute statement. Window function used in predicate SQLCODE=-964, ODBC 3 State="HY000" Line 1, column 1
Leads to the same (and therefore verbose) output than the SELECT ... EXCEPT query. Comment Text Removed
See my comment to Graeme about "the point"... it's an unnecessary restriction, IMO, only allowing one CLUSTERED index per table :) @Breck: However, according to a recent article on Glenn's blog, the query optimizer is able to check any index's "CLUSTERED" characteristics (defined or not) and make use of "nearly clustered" indizes, too. Cf. http://iablog.sybase.com/paulley/2010/09/analyzing-clustered-indexes/ Comment Text Removed
Comment Text Removed
Comment Text Removed
@Breck: Have no SA engine ready available at the moment, but I guess my 2nd example should run when turned inside a derived query, cf. http://sqlanywhere-forum.sap.com/questions/1015 |
I'm confused - why would you expect
order by col1
andorder by col2
to return rows in the same order, unless col1 and col2 have identical data?One is the INTEGER DEFAULT AUTOINCREMENT primary key, and the other is a TIMESTAMP which may or may not be monotonically ascending for each new row. Vastly different data, potentially the same order. I can only make one CLUSTERED. (and now that I think of it, there may be a clue to the solution in that "monotonically ascending" comment... and no, I haven't read Volker's answer yet)
More: I want to change ORDER BY timestamp-column to ORDER BY autoincrement-column because currently there are a LOT of queries with each version of ORDER BY and I want to optimize them ALL by having the same single index help the ORDER BY.