Currently, I have exported a bunch of SQL Anywhere 12 tables to MS SQL Server - with the help of Breck's great blog article on a script-based transfer. - Thanks again, Breck! Now I would like to add an automatic test to compare the SA and MSS data - to assure the data transfer has been successful. (Breck's script does check whether all rows are transferred correctly, but I would like to check the contents, too.) - As there are lots of tables, I would prefer a method that can be automated (e.g. by enumerating all tables via a loop over the schema or the like). But that's more difficult than I'd thought:
Do you have any ideas for further options? |
In the end, I used a combined approach: Using a code block to enumerate all relevant tables and build a bunch of SQL statements for each, namely to
Those statements were stored as strings in a temporary table. Finally the result set was OUTPUTed again to build a SQL script. By design, the statements do form a long SQL batch but no code blocks, so OUTPUT can be used within. Thas script was run with DBISQL (which can OUTPUT proxy tables), and finally I've DIFFed the data files. Quite slow (particulary compared to the very fast data transfer when exporting the SA data to MSS), but it's automated:) And now I have noticed that missing the KEEPNULLS option in MS SQL BULK INSERT definetely has introduced some data changes - some rows with NULLs in SA have default values in MS ... - obviously not just relying on "it has the same number of rows in both databases" was a senseful approach ... So one can conclude that using OUTPUT and using loops within code blocks do not exclude each other (as I had initially thought) - as long as one uses the latter to generate a batch of statements containing the former statement:)
(18 May '11, 08:43)
Volker Barth
|
Maybe this is too simplistic an answer (and the performance is probably not where you need it to be), but suppose your local table is tSA and the corresponding SQL Server table is tMSS, could you not simply compare the value returned by: select count(*) from tSA to: select count(*) from (select * from tSA union select * from tMSS) d and see if the two counts are the same. Assuming that you have already verified that both tSA and tMSS have the same number of rows, the above union should give you the same count as tSA if both tables are identical. This is an interesting approach I'm gonna check... I tried a similar approach, namely instead of building the "differing" result set (select 'SA', * from tSA EXCEPT ALL 'SA', * from tMS) UNION ALL (select 'MS', * from tMS EXCEPT ALL 'MS', * from tSA) order by 2, 1 desc just to count the rows of that query (put as a derived table) - counting seems to work. Are there known restrictions w.r.t. combinations of UNION/EXCEPT and proxy tables?
(18 May '11, 08:30)
Volker Barth
1
Karim's solution is quite interesting; Volker, you may have missed that it actually is comparing the two tables for equality because of the UNION DISTINCT. You could wrap it up as one statement: select ( select count(*) from tSA ) as cntSA , ( select count(*) from tMS ) as cntMS , if cntSA=cntMS and cntSA=(select count(*) from ( select * from tSA UNION DISTINCT select * from tMS ) DT ) then 'equal' else 'not equal' endif If there is a difference in a row between SA and MS then it will appear two times in the UNION DISTINCT. Note that this won't find differences in string case or some other kinds of differences. Of course, this also does not output the rows that differ, but it will be a more efficient way to verify that two tables are equal. It should be at least 50% faster than the EXCEPT ALL approach. The GROUP BY approach that counts rows in tSA and tMS then compares is more powerful (handles duplicates, outputs mismatches) but it is harder to code up because you can not use *. Karim's solution is simple and handles any tables with the same schema. Nice!
(20 May '11, 16:02)
Ivan T. Bowman
Replies hidden
@Ivan: Thanks for the addition! I'm aware of use of UNION in Karim's approach (more aware than of "UNION DISTINCT" being a synonym for the simple "UNION"...). But your combined query is even more handy. What my different tests have revealed is the following: The comparison by set operators has showed much more differences in my migrated data sources than the "outputting and diffing". I found out that this is due to different handling of empty strings (''): Those were imported on the MSS side as NULLs. I guess the reason is that when doing the output, I had not used quotes around strings, and that has obviously led to the importing as NULLs. This is considered a mere cosmetic problem in this case, but apparently (and correctly) the set operators treated '' vs. NULL as not identical whereas the un-quoted output made them appear identical... Another lesson learnt.
(21 May '11, 18:08)
Volker Barth
|