Currently, I have exported a bunch of SQL Anywhere 12 tables to MS SQL Server - with the help of Breck's great blog article on a script-based transfer. - Thanks again, Breck!

Now I would like to add an automatic test to compare the SA and MSS data - to assure the data transfer has been successful. (Breck's script does check whether all rows are transferred correctly, but I would like to check the contents, too.) - As there are lots of tables, I would prefer a method that can be automated (e.g. by enumerating all tables via a loop over the schema or the like).

But that's more difficult than I'd thought:

  1. My first intent was to use the UNLOAD statement against both the local SA 12 and the remote MSS SQL data (by temporarily adding a proxy table for each MSS table) and then unloading to two different folders and comparing the results.
    This approach seems to fail since UNLOAD from a proxy table is not supported ( on Windows 32-bit). It returns SQLCODE -134 (SQLE_NOT_IMPLEMENTED - Feature not supported).

  2. OUTPUT from proxy tables does work, but is not available within procedures or code blocks - making it difficult to use that without coding all the tables I want to compare.

  3. Another attempt was to use the EXCEPT/UNION statements as in this answer in order to find the differing rows. But that does not work as expected, either: Though both EXCEPTs (local except proxy and proxy except local) return empty result sets (as expected), the UNION ALL of both returns the local table's contents. So these (combined) set operations seem not to work correctly with proxy tables.

Do you have any ideas for further options?

asked 18 May '11, 05:13

Volker%20Barth's gravatar image

Volker Barth
accept rate: 34%

In the end, I used a combined approach:

Using a code block to enumerate all relevant tables and build a bunch of SQL statements for each, namely to

  • create the according proxy table, select all those rows, output them in PK ordering and drop the proxy table
  • select all rows from the local table and output them to a different file

Those statements were stored as strings in a temporary table. Finally the result set was OUTPUTed again to build a SQL script. By design, the statements do form a long SQL batch but no code blocks, so OUTPUT can be used within.

Thas script was run with DBISQL (which can OUTPUT proxy tables), and finally I've DIFFed the data files.

Quite slow (particulary compared to the very fast data transfer when exporting the SA data to MSS), but it's automated:)

And now I have noticed that missing the KEEPNULLS option in MS SQL BULK INSERT definetely has introduced some data changes - some rows with NULLs in SA have default values in MS ... - obviously not just relying on "it has the same number of rows in both databases" was a senseful approach ...

permanent link

answered 18 May '11, 08:41

Volker%20Barth's gravatar image

Volker Barth
accept rate: 34%

So one can conclude that using OUTPUT and using loops within code blocks do not exclude each other (as I had initially thought) - as long as one uses the latter to generate a batch of statements containing the former statement:)

(18 May '11, 08:43) Volker Barth

Maybe this is too simplistic an answer (and the performance is probably not where you need it to be), but suppose your local table is tSA and the corresponding SQL Server table is tMSS, could you not simply compare the value returned by:

select count(*) from tSA


select count(*) from (select * from tSA union select * from tMSS) d

and see if the two counts are the same. Assuming that you have already verified that both tSA and tMSS have the same number of rows, the above union should give you the same count as tSA if both tables are identical.

permanent link

answered 18 May '11, 08:17

Karim%20Khamis's gravatar image

Karim Khamis
accept rate: 40%

This is an interesting approach I'm gonna check...

I tried a similar approach, namely instead of building the "differing" result set

(select 'SA', * from tSA EXCEPT ALL 'SA', * from tMS)
(select 'MS', * from tMS EXCEPT ALL 'MS', * from tSA)
order by 2, 1 desc

just to count the rows of that query (put as a derived table) - counting seems to work.

Are there known restrictions w.r.t. combinations of UNION/EXCEPT and proxy tables?

(18 May '11, 08:30) Volker Barth

Karim's solution is quite interesting; Volker, you may have missed that it actually is comparing the two tables for equality because of the UNION DISTINCT. You could wrap it up as one statement:

select ( select count(*) from tSA ) as cntSA
     , ( select count(*) from tMS ) as cntMS
     , if cntSA=cntMS 
       and cntSA=(select count(*) 
                  from ( select * from tSA 
                         UNION DISTINCT 
                         select * from tMS 
                        ) DT 
       then 'equal' else 'not equal' endif

If there is a difference in a row between SA and MS then it will appear two times in the UNION DISTINCT. Note that this won't find differences in string case or some other kinds of differences.

Of course, this also does not output the rows that differ, but it will be a more efficient way to verify that two tables are equal. It should be at least 50% faster than the EXCEPT ALL approach.

The GROUP BY approach that counts rows in tSA and tMS then compares is more powerful (handles duplicates, outputs mismatches) but it is harder to code up because you can not use *. Karim's solution is simple and handles any tables with the same schema. Nice!

(20 May '11, 16:02) Ivan T. Bowman
Replies hidden

@Ivan: Thanks for the addition! I'm aware of use of UNION in Karim's approach (more aware than of "UNION DISTINCT" being a synonym for the simple "UNION"...). But your combined query is even more handy.

What my different tests have revealed is the following: The comparison by set operators has showed much more differences in my migrated data sources than the "outputting and diffing". I found out that this is due to different handling of empty strings (''): Those were imported on the MSS side as NULLs. I guess the reason is that when doing the output, I had not used quotes around strings, and that has obviously led to the importing as NULLs.

This is considered a mere cosmetic problem in this case, but apparently (and correctly) the set operators treated '' vs. NULL as not identical whereas the un-quoted output made them appear identical...

Another lesson learnt.

(21 May '11, 18:08) Volker Barth
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text]( "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:


question asked: 18 May '11, 05:13

question was seen: 4,322 times

last updated: 21 May '11, 18:09