The forum will experience an outage sometime between February 10 at 7:00pm EST and February 12 at 11:59 EST for installation of security updates. The actual time and duration of the outage are unknown but attempts will be made to minimize the downtime. We apologize for any inconvenience.

Fairly often, a customer will ask me to refresh their Test database with Production. Sometimes, if the customer is performing certain tests using data stored only in the Test database, I will need to manually input data from the Test application to the Test database, after it has been refreshed with Production. It can be difficult to know for certain, all the data which needs to be contained within the newly refreshed Test database. Does anyone know a method that I might use to quickly compare the data in a certain subset of tables? I am thinking about a procedure which will output the data of approximately a dozen tables to one test file. Prior to refreshing Test, I would run this procedure against both Test and Production and use a "vdiff" type utility to compare text in the two output files. Currently, I have unload statements which output data from the target tables to individual files. Does anyone know a similar process which would result in a single file for each database? Thanks.

asked 28 Apr '12, 09:01

dejstone's gravatar image

accept rate: 0%

edited 28 Apr '12, 10:38

Just a link to a somewhat question (and discussion):

How can I compare data from proxy tables and local tables?.

And I would surely agree with Glenn's warning in case you are about to compare the results of a heavy join - I would recommend a table-by-table comparison...

(29 Apr '12, 07:24) Volker Barth

We have to do this kind of compare on a regular basis to test the results of complex algorithms in our application software. We also have recovery situations where we selectively restore data.

There are two methodologies that we use.

  1. We unload one database's tables using the unload statement from SQL Central - this with the unload method that creates the tables on reload. We then edit the table names to be recreated on the load (e.g. table name > tablename_test) We then execute the load script which populates the test data in the database. We can then do various NOT EXISTS queries or variation compares on column values for matching keys between tablename and tablename_test. Note that you can you can use the MERGE Statement to insert rows from one into the other based on conditional tests (MERGE is a fantastic tool for this stuff and modifying the data in the source (insert) tables gives you a lot of control over the final result). I note that in your case you might be going from production to test - we have had to do a similar process when we needed to selectively restore information from a historical backup.

  2. For prolonged complicated sets with lots of columns (some containing text) we export the tables into two excel spreadsheets (use the new version of excel that can handle a large number of rows). We then use an Excel Compare software that does a fantastic job of identifying and highlighting differences - including Inserts and Deletes.

permanent link

answered 29 Apr '12, 13:00

Glenn%20Barber's gravatar image

Glenn Barber
accept rate: 0%

edited 29 Apr '12, 13:27

You can use the UNLOAD SELECT statement to output the result of a query which (presumably) will join all of the necessary tables so that the output is a single file.

This approach will work well only if you have a star-schema (or something close) that will limit the size of the output file and, more importantly, make it easier to determine where the missing/additional data is coming from. My point is that if you're joining ten tables, and then compare the result from two databases, it can be considerably harder to pinpoint the missing/extra data if the output file differs by tens or hundreds of thousands of rows - all because 5 rows were missing from a single table.

Another way of doing the comparison is to set up proxy tables to the other database, and then compose a FULL OUTER JOIN between the like tables in each database. This avoids the need to export the database contents to a flat file, but will not perform as well for large datasets.

permanent link

answered 28 Apr '12, 09:45

Glenn%20Paulley's gravatar image

Glenn Paulley
accept rate: 43%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text]( "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:


question asked: 28 Apr '12, 09:01

question was seen: 856 times

last updated: 29 Apr '12, 13:27