The forum will experience an outage sometime between February 10 at 7:00pm EST and February 12 at 11:59 EST for installation of security updates. The actual time and duration of the outage are unknown but attempts will be made to minimize the downtime. We apologize for any inconvenience.

# Which is faster, ON EXISTING SKIP or UNION DISTINCT?

 Requirement: Gather distinct primary key values from several tables t2 ... tn into one table t1. Which is faster, this? ```INSERT t1 ON EXISTING SKIP SELECT [primary key columns] FROM t2 UNION ALL SELECT [primary key columns] FROM t3 UNION ALL SELECT [primary key columns] FROM t4 ... UNION ALL SELECT [primary key columns] FROM tn; ``` ...or this? ```INSERT t1 SELECT [primary key columns] FROM t2 UNION SELECT [primary key columns] FROM t3 UNION SELECT [primary key columns] FROM t4 ... UNION SELECT [primary key columns] FROM tn; ``` ...or, it probably doesn't matter? asked 07 Nov '10, 16:20 Breck Carter 26.6k●418●575●824 accept rate: 21% Volker Barth 29.3k●287●438●644 1 Is there an underlying assumption that t1 does not contain any PK values from t2..tn beforehand? Otherwise, the second statement is doomed to fail, and that would matter... (07 Nov '10, 21:03) Volker Barth Note that in order to use the ON EXISTING SKIP clause the table must have a primary key. So if the table does not have a PK then the first statement is doomed to fail. My answer (see below) assumes that T1 had the same PK columns as the T2, T3, ... Tn. (07 Nov '10, 21:57) Mark Culp Comment Text Removed @Mark: To clarify: My comment was not dealing with the existence of a PK column in t1 (that a necessity, of course), but with the question whether the table t1 might already contain any rows with PK values that also exist in the other tables. (08 Nov '10, 09:07) Volker Barth @Everyone: Sorry for the implicit assumption: The schema and data are such that both methods actually work :) (08 Nov '10, 11:04) Breck Carter 1 @Mark: "doomed to fail"... I need that on a coffee cup, I'm too fat to wear it on a tshirt :) (08 Nov '10, 11:07) Breck Carter @Breck: Hey, that was mine contribution, and I definetely owe it to Joe Jackson's "Beat Crazy" album:) (08 Nov '10, 11:12) Volker Barth More comments hidden showing 4 of 6 show all flat view

 I am going to say that "it depends" but my initial thought would be that the second form would be better. My reasoning is that the first form is going to force an index probe into T1 for every insert to check for existence, whereas the second form will first generate a list of unique values (by either doing a sort or more likely generating an - hopefully in-memory - hash of all values) and then doing the insert thus reducing the number of probes into the T1 primary key index. To verify my intuition I ran the following tests: Step 1: create tables ``````create table t1 ( i int primary key ); create table t2 ( i int primary key, s varchar(200) ); create table t3 ( i int primary key, s varchar(200) ); create table t4 ( i int primary key, s varchar(200) ); create table t5 ( i int primary key, s varchar(200) ); create table t6 ( i int primary key, s varchar(200) ); create table t7 ( i int primary key, s varchar(200) ); create table t8 ( i int primary key, s varchar(200) ); `````` Step 2: populate the tables with unique values ``````insert into t2( i, s ) select row_num, 't2 ' || row_num from sa_rowgenerator( 20000, 29999 ); insert into t3( i, s ) select row_num, 't3 ' || row_num from sa_rowgenerator( 30000, 39999 ); insert into t4( i, s ) select row_num, 't4 ' || row_num from sa_rowgenerator( 40000, 49999 ); insert into t5( i, s ) select row_num, 't5 ' || row_num from sa_rowgenerator( 50000, 59999 ); insert into t6( i, s ) select row_num, 't6 ' || row_num from sa_rowgenerator( 60000, 69999 ); insert into t7( i, s ) select row_num, 't7 ' || row_num from sa_rowgenerator( 70000, 79999 ); insert into t8( i, s ) select row_num, 't8 ' || row_num from sa_rowgenerator( 80000, 89999 ); commit; `````` Step 3: run first form - insert ... on existing skip ``````drop table t1; create table t1 ( i int primary key ); insert into t1( i ) on existing skip select t2.i from t2 union all select t3.i from t3 union all select t4.i from t4 union all select t5.i from t5 union all select t6.i from t6 union all select t7.i from t7 union all select t8.i from t8 ; `````` Repeat step 3 five times in total, recording the time of the insert on each iteration Step 4: run second form - insert ... union ... ``````drop table t1; create table t1 ( i int primary key ); insert into t1( i ) select t2.i from t2 union select t3.i from t3 union select t4.i from t4 union select t5.i from t5 union select t6.i from t6 union select t7.i from t7 union select t8.i from t8 ; `````` Repeat step 4 five times in total, recording the time of the insert on each iteration Step 5: drop all tables ``````drop table t1; drop table t2; drop table t3; drop table t4; drop table t5; drop table t6; drop table t7; drop table t8; `````` then repeat step 1, then populate the tables with non-unique values (as a new form of step 2) ``````insert into t2( i, s ) select row_num, 't2 ' || row_num from sa_rowgenerator( 1, 10000 ); insert into t3( i, s ) select row_num, 't3 ' || row_num from sa_rowgenerator( 1, 10000 ); insert into t4( i, s ) select row_num, 't4 ' || row_num from sa_rowgenerator( 1, 10000 ); insert into t5( i, s ) select row_num, 't5 ' || row_num from sa_rowgenerator( 1, 10000 ); insert into t6( i, s ) select row_num, 't6 ' || row_num from sa_rowgenerator( 1, 10000 ); insert into t7( i, s ) select row_num, 't7 ' || row_num from sa_rowgenerator( 1, 10000 ); insert into t8( i, s ) select row_num, 't8 ' || row_num from sa_rowgenerator( 1, 10000 ); commit; `````` then repeat steps 3 and 4 to get new timings. The results that I got on my dual core laptop are: unique values 'on existing skip': 2.1, 2.1, 2.2, 2.1, 2.1 unique values '... union ...': 1.4, 1.5, 1.5, 1.5, 1.5 non-unique 'on existing skip': 1.1, 1.1, 1.3, 1.2, 1.2 non-unique '... union ...': 0.3, 0.3, 0.3, 0.3, 0.3 So in both the unique value case and the non-unique value case the second form (using 'union') is faster. Note that in my case the amount of data being used fits into cache and therefore for larger data sets you may get different results. HTH answered 07 Nov '10, 21:07 Mark Culp 22.3k●9●129●262 accept rate: 40% 3 Cool! You have verified Murphy's Law for me: I picked Method 1 to use, before reading your reply :) (08 Nov '10, 11:06) Breck Carter
 toggle preview community wiki:

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported

Question tags:

×238
×6

question asked: 07 Nov '10, 16:20

question was seen: 1,367 times

last updated: 08 Nov '10, 02:29