I am going to say that "it depends" but my initial thought would be that the second form would be better. My reasoning is that the first form is going to force an index probe into T1 for every insert to check for existence, whereas the second form will first generate a list of unique values (by either doing a sort or more likely generating an - hopefully in-memory - hash of all values) and then doing the insert thus reducing the number of probes into the T1 primary key index.
To verify my intuition I ran the following tests:
Step 1: create tables
create table t1 ( i int primary key );
create table t2 ( i int primary key, s varchar(200) );
create table t3 ( i int primary key, s varchar(200) );
create table t4 ( i int primary key, s varchar(200) );
create table t5 ( i int primary key, s varchar(200) );
create table t6 ( i int primary key, s varchar(200) );
create table t7 ( i int primary key, s varchar(200) );
create table t8 ( i int primary key, s varchar(200) );
Step 2: populate the tables with unique values
insert
into t2( i, s )
select row_num, 't2 ' || row_num
from sa_rowgenerator( 20000, 29999 );
insert
into t3( i, s )
select row_num, 't3 ' || row_num
from sa_rowgenerator( 30000, 39999 );
insert
into t4( i, s )
select row_num, 't4 ' || row_num
from sa_rowgenerator( 40000, 49999 );
insert
into t5( i, s )
select row_num, 't5 ' || row_num
from sa_rowgenerator( 50000, 59999 );
insert
into t6( i, s )
select row_num, 't6 ' || row_num
from sa_rowgenerator( 60000, 69999 );
insert
into t7( i, s )
select row_num, 't7 ' || row_num
from sa_rowgenerator( 70000, 79999 );
insert
into t8( i, s )
select row_num, 't8 ' || row_num
from sa_rowgenerator( 80000, 89999 );
commit;
Step 3: run first form - insert ... on existing skip
drop table t1;
create table t1 ( i int primary key );
insert
into t1( i )
on existing skip
select t2.i from t2
union all
select t3.i from t3
union all
select t4.i from t4
union all
select t5.i from t5
union all
select t6.i from t6
union all
select t7.i from t7
union all
select t8.i from t8
;
Repeat step 3 five times in total, recording the time of the insert on each iteration
Step 4: run second form - insert ... union ...
drop table t1;
create table t1 ( i int primary key );
insert
into t1( i )
select t2.i from t2
union
select t3.i from t3
union
select t4.i from t4
union
select t5.i from t5
union
select t6.i from t6
union
select t7.i from t7
union
select t8.i from t8
;
Repeat step 4 five times in total, recording the time of the insert on each iteration
Step 5: drop all tables
drop table t1;
drop table t2;
drop table t3;
drop table t4;
drop table t5;
drop table t6;
drop table t7;
drop table t8;
then repeat step 1, then populate the tables with non-unique values (as a new form of step 2)
insert
into t2( i, s )
select row_num, 't2 ' || row_num
from sa_rowgenerator( 1, 10000 );
insert
into t3( i, s )
select row_num, 't3 ' || row_num
from sa_rowgenerator( 1, 10000 );
insert
into t4( i, s )
select row_num, 't4 ' || row_num
from sa_rowgenerator( 1, 10000 );
insert
into t5( i, s )
select row_num, 't5 ' || row_num
from sa_rowgenerator( 1, 10000 );
insert
into t6( i, s )
select row_num, 't6 ' || row_num
from sa_rowgenerator( 1, 10000 );
insert
into t7( i, s )
select row_num, 't7 ' || row_num
from sa_rowgenerator( 1, 10000 );
insert
into t8( i, s )
select row_num, 't8 ' || row_num
from sa_rowgenerator( 1, 10000 );
commit;
then repeat steps 3 and 4 to get new timings.
The results that I got on my dual core laptop are:
- unique values 'on existing skip': 2.1, 2.1, 2.2, 2.1, 2.1
- unique values '... union ...': 1.4, 1.5, 1.5, 1.5, 1.5
- non-unique 'on existing skip': 1.1, 1.1, 1.3, 1.2, 1.2
- non-unique '... union ...': 0.3, 0.3, 0.3, 0.3, 0.3
So in both the unique value case and the non-unique value case the second form (using 'union') is faster. Note that in my case the amount of data being used fits into cache and therefore for larger data sets you may get different results.
HTH
Is there an underlying assumption that t1 does not contain any PK values from t2..tn beforehand? Otherwise, the second statement is doomed to fail, and that would matter...
Note that in order to use the ON EXISTING SKIP clause the table must have a primary key. So if the table does not have a PK then the first statement is doomed to fail. My answer (see below) assumes that T1 had the same PK columns as the T2, T3, ... Tn.
@Mark: To clarify: My comment was not dealing with the existence of a PK column in t1 (that a necessity, of course), but with the question whether the table t1 might already contain any rows with PK values that also exist in the other tables.
@Everyone: Sorry for the implicit assumption: The schema and data are such that both methods actually work :)
@Mark: "doomed to fail"... I need that on a coffee cup, I'm too fat to wear it on a tshirt :)
@Breck: Hey, that was mine contribution, and I definetely owe it to Joe Jackson's "Beat Crazy" album:)