SAP SQL Anywhere

Question

Requirement: Gather distinct primary key values from several tables t2 ... tn into one table t1.

Which is faster, this?

INSERT t1 ON EXISTING SKIP
SELECT [primary key columns]
  FROM t2
UNION ALL
SELECT [primary key columns]
  FROM t3
UNION ALL
SELECT [primary key columns]
  FROM t4
...
UNION ALL
SELECT [primary key columns]
  FROM tn;

...or this?

INSERT t1 
SELECT [primary key columns]
  FROM t2
UNION
SELECT [primary key columns]
  FROM t3
UNION
SELECT [primary key columns]
  FROM t4
...
UNION
SELECT [primary key columns]
  FROM tn;

...or, it probably doesn't matter?

Answer 1

I am going to say that "it depends" but my initial thought would be that the second form would be better. My reasoning is that the first form is going to force an index probe into T1 for every insert to check for existence, whereas the second form will first generate a list of unique values (by either doing a sort or more likely generating an - hopefully in-memory - hash of all values) and then doing the insert thus reducing the number of probes into the T1 primary key index.

To verify my intuition I ran the following tests:

Step 1: create tables

create table t1 ( i int primary key );
create table t2 ( i int primary key, s varchar(200) );
create table t3 ( i int primary key, s varchar(200) );
create table t4 ( i int primary key, s varchar(200) );
create table t5 ( i int primary key, s varchar(200) );
create table t6 ( i int primary key, s varchar(200) );
create table t7 ( i int primary key, s varchar(200) );
create table t8 ( i int primary key, s varchar(200) );

Step 2: populate the tables with unique values

insert
  into t2( i, s )
select row_num, 't2 ' || row_num
  from sa_rowgenerator( 20000, 29999 );
insert
  into t3( i, s )
select row_num, 't3 ' || row_num
  from sa_rowgenerator( 30000, 39999 );
insert
  into t4( i, s )
select row_num, 't4 ' || row_num
  from sa_rowgenerator( 40000, 49999 );
insert
  into t5( i, s )
select row_num, 't5 ' || row_num
  from sa_rowgenerator( 50000, 59999 );
insert
  into t6( i, s )
select row_num, 't6 ' || row_num
  from sa_rowgenerator( 60000, 69999 );
insert
  into t7( i, s )
select row_num, 't7 ' || row_num
  from sa_rowgenerator( 70000, 79999 );
insert
  into t8( i, s )
select row_num, 't8 ' || row_num
  from sa_rowgenerator( 80000, 89999 );
commit;

Step 3: run first form - insert ... on existing skip

drop table t1;
create table t1 ( i int primary key );
insert
  into t1( i )
    on existing skip
select t2.i from t2
union all
select t3.i from t3
union all
select t4.i from t4
union all
select t5.i from t5
union all
select t6.i from t6
union all
select t7.i from t7
union all
select t8.i from t8
;

Repeat step 3 five times in total, recording the time of the insert on each iteration

Step 4: run second form - insert ... union ...

drop table t1;
create table t1 ( i int primary key );
insert
  into t1( i )
select t2.i from t2
union
select t3.i from t3
union
select t4.i from t4
union
select t5.i from t5
union
select t6.i from t6
union
select t7.i from t7
union
select t8.i from t8
;

Repeat step 4 five times in total, recording the time of the insert on each iteration

Step 5: drop all tables

drop table t1;
drop table t2;
drop table t3;
drop table t4;
drop table t5;
drop table t6;
drop table t7;
drop table t8;

then repeat step 1, then populate the tables with non-unique values (as a new form of step 2)

insert
  into t2( i, s )
select row_num, 't2 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t3( i, s )
select row_num, 't3 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t4( i, s )
select row_num, 't4 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t5( i, s )
select row_num, 't5 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t6( i, s )
select row_num, 't6 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t7( i, s )
select row_num, 't7 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t8( i, s )
select row_num, 't8 ' || row_num
  from sa_rowgenerator( 1, 10000 );
commit;

then repeat steps 3 and 4 to get new timings.

The results that I got on my dual core laptop are:

unique values 'on existing skip': 2.1, 2.1, 2.2, 2.1, 2.1
unique values '... union ...': 1.4, 1.5, 1.5, 1.5, 1.5
non-unique 'on existing skip': 1.1, 1.1, 1.3, 1.2, 1.2
non-unique '... union ...': 0.3, 0.3, 0.3, 0.3, 0.3

So in both the unique value case and the non-unique value case the second form (using 'union') is faster. Note that in my case the amount of data being used fits into cache and therefore for larger data sets you may get different results.

HTH

SAP SQL Anywhere

Which is faster, ON EXISTING SKIP or UNION DISTINCT?

Follow this question

Related questions