Please be aware that the content in SAP SQL Anywhere Forum will be migrated to the SAP Community in June and this forum will be retired.

Requirement: Gather distinct primary key values from several tables t2 ... tn into one table t1.

Which is faster, this?

INSERT t1 ON EXISTING SKIP
SELECT [primary key columns]
  FROM t2
UNION ALL
SELECT [primary key columns]
  FROM t3
UNION ALL
SELECT [primary key columns]
  FROM t4
...
UNION ALL
SELECT [primary key columns]
  FROM tn;

...or this?

INSERT t1 
SELECT [primary key columns]
  FROM t2
UNION
SELECT [primary key columns]
  FROM t3
UNION
SELECT [primary key columns]
  FROM t4
...
UNION
SELECT [primary key columns]
  FROM tn;

...or, it probably doesn't matter?

asked 07 Nov '10, 16:20

Breck%20Carter's gravatar image

Breck Carter
32.5k5417261050
accept rate: 20%

edited 07 Nov '10, 21:28

Volker%20Barth's gravatar image

Volker Barth
40.2k361550822

1

Is there an underlying assumption that t1 does not contain any PK values from t2..tn beforehand? Otherwise, the second statement is doomed to fail, and that would matter...

(07 Nov '10, 21:03) Volker Barth

Note that in order to use the ON EXISTING SKIP clause the table must have a primary key. So if the table does not have a PK then the first statement is doomed to fail. My answer (see below) assumes that T1 had the same PK columns as the T2, T3, ... Tn.

(07 Nov '10, 21:57) Mark Culp
Comment Text Removed

@Mark: To clarify: My comment was not dealing with the existence of a PK column in t1 (that a necessity, of course), but with the question whether the table t1 might already contain any rows with PK values that also exist in the other tables.

(08 Nov '10, 09:07) Volker Barth

@Everyone: Sorry for the implicit assumption: The schema and data are such that both methods actually work :)

(08 Nov '10, 11:04) Breck Carter
1

@Mark: "doomed to fail"... I need that on a coffee cup, I'm too fat to wear it on a tshirt :)

(08 Nov '10, 11:07) Breck Carter

@Breck: Hey, that was mine contribution, and I definetely owe it to Joe Jackson's "Beat Crazy" album:)

(08 Nov '10, 11:12) Volker Barth
More comments hidden
showing 4 of 6 show all flat view

I am going to say that "it depends" but my initial thought would be that the second form would be better. My reasoning is that the first form is going to force an index probe into T1 for every insert to check for existence, whereas the second form will first generate a list of unique values (by either doing a sort or more likely generating an - hopefully in-memory - hash of all values) and then doing the insert thus reducing the number of probes into the T1 primary key index.

To verify my intuition I ran the following tests:

Step 1: create tables

create table t1 ( i int primary key );
create table t2 ( i int primary key, s varchar(200) );
create table t3 ( i int primary key, s varchar(200) );
create table t4 ( i int primary key, s varchar(200) );
create table t5 ( i int primary key, s varchar(200) );
create table t6 ( i int primary key, s varchar(200) );
create table t7 ( i int primary key, s varchar(200) );
create table t8 ( i int primary key, s varchar(200) );

Step 2: populate the tables with unique values

insert
  into t2( i, s )
select row_num, 't2 ' || row_num
  from sa_rowgenerator( 20000, 29999 );
insert
  into t3( i, s )
select row_num, 't3 ' || row_num
  from sa_rowgenerator( 30000, 39999 );
insert
  into t4( i, s )
select row_num, 't4 ' || row_num
  from sa_rowgenerator( 40000, 49999 );
insert
  into t5( i, s )
select row_num, 't5 ' || row_num
  from sa_rowgenerator( 50000, 59999 );
insert
  into t6( i, s )
select row_num, 't6 ' || row_num
  from sa_rowgenerator( 60000, 69999 );
insert
  into t7( i, s )
select row_num, 't7 ' || row_num
  from sa_rowgenerator( 70000, 79999 );
insert
  into t8( i, s )
select row_num, 't8 ' || row_num
  from sa_rowgenerator( 80000, 89999 );
commit;

Step 3: run first form - insert ... on existing skip

drop table t1;
create table t1 ( i int primary key );
insert
  into t1( i )
    on existing skip
select t2.i from t2
union all
select t3.i from t3
union all
select t4.i from t4
union all
select t5.i from t5
union all
select t6.i from t6
union all
select t7.i from t7
union all
select t8.i from t8
;

Repeat step 3 five times in total, recording the time of the insert on each iteration

Step 4: run second form - insert ... union ...

drop table t1;
create table t1 ( i int primary key );
insert
  into t1( i )
select t2.i from t2
union
select t3.i from t3
union
select t4.i from t4
union
select t5.i from t5
union
select t6.i from t6
union
select t7.i from t7
union
select t8.i from t8
;

Repeat step 4 five times in total, recording the time of the insert on each iteration

Step 5: drop all tables

drop table t1;
drop table t2;
drop table t3;
drop table t4;
drop table t5;
drop table t6;
drop table t7;
drop table t8;

then repeat step 1, then populate the tables with non-unique values (as a new form of step 2)

insert
  into t2( i, s )
select row_num, 't2 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t3( i, s )
select row_num, 't3 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t4( i, s )
select row_num, 't4 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t5( i, s )
select row_num, 't5 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t6( i, s )
select row_num, 't6 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t7( i, s )
select row_num, 't7 ' || row_num
  from sa_rowgenerator( 1, 10000 );
insert
  into t8( i, s )
select row_num, 't8 ' || row_num
  from sa_rowgenerator( 1, 10000 );
commit;

then repeat steps 3 and 4 to get new timings.

The results that I got on my dual core laptop are:

  • unique values 'on existing skip': 2.1, 2.1, 2.2, 2.1, 2.1
  • unique values '... union ...': 1.4, 1.5, 1.5, 1.5, 1.5
  • non-unique 'on existing skip': 1.1, 1.1, 1.3, 1.2, 1.2
  • non-unique '... union ...': 0.3, 0.3, 0.3, 0.3, 0.3

So in both the unique value case and the non-unique value case the second form (using 'union') is faster. Note that in my case the amount of data being used fits into cache and therefore for larger data sets you may get different results.

HTH

permanent link

answered 07 Nov '10, 21:07

Mark%20Culp's gravatar image

Mark Culp
24.9k10141297
accept rate: 41%

edited 08 Nov '10, 02:29

3

Cool! You have verified Murphy's Law for me: I picked Method 1 to use, before reading your reply :)

(08 Nov '10, 11:06) Breck Carter
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×275
×7

question asked: 07 Nov '10, 16:20

question was seen: 2,988 times

last updated: 08 Nov '10, 02:29