Best performance on large table inserts

Two tables:
A (col1, col2, col3, primary key (col1, col2))
B (col1, col2, col3, col4, col5, primary key (col1, col2))

With table B having thousands of records, which of the following statements would be the least expensive?

Option 1:
insert into A (col1, col2, col3) on existing skip select col1, col2, col3 from B

Option 2:
insert into A (col1,col2,col3) select col1, col2, col3 from B where not exists (select * from A where col1=b.col1 and col2=b.col2)

My gut tells me they're exactly the same. But I need more than my gut to go on.

performance

asked 13 Aug '14, 05:12

Liam
361●9●11●18
accept rate: 0%

No, I don't claim to know the answer. But I wanted to add, if you're on SQLA 11 or higher, there's a 3rd method (example not verified):

MERGE INTO A (col1, col2, col3)
USING B ON A.col1 = B.col1 and A.col2 = B.col2
WHEN NOT MATCHED INSERT

IMHO all tree statements should be broken down by the query engine to similar sequences of execution, so there should be no real difference in performance.

But that's speculative, so to get more detailed information you might want to execute both statements in identical environments, capture the graphical plans and analyze them for deviations.

(13 Aug '14, 08:43) Reimer Pods

OK just a guess but I would think Option 1 is more efficient when dealing with a small number of rows and option 2 when dealing with large inserts. I believe some DBMS's just ignore the error with option 1.

Jim

(13 Aug '14, 19:00) J Diaz

Replies hidden

I'm with Jim on this one. Thanks guys

(17 Aug '14, 00:48) Liam

I believe some DBMS's just ignore the error with option 1.

FWIW, I believe "INSERT ... ON EXISTING ..." is primarily a SQL Anywhere vendor extension so I don't really understand the "some DBMS's" statement... - of course, other DBMS like MySQL have somwhat similar extensions like "INSERT ... ON DUPLICATE KEY UPDATE..."

(19 Aug '14, 04:33) Volker Barth

flat view

One Answer:

active answers oldest answers newest answers popular answers

Predicting the performance of real-world SQL queries by visually inspecting stripped-down sample code has almost no chance of success.
A three-column table with only "thousands of rows" may be regarded as a very small table by SQL Anywhere, not worthy of optimization. Exceptions exist, of course; e.g. this question about a 6,000 row table with an INTEGER primary key index that consumes 1.1G of disk space :)
In this case (comparing ON EXISTING SKIP with WHERE NOT EXISTS) it's not difficult to actually code and test BOTH versions, in the real-world form, and look at the Graphical Plan With Statistics... testing is the only reliable way to determine performance...
...and even then, it's not perfect, because when conditions change sometimes so do the plans.

Having said that, my guess is they will have the same plans and the same performance. I am VERY PROUD of my guesswork... alt text

permanent link

answered 19 Aug '14, 06:22

Breck Carter
32.5k●541●727●1050
accept rate: 20%

flat view

Your answer

toggle preview

community wiki:

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

*italic* or _italic_
**bold** or __bold__
link:[text](http://url.com/ "title")
image?![alt text](/path/img.jpg "title")
numbered list: 1. Foo 2. Bar
to add a line break simply add two spaces to where you would like the new line to be.
basic HTML tags are also supported

learn more about Markdown

Question tags:

performance ×276

question asked: 13 Aug '14, 05:12

question was seen: 1,307 times

last updated: 19 Aug '14, 06:22

SAP SQL Anywhere

Best performance on large table inserts

Follow this question

Related questions