This is a late follow-up on this question:
Say, I have two tables T1 and T2 with identical schema and nearly identical contents in the same database. They have a primary key pk.
Now I want to list those rows from table T1 (i.e. select T1.*) that don't exist at all in T2 - i.e. T1's primary key doesn't exist in T2.
I can think of
Or should I prefer the way that does seem most logical/comprehensible to me?
(Yes, Breck, of course I should...)
In my tests, the queries do use different plans and perform not identical well (and I'm using sa_flush_cache()) in between). But I would like if there's a general answer or at least a rule-of-thumb.
The result sets of approaches A-D should be identical.
FWIW, some test results with 188.8.131.5289 on a "slow" box:
A: 1750 - 1875 ms
So my current personal preference is A, both because of semantics and performance...
Approach C: Use a left join and filter out the joined rows
select T1.* from T1 left join T2 on T1.pk = T2.pk where T2.pk is null order by T1.pk
Personally, I think of this as a rather "tricky" (i.e. incomprehensible) approach.
I think, that the best approach depends on data statistics in both tables. I am personally prefer human readable, clear and easy to understand sql queries. Personally for me, approach C (leader) and approach B is much easy to read and understand. But,data statistics is very important.
Approach A: Use an IN subquery with a table with EXCEPT
select * from T1 where pk in (select pk from T1 except select pk from T2) order by pk
CORRECTION: My assumption about this transformation seem (partly) wrong. Whereas calling PLAN() on both queries yields similar results, calling REWRITE() leads to different optimizations, and the actual query execution time is different, as well. Therefore I turn the JOIN version into a forth answer as approach D.
I guess the set operators EXCEPT and INTERSECT are still somewhat underestimated. Personally, I like to use them as they cope fine with SQL's set-based logic.
Approach D: Use a JOIN with a derived table with EXCEPT
select T1.* from T1 inner join (select pk from T1 except select pk from T2) DT on T1.pk = DT.pk order by T1.pk
I thought this was just a rewrite of approach A but it performs different (and better in my tests).
answered 29 Aug '11, 04:23