This is a late follow-up on this question: Say, I have two tables T1 and T2 with identical schema and nearly identical contents in the same database. They have a primary key pk. Now I want to list those rows from table T1 (i.e. select T1.*) that don't exist at all in T2 - i.e. T1's primary key doesn't exist in T2. I can think of Or should I prefer the way that does seem most logical/comprehensible to me? (Yes, Breck, of course I should...) In my tests, the queries do use different plans and perform not identical well (and I'm using sa_flush_cache()) in between). But I would like if there's a general answer or at least a rule-of-thumb. The result sets of approaches A-D should be identical. FWIW, some test results with 12.0.1.3389 on a "slow" box: A: 1750 - 1875 ms So my current personal preference is A, both because of semantics and performance... |
Approach C: Use a left join and filter out the joined rows select T1.* from T1 left join T2 on T1.pk = T2.pk where T2.pk is null order by T1.pk Personally, I think of this as a rather "tricky" (i.e. incomprehensible) approach. I like this approach best.. it's the cleanest to my fuzzy brain, although I would probably typically write it as you did in Approach B because that's the way I'd incrementally think it through. The disadvantage with approach A is that the next guy that comes along to work in your code is not going to understand it. Now, I'd like to think this is an ad-hoc query, or at least a daily or weekly report, rather than a real production frequently run query. In case this assumption is correct, the diference in processing time does't matter. I read somewhere that computer code is first meant to be read by humans, not computers. I agree with that. your code needs to be intelligible to the next guy... the computer just doesn't care how eloquent you are.
(27 Aug '11, 20:29)
Ron Hiner
Replies hidden
@Ron: You're correct that this has come up while doing some ad-hoc queries. Nevertheless, it's a general question that I have beared for a long time, as the according query task used often as "building block" of more complex queries...
(28 Aug '11, 05:42)
Volker Barth
to me the way to go is an outer join with a where clause like C. The left join lists all matches in T.2 as well as all rows from T1. I would think this would be the fastest, but i would have to do some testing. it's also the least complicated which in my book most of the time is the way to go.
(31 Aug '11, 14:34)
izatt82
|
Approach B: Use a NOT IN subquery select * from T1 where pk not in (select pk from T2) order by pk That might be the "classical" approach. |
I think, that the best approach depends on data statistics in both tables. I am personally prefer human readable, clear and easy to understand sql queries. Personally for me, approach C (leader) and approach B is much easy to read and understand. But,data statistics is very important. |
Approach A: Use an IN subquery with a table with EXCEPT select * from T1 where pk in (select pk from T1 except select pk from T2) order by pk
CORRECTION: My assumption about this transformation seem (partly) wrong. Whereas calling PLAN() on both queries yields similar results, calling REWRITE() leads to different optimizations, and the actual query execution time is different, as well. Therefore I turn the JOIN version into a forth answer as approach D. I guess the set operators EXCEPT and INTERSECT are still somewhat underestimated. Personally, I like to use them as they cope fine with SQL's set-based logic. For performance reasons, one might use "EXCEPT ALL" instead of "EXCEPT" here - by definition the "select pk from T1" cannot contain duplicates (it simply selects the PRIMARY KEY), so the default "Duplicate elimination algorithm" use by EXCEPT would be unnecessary...
(28 Aug '11, 05:53)
Volker Barth
|
Approach D: Use a JOIN with a derived table with EXCEPT select T1.* from T1 inner join (select pk from T1 except select pk from T2) DT on T1.pk = DT.pk order by T1.pk I thought this was just a rewrite of approach A but it performs different (and better in my tests). See the notes on EXCEPT ALL vs. EXCEPT on approach A.
(29 Aug '11, 04:27)
Volker Barth
|
I am looking forward to the answer here because I usually pick B which evidently performs worst.
Any hints to general performance hints from the Sybase query experts like Glenn or Ivan - or does it read "it depends"?
Anybody from the Alpha class listening?
Cf. Breck's blog
Yes, I'm still interested in more opinions...
From the current remarks, I would think that SQL language features like EXCEPT still seem rather unfamiliar, and that therefore one would prefer not to use them - from a practical point of view: Even if oneself does understand them, the next guy might not.
I do appreciate this opinion, but such kind of thinking might freeze the SQL usage to something like SQL/92 in the long run...
Glenn, Ivan, anybody - any hints would still be highly appreciated ... particularly as the public opinion seems to prefer those approaches that do perform worse in this case... (which is alright with me, of course!).
See the tag I've added:)