Say, I have a child table (1:n) and want to find out which (if any) parent has entries with different values in the child table.

Sounds unclear, methinks, so I try to be more concrete:

For example, in the SA 12 sample database, I want to find out whether any Customer has entries in the SalesOrders table with different regions. So, obviously, a GROUP BY is needed, and CustomerID is the parent id.

But how do I "count" the different region entries? - Since there are different entries per CustomerID, a simple count(*) won't do.

Classically, I have used min/max-comparisons for that, e.g.

select CustomerID, min(Region), max(Region) from SalesOrders
group by CustomerID
having min(Region) <> max(Region)
order by 1

But a count(distinct) query might seem more adequate, such as

select CustomerID, count(distinct Region) from SalesOrders
group by CustomerID
having count(distinct Region) > 1
order by 1

However, the latter seems (a bit) worse in performance, possibly as it requires a duplicate elimination. (I respect that it does offer more information, as it really counts the different entries whereas the first query just returns those with at least 2 different entries.)

Is there any preferred version for queries like that (both for reasons of comprehensibility and performance)?

asked 14 Oct '10, 12:12

Volker%20Barth's gravatar image

Volker Barth
30.3k301454660
accept rate: 32%

edited 14 Oct '10, 12:21


It's hard (impossible?) to tell for sure with such a small table, but my first guess that query 2 is not only simpler but faster seems to be born out by the Version 12.0.0.2566 runtimes and the plans after calling sa_flush_cache().

Here's the query I might have written; its plan is more complex than query 2 but the runtime looks the same (BUT AGAIN, the table is too small to tell anything)...

CALL sa_flush_cache();

SELECT CustomerID,
       COUNT(*) AS RegionCount
  FROM ( SELECT DISTINCT 
                CustomerID,
                Region
           FROM SalesOrders ) AS CustomerRegion
 GROUP BY CustomerID
HAVING RegionCount > 1
 ORDER BY 1;
permanent link

answered 14 Oct '10, 14:53

Breck%20Carter's gravatar image

Breck Carter
25.7k427587847
accept rate: 20%

PS the derived table is there, not for performance, but to help me divide-and-conquer the problem. After writing that, I might have turned it into your query 2 if that turned out to be faster... but I'm not smart enough to come up with query 2 from scratch :)

(14 Oct '10, 15:00) Breck Carter

Interestingly enough, I'm frequently using the "SELECT col1, count() FROM T GROUP BY col HAVING count() > 1" kind of queries. As such, my query 2 seemed not that complicated. I'm having (a bit) more trouble with the derived query approach...

(14 Oct '10, 15:15) Volker Barth
1

Your query 2 uses COUNT ( DISTINCT column ) together with GROUP BY different-column which is considerably more sophisticated (IMO) than a simple SELECT column, COUNT() GROUP BY same-column... which I, too, use all time. Derived tables are great because they let you solve part of the problem first, and test the solution separately: "Now I have all the distinct CustomerID - Region pairs in a derived table, now I can write an ordinary simple SELECT column, COUNT() GROUP BY same-column just like I (and you) frequently do.

(14 Oct '10, 18:49) Breck Carter

Having said that, in a million-row table query 2 may run faster... or, maybe, the plans will look completely different with more rows. The bottom line (the answer to your original question) is that there is NOTHING WRONG with query 2, at least in the tiny database case.

(14 Oct '10, 18:52) Breck Carter
2

FWIW I have babbled about "divide and conquer" on a few occasions, in this blog post and the others it links to: http://sqlanywhere.blogspot.com/2008/10/divide-and-conquer-by-working-backwards.html

(14 Oct '10, 18:55) Breck Carter

@Breck: I generally prefer the divide-and-conquer-aprroach, too, and derived tables are very useful here - so I fully agree. - Having said that, it's obviously a question of one's own usage of SQL to tell whether a particular solution is more or less comprehensible. That may be a good or bad situation:)

(15 Oct '10, 08:46) Volker Barth
More comments hidden
showing 5 of 6 show all flat view

This is a nasty hack I've seen used before, I can't remember using it though.

select CustomerID,Total from  
(select CustomerID,sum(cnt) as Total from 
(select distinct CustomerID,Region,1 as cnt from salesorders) as foo
group by CustomerID) as boo where Total > 1
permanent link

answered 14 Oct '10, 14:36

Daz%20Liquid's gravatar image

Daz Liquid
861182338
accept rate: 28%

2

Hmmm after reading the question again, I'm not sure this meets either comprehensibility or performance :)

(14 Oct '10, 14:47) Daz Liquid
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×90
×7
×6

question asked: 14 Oct '10, 12:12

question was seen: 1,283 times

last updated: 14 Oct '10, 14:53