The forum will experience an outage sometime between February 10 at 7:00pm EST and February 12 at 11:59 EST for installation of security updates. The actual time and duration of the outage are unknown but attempts will be made to minimize the downtime. We apologize for any inconvenience.

having a query like this:

select * from T where ID in (1,1,1,1,1)

does it have any negative effect, if inside the IN list an entry appears multiple times like in the example above.

asked 15 May '12, 12:43

Martin's gravatar image

Martin
8.6k114149237
accept rate: 14%

edited 15 May '12, 15:48

Glenn%20Paulley's gravatar image

Glenn Paulley
10.7k568104


In a word, no. The predicates ID in (1,1,1,1) and ID in (1) are semantically equivalent.

The server software automatically eliminates duplicates in IN-lists where the size of the list is "sufficiently small" that we can incur the overhead of the sort (to eliminate the duplicate values) at compile time. This can only be done if all of the IN-list elements are known at optimization time - they must either be literal values, variables, or hostvars.

permanent link

answered 15 May '12, 15:47

Glenn%20Paulley's gravatar image

Glenn Paulley
10.7k568104
accept rate: 43%

edited 15 May '12, 16:11

What about IN with a subquery that returns duplicate values - like this one:

select * from T where ID in ((select SomeNonUniqueColumn from T1))

Can I conclude that the optimizer will eliminate duplicates in "SomeNonUniqueColumn" as well (treating this somewhat like "select distinct SomeNonUniqueColumn")?

(16 May '12, 04:00) Volker Barth
Replies hidden

Interesting test with v12.0.1.3726: When using

select * from T where ID in
  ((select SomeNonUniqueColumn from T1 where SomeNonUniqueColumn = 1))

this obviously isn't treated as

select * from T where ID in ((select 1 from T1))
(16 May '12, 04:03) Volker Barth
Replies hidden

Why would it be? What if SomeNonUniqueColumn = 1 doesn't exist in T1?

(16 May '12, 07:29) Breck Carter

No. The duplicate elmination is performed only when the IN-list contains only constant values (or values from variabies that are known and fixed at compile time).

(16 May '12, 07:31) Glenn Paulley

Correct, though we could enhance the product to do so if we found this to be a construction commonly generated by SQL generators. We have done so for other types of syntactic constructions (eg ISNULL(X,X) is one of these) but not this one.

(16 May '12, 07:39) Glenn Paulley

So would it make sense to use "in ((select distinct ...))" in such cases?

(16 May '12, 07:44) Volker Barth

Yes, I agree - it surely does exist in my tests but that can't be generalized. (And no, I won't delete my somewhat dumb question:))

(16 May '12, 07:47) Volker Barth
2

The optimizer has a number of rewrite and execution-time strategies for computing the result of nested queries, and it knows that, semantically, it doesn't matter if the subquery contains DISTINCT or not. So it doesn't really matter if you include DISTINCT or not in the original statement.

(16 May '12, 07:59) Glenn Paulley

Thanks for the clarification: So my underlying general assumption "The optimizer is smart enough to handle duplicates in subqueries for EXISTS/IN tests as well" seems to hold. As expected: "You do the hard part."

(16 May '12, 08:08) Volker Barth
showing 2 of 9 show all flat view

Indeed, I second the motion. Duplicates in an IN clause are NOT a problem.

However, I can say from experience that an IN clause on large number of items performs worse than an equivalent join ..

So ..

instead of doing:

select * from DataTable where ID in ( select ID from LookupTable )

try this instead:

select dt.* from ( select distinct ID from LookupTable ) x, DataTable dt where x.ID = dt.id

permanent link

answered 16 May '12, 10:20

Frum%20Dude's gravatar image

Frum Dude
136339
accept rate: 0%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×238
×90

question asked: 15 May '12, 12:43

question was seen: 889 times

last updated: 16 May '12, 10:20