By ilitirit


2008-10-06 02:19:05 8 Comments

Which of these queries is faster?

NOT EXISTS:

SELECT ProductID, ProductName 
FROM Northwind..Products p
WHERE NOT EXISTS (
    SELECT 1 
    FROM Northwind..[Order Details] od 
    WHERE p.ProductId = od.ProductId)

Or NOT IN:

SELECT ProductID, ProductName 
FROM Northwind..Products p
WHERE p.ProductID NOT IN (
    SELECT ProductID 
    FROM Northwind..[Order Details])

The query execution plan says they both do the same thing. If that is the case, which is the recommended form?

This is based on the NorthWind database.

[Edit]

Just found this helpful article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

I think I'll stick with NOT EXISTS.

10 comments

@Jeffrey L Whitledge 2008-10-06 02:54:46

In your specific example they are the same, because the optimizer has figured out that what you are trying to do is the same in both examples. But it is possible that in non-trivial examples the optimizer may not do this, and in that case there are occasionally reasons to prefer one over the other.

NOT IN should be preferred if you are testing multiple rows in your outer select. The subquery inside the NOT IN statement can be evaluated at the beginning of the execution, and the temporary table can be checked against each value in the outer select, rather than re-running the subselect every time as would be required with the NOT EXISTS statement.

If the subquery must be correlated with the outer select, then NOT EXISTS may be preferable, since the optimizer may discover a simplification that prevents the creation of any temporary tables to perform the same function.

@Onga Leo-Yoda Vellem 2018-03-19 08:27:30

They are very similar but not really the same.

In terms of efficiency, I've found the LEFT JOIN ... IS NULL form more efficient (when a large number of rows is to be selected, that is).

@Yella Chalamala 2014-07-07 17:12:53

I have a table with about 120,000 records and need to select only those that do not exist (matched on a varchar column) in four other tables with approximately 1500, 4000, 40000 and 200 rows. All the tables involved have a unique index on the varchar column concerned.

NOT IN took about 10 mins, NOT EXISTS took 4 secs.

My query is recursive and may have had an untuned section that contributed to the 10 mins, but the other option taking 4 secs shows, at least to me, that NOT EXISTS is far better, or at least that IN and EXISTS are not exactly the same and that it is always worth checking before going ahead with the code.

@Martin Smith 2012-06-17 20:10:17

I always default to NOT EXISTS.

The execution plans may be the same at the moment but if either column is altered in the future to allow NULLs the NOT IN version will need to do more work (even if no NULLs are actually present in the data) and the semantics of NOT IN if NULLs are present are unlikely to be the ones you want anyway.

When neither Products.ProductID nor [Order Details].ProductID allows NULLs, the NOT IN will be treated identically to the following query.

SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId) 

The exact plan may vary but for my example data I get the following.

[Execution plan image: Neither NULL]

A reasonably common misconception seems to be that correlated sub queries are always "bad" compared to joins. They certainly can be when they force a nested loops plan (sub query evaluated row by row) but this plan includes an anti semi join logical operator. Anti semi joins are not restricted to nested loops but can use hash or merge (as in this example) joins too.

/*Not valid syntax but better reflects the plan*/ 
SELECT p.ProductID,
       p.ProductName
FROM   Products p
       LEFT ANTI SEMI JOIN [Order Details] od
         ON p.ProductId = od.ProductId 

If [Order Details].ProductID is NULL-able the query then becomes

SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL) 

The reason for this is that the correct semantics, if [Order Details] contains any NULL ProductIds, is to return no results. Note the extra anti semi join and row count spool that are added to the plan to verify this.

[Execution plan image: One NULL]
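The equivalence claimed above for a NULL-able [Order Details].ProductID can be checked on a toy data set. This is only a sketch: it uses Python's standard sqlite3 module as a stand-in engine (an assumption, since the answer is about SQL Server), and the table OrderDetails and its three-row Products sample are made up for illustration. It compares results only, not plans.

```python
import sqlite3

NOT_IN = """
    SELECT ProductID FROM Products p
    WHERE p.ProductID NOT IN (SELECT ProductID FROM OrderDetails)
    ORDER BY ProductID
"""

# The two-term NOT EXISTS form from the answer above.
REWRITTEN = """
    SELECT ProductID FROM Products p
    WHERE NOT EXISTS (SELECT * FROM OrderDetails od
                      WHERE p.ProductID = od.ProductID)
      AND NOT EXISTS (SELECT * FROM OrderDetails
                      WHERE ProductID IS NULL)
    ORDER BY ProductID
"""

def run_both(order_product_ids):
    """Return (NOT IN rows, rewritten rows) for the given OrderDetails content."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products (ProductID INTEGER NOT NULL)")
    conn.execute("CREATE TABLE OrderDetails (ProductID INTEGER)")  # NULL-able
    conn.executemany("INSERT INTO Products VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO OrderDetails VALUES (?)",
                     [(pid,) for pid in order_product_ids])
    result = (conn.execute(NOT_IN).fetchall(),
              conn.execute(REWRITTEN).fetchall())
    conn.close()
    return result

without_null = run_both([1])        # only product 1 has been ordered
with_null = run_both([1, None])     # same, plus one NULL ProductID

print(without_null)
print(with_null)
```

With no NULL present both forms return products 2 and 3; once a single NULL ProductID is added, both return nothing.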

If Products.ProductID is also changed to become NULL-able the query then becomes

SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL)
       AND NOT EXISTS (SELECT *
                       FROM   (SELECT TOP 1 *
                               FROM   [Order Details]) S
                       WHERE  p.ProductID IS NULL) 

The reason for that one is that a NULL Products.ProductId should not be returned in the results unless the NOT IN sub query returns no results at all (i.e. the [Order Details] table is empty), in which case it should. In the plan for my sample data this is implemented by adding another anti semi join as below.

[Execution plan image: Both NULL]
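The three-term form, including the empty-table special case, can be checked the same way. Again a sketch under assumptions: sqlite3 stands in for SQL Server, LIMIT 1 replaces TOP 1 because SQLite has no TOP, and the table contents are invented.

```python
import sqlite3

NOT_IN = """
    SELECT ProductID FROM Products p
    WHERE p.ProductID NOT IN (SELECT ProductID FROM OrderDetails)
"""

# Same three terms as the query above, with LIMIT 1 standing in for TOP 1.
REWRITTEN = """
    SELECT ProductID FROM Products p
    WHERE NOT EXISTS (SELECT * FROM OrderDetails od
                      WHERE p.ProductID = od.ProductID)
      AND NOT EXISTS (SELECT * FROM OrderDetails
                      WHERE ProductID IS NULL)
      AND NOT EXISTS (SELECT * FROM (SELECT * FROM OrderDetails LIMIT 1) S
                      WHERE p.ProductID IS NULL)
"""

def compare(order_product_ids):
    """Return (NOT IN result, rewritten result) as sets of ProductID values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products (ProductID INTEGER)")      # NULL-able
    conn.execute("CREATE TABLE OrderDetails (ProductID INTEGER)")  # NULL-able
    conn.executemany("INSERT INTO Products VALUES (?)", [(1,), (2,), (None,)])
    conn.executemany("INSERT INTO OrderDetails VALUES (?)",
                     [(pid,) for pid in order_product_ids])
    as_set = lambda sql: {row[0] for row in conn.execute(sql)}
    result = (as_set(NOT_IN), as_set(REWRITTEN))
    conn.close()
    return result

empty_case = compare([])         # empty sub query: every row, even the NULL one
plain_case = compare([1])        # NULL product is dropped, product 2 remains
null_case = compare([1, None])   # a NULL in the sub query: no rows at all

print(empty_case, plain_case, null_case)
```

The empty case is the one the third NOT EXISTS term exists for: the NULL ProductID row is returned only when OrderDetails has no rows.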

The effect of this is shown in the blog post already linked by Buckley. In the example there the number of logical reads increases from around 400 to 500,000.

Additionally the fact that a single NULL can reduce the row count to zero makes cardinality estimation very difficult. If SQL Server assumes that this will happen but in fact there were no NULL rows in the data the rest of the execution plan may be catastrophically worse, if this is just part of a larger query, with inappropriate nested loops causing repeated execution of an expensive sub tree for example.

This is not the only possible execution plan for a NOT IN on a NULL-able column however. This article shows another one for a query against the AdventureWorks2008 database.

For the NOT IN on a NOT NULL column or the NOT EXISTS against either a nullable or non nullable column it gives the following plan.

[Execution plan image: NOT EXISTS]

When the column changes to NULL-able the NOT IN plan now looks like

[Execution plan image: NOT IN - NULL]

It adds an extra inner join operator to the plan. This apparatus is explained here. It is all there to convert the previous single correlated index seek on Sales.SalesOrderDetail.ProductID = <correlated_product_id> to two seeks per outer row. The additional one is on WHERE Sales.SalesOrderDetail.ProductID IS NULL.

As this is under an anti semi join if that one returns any rows the second seek will not occur. However if Sales.SalesOrderDetail does not contain any NULL ProductIDs it will double the number of seek operations required.

@xis 2014-07-15 05:11:49

May I ask how you get the profiling graph like shown?

@Martin Smith 2014-07-15 05:58:50

@xis These are execution plans opened in SQL Sentry plan explorer. You can also view execution plans graphically in SSMS.

@levininja 2015-06-03 17:10:53

I appreciate this for the sole reason that NOT EXISTS functions the way I expect NOT IN to function (which it doesn't).

@Mayur Patel 2015-06-25 20:25:54

With NOT EXISTS, I try to use SELECT 1 such as NOT EXISTS (SELECT 1 FROM sometable WHERE something) so that the database does not actually need to return columns from disk. Using EXPLAIN to determine whether this makes a difference in your case is probably a good idea.

@Martin Smith 2015-06-25 20:36:42

@Mayur No need for this in SQL Server. stackoverflow.com/questions/1597442/…

@PatsonLeaner 2018-08-19 13:57:10

@MartinSmith, this is just amazing... a day saver. This worked in SQL Server 2012 and later. Thanks

@buckley 2012-05-09 12:23:38

Also be aware that NOT IN is not equivalent to NOT EXISTS when it comes to null.

This post explains it very well

http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

When the subquery returns even one null, NOT IN will not match any rows.

The reason for this can be found by looking at the details of what the NOT IN operation actually means.

Let’s say, for illustration purposes that there are 4 rows in the table called t, there’s a column called ID with values 1..4

WHERE SomeValue NOT IN (SELECT AVal FROM t)

is equivalent to

WHERE SomeValue != (SELECT AVal FROM t WHERE ID=1)
AND SomeValue != (SELECT AVal FROM t WHERE ID=2)
AND SomeValue != (SELECT AVal FROM t WHERE ID=3)
AND SomeValue != (SELECT AVal FROM t WHERE ID=4)

Let’s further say that AVal is NULL where ID = 4. Hence that != comparison returns UNKNOWN. The logical truth table for AND states that UNKNOWN and TRUE is UNKNOWN, UNKNOWN and FALSE is FALSE. There is no value that can be AND’d with UNKNOWN to produce the result TRUE

Hence, if any row of that subquery returns NULL, the entire NOT IN operator will evaluate to either FALSE or NULL and no records will be returned
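The truth-table argument above can be tried out expression by expression. A minimal sketch, assuming Python's built-in sqlite3 module as the engine (not SQL Server, though this NULL handling is standard SQL); SQLite reports UNKNOWN as NULL, which Python shows as None, and TRUE/FALSE as 1/0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(expr):
    """Evaluate a single SQL expression and return its value."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

# A comparison against NULL is UNKNOWN.
unknown = scalar("1 <> NULL")

# TRUE AND UNKNOWN is UNKNOWN; FALSE AND UNKNOWN is FALSE.
true_and_unknown = scalar("(1 <> 2) AND (1 <> NULL)")
false_and_unknown = scalar("(1 <> 1) AND (1 <> NULL)")

# So NOT IN over a list containing NULL can never come out TRUE.
not_in_without_match = scalar("1 NOT IN (2, 3, NULL)")  # UNKNOWN
not_in_with_match = scalar("2 NOT IN (2, 3, NULL)")     # FALSE

print(unknown, true_and_unknown, false_and_unknown,
      not_in_without_match, not_in_with_match)
```

Since a WHERE clause keeps a row only when its predicate is TRUE, neither outcome lets a row through.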

@ravish.hacker 2013-06-13 15:02:31

I was using

SELECT * from TABLE1 WHERE Col1 NOT IN (SELECT Col1 FROM TABLE2)

and found that it was giving wrong results (by wrong I mean no results), as there was a NULL in TABLE2.Col1.

Changing the query to

SELECT * from TABLE1 T1 WHERE NOT EXISTS (SELECT Col1 FROM TABLE2 T2 WHERE T1.Col1 = T2.Col1)

gave me the correct results.

Since then I have started using NOT EXISTS everywhere.
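That experience is easy to reproduce. The sketch below assumes Python's sqlite3 module in place of SQL Server, matches on Col1 in both tables, and uses made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 INTEGER);
    CREATE TABLE Table2 (Col1 INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES (1), (NULL);  -- one NULL is enough
""")

# NOT IN: the NULL in Table2 makes the predicate UNKNOWN or FALSE for every row.
not_in_rows = conn.execute("""
    SELECT Col1 FROM Table1
    WHERE Col1 NOT IN (SELECT Col1 FROM Table2)
    ORDER BY Col1
""").fetchall()

# NOT EXISTS: the NULL row never satisfies T1.Col1 = T2.Col1, so it is ignored.
not_exists_rows = conn.execute("""
    SELECT Col1 FROM Table1 T1
    WHERE NOT EXISTS (SELECT 1 FROM Table2 T2 WHERE T1.Col1 = T2.Col1)
    ORDER BY Col1
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

The NOT IN query returns no rows, while NOT EXISTS returns the two values missing from Table2.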

@onedaywhen 2008-10-06 07:57:08

If the optimizer says they are the same then consider the human factor. I prefer to see NOT EXISTS :)

@James Curran 2008-10-06 02:40:33

Actually, I believe this would be the fastest:

SELECT p.ProductID, p.ProductName 
    FROM Northwind..Products p  
          LEFT OUTER JOIN Northwind..[Order Details] od ON p.ProductId = od.ProductId
WHERE od.ProductId IS NULL

@Cade Roux 2008-10-06 03:15:02

Might not be the fastest when the optimizer is doing its job, but certainly will be faster when it's not.

@Kip 2008-10-06 03:57:00

He may have simplified his query for this post too

@HLGEM 2008-12-30 18:03:01

Agreed. A left outer join is often faster than a subquery.

@Martin Smith 2012-06-17 19:01:21

@HLGEM Disagree. In my experience the best case for LOJ is that they are the same and SQL Server converts the LOJ to an anti semi join. In the worst case SQL Server LEFT JOINs everything and filters the NULLs out after which can be much more inefficient. Example of that at bottom of this article
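Whatever plan the optimizer picks, the LEFT JOIN ... IS NULL form and NOT EXISTS should return the same rows when the joined column holds no NULLs. A small result-only check, assuming Python's sqlite3 module as the engine and invented sample rows; it says nothing about SQL Server's plans or timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang'), (3, 'Tofu');
    -- Product 1 is ordered twice; products 2 and 3 are never ordered.
    INSERT INTO OrderDetails VALUES (10, 1), (11, 1);
""")

# Anti join written as an outer join that keeps only unmatched rows.
left_join_rows = conn.execute("""
    SELECT p.ProductID, p.ProductName
    FROM Products p
    LEFT OUTER JOIN OrderDetails od ON p.ProductID = od.ProductID
    WHERE od.ProductID IS NULL
    ORDER BY p.ProductID
""").fetchall()

# The same anti join written with NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT p.ProductID, p.ProductName
    FROM Products p
    WHERE NOT EXISTS (SELECT 1 FROM OrderDetails od
                      WHERE p.ProductID = od.ProductID)
    ORDER BY p.ProductID
""").fetchall()

print(left_join_rows)
print(not_exists_rows)
```

Both queries list only the never-ordered products, so the choice between them comes down to the plan the engine produces, which is what this comment thread is debating.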

@Greg Ogle 2008-10-06 02:32:34

It depends..

SELECT x.col
FROM big_table x
WHERE x.key IN( SELECT key FROM really_big_table );

would be relatively slow, because there isn't much to limit the size of the set the query checks to see whether the key is in it. EXISTS would be preferable in this case.

But, depending on the DBMS's optimizer, this could be no different.

As an example of when EXISTS is better

SELECT x.col
FROM big_table x
WHERE EXISTS( SELECT key FROM really_big_table WHERE key = x.key)
  AND id = very_limiting_criteria;

@Martin Smith 2012-06-17 18:54:34

IN and EXISTS get the same plan in SQL Server. The question is about NOT IN vs NOT EXISTS anyway.

@whytheq 2013-04-27 11:11:33

+1 for "it depends..". Classic sql answer

@John Millikin 2008-10-06 02:21:46

If the execution planner says they're the same, they're the same. Use whichever one will make your intention more obvious -- in this case, the second.

@nanonerd 2015-03-13 20:56:19

The execution plans may be the same, but the results can differ, so there is a difference. NOT IN will produce unexpected results if you have a NULL in your dataset (see buckley's answer). Best to use NOT EXISTS as a default.

@Philippe 2017-06-27 18:05:56

In this example, ProductID is a key (not null) field, so whatever...
