For a moderately complex query I am trying to optimize, I noticed that removing the
TOP n clause changes the execution plan. I would have guessed that when a query includes
TOP n the database engine would run the query ignoring the the
TOP clause, and then at the end just shrink that result set down to the n number of rows that was requested. The graphical execution plan seems to indicate this is the case --
TOP is the "last" step. But it appears there is more going on.
My question is, how (and why) does a TOP n clause impact the execution plan of a query?
Here is a simplified version of what is going on in my case:
The query is matching rows from two tables, A and B.
TOP clause, the optimizer estimates there will be 19k rows from table A and 46k rows from table B. The actual number of rows returned is 16k for A and 13k for B. A hash match is used to join these two results sets for a total of 69 rows (then a sort is applied). This query happens very quickly.
When I add
TOP 1001 the optimizer does not use a hash match; instead it first sorts the results from table A (same estimate/actual of 19k/16k) and does a nested loop against table B. The estimated number of rows for table B is now 1, and the strange thing is that the
TOP n directly affects the estimated number of executions (index seek) against B -- it appears to always be 2n+1, or in my case 2003. This estimate changes accordingly if I change
TOP n. Of course, since this is a nested join the actual number of executions is 16k (the number of rows from table A) and this slows down the query.
The actual scenario is a bit more complex but this captures the basic idea/behavior. Both tables are searched using index seeks. This is SQL Server 2008 R2 Enterprise edition.