By sepupic


2019-02-06 09:18:55 8 Comments

Here is my repro that was tested on SQL Server 2008 R2, 2012, 2016.

The second and the third queries do use RowCount Spool, why the first does not?

create table dbo.t (id int identity primary key, v int);
--create statistics ST_t__v on dbo.t(v) with norecompute;

insert into dbo.t (v)
select top (10000) rand(checksum(newid())) * 5
from master.dbo.spt_values a cross join
     master.dbo.spt_values b;
go

declare @v int;

set statistics xml, io on;
select @v = v from dbo.t where exists(select 1 from dbo.t where v = 4);
select @v = v from dbo.t where exists(select 1 from dbo.t where v = 4 having count(*) > 0);
select @v = v from dbo.t where exists(select 1 from dbo.t where v = 40);
set statistics xml, io off;
go

drop table dbo.t;
go

As the result, IO statistics are:

Table 't'. Scan count 2, ***logical reads 20024***, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Table 't'. Scan count 2, logical reads 48, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Table 't'. Scan count 2, logical reads 27, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

enter image description here

1 comments

@Martin Smith 2019-02-06 10:37:13

This is just a cost based decision.

There are roughly 2,000 rows in the table for all v values from 0 to 4 inclusive and no other values of v.

The semi join can stop requesting rows after the first one is returned so SQL Server assumes it will only need to read 5 rows before it finds the first one matching v=4.

When you add having count(*) > 0 it now needs to read the whole table to get the COUNT(*) and can't stop after the first matching row is found. With v=40 this doesn't exist in the data at all but SQL server again needs to read the whole table to be sure of this.

The cost of reading the whole table for every outer row vs just reading a few rows from it is sufficient to make the plan with the row count spool optimisation attractive.

You can try the following with v=4 and v=40 and diff the results in a text diff tool.

SELECT v
FROM   dbo.t
WHERE  EXISTS (SELECT 1 FROM dbo.t WHERE v = 40) 
OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 8615);

The results of the final memo are very comparable in terms of options explored but a different final result was chosen on cost grounds.

The plan with the spool chose PhyOp_LoopsJoinx_jtLeftSemi 6.2 13.1 and without it was PhyOp_LoopsJoinx_jtLeftSemi 6.2 8.2.

6.2 is the clustered index scan that is the same on both plans.

8.2 is the scan with filter executed 10,000 times.

13.1 is the spool built from executing the scan with filter once.

Those parts of the memo are copied below. When v=4 the cost of the spool is 1.0043 but the cost without the spool is lower (at 0.867567)

v=40

8.2 PhyOp_Filter 7.1 4.0  Cost(RowGoal 0,ReW 9999,ReB 0,Dist 0,Total 0)= 158.804 (Distance = 1)
13.1 PhyOp_Spool 8.4  Cost(RowGoal 0,ReW 9999,ReB 0,Dist 0,Total 0)= 1.03564 (Distance = 2)

v=4

8.2 PhyOp_Filter 7.1 4.0  Cost(RowGoal 1,ReW 9999,ReB 0,Dist 0,Total 0)= 0.867567 (Distance = 1)
13.1 PhyOp_Spool 8.4  Cost(RowGoal 1,ReW 9999,ReB 0,Dist 0,Total 0)= 1.0043 (Distance = 2)

The hint OPTION (NO_PERFORMANCE_SPOOL) removes the row count spool from the v=40 plan for comparison purposes.

I was expecting that USE HINT ('DISABLE_OPTIMIZER_ROWGOAL') would do the opposite for the v=4 plan but despite group 8.2 now being costed in the memo at 158.804 it still gets selected and the overall plan is costed at only 0.939204 as the anti semi join costing itself still scales down the costs to reflect that it will only require one row from that input. Moreover with this hint in place there is no PhyOp_Spool option available in the final memo anyway.

Paul White points out in the comments that the performance spool can be achieved with the use of OPTION (QUERYTRACEON 8691).

If the following batch is run then the second query should show up as being more than 50% of the cost of the batch (i.e. more expensive according to SQL Server's cost model)

declare @v int;
select @v = v from dbo.t where exists(select 1 from dbo.t where v = 4) 
select @v = v from dbo.t where exists(select 1 from dbo.t where v = 4) OPTION (QUERYTRACEON 8691);

Related Questions

Sponsored Content

0 Answered Questions

Sql query performance measure IO vs TIME

2 Answered Questions

[SOLVED] sqlpackage.exe SELECT statement causing massive reads

1 Answered Questions

0 Answered Questions

0 Answered Questions

Logical read when use Offset in SQL

1 Answered Questions

[SOLVED] Logical reads different when accessing the same LOB data

1 Answered Questions

[SOLVED] sql server partitioned view execution plan questions

1 Answered Questions

1 Answered Questions

2 Answered Questions

[SOLVED] Check existence with EXISTS outperform COUNT! ... Not?

Sponsored Content