By sharptooth


2015-08-03 15:15:51 8 Comments

I have the following definition:

CREATE TABLE [dbo].[JobItems] (
    [ItemId]            UNIQUEIDENTIFIER NOT NULL,
    [ItemState]         INT              NOT NULL,
    [ItemCreationTime]  DATETIME         NULL DEFAULT GETUTCDATE(),
    [ItemPriority]      TINYINT          NOT NULL DEFAULT(0),
    [ItemRefreshTime]   DATETIME         NULL,
    -- lots of other columns
    CONSTRAINT [PrimaryKey_GUID_HERE] PRIMARY KEY NONCLUSTERED ([ItemId] ASC)
);

CREATE UNIQUE CLUSTERED INDEX [JobItemsIndex]
    ON [dbo].[JobItems]([ItemId] ASC);

CREATE INDEX [GetTaskToProcessIndex]
    ON [dbo].[JobItems]([ItemState], [ItemPriority], [ItemCreationTime])

and the following query:

SELECT TOP(1) ItemId FROM JobItems
WHERE ItemState = 5 OR
   ( ( ItemState = 11 ) AND ( DATEDIFF( SECOND, ItemRefreshTime, GETUTCDATE() ) > 14 ) )
ORDER BY ItemPriority ASC, ItemCreationTime ASC

I run this query and inspect the actual execution plan and here's what's going on:

  1. Index seek is done to retrieve items with ItemState=5.
  2. Index seek is done to retrieve items with ItemState=11 and then for each row a separate seek is done to retrieve ItemRefreshTime and the results of two seeks are filtered using nested loops.
  3. Sets from 1 and 2 containing ItemId, ItemCreationTime and ItemPriority are concatenated and then...
  4. Magical DistinctSort happens with ORDER BY ItemId ASC and finally
  5. TopNSort happens with ORDER BY ItemPriority ASC, ItemCreationTime ASC

TopNSort and DistinctSort take something like 32 percent each so I'd be happy to get rid of DistinctSort - I don't even understand its purpose.

What's this magical TopNSort which useful DistinctSort and why is it there?

1 comments

@Martin Smith 2015-08-03 17:58:13

I can reproduce the plan that you describe on SQL Server 2012 (on prem) by running the DDL in your question and then fiddling the stats so SQL Server thinks that the table is much larger than reality.

UPDATE STATISTICS [dbo].[JobItems] WITH ROWCOUNT = 10000000, pagecount = 10000000

And then running the query with OPTION (MAXDOP 1, CONCAT UNION, ORDER GROUP).

enter image description here

This is an index union plan. The concatenation operator implements UNION ALL. The Distinct Sort changes the semantics to a UNION operation to prevent the same row being returned multiple times. (In the event the table had no index key to act as a unique row identifier the physical rid would have been used here to avoid incorrectly de-duping different rows that happen to have the same column values)

An example of where this might be needed is in the query below. (note the two parameters are set to the same value so an index union plan there would seek the same rows twice)

DECLARE @ItemState1   INT = 5
        , @ItemState2 INT = 5

SELECT ItemId
FROM   JobItems
WHERE  ItemState = @ItemState1
        OR ( ( ItemState = @ItemState2 )
             AND ( DATEDIFF(SECOND, ItemRefreshTime, GETUTCDATE()) > 14 ) )

The Top N Sort then re-sorts the data to implement the TOP 1.

In your case the Distinct Sort is logically not required for multiple reasons. The ItemState = 5 and ItemState = 11 branches are mutually exclusive (and this could be determined at compile time) and additionally the TOP 1 ... ORDER BY ItemPriority ASC, ItemCreationTime ASC semantics wouldn't be affected even if there were erroneous duplicates.

An alternative way of writing the query (that gives a better plan using the indexes to avoid any sorts at all) is

SELECT TOP(1) ItemId
FROM   (SELECT ItemId,
               ItemPriority,
               ItemCreationTime
        FROM   JobItems
        WHERE  ItemState = 5
        UNION ALL
        SELECT ItemId,
               ItemPriority,
               ItemCreationTime
        FROM   JobItems
        WHERE  ( ( ItemState = 11 )
                 AND ( DATEDIFF(SECOND, ItemRefreshTime, GETUTCDATE()) > 14 ) )) T
ORDER  BY ItemPriority ASC,
          ItemCreationTime ASC 

enter image description here

You could consider adding ItemRefreshTime as an included column to the index to avoid the key lookup if in practice quite a few may be required before finding a single row satisfying the residual predicate.

Related Questions

Sponsored Content

1 Answered Questions

[SOLVED] Azure SQL Server - big query choosing bad execution plan

1 Answered Questions

[SOLVED] Reasons why batch generates more than one query plan?

2 Answered Questions

[SOLVED] Why does sp_executesql use a different query plan?

3 Answered Questions

[SOLVED] Parent-Child Tree Hierarchical ORDER

1 Answered Questions

2 Answered Questions

[SOLVED] Why query plan changes by number of rows

1 Answered Questions

[SOLVED] deteriorating stored procedure running times

1 Answered Questions

[SOLVED] Why SELECT COUNT() query execution plan includes left joined table?

Sponsored Content