By Roman Pekar


2016-11-10 16:45:14 8 Comments

I need to store simple data - suppose I have some products with codes as a primary key, some properties and validity ranges. So data could look like this:

Products
code    value   begin_date  end_date
10905   13      2005-01-01  2016-12-31
10905   11      2017-01-01  null

Those ranges are not overlapping, so on every date I have a list of unique products and their properties. So to ease the use of it I've created the function:

create function dbo.f_Products
(
    @date date
)
returns table
as
return (
    select
    from dbo.Products as p
    where
        @date >= p.begin_date and
        @date <= p.end_date
)

This is how I'm going to use it:

select
    *
from <some table with product codes> as t
    left join dbo.f_Products(@date) as p on
        p.code = t.product_code

This is all fine, but how I can let optimizer know that those rows are unique to have better execution plan?

I did some googling, and found a couple of really nice articles for DDL which prevents storing overlapping ranges in the table:

But even if I try those constraint I see that optimizer cannot understand that resulting recordset will return unique codes.

What I'd like to have is certain approach which gives me basically the same performance as if I stored those products list on certain date and selected it with date = @date.

I know that some RDMBS (like PostgreSQL) have special data types for this (Range Types). But SQL Server doesn't have anything like this.

Am I missing something or there're no way to do this properly in SQL Server?

6 comments

@KumarHarsh 2018-02-19 10:45:49

suppose I have some products with codes as a primary key

how this is possible ? you have multiple codes.So please throw some correct sample data .

Those ranges are not overlapping, so on every date I have a list of unique products and their properties.

I have assume this in my query below. IMO,You are very correctly storing date range value.There is no other good way. you lack unique primary key in your table.Rest it is ok.

Yes,UDF is not the optimize way of achieving what you want.

Sample data,

create table #Products(productid int identity(1,1) primary key,code int
,value int,begin_date date not null,end_date date)

insert into #Products VALUES
 (10905,13,'2005-01-01','2016-12-31')
,(10905,11,'2017-01-01',null)
,(10906,13,'2005-01-01','2015-12-31')
,(10907,13,'2005-01-01','2014-12-31')

Notice that I have included productid is PK which is very necessary in your case.It will help in Record Insertion.

Creating Clustered index on code,begin_date,end_date is bad idea because of Index Fragmentation

   create unique nonclustered index IX_Product_Dates on #Products(begin_date,end_date,code) include(value)

declare @Input date='2006-04-01'

    SELECT p1.*
FROM #Products P1
WHERE p1.begin_date <= @input
    AND p1.end_date >= @Input
    AND p1.end_date IS NOT NULL

-- if sometable have one to one relation with product table then inner join -- Or it depend upon your requirement

select
    t.mentioncolumnname
from sometable  as t
    inner join #Products P1 on  p1.code = t.product_code
WHERE p1.begin_date <= @input
    AND p1.end_date >= @Input




drop table #Products

@Aducci 2018-02-16 22:05:58

You can create an indexed view that contains a row for each code/date in the range.

ProductDate (indexed view)
code    value   date
10905   13      2005-01-01
10905   13      2005-01-02
10905   13      ...
10905   13      2016-12-31
10905   11      2017-01-01
10905   11      2017-01-02
10905   11      ...
10905   11      Today

Like this:

create schema digits
go

create table digits.Ones (digit tinyint not null primary key)
insert into digits.Ones (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

create table digits.Tens (digit tinyint not null primary key)
insert into digits.Tens (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

create table digits.Hundreds (digit tinyint not null primary key)
insert into digits.Hundreds (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

create table digits.Thousands (digit tinyint not null primary key)
insert into digits.Thousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

create table digits.TenThousands (digit tinyint not null primary key)
insert into digits.TenThousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
go

create schema info
go

create table info.Products (code int not null, [value] int not null, begin_date date not null, end_date date null, primary key (code, begin_date))
insert into info.Products (code, [value], begin_date, end_date) values 
(10905, 13, '2005-01-01', '2016-12-31'),
(10905, 11, '2017-01-01', null)

create table info.DateRange ([begin] date not null, [end] date not null, [singleton] bit not null default(1) check ([singleton] = 1))
insert into info.DateRange ([begin], [end]) values ((select min(begin_date) from info.Products), getdate())
go

create view info.ProductDate with schemabinding 
as
select
    p.code,
    p.value,
    dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) as [date]
from
    info.DateRange as dr
cross join
    digits.Ones as ones
cross join
    digits.Tens as tens
cross join
    digits.Hundreds as huns
cross join
    digits.Thousands as thos
cross join
    digits.TenThousands as tthos
join
    info.Products as p on
    dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) between p.begin_date and isnull(p.end_date, datefromparts(9999, 12, 31))
go

create unique clustered index idx_ProductDate on info.ProductDate ([date], code)
go

select *
from info.ProductDate with (noexpand)
where 
    date = '2014-01-01'

drop view info.ProductDate
drop table info.Products
drop table info.DateRange
drop table digits.Ones
drop table digits.Tens
drop table digits.Hundreds
drop table digits.Thousands
drop table digits.TenThousands
drop schema digits
drop schema info
go

@Roman Pekar 2018-02-17 09:31:55

That one is good, i was thinking about this myself, the problem is that for long periods it will take a lot of space

@Pittsburgh DBA 2018-02-17 13:11:03

@RomanPekar What are you trying to accomplish, exactly? Do you want all product and date compbinations, with no gaps, or do you simply want to efficiently query the table you already have, which is Code, Value, Begin_Date, and End_Date? If it's the latter, these solutions are getting out of control. If it's the former, you need to edit your question to make this clear, because it is not.

@Aducci 2018-02-18 21:19:44

@RomanPekar - I can't imagine it taking up that much space...how many products are there? What is the average time span?

@Salman A 2018-02-19 07:08:43

@RomanPekar search for "space-time tradeoff". You cannot minimize both at the same time.

@Roman Pekar 2018-02-19 11:56:55

@Aducci If we want to make it easy-to-use for end user, then it should work with, basically, any date passed (periods could be open-ended). Well, let's say there're 6000 products. If we use time from, let's say, 1990-01-01 till 2100-01-01 it will be 110 * 365 * 6000, which is 219M rows already.And it's still doesn't work with any date (for example 9999-12-31). So it's a fine working solution, but good 'range' solution is still better.

@Aducci 2018-02-19 16:22:59

@RomanPekar you could wrap the view with an inline table function in order to reduce the number of rows you need to store...using a where clause similar to where [date] = case when @date > getdate() then getdate() else @date end

@Pittsburgh DBA 2018-02-16 19:37:36

EDIT: My original answer was using an INNER JOIN, but the questioner wanted a LEFT JOIN.

CREATE TABLE Products
  (
  [Code] INT NOT NULL
  , [Value] VARCHAR(30) NOT NULL
  , Begin_Date DATETIME NOT NULL
  , End_Date DATETIME NULL
  )

/*
Products
code    value   begin_date  end_date
10905   13      2005-01-01  2016-12-31
10905   11      2017-01-01  null
*/
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 13, '2005-01-01', '2016-12-31')
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 11, '2017-01-01', NULL)

CREATE NONCLUSTERED INDEX SK_ProductDate ON Products ([Code], Begin_Date, End_Date) INCLUDE ([Value])

CREATE TABLE SomeTableWithProductCodes
  (
  [CODE] INT NOT NULL 
  )

 INSERT INTO SomeTableWithProductCodes ([Code]) VALUES (10905)

Here is a prototypical query, with a date predicate. Note that there are more optimal ways to do this in a bulletproof fashion, using a "less than" operator on the upper bound, but that's a different discussion.

SELECT
  P.[Code]
  , P.[Value]
  , P.[Begin_Date]
  , P.[End_Date]
FROM
   SomeTableWithProductCodes ST
   LEFT JOIN Products AS P ON
     ST.[Code] = P.[Code]
     AND '2016-06-30' BETWEEN P.[Begin_Date] AND ISNULL(P.[End_Date], '9999-12-31')

This query will perform an Index Seek on the Product table.

Here is a SQL Fiddle: SQL Fiddle - Products and Dates

Query Execution Plan

@Vladimir Baranov 2018-02-16 05:12:57

Approach with ROW_NUMBER scans the whole Products table once. It is the best method if you have a lot of product codes in the Products table and few validity ranges for each code.

WITH
CTE_rn
AS
(
    SELECT
        code
        ,value
        ,ROW_NUMBER() OVER (PARTITION BY code ORDER BY begin_date DESC) AS rn
    FROM Products
    WHERE begin_date <= @date
)
SELECT *
FROM
    <some table with product codes> as t
    LEFT JOIN CTE_rn ON CTE_rn.code = t.product_code AND CTE_rn.rn = 1
;

If you have few product codes and a lot of validity ranges for each code in the Products table, then it is better to seek the Products table for each code using OUTER APPLY.

SELECT *
FROM
    <some table with product codes> as t
    OUTER APPLY
    (
        SELECT TOP(1)
            Products.value
        FROM Products
        WHERE
            Products.code = t.product_code
            AND Products.begin_date <= @date
        ORDER BY Products.begin_date DESC
    ) AS A
;

Both variants need unique index on (code, begin_date DESC) include (value).

Note how the queries don't even look at end_date, because they assume that intervals don't have gaps. They will work in SQL Server 2008.

@Anton 2018-02-15 23:45:37

First you need to create a unique clustered index for (begin_date, end_date, code)

Then SQL engine will be able to do INDEX SEEK.

Additionally, you can also try to create a view for dbo.Products table to join that table with pre-populated dbo.Dates table.

select p.code, p.val, p.begin_date, p.end_date, d.[date]
    from dbo.Product as p
        inner join dbo.dates d on p.begin_date <= d.[date] and d.[date] <= p.end_date

Then in your function, you use that view as "where @date = view.date". The result can be either better or slightly worse... it depends on the actual data.

You also can try to make that view indexed (depends on how often it is being updated).

Alternatively, you can have better performance if you populate dbo.Products table for every date in the [begin_date] .. [end_date] range.

@Shnugo 2016-11-10 17:02:50

A solution without gaps might be this:

DECLARE @tbl TABLE(ID INT IDENTITY,[start_date] DATE);
INSERT INTO @tbl VALUES({d'2016-10-01'}),({d'2016-09-01'}),({d'2016-08-01'}),({d'2016-07-01'}),({d'2016-06-01'});

SELECT * FROM @tbl;

DECLARE @DateFilter DATE={d'2016-08-13'};

SELECT TOP 1 * 
FROM @tbl
WHERE [start_date]<[email protected]
ORDER BY [start_date] DESC

Important: Be sure that there is an (unique) index on start_date

UPDATE: for different products

DECLARE @tbl TABLE(ID INT IDENTITY,ProductID INT,[start_date] DATE);
INSERT INTO @tbl VALUES
--product 1
(1,{d'2016-10-01'}),(1,{d'2016-09-01'}),(1,{d'2016-08-01'}),(1,{d'2016-07-01'}),(1,{d'2016-06-01'})
--product 1
,(2,{d'2016-10-17'}),(2,{d'2016-09-16'}),(2,{d'2016-08-15'}),(2,{d'2016-07-10'}),(2,{d'2016-06-11'});

DECLARE @DateFilter DATE={d'2016-08-13'};

WITH PartitionedCount AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY [start_date] DESC) AS Nr
          ,*
    FROM @tbl
    WHERE [start_date]<[email protected]
)
SELECT *
FROM PartitionedCount
WHERE Nr=1

@Roman Pekar 2016-11-10 17:06:32

Yes, this is nice way of solving the task. However, the performance is still not optimal. What I'd like to have is certain approach which gives me basically the same performance as if I stored those products list on certain date and selected it with date = @date.

@Shnugo 2016-11-10 17:09:39

@RomanPekar There must be a difference whether you grab into an index based on equality or if you have to locate a value within intervalls. Nevertheless: If there is an index (check the EP if it is used!) the WHERE [start_date]<[email protected] should be lightning fast and the TOP 1 ORDER BY [start_date] DESC should pick the relevant row instantly... It makes me wonder, that you do not observe better performance...

@Roman Pekar 2016-11-10 17:12:16

well, for one row it works fine, but what if I want to get the list of valid products on certain date? I can of course explode my table into dates and use equality check, but I really would like to know if there's more beautiful solution

@Shnugo 2016-11-10 17:16:51

@RomanPekar If you have the same start-dates (first of month) for all your products, simply use SELECT TOP 1 WITH TIES... Does this help?

@Shnugo 2016-11-10 17:20:24

@RomanPekar I just added an approach using ROW_NUMBER which would get all products in one go. Just join this resultset to your main query...

@Shnugo 2016-11-17 07:51:04

@RomanPekar Is this question solved? Do you need further help? Please allow me one hint: If this question is solved, it would be very kind of you, to tick the acceptance check below the (best) answer's vote counter. This will 1) mark this issue as solved 2) make it easier for followers to find the best solution 3) pay points to the answerer and 4) pay points to you. Since you've crossed the 15 points border yourself, you are - additionally - asked to vote on contributions. This is the SO-way to say thank you. Happy Coding!

Related Questions

Sponsored Content

15 Answered Questions

[SOLVED] Select columns from result set of stored procedure

24 Answered Questions

4 Answered Questions

[SOLVED] Fastest way to split overlapping date ranges

11 Answered Questions

[SOLVED] What are the options for storing hierarchical data in a relational database?

10 Answered Questions

[SOLVED] Best way to get identity of inserted row?

  • 2008-09-03 21:32:02
  • Oded
  • 611815 View
  • 816 Score
  • 10 Answer
  • Tags:   sql sql-server tsql

36 Answered Questions

[SOLVED] How to return only the Date from a SQL Server DateTime datatype

3 Answered Questions

[SOLVED] SQL: Selecting between dates

  • 2014-06-30 13:23:24
  • Lehel
  • 57 View
  • 0 Score
  • 3 Answer
  • Tags:   sql database

1 Answered Questions

[SOLVED] SQL - Select Data Based on Date Range Efficiently

  • 2013-11-11 12:18:38
  • user2906488
  • 1954 View
  • 2 Score
  • 1 Answer
  • Tags:   sql sql-server

3 Answered Questions

Sponsored Content