2 comments

@Erwin Brandstetter 2015-09-25 15:18:14

Typically you would have another table (let's name it tbl) with all distinct id values in separate rows. If you don't, create it:

CREATE TABLE tbl AS 
SELECT DISTINCT id FROM postcode ORDER BY id;  -- ORDER is optional

Or replace tbl with in below query with the same SELECT as subquery, but that's (much) more expensive.

If there can be many rows per id, a recursive CTE should be fastest:

WITH RECURSIVE cte AS (
   SELECT t.id, 1 AS rnk, p.*, ARRAY[postcode] AS arr
   FROM   tbl t
        , LATERAL (
      SELECT postcode, date_created
      FROM   postcode
      WHERE  id = t.id
      ORDER  BY date_created DESC NULLS LAST
      LIMIT  1
      ) p

   UNION ALL
   SELECT t.id, rnk + 1, p.*, arr || p.postcode
   FROM   cte t
        , LATERAL (
      SELECT postcode, date_created
      FROM   postcode
      WHERE  id = t.id
      AND    date_created < t.date_created
      AND    postcode <> ALL (t.arr)
      ORDER  BY date_created DESC NULLS LAST
      LIMIT  1
      ) p
   WHERE rnk < 5
   )
SELECT id, rnk, postcode, date_created
FROM   cte
ORDER  BY id, rnk;

Assuming postcode to be text or varchar. There can be issues in this particular query if postcode has type modifiers (like varchar(50) or similar):

An index on (id, date_created) is essential for performance with big tables:

CREATE INDEX postcode_foo_idx ON postcode(id, date_created DESC NULLS LAST);

SQL Fiddle.

You can skip NULLS LAST everywhere if date_created is defined NOT NULL.

If substantially more than 5 rows per id is a rare case, @ypercube's queries will be faster. Test with EXPLAIN ANALYZE.

The difference: my rCTE comes with more overhead, but performance is hardly influenced by older surplus rows (those are not touched in the query). Both of @ypercube's queries have less overhead, but get slower with more rows per id.

Basics with links and more explanation:

If you don't have a table tbl, you can use a similar technique to distill distinct id from postcode in the first place:

@Erwin Brandstetter 2015-09-26 01:34:13

@ypercube moved our comments about a possible bug with type modifiers to a chat. Here is the follow-up question resulting from it.

@RoboBex 2015-09-28 13:12:35

thanks for your answer. We're working with a small amount of data at the moment so I went initially with ypercube's first query. However, the data will increase substantially so I'm going to try your way also. I haven't worked with recursive before so I'm going to do a little research first. Thanks again.

@ypercubeᵀᴹ 2015-09-25 14:21:42

There are probably many ways to do this. The first that comes to mind is to use window functions:

SELECT 
    id, postcode
FROM
  ( SELECT id, postcode, 
           ROW_NUMBER() OVER (PARTITION BY id
                              ORDER BY MAX(date_created) DESC
                             ) AS rn
    FROM tablename
    GROUP BY id, postcode
  ) AS t
WHERE
    rn <= 5
ORDER BY 
    id, rn ;

Test at SQLfiddle.

If there are ties, say the 5th, 6th and 7th postcode for an id have the same date_created, only one of them (choice will be arbitrary) will be in the results. If you want all the tied postcodes in those cases, use RANK() instead of ROW_NUMBER().


Another option is to use the LATERAL syntax. I'm not sure which will be more efficient, it wil probably depend on the values distribution of the two columns (id and postcode), i.e. how many distinct ids in the whole table, how many distinct postcodes per id and how many rows per (id, postcode) combinations.

SELECT 
    t.id, ti.postcode
FROM
    ( SELECT DISTINCT id
      FROM tablename
    ) AS t
  CROSS JOIN LATERAL
    ( SELECT tt.postcode,
             MAX(tt.date_created) AS date_created
      FROM tablename AS tt
      WHERE tt.id = t.id
      GROUP BY tt.postcode
      ORDER BY date_created DESC
      LIMIT 5
    ) AS ti 
ORDER BY 
    t.id, ti.date_created DESC;

Adding an index on (id, postcode, date_created) would be a good idea, too - or on (id, postcode, date_created DESC).

@RoboBex 2015-09-28 13:05:38

Thanks @ypercube. We've added an index and are now working with your first suggestion.

Related Questions

Sponsored Content

1 Answered Questions

[SOLVED] Improving distinct values estimates in Postgres

1 Answered Questions

[SOLVED] Mysql query to find last entry for each of the last 7 days

2 Answered Questions

3 Answered Questions

[SOLVED] Get rows with most recent date for each different item

1 Answered Questions

[SOLVED] Count all rows for each DISTINCT

2 Answered Questions

[SOLVED] Efficiently get the last record

3 Answered Questions

[SOLVED] Get last two IDs from rows for last two days

2 Answered Questions

Sponsored Content