By Wells


2010-01-25 01:25:02 8 Comments

The following query:

SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC

yields:

year    id  rate
2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2009    p01 4.4
2002    p01 3.9
2004    p01 3.5
2005    p01 2.1
2000    p01 0.8
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7
2006    p02 4.6
2007    p02 3.3

What I'd like is only the top 5 results for each id:

2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7

Is there a way to do this using some kind of LIMIT like modifier that works within the GROUP BY?

13 comments

@Salman A 2015-05-15 21:49:31

The original query used user variables and ORDER BY on derived tables; the behavior of both quirks is not guaranteed. Revised answer as follows.

You can use poor man's rank over partition to achieve desired result. Just outer join the table with itself and for each row, count the number of rows lesser than it:

SELECT testdata.id, testdata.rate, testdata.year, COUNT(lesser.rate) AS rank
FROM testdata
LEFT JOIN testdata AS lesser ON testdata.id = lesser.id AND testdata.rate < lesser.rate
GROUP BY testdata.id, testdata.rate, testdata.year
HAVING COUNT(lesser.rate) < 5
ORDER BY testdata.id, testdata.rate DESC

Note that:

  1. COUNT is zero-based
  2. For sort descending the lesser row is the one with higher rate
  3. All rows that tie for last place are returned

Result:

+------+-------+------+------+
| id   | rate  | year | rank |
+------+-------+------+------+
| p01  |  8.00 | 2006 |    0 |
| p01  |  7.40 | 2003 |    1 |
| p01  |  6.80 | 2008 |    2 |
| p01  |  5.90 | 2001 |    3 |
| p01  |  5.30 | 2007 |    4 |
| p02  | 12.50 | 2001 |    0 |
| p02  | 12.40 | 2004 |    1 |
| p02  | 12.20 | 2002 |    2 |
| p02  | 10.30 | 2003 |    3 |
| p02  |  8.70 | 2000 |    4 |
+------+-------+------+------+

@ruuter 2015-08-27 23:23:06

I think it is worth mentioning that the key part is to ORDER BY id since any change of the value of id will restart counting in rank.

@Brenno Leal 2016-08-01 17:37:12

Why should i run it twice to get the response from WHERE rank <=5? For the first time im not getting 5 rows from each id, but after it i am able to get as you said.

@Salman A 2016-08-01 18:12:10

@BrennoLeal I think you are forgetting the SET statement (see first query). It is necessary.

@Rick James 2018-02-10 04:21:26

In newer versions, the ORDER BY in the derived table can, and often will, be ignored. This defeats the goal. Efficient group-wise are found here.

@Salman A 2018-02-28 09:46:04

@RickJames you are right. Rewrote the answer.

@John 2016-10-25 13:20:14

Took some working, but I thougth my solution would be something to share as it is seems elegant as well as quite fast.

SELECT h.year, h.id, h.rate 
  FROM (
    SELECT id, 
      SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
      FROM h
      WHERE year BETWEEN 2000 AND 2009
      GROUP BY id
      ORDER BY id
  ) AS h_temp
    LEFT JOIN h ON h.id = h_temp.id 
      AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l

Note that this example is specified for the purpose of the question and can be modified quite easily for other similar purposes.

@Wang Wen'an 2016-05-09 10:16:02

Build the virtual columns(like RowID in Oracle)

table:

`
CREATE TABLE `stack` 
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL) 
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
`

data:

insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);

SQL like this:

select t3.year,t3.id,t3.rate 
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3 
where rownum <=3 order by id,rate DESC;

if delete the where clause in t3, it shows like this:

enter image description here

GET "TOP N Record" --> add the "rownum <=3" in where clause(the where-clause of t3);

CHOOSE "the year" --> add the "BETWEEN 2000 AND 2009" in where clause(the where-clause of t3);

@starvator 2016-06-23 14:52:14

If you have rates that repeat for the same id, then this will not work because your rowNum count will increase higher; you will not get 3 per row, you can get 0, 1 or 2. Can you think of any solution to this?

@Wang Wen'an 2016-07-12 07:11:49

@starvator change the "t1.rate<=t2.rate" to "t1.rate<t2.rate", if the best rate has same values in the same id, all of them has the same rownum but will not increase higher; like "rate 8 in id p01", if it repeats, by using "t1.rate<t2.rate", both of "rate 8 in id p01" has the same rownum 0; if using "t1.rate<=t2.rate", the rownum is 2;

@Himanshu Patel 2016-07-15 12:39:33

Please try below stored procedure. I have already verified. I am getting proper result but without using groupby.

CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
BEGIN
DECLARE query_string text;
DECLARE datasource1 varchar(24);
DECLARE done INT DEFAULT 0;
DECLARE tenants varchar(50);
DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    SET @query_string='';

      OPEN cur1;
      read_loop: LOOP

      FETCH cur1 INTO tenants ;

      IF done THEN
        LEAVE read_loop;
      END IF;

      SET @datasource1 = tenants;
      SET @query_string = concat(@query_string,'(select * from demo  where `id` = ''',@datasource1,''' order by rate desc LIMIT 5) UNION ALL ');

       END LOOP; 
      close cur1;

    SET @query_string  = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));  
  select @query_string;
PREPARE stmt FROM @query_string;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

END

@Vishal Kumar 2013-10-04 15:35:28

For me something like

SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N) 

works perfectly. No complicated query.


for example: get top 1 for each group

SELECT 
    *
FROM
    yourtable
WHERE
    id IN (SELECT 
            SUBSTRING_INDEX(GROUP_CONCAT(id
                            ORDER BY rate DESC),
                        ',',
                        1) id
        FROM
            yourtable
        GROUP BY year)
ORDER BY rate DESC;

@MLF 2014-12-24 08:07:44

Try this:

SET @num := 0, @type := '';
SELECT `year`, `id`, `rate`,
    @num := if(@type = `id`, @num + 1, 1) AS `row_number`,
    @type := `id` AS `dummy`
FROM (
    SELECT *
    FROM `h`
    WHERE (
        `year` BETWEEN '2000' AND '2009'
        AND `id` IN (SELECT `rid` FROM `table2`) AS `temp_rid`
    )
    ORDER BY `id`
) AS `temph`
GROUP BY `year`, `id`, `rate`
HAVING `row_number`<='5'
ORDER BY `id`, `rate DESC;

@fthiella 2013-03-23 09:47:24

You could use GROUP_CONCAT aggregated function to get all years into a single column, grouped by id and ordered by rate:

SELECT   id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM     yourtable
GROUP BY id

Result:

-----------------------------------------------------------
|  ID | GROUPED_YEAR                                      |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007                |
-----------------------------------------------------------

And then you could use FIND_IN_SET, that returns the position of the first argument inside the second one, eg.

SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1

SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6

Using a combination of GROUP_CONCAT and FIND_IN_SET, and filtering by the position returned by find_in_set, you could then use this query that returns only the first 5 years for every id:

SELECT
  yourtable.*
FROM
  yourtable INNER JOIN (
    SELECT
      id,
      GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
    FROM
      yourtable
    GROUP BY id) group_max
  ON yourtable.id = group_max.id
     AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
  yourtable.id, yourtable.year DESC;

Please see fiddle here.

Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.

The maximum length of the string returned by GROUP_CONCAT is limited, so this works well if you need to select a few records for every group.

@user10089632 2018-01-21 16:26:14

Clever trick and nice explanation!

@Timothy Johns 2018-03-15 23:35:43

That's beautifully performant, comparatively simple, and great explanation; thank you SO MUCH. To your last point, Where a reasonable maximum length can be computed, one can use SET SESSION group_concat_max_len = <maximum length>; In the OP's case, a non-issue (since the default is 1024), but by way of example, group_concat_max_len should be at least 25: 4 (max length of a year string) + 1 (separator character), times 5 (first 5 years). The strings are truncated rather than throwing an error, so watch for warnings such as 1054 rows in set, 789 warnings (0.31 sec).

@Amogh 2018-08-31 07:32:21

If i want to fetch exact 2 rows rather than 1 to 5 than what should I use with FIND_IN_SET(). I tried for FIND_IN_SET() =2 but not showing result as expected.

@jDub9 2018-11-15 14:57:37

FIND_IN_SET BETWEEN 1 and 5 will take the first 5 positions of GROUP_CONCAT set if size equal to or greater than 5. So FIND_IN_SET = 2 will take only the data with the 2nd position in your GROUP_CONCAT. Getting 2 rows you can try BETWEEN 1 and 2 for 1st and 2nd position assuming set has 2 rows to give.

@tiomno 2019-02-06 05:20:45

This solution has much better performance than Salman's for large datasets. I gave a thumbs up to both for such clever solutions anyway. Thanks!!

@Saharsh Shah 2013-01-05 09:37:40

Try this:

SELECT h.year, h.id, h.rate 
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:[email protected]+1, @index:=0) indx 
      FROM (SELECT h.year, h.id, h.rate 
            FROM h
            WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
            GROUP BY id, h.year
            ORDER BY id, rate DESC
            ) h, (SELECT @lastid:='', @index:=0) AS a
    ) h 
WHERE h.indx <= 5;

@anu 2014-10-08 17:25:09

unknown column a.type in field list

@Dev-Ria 2012-12-17 17:14:52

for those like me that had queries time out. I made the below to use limits and anything else by a specific group.

DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
    DECLARE a INT Default 0;
    DECLARE stop_loop INT Default 0;
    DECLARE domain_val VARCHAR(250);
    DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;

    OPEN domain_list;

    SELECT COUNT(DISTINCT(domain)) INTO stop_loop 
    FROM db.one;
    -- BEGIN LOOP
    loop_thru_domains: LOOP
        FETCH domain_list INTO domain_val;
        SET a=a+1;

        INSERT INTO db.two(book,artist,title,title_count,last_updated) 
        SELECT * FROM 
        (
            SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW() 
            FROM db.one 
            WHERE book = domain_val
            GROUP BY artist,title
            ORDER BY book,titleCount DESC
            LIMIT 200
        ) a ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();

        IF a = stop_loop THEN
            LEAVE loop_thru_domain;
        END IF;
    END LOOP loop_thru_domain;
END $$

it loops through a list of domains and then inserts only a limit of 200 each

@Ricky Moreno 2012-11-29 00:04:32

SELECT year, id, rate
FROM (SELECT
  year, id, rate, row_number() over (partition by id order by rate DESC)
  FROM h
  WHERE year BETWEEN 2000 AND 2009
  AND id IN (SELECT rid FROM table2)
  GROUP BY id, year
  ORDER BY id, rate DESC) as subquery
WHERE row_number <= 5

The subquery is almost identical to your query. Only change is adding

row_number() over (partition by id order by rate DESC)

@ypercubeᵀᴹ 2013-01-07 16:06:06

This is nice but MySQL has no window functions (like ROW_NUMBER()).

@fantasticfears 2017-11-27 13:47:34

As of MySQL 8.0, row_number() is available.

@Brian L Cartwright 2012-11-02 00:11:19

This requires a series of subqueries to rank the values, limit them, then perform the sum while grouping

@Rnk:=0;
@N:=2;
select
  c.id,
  sum(c.val)
from (
select
  b.id,
  b.bal
from (
select   
  if(@last_id=id,@Rnk+1,1) as Rnk,
  a.id,
  a.val,
  @last_id=id,
from (   
select 
  id,
  val 
from list
order by id,val desc) as a) as b
where b.rnk < @N) as c
group by c.id;

@Shlomi Noach 2012-07-06 17:10:04

The following post: sql: selcting top N record per group describes the complicated way of achieving this without subqueries.

It improves on other solutions offered here by:

  • Doing everything in a single query
  • Being able to properly utilize indexes
  • Avoiding subqueries, notoriously known to produce bad execution plans in MySQL

It is however not pretty. A good solution would be achievable were Window Functions (aka Analytic Functions) enabled in MySQL -- but they are not. The trick used in said post utilizes GROUP_CONCAT, which is sometimes described as "poor man's Window Functions for MySQL".

@bobince 2010-01-25 01:37:08

No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQLs, but not for 5 results per group).

This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle that which can be more efficient for some cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.

As with most solutions to this problem, it can return more than five rows if there are multiple rows with the same rate value, so you may still need a quantity of post-processing to check for that.

Related Questions

Sponsored Content

29 Answered Questions

[SOLVED] How to output MySQL query results in CSV format?

  • 2008-12-10 15:59:51
  • MCS
  • 1029533 View
  • 1028 Score
  • 29 Answer
  • Tags:   mysql csv quotes

24 Answered Questions

[SOLVED] Retrieving the last record in each group - MySQL

14 Answered Questions

[SOLVED] Select first row in each GROUP BY group?

15 Answered Questions

[SOLVED] How to get a list of MySQL user accounts

  • 2009-07-16 03:23:53
  • burntsugar
  • 1398008 View
  • 1286 Score
  • 15 Answer
  • Tags:   mysql mysql5

2 Answered Questions

[SOLVED] Using group by on multiple columns

  • 2010-03-10 23:11:23
  • l--''''''---------''''''''''''
  • 832213 View
  • 869 Score
  • 2 Answer
  • Tags:   sql group-by

16 Answered Questions

[SOLVED] Get top 1 row of each group

25 Answered Questions

4 Answered Questions

[SOLVED] SQL SORT BY ALTERNATE YEAR

2 Answered Questions

[SOLVED] Finding an element with max. no of appearing in SQL

1 Answered Questions

[SOLVED] SQL Close By Operator

  • 2011-02-03 22:30:02
  • Mike Silvis
  • 58 View
  • 0 Score
  • 1 Answer
  • Tags:   php mysql

Sponsored Content