By Yarin


2012-08-24 17:00:44 8 Comments

The following is the simplest possible example, though any solution should be able to scale to however many n top results are needed:

Given a table like that below, with person, group, and age columns, how would you get the 2 oldest people in each group? (Ties within groups should not yield more results, but give the first 2 in alphabetical order)

+--------+-------+-----+
| Person | Group | Age |
+--------+-------+-----+
| Bob    | 1     | 32  |
| Jill   | 1     | 34  |
| Shawn  | 1     | 42  |
| Jake   | 2     | 29  |
| Paul   | 2     | 36  |
| Laura  | 2     | 39  |
+--------+-------+-----+

Desired result set:

+--------+-------+-----+
| Shawn  | 1     | 42  |
| Jill   | 1     | 34  |
| Laura  | 2     | 39  |
| Paul   | 2     | 36  |
+--------+-------+-----+

NOTE: This question builds on a previous one- Get records with max value for each group of grouped SQL results - for getting a single top row from each group, and which received a great MySQL-specific answer from @Bohemian:

select * 
from (select * from mytable order by `Group`, Age desc, Person) x
group by `Group`

Would love to be able to build off this, though I don't see how.

10 comments

@swdon 2018-03-28 09:04:56

There is a really nice answer to this problem at MySQL - How To Get Top N Rows per Each Group

Based on the solution in the referenced link, your query would be like:

SELECT Person, Group, Age
   FROM
     (SELECT Person, Group, Age, 
                  @group_rank := IF(@group = Group, @group_rank + 1, 1) AS group_rank,
                  @current_group := Group 
       FROM `your_table`
       ORDER BY Group, Age DESC
     ) ranked
   WHERE group_rank <= `n`
   ORDER BY Group, Age DESC;

where n is the top n and your_table is the name of your table.

I think the explanation in the reference is really clear. For quick reference I will copy and paste it here:

Currently MySQL does not support ROW_NUMBER() function that can assign a sequence number within a group, but as a workaround we can use MySQL session variables.

These variables do not require declaration, and can be used in a query to do calculations and to store intermediate results.

@current_country := country This code is executed for each row and stores the value of country column to @current_country variable.

@country_rank := IF(@current_country = country, @country_rank + 1, 1) In this code, if @current_country is the same we increment rank, otherwise set it to 1. For the first row @current_country is NULL, so rank is also set to 1.

For correct ranking, we need to have ORDER BY country, population DESC

@Laurent PELE 2018-07-18 12:58:17

Well, it is the principle used by solutions of Marc Byers, Rick James and mine.

@Laurent PELE 2018-07-18 13:05:18

Difficult to say which post (Stack Overflow or SQLlines) was the first

@Rick James 2018-07-24 00:40:12

@LaurentPELE - Mine was posted Feb, 2015. I see no timestamp or name on SQLlines. MySQL blogs have been around for long enough that some of them are out of date, and should be removed -- people are quoting mis-information.

@Jon Bown 2018-03-10 03:27:29

I wanted to share this because I spent a long time searching for an easy way to implement this in a java program I'm working on. This doesn't quite give the output you're looking for but its close. The function in mysql called GROUP_CONCAT() worked really well for specifying how many results to return in each group. Using LIMIT or any of the other fancy ways of trying to do this with COUNT didn't work for me. So if you're willing to accept a modified output, its a great solution. Lets say I have a table called 'student' with student ids, their gender, and gpa. Lets say I want to top 5 gpas for each gender. Then I can write the query like this

SELECT sex, SUBSTRING_INDEX(GROUP_CONCAT(cast(gpa AS char ) ORDER BY gpa desc), ',',5) 
AS subcategories FROM student GROUP BY sex;

Note that the parameter '5' tells it how many entries to concatenate into each row

And the output would look something like

+--------+----------------+
| Male   | 4,4,4,4,3.9    |
| Female | 4,4,3.9,3.9,3.8|
+--------+----------------+

You can also change the ORDER BY variable and order them a different way. So if I had the student's age I could replace the 'gpa desc' with 'age desc' and it will work! You can also add variables to the group by statement to get more columns in the output. So this is just a way I found that is pretty flexible and works good if you are ok with just listing results.

@Laurent PELE 2017-12-07 13:57:12

Snuffin solution seems quite slow to execute when you've got plenty of rows and Mark Byers/Rick James and Bluefeet solutions doesn't work on my environnement (MySQL 5.6) because order by is applied after execution of select, so here is a variant of Marc Byers/Rick James solutions to fix this issue (with an extra imbricated select):

select person, groupname, age
from
(
    select person, groupname, age,
    (@rn:=if(@prev = groupname, @rn +1, 1)) as rownumb,
    @prev:= groupname 
    from 
    (
        select person, groupname, age
        from persons 
        order by groupname ,  age desc, person
    )   as sortedlist
    JOIN (select @prev:=NULL, @rn :=0) as vars
) as groupedlist 
where rownumb<=2
order by groupname ,  age desc, person;

I tried similar query on a table having 5 millions rows and it returns result in less than 3 seconds

@herrherr 2018-01-11 08:45:33

This is the only query that has been working in my environment. Thanks!

@Rick James 2018-06-30 17:52:19

Add LIMIT 9999999 to any derived table with an ORDER BY. This may prevent the ORDER BY from being ignored.

@Evan 2018-08-09 00:52:50

I ran a a similar query on a table containing a few thousand rows, and it took 60 seconds to return one result, so... thanks for the post, it's a start for me. (ETA: down to 5 seconds. Good!)

@Taryn 2012-08-24 17:25:29

Here is one way to do this, using UNION ALL (See SQL Fiddle with Demo). This works with two groups, if you have more than two groups, then you would need to specify the group number and add queries for each group:

(
  select *
  from mytable 
  where `group` = 1
  order by age desc
  LIMIT 2
)
UNION ALL
(
  select *
  from mytable 
  where `group` = 2
  order by age desc
  LIMIT 2
)

There are a variety of ways to do this, see this article to determine the best route for your situation:

http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/

Edit:

This might work for you too, it generates a row number for each record. Using an example from the link above this will return only those records with a row number of less than or equal to 2:

select person, `group`, age
from 
(
   select person, `group`, age,
      (@num:=if(@group = `group`, @num +1, if(@group := `group`, 1, 1))) row_number 
  from test t
  CROSS JOIN (select @num:=0, @group:=null) c
  order by `Group`, Age desc, person
) as x 
where x.row_number <= 2;

See Demo

@Charles Forest 2012-08-24 17:27:27

if he have 1 000+ groups, wouldn't that make this a bit scary?

@Taryn 2012-08-24 17:28:22

@CharlesForest yes, it would and that Is why I stated that you would have to specify it for more than two groups. It would become ugly.

@Charles Forest 2012-08-24 17:30:08

isn't there some kind of FOR in sql, he could just send the amount of loops to add in the query (mostly working with php on these situations, he asked in SQL so i'm not sure)

@Taryn 2012-08-24 17:43:17

@CharlesForest I think I found a better solution, see my edit

@Taryn 2012-08-24 18:17:32

@Yarin there are many ways to do this you will have to decide based on your needs and your situation.

@Yarin 2012-08-24 19:39:05

This wins because it handles tied results according to the requirements. See demo. Also, according to the article this will be a more performant option. Thanks-

@user610650 2012-08-24 19:55:01

@Yarin: won't this not order by names when ties occur?

@Taryn 2012-08-24 19:57:09

@Ludo if the requirement is to order by name in the event of tie, then it just need to be added to the order by clause.

@user610650 2012-08-24 20:00:13

@bluefeet: yes for sure, I was just curious to know why the OP concluded that the so-called requirements were respected before you made the edit.

@Gordon Linoff 2016-07-22 21:10:01

A note for anyone reading this: The version is the variables is close to being correct. However, MySQL does not guarantee the order of evaluation of expressions in the SELECT (and, in fact, sometimes evaluates them out-of-order). The key to the solution is to put all the variable assignments in a single expression; here is an example: stackoverflow.com/questions/38535020/….

@Taryn 2017-04-27 14:46:31

@GordonLinoff Updated my answer, thanks for pointing it out. It also took far too long for me to update it.

@Prakash 2016-12-05 14:25:07

In SQL Server row_numer() is a powerful function that can get result easily as below

select Person,[group],age
from
(
select * ,row_number() over(partition by [group] order by age desc) rn
from mytable
) t
where rn <= 2

@Rick James 2018-07-24 00:41:28

With 8.0 and 10.2 being GA, this answer is becoming reasonable.

@iedmrc 2018-10-01 17:37:39

@RickJames what does 'being GA' means? Window functions (dev.mysql.com/doc/refman/8.0/en/window-functions.html) solved my problem very well.

@Rick James 2018-10-01 17:56:48

@iedmrc - "GA" means "Generally Available". It is tech-speak for "ready for prime time", or "released". They are through developing the version and will be focusing on bug that they missed. That link discusses MySQL 8.0's implementation, which might be different than MariaDB 10.2's implementation.

@Rick James 2016-04-08 03:20:34

If the other answers are not fast enough Give this code a try:

SELECT
        province, n, city, population
    FROM
      ( SELECT  @prev := '', @n := 0 ) init
    JOIN
      ( SELECT  @n := if(province != @prev, 1, @n + 1) AS n,
                @prev := province,
                province, city, population
            FROM  Canada
            ORDER BY
                province   ASC,
                population DESC
      ) x
    WHERE  n <= 3
    ORDER BY  province, n;

Output:

+---------------------------+------+------------------+------------+
| province                  | n    | city             | population |
+---------------------------+------+------------------+------------+
| Alberta                   |    1 | Calgary          |     968475 |
| Alberta                   |    2 | Edmonton         |     822319 |
| Alberta                   |    3 | Red Deer         |      73595 |
| British Columbia          |    1 | Vancouver        |    1837970 |
| British Columbia          |    2 | Victoria         |     289625 |
| British Columbia          |    3 | Abbotsford       |     151685 |
| Manitoba                  |    1 | ...

@Vérace 2016-04-11 11:06:00

Looked at your site - where would I get the data source for the cities' populations? TIA and rgs.

@Rick James 2016-04-11 18:36:55

maxmind.com/en/worldcities -- I find it handy for experimenting with lat/lng searches, queries, partitioning, etc. It is big enough to be interesting, yet readable enough to recognize the answers. The Canadian subset is handy for this kind of question. (Fewer provinces than US cities.)

@Mark Byers 2012-08-24 22:55:03

In other databases you can do this using ROW_NUMBER. MySQL doesn't support ROW_NUMBER but you can use variables to emulate it:

SELECT
    person,
    groupname,
    age
FROM
(
    SELECT
        person,
        groupname,
        age,
        @rn := IF(@prev = groupname, @rn + 1, 1) AS rn,
        @prev := groupname
    FROM mytable
    JOIN (SELECT @prev := NULL, @rn := 0) AS vars
    ORDER BY groupname, age DESC, person
) AS T1
WHERE rn <= 2

See it working online: sqlfiddle


Edit I just noticed that bluefeet posted a very similar answer: +1 to him. However this answer has two small advantages:

  1. It it is a single query. The variables are initialized inside the SELECT statement.
  2. It handles ties as described in the question (alphabetical order by name).

So I'll leave it here in case it can help someone.

@Yarin 2012-08-25 20:29:35

Mark- This is working well for us. Thanks for providing another good alternative to compliment @bluefeet's- much appreciated.

@Mr A 2016-09-27 03:55:58

Why doesn't this Query worked? pastebin.com/u17NRM8B

@Aditya Hajare 2017-09-09 10:11:50

+1. This worked for me. Really clean and to the point answer. Can you please explain how exactly this works? Whats the logic behind this?

@Laurent PELE 2017-12-07 14:00:32

Nice solution but it seems it is not working in my environment (MySQL 5.6) because the order by clause is applied after select so it doesn't return the top result, see my alternative solution to fix this issue

@Joseph Cho 2018-06-06 18:42:36

While running this I was able to delete JOIN (SELECT @prev := NULL, @rn := 0) AS vars. I get the idea is to declare empty variables, but it seems extraneous for MySql.

@George B 2018-06-13 04:31:13

This works great for me in MySQL 5.7, but it would be awesome if someone could explain how it works

@Rick James 2018-06-29 21:33:47

MySQL 8 and MariaDB 10.2 have ROW_NUMBER() and related Windowing functions.

@user610650 2012-08-24 17:37:59

How about using self-joining:

CREATE TABLE mytable (person, groupname, age);
INSERT INTO mytable VALUES('Bob',1,32);
INSERT INTO mytable VALUES('Jill',1,34);
INSERT INTO mytable VALUES('Shawn',1,42);
INSERT INTO mytable VALUES('Jake',2,29);
INSERT INTO mytable VALUES('Paul',2,36);
INSERT INTO mytable VALUES('Laura',2,39);

SELECT a.* FROM mytable AS a
  LEFT JOIN mytable AS a2 
    ON a.groupname = a2.groupname AND a.age <= a2.age
GROUP BY a.person
HAVING COUNT(*) <= 2
ORDER BY a.groupname, a.age DESC;

gives me:

a.person    a.groupname  a.age     
----------  -----------  ----------
Shawn       1            42        
Jill        1            34        
Laura       2            39        
Paul        2            36      

I was strongly inspired by the answer from Bill Karwin to Select top 10 records for each category

Also, I'm using SQLite, but this should work on MySQL.

Another thing: in the above, I replaced the group column with a groupname column for convenience.

Edit:

Following-up on the OP's comment regarding missing tie results, I incremented on snuffin's answer to show all the ties. This means that if the last ones are ties, more than 2 rows can be returned, as shown below:

.headers on
.mode column

CREATE TABLE foo (person, groupname, age);
INSERT INTO foo VALUES('Paul',2,36);
INSERT INTO foo VALUES('Laura',2,39);
INSERT INTO foo VALUES('Joe',2,36);
INSERT INTO foo VALUES('Bob',1,32);
INSERT INTO foo VALUES('Jill',1,34);
INSERT INTO foo VALUES('Shawn',1,42);
INSERT INTO foo VALUES('Jake',2,29);
INSERT INTO foo VALUES('James',2,15);
INSERT INTO foo VALUES('Fred',1,12);
INSERT INTO foo VALUES('Chuck',3,112);


SELECT a.person, a.groupname, a.age 
FROM foo AS a 
WHERE a.age >= (SELECT MIN(b.age)
                FROM foo AS b 
                WHERE (SELECT COUNT(*)
                       FROM foo AS c
                       WHERE c.groupname = b.groupname AND c.age >= b.age) <= 2
                GROUP BY b.groupname)
ORDER BY a.groupname ASC, a.age DESC;

gives me:

person      groupname   age       
----------  ----------  ----------
Shawn       1           42        
Jill        1           34        
Laura       2           39        
Paul        2           36        
Joe         2           36        
Chuck       3           112      

@Yarin 2012-08-24 17:53:27

@Ludo- Just saw that answer from Bill Karwin - thanks for applying it here

@Yarin 2012-08-24 17:58:50

What do you think of Snuffin's answer? I'm trying to compare the two

@Yarin 2012-08-24 18:38:10

There's a problem with this. If there's a tie for second place within the group, only one top result is returned- See demo

@Yarin 2012-08-24 19:34:28

@Ludo- the original requirement was that each group return the exact n results, with any ties being resolved alphabetically

@verbamour 2018-02-14 19:15:53

The edit to include the ties does not work for me. I get ERROR 1242 (21000): Subquery returns more than 1 row, presumably because of the GROUP BY. When I execute the SELECT MIN subquery alone, it generates three rows: 34, 39, 112 and there it appears the second value should be 36, not 39.

@Choix 2018-03-19 01:53:52

similar to the previous answer, nice but not working in my case: missing records or second order item is wrongly ordered.

@snuffn 2012-08-24 17:40:10

Try this:

SELECT a.person, a.group, a.age FROM person AS a WHERE 
(SELECT COUNT(*) FROM person AS b 
WHERE b.group = a.group AND b.age >= a.age) <= 2 
ORDER BY a.group ASC, a.age DESC

DEMO

@Yarin 2012-08-24 17:57:08

snuffin coming out of nowhere with the simplest solution! Is this more elegant than Ludo's/Bill Karwin's? Can I get some commentary

@snuffn 2012-08-24 18:07:45

Hm, not sure if it is more elegant. But judging from the votes, I guess bluefeet might have the better solution.

@Yarin 2012-08-24 18:37:36

There's a problem with this. If there's a tie for second place within the group, only one top result is returned. See modified demo

@Alberto Leal 2017-09-21 18:18:33

It's not a problem if it's desired. You can set the order of a.person.

@Choix 2018-03-19 01:39:54

no, it is not working in my case, neither does the DEMO work

@Rick James 2018-06-29 21:32:00

This algorithm dies not scale well -- it is O(n*n).

@Travesty3 2012-08-24 17:48:35

Check this out:

SELECT
  p.Person,
  p.`Group`,
  p.Age
FROM
  people p
  INNER JOIN
  (
    SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`
    UNION
    SELECT MAX(p3.Age) AS Age, p3.`Group` FROM people p3 INNER JOIN (SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`) p4 ON p3.Age < p4.Age AND p3.`Group` = p4.`Group` GROUP BY `Group`
  ) p2 ON p.Age = p2.Age AND p.`Group` = p2.`Group`
ORDER BY
  `Group`,
  Age DESC,
  Person;

SQL Fiddle: http://sqlfiddle.com/#!2/cdbb6/15

@Travesty3 2012-08-24 17:53:42

Man, others found much simpler solutions...I just spent like 15 minutes on this and was incredibly proud of myself for coming up with such a complicated solution, too. That sucks.

@Yarin 2012-08-24 17:54:45

@Keep your head up Travesty3- upvoted your effort, thanks

@Jamie Strauss 2015-02-18 21:57:52

I had to find an internal version number that was 1 less than the current - this gave me the answer to do this: max(internal_version - 1) - so stress less :)

Related Questions

Sponsored Content

15 Answered Questions

[SOLVED] How to get a list of MySQL user accounts

  • 2009-07-16 03:23:53
  • burntsugar
  • 1386103 View
  • 1271 Score
  • 15 Answer
  • Tags:   mysql mysql5

22 Answered Questions

[SOLVED] Retrieving the last record in each group - MySQL

13 Answered Questions

[SOLVED] Select first row in each GROUP BY group?

17 Answered Questions

[SOLVED] Get records with max value for each group of grouped SQL results

16 Answered Questions

[SOLVED] Get top 1 row of each group

25 Answered Questions

13 Answered Questions

[SOLVED] Using LIMIT within GROUP BY to get N results per group?

2 Answered Questions

1 Answered Questions

[SOLVED] Get the latest N results for each person

5 Answered Questions

[SOLVED] Selecting top result from SQL

Sponsored Content