By Trying_hard


2016-10-07 17:36:25 8 Comments

I am using this data frame:

Fruit   Date      Name  Number
Apples  10/6/2016 Bob    7
Apples  10/6/2016 Bob    8
Apples  10/6/2016 Mike   9
Apples  10/7/2016 Steve 10
Apples  10/7/2016 Bob    1
Oranges 10/7/2016 Bob    2
Oranges 10/6/2016 Tom   15
Oranges 10/6/2016 Mike  57
Oranges 10/6/2016 Bob   65
Oranges 10/7/2016 Tony   1
Grapes  10/7/2016 Bob    1
Grapes  10/7/2016 Tom   87
Grapes  10/7/2016 Bob   22
Grapes  10/7/2016 Bob   12
Grapes  10/7/2016 Tony  15

I want to aggregate this by name and then by fruit to get a total number of fruit per name.

Bob,Apples,16 ( for example )

I tried grouping by Name and Fruit but how do I get the total number of fruit.

8 comments

@Steven G 2016-10-07 17:37:45

use the sum() method

df.groupby(['Fruit','Name']).sum()

Out[31]: 
               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1

@Kingname 2017-10-23 12:32:50

How can pandas knows that I want to sum the col named Number ?

@Steven G 2017-10-23 16:51:53

@Kingname it's the last column left if you take out NAME and FRUIT. if you add 2 columns left, it would sum both columns

@Daniyal Shahrokhian 2018-05-20 06:11:14

@StevenG the Date column is also left...

@matanster 2018-07-24 17:20:13

I find this solution a little hackish compared to the others

@Wassadamo 2018-09-01 02:28:48

Date is not summed because it has dtype = string yes?

@Chirayu05 2019-04-22 02:49:04

What if I want to aggregate this only based on the group Fruit?

@piRSquared 2019-05-03 20:21:19

Gratz on another gold badge (-:

@WeNYoBen 2018-11-21 03:01:52

You can set the groupby column to index then using sum with level

df.set_index(['Fruit','Name']).sum(level=[0,1])
Out[175]: 
               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Oranges Bob        67
        Tom        15
        Mike       57
        Tony        1
Grapes  Bob        35
        Tom        87
        Tony       15

@vaibhav pawar 2018-08-03 10:06:32

This answer only to understand how groupby and sum works.

I am using data-set "Rainfall in India since 1900 to 2015"

My Dataset includes columns like "subdivision" and "annual".

So, here i would like to calculate sum of annual rainfall for each subdivision.

Total = Data.groupby('SUBDIVISION')['ANNUAL'].sum()
print (Total)

this is how subdivision will get grouped and we will get sum of annual rainfall per subdivision.

@Gazala Muhamed 2018-07-02 10:01:31

If you want to keep the original columns Fruit and Name, use reset_index(). Otherwise Fruit and Name will become part of the index.

df.groupby(['Fruit','Name'])['Number'].sum().reset_index()

Fruit   Name       Number
Apples  Bob        16
Apples  Mike        9
Apples  Steve      10
Grapes  Bob        35
Grapes  Tom        87
Grapes  Tony       15
Oranges Bob        67
Oranges Mike       57
Oranges Tom        15
Oranges Tony        1

As seen in the other answers:

df.groupby(['Fruit','Name'])['Number'].sum()

               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1

@jared 2018-03-11 00:29:59

df.groupby(['Fruit','Name'])['Number'].sum()

You can select different columns to sum numbers.

@Saurabh 2016-10-08 11:40:26

Also you can use agg function,

df.groupby(['Name', 'Fruit'])['Number'].agg('sum')

@shahar_m 2018-10-04 10:02:59

This should be the accepted answer, it is more explicit.

@Gaurang Tandon 2019-05-08 15:53:03

This differs from the accepted answer in that this returns a Series whereas the other returns a GroupBy object.

@Demetri Pananos 2016-10-07 18:35:14

Both the other answers accomplish what you want.

You can use the pivot functionality to arrange the data in a nice table

df.groupby(['Fruit','Name'],as_index = False).sum().pivot('Fruit','Name').fillna(0)



Name    Bob     Mike    Steve   Tom    Tony
Fruit                   
Apples  16.0    9.0     10.0    0.0     0.0
Grapes  35.0    0.0     0.0     87.0    15.0
Oranges 67.0    57.0    0.0     15.0    1.0

@Batsu 2016-10-07 17:44:57

You can use groupby and sum:

df.groupby(['Name', 'Fruit']).sum()

               Number
Name  Fruit          
Bob   Apples       16
      Grapes       35
      Oranges      67
Mike  Apples        9
      Oranges      57
Steve Apples       10
Tom   Grapes       87
      Oranges      15
Tony  Grapes       15
      Oranges       1

Related Questions

Sponsored Content

33 Answered Questions

[SOLVED] Renaming columns in pandas

18 Answered Questions

[SOLVED] Get list from pandas DataFrame column headers

17 Answered Questions

[SOLVED] How to iterate over rows in a DataFrame in Pandas?

14 Answered Questions

[SOLVED] Select rows from a DataFrame based on values in a column in pandas

23 Answered Questions

[SOLVED] Adding new column to existing DataFrame in Python pandas

15 Answered Questions

[SOLVED] Delete column from pandas DataFrame by column name

13 Answered Questions

[SOLVED] "Large data" work flows using pandas

5 Answered Questions

14 Answered Questions

[SOLVED] Select first row in each GROUP BY group?

13 Answered Questions

[SOLVED] Group By Multiple Columns

Sponsored Content