By Trying_hard


2016-10-07 17:36:25 8 Comments

I am using this data frame:

Fruit   Date      Name  Number
Apples  10/6/2016 Bob    7
Apples  10/6/2016 Bob    8
Apples  10/6/2016 Mike   9
Apples  10/7/2016 Steve 10
Apples  10/7/2016 Bob    1
Oranges 10/7/2016 Bob    2
Oranges 10/6/2016 Tom   15
Oranges 10/6/2016 Mike  57
Oranges 10/6/2016 Bob   65
Oranges 10/7/2016 Tony   1
Grapes  10/7/2016 Bob    1
Grapes  10/7/2016 Tom   87
Grapes  10/7/2016 Bob   22
Grapes  10/7/2016 Bob   12
Grapes  10/7/2016 Tony  15

I want to aggregate this by Name and then by Fruit to get the total Number of each fruit per name.

For example: Bob, Apples, 16

I tried grouping by Name and Fruit, but how do I get the total number of fruit?

7 comments

@Steven G 2016-10-07 17:37:45

Use GroupBy.sum:

df.groupby(['Fruit','Name']).sum()

Out[31]: 
               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1
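For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's table by hand and runs the same grouping. The `numeric_only=True` argument is my addition, not part of the answer above: depending on the pandas version, a plain `.sum()` may either drop the text column Date or concatenate its strings, so restricting the sum to numeric columns keeps the result predictable.

```python
import pandas as pd

# The question's data, typed in by hand.
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Date": ["10/6/2016"] * 3 + ["10/7/2016"] * 3 + ["10/6/2016"] * 3
            + ["10/7/2016"] * 6,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Sum only the numeric columns within each (Fruit, Name) group.
totals = df.groupby(["Fruit", "Name"]).sum(numeric_only=True)
print(totals)
```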

@Kingname 2017-10-23 12:32:50

How does pandas know that I want to sum the column named Number?

@Steven G 2017-10-23 16:51:53

@Kingname Number is the only numeric column left once you take out Name and Fruit. If two numeric columns were left, it would sum both.

@Wassadamo 2018-09-01 02:28:48

Date is not summed because its dtype is string, yes?

@tgdn 2019-11-05 14:38:01

How to specify which column to sum?

@Steven G 2019-11-08 17:34:21

@tgdn df.groupby(['Name', 'Fruit'])['Number'].sum()

@tgdn 2019-11-08 17:34:53

Thanks @StevenG

@skdhfgeq2134 2020-01-16 10:41:22

@StevenG Summing a specific column this way returns a pandas Series instead of a DataFrame. As Jakub Kukul notes in a comment on another answer below, double square brackets around 'Number' return a DataFrame.
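A small sketch of the Series versus DataFrame difference discussed in these comments, using the question's data typed in by hand. The variable names `as_series` and `as_frame` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Single brackets select one column, so the result is a Series.
as_series = df.groupby(["Name", "Fruit"])["Number"].sum()

# Double brackets select a list of columns, so the result is a DataFrame.
as_frame = df.groupby(["Name", "Fruit"])[["Number"]].sum()

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

If you already have the Series, `as_series.to_frame()` should give an equivalent one-column DataFrame.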

@xxyjoel 2020-02-02 08:25:23

A variation using the .agg() function. It (1) keeps the result as a DataFrame, (2) lets you apply averages, counts, sums, etc., and (3) supports grouping by multiple columns while staying readable.

df.groupby(['att1', 'att2']).agg({'att1': "count", 'att3': "sum",'att4': 'mean'})

using your values...

df.groupby(['Name', 'Fruit']).agg({'Number': "sum"})
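To illustrate point (2), here is a sketch that computes several statistics of Number at once. It uses the named-aggregation form of `.agg()` rather than the dict form shown above, and the output column names `total`, `rows` and `average` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Each keyword is an output column: (source column, aggregation).
stats = df.groupby(["Name", "Fruit"]).agg(
    total=("Number", "sum"),
    rows=("Number", "count"),
    average=("Number", "mean"),
)
print(stats)
```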

@YOBEN_S 2018-11-21 03:01:52

You can set the groupby columns as the index and then use sum with level:

df.set_index(['Fruit','Name']).sum(level=[0,1])
Out[175]: 
               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Oranges Bob        67
        Tom        15
        Mike       57
        Tony        1
Grapes  Bob        35
        Tom        87
        Tony       15
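As far as I know, the `level` argument of `sum` was deprecated in the pandas 1.x series and is no longer accepted in recent releases. A sketch of what I believe is the equivalent spelling there, grouping on the index levels instead. Selecting `[["Number"]]` and passing `sort=False` are my own choices, made so that the text column Date is left out and the groups keep their first-appearance order as in the output above.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Date": ["10/6/2016"] * 3 + ["10/7/2016"] * 3 + ["10/6/2016"] * 3
            + ["10/7/2016"] * 6,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Move the keys into the index, keep only the numeric column,
# then group on the index levels by name.
by_level = (
    df.set_index(["Fruit", "Name"])[["Number"]]
      .groupby(level=["Fruit", "Name"], sort=False)
      .sum()
)
print(by_level)
```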

@Gazala Muhamed 2018-07-02 10:01:31

If you want to keep the original columns Fruit and Name, use reset_index(). Otherwise Fruit and Name will become part of the index.

df.groupby(['Fruit','Name'])['Number'].sum().reset_index()

Fruit   Name       Number
Apples  Bob        16
Apples  Mike        9
Apples  Steve      10
Grapes  Bob        35
Grapes  Tom        87
Grapes  Tony       15
Oranges Bob        67
Oranges Mike       57
Oranges Tom        15
Oranges Tony        1

As seen in the other answers:

df.groupby(['Fruit','Name'])['Number'].sum()

               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1
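A short sketch comparing the two shapes, again with the question's data typed in by hand. It also shows `as_index=False`, which is not part of this answer but should give the same flat table without the separate `reset_index()` call.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Fruit and Name end up in the index of the result.
indexed = df.groupby(["Fruit", "Name"])["Number"].sum()

# reset_index() turns them back into ordinary columns.
flat = indexed.reset_index()

# Alternative: ask groupby not to build the index in the first place.
flat_alt = df.groupby(["Fruit", "Name"], as_index=False)["Number"].sum()

print(flat)
```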

@jared 2018-03-11 00:29:59

df.groupby(['Fruit','Name'])['Number'].sum()

You can select a different column in the square brackets to sum its values instead.

@Saurabh 2016-10-08 11:40:26

You can also use the agg function:

df.groupby(['Name', 'Fruit'])['Number'].agg('sum')

@Gaurang Tandon 2019-05-08 15:53:03

This differs from the accepted answer in that this returns a Series whereas the other returns a DataFrame.

@Jakub Kukul 2019-08-21 17:05:32

@GaurangTandon to get DataFrame object instead (like in the accepted answer), use double square brackets around 'Number', i.e.: df.groupby(['Name', 'Fruit'])[['Number']].agg('sum')

@avirr 2019-10-09 20:39:18

Very helpful in cleaning up badly-encoded query report.

@Demetri Pananos 2016-10-07 18:35:14

Both the other answers accomplish what you want.

You can use the pivot functionality to arrange the data in a nice table:

df.groupby(['Fruit','Name'], as_index=False)['Number'].sum().pivot(index='Fruit', columns='Name', values='Number').fillna(0)

Name    Bob     Mike    Steve   Tom    Tony
Fruit                   
Apples  16.0    9.0     10.0    0.0     0.0
Grapes  35.0    0.0     0.0     87.0    15.0
Oranges 67.0    57.0    0.0     15.0    1.0
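The group-then-pivot steps can also be done in one call with `pivot_table`. This is a sketch of an alternative, not the code from the answer above. Because `fill_value=0` fills the missing combinations directly, the totals should stay whole numbers instead of becoming floats.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Rows are fruits, columns are names, cells are summed Number values.
table = df.pivot_table(
    index="Fruit",
    columns="Name",
    values="Number",
    aggfunc="sum",
    fill_value=0,  # combinations that never occur show up as 0
)
print(table)
```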
