By ely

2012-09-25 19:05:26 8 Comments

Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times?

Example dataframe:

import pandas as pd
import datetime as dt
df = pd.DataFrame({
         "date"    :  [, x, 1) for x in range(1, 11)], 
         "returns" :  0.05 * np.random.randn(10), 
         "dummy"   :  np.repeat(1, 10)

The syntactically wrong, but intuitively right, way to do it would be:

# Assume `f1` and `f2` are defined for aggregating.
df.groupby("dummy").agg({"returns": f1, "returns": f2})

Obviously, Python doesn't allow duplicate keys. Is there any other manner for expressing the input to agg()? Perhaps a list of tuples [(column, function)] would work better, to allow multiple functions applied to the same column? But agg() seems like it only accepts a dictionary.

Is there a workaround for this besides defining an auxiliary function that just applies both of the functions inside of it? (How would this work with aggregation anyway?)


@cs95 2019-01-22 01:50:41

TLDR; Pandas groupby.agg has a new, easier syntax for specifying (1) aggregations on multiple columns, and (2) multiple aggregations on a column. So, to do this for pandas >= 0.25, use

df.groupby('dummy').agg(Mean=('returns', 'mean'), Sum=('returns', 'sum'))

           Mean       Sum
1      0.036901  0.369012


df.groupby('dummy')['returns'].agg(Mean='mean', Sum='sum')

           Mean       Sum
1      0.036901  0.369012

Pandas >= 0.25: Named Aggregation

Pandas has changed the behavior of GroupBy.agg in favour of a more intuitive syntax for specifying named aggregations. See the 0.25 docs section on Enhancements as well as relevant GitHub issues GH18366 and GH26512.

From the documentation,

To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in GroupBy.agg(), known as “named aggregation”, where

  • The keywords are the output column names
  • The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. Pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

You can now pass a tuple via keyword arguments. The tuples follow the format of (<colName>, <aggFunc>).

import pandas as pd

# '0.25.0.dev0+840.g989f912ee'

# Setup
df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]

    max_height=('height', 'max'), min_weight=('weight', 'min'),)

      max_height  min_weight
cat          9.5         7.9
dog         34.0         7.5

Alternatively, you can use pd.NamedAgg (essentially a namedtuple) which makes things more explicit.

    max_height=pd.NamedAgg(column='height', aggfunc='max'), 
    min_weight=pd.NamedAgg(column='weight', aggfunc='min')

      max_height  min_weight
cat          9.5         7.9
dog         34.0         7.5

It is even simpler for Series, just pass the aggfunc to a keyword argument.

df.groupby('kind')['height'].agg(max_height='max', min_height='min')    

      max_height  min_height
cat          9.5         9.1
dog         34.0         6.0       

Lastly, if your column names aren't valid python identifiers, use a dictionary with unpacking:

df.groupby('kind')['height'].agg(**{'max height': 'max', ...})

Pandas < 0.25

In more recent versions of pandas leading upto 0.24, if using a dictionary for specifying column names for the aggregation output, you will get a FutureWarning:

df.groupby('dummy').agg({'returns': {'Mean': 'mean', 'Sum': 'sum'}})
# FutureWarning: using a dict with renaming is deprecated and will be removed 
# in a future version

Using a dictionary for renaming columns is deprecated in v0.20. On more recent versions of pandas, this can be specified more simply by passing a list of tuples. If specifying the functions this way, all functions for that column need to be specified as tuples of (name, function) pairs.

df.groupby("dummy").agg({'returns': [('op1', 'sum'), ('op2', 'mean')]})

            op1       op2
1      0.328953  0.032895


df.groupby("dummy")['returns'].agg([('op1', 'sum'), ('op2', 'mean')])

            op1       op2
1      0.328953  0.032895

@NKSHELL 2019-10-18 17:40:00

This should be the top answer because of using a more clear and clean solution using the newer version of the interface.

@victor 2019-12-08 01:31:53

The examples used for named aggregation doesn't solve the original problem of using multiple aggregations on the same column. For example, can you aggregate by both min and max for height without first subsetting for df.groupby('kind')['height']?

@cs95 2019-12-08 09:21:52

@victor I added a TLDR at the top of the answer that directly addresses the question. And the answer to your second question is yes, please take a look at the edit on my answer.

@Onur Ece 2020-02-18 16:45:29

A more generic code to the last example of your >=0.25 answer to handle aggregating multiple columns like this would've been great. df.groupby("kind").agg(**{ 'max height': pd.NamedAgg(column='height', aggfunc=max), 'min weight': pd.NamedAgg(column='weight', aggfunc=min) })

@bmu 2012-11-27 20:57:33

You can simply pass the functions as a list:

In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
           mean       sum
1      0.036901  0.369012

or as a dictionary:

In [21]: df.groupby('dummy').agg({'returns':
                                  {'Mean': np.mean, 'Sum': np.sum}})
           Mean       Sum
1      0.036901  0.369012

@Ben 2015-12-23 02:27:43

Is there a way to specify the result column names?

@Stewbaca 2016-01-14 17:22:04

@Ben I think you must use a rename afterwards. example by Tom Augspurger (see cell 25)

@bmu 2016-04-08 12:20:51

@Ben: I added an example

@sparc_spread 2017-03-31 14:27:29

Upvoted this yesterday, because this is such a common use case, and yet it is not in the pandas agg() documentation at all! Excellent solution.

@joelostblom 2017-06-03 15:13:55

@sparc_spread Passing multiple functions as a list is well described in the pandas documentation. Renaming and passing multiple functions as a dictionary will be deprecated in a future version of pandas. Details are in the 0.20 change log, which I also summarized elsewhere on SO.

@sparc_spread 2017-06-03 16:18:54

Thanks - will check it out

@user3226167 2018-01-03 03:21:21

@ben flatten column index:…

@ShikharDua 2018-03-27 21:45:52

What if we have to use a list of lambda functions ?

@ad_s 2018-10-02 09:42:59

As for the 'as a dictionary' example : using a dict with renaming is deprecated and will be removed in a future version

@cs95 2019-01-22 01:53:00

It has already been said, but using dictionaries for renaming output columns from age is deprecated. You can instead specify a list of tuples. See this answer.

@ely 2020-06-02 19:13:51

@cs95 The deprecation warning that appears in the dictionary-renaming case is unfortunately not good enough to help users. For example, it is not intuitive that the name keyword used in that warning message should be the new name to use, and it seems like a confusing reorganization of this much clearer dict-based approach. For example, what if the new column names need to be derived programmatically at an intermediate stage of the calculation, and thus won't be available to use this way as kwargs without constructing another dict?

@Chang She 2012-09-26 01:30:08

Would something like this work:

In [7]: df.groupby('dummy').returns.agg({'func1' : lambda x: x.sum(), 'func2' : lambda x:})
              func2     func1
1     -4.263768e-16 -0.188565

@ely 2012-09-26 17:31:20

No, this does not work. If you look at the doc string for aggregate it explicitly says that when a dict is passed, the keys must be column names. So either your example is something you typed in without checking for this error, or else Pandas breaks its own docs here.

@ely 2012-09-26 17:52:42

N/M I didn't see the extra call to returns in there. So this is the Series version of aggregate? I'm looking to do the DataFrame version of aggregate, and I want to apply several different aggregations to each column all at once.

@Chang She 2012-09-26 19:35:59

Try this: df.groupby('dummy').agg({'returns': {'func1' : lambda x: x.sum(), 'func2' : lambda x: x.mean()}})

@ely 2012-09-26 19:39:49

It gives an assertion error with no message. From the looks of the code (, lines 406-408, version 0.7.3) it looks like it does a check at the end to make sure it's not returning more columns than there are keys within the first layer of the aggregation dictionary.

@Chang She 2012-09-26 21:11:17

Works fine on master. You want to try updating?

@ely 2012-09-26 21:13:26

Can't: it's a network maintined Python install. I can only use the packages in our network version, which is going to remain at 0.7.3 for a while.

Related Questions

Sponsored Content

9 Answered Questions

[SOLVED] Change data type of columns in Pandas

27 Answered Questions

[SOLVED] Renaming columns in pandas

18 Answered Questions

[SOLVED] Selecting multiple columns in a pandas dataframe

22 Answered Questions

[SOLVED] How to iterate over rows in a DataFrame in Pandas

15 Answered Questions

[SOLVED] Delete column from pandas DataFrame

24 Answered Questions

[SOLVED] Adding new column to existing DataFrame in Python pandas

5 Answered Questions

[SOLVED] Catch multiple exceptions in one line (except block)

19 Answered Questions

[SOLVED] Get list from pandas DataFrame column headers

16 Answered Questions

[SOLVED] "Large data" work flows using pandas

10 Answered Questions

[SOLVED] How to select rows from a DataFrame based on column values?

Sponsored Content