By binaryEcon


2019-05-15 16:25:46 8 Comments

I have following dataframe:

import pandas as pd
df = pd.DataFrame(data=[[1,2,3,4,5,6,1,2,3], [7,8,9,10,11,12,7,8,9], [13,14,15,16,17,18,4,5,6]], index=pd.date_range('2004-01-01', '2004-01-03'))
df.columns = pd.MultiIndex.from_product([['x', 'y', 'z'], list('abc')])

df
                 x           y          z      
             a   b   c   a   b   c  a  b  c
2004-01-01   1   2   3   4   5   6  1  2  3
2004-01-02   7   8   9  10  11  12  7  8  9
2004-01-03  13  14  15  16  17  18  4  5  6

I want to sum groups of columns at the first level (level=0) and get something like this:

                (x+z)           y      
             a   b   c   a   b   c
2004-01-01   2   4   6   4   5   6
2004-01-02  14  16  18  10  11  12
2004-01-03  17  19  21  16  17  18

What I have tried:

mapping = {'x': 'r1', 'y': 'r2', 'z': 'r1'}
df.groupby(mapping, axis=1, level=0).apply(lambda g: g.sum(axis=1, level=1))

But I got an error.

Thanks for any advise.

2 comments

@ALollz 2019-05-15 16:46:33

Can map the first level to create a new MultiIndex used to group. This collapses the MultiIndex (to tuples) so we set it back.

d = {'x': 'r1', 'y': 'r2', 'z': 'r1'}
idx = pd.MultiIndex.from_tuples([(d.get(x, x), y) for x, y in df.columns])

df1 = df.groupby(idx, axis=1).sum()
df1.columns = pd.MultiIndex.from_tuples(df1.columns)

            r1          r2        
             a   b   c   a   b   c
2004-01-01   2   4   6   4   5   6
2004-01-02  14  16  18  10  11  12
2004-01-03  17  19  21  16  17  18

@binaryEcon 2019-05-17 13:26:31

A little improvement in the second line of code: Instead of using direct access to the dict "d[x]", you can use "d.get(x, x)". With that you don't have to specify unmapped columns in the dict.

@cs95 2019-05-15 16:29:10

Summing two levels is simple as long as the lower level keys are the same. You can sum and then concatenate:

pd.concat([df['x'] + df['z'], df['y']], keys=['(x+z)', 'y'], axis=1)

           (x+z)           y        
               a   b   c   a   b   c
2004-01-01     2   4   6   4   5   6
2004-01-02    14  16  18  10  11  12
2004-01-03    17  19  21  16  17  18

Don't forget to assign the result back to a variable.


If you need to sum an arbitrary number of values, slice with pd.IndexSlice and use sum:

cols_to_sum = ['x', 'y']
sums = df.loc[:, pd.IndexSlice[cols_to_sum]].sum(level=1, axis=1)
sums.columns = pd.MultiIndex.from_product([['+'.join(cols_to_sum)], sums.columns])
sums

           x+y        
             a   b   c
2004-01-01   5   7   9
2004-01-02  17  19  21
2004-01-03  29  31  33

To concatenate this back, use concat as usual,

pd.concat([sums, df.drop(cols_to_sum, axis=1, level=0)], axis=1)

           x+y          z      
             a   b   c  a  b  c
2004-01-01   5   7   9  1  2  3
2004-01-02  17  19  21  7  8  9
2004-01-03  29  31  33  4  5  6

Related Questions

Sponsored Content

41 Answered Questions

[SOLVED] How to merge two dictionaries in a single expression?

9 Answered Questions

[SOLVED] Select rows from a DataFrame based on values in a column in pandas

22 Answered Questions

[SOLVED] Renaming columns in pandas

30 Answered Questions

[SOLVED] How do I check if a list is empty?

  • 2008-09-10 06:20:11
  • Ray Vega
  • 2313383 View
  • 3236 Score
  • 30 Answer
  • Tags:   python list

17 Answered Questions

[SOLVED] How to iterate over rows in a DataFrame in Pandas?

22 Answered Questions

[SOLVED] How do I list all files of a directory?

  • 2010-07-08 19:31:22
  • duhhunjonn
  • 3378619 View
  • 3474 Score
  • 22 Answer
  • Tags:   python directory

38 Answered Questions

[SOLVED] How do I check whether a file exists without exceptions?

14 Answered Questions

[SOLVED] "Large data" work flows using pandas

14 Answered Questions

[SOLVED] Delete column from pandas DataFrame

34 Answered Questions

[SOLVED] How do I sort a dictionary by value?

Sponsored Content