By Owen


2014-05-02 19:07:56 8 Comments

I feel like there is a better way than this:

import pandas as pd
df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])
def callback(x):
    x['seq'] = range(1, x.shape[0] + 1)
    return x
df = df.groupby(['c1', 'c2']).apply(callback)
print(df)

To achieve this:

   c1 c2  v1  seq
0   A  X   3    1
1   A  X   5    2
2   A  Y   7    1
3   A  Y   1    2
4   B  X   3    1
5   B  X   1    2
6   B  X   3    3
7   B  Y   1    1
8   C  X   7    1
9   C  Y   4    1
10  C  Y   1    2
11  C  Y   6    3

Is there a way to do it that avoids the callback?

2 comments

@Shaina Raza 2020-05-12 08:50:21

This might be useful:

df = df.sort_values(['userID', 'date'])
grp = df.groupby('userID')['ItemID'].aggregate(lambda x: '->'.join(tuple(x))).reset_index()
print(grp)

It creates one '->'-joined sequence of ItemID values per userID, ordered by date.
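As a rough illustration of what that snippet produces, here is a minimal sketch on made-up data. The column names userID, date and ItemID come from the snippet; the values are hypothetical. Note that this builds one string per group rather than a per-row counter, so it addresses a slightly different problem than the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "userID": [1, 2, 1, 2, 1],
    "date": pd.to_datetime(
        ["2020-01-03", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"]
    ),
    "ItemID": ["c", "x", "a", "y", "b"],
})

# Sort so that each user's items appear in date order before joining.
df = df.sort_values(["userID", "date"])

# One row per userID, with that user's ItemID values joined by '->'.
grp = (
    df.groupby("userID")["ItemID"]
    .aggregate(lambda x: "->".join(x))
    .reset_index()
)
print(grp)
```

The join only works if ItemID holds strings; numeric IDs would need converting first, for example with astype(str).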

@Jeff 2014-05-02 19:11:17

Use cumcount(); see the pandas documentation for GroupBy.cumcount.

In [4]: df.groupby(['c1', 'c2']).cumcount()
Out[4]: 
0     0
1     1
2     0
3     1
4     0
5     1
6     2
7     0
8     0
9     0
10    1
11    2
dtype: int64

If you want the numbering to start at 1:

In [5]: df.groupby(['c1', 'c2']).cumcount()+1
Out[5]: 
0     1
1     2
2     1
3     2
4     1
5     2
6     3
7     1
8     1
9     1
10    2
11    3
dtype: int64

@Boris 2019-11-14 11:29:43

How can you add the count as an extra column?

@JohanC 2020-01-15 14:49:33

@Boris Use df['seq'] = df.groupby(['c1', 'c2']).cumcount()
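Putting the pieces together, a minimal sketch that rebuilds the question's DataFrame and stores the 1-based count in a seq column, with no callback involved:

```python
import pandas as pd

# The DataFrame from the question.
df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])

# cumcount() numbers the rows within each (c1, c2) group from 0;
# adding 1 gives the 1-based sequence the question asks for.
df['seq'] = df.groupby(['c1', 'c2']).cumcount() + 1
print(df)
```

The assignment works because the Series returned by cumcount() carries the same index as df, so the values line up row by row.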

@Bowen Liu 2020-04-23 15:57:52

Not OP but thanks a lot for this great answer. Is it safe to assume that the result of cumcount() will always have the same length as the original dataframe, and that you group by the columns that you want to do cumcount on?
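One way to check the length question empirically is the small sketch below. The data and the non-default index are made up for illustration, and it only covers the simple case with no missing values in the grouping columns.

```python
import pandas as pd

# Hypothetical data with a non-default index and interleaved groups.
df = pd.DataFrame(
    {"c1": ["A", "B", "A", "B", "A"], "c2": ["X", "X", "X", "Y", "X"]},
    index=[10, 20, 30, 40, 50],
)

counts = df.groupby(["c1", "c2"]).cumcount()

# Compare the result against the original frame.
same_length = len(counts) == len(df)
same_index = counts.index.equals(df.index)
print(same_length, same_index)
print(counts)
```

In this example the result has one entry per original row, in the original row order, which is why it can be assigned straight back as a column. The groupby keys are the columns that define the groups within which rows are numbered.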
