By atomh33ls


2014-06-06 13:15:41 8 Comments

Consider a csv file:

string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0

I can read this in, and reformat the date column into datetime format:

import pandas as pd

b = pd.read_csv('b.dat')
b['date'] = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')

I have been trying to group the data by month. It seems like there should be an obvious way to access the month and group by it, but I can't find one. Does anyone know how?

What I am currently trying is re-indexing by the date:

b.index=b['date']

I can access the month like so:

b.index.month

However, I can't seem to find a function that groups the rows by month.

4 comments

@atomh33ls 2014-06-06 13:38:42

Managed to do it:

b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.groupby(by=[b.index.month, b.index.year])

Or

b.groupby(pd.Grouper(freq='M'))  # update for v0.21+
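
As a minimal sketch of what the first approach produces, the snippet below reads the sample rows from the question out of an in-memory string (an assumption, standing in for b.dat) and sums the number column per (year, month) pair. The renamed index levels and the variable names are illustrative only. It also shows why the year belongs in the key: grouping on the month alone merges May 2011 with May 2014.

```python
import io

import pandas as pd

# Sample rows from the question, held in a string instead of the file b.dat.
csv_text = """string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0
"""

b = pd.read_csv(io.StringIO(csv_text))
b.index = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')

# Group on year and month together; the level names are chosen for readability.
year = b.index.year.rename('year')
month = b.index.month.rename('month')
by_year_month = b.groupby([year, month])['number'].sum()

# Grouping on the month alone lumps May 2011 and May 2014 into one group.
by_month_only = b.groupby(month)['number'].sum()

print(by_year_month)
print(by_month_only)
```

Chaining .sum(), .mean(), or another aggregation is what turns the GroupBy object returned by groupby into an actual result.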

@Karl D. 2014-06-06 16:57:16

I think the more pandonic ways are to either use resample (when it provides the functionality you need) or use a TimeGrouper: df.groupby(pd.TimeGrouper(freq='M'))

@Alexandre 2016-01-28 21:58:47

To get the summed or averaged result as a DataFrame, use df.groupby(pd.TimeGrouper(freq='M')).sum() or df.groupby(pd.TimeGrouper(freq='M')).mean().

@BallpointBen 2018-12-18 01:18:13

pd.TimeGrouper has been deprecated in favor of pd.Grouper, which is a bit more flexible but still takes freq and level arguments.
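
A hedged sketch of the two routes mentioned in these comments, resample and pd.Grouper, on a few rows taken from the question. It assumes freq='MS' (month start) rather than 'M', because the month-end alias is spelled differently across pandas versions ('M' in older releases, 'ME' in newer ones); swap in whichever alias your version accepts.

```python
import pandas as pd

# Four rows from the question, already indexed by timestamp.
df = pd.DataFrame(
    {'number': [3.0, 4.0, 1.0, 2.0]},
    index=pd.to_datetime(['2011-04-22 12:07', '2011-04-22 12:10',
                          '2011-04-29 11:59', '2011-05-02 13:41']),
)

# resample and groupby(pd.Grouper(...)) bin the DatetimeIndex the same way.
via_resample = df.resample('MS')['number'].sum()
via_grouper = df.groupby(pd.Grouper(freq='MS'))['number'].sum()

# Any aggregation can follow the grouping, for example the monthly mean.
monthly_mean = df.groupby(pd.Grouper(freq='MS'))['number'].mean()

print(via_grouper)
print(monthly_mean)
```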

@ely 2019-09-09 20:36:47

The first method does not appear to work. It gives the error "'Series' object has no attribute 'month'" for a Series created via to_datetime.

@goodside 2020-03-05 18:09:22

@ely The answer implicitly relies on the lines in the original question where b is given an index after being read from CSV. Add b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p') after the line b = pd.read_csv('b.dat'). [I've edited the answer just now too.]
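
The error discussed here comes from the difference between a DatetimeIndex, which exposes .month directly, and a datetime Series, which exposes it through the .dt accessor. The sketch below groups on the column itself without re-indexing; the three rows are taken from the question, and the names year, month, and totals are illustrative.

```python
import pandas as pd

b = pd.DataFrame({
    'date': pd.to_datetime(['2/5/11 9:16am', '5/2/11 1:41pm', '5/2/14 3:02pm'],
                           format='%m/%d/%y %I:%M%p'),
    'number': [1.0, 2.0, 6.0],
})

# A datetime Series has no .month attribute, but a DatetimeIndex does.
series_has_month = hasattr(b['date'], 'month')
index_has_month = hasattr(pd.DatetimeIndex(b['date']), 'month')

# On a Series, the datetime fields live under the .dt accessor.
year = b['date'].dt.year.rename('year')
month = b['date'].dt.month.rename('month')
totals = b.groupby([year, month])['number'].sum()

print(series_has_month, index_has_month)
print(totals)
```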

@jpp 2018-03-28 02:01:13

One solution which avoids MultiIndex is to create a new datetime column setting day = 1. Then group by this column. Trivial example below.

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20']),
                   'Values': [5, 10]})

# normalize day to beginning of month
# (caveat: a date that already falls on the 1st is moved back to the previous month)
df['YearMonth'] = df['Date'] - pd.offsets.MonthBegin(1)

# two alternative methods
df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day-1, unit='D')
df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))

g = df.groupby('YearMonth')

res = g['Values'].sum()

# YearMonth
# 2017-10-01    15
# Name: Values, dtype: int64

The subtle benefit of this solution is, unlike pd.Grouper, the grouper index is normalized to the beginning of each month rather than the end, and therefore you can easily extract groups via get_group:

some_group = g.get_group('2017-10-01')

Calculating the last day of October is slightly more cumbersome. pd.Grouper, as of v0.23, does support a convention parameter, but this is only applicable for a PeriodIndex grouper.
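
One thing to watch with the offset-based normalization: subtracting MonthBegin(1) from a date that is already the 1st lands on the previous month. The sketch below shows that edge case and one possible alternative, a round trip through a monthly period, which is a swapped-in technique and not part of the answer above. The extra 2017-10-01 row and the variable names are added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-01', '2017-10-05', '2017-10-20']),
                   'Values': [1, 5, 10]})

# Offset subtraction: the first row (already the 1st) rolls back to September.
shifted = df['Date'] - pd.offsets.MonthBegin(1)

# Round trip through a monthly period: every row maps to the 1st of its own month.
month_start = df['Date'].dt.to_period('M').dt.to_timestamp()

g = df.assign(YearMonth=month_start).groupby('YearMonth')
res = g['Values'].sum()

# The month-start label makes a single group easy to pull out.
october = g.get_group(pd.Timestamp('2017-10-01'))

print(shifted)
print(res)
```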

@tsando 2018-10-11 10:58:29

A slight variation on @jpp's solution that outputs a YearMonth string:

df['YearMonth'] = pd.to_datetime(df['Date']).apply(lambda x: '{year}-{month}'.format(year=x.year, month=x.month))

res = df.groupby('YearMonth')['Values'].sum()
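
A small sketch of how the string key behaves, assuming a February row is added to the toy data. With '{year}-{month}' the month is not zero-padded, so the group labels sort as text rather than chronologically; dt.strftime('%Y-%m') is one way to get padded labels. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2017-02-03', '2017-10-05', '2017-10-20']),
                   'Values': [1, 5, 10]})

# Unpadded labels such as '2017-2' sort after '2017-10' when compared as strings.
unpadded = df['Date'].apply(
    lambda x: '{year}-{month}'.format(year=x.year, month=x.month))

# Zero-padded labels such as '2017-02' keep chronological order.
padded = df['Date'].dt.strftime('%Y-%m')

res_unpadded = df.groupby(unpadded)['Values'].sum()
res_padded = df.groupby(padded)['Values'].sum()

print(res_unpadded)
print(res_padded)
```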

@PandasRocks 2018-01-20 12:38:02

(update: 2018)

Note that pd.TimeGrouper is deprecated and will be removed. Use instead:

df.groupby(pd.Grouper(freq='M'))

@Kim 2020-03-25 12:09:55

The pandas documentation covers pd.Grouper and the frequency specifications (freq=...). Some examples are freq='D' for days, freq='B' for business days, freq='W' for weeks, or even freq='Q' for quarters.
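
To illustrate how the freq argument changes the bins, here is a minimal sketch on made-up toy data using daily and weekly frequencies. It passes key='Date' so the datetime column does not have to become the index first. The quarterly alias is left out because its spelling depends on the pandas version.

```python
import pandas as pd

# Toy data: a Monday, the following Tuesday, and the Monday after that.
df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-02', '2017-10-03', '2017-10-09']),
                   'Values': [1, 2, 4]})

# Daily bins cover every day in the range; empty days sum to 0.
daily = df.groupby(pd.Grouper(key='Date', freq='D'))['Values'].sum()

# Weekly bins are labelled by the Sunday that ends each week.
weekly = df.groupby(pd.Grouper(key='Date', freq='W'))['Values'].sum()

print(daily)
print(weekly)
```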
