By atomh33ls


2014-06-06 13:15:41 8 Comments

Consider a csv file:

string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0

I can read this in, and reformat the date column into datetime format:

import pandas as pd

b=pd.read_csv('b.dat')
b['date']=pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')

I have been trying to group the data by month. It seems like there should be an obvious way of accessing the month and grouping by that. But I can't seem to do it. Does anyone know how?

What I am currently trying is re-indexing by the date:

b.index=b['date']

I can access the month like so:

b.index.month

However, I can't seem to find a function to group the rows by month.

4 comments

@jpp 2018-03-28 02:01:13

One solution which avoids MultiIndex is to create a new datetime column setting day = 1. Then group by this column. Trivial example below.

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20']),
                   'Values': [5, 10]})

# normalize day to beginning of month
# (note: a date already on the 1st rolls back to the previous month's start)
df['YearMonth'] = df['Date'] - pd.offsets.MonthBegin(1)

# two alternative methods
df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day-1, unit='D')
df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))

g = df.groupby('YearMonth')

res = g['Values'].sum()

# YearMonth
# 2017-10-01    15
# Name: Values, dtype: int64

A subtle benefit of this solution is that, unlike pd.Grouper, the group keys are normalized to the beginning of each month rather than the end, so you can easily extract groups via get_group:

some_group = g.get_group('2017-10-01')

Calculating the last day of October is slightly more cumbersome. pd.Grouper, as of v0.23, does support a convention parameter, but this is only applicable for a PeriodIndex grouper.
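To make the difference concrete, here is a minimal sketch on made-up dates spanning two months. The column name `MonthStart` and the sample values are assumptions for illustration. A `pd.offsets.MonthEnd()` object is passed as `freq` only to avoid depending on a particular frequency alias.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2017-10-01', '2017-10-20', '2017-11-03']),
    'Values': [5, 10, 7],
})

# Month-start key: subtract (day - 1) days, so a date already on the 1st is unchanged
df['MonthStart'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day - 1, unit='D')
by_start = df.groupby('MonthStart')['Values'].sum()

# pd.Grouper with a month-end frequency labels each group by the last day of the month
by_end = df.groupby(pd.Grouper(key='Date', freq=pd.offsets.MonthEnd()))['Values'].sum()

# With month-start keys, a group can be looked up by the first of the month
october = df.groupby('MonthStart').get_group(pd.Timestamp('2017-10-01'))
```

Both results hold the same sums. Only the labels differ: the first of the month in `by_start`, the last day of the month in `by_end`.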

@atomh33ls 2014-06-06 13:38:42

Managed to do it:

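b=pd.read_csv('b.dat')
b['date']=pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.index=b['date']  # the index must be datetime for .month/.year to exist
b.groupby(by=[b.index.month, b.index.year])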

Or

b.groupby(pd.Grouper(freq='M'))  # update for v0.21+
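As a sketch of why the year belongs in the key, the example below loads the question's sample rows from an in-memory string and sums the `number` column. The variable names (`csv_text`, `by_month`, `by_year_month`) and the renamed key levels `year` and `month` are assumptions added for illustration.

```python
import io
import pandas as pd

csv_text = """string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0
"""

b = pd.read_csv(io.StringIO(csv_text))
b['date'] = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')
b = b.set_index('date')

# Grouping by month alone merges May 2011 with May 2014
by_month = b.groupby(b.index.month)['number'].sum()

# Grouping by year and month keeps the two Mays apart
keys = [b.index.year.rename('year'), b.index.month.rename('month')]
by_year_month = b.groupby(keys)['number'].sum()
```

Here `by_month` has a single entry for month 5 covering both years, while `by_year_month` has separate entries for (2011, 5) and (2014, 5).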

@Karl D. 2014-06-06 16:57:16

I think the more pandonic ways are to either use resample (when it provides the functionality you need) or use a TimeGrouper: df.groupby(pd.TimeGrouper(freq='M'))

@Alexandre 2016-01-28 21:58:47

To get the sum or average as the resulting DataFrame, use df.groupby(pd.TimeGrouper(freq='M')).sum() or df.groupby(pd.TimeGrouper(freq='M')).mean().

@BallpointBen 2018-12-18 01:18:13

pd.TimeGrouper has been deprecated in favor of pd.Grouper, which is a bit more flexible but still takes freq and level arguments.
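A minimal sketch of the pd.Grouper and resample routes mentioned in these comments, grouping on a datetime column directly rather than on the index. The sample frame is made up for illustration. Passing a `pd.offsets.MonthEnd()` object as the frequency is a choice made here to sidestep alias differences between pandas versions (recent releases prefer the string 'ME' over 'M').

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2011-04-22 12:07', '2011-04-29 11:59', '2011-05-02 13:41']),
    'number': [3.0, 1.0, 2.0],
})

month_end = pd.offsets.MonthEnd()

# key= names the datetime column, so no re-indexing is needed
grouped = df.groupby(pd.Grouper(key='date', freq=month_end))['number'].sum()

# resample with on= should give the same monthly bins
resampled = df.resample(month_end, on='date')['number'].sum()
```

Both approaches bin on a regular monthly grid, so with sparser data, months that contain no rows would generally show up as empty bins rather than being skipped.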

@ely 2019-09-09 20:36:47

The first method does not appear to work. It gives the error "'Series' object has no attribute 'month'" for a Series created via to_datetime.
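A short sketch of the likely cause: `.month` is an attribute of a DatetimeIndex, whereas a datetime Series exposes the same fields through the `.dt` accessor. The sample values and variable names below are assumptions for illustration.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2011-02-05 09:16', '2014-05-02 15:02']))

# A datetime Series exposes date fields through the .dt accessor
months_from_series = s.dt.month

# A DatetimeIndex exposes them directly as attributes
months_from_index = pd.DatetimeIndex(s).month
```

So when the dates live in a column rather than the index, something like df.groupby([df['date'].dt.year, df['date'].dt.month]) should work in place of the index-based version.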

@tsando 2018-10-11 10:58:29

Slightly alternative solution to @jpp's but outputting a YearMonth string:

df['YearMonth'] = pd.to_datetime(df['Date']).apply(lambda x: '{year}-{month}'.format(year=x.year, month=x.month))

res = df.groupby('YearMonth')['Values'].sum()
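One caveat with the format call above is that single-digit months are not zero-padded, so a key such as '2017-2' sorts after '2017-10'. A hedged sketch of two alternatives that keep chronological order is shown below. The sample data is made up, and `res_period` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2017-02-05', '2017-10-20', '2017-10-25']),
    'Values': [1, 5, 10],
})

# Zero-padded string keys sort chronologically
df['YearMonth'] = df['Date'].dt.strftime('%Y-%m')
res = df.groupby('YearMonth')['Values'].sum()

# Alternatively, group by monthly Period objects instead of strings
res_period = df.groupby(df['Date'].dt.to_period('M'))['Values'].sum()
```

The string keys are convenient for display, while the Period keys stay date-aware and can be converted back to timestamps if needed.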

@PandasRocks 2018-01-20 12:38:02

(update: 2018)

Note that pd.TimeGrouper is deprecated and will be removed. Use instead:

df.groupby(pd.Grouper(freq='M'))

@jtromans 2018-02-21 13:42:52

Reading in 2018 :)
