2012-03-15 15:44:54 8 Comments

From a data frame, is there an easy way to aggregate (`sum`, `mean`, `max`, etc.) multiple variables simultaneously?

Below are some sample data:

```
library(lubridate)
days = 365*2
date = seq(as.Date("2000-01-01"), length = days, by = "day")
year = year(date)
month = month(date)
x1 = cumsum(rnorm(days, 0.05))
x2 = cumsum(rnorm(days, 0.05))
df1 = data.frame(date, year, month, x1, x2)
```

I would like to simultaneously aggregate the `x1` and `x2` variables from the `df1` data frame by year and month. The following code aggregates the `x1` variable, but is it also possible to simultaneously aggregate the `x2` variable?

```
### aggregate variables by year month
df2=aggregate(x1 ~ year+month, data=df1, sum, na.rm=TRUE)
head(df2)
```

Any suggestions would be greatly appreciated.


## 7 comments

## @akrun 2020-01-05 21:37:58

With the `devel` version of `dplyr` (version `‘0.8.99.9000’`), we can also use `summarise` to apply a function to multiple columns with `across`.
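The code from this answer did not survive on this page, so here is a minimal sketch of what `summarise` with `across` can look like. It assumes a released `dplyr` (1.0.0 or later, where `across` is available) and rebuilds the question's sample data with base R date helpers instead of `lubridate`; the `set.seed` call and the `res` name are illustrative only.

```r
library(dplyr)

# Rebuild the question's sample data (seed added only for reproducibility)
set.seed(1)
date <- seq(as.Date("2000-01-01"), length.out = 730, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),
  month = as.integer(format(date, "%m")),
  x1 = cumsum(rnorm(730, 0.05)),
  x2 = cumsum(rnorm(730, 0.05))
)

# Sum x1 and x2 within each year/month group
res <- df1 %>%
  group_by(year, month) %>%
  summarise(across(c(x1, x2), sum), .groups = "drop")

head(res)
```

Swapping `sum` for `mean` or `max` works the same way.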

## @Jaap 2015-10-16 10:19:12

With the `dplyr` package, you can use the `summarise_all`, `summarise_at` or `summarise_if` functions to aggregate multiple variables simultaneously. For the example dataset you can do this as follows:

The result of the latter two options:

Note: `summarise_each` is deprecated in favor of `summarise_all`, `summarise_at` and `summarise_if`.

As mentioned in my comment above, you can also use the `recast` function from the `reshape2` package:

which will give you the same result.
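The original code blocks are missing here, so the following is only a sketch of how the three scoped verbs might be applied to the example data, not necessarily the exact calls from the answer. The data setup (with an added seed) and the `res_*` names are assumptions. `date` is dropped before `summarise_all` because summing a `Date` column is not meaningful.

```r
library(dplyr)

# Rebuild the question's sample data (seed added only for reproducibility)
set.seed(1)
date <- seq(as.Date("2000-01-01"), length.out = 730, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),
  month = as.integer(format(date, "%m")),
  x1 = cumsum(rnorm(730, 0.05)),
  x2 = cumsum(rnorm(730, 0.05))
)

# summarise_all: every non-grouping column, so remove `date` first
res_all <- df1 %>%
  select(-date) %>%
  group_by(year, month) %>%
  summarise_all(sum)

# summarise_at: only the columns named in vars()
res_at <- df1 %>%
  group_by(year, month) %>%
  summarise_at(vars(x1, x2), sum)

# summarise_if: only the columns for which the predicate is TRUE
res_if <- df1 %>%
  group_by(year, month) %>%
  summarise_if(is.numeric, sum)
```

In current `dplyr` releases these scoped verbs still work but are superseded by `across`.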

## @Jozef 2018-12-27 15:18:36

Interestingly, base R `aggregate`'s `data.frame` method is not showcased here; the formula interface is used above, so for completeness:

More generic use of `aggregate`'s `data.frame` method:

Since we are providing a `data.frame` as `x` and a `list` (a `data.frame` is also a `list`) as `by`, this is very useful if we need to use it in a dynamic manner, e.g. switching to other columns to aggregate, or to aggregate by, is very simple. For example like so:

## @britt 2018-08-15 16:22:53

Late to the party, but recently found another way to get the summary statistics.

`library(psych); describe(data)`

Will output: mean, min, max, standard deviation, n, standard error, kurtosis, skewness, median, and range for each variable.

## @Gregor - reinstate Monica 2019-06-17 17:07:48

The question is about doing aggregations by group, but `describe` doesn't do anything by group...

## @britt 2019-06-19 21:09:00

`describe.by(column, group = grouped_column)` will group the values

## @Gregor - reinstate Monica 2019-06-20 00:55:53

Well, put that in the answer then! Don't hide it in a comment!

## @EDi 2012-03-15 15:56:53

Where is this `year()` function from?

You could also use the `reshape2` package for this task:

## @Jaap 2016-05-13 06:17:18

The `recast` function (also from `reshape2`) integrates the `melt` and `dcast` functions in one go for tasks like this: `recast(df1, year + month ~ variable, sum, id.var = c("date", "year", "month"))`

## @Andrie 2012-03-15 15:50:01

Yes, in your `formula`, you can `cbind` the numeric variables to be aggregated:

See `?aggregate`, the `formula` argument and the examples.

## @pdb 2015-11-13 05:29:28

Is it possible for the cbind to use dynamic variables?
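For reference, the `cbind` formula from the answer can be sketched as below (the answer's own code block is missing from this page). For column names held in a character vector, one possible approach, which is an assumption and not something from the original answer, is to subset the data frame to the wanted columns and use `.` on the left-hand side. The `vars` name is hypothetical.

```r
# Rebuild the question's sample data (seed added only for reproducibility)
set.seed(1)
date <- seq(as.Date("2000-01-01"), length.out = 730, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),
  month = as.integer(format(date, "%m")),
  x1 = cumsum(rnorm(730, 0.05)),
  x2 = cumsum(rnorm(730, 0.05))
)

# Static version: cbind the variables to aggregate on the LHS
res <- aggregate(cbind(x1, x2) ~ year + month, data = df1, FUN = sum)

# Dynamic version: keep only grouping columns plus the chosen variables,
# then let `.` stand for "everything not on the RHS"
vars <- c("x1", "x2")
res_dyn <- aggregate(. ~ year + month,
                     data = df1[c("year", "month", vars)],
                     FUN = sum)

head(res)
```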

## @pdb 2015-11-13 06:19:09

It's worth noting that when any of the variables that is in the cbind has an NA the row will be dropped for every variable in the cbind. This is not the behavior I was expecting.
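A small illustration of that behavior on a toy data frame (the data and the names `d`, `dropped` and `kept` are made up for this sketch). The formula method defaults to `na.action = na.omit`, which removes incomplete rows before `FUN` is called; passing `na.action = na.pass` keeps them so that `na.rm = TRUE` can do its job per column.

```r
# Toy data: each group has one NA, in different columns
d <- data.frame(
  g  = c("a", "a", "b", "b"),
  x1 = c(1, NA, 3, 4),
  x2 = c(10, 20, 30, NA)
)

# Default: rows 2 and 4 are dropped entirely before summing
dropped <- aggregate(cbind(x1, x2) ~ g, data = d, FUN = sum, na.rm = TRUE)
# group a: x1 = 1, x2 = 10; group b: x1 = 3, x2 = 30

# na.pass keeps all rows; na.rm = TRUE then skips NAs column by column
kept <- aggregate(cbind(x1, x2) ~ g, data = d, FUN = sum, na.rm = TRUE,
                  na.action = na.pass)
# group a: x1 = 1, x2 = 30; group b: x1 = 7, x2 = 30
```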

## @Clock Slave 2016-03-16 11:22:07

What if, instead of x1 and x2, I want to use all the remaining variables (other than year, month)?

## @A5C1D2H2I1M1N2O1R2T1 2016-03-21 03:53:44

@ClockSlave, then you need to just use `.` on the LHS: `aggregate(. ~ year + month, df1, sum, na.rm = TRUE)`. In this example, `sum` for "date" doesn't make sense though....

## @skan 2016-04-14 19:15:13

What if I don't want two variables but two functions? For example, mean and sd.
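One way this could be done with base `aggregate`, offered as a sketch and not as something from the original thread: let `FUN` return a named vector. Each aggregated variable then becomes a matrix column, which `do.call(data.frame, ...)` can flatten into ordinary columns. The data setup and the `res` / `flat` names are assumptions.

```r
# Rebuild the question's sample data (seed added only for reproducibility)
set.seed(1)
date <- seq(as.Date("2000-01-01"), length.out = 730, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),
  month = as.integer(format(date, "%m")),
  x1 = cumsum(rnorm(730, 0.05)),
  x2 = cumsum(rnorm(730, 0.05))
)

# FUN returns two named statistics per group and variable
res <- aggregate(cbind(x1, x2) ~ year + month, data = df1,
                 FUN = function(v) c(mean = mean(v), sd = sd(v)))

# res$x1 and res$x2 are two-column matrices; flatten them into
# plain columns named x1.mean, x1.sd, x2.mean, x2.sd
flat <- do.call(data.frame, res)

names(flat)
```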

## @DatamineR 2017-06-23 16:03:17

In the case of `NA`s this approach is really problematic. Setting `na.rm = TRUE` does not affect anything and the `NA` cases are ignored...

## @lmo 2017-07-13 02:05:15

@andrie. The use of `.` in the formula interface mentioned recently in the comments is probably worth adding to the answer.

## @theforestecologist 2018-04-30 18:50:55

Is there a way to apply different functions (e.g., `mean`, `max`, `min`, etc.) to each of the different variables in `cbind`?

## @numbercruncher 2012-03-15 23:00:07

Using the `data.table` package, which is fast (useful for larger datasets):

https://github.com/Rdatatable/data.table/wiki

Using the plyr package

Using summarize() from the Hmisc package (column headings are messy in my example though)
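The code for these three options is not preserved on this page. As a rough sketch of the `data.table` option only (the exact original call is unknown, and the `plyr` and `Hmisc` variants are not reconstructed here), grouping with `by` and applying a function over `.SD` could look like this; the data setup and the `dt` / `res` names are assumptions.

```r
library(data.table)

# Rebuild the question's sample data (seed added only for reproducibility)
set.seed(1)
date <- seq(as.Date("2000-01-01"), length.out = 730, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),
  month = as.integer(format(date, "%m")),
  x1 = cumsum(rnorm(730, 0.05)),
  x2 = cumsum(rnorm(730, 0.05))
)

dt <- as.data.table(df1)

# .SD is the per-group subset restricted to .SDcols; sum each column
res <- dt[, lapply(.SD, sum), by = .(year, month), .SDcols = c("x1", "x2")]

head(res)
```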

## @Bulat 2018-10-13 12:00:09

Why not do this for the data.table option: `dt[, .(x1.sum = sum(x1), x2.sum = sum(x2)), by = .(year, month)]`?