[SOLVED] Faster ways to calculate frequencies and cast from long to wide

I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, and "week" as columns, and the counts as the values.

Example of what I've tried so far (I've tried a bunch of other things too, including adding a dummy variable equal to 1 and then using `fun.aggregate = sum` over it):

``````
library(plyr)
ddply(data, .(id), dcast, id ~ week, value_var = "id",
      fun.aggregate = length, fill = 0, .parallel = TRUE)
``````

However, I must be doing something wrong because this function is not finishing. Is there a better way to do this?

Input:

``````
id      week
1       1
1       2
1       3
1       1
2       3
``````

Output:

``````
  1  2  3
1 2  1  1
2 0  0  1
``````

@Ronak Shah 2019-02-05 00:49:54

A `tidyverse` option could be:

``````
library(dplyr)
library(tidyr)

df %>%
  count(id, week) %>%
  pivot_wider(names_from = week, values_from = n, values_fill = list(n = 0))
# spread(week, n, fill = 0)  # in older versions of tidyr

#     id   `1`   `2`   `3`
#   <dbl> <dbl> <dbl> <dbl>
#1     1     2     1     1
#2     2     0     0     1
``````

data

``````
df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L), week = c(1L, 2L, 3L,
1L, 3L)), class = "data.frame", row.names = c(NA, -5L))
``````

@Joshua Ulrich 2011-11-18 17:16:25

You could just use the `table` command:

``````
table(data$id, data$week)

    1 2 3
  1 2 1 1
  2 0 0 1
``````

If "id" and "week" are the only columns in your data frame, you can simply use:

``````
table(data)
#    week
# id  1 2 3
#   1 2 1 1
#   2 0 0 1
``````

@Andrie 2011-11-18 17:17:55

+1 Blast. You have a knack of making my solutions look totally long-winded, roundabout and pedestrian.

@Patrick Burns 2011-11-18 18:15:45

If you have a lot of data and operations that can't be simplified so much, then the 'data.table' package may help you.
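For instance, a minimal sketch of the counting step with `data.table` (using a hypothetical `dt` built from the question's example data):

``````
library(data.table)

# Hypothetical copy of the question's example data
dt <- data.table(id = c(1, 1, 1, 1, 2), week = c(1, 2, 3, 1, 3))

# .N is the number of rows in each (id, week) group,
# so this returns long-format counts per combination
dt[, .N, by = .(id, week)]
``````

The long-format counts can then be cast to wide with `dcast`, as the next answer shows.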

@mnel 2012-09-14 02:42:45

The reason `ddply` is taking so long is that the splitting by group is not run in parallel (only the computations on the 'splits' are), so with a large number of groups it will be slow and `.parallel = TRUE` will not help.

An approach using `data.table::dcast` (`data.table` version >= 1.9.2) should be extremely efficient in time and memory. In this case, we can rely on default argument values and simply use:

``````
library(data.table)
dcast(setDT(data), id ~ week)
# Using 'week' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
#    id 1 2 3
# 1:  1 2 1 1
# 2:  2 0 0 1
``````

Or setting the arguments explicitly:

``````
dcast(setDT(data), id ~ week, value.var = "week", fun.aggregate = length)
#    id 1 2 3
# 1:  1 2 1 1
# 2:  2 0 0 1
``````

For pre-`data.table` 1.9.2 alternatives, see edits.

@Andrie 2011-11-18 17:14:59

You don't need `ddply` for this. The `dcast` from `reshape2` is sufficient:

``````
dat <- data.frame(
  id   = c(rep(1, 4), 2),
  week = c(1:3, 1, 3)
)

library(reshape2)
dcast(dat, id ~ week, fun.aggregate = length)

  id 1 2 3
1  1 2 1 1
2  2 0 0 1
``````

Edit: For a base R solution (other than `table`, as posted by Joshua Ulrich), try `xtabs`:

``````
xtabs(~ id + week, data = dat)

   week
id  1 2 3
  1 2 1 1
  2 0 0 1
``````
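If a plain data frame is needed rather than a `table`/`xtabs` object, `as.data.frame.matrix` preserves the wide layout (a sketch; note that plain `as.data.frame` on a table would melt it back into long-format counts):

``````
dat <- data.frame(id = c(rep(1, 4), 2), week = c(1:3, 1, 3))
tab <- xtabs(~ id + week, data = dat)

# as.data.frame() on a table returns long-format (id, week, Freq) rows;
# as.data.frame.matrix() keeps the id-by-week wide layout instead
wide <- as.data.frame.matrix(tab)
wide
#   1 2 3
# 1 2 1 1
# 2 0 0 1
``````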
