[SOLVED] Faster ways to calculate frequencies and cast from long to wide

I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, and "week" as columns, and the counts as the values.

Example of what I've tried so far (tried a bunch of other things, including adding a dummy variable = 1 and then `fun.aggregate = sum` over that):

``````library(plyr)
ddply(data, .(id), dcast, id ~ week, value_var = "id",
fun.aggregate = length, fill = 0, .parallel = TRUE)
``````

However, I must be doing something wrong because this function is not finishing. Is there a better way to do this?

Input:

``````id      week
1       1
1       2
1       3
1       1
2       3
``````

Output:

``````  1  2  3
1 2  1  1
2 0  0  1
``````

@Ronak Shah 2019-02-05 00:49:54

A `tidyverse` option could be :

``````library(dplyr)
library(tidyr)

df %>%
  count(id, week) %>%
  pivot_wider(names_from = week, values_from = n, values_fill = list(n = 0))
  #spread(week, n, fill = 0) #In older version of tidyr

#     id   `1`   `2`   `3`
#   <dbl> <dbl> <dbl> <dbl>
#1     1     2     1     1
#2     2     0     0     1
``````

data

``````df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L), week = c(1L, 2L, 3L,
1L, 3L)), class = "data.frame", row.names = c(NA, -5L))
``````

@Joshua Ulrich 2011-11-18 17:16:25

You could just use the `table` command:

``````table(data$id, data$week)

    1 2 3
  1 2 1 1
  2 0 0 1
``````

If "id" and "week" are the only columns in your data frame, you can simply use:

``````table(data)
#    week
# id  1 2 3
#   1 2 1 1
#   2 0 0 1
``````
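If you need the result as a data frame rather than a `table` object, here is a minimal sketch using base R only. The object names `tab`, `wide` and `long` are just illustrative, and the toy data is rebuilt from the question's example:

```r
# Same toy data as in the question
data <- data.frame(id = c(1, 1, 1, 1, 2), week = c(1, 2, 3, 1, 3))

tab <- table(data$id, data$week)

# as.data.frame.matrix() keeps the wide layout:
# one row per id (as row names), one column per week
wide <- as.data.frame.matrix(tab)

# as.data.frame() on a table gives the long layout instead,
# with columns Var1, Var2 and Freq
long <- as.data.frame(tab)
```

Note that in `wide` the ids end up as row names, not as a regular column, so you may want to copy them into a column before merging with other data.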

@Andrie 2011-11-18 17:17:55

+1 Blast. You have a knack of making my solutions look totally long-winded, roundabout and pedestrian.

@Patrick Burns 2011-11-18 18:15:45

If you have a lot of data and operations that can't be simplified so much, then the 'data.table' package may help you.

@mnel 2012-09-14 02:42:45

The reason `ddply` is taking so long is that the splitting by group is not run in parallel (only the computations on the 'splits' are). With a large number of groups it will therefore be slow, and `.parallel = TRUE` will not help.

An approach using `data.table::dcast` (`data.table` version >= 1.9.2) should be extremely efficient in time and memory. In this case, we can rely on default argument values and simply use:

``````library(data.table)
dcast(setDT(data), id ~ week)
# Using 'week' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
#    id 1 2 3
# 1:  1 2 1 1
# 2:  2 0 0 1
``````

Or setting the arguments explicitly:

``````dcast(setDT(data), id ~ week, value.var = "week", fun.aggregate = length)
#    id 1 2 3
# 1:  1 2 1 1
# 2:  2 0 0 1
``````

For pre-`data.table` 1.9.2 alternatives, see edits.
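The same result can also be reached in two explicit steps, which may be easier to follow: count first, then cast. This is only a sketch on the question's toy data; the names `dt`, `counts` and `wide` are made up for illustration, and it assumes a `data.table` version that provides `dcast` (1.9.2 or later, as above):

```r
library(data.table)

dt <- data.table(id = c(1, 1, 1, 1, 2), week = c(1, 2, 3, 1, 3))

# Step 1: count rows per (id, week) pair; .N is the group size
counts <- dt[, .N, by = .(id, week)]

# Step 2: cast the counts to wide; combinations that never occur get 0
wide <- dcast(counts, id ~ week, value.var = "N", fill = 0L)
```

Here no aggregation function is needed in `dcast`, because after step 1 each (id, week) pair appears at most once.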

@Andrie 2011-11-18 17:14:59

You don't need `ddply` for this. The `dcast` from `reshape2` is sufficient:

``````dat <- data.frame(
  id = c(rep(1, 4), 2),
  week = c(1:3, 1, 3)
)

library(reshape2)
dcast(dat, id ~ week, fun.aggregate = length)

  id 1 2 3
1  1 2 1 1
2  2 0 0 1
``````

Edit: For a base R solution (other than `table`, as posted by Joshua Ulrich), try `xtabs`:

``````xtabs(~ id + week, data = dat)

   week
id  1 2 3
  1 2 1 1
  2 0 0 1
``````
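Since the question is about speed, here is a rough way to compare the two base R options on a larger input. This is a sketch only: the data is simulated, the sizes (100,000 rows, 1,000 ids, 52 weeks) and the seed are arbitrary assumptions, and the timings will depend on your machine.

```r
set.seed(1)  # arbitrary seed so the simulated data is reproducible

# Hypothetical larger input: 100,000 rows, 1,000 ids, 52 weeks
n <- 1e5
sim <- data.frame(
  id   = sample(1000, n, replace = TRUE),
  week = sample(52, n, replace = TRUE)
)

# Time each base R approach
time_table <- system.time(res_table <- table(sim$id, sim$week))
time_xtabs <- system.time(res_xtabs <- xtabs(~ id + week, data = sim))

# Both results should contain exactly the same counts
same_counts <- all(as.vector(res_table) == as.vector(res_xtabs))
```

Print `time_table` and `time_xtabs` to see the elapsed times, and check `same_counts` before trusting any comparison.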
