By user592419


2011-11-18 17:07:18 8 Comments

I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, and "week" as columns, and the counts as the values.

Example of what I've tried so far (tried a bunch of other things, including adding a dummy variable = 1 and then fun.aggregate = sum over that):

library(plyr)
ddply(data, .(id), dcast, id ~ week, value_var = "id", 
        fun.aggregate = length, fill = 0, .parallel = TRUE)

However, I must be doing something wrong because this function is not finishing. Is there a better way to do this?

Input:

id      week
1       1
1       2
1       3
1       1
2       3

Output:

  1  2  3
1 2  1  1
2 0  0  1

4 comments

@Ronak Shah 2019-02-05 00:49:54

A tidyverse option could be :

library(dplyr)
library(tidyr)

df %>%
  count(id, week) %>%
  pivot_wider(names_from = week, values_from = n, values_fill = list(n = 0))
  #spread(week, n, fill = 0) #In older version of tidyr

#     id   `1`   `2`   `3`
#   <dbl> <dbl> <dbl> <dbl>
#1     1     2     1     1
#2     2     0     0     1

data

df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L), week = c(1L, 2L, 3L, 
1L, 3L)), class = "data.frame", row.names = c(NA, -5L))

@Joshua Ulrich 2011-11-18 17:16:25

You could just use the table command:

table(data$id,data$week)

    1 2 3
  1 2 1 1
  2 0 0 1

If "id" and "week" are the only columns in your data frame, you can simply use:

table(data)
#    week
# id  1 2 3
#   1 2 1 1
#   2 0 0 1

@Andrie 2011-11-18 17:17:55

+1 Blast. You have a knack of making my solutions look totally long-winded, roundabout and pedestrian.

@Patrick Burns 2011-11-18 18:15:45

If you have a lot of data and operations that can't be simplified so much, then the 'data.table' package may help you.

@mnel 2012-09-14 02:42:45

The reason ddply is taking so long is that the splitting by group is not run in parallel (only the computations on the 'splits'), therefore with a large number of groups it will be slow (and .parallel = T) will not help.

An approach using data.table::dcast (data.table version >= 1.9.2) should be extremely efficient in time and memory. In this case, we can rely on default argument values and simply use:

library(data.table) 
dcast(setDT(data), id ~ week)
# Using 'week' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
#    id 1 2 3
# 1:  1 2 1 1
# 2:  2 0 0 1

Or setting the arguments explicitly:

dcast(setDT(data), id ~ week, value.var = "week", fun = length)
#    id 1 2 3
# 1:  1 2 1 1
# 2:  2 0 0 1

For pre-data.table 1.9.2 alternatives, see edits.

@Andrie 2011-11-18 17:14:59

You don't need ddply for this. The dcast from reshape2 is sufficient:

dat <- data.frame(
    id = c(rep(1, 4), 2),
    week = c(1:3, 1, 3)
)

library(reshape2)
dcast(dat, id~week, fun.aggregate=length)

  id 1 2 3
1  1 2 1 1
2  2 0 0 1

Edit : For a base R solution (other than table - as posted by Joshua Uhlrich), try xtabs:

xtabs(~id+week, data=dat)

   week
id  1 2 3
  1 2 1 1
  2 0 0 1

Related Questions

Sponsored Content

4 Answered Questions

[SOLVED] data.table vs dplyr: can one do something well the other can't or does poorly?

  • 2014-01-29 15:21:45
  • BrodieG
  • 126222 View
  • 771 Score
  • 4 Answer
  • Tags:   r data.table dplyr

11 Answered Questions

[SOLVED] How to reshape data from long to wide format

  • 2011-05-04 22:27:50
  • Steve
  • 212130 View
  • 268 Score
  • 11 Answer
  • Tags:   r reshape r-faq

1 Answered Questions

[SOLVED] dplyr: mean of a group count

  • 2015-12-30 13:24:05
  • Ryan Castner
  • 2563 View
  • 3 Score
  • 1 Answer
  • Tags:   r aggregate dplyr

1 Answered Questions

Consolidating dataframe count

  • 2017-10-05 18:00:29
  • kmathers
  • 33 View
  • 0 Score
  • 1 Answer
  • Tags:   r dplyr plyr

2 Answered Questions

2 Answered Questions

[SOLVED] R: hourly to daily precipitation, Package plyr, ER:"no numeric values"

  • 2014-04-01 07:52:30
  • Anne
  • 1101 View
  • 0 Score
  • 2 Answer
  • Tags:   r sum plyr

3 Answered Questions

[SOLVED] mapply of aggregate with named output columns in R

1 Answered Questions

[SOLVED] represent all levels of grouping factor with ddply

  • 2014-01-13 16:38:44
  • user1320502
  • 552 View
  • 0 Score
  • 1 Answer
  • Tags:   r plyr

2 Answered Questions

[SOLVED] Use ddply() to aggregate relative histogram counts

Sponsored Content