By Dirk Calloway


2014-12-02 10:27:19 8 Comments

I have a long data set I would like to make wide and I'm curious if there is a way to do this all in one step using the reshape2 or tidyr packages in R.

The data frame df looks like this:

id  type    transactions    amount
20  income       20          100
20  expense      25          95
30  income       50          300
30  expense      45          250

I'd like to get to this:

id  income_transactions expense_transactions    income_amount   expense_amount
20       20                           25                 100             95
30       50                           45                 300             250

I know I can get part of the way there with reshape2 via for example:

dcast(df, id ~  type, value.var="transactions")

But is there a way to reshape the entire df in one shot addressing both the "transactions" and "amount" variables at once? And ideally with new more appropriate column names?

2 comments

@Arun 2015-03-16 01:40:58

With data.table v1.9.6+, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast for more and also the examples section.

require(data.table) # v1.9.6+
dcast(dt, id ~ type, value.var=names(dt)[3:4])
#    id transactions_expense transactions_income amount_expense amount_income
# 1: 20                   25                  20             95           100
# 2: 30                   45                  50            250           300

@A5C1D2H2I1M1N2O1R2T1 2014-12-02 10:31:48

In "reshape2", you can use recast (though in my experience, this isn't a widely known function).

library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
#   id transactions_expense transactions_income amount_expense amount_income
# 1 20                   25                  20             95           100
# 2 30                   45                  50            250           300

You can also use base R's reshape:

reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
#   id transactions.income amount.income transactions.expense amount.expense
# 1 20                  20           100                   25             95
# 3 30                  50           300                   45            250

Or, you can melt and dcast, like this (here with "data.table"):

library(data.table)
library(reshape2)
dcast.data.table(melt(as.data.table(mydf), id.vars = c("id", "type")), 
                 id ~ variable + type, value.var = "value")
#    id transactions_expense transactions_income amount_expense amount_income
# 1: 20                   25                  20             95           100
# 2: 30                   45                  50            250           300

In later versions of dcast.data.table from "data.table" (1.9.8) you will be able to do this directly. If I understand correctly, what @Arun is trying to implement would be doing the reshaping without first having to melt the data, which is what happens presently with recast, which is essentially a wrapper for a melt + dcast sequence of operations.


And, for thoroughness, here's the tidyr approach:

library(dplyr)
library(tidyr)
mydf %>% 
  gather(var, val, transactions:amount) %>% 
  unite(var2, type, var) %>% 
  spread(var2, val)
#   id expense_amount expense_transactions income_amount income_transactions
# 1 20             95                   25           100                  20
# 2 30            250                   45           300                  50

@David Arenburg 2014-12-02 11:09:04

This is a nice answer (Enlightened + Nice answer badges on the way...), but not sure what we need all this mess with tidyr, dplyr, data.table, reshape etc. when there is such a simple way of doing it with base R

@A5C1D2H2I1M1N2O1R2T1 2014-12-02 11:10:47

@DavidArenburg, speed at scale, really. I'm an old hardcore fan of reshape, but it would not be my first recommendation if someone is working with remotely large data.

@David Arenburg 2014-12-02 11:13:24

Now that I'm looking into reshapes source code it looks like it's all about endless for/apply loops. It is pity it wasn't written in C/C++ as all base R function should be

@hadley 2014-12-02 16:09:31

reshape() is simple? All I have to say to that is "bahahahaha"

@A5C1D2H2I1M1N2O1R2T1 2014-12-02 16:12:05

@hadley, I can't speak for David, but I didn't interpret his comment to mean that reshape() was simple, but rather implied that the reshape() approach here is actually reasonably straightforward.

@hadley 2014-12-02 19:49:44

In that case, the whole point of "mess"ing with tidyr, dplyr, data.table, reshape2 etc, is that they generalise better to new problems, which reshape() does not.

Related Questions

Sponsored Content

3 Answered Questions

[SOLVED] POSIXct values become numeric in reshape2 dcast

  • 2012-09-05 21:02:24
  • gkaupas
  • 2085 View
  • 6 Score
  • 3 Answer
  • Tags:   r reshape2

10 Answered Questions

[SOLVED] How to select rows from a DataFrame based on column values?

19 Answered Questions

[SOLVED] How to sort a dataframe by multiple column(s)

3 Answered Questions

[SOLVED] Rearrange dataframe using dcast using a dummy

  • 2019-01-28 14:29:45
  • Leprechault
  • 91 View
  • 2 Score
  • 3 Answer
  • Tags:   r reshape reshape2

4 Answered Questions

[SOLVED] Change default separator in cast

  • 2012-09-20 16:32:26
  • dayne
  • 1047 View
  • 5 Score
  • 4 Answer
  • Tags:   r reshape reshape2

1 Answered Questions

[SOLVED] Reshape DF from long to wide in R using Reshape2 without an aggregation function

  • 2016-01-25 19:07:15
  • Derek Darves
  • 437 View
  • 1 Score
  • 1 Answer
  • Tags:   r reshape2

2 Answered Questions

[SOLVED] Reshape a data frame to matrix from text file value.var errors

2 Answered Questions

[SOLVED] How to "unmelt" data with reshape r

  • 2014-09-19 13:57:55
  • LP_640
  • 19672 View
  • 16 Score
  • 2 Answer
  • Tags:   r reshape reshape2

Sponsored Content