By Steve

2011-05-04 22:27:50 8 Comments

I'm having trouble rearranging the following data frame:

dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each=4),
    numbers = rep(1:4, 2),
    value = rnorm(8)

       name  numbers      value
1  firstName       1  0.3407997
2  firstName       2 -0.7033403
3  firstName       3 -0.3795377
4  firstName       4 -0.7460474
5 secondName       1 -0.8981073
6 secondName       2 -0.3347941
7 secondName       3 -0.5013782
8 secondName       4 -0.1745357

I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:

     name          1          2          3         4
1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've looked at melt and cast and a few other things, but none seem to do the job.


@Adam Erickson 2018-08-02 23:50:52

The base reshape function works perfectly fine:

df <- data.frame(
  year   = c(rep(2000, 12), rep(2001, 12)),
  month  = rep(1:12, 2),
  values = rnorm(24)
df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")


  • idvar is the column of classes that separates rows
  • timevar is the column of classes to cast wide
  • v.names is the column containing numeric values
  • direction specifies wide or long format
  • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))
df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")

Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.

@Ronak Shah 2016-09-02 07:52:19

Using base R aggregate function:

aggregate(value ~ name, dat1, I)

# name           value.1  value.2  value.3  value.4
#1 firstName      0.4145  -0.4747   0.0659   -0.5024
#2 secondName    -0.8259   0.1669  -0.8962    0.1681

@dmi3kno 2017-12-23 23:01:37

There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:

The whole system is based on two primitives or operators cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These operators have pivot, un-pivot, one-hot encode, transpose, moving multiple rows and columns, and many other transforms as simple special cases.

It is easy to write many different operations in terms of the cdata primitives. These operators can work-in memory or at big data scale (with databases and Apache Spark; for big data use the cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN() variants). The transforms are controlled by a control table that itself is a diagram of (or picture of) the transform.

We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.

# first build the control table
pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                        columnToTakeKeysFrom = 'numbers', # this will become column headers
                        columnToTakeValuesFrom = 'value', # this contains data
                        sep="_")                          # optional for making column names

# perform the move of data to columns
dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset
                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 
                    controlTable = pivotControlTable# control table above

#>         name  numbers_1  numbers_2  numbers_3  numbers_4
#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

@Jim M. 2011-05-04 22:58:48

Using your example dataframe, we could:

xtabs(value ~ name + numbers, data = dat1)

@cloudscomputes 2017-10-20 04:44:02

this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages

@SymbolixAU 2016-03-27 22:35:51

Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions

(Reference: Efficient reshaping using data.tables)


dcast(dat1, name ~ numbers, value.var = "value")

#          name          1          2         3         4
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6 we can cast on multiple columns

## add an extra column
dat1[, value2 := value * 2]

## cast multiple value columns
dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

@joel.wilson 2017-08-31 12:06:44

data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!

@T.Fung 2019-03-19 22:54:46

What if I wanted to take the max?

@SymbolixAU 2019-03-19 23:54:40

@T.Fung I don't understand what you're asking. Might be best to open a new question?

@T.Fung 2019-03-20 10:11:06

@SymbolixAU in op's question 'name' and 'numbers' are unique combinations. What if they were not and I wanted to fetch the max value for each combination after pivoting? Not a problem if too fiddly a question. Just food for thoughts. Thank you.

@Gregor 2014-07-29 19:37:09

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

spread(dat1, key = numbers, value = value)

From github,

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.

@Jake 2017-04-12 13:01:40

Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.

@mpalanco 2015-07-14 17:44:08

Other two options:

Base package:

df <- unstack(dat1, form = value ~ numbers)
rownames(df) <- unique(dat1$name)

sqldf package:

sqldf('SELECT name,
      MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1, 
      MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
      MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
      MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
      FROM dat1
      GROUP BY name')

@M-M 2019-04-29 17:58:35

Instead of hardcoding numbers, the query can be set up like this: ValCol <- unique(dat1$numbers);s <- sprintf("MAX(CASE WHEN numbers = %s THEN value ELSE NULL END) `%s`,", ValCol, ValCol);mquerym <- gsub('.{1}$','',paste(s, collapse = "\n"));mquery <- paste("SELECT name,", mquerym, "FROM dat1", "GROUP BY name", sep = "\n");sqldf(mquery)

@mpalanco 2019-04-30 06:39:32

@M-M That was neat. Thank you.

@Ista 2011-05-04 22:42:14

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is

cast(dat1, name ~ numbers)

Or using reshape2

dcast(dat1, name ~ numbers)

@Andrie 2011-05-06 11:56:54

+1 And use reshape2 for a performance gain.

@thelatemail 2017-06-21 22:37:03

It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.

@Chase 2011-05-04 23:20:03

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

@aL3xa 2011-05-05 00:07:44

+1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)

@hadley 2011-05-05 01:40:56

Good luck figuring out the arguments you need though

@Chase 2016-02-10 03:16:18

@indra_patil - I would likely use the reshape2 package as indicated in one of the other answers. You could create a new question that's specific to your use case and post it if you can't figure it out.

@cloudscomputes 2017-10-20 04:39:22

it seems it will create duplicated "name" columns, that may be wanted or not wanted by the author of this thread

@NoBackingDown 2017-10-26 15:18:26

reshape is an outstanding example for a horrible function API. It is very close to useless.

@Brian D 2017-11-17 17:11:44

The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)

@vonjd 2018-11-22 15:14:51

I would say base R still wins vote-wise by a factor of about 2 to 1

Related Questions

Sponsored Content

13 Answered Questions

[SOLVED] How to join (merge) data frames (inner, outer, left, right)

6 Answered Questions

[SOLVED] Reshaping data.frame from wide to long format

4 Answered Questions

[SOLVED] Reshaping wide to long with multiple row

  • 2019-02-21 13:18:22
  • Iskandar The Pupsi
  • 73 View
  • 0 Score
  • 4 Answer
  • Tags:   r transpose

5 Answered Questions

[SOLVED] Reshape three column data frame to matrix ("long" to "wide" format)

1 Answered Questions

[SOLVED] Long to Wide Data-reshaping

  • 2018-06-19 20:46:06
  • klee907
  • 65 View
  • -2 Score
  • 1 Answer
  • Tags:   r

0 Answered Questions

How to reshape data from long to wide format so to achieve this output?

  • 2018-04-26 14:50:11
  • Marie-Eve
  • 69 View
  • 2 Score
  • 0 Answer
  • Tags:   r tidyr

2 Answered Questions

[SOLVED] Reshaping a data frame based on column names

1 Answered Questions

[SOLVED] Reshaping a dataframe from long to wide with irregular row lengths

1 Answered Questions

[SOLVED] How can I get lists of rownames as values after aggregating a dataframe?

  • 2013-03-08 16:38:43
  • Harry Palmer
  • 115 View
  • 8 Score
  • 1 Answer
  • Tags:   r reshape

1 Answered Questions

[SOLVED] Trouble getting my data into wide form with the reshape package

  • 2011-03-26 18:44:24
  • richiemorrisroe
  • 337 View
  • 1 Score
  • 1 Answer
  • Tags:   r reshape

Sponsored Content