By daj


2012-09-17 20:17:29 8 Comments

I have a dataframe in a wide format, with repeated measurements taken within different date ranges. In my example there are three different periods, all with their corresponding values. E.g. the first measurement (Value1) was measured in the period from DateRange1Start to DateRange1End:

ID DateRange1Start DateRange1End Value1 DateRange2Start DateRange2End Value2 DateRange3Start DateRange3End Value3
1 1/1/90 3/1/90 4.4 4/5/91 6/7/91 6.2 5/5/95 6/6/96 3.3 

I'm looking to reshape the data to a long format such that the DateRangeXStart and DateRangeXEnd columns are grouped,. Thus, what was 1 row in the original table becomes 3 rows in the new table:

ID DateRangeStart DateRangeEnd Value
1 1/1/90 3/1/90 4.4
1 4/5/91 6/7/91 6.2
1 5/5/95 6/6/96 3.3

I know there must be a way to do this with reshape2/melt/recast/tidyr, but I can't seem to figure it out how to map the multiple sets of measure variables into single sets of value columns in this particular way.

7 comments

@hplieninger 2019-03-20 13:56:52

Reshaping from wide to long format with multiple value/measure columns is now possible with the new function pivot_longer() of the current development version of tidyr. See also: https://tidyr.tidyverse.org/dev/articles/pivot.html.

This is superior to the previous tidyr strategy of gather() than spread(), because the attributes are no longer dropped (dates remain dates and numerics remain numerics in the present example).

# devtools::install_github("tidyverse/tidyr")
packageVersion("tidyr")
#> [1] '0.8.3.9000'
library(magrittr)

a <- structure(list(ID = 1L, 
                    DateRange1Start = structure(7305, class = "Date"), 
                    DateRange1End = structure(7307, class = "Date"), 
                    Value1 = 4.4, 
                    DateRange2Start = structure(7793, class = "Date"),
                    DateRange2End = structure(7856, class = "Date"), 
                    Value2 = 6.2, 
                    DateRange3Start = structure(9255, class = "Date"), 
                    DateRange3End = structure(9653, class = "Date"), 
                    Value3 = 3.3),
               row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

pivot_longer() (counterpart: pivot_widee()) works similar to gather(). However, it offers additional functionality (see link above) such as multiple value columns. For this to work, a pivot spec has to be specified, which is a data frame with one row for each column involved in the reshape. In spec, the first column .name lists all column names of a and the additional columns specify how the reshape is carried out:

# First, create pivot spec with one row for each variable in 'a'
spec <- a %>% 
    tidyr::pivot_longer_spec(-ID) %>% 
    dplyr::transmute(.name = .name,
                     group = readr::parse_number(name),
                     .value = sub("DateRange[1-9]|(?<=Value)[1-9]", "", name, perl = TRUE))
spec
#> # A tibble: 9 x 3
#>   .name           group .value
#>   <chr>           <dbl> <chr> 
#> 1 DateRange1Start     1 Start 
#> 2 DateRange1End       1 End   
#> 3 Value1              1 Value 
#> 4 DateRange2Start     2 Start 
#> 5 DateRange2End       2 End   
#> 6 Value2              2 Value 
#> 7 DateRange3Start     3 Start 
#> 8 DateRange3End       3 End   
#> 9 Value3              3 Value

# Second, reshape/gather using the spec
tidyr::pivot_longer(a, spec = spec)
#> # A tibble: 3 x 5
#>      ID group End        Start      Value
#>   <int> <dbl> <date>     <date>     <dbl>
#> 1     1     1 1990-01-03 1990-01-01   4.4
#> 2     1     2 1991-07-06 1991-05-04   6.2
#> 3     1     3 1996-06-06 1995-05-05   3.3

Created on 2019-03-20 by the reprex package (v0.2.1)

@zx8754 2018-04-10 11:00:01

Using recycling:

data.frame(ID = d[, 1],
           DateRangeStart = unlist(d[, -1][, c(TRUE, FALSE, FALSE)]),
           DateRangeEnd  = unlist(d[, -1][, c(FALSE, TRUE, FALSE)]),
           Value =  unlist(d[, -1][, c(FALSE, FALSE, TRUE)]))

@Jaap 2017-11-26 07:02:26

Two additional options (with an example dataframe with more than one row to better show the working of the code):

1) with base R:

l <- lapply(split.default(d[-1], cumsum(grepl('Start$', names(d)[-1]))),
            setNames, c('DateRangeStart','DateRangeEnd','Value'))
data.frame(ID = d[,1], do.call(rbind, l), row.names = NULL)

which gives:

  ID DateRangeStart DateRangeEnd Value
1  1         1/1/90       3/1/90   4.4
2  2         1/2/90       3/2/90   6.1
3  1         4/5/91       6/7/91   6.2
4  2         4/6/91       6/8/91   3.2
5  1         5/5/95       6/6/96   3.3
6  2         5/5/97       6/6/98   1.3

2) with the tidyverse:

library(dplyr)
library(purrr)

split.default(d[-1], cumsum(grepl('Start$', names(d)[-1]))) %>%
  map_dfr(~set_names(., c('DateRangeStart','DateRangeEnd','Value'))) %>% 
  bind_cols(ID = rep(d$ID, nrow(.)/nrow(d)), .)

3) with the sjmisc-package:

library(sjmisc)
to_long(d, keys = 'group',
        values = c('DateRangeStart','DateRangeEnd','Value'), 
        c('DateRange1Start','DateRange2Start','DateRange3Start'),
        c('DateRange1End','DateRange2End','DateRange3End'),
        c('Value1','Value2','Value3'))[,-2]

If you also want a group/time column, you can adapt the approaches above to:

1) with base R:

l <- lapply(split.default(d[-1], cumsum(grepl('Start$', names(d)[-1]))),
            setNames, c('DateRangeStart','DateRangeEnd','Value'))
data.frame(ID = d[,1],
           group = rep(seq_along(l), each = nrow(d)),
           do.call(rbind, l), row.names = NULL)

which gives:

  ID group DateRangeStart DateRangeEnd Value
1  1     1         1/1/90       3/1/90   4.4
2  2     1         1/2/90       3/2/90   6.1
3  1     2         4/5/91       6/7/91   6.2
4  2     2         4/6/91       6/8/91   3.2
5  1     3         5/5/95       6/6/96   3.3
6  2     3         5/5/97       6/6/98   1.3

2) with the tidyverse:

split.default(d[-1], cumsum(grepl('Start$', names(d)[-1]))) %>%
  map_dfr(~set_names(., c('DateRangeStart','DateRangeEnd','Value'))) %>% 
  bind_cols(ID = rep(d$ID, nrow(.)/nrow(d)),
            group = rep(1:(nrow(.)/nrow(d)), each = nrow(d)), .)

3) with the sjmisc-package:

library(sjmisc)
to_long(d, keys = 'group', recode.key = TRUE,
        values = c('DateRangeStart','DateRangeEnd','Value'), 
        c('DateRange1Start','DateRange2Start','DateRange3Start'),
        c('DateRange1End','DateRange2End','DateRange3End'),
        c('Value1','Value2','Value3'))

Used data:

d <- read.table(text = "ID DateRange1Start DateRange1End Value1 DateRange2Start DateRange2End Value2 DateRange3Start DateRange3End Value3
1 1/1/90 3/1/90 4.4 4/5/91 6/7/91 6.2 5/5/95 6/6/96 3.3
2 1/2/90 3/2/90 6.1 4/6/91 6/8/91 3.2 5/5/97 6/6/98 1.3", header = TRUE, stringsAsFactors = FALSE)

@Arun 2015-02-28 20:30:21

data.table's melt function can melt into multiple columns. Using that, we can simply do:

require(data.table)
melt(setDT(dat), id=1L,
     measure=patterns("Start$", "End$", "^Value"), 
     value.name=c("DateRangeStart", "DateRangeEnd", "Value"))

#    ID variable DateRangeStart DateRangeEnd Value
# 1:  1        1         1/1/90       3/1/90   4.4
# 2:  1        2         4/5/91       6/7/91   6.2
# 3:  1        3         5/5/95       6/6/96   3.3

Alternatively, you can also reference the three sets of measure columns by the column position:

melt(setDT(dat), id = 1L, 
     measure = list(c(2,5,8), c(3,6,9), c(4,7,10)), 
     value.name = c("DateRangeStart", "DateRangeEnd", "Value"))

@AndrewMacDonald 2014-06-23 20:05:20

Here is an approach to the problem using tidyr. This is an interesting use case for its function extract_numeric(), which I used to pull out the group from the column names

library(dplyr)
library(tidyr)

a <- read.table(textConnection("
ID DateRange1Start DateRange1End Value1 DateRange2Start DateRange2End Value2 DateRange3Start DateRange3End Value3
1 1/1/90 3/1/90 4.4 4/5/91 6/7/91 6.2 5/5/95 6/6/96 3.3 
"),header=TRUE)

a %>%
  gather(variable,value,-ID) %>%
  mutate(group = extract_numeric(variable)) %>%
  mutate(variable =  gsub("\\d","",x = variable)) %>%
  spread(variable,value)

  ID group DateRangeEnd DateRangeStart Value
1  1     1       3/1/90         1/1/90   4.4
2  1     2       6/7/91         4/5/91   6.2
3  1     3       6/6/96         5/5/95   3.3

@42- 2012-09-17 20:31:51

reshape(dat, idvar="ID", direction="long", 
             varying=list(Start=c(2,5,8), End=c(3,6,9), Value=c(4,7,10)),
             v.names = c("DateRangeStart", "DateRangeEnd", "Value") )
#-------------
    ID time DateRangeStart DateRangeEnd Value
1.1  1    1          1/1/90        3/1/90    4.4
1.2  1    2          4/5/91        6/7/91    6.2
1.3  1    3          5/5/95        6/6/96    3.3

(Added the v.names per Josh's suggestion.)

@Josh O'Brien 2012-09-17 20:35:22

+1 for showing off the power of that varying= argument. Following up, the v.names argument can also pretty up those column names, like this: v.names = c("DateRangeStart", "DateRangeEnd", "Value")

@Blue Magister 2012-09-17 20:29:28

You don't need anything fancy; base R functions will do.

a <- read.table(textConnection("
ID DateRange1Start DateRange1End Value1 DateRange2Start DateRange2End Value2 DateRange3Start DateRange3End Value3
1 1/1/90 3/1/90 4.4 4/5/91 6/7/91 6.2 5/5/95 6/6/96 3.3 
"),header=TRUE)
b1 <- a[,c(1:4)]; b2 <- a[,c(1,5:7)]; b3 <- a[,c(1,8:10)]
colnames(b1) <- colnames(b2) <- colnames(b3) <- c("ID","DateRangeStart","DateRangeEnd","Value")
b <- rbind(b1,b2,b3)

Related Questions

Sponsored Content

18 Answered Questions

[SOLVED] How to sort a dataframe by multiple column(s)?

9 Answered Questions

[SOLVED] How to reshape data from long to wide format?

  • 2011-05-04 22:27:50
  • Steve
  • 149503 View
  • 207 Score
  • 9 Answer
  • Tags:   r reshape r-faq

1 Answered Questions

[SOLVED] Reshaping data: long to wide; multiple variables, multiple values

  • 2018-01-11 03:18:27
  • ryebox
  • 471 View
  • 0 Score
  • 1 Answer
  • Tags:   r reshape tidyr

4 Answered Questions

[SOLVED] Reshape wide format, to multi-column long format

2 Answered Questions

[SOLVED] Convert Wide to Long with frequency column

  • 2017-08-01 19:11:52
  • user27612
  • 226 View
  • 3 Score
  • 2 Answer
  • Tags:   r dataframe tidyr

2 Answered Questions

[SOLVED] R using tidyr, reshape2 to convert multiple column sets to rows

1 Answered Questions

[SOLVED] R: reshaping wide to long

1 Answered Questions

[SOLVED] Reshaping a complex dataset from long to wide using recast()

  • 2012-11-27 18:49:06
  • Twitch_City
  • 5039 View
  • 6 Score
  • 1 Answer
  • Tags:   r dataset reshape2

2 Answered Questions

[SOLVED] R - Dramatic increase in file size after reshaping wide to long data

  • 2013-03-17 20:03:46
  • user1926173
  • 225 View
  • 2 Score
  • 2 Answer
  • Tags:   r reshape reshape2

1 Answered Questions

[SOLVED] Reshaping repeated measures data in R wide to long

  • 2013-01-04 17:51:53
  • N Brouwer
  • 1061 View
  • 0 Score
  • 1 Answer
  • Tags:   r reshape

Sponsored Content