#### [SOLVED] Use a value from the previous row in an R data.table calculation

By Corone

I want to create a new column in a data.table calculated from the current value of one column and the previous of another. Is it possible to access previous rows?

E.g.:

``````> DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
> DT
A  B   C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
> DT[, D := C + BPreviousRow] # What is the correct code here?
``````

``````> DT
A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
``````

#### @Abdullah Al Mahmud 2018-07-05 10:51:14

Here is my intuitive solution:

``````#create data frame
df <- data.frame(A=1:5, B=seq(10,50,10), C=seq(100,500, 100))`
#subtract the shift from num rows
shift  <- 1 #in this case the shift is 1
invshift <- nrow(df) - shift
#Now create the new column
df\$D <- c(NA, head(df\$B, invshift)+tail(df\$C, invshift))`
``````

Here `invshift`, the number of rows minus 1, is 4. `nrow(df)` provides you with the number of rows in a data frame or in a vector. Similarly, if you want to take still earlier values, subtract from nrow 2, 3, ...etc, and also put NA's accordingly at the beginning.

#### @Arun 2013-02-04 15:02:24

With `shift()` implemented in v1.9.6, this is quite straightforward.

``````DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]
``````

From NEWS:

1. New function `shift()` implements fast `lead/lag` of vector, list, data.frames or data.tables. It takes a `type` argument which can be either "lag" (default) or "lead". It enables very convenient usage along with `:=` or `set()`. For example: `DT[, (cols) := shift(.SD, 1L), by=id]`. Please have a look at `?shift` for more info.

#### @SlowLearner 2013-02-04 15:24:33

Does that `.N` hold the current row number or something? Sorry to ask here, but I can't seem to find it in the help files...

#### @Steve Lianoglou 2013-02-04 16:02:27

@SlowLearner: You might also find `.I` useful, which holds the row indices for the rows in the curren group.

#### @mnel 2013-02-04 19:08:20

Use seq_len(.N - 1) instead of 1:(.N-1). This avoids problems associated with 1:0.

#### @MichaelChirico 2015-04-26 22:41:26

+1 for the `.SD` example--I was trying to use a `lapply` and getting funky results. this is much simpler.

#### @skan 2015-04-30 16:45:31

Where can I find an updated pdf with all this new information ? The official 1.9.4 vignettes and webminars don't include it. And the Rmd 1.9.5 vignettes are not comfortable and don't include it either.

#### @skan 2015-04-30 16:51:43

@Arun , your first solution says "Error in C + shift(B, 1L, type = "lag") : non-numeric argument to binary operator" in my data.table 1.9.5

#### @Arun 2015-04-30 17:10:54

@skan, update to the latest commit of data.table... Made changes yesterday. If using devel, better to watch the project on GitHub

#### @Andreas 2019-10-21 10:30:48

Note - for me at least - using shift was much slower that using 'slicing' with -N - see timings here (for a data.table of 1,5 mio. rows) stackoverflow.com/questions/58372627/…

#### @Steven Beaupré 2015-04-27 01:52:21

Using `dplyr` you could do:

``````mutate(DT, D = lag(B) + C)
``````

Which gives:

``````#   A  B   C   D
#1: 1 10 100  NA
#2: 2 20 200 210
#3: 3 30 300 320
#4: 4 40 400 430
#5: 5 50 500 540
``````

#### @B C 2018-11-19 18:25:57

Underrated answer - thank you for sharing.

#### @geneorama 2014-11-03 22:03:58

I added a padding argument and changed some names and called it `shift`. https://github.com/geneorama/geneorama/blob/master/R/shift.R

#### @geneorama 2015-01-09 20:06:56

Thanks so much for the note. I'll be on the lookout for it, and most likely use it and deprecate my geneorama version.

#### @dnlbrky 2014-08-01 16:24:39

Several folks have answered the specific question. See the code below for a general purpose function that I use in situations like this that may be helpful. Rather than just getting the prior row, you can go as many rows in the "past" or "future" as you'd like.

``````rowShift <- function(x, shiftLen = 1L) {
r <- (1L + shiftLen):(length(x) + shiftLen)
r[r<1] <- NA
return(x[r])
}

# Create column D by adding column C and the value from the previous row of column B:
DT[, D := C + rowShift(B,-1)]

# Get the Old Faithul eruption length from two events ago, and three events in the future:
as.data.table(faithful)[1:5,list(eruptLengthCurrent=eruptions,
eruptLengthTwoPrior=rowShift(eruptions,-2),
eruptLengthThreeFuture=rowShift(eruptions,3))]
##   eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
##1:              3.600                  NA                  2.283
##2:              1.800                  NA                  4.533
##3:              3.333               3.600                     NA
##4:              2.283               1.800                     NA
##5:              4.533               3.333                     NA
``````

#### @geneorama 2014-11-03 19:52:37

This is a brilliant answer, I'm annoyed that I've already upvoted the other answers because this is a far more general answer. In fact, I'm going to use it in my geneorama package (if you don't mind).

#### @dnlbrky 2014-11-03 20:33:04

Sure, go for it. I was hoping to get some free time and submit it as a pull request to the `data.table` package, but alas...

#### @dnlbrky 2015-02-19 18:53:10

A similar function called `shift` has been added to `data.table` as of version 1.9.5. See the updated answer from @Arun.

#### @Gary Weissman 2014-05-04 04:25:05

Based on @Steve Lianoglou 's comment above, why not just:

``````DT[, D:= C + c(NA, B[.I - 1]) ]
#    A  B   C   D
# 1: 1 10 100  NA
# 2: 2 20 200 210
# 3: 3 30 300 320
# 4: 4 40 400 430
# 5: 5 50 500 540
``````

And avoid using `seq_len` or `head` or any other function.

#### @Matthew 2014-09-02 18:39:00

Nice - however this would not work if you wanted to find the previous within a group.

#### @Gary Weissman 2015-02-15 20:00:33

@Matthew you are right. If subsetting by group I would replace `.I` with `seq_len(.N)`

#### @Ryogi 2013-02-04 15:53:25

Following Arun's solution, a similar results can be obtained without referring to to `.N`

``````> DT[, D := C + c(NA, head(B, -1))][]
A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
``````

#### @Corone 2013-02-04 16:10:38

Is there a reason to prefer one method to another? Or is it simply an aesthetic difference?

#### @Ryogi 2013-02-04 16:24:10

I think that in this scenario (i.e. where `.N` is readily available) it is mostly aesthetic choice. I am not aware of any important difference.

### R data.table calculations with previous row dependencies

• 2019-02-02 12:37:56
• Sarel Louw
• 92 View
• 1 Score
• Tags:   r data.table

### [SOLVED] data.table vs dplyr: can one do something well the other can't or does poorly?

• 2014-01-29 15:21:45
• BrodieG
• 114074 View
• 723 Score
• Tags:   r data.table dplyr

### [SOLVED] calculate new column on factor level based on two data.frames/data.tables

• 2018-06-14 13:26:39
• crazysantaclaus
• 62 View
• 0 Score
• Tags:   r data.table

### [SOLVED] Creating a vector from data.table row without using apply

• 2018-04-04 00:47:30
• Dan Rosenheck
• 111 View
• 3 Score
• Tags:   r data.table

### Referencing previous value in dplyr

• 2017-12-11 20:03:36
• Ziyan Xu
• 173 View
• 1 Score
• Tags:   r dplyr data.table

### [SOLVED] How to use a shift calculation with a cumulative value in a data.table in R

• 2017-04-12 14:49:26
• Sven
• 158 View
• 3 Score
• Tags:   r data.table shift