By Chinmay Patil

2013-03-28 03:05:35 8 Comments

I am going through the documentation of data.table and have also noticed from some of the conversations here on SO that rbindlist is supposed to be better than rbind.

I would like to know why rbindlist is better than rbind, and in which scenarios rbindlist really excels over rbind.

Is there any advantage in terms of memory utilization?


@Arun 2014-06-01 19:32:49

By v1.9.2, rbindlist had evolved quite a bit, implementing many features including:

  • Choosing the highest SEXPTYPE of columns while binding - implemented in v1.9.2 closing FR #2456 and Bug #4981.
  • Handling factor columns properly - first implemented in v1.8.10 closing Bug #2650 and extended to binding ordered factors carefully in v1.9.2 as well, closing FR #4856 and Bug #5019.
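As a minimal sketch of the two points above (the table and column names are made up for illustration, and the resulting class of a mixed factor/character column may depend on the data.table version), binding an integer column with a double column keeps the higher type, and a factor column can be bound with a character column without losing values:

```r
library(data.table)

# Hypothetical toy inputs: the same column stored with different types.
ints <- data.table(x = 1:2)            # integer column
dbls <- data.table(x = c(2.5, 3.5))    # double column

promoted <- rbindlist(list(ints, dbls))
typeof(promoted$x)                     # the higher of the two types is kept

# A factor column bound with a character column keeps all values intact.
fac <- data.table(g = factor(c("a", "b")))
chr <- data.table(g = c("b", "c"))

mixed <- rbindlist(list(fac, chr))
as.character(mixed$g)
```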

In addition, in v1.9.2, rbind.data.table also gained a fill argument, implemented in R, which allows binding by filling missing columns.

Now in v1.9.3, there are even more improvements on these existing features:

  • rbindlist gains an argument use.names, which by default is FALSE for backwards compatibility.
  • rbindlist also gains an argument fill, which by default is also FALSE for backwards compatibility.
  • These features are all implemented in C, and written carefully to not compromise in speed while adding functionalities.
  • Since rbindlist can now match by names and fill missing columns, rbind.data.table just calls rbindlist now. The only difference is that use.names=TRUE by default for rbind.data.table, for backwards compatibility.

rbind.data.frame slows down quite a bit, mostly due to copies (which @mnel points out as well) that could be avoided (by moving to C). I think that's not the only reason, though. The implementation for checking/matching column names in rbind.data.frame could also get slower when there are many columns per data.frame and there are many such data.frames to bind (as shown in the benchmark below).

However, the fact that rbindlist lack(ed) certain features (like checking factor levels or matching names) bears very little (or no) weight towards it being faster than rbind.data.frame. It is faster because those features were carefully implemented in C, optimised for speed and memory.
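To make the use.names and fill arguments concrete, here is a small sketch. The tables dt1 and dt2 are hypothetical, chosen so that the column order differs and each table lacks one column:

```r
library(data.table)

# Hypothetical inputs: column order differs and each table lacks one column.
dt1 <- data.table(a = 1:2, b = 3:4)
dt2 <- data.table(c = 5:6, b = 7:8)

# Match columns by name and pad the missing ones with NA.
filled <- rbindlist(list(dt1, dt2), use.names = TRUE, fill = TRUE)
filled

# rbind() on data.tables matches by name by default and accepts fill too.
via_rbind <- rbind(dt1, dt2, fill = TRUE)
```

Both calls should give a three-column table (a, b, c) with NA wherever a table had no such column.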

Here's a benchmark that highlights the efficient binding while matching by column names as well using rbindlist's use.names feature from v1.9.3. The data set consists of 10000 data.frames each of size 10*500.

NB: this benchmark has been updated to include a comparison to dplyr's bind_rows

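library(data.table) # 1.11.5, 2018-06-02 00:09:06 UTC
library(dplyr) # 2018-06-12 01:41:40 UTC
names = paste0("V", 1:500)
cols = 500L
# Build one 10 x 500 data.frame whose column names are in random order
foo <- function() {
    data = as.data.frame(setDT(lapply(1:cols, function(x) sample(10))))
    setnames(data, sample(names))
}
n = 10e3L
ll = vector("list", n)
for (i in 1:n) {
    # internal data.table routine that sets the list element by reference
    .Call("Csetlistelt", ll, i, foo())
}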

system.time(ans1 <- rbindlist(ll))
#  user  system elapsed 
# 1.226   0.070   1.296 

system.time(ans2 <- rbindlist(ll, use.names=TRUE))
#  user  system elapsed 
# 2.635   0.129   2.772 

system.time(ans3 <- do.call("rbind", ll))
#   user  system elapsed 
# 36.932   1.628  38.594 

system.time(ans4 <- bind_rows(ll))
#   user  system elapsed 
# 48.754   0.384  49.224 

identical(ans2, setDT(ans3)) 
# [1] TRUE
identical(ans2, setDT(ans4))
# [1] TRUE

Binding columns as such, without checking for names, took just 1.3 seconds, whereas checking for column names and binding appropriately took just 1.5 seconds more. Compared to the base solution, this is 14x faster, and 18x faster than dplyr's version.

@mnel 2013-03-28 03:16:17

rbindlist is an optimized version of do.call(rbind, list(...)), which is known for being slow when using rbind.data.frame

Where does it really excel

Some questions that show where rbindlist shines are

Fast vectorized merge of list of data.frames by row

Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

These have benchmarks that show how fast it can be.

rbind.data.frame is slow, for a reason

rbind.data.frame does lots of checking, and will match by name (i.e. it will account for the fact that columns may be in different orders, and match them up by name). rbindlist doesn't do this kind of checking, and will join by position.

e.g.

do.call(rbind, list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##    a b
## 1  1 2
## 2  2 3
## 3  2 1
## 4  3 2

rbindlist(list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##     a b
##  1: 1 2
##  2: 2 3
##  3: 1 2
##  4: 2 3
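With the use.names argument that rbindlist gained later (v1.9.3, per the other answer), the name-matched result can be reproduced as well. A short sketch on the same two data.frames as in the do.call example, with dfs, by_name and base as illustrative names:

```r
library(data.table)

# Identical columns, different order.
dfs <- list(data.frame(a = 1:2, b = 2:3),
            data.frame(b = 1:2, a = 2:3))

by_name <- rbindlist(dfs, use.names = TRUE)   # match on column names
base    <- do.call(rbind, dfs)                # rbind.data.frame also matches names

by_name$a   # 1 2 2 3
base$a      # 1 2 2 3
```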

Some other limitations of rbindlist

It used to struggle to deal with factors, due to a bug that has since been fixed:

rbindlist two data.tables where one has factor and other has character type for a column (Bug #2650)

It has problems with duplicate column names

see Warning message: in rbindlist(allargs) : NAs introduced by coercion: possible bug in data.table? (Bug #2384)

rbind.data.frame rownames can be frustrating

rbindlist can handle lists, data.frames and data.tables, and will return a data.table without rownames

you can get in a muddle of rownames using do.call(rbind, list(...)), see

How to avoid renaming of rows when using rbind inside do.call?

Memory efficiency

In terms of memory, rbindlist is implemented in C, so it is memory efficient; it uses setattr to set attributes by reference.

rbind.data.frame is implemented in R; it does lots of assigning, and uses attr<- (and class<- and rownames<-), all of which will (internally) create copies of the created data.frame.
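A rough way to see the by-reference behaviour of setattr is to compare the object's memory address before and after the call, using data.table's address(). This sketch only checks setattr. Whether attr<- copies a given object can depend on the R version, so it is not asserted here:

```r
library(data.table)

DF <- data.frame(a = 1:3)
before <- address(DF)           # memory address of the object

setattr(DF, "test", "hello")    # sets the attribute by reference
after <- address(DF)

identical(before, after)        # no copy was made by setattr
attr(DF, "test")
```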

@hadley 2013-03-28 12:13:33

FYI attr<-, class<- and (I think) rownames<- all modify in place.

@Matt Dowle 2013-03-28 13:38:57

@hadley Are you sure? Try DF = data.frame(a=1:3); .Internal(inspect(DF)); tracemem(DF); attr(DF,"test") <- "hello"; .Internal(inspect(DF)).

@hadley 2013-03-28 14:15:44

@MatthewDowle hmmm, attr<- does make a copy of data frames, but not of atomic vectors or lists. Confusing!

@Matt Dowle 2013-03-28 15:03:35

@Ken Williams 2013-07-10 20:46:26

rbind.data.frame has special "hijacking" logic - when its first argument is a data.table, it calls .rbind.data.table instead, which does a little checking & then calls rbindlist internally. So if you already have data.table objects to bind, there's probably little performance difference between rbind and rbindlist.

@Tyler 2013-11-14 14:52:23

Note that bug 2650 has been fixed. I've edited the answer to indicate this.

@Arun 2014-05-29 13:09:27

mnel, this post perhaps needs editing, now that rbindlist is capable of matching by names (use.names=TRUE) and also fill missing columns (fill=TRUE). I've updated this, this and this post. Do you mind editing this one or is it okay if I do it? Either way is fine by me.

@hadley 2014-06-06 19:30:24

dplyr::rbind_list is also pretty similar
