By Jojo Ono

2012-07-11 13:10:50 8 Comments

Suppose we have a folder containing multiple data.csv files, each containing the same number of variables but each from different times. Is there a way in R to import them all simultaneously rather than having to import them all individually?

My problem is that I have around 2000 data files to import and having to import them individually just by using the code:

read.delim(file="filename", header=TRUE, sep="\t")

is not very efficient.


@Mauro Lepore 2019-01-05 22:30:57

I like the approach using list.files(), lapply() and list2env() (or fs::dir_ls(), purrr::map() and list2env()). That seems simple and flexible.
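
A minimal base R sketch of that first approach, assuming a throwaway folder with two tiny CSV files (the folder name, the file names and the `imported` environment are made up for illustration):

```r
# Hypothetical setup: a temporary folder holding two small CSV files
csv_dir <- file.path(tempdir(), "list2env_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(csv_dir, "csv1.csv"), row.names = FALSE)
write.csv(data.frame(y = c("a", "b")), file.path(csv_dir, "csv2.csv"), row.names = FALSE)

# 1. Collect the file paths ("\\.csv$" is a regular expression, not a glob)
paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# 2. Read every file into one list, named after the files
dfs <- lapply(paths, read.csv)
names(dfs) <- tools::file_path_sans_ext(basename(paths))

# 3. Optionally copy the list elements into an environment as separate objects
imported <- new.env()
list2env(dfs, envir = imported)

ls(imported)
```

Using a dedicated environment instead of the global one keeps the example from cluttering your workspace; pass envir = .GlobalEnv if you really want top-level objects.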

Alternatively, you may try the small package {tor} (to-R): By default it imports files from the working directory into a list (list_*() variants) or into the global environment (load_*() variants).

For example, here I read all the .csv files from my working directory into a list using tor::list_csv():


library(tor)

dir()
#>  [1] "_pkgdown.yml"     "" "csv1.csv"        
#>  [4] "csv2.csv"         "datasets"         "DESCRIPTION"     
#>  [7] "docs"             "inst"             ""      
#> [10] "man"              "NAMESPACE"        ""         
#> [13] "R"                ""        "README.Rmd"      
#> [16] "tests"            "tmp.R"            "tor.Rproj"

tor::list_csv()
#> $csv1
#>   x
#> 1 1
#> 2 2
#> $csv2
#>   y
#> 1 a
#> 2 b

And now I load those files into my global environment with tor::load_csv():

# The working directory contains .csv files
dir()
#>  [1] "_pkgdown.yml"     "" "CRAN-RELEASE"    
#>  [4] "csv1.csv"         "csv2.csv"         "datasets"        
#>  [7] "DESCRIPTION"      "docs"             "inst"            
#> [10] ""       "man"              "NAMESPACE"       
#> [13] ""          "R"                ""       
#> [16] "README.Rmd"       "tests"            "tmp.R"           
#> [19] "tor.Rproj"


tor::load_csv()

# Each file is now available as a dataframe in the global environment
csv1
#>   x
#> 1 1
#> 2 2
csv2
#>   y
#> 1 a
#> 2 b

Should you need to read only specific files, you can match their file paths with the regexp and invert arguments.

For even more flexibility use list_any(). It allows you to supply the reader function via the argument .f.

(path_csv <- tor_example("csv"))
#> [1] "C:/Users/LeporeM/Documents/R/R-3.5.2/library/tor/extdata/csv"
dir(path_csv)
#> [1] "file1.csv" "file2.csv"

list_any(path_csv, read.csv)
#> $file1
#>   x
#> 1 1
#> 2 2
#> $file2
#>   y
#> 1 a
#> 2 b

Pass additional arguments via ... or inside the lambda function.

path_csv %>% 
  list_any(readr::read_csv, skip = 1)
#> Parsed with column specification:
#> cols(
#>   `1` = col_double()
#> )
#> Parsed with column specification:
#> cols(
#>   a = col_character()
#> )
#> $file1
#> # A tibble: 1 x 1
#>     `1`
#>   <dbl>
#> 1     2
#> $file2
#> # A tibble: 1 x 1
#>   a    
#>   <chr>
#> 1 b

path_csv %>% 
  list_any(~read.csv(., stringsAsFactors = FALSE)) %>% 
  purrr::map(tibble::as_tibble)
#> $file1
#> # A tibble: 2 x 1
#>       x
#>   <int>
#> 1     1
#> 2     2
#> $file2
#> # A tibble: 2 x 1
#>   y    
#>   <chr>
#> 1 a    
#> 2 b

@leerssej 2016-12-03 01:09:11

A speedy and succinct tidyverse solution: (more than twice as fast as Base R's read.csv)

library(tidyverse)
tbl <-
    list.files(pattern = "*.csv") %>% 
    map_df(~read_csv(.))

and data.table's fread() can even cut those load times by half again. (for 1/4 the Base R times)


library(data.table)
tbl_fread <- 
    list.files(pattern = "*.csv") %>% 
    map_df(~fread(., stringsAsFactors = FALSE))

The stringsAsFactors = FALSE argument keeps the dataframe factor free.

If the typecasting is being cheeky, you can force all the columns to be read as character with the col_types argument.

tbl <-
    list.files(pattern = "*.csv") %>% 
    map_df(~read_csv(., col_types = cols(.default = "c")))

If you are wanting to dip into subdirectories to construct your list of files to eventually bind, then be sure to include the path name, as well as register the files with their full names in your list. This will allow the binding work to go on outside of the current directory. (Thinking of the full pathnames as operating like passports to allow movement back across directory 'borders'.)

tbl <-
    list.files(path = "./subdirectory/",
               pattern = "*.csv", 
               full.names = T) %>% 
    map_df(~read_csv(., col_types = cols(.default = "c"))) 
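
To see why the full names matter, here is a small base R sketch, assuming a made-up folder layout under tempdir() (every name in it is hypothetical):

```r
# Hypothetical layout: <tempdir>/subdir_demo/subdirectory/subdir_demo_a.csv
root <- file.path(tempdir(), "subdir_demo")
sub_dir <- file.path(root, "subdirectory")
dir.create(sub_dir, recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(sub_dir, "subdir_demo_a.csv"), row.names = FALSE)

# Without full.names only the bare file name comes back, which read.csv()
# would then look for in the current working directory
bare <- list.files(sub_dir, pattern = "\\.csv$")

# With full.names = TRUE each entry carries its directory, so it can be read from anywhere
full <- list.files(sub_dir, pattern = "\\.csv$", full.names = TRUE)

dfs <- lapply(full, read.csv)
```

Here `bare` holds just the file name while `full` holds a path that resolves regardless of the working directory, which is what lets the reading and binding happen outside the current directory.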

As Hadley describes here (about halfway down):

map_df(x, f) is effectively the same as do.call("rbind", lapply(x, f))....

Bonus Feature - adding filenames to the records per Niks feature request in comments below:
* Add original filename to each record.

Code explained: make a function to append the filename to each record during the initial reading of the tables. Then use that function instead of the simple read_csv() function.

read_plus <- function(flnm) {
    read_csv(flnm) %>% 
        mutate(filename = flnm)
}

tbl_with_sources <-
    list.files(pattern = "*.csv", 
               full.names = T) %>% 
    map_df(~read_plus(.))

(The typecasting and subdirectory handling approaches can also be handled inside the read_plus() function in the same manner as illustrated in the second and third variants suggested above.)
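
For readers without the tidyverse installed, the same filename-tagging idea can be sketched in base R. This is a swapped-in equivalent, not the code benchmarked here: `read_with_source()` is a hypothetical helper, the demo files are made up, and it stores only the base name of each file rather than the full path.

```r
# Hypothetical setup: two small CSV files with identical columns
csv_dir <- file.path(tempdir(), "filename_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)), file.path(csv_dir, "jan.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)), file.path(csv_dir, "feb.csv"), row.names = FALSE)

# Read one file and record where each row came from
read_with_source <- function(flnm) {
    df <- read.csv(flnm, stringsAsFactors = FALSE)
    df$filename <- basename(flnm)
    df
}

paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Stack all files into a single data frame
tbl_with_sources <- do.call(rbind, lapply(paths, read_with_source))

table(tbl_with_sources$filename)
```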

### Benchmark Code & Results 

library(tidyverse)
library(data.table)
library(microbenchmark)

### Base R Approaches
#### Instead of a dataframe, this approach creates a list of lists
#### removed from analysis as this alone doubled analysis time reqd
# lapply_read.delim <- function(path, pattern = "*.csv") {
#     temp = list.files(path, pattern, full.names = TRUE)
#     myfiles = lapply(temp, read.delim)
# }

#### `read.csv()`
do.call_rbind_read.csv <- function(path, pattern = "*.csv") {
    files = list.files(path, pattern, full.names = TRUE)
    do.call(rbind, lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE)))
}

map_df_read.csv <- function(path, pattern = "*.csv") {
    list.files(path, pattern, full.names = TRUE) %>% 
    map_df(~read.csv(., stringsAsFactors = FALSE))
}

### *dplyr()*
#### `read_csv()`
lapply_read_csv_bind_rows <- function(path, pattern = "*.csv") {
    files = list.files(path, pattern, full.names = TRUE)
    lapply(files, read_csv) %>% bind_rows()
}

map_df_read_csv <- function(path, pattern = "*.csv") {
    list.files(path, pattern, full.names = TRUE) %>% 
    map_df(~read_csv(., col_types = cols(.default = "c")))
}

### *data.table* / *purrr* hybrid
map_df_fread <- function(path, pattern = "*.csv") {
    list.files(path, pattern, full.names = TRUE) %>% 
    map_df(~fread(., stringsAsFactors = FALSE))
}

### *data.table*
rbindlist_fread <- function(path, pattern = "*.csv") {
    files = list.files(path, pattern, full.names = TRUE)
    rbindlist(lapply(files, function(x) fread(x, stringsAsFactors = FALSE)))
}

do.call_rbind_fread <- function(path, pattern = "*.csv") {
    files = list.files(path, pattern, full.names = TRUE)
    do.call(rbind, lapply(files, function(x) fread(x, stringsAsFactors = FALSE)))
}

read_results <- function(dir_size){
    microbenchmark(
        # lapply_read.delim = lapply_read.delim(dir_size), # too slow to include in benchmarks
        do.call_rbind_read.csv = do.call_rbind_read.csv(dir_size),
        map_df_read.csv = map_df_read.csv(dir_size),
        lapply_read_csv_bind_rows = lapply_read_csv_bind_rows(dir_size),
        map_df_read_csv = map_df_read_csv(dir_size),
        rbindlist_fread = rbindlist_fread(dir_size),
        do.call_rbind_fread = do.call_rbind_fread(dir_size),
        map_df_fread = map_df_fread(dir_size),
        times = 10L)
}

read_results_lrg_mid_mid <- read_results('./testFolder/500MB_12.5MB_40files')
print(read_results_lrg_mid_mid, digits = 3)

read_results_sml_mic_mny <- read_results('./testFolder/5MB_5KB_1000files/')
read_results_sml_tny_mod <- read_results('./testFolder/5MB_50KB_100files/')
read_results_sml_sml_few <- read_results('./testFolder/5MB_500KB_10files/')

read_results_med_sml_mny <- read_results('./testFolder/50MB_5OKB_1000files')
read_results_med_sml_mod <- read_results('./testFolder/50MB_5OOKB_100files')
read_results_med_med_few <- read_results('./testFolder/50MB_5MB_10files')

read_results_lrg_sml_mny <- read_results('./testFolder/500MB_500KB_1000files')
read_results_lrg_med_mod <- read_results('./testFolder/500MB_5MB_100files')
read_results_lrg_lrg_few <- read_results('./testFolder/500MB_50MB_10files')

read_results_xlg_lrg_mod <- read_results('./testFolder/5000MB_50MB_100files')

print(read_results_sml_mic_mny, digits = 3)
print(read_results_sml_tny_mod, digits = 3)
print(read_results_sml_sml_few, digits = 3)

print(read_results_med_sml_mny, digits = 3)
print(read_results_med_sml_mod, digits = 3)
print(read_results_med_med_few, digits = 3)

print(read_results_lrg_sml_mny, digits = 3)
print(read_results_lrg_med_mod, digits = 3)
print(read_results_lrg_lrg_few, digits = 3)

print(read_results_xlg_lrg_mod, digits = 3)

# display boxplot of my typical use case results & basic machine max load
par(oma = c(0,0,0,0)) # remove overall margins if present
par(mfcol = c(1,1)) # remove grid if present
par(mar = c(12,5,1,1) + 0.1) # to display just a single boxplot with its complete labels
boxplot(read_results_lrg_mid_mid, las = 2, xlab = "", ylab = "Duration (seconds)", main = "40 files @ 12.5MB (500MB)")
boxplot(read_results_xlg_lrg_mod, las = 2, xlab = "", ylab = "Duration (seconds)", main = "100 files @ 50MB (5GB)")

# generate 3x3 grid boxplots
par(oma = c(12,1,1,1)) # margins for the whole 3 x 3 grid plot
par(mfcol = c(3,3)) # create grid (filling down each column)
par(mar = c(1,4,2,1)) # margins for the individual plots in 3 x 3 grid
boxplot(read_results_sml_mic_mny, las = 2, xlab = "", ylab = "Duration (seconds)", main = "1000 files @ 5KB (5MB)", xaxt = 'n')
boxplot(read_results_sml_tny_mod, las = 2, xlab = "", ylab = "Duration (milliseconds)", main = "100 files @ 50KB (5MB)", xaxt = 'n')
boxplot(read_results_sml_sml_few, las = 2, xlab = "", ylab = "Duration (milliseconds)", main = "10 files @ 500KB (5MB)")

boxplot(read_results_med_sml_mny, las = 2, xlab = "", ylab = "Duration (microseconds)        ", main = "1000 files @ 50KB (50MB)", xaxt = 'n')
boxplot(read_results_med_sml_mod, las = 2, xlab = "", ylab = "Duration (microseconds)", main = "100 files @ 500KB (50MB)", xaxt = 'n')
boxplot(read_results_med_med_few, las = 2, xlab = "", ylab = "Duration (seconds)", main = "10 files @ 5MB (50MB)")

boxplot(read_results_lrg_sml_mny, las = 2, xlab = "", ylab = "Duration (seconds)", main = "1000 files @ 500KB (500MB)", xaxt = 'n')
boxplot(read_results_lrg_med_mod, las = 2, xlab = "", ylab = "Duration (seconds)", main = "100 files @ 5MB (500MB)", xaxt = 'n')
boxplot(read_results_lrg_lrg_few, las = 2, xlab = "", ylab = "Duration (seconds)", main = "10 files @ 50MB (500MB)")

Middling Use Case

[Figure: boxplot comparison of elapsed time for my typical use case]

Larger Use Case

[Figure: boxplot comparison of elapsed time for the extra-large load]

Variety of Use Cases

Rows: file counts (1000, 100, 10)
Columns: final dataframe size (5MB, 50MB, 500MB)

[Figure: boxplot comparison of directory size variations]

The base R results are better for the smallest use cases, where the overhead of bringing the C libraries of purrr and dplyr to bear outweighs the performance gains that are observed when performing larger scale processing tasks.

If you want to run your own tests, you may find this bash script helpful.

for ((i=1; i<=$2; i++)); do 
  cp "$1" "${1:0:8}_${i}.csv";
done

Running this script with bash, passing "fileName_you_want_copied" and 100 as its arguments, will create 100 copies of your file sequentially numbered (after the initial 8 characters of the filename and an underscore).

Attributions and Appreciations

With special thanks to:

  • Tyler Rinker and Akrun for demonstrating microbenchmark.
  • Jake Kaupp for introducing me to map_df() here.
  • David McLaughlin for helpful feedback on improving the visualizations and discussing/confirming the performance inversions observed in the small file, small dataframe analysis results.

@Niks 2017-12-05 06:19:52

Your solution works for me. I also want to store the file name with the data to differentiate the files. Is that possible?

@leerssej 2017-12-10 01:04:58

@Niks - Certainly! Just write and swap in a little function that not only reads the files but immediately appends a filename to each record read. Like so readAddFilename <- function(flnm) { read_csv(flnm) %>% mutate(filename = flnm) } Then just drop that in to the map_df instead of the simple read only read_csv() that is there now. I can update the entry above to show the function and how it would fit into the pipe if you still have questions or you think that will be helpful.

@marbel 2018-03-16 03:26:03

The problem in practice is that read_csv is much slower than fread. I would include a benchmark if you are going to say something is faster. One idea is creating 30 1GB files and reading them; that would be a case where performance matters.

@leerssej 2018-04-24 20:04:10

@marbel: Thank you for the suggestion! On 530 MB and smaller directories (with up to 100 files) I am finding a 25% improvement in performance between data.table's fread() and dplyr's read_csv(): 14.2 vs 19.9 secs. TBH, I had only been comparing base R to dplyr and as read_csv() is around 2-4x faster than the read.csv(), benchmarking didn't seem necessary. It has however been interesting to give fread() a whirl and pause to check out more complete benchmark results. Thanks again!

@marbel 2018-04-27 05:02:08

In my experience, fread just works. I have never needed anything else.

@user3603486 2018-05-09 10:31:19

It would be interesting to see how my answer using rio::import_list compares.

@Amir 2018-10-10 08:57:58

This is such a good answer. Thanks!

@marbel 2018-11-14 03:02:35

@leerssej my proposal was to create x files of 1 GB each. 500MB is tiny data by today's standards. But conceptually it's not surprising that rbindlist + fread is among the faster options.

@marbel 2014-05-09 03:04:03

Here is another option to convert the .csv files into one data.frame, using base R functions. This is an order of magnitude slower than the options below.

# Get the files names
files = list.files(pattern="*.csv")
# First apply read.csv, then rbind
myfiles = do.call(rbind, lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE)))

Edit: - A few more extra choices using data.table and readr

A fread() version, which is a function of the data.table package. This should be the fastest option.

library(data.table)
DT = do.call(rbind, lapply(files, fread))
# The same using `rbindlist`
DT = rbindlist(lapply(files, fread))

Using readr, which is a new hadley package for reading csv files. A bit slower than fread but with different functionalities.

tbl = lapply(files, read_csv) %>% bind_rows()

@aaron 2014-07-18 16:37:45

how does this perform vs. Reduce(rbind, lapply(...))? Just learning R but my guess is less performant

@marbel 2015-05-18 00:55:04

I've added a data.table version, that should improve performance.

@SoilSciGuy 2016-08-08 16:41:08

Is it possible to read only specific files? E.g., files that contain 'weather' in the name?

@marbel 2016-08-08 16:42:08

Just pass that as the pattern: pattern='weather'

@SoilSciGuy 2016-08-08 17:24:21

@The Red Pea 2016-09-04 16:36:19

+1 seems like producing a single data frame -- the SQL UNION of all CSV files -- is the easiest to work with. Since OP didn't specify whether they want 1 data frame or many data frames, I assumed 1 data frame is best, so I am surprised the accepted answer does not do any of the "UNION". I like this answer, which is consistent with this explanation of do.call.

@A5C1D2H2I1M1N2O1R2T1 2012-07-11 13:16:28

Something like the following should result in each data frame as a separate element in a single list:

temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim)

This assumes that you have those CSVs in a single directory--your current working directory--and that all of them have the lower-case extension .csv.

If you then want to combine those data frames into a single data frame, see the solutions in other answers using things like do.call(rbind,...), dplyr::bind_rows() or data.table::rbindlist().
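
As a rough illustration of the base R route, here is a minimal sketch that stacks a list of data frames. The two in-memory data frames stand in for the result of lapply(temp, read.csv), and their column names are invented:

```r
# Stand-in for the list returned by lapply(temp, read.csv);
# the columns (time, depth) are made up for illustration
myfiles <- list(
    data.frame(time = 1:2, depth = c(5.1, 6.3)),
    data.frame(time = 3:4, depth = c(7.0, 4.2))
)

# rbind() is called once with every list element as an argument,
# so all data frames must share the same column names
combined <- do.call(rbind, myfiles)

dim(combined)
```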

If you really want each data frame in a separate object, even though that's often inadvisable, you could do the following with assign:

temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i]))

Or, without assign, and to demonstrate (1) how the file name can be cleaned up and (2) show how to use list2env, you can try the following:

temp = list.files(pattern="*.csv")
list2env(
  lapply(setNames(temp, make.names(gsub("*.csv$", "", temp))), 
         read.csv), envir = .GlobalEnv)

But again, it's often better to leave them in a single list.

@Jojo Ono 2012-07-11 13:34:46

Thanks! This works very well. How would I go about naming each file I have just imported so I can easily call them up?

@Spacedman 2012-07-11 14:13:26

if you can show us the first few lines of some of your files we might have some suggestions - edit your question for that!

@Jojo Ono 2012-07-11 15:07:27

The above code works perfectly for importing them as single objects, but when I try to call up a column from the data set it doesn't recognise it, as it is only a single object and not a data frame. My version of the above code is: setwd('C:/Users/new/Desktop/Dives/0904_003') temp<-list.files(pattern="*.csv") ddives <- lapply(temp, read.csv) So now each file is called ddives[n], but how would I go about writing a loop to make them all data frames rather than single objects? I can achieve this individually using the data.frame operator but am unsure how to loop this. @mrdwab

@A5C1D2H2I1M1N2O1R2T1 2012-07-11 15:56:56

@JosephOnoufriou, see my update. But generally, I find working with lists easier if I'm going to be doing similar calculations on all data frames.

@MySchizoBuddy 2014-03-27 21:31:34

Can the files be read in parallel?

@dnlbrky 2014-04-30 02:06:36

For anyone trying to write a function to do the updated version of this answer using assign... If you want the assigned values to reside in the global environment, make sure you set inherits=T.

@grisaitis 2015-01-19 20:12:16

I'm getting an error with the "*.csv" pattern: invalid 'pattern' regular expression. To fix this, I use the following pattern: pattern=".*\\.csv". The error was caused, I think, by the leading asterisk, which (on my machine at least, with R 2.3.1) needs a character / symbol to precede it. The pattern I used works as follows: first, it starts with ., denoting any character. The * then repeats the preceding . as needed. Lastly, the pattern ends with a .csv literal, where the . in .csv is escaped by the double backslash `\`.

@Konrad 2015-06-05 09:02:16

This is an excellent solution that I'm currently using. If I can take the liberty of suggesting a way to clean the file names I would provide df_name <- sub("\\.[[:alnum:]]+$", "", basename(as.character(files[i]))) and then pass the df_name in the loop instead of the files[i], on the lines: assign(df_name, read.csv(files[i])) which works like a charm.
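
A quick sketch of that name clean-up on a few invented file names, next to two base R helpers that do similar jobs (tools::file_path_sans_ext() and make.names()):

```r
# Invented file names, one with a space and one with an extra dot
files <- c("data/dive 01.csv", "data/dive_02.CSV", "weather.2012.csv")

# Drop the directory, then strip the final extension
df_names <- sub("\\.[[:alnum:]]+$", "", basename(files))

# tools::file_path_sans_ext() strips the extension in the same way
sans_ext <- tools::file_path_sans_ext(basename(files))

# make.names() additionally turns the result into syntactically valid R names
syntactic_names <- make.names(df_names)

df_names
syntactic_names
```

Names that are not syntactically valid (such as one containing a space) still work with assign(), but then have to be quoted with backticks whenever they are used.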

@Niks 2017-12-05 05:27:52

Great solutions.. Thanks . I am facing issue while reading "myfiles" variable data.. it showing error like "Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 256992, 281250, 143140, 1220"

@joran 2018-08-03 19:33:40

Apologies for the heavy edits, but I've grown tired of people finding this question and not being able to get past this first answer to see the other options (combining into a single data frame, avoiding assign).

@Rakurai 2018-08-14 13:01:26

It might be worth noting that the example pattern "*.csv" looks at first glance like an OS wildcard. It is, in fact, a regular expression, so if you're intending something like "july-*.csv" to match july-2.csv and july-31.csv, the pattern would look like pattern="july-[0-9]+.csv" instead.
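
A small sketch of the difference, using invented file names. It also shows glob2rx() from the utils package, which converts an OS-style wildcard into a regular expression if you would rather keep writing globs:

```r
# Invented file names to match against
files <- c("july-2.csv", "july-31.csv", "july-notes.csv", "july-2.csv.bak", "data_csv.txt")

# An unescaped ".csv" is a regex meaning "any character followed by csv"
loose <- grepl(".csv", files)

# Anchored regex: digits after "july-", a literal dot, and "csv" at the very end
strict <- grepl("^july-[0-9]+\\.csv$", files)

# glob2rx() translates the wildcard "*.csv" into an anchored regular expression
from_glob <- grepl(glob2rx("*.csv"), files)

data.frame(files, loose, strict, from_glob)
```

The resulting table makes it easy to compare which names each pattern accepts; the same patterns can be passed to the pattern argument of list.files().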

@user3603486 2018-05-09 10:30:09

In my view, most of the other answers are obsoleted by rio::import_list, which is a succinct one-liner:

library(rio)
my_data <- import_list(dir("path_to_directory", pattern = ".csv"), rbind = TRUE)

Any extra arguments are passed to rio::import. rio can deal with almost any file format R can read, and it uses data.table's fread where possible, so it should be fast too.

@manotheshark 2016-12-16 23:55:51

Using plyr::ldply there is roughly a 50% speed increase by enabling the .parallel option while reading 400 csv files roughly 30-40 MB each. Example includes a text progress bar.


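library(plyr)
library(data.table)
library(doSNOW)

csv.list <- list.files(path="t:/data", pattern=".csv$", full.names=TRUE)

# Start a 4-worker cluster and register it as the parallel backend for plyr
cl <- makeCluster(4)
registerDoSNOW(cl)

pb <- txtProgressBar(max=length(csv.list), style=3)
pbu <- function(i) setTxtProgressBar(pb, i)
dt <- setDT(ldply(csv.list, fread, .parallel=TRUE, .paropts=list(.options.snow=list(progress=pbu))))

stopCluster(cl)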


@user6741397 2016-09-24 10:23:08

Building on dnlbrky's comment, assign can be considerably faster than list2env for big files.


library(readr)
library(stringr)

List_of_filepaths <- list.files(path ="C:/Users/Anon/Documents/Folder_with_csv_files/", pattern = ".csv", all.files = TRUE, full.names = TRUE)

By setting the full.names argument to TRUE, you will get the full path to each file as a separate character string in your list of files, e.g., List_of_filepaths[1] will be something like "C:/Users/Anon/Documents/Folder_with_csv_files/file1.csv"

for(f in 1:length(List_of_filepaths)) {
  file_name <- str_sub(string = List_of_filepaths[f], start = 46, end = -5)
  file_df <- read_csv(List_of_filepaths[f])  
  assign( x = file_name, value = file_df, envir = .GlobalEnv)
}

You could use the data.table package's fread or base R read.csv instead of read_csv. The file_name step allows you to tidy up the name so that each data frame does not remain with the full path to the file as its name. You could extend your loop to do further things to the data table before transferring it to the global environment, for example:

for(f in 1:length(List_of_filepaths)) {
  file_name <- str_sub(string = List_of_filepaths[f], start = 46, end = -5)
  file_df <- read_csv(List_of_filepaths[f])  
  file_df <- file_df[,1:3] #if you only need the first three columns
  assign( x = file_name, value = file_df, envir = .GlobalEnv)
}

@Chris Fees 2014-02-05 21:44:09

This is the code I developed to read all csv files into R. It will create a dataframe for each csv file individually and title that dataframe the file's original name (removing spaces and the .csv) I hope you find it useful!

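path <- "C:/Users/cfees/My Box Files/Fitness/"
files <- list.files(path=path, pattern="*.csv")
for(file in files)
{
# position of the "." that starts the extension
perpos <- which(strsplit(file, "")[[1]]==".")
# name the data frame after the file, without spaces or extension
assign(
gsub(" ","",substr(file, 1, perpos-1)), 
read.csv(paste(path,file,sep="")))
}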

@Spacedman 2012-07-11 13:28:30

As well as using lapply or some other looping construct in R you could merge your CSV files into one file.

In Unix, if the files had no headers, then it's as easy as:

cat *.csv > all.csv

or if there are headers, and you can find a string that matches headers and only headers (ie suppose header lines all start with "Age"), you'd do:

cat *.csv | grep -v ^Age > all.csv

I think in Windows you could do this with COPY and SEARCH (or FIND or something) from the DOS command box, but why not install cygwin and get the power of the Unix command shell?
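
If no Unix shell is at hand, roughly the same merge can be sketched from within R itself. This is an R stand-in for the shell commands above, not a translation of them: unlike the grep version it keeps the header of the first file, and the folder, file names and "Age"/"Height" columns are all invented for the demo.

```r
# Hypothetical setup: two CSV files that share the same header line
csv_dir <- file.path(tempdir(), "merge_demo")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("Age,Height", "31,170", "45,182"), file.path(csv_dir, "a.csv"))
writeLines(c("Age,Height", "27,165"), file.path(csv_dir, "b.csv"))

paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
lines_per_file <- lapply(paths, readLines)

# Keep the header of the first file only, then append the data rows of every file
header <- lines_per_file[[1]][1]
body <- unlist(lapply(lines_per_file, function(x) x[-1]))

# Write the merged file outside csv_dir so a rerun does not pick it up as input
merged_path <- file.path(tempdir(), "all.csv")
writeLines(c(header, body), merged_path)

all_data <- read.csv(merged_path)
```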

@leerssej 2017-04-10 23:03:42

or even go with the Git Bash that tumbles in with the Git install?

@Amir 2018-10-10 08:58:39

In my experience, this is not the fastest solution if your files start to get rather large.
