By mark999


2020-05-22 23:17:22 8 Comments

Consider the following:

library(data.table)

DataTableA <- data.table(v1 = c(1, 2, NA, 6, 3, NA),
                         v2 = c(NA, 4, NA, NA, 1, 2),
                         v3 = c(3, 3, NA, 4, 2, NA),
                         v4 = c(2, NA, 3, NA, 3, NA),
                         v5 = c(1, NA, NA, NA, 3, 4))

DataTableA

##    v1 v2 v3 v4 v5
## 1:  1 NA  3  2  1
## 2:  2  4  3 NA NA
## 3: NA NA NA  3 NA
## 4:  6 NA  4 NA NA
## 5:  3  1  2  3  3
## 6: NA  2 NA NA  4

varnames <- c("v2", "v4", "v5")

What is the best way of getting the rows of DataTableA where at least one of the variables named in varnames is not NA, without explicitly referring to the variable names?

I know I could do

DataTableA[!is.na(v2) | !is.na(v4) | !is.na(v5)]

but I want to avoid writing out the variable names.

Something that works is

DataTableA[apply(!is.na(DataTableA[, ..varnames]), 1, any)]

but I'm wondering if there's a better way. If there's not, that's OK of course. I don't have any problem with using apply as above, but what I've seen of data.table so far makes me think there might be a simpler approach.

This question is similar, but more complex.

Thanks for any help you can give.

1 comments

@akrun 2020-05-22 23:19:54

We can specify the 'varnames' in .SDcols, loop over the columns of .SD (the Subset of the Data.table), apply the function to each column, and combine the results with Reduce

DataTableA[DataTableA[, Reduce(`|`, lapply(.SD, function(x) !is.na(x))), .SDcols = varnames]]

Or with rowSums

DataTableA[DataTableA[, rowSums(!is.na(.SD)) > 0, .SDcols = varnames]]
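As a quick sanity check, the sketch below rebuilds the example table from the question and compares both row filters against the explicit version that writes out the column names. The names expected_rows, keep_reduce and keep_rowsums are introduced here for illustration only.

```r
library(data.table)

DataTableA <- data.table(v1 = c(1, 2, NA, 6, 3, NA),
                         v2 = c(NA, 4, NA, NA, 1, 2),
                         v3 = c(3, 3, NA, 4, 2, NA),
                         v4 = c(2, NA, 3, NA, 3, NA),
                         v5 = c(1, NA, NA, NA, 3, 4))
varnames <- c("v2", "v4", "v5")

# Reference: row numbers kept by the explicit filter from the question
expected_rows <- DataTableA[, which(!is.na(v2) | !is.na(v4) | !is.na(v5))]

# Column-wise: one logical vector per column, combined with |
keep_reduce <- DataTableA[, Reduce(`|`, lapply(.SD, function(x) !is.na(x))),
                          .SDcols = varnames]

# Row-wise count of non-missing values among the selected columns
keep_rowsums <- DataTableA[, rowSums(!is.na(.SD)) > 0, .SDcols = varnames]

# Both filters should select the same rows as the explicit version
all(which(keep_reduce) == expected_rows)
all(which(keep_rowsums) == expected_rows)

DataTableA[keep_rowsums]
```

Only row 4 is dropped, since it is the only row where v2, v4 and v5 are all NA.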

@mark999 2020-05-22 23:30:39

Thanks for your answer akrun. I don't see those as being simpler than using apply as in my question, but as a new data.table user it's useful for me to see more examples using .SD.

@Ian Campbell 2020-05-22 23:31:11

rowSums will be significantly faster for large data.
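One rough way to check this claim on your own machine is to time both approaches on a larger table. The sketch below is only illustrative: the table big, its size n and the seed are made up for the example, and the timings will vary by machine and data.table version.

```r
library(data.table)

set.seed(1)   # arbitrary seed, only so the example is reproducible
n <- 1e5      # hypothetical number of rows
big <- data.table(v2 = sample(c(NA, 1), n, replace = TRUE),
                  v4 = sample(c(NA, 1), n, replace = TRUE),
                  v5 = sample(c(NA, 1), n, replace = TRUE))
varnames <- c("v2", "v4", "v5")

# Row-wise: apply() calls any() once per row
t_apply <- system.time(
  keep_apply <- apply(!is.na(big[, ..varnames]), 1, any)
)

# Vectorised: rowSums() processes the whole logical matrix in one call
t_rowsums <- system.time(
  keep_rowsums <- big[, rowSums(!is.na(.SD)) > 0, .SDcols = varnames]
)

# Compare elapsed times; both filters should agree row for row
rbind(apply = t_apply, rowSums = t_rowsums)
all(keep_apply == keep_rowsums)
```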

@mark999 2020-05-22 23:32:22

@IanCampbell Thanks, I hadn't considered the speed.

@akrun 2020-05-22 23:32:51

@mark999 Yes, apply is easier to understand, but looping over rows is less efficient in R than looping over columns. The first solution loops over the columns and combines them with Reduce, so it should be faster than apply. In addition, apply converts its input to a matrix, which has a performance cost and also changes the column types if the columns are of different types.
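The type change can be seen on a small table with mixed column types. This is a minimal sketch; the table mixed and its columns are hypothetical and not part of the question.

```r
library(data.table)

# Hypothetical table with one character and one numeric column
mixed <- data.table(id = c("a", "b"), x = c(1.5, NA))

# apply() converts its input to a matrix, and a matrix holds a single type,
# so the numeric column is converted to character along with the rest
m <- as.matrix(mixed)
typeof(m)   # "character"

# Column-wise code (lapply over .SD) sees each column with its own type
mixed[, lapply(.SD, class)]
```

For the is.na check in the question this makes no difference, because is.na is applied before apply sees the data, but a function that expects numbers in each row would receive strings instead.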

@akrun 2020-05-22 23:33:46

@mark999 Also, if you want only a base R solution with apply, then converting to data.table does not make much sense to me, because the performance improvement with data.table would be much higher when working on big datasets

@mark999 2020-05-22 23:37:15

@akrun Thanks. I didn't just want a base R solution.

@akrun 2020-05-22 23:38:21

@mark999 In your example and code, apply also works, but I am talking about the general case, where it could give you some unexpected bugs.

@mark999 2020-05-23 00:37:28

I might just mention that the thinking behind my question was that because data.table allows, for example, DataTableA[!is.na(v1)] instead of DataTableA[!is.na(DataTableA$v1)], there might be a way of doing it with varnames but without having to write DataTableA again inside the [ ]. But it looks like there isn't, which is fine.

@akrun 2020-05-23 16:40:36

@mark999 .SDcols is one way, and if there is only one column, you can also use get or eval(as.name
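For the single-column case, a sketch of what this might look like is shown below. The variable onevar is a hypothetical name holding one column name, and the completed eval(as.name(...)) call is an assumption about what the comment intends.

```r
library(data.table)

DataTableA <- data.table(v1 = c(1, 2, NA, 6, 3, NA),
                         v2 = c(NA, 4, NA, NA, 1, 2),
                         v3 = c(3, 3, NA, 4, 2, NA),
                         v4 = c(2, NA, 3, NA, 3, NA),
                         v5 = c(1, NA, NA, NA, 3, 4))

onevar <- "v2"   # hypothetical: a single column name stored as a string

# get() looks the column up by name inside the table
by_get <- DataTableA[!is.na(get(onevar))]

# as.name() turns the string into a symbol, which eval() then resolves
by_eval <- DataTableA[!is.na(eval(as.name(onevar)))]

by_get
by_eval
```

Both forms avoid writing DataTableA a second time inside the brackets, but they only cover one column at a time.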
