Data.table: `WHERE var IS NOT TRUE` does not work as expected

Created on 17 Jun 2019 · 4 Comments · Source: Rdatatable/data.table

In Postgres I am used to dealing with NULLs in a boolean column by explicitly writing IS TRUE or IS NOT TRUE. To my great astonishment, the same logic does not hold in data.table.
Specifically, for a column containing NA, putting var_name != TRUE into i does not select the rows with NA.
I consider this to be a bug.

Reproducible example

library(data.table)
dt = data.table(
  idx=1:10
)
dt[idx<5, exclude_me:=TRUE]
dt[exclude_me!=TRUE]
#> Empty data.table (0 rows and 2 cols): idx,exclude_me
dt[!(exclude_me==TRUE)]
#> Empty data.table (0 rows and 2 cols): idx,exclude_me
dt[(exclude_me==TRUE)]
#>    idx exclude_me
#> 1:   1       TRUE
#> 2:   2       TRUE
#> 3:   3       TRUE
#> 4:   4       TRUE

Output of sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1      ps_1.3.0        digest_0.6.19   R6_2.4.0       
 [5] reprex_0.3.0    evaluate_0.13   rlang_0.3.4     fs_1.3.1       
 [9] callr_3.2.0     whisker_0.3-2   rmarkdown_1.11  tools_3.6.0    
[13] xfun_0.7        compiler_3.6.0  processx_3.3.1  clipr_0.6.0    
[17] htmltools_0.3.6 knitr_1.23     

P.S. I installed the latest data.table using:
install.packages("data.table", type = "source", repos = "http://Rdatatable.github.io/data.table")
Let me know if that's not the way to go.

All 4 comments

This is behaving as documented, from ?data.table

integer and logical vectors work the same way they do in [.data.frame except logical NAs are treated as FALSE.

Personally I find this quite convenient as I almost never want to include NA. If you really want to include NA, you can use dt[!exclude_me | is.na(exclude_me)]
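
To make the difference concrete, here is a minimal sketch that reuses the table from the original report. The result names (res_plain, res_or_na, res_not_true) are made up for illustration, and the `%in%` variant is only one possible way to spell "IS NOT TRUE", not something the documentation prescribes.

```r
library(data.table)

# Same setup as in the report: rows 5..10 end up with exclude_me = NA.
dt <- data.table(idx = 1:10)
dt[idx < 5, exclude_me := TRUE]

# NA != TRUE evaluates to NA, and data.table treats NA in i as FALSE,
# so no rows are selected.
res_plain <- dt[exclude_me != TRUE]

# Workaround from the comment above: keep the NA rows explicitly.
res_or_na <- dt[!exclude_me | is.na(exclude_me)]

# Alternative sketch: %in% never returns NA, so negating it behaves
# like SQL's IS NOT TRUE (rows that are FALSE or NA).
res_not_true <- dt[!(exclude_me %in% TRUE)]
```

Both workarounds should return the rows with idx 5 to 10, whereas the plain `!=` comparison returns an empty table, as shown in the report.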

AFAIR there is also an FAQ entry about that.

@franknarf1
Thanks for digging this up! That helped me understand why data.table behaves this way.
