In Postgres I am used to dealing with NULLs in a boolean column by explicitly saying IS TRUE or IS NOT TRUE. To my great astonishment, the same logic does not hold for data.table.
Specifically, for a column containing NA, putting var_name != TRUE into i does not select the rows where the value is NA.
I consider this to be a bug.
library(data.table)
dt = data.table(
idx=1:10
)
dt[idx<5, exclude_me:=TRUE]
dt[exclude_me!=TRUE]
#> Empty data.table (0 rows and 2 cols): idx,exclude_me
dt[!(exclude_me==TRUE)]
#> Empty data.table (0 rows and 2 cols): idx,exclude_me
dt[(exclude_me==TRUE)]
#> idx exclude_me
#> 1: 1 TRUE
#> 2: 2 TRUE
#> 3: 3 TRUE
#> 4: 4 TRUE
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 ps_1.3.0 digest_0.6.19 R6_2.4.0
[5] reprex_0.3.0 evaluate_0.13 rlang_0.3.4 fs_1.3.1
[9] callr_3.2.0 whisker_0.3-2 rmarkdown_1.11 tools_3.6.0
[13] xfun_0.7 compiler_3.6.0 processx_3.3.1 clipr_0.6.0
[17] htmltools_0.3.6 knitr_1.23
p.s. I installed the latest data.table using:
install.packages("data.table", type = "source", repos = "http://Rdatatable.github.io/data.table")
Let me know if that's not the way to go.
This is behaving as documented. From ?data.table:
integer and logical vectors work the same way they do in [.data.frame except logical NAs are treated as FALSE.
Personally I find this quite convenient as I almost never want to include NA. If you really want to include NA, you can use dt[!exclude_me | is.na(exclude_me)]
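To make the difference concrete, here is a minimal sketch that reuses the table from the report. The variable names (cmp, n_plain, df_rows, kept_explicit, keep, kept_in) are made up for illustration, and the %in% variant is a general base R idiom rather than something suggested in this thread.

```r
library(data.table)

# Same setup as in the report: rows 1-4 get TRUE, rows 5-10 stay NA.
dt <- data.table(idx = 1:10)
dt[idx < 5, exclude_me := TRUE]

# The comparison itself yields NA (not FALSE or TRUE) for the NA rows.
cmp <- dt$exclude_me != TRUE

# data.table treats NA in a logical i as FALSE, so no rows are selected.
n_plain <- nrow(dt[exclude_me != TRUE])

# A plain data.frame instead returns one all-NA row per NA in the index.
df <- as.data.frame(dt)
df_rows <- df[df$exclude_me != TRUE, ]

# Handling NA explicitly keeps the rows where the flag is FALSE or missing.
kept_explicit <- dt[!exclude_me | is.na(exclude_me)]

# Alternative (an assumption, not from the thread): %in% never returns NA,
# so negating it behaves like Postgres' IS NOT TRUE.
keep <- !(dt$exclude_me %in% TRUE)
kept_in <- dt[keep]
```

Both kept_explicit and kept_in should contain the rows with idx 5 to 10, while the data.frame subset contains six rows filled with NA rather than the original rows.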
AFAIR there is also a FAQ entry about that.
@Demetrio92 Related: https://github.com/Rdatatable/data.table/issues/368
@franknarf1
Thanks for digging this up! That helped me understand why dt behaves this way.