Data.table: na.omit(iris, invert=TRUE) gives unexpected result

Created on 7 Mar 2018 · 5 comments · Source: Rdatatable/data.table

na.omit does not seem to work as expected when invert=TRUE.

Here is an example:

library(data.table)
#data.table 1.10.5 IN DEVELOPMENT built 2018-03-02 08:25:10 UTC; travis
#The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
#Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
#Release notes, videos and slides: http://r-datatable.com
iris <- data.table(iris)
na.omit(iris, invert=TRUE)

Output

     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica
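For reference, data.table documents invert=TRUE as returning only the rows that contain missing values, i.e. the rows na.omit would otherwise drop. A minimal base-R sketch of that expected result, using complete.cases() on the built-in iris data.frame (the helper name rows_with_na and the injected NA positions are hypothetical, for illustration only):

```r
# Hypothetical helper: rows that na.omit(..., invert = TRUE) should return,
# i.e. rows with at least one NA. complete.cases() is TRUE for rows with no NA.
rows_with_na <- function(df) df[!complete.cases(df), , drop = FALSE]

# iris has no missing values, so the expected result has 0 rows
nrow(rows_with_na(iris))     # 0

# Same check on a copy with two NAs injected (arbitrary positions)
iris_na <- iris
iris_na[c(2, 5), "Sepal.Width"] <- NA
nrow(rows_with_na(iris_na))  # 2
```

Since iris has no NAs, all 150 rows being returned is the opposite of what invert=TRUE should give.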

Session Info

sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.5

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1    yaml_2.1.14   

Comments

It's not clear what behavior you expect. In any case, the result is identical for data.table and base R data.frames, so I don't think this is related to data.table; closing pending further details.

all.equal(na.omit(setDT(copy(iris)), invert=TRUE), 
          setDT(na.omit(setDF(copy(iris)), invert = TRUE)))
# [1] TRUE

Thanks for the quick response, and apologies for being unclear. The behavior I expected was an empty data.table, which is what the CRAN version of data.table (1.10.4.3) returns:

Example

library(data.table)
#data.table 1.10.4.3
#The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
#Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
#Release notes, videos and slides: http://r-datatable.com
iris <- data.table(iris)
na.omit(iris, invert=TRUE)

Output

Empty data.table (0 rows) of 5 cols: Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
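The expected behavior can be written as a small regression check. This is a sketch assuming a data.table version that includes the fix discussed in this thread (on the 1.10.5 development build reported above, the first stopifnot() would fail); the object names dt and dt_na are illustrative only:

```r
library(data.table)

dt <- as.data.table(iris)              # no missing values
res <- na.omit(dt, invert = TRUE)

# invert = TRUE keeps only rows containing an NA: none here,
# but the column structure should be preserved
stopifnot(nrow(res) == 0L, identical(names(res), names(dt)))

# With NAs present, invert = TRUE should return exactly those rows
dt_na <- copy(dt)
dt_na[c(2L, 5L), Sepal.Width := NA]
stopifnot(nrow(na.omit(dt_na, invert = TRUE)) == 2L)
```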

This appears to have been the behavior for quite a few versions.

Is this going to be the behavior going forward? If so, it seems worth a NEWS.md entry, since it definitely caught me by surprise.

After some digging, it seems like this behavior changed in the following commit:

https://github.com/Rdatatable/data.table/commit/1fd38629ec81af80c2ff57e475ff2e7f2c55f844

In the commit immediately before it (below), the result matched the CRAN version.

https://github.com/Rdatatable/data.table/commit/e871a4ffbbe3e67cdcf6912c7b24d165cd9ec6ab

Both commits were made on January 12th.

Oh, I see: na.omit.data.frame has no invert argument, so a difference vis-à-vis base R is expected.
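This can be checked from the method's signature. A base-R-only sketch (the injected NA positions are arbitrary) showing that na.omit.data.frame accepts only object and `...`, so an invert argument passed to it is silently absorbed by `...` and has no effect:

```r
# Formal arguments of base R's data.frame method for na.omit (package stats)
base_args <- names(formals(getS3method("na.omit", "data.frame")))
print(base_args)  # "object" "..."

# invert is not an error: it is swallowed by `...` and ignored,
# so rows with NA are still dropped rather than kept
df <- iris
df[c(3, 7), "Petal.Width"] <- NA
res <- na.omit(df, invert = TRUE)
nrow(res)  # 148
```

This is also why the all.equal() comparison above returns TRUE on the development build: both calls return every row of iris.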

I identified the issue and filed PR #2661; it should be merged soon. Thanks.
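The thread does not spell out what PR #2661 changed. One plausible cause consistent with the symptom (all rows come back only when nothing is missing) is an early "no NAs, return the input unchanged" shortcut that does not check invert. The following is a hypothetical base-R sketch of that failure mode, not data.table's actual source; na_filter and its buggy flag are invented for illustration:

```r
# Hypothetical NA filter with an `invert` switch. `buggy = TRUE` enables a
# shortcut that returns the input whenever there are no NAs, ignoring invert.
na_filter <- function(df, invert = FALSE, buggy = FALSE) {
  has_na <- !complete.cases(df)          # TRUE for rows containing any NA
  if (buggy && !any(has_na)) return(df)  # wrong when invert = TRUE
  keep <- if (invert) has_na else !has_na
  df[keep, , drop = FALSE]
}

nrow(na_filter(iris, invert = TRUE, buggy = TRUE))  # 150: the reported symptom
nrow(na_filter(iris, invert = TRUE))                # 0: the expected result
nrow(na_filter(iris))                               # 150: default unaffected
```

Without the shortcut, the no-NA case falls through to the general subsetting logic, which already handles invert correctly.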

Thanks @MichaelChirico
