Dplyr: na_if() fails if date column present in data frame

Created on 8 Feb 2018 · 7 Comments · Source: tidyverse/dplyr

Hello,

I have a consistent issue with na_if() returning an error when my data frame includes a date column, even when that column does not contain any values to be replaced by NA. If I exclude the date column I can use na_if() as expected. I tried the now deprecated janitor::convert_to_NA() function by @sfirke and that does work when there is a date present. I am using dplyr_0.7.4.9000.


library(dplyr)
library(tibble)
test <- tibble(
  a = lubridate::today() + runif(5) * 30,
  b = c(1:4, ""),
  c = c(runif(4), ""),
  d = c(sample(letters, 4, replace = TRUE), "")
)

test
#> # A tibble: 5 x 4
#>   a          b     c                  d    
#>   <date>     <chr> <chr>              <chr>
#> 1 2018-02-22 1     0.0842239991761744 g    
#> 2 2018-03-06 2     0.470980274491012  e    
#> 3 2018-02-17 3     0.515603368869051  c    
#> 4 2018-02-12 4     0.703944058623165  z    
#> 5 2018-03-08 ""    ""                 ""

test %>% na_if("")
#> Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format

test %>% select(-a) %>% na_if("")
#> # A tibble: 5 x 3
#>   b     c                  d    
#>   <chr> <chr>              <chr>
#> 1 1     0.0842239991761744 g    
#> 2 2     0.470980274491012  e    
#> 3 3     0.515603368869051  c    
#> 4 4     0.703944058623165  z    
#> 5 <NA>  <NA>               <NA>

library(janitor)
test %>% convert_to_NA("")
#> Warning: 'convert_to_NA' is deprecated.
#> Use 'dplyr::na_if()' instead.
#> See help("Deprecated")
#> # A tibble: 5 x 4
#>   a          b     c                  d    
#>   <date>     <chr> <chr>              <chr>
#> 1 2018-02-22 1     0.0842239991761744 g    
#> 2 2018-03-06 2     0.470980274491012  e    
#> 3 2018-02-17 3     0.515603368869051  c    
#> 4 2018-02-12 4     0.703944058623165  z    
#> 5 2018-03-08 <NA>  <NA>               <NA>

All 7 comments

Thanks. na_if() is meant to be used with vectors, not entire data frames (e.g. via mutate_all(funs(na_if(., "")))), but even then I see:

dplyr::na_if(Sys.Date(), "")
#> Error in charToDate(x): character string is not in a standard unambiguous format

Created on 2018-02-28 by the reprex package (v0.2.0).

By failing early we err on the side of caution; other tools may be more user-friendly. @sfirke: Maybe you'd like to keep supporting convert_to_NA(), given that na_if() is not a perfect replacement at this time?
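
The charToDate() frame in the error above suggests that the empty string is being coerced to a date before the comparison happens. Below is a minimal base-R sketch of that reading. It assumes na_if() compares x and y with == internally, which is an assumption and is not confirmed in this thread. Whether the comparison errors may depend on the R version, so the call is wrapped in tryCatch():

```r
# Assumption: na_if(x, y) boils down to something like `x[x == y] <- NA`.
# Comparing a Date with a character value makes R try to parse the string
# as a date; "" is not a date, which is where the failure would come from.
res <- tryCatch(
  Sys.Date() == "",
  error = function(e) conditionMessage(e)
)

# Either the comparison result or the captured error message.
print(res)
```

If this reading is right, the same coercion is attempted for every Date column when a whole data frame is compared against "", which would explain why dropping column a avoids the error.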

We still can achieve the desired result with na_if():

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
test <- tibble(
  a = lubridate::today() + runif(5) * 30,
  b = c(1:4, ""),
  c = c(runif(4), ""),
  d = c(sample(letters, 4, replace = TRUE), "")
)
test %>% mutate_if(is.character, funs(na_if(., "")))
#> # A tibble: 5 x 4
#>   a          b     c                  d    
#>   <date>     <chr> <chr>              <chr>
#> 1 2018-03-10 1     0.51049040001817   z    
#> 2 2018-03-06 2     0.627419021213427  c    
#> 3 2018-03-13 3     0.203217827249318  f    
#> 4 2018-03-27 4     0.0666704142931849 i    
#> 5 2018-03-28 <NA>  <NA>               <NA>

Created on 2018-02-28 by the reprex package (v0.2.0).
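
For readers on a newer dplyr, the same idea can be sketched with across(). This assumes dplyr >= 1.0.0, which is later than the version discussed in this thread. The data frame df is a small hypothetical stand-in for test, with fixed values so the result is reproducible:

```r
library(dplyr)

# Hypothetical data in the same shape as `test`:
# one Date column plus character columns containing "".
df <- tibble(
  a = as.Date("2018-03-01") + 0:2,
  b = c("1", "2", ""),
  d = c("x", "", "z")
)

# Assumes dplyr >= 1.0.0. Only character columns are passed to na_if(),
# so the Date column is never compared against "".
cleaned <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

cleaned
```

As with mutate_if(), restricting na_if() to character columns keeps the comparison type-safe instead of relying on coercion.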

Nice timing: naniar 0.2.0 is on CRAN now and there is a whole vignette dedicated to the topic of making values into NA. It has a bunch of variants including running on an entire data.frame. Granted, it's another package to load, but I don't think there will be a tool more specifically suited to this task than replace_with_na_all.

http://naniar.njtierney.com/articles/replace-with-na.html
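
A hedged sketch of what this might look like with naniar on a data frame that has a date column. It uses replace_with_na_if(), the predicate-scoped sibling of replace_with_na_all(), on the assumption that comparing a Date column to "" could hit the same coercion error. The data frame df is hypothetical, and the call follows the pattern in the naniar vignette rather than anything tested in this thread:

```r
library(naniar)

# Hypothetical data: one Date column plus character columns containing "".
df <- data.frame(
  a = as.Date("2018-03-01") + 0:2,
  b = c("1", "2", ""),
  d = c("x", "", "z"),
  stringsAsFactors = FALSE
)

# Only columns satisfying the predicate are checked against the condition,
# so the Date column is left untouched.
cleaned <- replace_with_na_if(
  df,
  .predicate = is.character,
  condition = ~ .x == ""
)

cleaned
```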

Thank you both, this totally clears the issue up for me. On my own, I had been using na_if only with vectors (usually within a mutate_if), but saw several presentations (including at Studio conf) where it was used on a data frame. I was puzzled because I always got an error doing this with my own data, and realized it was because my data frames always had date variables present.

I am developing a course for DataCamp using these tools and want to teach the tools the way they are intended to be used, so I appreciate knowing that na_if is intended for vectors. And thank you @sfirke for the reference to naniar. I will include this as a tool too!

@krlmlr in the dplyr reference doc for na_if, would a PR to include an example usage of na_if within a mutate_if (as in your example) be welcome?

Thanks, a PR would be nice! Maybe add the same example to both na_if() and mutate_if()?

@apreshill any update on this PR?

Thank you for the nudge! Will do: to be sure, submit PR with changes here?

R/na_if.R

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
