Hello,
I have a consistent issue with na_if() returning an error when my data frame includes a date column, even when that column does not contain any values to be replaced by NA. If I exclude the date column I can use na_if() as expected. I tried the now deprecated janitor::convert_to_NA() function by @sfirke and that does work when there is a date present. I am using dplyr_0.7.4.9000.
library(dplyr)
library(tibble)
test <- tibble(
  a = lubridate::today() + runif(5) * 30,
  b = c(1:4, ""),
  c = c(runif(4), ""),
  d = c(sample(letters, 4, replace = TRUE), "")
)
test
#> # A tibble: 5 x 4
#> a b c d
#> <date> <chr> <chr> <chr>
#> 1 2018-02-22 1 0.0842239991761744 g
#> 2 2018-03-06 2 0.470980274491012 e
#> 3 2018-02-17 3 0.515603368869051 c
#> 4 2018-02-12 4 0.703944058623165 z
#> 5 2018-03-08 "" "" ""
test %>% na_if("")
#> Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format
test %>% select(-a) %>% na_if("")
#> # A tibble: 5 x 3
#> b c d
#> <chr> <chr> <chr>
#> 1 1 0.0842239991761744 g
#> 2 2 0.470980274491012 e
#> 3 3 0.515603368869051 c
#> 4 4 0.703944058623165 z
#> 5 <NA> <NA> <NA>
library(janitor)
test %>% convert_to_NA("")
#> Warning: 'convert_to_NA' is deprecated.
#> Use 'dplyr::na_if()' instead.
#> See help("Deprecated")
#> # A tibble: 5 x 4
#> a b c d
#> <date> <chr> <chr> <chr>
#> 1 2018-02-22 1 0.0842239991761744 g
#> 2 2018-03-06 2 0.470980274491012 e
#> 3 2018-02-17 3 0.515603368869051 c
#> 4 2018-02-12 4 0.703944058623165 z
#> 5 2018-03-08 <NA> <NA> <NA>
Thanks. na_if() is meant to be used with vectors, not entire data frames (e.g. via mutate_all(funs(na_if(., "")))), but even then I see:
dplyr::na_if(Sys.Date(), "")
#> Error in charToDate(x): character string is not in a standard unambiguous format
Created on 2018-02-28 by the reprex package (v0.2.0).
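The charToDate() frame in the error suggests that the empty string is being converted to a Date before the comparison is made. A minimal base R sketch of that comparison, without dplyr, is below. It assumes that comparing a Date with a character value coerces the character side via as.Date(); whether the empty string then raises an error or yields NA depends on the R version, so the result is captured rather than assumed.

```r
# Sketch: compare a Date with an empty string and capture any error as text.
# Depending on the R version this is either an error message or NA.
x <- Sys.Date()
cmp <- tryCatch(
  x == "",
  error = function(e) conditionMessage(e)
)
print(cmp)

# A string that does parse as a date is coerced and compared as usual.
same_day <- x == format(x)
print(same_day)
```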
By failing early we err on the side of caution/safety; other tools may be more user-friendly. @sfirke: Maybe you'd like to support convert_to_NA(), given that na_if() is not a perfect replacement at this time?
We can still achieve the desired result with na_if():
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
test <- tibble(
  a = lubridate::today() + runif(5) * 30,
  b = c(1:4, ""),
  c = c(runif(4), ""),
  d = c(sample(letters, 4, replace = TRUE), "")
)
test %>% mutate_if(is.character, funs(na_if(., "")))
#> # A tibble: 5 x 4
#> a b c d
#> <date> <chr> <chr> <chr>
#> 1 2018-03-10 1 0.51049040001817 z
#> 2 2018-03-06 2 0.627419021213427 c
#> 3 2018-03-13 3 0.203217827249318 f
#> 4 2018-03-27 4 0.0666704142931849 i
#> 5 2018-03-28 <NA> <NA> <NA>
Created on 2018-02-28 by the reprex package (v0.2.0).
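For anyone curious what the mutate_if(is.character, ...) call amounts to, here is a rough base R sketch of the same idea: only character columns are touched, so the Date column is never compared with "". The helper name blank_to_na and the example data are made up for illustration and are not part of dplyr or janitor.

```r
# Hypothetical helper (not from any package): replace "" with NA,
# but only in character columns, leaving Date and other columns untouched.
blank_to_na <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    # which() drops any existing NAs from the index before assignment
    col[which(col == "")] <- NA_character_
    col
  })
  df
}

# Illustrative data with a Date column and two character columns
example <- data.frame(
  when = as.Date("2018-02-28") + 0:2,
  b = c("1", "2", ""),
  d = c("g", "", "z"),
  stringsAsFactors = FALSE
)

cleaned <- blank_to_na(example)
print(cleaned)
```

This is only meant to show the column-wise logic; the mutate_if() version above is the more idiomatic choice inside a dplyr pipeline.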
Nice timing: naniar 0.2.0 is on CRAN now and there is a whole vignette dedicated to the topic of making values into NA: http://naniar.njtierney.com/articles/replace-with-na.html. naniar has a bunch of variants, including ones that run on an entire data.frame. Granted, it's another package to load, but I don't think there will be a tool more specifically suited to this task than replace_with_na_all.
Thank you both, this totally clears the issue up for me. On my own, I had been using na_if only with vectors (usually within a mutate_if), but saw several presentations (including at Studio conf) where it was used on a data frame. I was puzzled because I always got an error doing this with my own data, and realized it was because my data frames always had date variables present.
I am developing a course for DataCamp using these tools and want to teach the tools the way they are intended to be used, so I appreciate knowing that na_if is intended for vectors. And thank you @sfirke for the reference to naniar; I will include this as a tool too!
@krlmlr in the dplyr reference doc for na_if, would a PR to include an example usage of na_if within a mutate_if (as in your example) be welcome?
Thanks, a PR would be nice! Maybe add the same example to both na_if() and mutate_if()?
@apreshill any update on this PR?
Thank you for the nudge! Will do. To be sure, should I submit the PR with the changes here?
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/