Data.table: as.IDate error that can be fixed

Created on 18 Aug 2020  Â·  12Comments  Â·  Source: Rdatatable/data.table

require(data.table) # 1.13.1 from today
x <- c("2020-01-01", "")
y <- c("", "2020-01-01")
as.IDate(x) # works fine
as.IDate(y)
# Error in charToDate(x) : 
#   character string is not in a standard unambiguous format
Low bug idatitime

Most helpful comment

I have submitted a bug to R:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17909

I've fixed the bug in R now (svn rev 79119 for 'R-devel' -- to be ported to "R 4.0.2 patched" in a few days ("almost surely").

All 12 comments

Thanks @arunsrinivasan , this is inherited from Date, so to get around it, we'd need to build our own parser basically:

require(data.table) # 1.13.1 from today
x <- c("2020-01-01", "")
y <- c("", "2020-01-01")
as.Date(x) # works fine
as.Date(y) # same issue

I am aware that the issue is in as.Date. I forgot to mention it in the original post. I was thinking the fix could be as simple as replacing all "" with NA before passing it on to as.Date() method.

require(data.table) # 1.13.1 from today
x <- c("2020-01-01", "")
y <- c("", "2020-01-01")
y[y %chin% ""] <- NA_character_
as.IDate(x)
# [1] "2020-01-01" NA
as.IDate(y)
# [1] NA           "2020-01-01"

isn't this something that should be fixed in base, though? that was my
point in attributing the issue to base

On Fri, Aug 21, 2020, 1:59 PM Arun Srinivasan notifications@github.com
wrote:

I am aware that the issue is in as.Date. I forgot to mention it in the
original post. I was thinking the fix could be as simple as replacing all
"" with NA before passing it on to as.Date() method.

require(data.table) # 1.13.1 from todayx <- c("2020-01-01", "")y <- c("", "2020-01-01")y[y %chin% ""] <- NA_character_
as.IDate(x)# [1] "2020-01-01" NA
as.IDate(y)# [1] NA "2020-01-01"

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/Rdatatable/data.table/issues/4676#issuecomment-678416631,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AB2BA5OWDKRUVCIQDXK2WVLSB2YZ3ANCNFSM4QDBXBKQ
.

That was my feeling too when I saw Arun's very clever "fudge".

Drop in %in% instead and you have a fix. Now, will "they" take it for base R? Nobody knows...

Why not? But I won't be using the recent version of R anytime soon for work. But I would be using a more recent data.table version sooner. So I'd suggest a/this workaround fix until R fixes this.

BTW if the fix were to go into base R, they just need to add "" to the charToDate internal function in as.Date.character. Here's the relevant part:

# this logic just needs to account for `""`: is.na(xx) ==> xx %in% c(NA_character_, "")
charToDate <- function(x) {
    xx <- x[1L]
    if (is.na(xx)) {
        j <- 1L
        while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
        if (is.na(xx))
            f <- "%Y-%m-%d"
    }
    ....

In any case we should raise the issue on Bugzilla. If base would change, we
need to be consistent with that. If they won't change, we can make our own
fix.

On Fri, Aug 21, 2020, 3:37 PM Arun Srinivasan notifications@github.com
wrote:

BTW if the fix were to go into base R, they just need to add "" to the
charToDate internal function in as.Date.character. Here's the relevant
part:

this logic just needs to account for ""charToDate <- function(x) {

    xx <- x[1L]
    if (is.na(xx)) {
        j <- 1L
        while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
        if (is.na(xx))
            f <- "%Y-%m-%d"
    }

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/Rdatatable/data.table/issues/4676#issuecomment-678458295,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AB2BA5JT6BU4FEGAYMVSVDLSB3EIPANCNFSM4QDBXBKQ
.

I disagree. We should issue a fix while filing an issue there. Like I said, it takes time to get things into base R and people don't always use the most recent version of R, if you use in production. I'm all up for filing an issue there, but I'd like the issue taken care of in data.table nonetheless.

I agree with you about it being easier/more common to update data.table than R, and agree we should make a fix in data.table no matter what, I am just pushing back on doing so immediately.

What I'm trying to prevent is setting ourselves up for a breaking change in a later release:

T0: we submit an issue to Bugzilla & merge our own solution in data.table
T1: release this version of data.table
T2: R disagrees with our solution, but _does_ fix the issue in a different, incompatible way
T3: Because we inherit from/extend Date, we have to patch our original fix [breaking change]

So I think we should at least file the issue first & see if R core has any opinion. If not we just merge & move on.

I have submitted a bug to R:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17909

Note the simpler fix:

is.na(x) <- !nzchar(x)

T2: R disagrees with our solution, but does fix the issue in a different, incompatible way

Sorry but I'm having a hard time making sense of this. There are only two possible scenarios I can think of.

  1. as.Date(x) is the same as as.Date(x, format="%Y-%m-%d"), in which case it doesn't matter how you get there, at least temporarily until base R accepts your PR or comes up with another.

  2. They keep the error, in which case, we all agree data.table could and should handle this.

It has to be equal to the output of as.Date(x, format=) in a non-error scenario. What other incompatible ways are there?

I have submitted a bug to R:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17909

I've fixed the bug in R now (svn rev 79119 for 'R-devel' -- to be ported to "R 4.0.2 patched" in a few days ("almost surely").

Was this page helpful?
0 / 5 - 0 ratings