Lubridate: fast_strptime POSIX default incompatible with dplyr

Created on 18 Apr 2016  路  14Comments  路  Source: tidyverse/lubridate

Users applying lubridate as well as dplyr ran into troubles after fast_strptime default output class was change from POSIXct to POSIXlt, as highly useful functions like left_join or bind_rows do no longer work, as dplyr cannot handle POSIXlt. A global option to change fast_strptime's lt option would be highly appreciated! Thanks, SVEN.

Most helpful comment

I basically need to check what version of lubridate

You can use parse_date_time2 or explicitly convert as.POSIXct(fast_strptime(...)) to avoid checking for the version.

Sorry for the trouble. This lt/ct change (here and in ymd) turned to be a much bigger headache than anyone anticipated, but it's the only right decision.

All 14 comments

POSIXlt is for compatibility with strptime and is faster because of now internal conversion.

You can pass lt=F to fast_strptime or just use parse_date_time2 which returns POSIXct by default and works with strptfime style formats just fine.

If any of the above are "inconvenient", you can overwrite the function with your own in your .Rprofile. I am quite reluctant to add a global option for these high performance functions.

Thanks very much for your reply!
I was misled in assuming that "Hadley" package compatibility had priority.
If fact, makes me wonder why dplyr functions do not convert lt to ct if
encountered, as other coercions are made without much ado, like factor to
character if levels differ...

POSIXlt is for compatibility with strptime and is faster because of now
internal conversion.

You can pass lt=F to fast_strptime or just use parse_date_time2 which
returns POSIXct by default and works with strptfime style formats just fine.

Sure. Did it.

If any of the above are "inconvenient", you can overwrite the function
with your own in your .Rprofile. I am quite reluctant to add a global
option for these high performance functions.

As sort of "R scriptor" at the department, that is of course not an option:)
Thanks again,
SVEN.

Dr. Sven Krackow
Univ. Z眉rich, Anatomie

If fact, makes me wonder why dplyr functions do not convert lt to ct if encountered, as other coercions are made without much ado, like factor to character if levels differ..

Good question. I think you should bring it to @hadley's attention in a separate dplyr thread.

I strongly believe that no function should generate POSIXlt.

stptime and fast_strptime do it for efficiency reasons. It avoids the overhead of double conversion (to POSIXct internally, and to POSIXlt by the final consumer). The consumer might want to use only some components of POSIXlt (year, month etc). In any case, whatever is the source of POSIXlt, I think dplyr should handle it gracefully.

I intended to open a new issue but I see this very recent discussion in relation with POSIXlt and POSIXct ant changement in behaviour between 1.5.0 and 1.5.6
In fact, now that default behaviour as changed for dmy functions (in #391), functions that update date-time components generate a POSIXlt and not a POSIXct, when updating a Date. Code that worked with dplyr before are no more working since POSIXct became POSIXlt.

For example

library(lubridate)
x <- dmy("21-02-2010")
class(x)
class(update(x, hour = 20))
hour(x) <- 20
class(x)
  • with 1.5.0
library(lubridate)
x <- dmy("21-02-2010")
class(x)
#> [1] "POSIXct" "POSIXt" 
class(update(x, hour = 20))
#> [1] "POSIXct" "POSIXt" 
hour(x) <- 20
class(x)
#> [1] "POSIXct" "POSIXt" 
  • with 1.5.6
library(lubridate)
x <- dmy("21-02-2010")
class(x)
#> [1] "Date"
class(update(x, hour = 20))
#> [1] "POSIXlt" "POSIXt"
hour(x) <- 20
class(x)
#> [1] "POSIXlt" "POSIXt"

Because of this change, a use case to create datetime from 2 column of a data.frame is not working anymore with dplyr.

  • with 1.5.6
library(dplyr)
library(lubridate) # 1.5.6
DF <- data_frame(date = c("21-10-2012", "10-07-2015"), hour = c("5","20"))
DF %>%
  mutate(date = dmy(date)) %>%
  mutate(dtime = update(date, hour = hour))
#> Error: `mutate` does not support `POSIXlt` results
  • whereas with 1.5.0
library(dplyr)
library(lubridate)

DF <- data_frame(date = c("21-10-2012", "10-07-2015"), hour = c("5","20"))

DF %>%
  mutate(date = dmy(date)) %>%
  mutate(dtime = update(date, hour = hour))
#> Source: local data frame [2 x 3]
#> 
#>         date  hour               dtime
#>       (time) (chr)              (time)
#> 1 2012-10-21     5 2012-10-21 05:00:00
#> 2 2015-07-10    20 2015-07-10 20:00:00

If I change all my code, I could find the old behaviour with tz different to NULL

library(dplyr)
library(lubridate)
DF <- data_frame(date = c("21-10-2012", "10-07-2015"), hour = c("5","20"))
DF %>%
  mutate(date = dmy(date, tz = "UTC")) %>%
  mutate(dtime = update(date, hour = hour))
#> Source: local data frame [2 x 3]
#> 
#>         date  hour               dtime
#>       (time) (chr)              (time)
#> 1 2012-10-21     5 2012-10-21 05:00:00
#> 2 2015-07-10    20 2015-07-10 20:00:00 

I do not know if it should be considered as a problem or not. This issues between POSIXct, POSIXlt and Date sure break some previous code and cause some incompatibility between _lubridate_ and _dplyr_ - at least some checks to be sure to use POSIXct and not POSIXlt.

I believe _very strongly_ that the POSIXlt data structure is an internal implementation detail, and should not be exposed to the user. The data structure (a list of length 9 pretending to be a vector of length n) is fragile, and breaks in a wide variety of cases. Adding dplyr support for POSIXlt is not on the cards.

I would rather we look towards a hybrid data structure, i.e. something that is POSIXct at the core, but uses attributes to store parsed date time components. Then lubridate could use those attributes if available, or could otherwise compute from scratch.

Adding dplyr support for POSIXlt is not on the cards.

Cannot you simply convert it to POSIXct on assignment?

I would rather we look towards a hybrid data structure, i.e. something that is POSIXct at the core, but uses attributes to store parsed date time components.

Yet another data structure for date-times? Sound like a complication.

Lubridate should produce lt objects only when requested or input is already an lt. The fact that update(date, hour = hour) produce an lt object is a concern. We can change that, but I wonder if there was a special reason to update date into lt in the first place.

I'd rather people never produce POSIXlt in the first place, rather than trying to patch things up after the fact.

Lubridate should produce lt objects only when requested or input is already an lt. The fact that update(date, hour = hour) produce an lt object is a concern.

update.Date is define to convert object argument to POSIXlt, then it calls update.POSIXt with a POSIXlt object, and this method give a results in the same class so POSIXlt.

update.Date <- function(object, ...){
  lt <- as.POSIXlt(object, tz = "UTC")
  new <- update(lt, ...)
  if (sum(c(new$hour, new$min, new$sec), na.rm = TRUE)) {
    new
  } else {
    as.Date(new)
  }
}

as.POSIXct could be added in the result to insure the class

update.Date <- function(object, ...){
  lt <- as.POSIXlt(object, tz = "UTC")
  new <- update(lt, ...)
  if (sum(c(new$hour, new$min, new$sec), na.rm = TRUE)) {
    as.POSIXct(new)
  } else {
    as.Date(new)
  }
}

We can change that, but I wonder if there was a special reason to update date into lt in the first place.

I wonder it too know but it is a convenient way to create datetime from date and times columns. Other ways is to use truncated arguments of lubridate's functions or to concatenate in one strings date and times to read them with dmy_hms functions (and others). I find update pretty convenient and as it allows to update hour, minutes and seconds from a Date object, I thought it could be use that way.

I have changed the update on Date to return POSIXct. Surprisingly, that didn't break any tests.

As already said, changing default output of fast_strptime is out of question for consistency and efficiency reasons. Use lt=F, or parse_date_time[2] or high level ymd[_hms] family.

Other ways is to use truncated arguments

BTW, you can still use ymd to produce POSIXct instead of Date by supplying tz argumetn, like ymd(.., tz="UTC"). This should be rarely needed though.

I just want to add that the change in behavior for fast_strptime has cause much headache for me (where older versions of lubridate defaulted to POSIXct, and newer versions default to POSIXlt). If I add in the argument lt=TRUE, users with older versions of lubridate get the error "unused argument (lt = FALSE)". Even if I use:

lubridate (>= 1.5.0),

in the DESCRIPTION, some user only update specific packages (why? I have no idea, it's so easy to just update all....), and R doesn't automatically update versions of dependent packages as far as I can tell. I basically need to check what version of lubridate the user has within my package's function, then call fast_strptime with or without the argument.

Also, for what it's worth, that is still vastly superior than using base R!

I basically need to check what version of lubridate

You can use parse_date_time2 or explicitly convert as.POSIXct(fast_strptime(...)) to avoid checking for the version.

Sorry for the trouble. This lt/ct change (here and in ymd) turned to be a much bigger headache than anyone anticipated, but it's the only right decision.

Was this page helpful?
0 / 5 - 0 ratings