Hi Hadley,
Any ideas on why it's not reliably possible to map ISO 8601 week numbers to month-of-year numbers? It's also not possible when using the US or UK convention. I tried to outline the problem in this Stack Overflow post. As so often with this sort of thing, I suspect Windows is the problem here (possibly combined with a German locale).
Is there some hidden kung-fu in lubridate to hack around this nasty problem? If not, would you kindly consider addressing this issue in one of your next releases?
Best regards,
Janko
This issue is not well defined. Could you please state precisely the functionality that you would like to see in lubridate with concrete code examples?
Hi @vspinu and sorry if I wasn't precise enough.
I'm looking for a way to map a particular week of the year to the correct month of the year and have drafted an example function weeksToMonth below:
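weeksToMonth <- function(x, frmt = "%Y-%V-%u", day_of_week = 1) {
  # Append a day of the week so that each "YYYY-<week>" string names a single date
  y <- sprintf("%s-%s", x, day_of_week)
  # Parse with the given format, then keep only year and month
  format(as.POSIXct(y, format = frmt), "%Y-%m")
}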
I'm aware that in order to do this at all, an algorithm needs the "additional workaround information" of the particular day of the week, as otherwise things are completely ambiguous (see the day_of_week argument above).
Reason: in order to produce inputs of format YYYY-<weeknumber>
(posix <- c(as.POSIXct(c("2015-12-24", "2015-12-31")),
seq(as.POSIXct("2016-01-01"), length.out = 6, by = "1 week")))
# [1] "2015-12-24 CET" "2015-12-31 CET" "2016-01-01 CET" "2016-01-08 CET"
# [5] "2016-01-15 CET" "2016-01-22 CET" "2016-01-29 CET" "2016-02-05 CET"
weeksToMonth(format(posix, "%Y-%V"))
# [1] "2015-01" "2015-01" "2016-01" "2016-01" "2016-01" "2016-01" "2016-01" "2016-01"
weeksToMonth(format(posix, "%Y-%U"), frmt = "%Y-%U-%u")
# [1] "2015-12" "2015-12" NA "2016-01" "2016-01" "2016-01" "2016-01" "2016-02"
weeksToMonth(format(posix, "%Y-%W"), frmt = "%Y-%W-%u")
# [1] "2015-12" "2015-12" NA "2016-01" "2016-01" "2016-01" "2016-01" "2016-02"
weeksToMonth(format(posix, "%Y-%V"))
# [1] "2015-12" "2015-12" "2015-12" "2016-01" "2016-01" "2016-01" "2016-01" "2016-02"
weeksToMonth(format(posix, "%Y-%U"), frmt = "%Y-%U-%u")
# [1] "2015-12" "2015-12" "2016-01" "2016-01" "2016-01" "2016-01" "2016-01" "2016-02"
weeksToMonth(format(posix, "%Y-%W"), frmt = "%Y-%W-%u")
# [1] "2015-12" "2015-12" "2016-01" "2016-01" "2016-01" "2016-01" "2016-01" "2016-02"
As this seems to be a hard one with no easy answer (for Windows OS and German locale, at least), I also posted to r-help, which possibly gives you even more background information: http://r.789695.n4.nabble.com/Match-ISO-8601-week-of-year-numbers-to-month-of-year-numbers-on-Windows-with-German-locale-td4728059.html
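To make the ambiguity concrete, here is a minimal base R sketch (an illustration, not part of the original report). It lists the seven days of ISO week 2015-W53 and the calendar months they fall in, using only format(), which handles %G and %V on output.

```r
# Monday of ISO week 2015-W53 (the week that contains 2016-01-01)
week_start <- as.Date("2015-12-28")
week_days <- seq(week_start, by = "day", length.out = 7)

# All seven days carry the same ISO week label ...
iso_labels <- unique(format(week_days, "%G-W%V"))
iso_labels
#> [1] "2015-W53"

# ... but they are spread over two calendar months
months_covered <- unique(format(week_days, "%Y-%m"))
months_covered
#> [1] "2015-12" "2016-01"
```

This is why a week number alone cannot be mapped to a single month without also fixing a day of the week.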
In simple words, do you want to be able to parse weeks (US, ISO, EU)? That is, to be able to parse the %V, %U and %W formats?
As in:
parse_date_time(c("2015-52", "2015-53"), "YV")
## [1] "2015-12-01 UTC" "2015-12-01 UTC"
Note that R doesn't have partial date-times other than the Date object. So the result of this should be either POSIXct or Date.
Yes, if there's no "direct approach", I'd like to parse weeks to a POSIX date (including whatever workarounds are necessary to get there, e.g. defining a day of the week) to then express the date as YYYY-mm (or %Y-%m) again.
Note, though, that while this seems to work on MacOS and Ubuntu/Linux at least for %U and %W, it doesn't on Windows (at least for me operating on German locale settings).
It should be a straightforward addition to the internal parser, and should work the same on all platforms. The system's strptime is known for buggy/inconsistent parsing of partial date-times.
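As a rough illustration of what platform-independent handling could look like, here is a base R sketch that avoids strptime entirely. It relies on the ISO 8601 rule that 4 January always falls in week 1. The function name iso_week_to_date is hypothetical and is not part of lubridate; it also does not check that the week number is valid for the given year.

```r
# Hypothetical helper: ISO year + ISO week + ISO weekday (1 = Monday) -> Date
iso_week_to_date <- function(year, week, day = 1L) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  # POSIXlt wday is 0 = Sunday ... 6 = Saturday; convert to ISO 1 = Monday ... 7 = Sunday
  wday <- as.POSIXlt(jan4)$wday
  iso_wday <- ifelse(wday == 0L, 7L, wday)
  # Monday of ISO week 1 is the Monday on or before 4 January
  week1_monday <- jan4 - (iso_wday - 1L)
  week1_monday + (as.integer(week) - 1L) * 7L + (as.integer(day) - 1L)
}

# Week-to-month mapping for week 53 of 2015, using Monday and then Friday
month_mon <- format(iso_week_to_date(2015, 53, 1), "%Y-%m")
month_fri <- format(iso_week_to_date(2015, 53, 5), "%Y-%m")
month_mon
#> [1] "2015-12"
month_fri
#> [1] "2016-01"
```

Because only date arithmetic is involved, the result should not depend on the operating system or locale.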
I'm not familiar with the internals of lubridate, but from your answer I take it you're not using whatever strptime is using internally but some better/more consistent parser, is that correct? In that case it would be really awesome if you could consider adding week parsing to the functionality!
It seems strange to me that the documentation for parse_date_time() suggests that we can use W to parse weeks into dates but that it doesn't work at all:
lubridate::parse_date_time("2015 03", "Y W")
#> [1] "2015-10-29 UTC"
Created on 2020-10-29 by the reprex package (v0.3.0)
Just giving this a :+1: as @hadley suggests in #729
In particular, it is annoying that as_date() parses the ISO 8601 standard week format incorrectly (interpreting the week as the month) without any warning:
lubridate::as_date("2019-W02-1")
#> [1] "2019-02-01"
Created on 2020-12-01 by the reprex package (v0.3.0)
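Until this is handled by the parser, one way to avoid the silent misparse is to detect ISO week strings before handing them to as_date(). The following is only a sketch in base R; the helper name looks_like_iso_week is hypothetical, and it checks the shape of the string only, not whether week 53 actually exists in that year.

```r
# Hypothetical guard: TRUE for strings shaped like "YYYY-Www" or "YYYY-Www-D"
looks_like_iso_week <- function(x) {
  grepl("^[0-9]{4}-W(0[1-9]|[1-4][0-9]|5[0-3])(-[1-7])?$", x)
}

flags <- looks_like_iso_week(c("2019-W02-1", "2019-W02", "2019-02-01", "2019-W54-1"))
flags
#> [1]  TRUE  TRUE FALSE FALSE
```

Strings flagged TRUE could then be routed to a dedicated week parser or rejected with an explicit error instead of being read as year-month-day.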
If we force strptime evaluation, I get a different silent error with both ISO week %V and week %W (this time using the current month and day instead of the correct ones).
lubridate::as_date("2019-W02-1", format = "%Y-W%V-%d")
#> [1] "2019-12-01"
lubridate::as_date("2019-W02-1", format = "%Y-W%W-%d")
#> [1] "2019-12-01"
Created on 2020-12-01 by the reprex package (v0.3.0)
Related, there is "%G" (week-based year, or more precisely ISO-week-based year, relevant for year-ends).
This is supported neither by lubridate nor by strptime(), as per its help page, but it does work in format().
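A small base R sketch of why %G matters at year ends: pairing the ISO week number %V with the calendar year %Y produces a misleading label for early January dates that still belong to the last ISO week of the previous year.

```r
d <- as.Date(c("2020-12-28", "2021-01-03", "2021-01-04"))

# Calendar year combined with ISO week: wrong label for 2021-01-03
calendar_year <- format(d, "%Y-W%V")
calendar_year
#> [1] "2020-W53" "2021-W53" "2021-W01"

# ISO week-based year combined with ISO week: consistent labels
iso_year <- format(d, "%G-W%V")
iso_year
#> [1] "2020-W53" "2020-W53" "2021-W01"
```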
I suspect this has become more relevant as more week-based data using ISO week standard has been released due to Covid-19.
It would be very helpful to have lubridate as the one go-to package to handle even these seemingly odd situations.
A brief investigation of what happens in different scenarios and parsers:
lubridate::parse_date_time("2020-W53-1", "%G-%U-%u") # should return a date in 2020
#> Error in FUN(X[[i]], ...): Unknown formats supplied: G
lubridate::parse_date_time("2020-W53-7", "%G-%U-%u") # should return a date in 2021
#> Error in FUN(X[[i]], ...): Unknown formats supplied: G
lubridate::parse_date_time("2020-W53-1", "%G-%V-%u") # should return a date in 2020
#> Error in FUN(X[[i]], ...): Unknown formats supplied: GV
lubridate::parse_date_time("2020-W53-7", "%G-%V-%u") # should return a date in 2021
#> Error in FUN(X[[i]], ...): Unknown formats supplied: GV
strptime("2020-W53-1", "%G-%U-%u") # should return a date in 2020
#> [1] NA
strptime("2020-W53-7", "%G-%U-%u") # should return a date in 2021
#> [1] NA
# the wrong way to do it, correctly fails (though with odd message) on wk 53 when using %Y and %U
lubridate::as_date("2020-W53-1", format = "%Y-W%U-%u")
#> Warning in strptime(x, format, tz = "UTC"): (0-based) yday 369 in year 2020 is
#> invalid
#> [1] NA
lubridate::as_date("2021-W01-1", format = "%Y-W%U-%u")
#> [1] "2021-01-04"
lubridate::as_date("2020-W53-1", format = "%G-W%U-%u")
#> Warning in strptime(x, format, tz = "UTC"): (0-based) yday 371 in year
#> -2147481748 is invalid
#> [1] NA
lubridate::as_date("2021-W01-1", format = "%Y-W%U-%u")
#> [1] "2021-01-04"
# ISOweek handles this correctly
ISOweek::ISOweek2date("2020-W53-1")
#> [1] "2020-12-28"
ISOweek::ISOweek2date("2020-W53-7")
#> [1] "2021-01-03"
ISOweek::date2ISOweek("2020-12-28") # should be in ISO wk 53 of 2020
#> [1] "2020-W53-1"
ISOweek::date2ISOweek("2021-01-03") # ditto
#> [1] "2020-W53-7"
ISOweek::date2ISOweek("2021-01-04") # should be ISO week 1 of 2021
#> [1] "2021-W01-1"
# somehow format does too
format(as.Date("2020-12-28"), "%G-W%V") # should be in ISO wk 53 of 2020
#> [1] "2020-W53"
format(as.Date("2021-01-03"), "%G-W%V") # ditto
#> [1] "2020-W53"
format(as.Date("2021-01-04"), "%G-W%V") # should be ISO week 1 of 2021
#> [1] "2021-W01"
Created on 2021-01-31 by the reprex package (v0.3.0)
As many people here have seen, strptime() lacks the capability to correctly _parse_ the ISO 8601 week based format, although strftime() can _format_ it.
This is in the docs for ?strptime:
%V
Week of the year as decimal number (01–53) as defined in ISO 8601. If the week (starting on Monday)
containing 1 January has four or more days in the new year, then it is considered week 1.
Otherwise, it is the last week of the previous year, and the next week is week 1.
(Accepted but ignored on input.)
The key bit is the last line in parentheses. The %V specifier is ignored when parsing, but works when formatting.
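Since formatting works where parsing does not, one base R workaround is to invert format() with a lookup table: format every candidate date and match the input strings against the result. This is only a sketch; the function name parse_iso_week is hypothetical, and it assumes inputs of the form "YYYY-Www-D".

```r
# Hypothetical helper: parse "YYYY-Www-D" strings by matching against formatted dates
parse_iso_week <- function(x) {
  years <- as.integer(substr(x, 1, 4))
  # An ISO year can start in late December of the previous calendar year
  # and end in early January of the next, so pad the range on both sides
  candidates <- seq(
    as.Date(sprintf("%d-12-25", min(years, na.rm = TRUE) - 1L)),
    as.Date(sprintf("%d-01-07", max(years, na.rm = TRUE) + 1L)),
    by = "day"
  )
  # Strings with no matching date (e.g. week 53 in a 52-week year) become NA
  candidates[match(x, format(candidates, "%G-W%V-%u"))]
}

parsed <- parse_iso_week(c("2020-W53-1", "2020-W53-7", "2021-W01-1", "2019-W53-1"))
parsed
#> [1] "2020-12-28" "2021-01-03" "2021-01-04" NA
```

The lookup is wasteful for inputs spanning many years, but it needs no extra packages and returns NA for week numbers that do not exist.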
The clock package fully supports the ISO 8601 week based format. The correct format string to use is "%G-W%V-%u".
library(clock)
date_parse(c("2019-W02-1", "2020-W01-1", "2020-W02-1"), format = "%G-W%V-%u")
#> [1] "2019-01-07" "2019-12-30" "2020-01-06"
https://clock.r-lib.org/reference/date_parse.html
It also supports _partial_ ISO 8601 week based dates that don't have a day. There currently isn't a parser straight into a week precision type, but if you add a dummy -1 day then you can make it work:
library(clock)
library(magrittr)
week_strings <- c("2019-W02", "2020-W01", "2020-W02")
week_strings <- paste0(week_strings, "-1")
week_strings %>%
  date_parse(format = "%G-W%V-%u") %>%
  as_iso_year_week_day() %>%
  calendar_narrow("week")
#> <iso_year_week_day<week>[3]>
#> [1] "2019-W02" "2020-W01" "2020-W02"