I only tested this for hours and minutes (as you can see below), so I'm not sure how general the issue is.
I'm also not sure if indeed it is an issue or just outside the intended use of the function (or if I'm doing something stupid).
> duration("2d 0H 0M 1s")
[1] "176461s (~2.04 days)"
> as.numeric(duration("2d 0H 0M 1s"))
[1] 176461
> 2*24*3600 + 1
[1] 172801
>
> duration("2d 1s")
[1] "172801s (~2 days)"
> as.numeric(duration("2d 1s"))
[1] 172801
>
> as.numeric(duration("2d 0H 0M 1s")) - as.numeric(duration("2d 1s")) == 1*3600 + 1*60
[1] TRUE
>
>
> session_info()
Session info ---------------------------------------------------------------
setting value
version R version 3.3.2 (2016-10-31)
system x86_64, mingw32
ui RStudio (1.0.136)
language (EN)
collate English_Ireland.1252
tz Europe/Berlin
date 2017-01-13
Packages -------------------------------------------------------------------
package * version date source
devtools * 1.12.0 2016-06-24 CRAN (R 3.3.2)
digest 0.6.11 2017-01-03 CRAN (R 3.3.2)
lubridate * 1.6.0 2016-09-13 CRAN (R 3.3.2)
magrittr 1.5 2014-11-22 CRAN (R 3.3.2)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.2)
stringi 1.1.2 2016-10-01 CRAN (R 3.3.2)
stringr 1.1.0 2016-08-19 CRAN (R 3.3.2)
withr 1.0.2 2016-06-20 CRAN (R 3.3.2)
>
You are right. It's a bug in parsing 0 units. Will check this asap.
Hi Vitalie,
I had some free time on my hands recently so I decided to take a look at the parsing code.
Lines 45 and 46 of src/period.c, inside function parse_period_unit()
, read:
out.val = parse_int(c, 100, FALSE);
if(out.val == 0) out.val = 1;
I guess this is to enforce the statement, on line 197 of R/durations.r, that
# Missing numerals default to 1.
i.e. to take care of the case where the numeral is missing but the unit is there (e.g. "1days 3mins" and "days 3mins" will yield the same result).
As noted in a pertinent comment inside function parse_int()
(line 50 of src/utils.c)
// maybe: fixme: this returns 0 if no parsing happened and strict = FALSE
it seems that the current way of detecting missing numerals (receiving a zero from parse_int()
) does not actually distinguish missing numerals from the numeral zero.
If the current (missing numeral) behavior is to be kept, maybe a counter should be implemented to check if at least 1 character has been parsed? You could then make the function return an int
different than 0 (e.g. -2) if no parsing has occurred, which would then be caught in parse_period_unit()
with
if(out.val == -2) out.val = 1;
I'm not familiar enough with the package to judge whether the current missing numeral behavior is the most desirable, but, from a user's perspective, I think I would prefer to be informed if I forgot to pass a numeral next to my unit; it's also more likely (I would think) that, if the input was produced by another function/program/software, a missing numeral really "meant to denote" 0 units and not 1. Or is this not your experience? Like I said, I'm not really sure what the big picture is in this situation.
Don't know if this is entirely correct or actually useful to you (it was certainly not a very extensive investigation) but I just thought I'd share. I didn't run any code/tests, just followed the function trail.
Best regards,
Dulce
Your identification of the problem is correct. There must be an easy fix but I have been very busy recently.
I think I made the decision to default missing units to 1 because that parsing code is likely to be used in other places such as rounding or conversion where most common case is to specify one unit of time ("day", "minute" etc). I think I also didn't think much about 0 units back then. Let me have a fresh look.
I have recently encountered this issue in my work and would appreciate a fix if possible
This issue was fixed back in February. Pls use the dev version or wait for new CRAN release (which should be there soon depending on when CRAN folks will find time to process it).
Most helpful comment
Your identification of the problem is correct. There must be an easy fix but I have been very busy recently.
I think I made the decision to default missing units to 1 because that parsing code is likely to be used in other places such as rounding or conversion where most common case is to specify one unit of time ("day", "minute" etc). I think I also didn't think much about 0 units back then. Let me have a fresh look.