I'm taking the max or min of an empty IDate vector, but finding my code breaks since the return value is sometimes float, sometimes integer:
library(data.table)
DT = data.table(d = as.IDate(Sys.Date()), g = 1:2, cond = c(TRUE, FALSE))
DT[, max(d[cond]), by=g]
Error in `[.data.table`(DT, , max(d[cond]), by = g) :
Column 1 of result for group 2 is type 'double' but expecting type 'integer'. Column types must be consistent for each group.
In addition: Warning message:
In max.default(integer(0), na.rm = FALSE) :
no non-missing arguments to max; returning -Inf
Since R's integer storage mode doesn't allow Inf or -Inf, I'm guessing IDate cannot be used for this..? My workaround is dropping to Date class:
DT[, {
d = as.Date(d)
max(d[cond])
}, by=g]
just adding that i've run into this issue as well, had to write a wrapper function for the special case.
I'm also dealing with this issue. If you want to stick with IDate you can wrap it with an as.IDate:
DT[, as.IDate(max(d[cond])), by=g]
related discussion on r-devel https://stat.ethz.ch/pipermail/r-devel/2019-March/077435.html
We could do something like this (and analogously for min)?
max.IDate = function(x, na.rm = FALSE) {
if (!length(x)) {
out = as.IDate(-.Machine$integer.max)
warning("No non-missing arguments to max; returning the minimum integer date, ", .Machine$integer.max, " days before epoch time: ", out)
return(out)
}
return(as.IDate(max(as.integer(x), na.rm = na.rm)))
}
Otherwise, change the warning to give some advice about adding conditions like:
if (any(idx <- !is.na(x))) max(x[idx])
.Machine$integer.max is the best we can do -- supporting Inf in IDate columns won't be possible.
A little off-topic but I really think max(double()) returns -Inf is annoying. Personally, I never ever need it returns an infinite number. I always prefer NA for such case - if there's no number then we can't find the "max" -> so it's Not / Applicable.
So personally I avoid to use the na.rm argument in max(). If I have to, I would write a small function like function(x) {x = na.omit(x); if (length(x)) max(x) else NA_real_ }.
Maybe we should provide an alternative to the base::max(), which never returns an Inf...
This is a mathematical consideration, from ?max:
The minimum and maximum of a numeric empty set are +Inf and -Inf (in this order!) which ensures transitivity, e.g., min(x1, min(x2)) == min(x1, x2). For numeric x max(x) == -Inf and min(x) == +Inf whenever length(x) == 0 (after removing missing values if requested). However, pmax and pmin return NA if all the parallel elements are NA even for na.rm = TRUE.
Also note that in this case, the issue doesn't come from na.rm, but rather from subsetting the input and getting empty output.
I think the behavior is a bit annoying but reasonable. As you said, if you anticipate an empty set into min, you code around it -- the behavior here is a shield against cases when the empty set is unanticipated. I'd rather get hit with a warning then have data.table do something silently in such a case.
I like the idea of using +/- the machine max int, though I guess further related changes would be desirable for consistency:
print.IDate would print both as NA like print.Date doesas.IDate would map +/- Inf as well as other doubles outside the int range to these values as.Date would map these values to +/- Inf+.IDate/-.IDate would need to ensure that the results are still intsIn math-like jargon, the goal is that the space of IDates is closed under +, min, max and that NAs in an IDate only mean missing data (while inexpressibly large and small IDates are distinguished from each other and from NA).
A little off-topic but I really think max(double()) returns -Inf is annoying.
Maybe I have just gotten used to it, but I find the behavior makes sense.
sounds good. will do this but it'll be breaking in case anyone was relying on the bounds for other purposes. will have to wait for next release
Most helpful comment
related discussion on r-devel https://stat.ethz.ch/pipermail/r-devel/2019-March/077435.html