Data.table: Can infinite values be supported in IDate?

Created on 7 Jul 2017 · 8Comments · Source: Rdatatable/data.table

I'm taking the max or min of an empty IDate vector, but finding my code breaks since the return value is sometimes float, sometimes integer:

library(data.table)
DT = data.table(d = as.IDate(Sys.Date()), g = 1:2, cond = c(TRUE, FALSE))
DT[, max(d[cond]), by=g]

Error in `[.data.table`(DT, , max(d[cond]), by = g) : 
  Column 1 of result for group 2 is type 'double' but expecting type 'integer'. Column types must be consistent for each group.
In addition: Warning message:
In max.default(integer(0), na.rm = FALSE) :
  no non-missing arguments to max; returning -Inf

Since R's integer storage mode doesn't allow Inf or -Inf, I'm guessing IDate cannot be used for this..? My workaround is dropping to Date class:

DT[, {
  d = as.Date(d)
  max(d[cond])
}, by=g]

breaking-change idatitime question

Source

franknarf1

👍1

Most helpful comment

related discussion on r-devel https://stat.ethz.ch/pipermail/r-devel/2019-March/077435.html

jangorecki on 6 Mar 2019

👍2

All 8 comments

just adding that i've run into this issue as well, had to write a wrapper function for the special case.

MichaelChirico on 7 Jul 2017

I'm also dealing with this issue. If you want to stick with IDate you can wrap it with an as.IDate:

DT[, as.IDate(max(d[cond])), by=g]

dlksmc on 12 Jan 2018

related discussion on r-devel https://stat.ethz.ch/pipermail/r-devel/2019-March/077435.html

jangorecki on 6 Mar 2019

👍2

We could do something like this (and analogously for min)?

max.IDate = function(x, na.rm = FALSE) {
  if (!length(x)) {
    out = as.IDate(-.Machine$integer.max)
    warning("No non-missing arguments to max; returning the minimum integer date, ", .Machine$integer.max, " days before epoch time: ", out)
    return(out)
  }
  return(as.IDate(max(as.integer(x), na.rm = na.rm)))
}

Otherwise, change the warning to give some advice about adding conditions like:

if (any(idx <- !is.na(x))) max(x[idx])

.Machine$integer.max is the best we can do -- supporting Inf in IDate columns won't be possible.

MichaelChirico on 11 Sep 2019

A little off-topic but I really think max(double()) returns -Inf is annoying. Personally, I never ever need it returns an infinite number. I always prefer NA for such case - if there's no number then we can't find the "max" -> so it's Not / Applicable.

So personally I avoid to use the na.rm argument in max(). If I have to, I would write a small function like function(x) {x = na.omit(x); if (length(x)) max(x) else NA_real_ }.

Maybe we should provide an alternative to the base::max(), which never returns an Inf...

shrektan on 11 Sep 2019

This is a mathematical consideration, from ?max:

The minimum and maximum of a numeric empty set are +Inf and -Inf (in this order!) which ensures transitivity, e.g., min(x1, min(x2)) == min(x1, x2). For numeric x max(x) == -Inf and min(x) == +Inf whenever length(x) == 0 (after removing missing values if requested). However, pmax and pmin return NA if all the parallel elements are NA even for na.rm = TRUE.

Also note that in this case, the issue doesn't come from na.rm, but rather from subsetting the input and getting empty output.

I think the behavior is a bit annoying but reasonable. As you said, if you anticipate an empty set into min, you code around it -- the behavior here is a shield against cases when the empty set is unanticipated. I'd rather get hit with a warning then have data.table do something silently in such a case.

MichaelChirico on 12 Sep 2019

I like the idea of using +/- the machine max int, though I guess further related changes would be desirable for consistency:

print.IDate would print both as NA like print.Date does
as.IDate would map +/- Inf as well as other doubles outside the int range to these values
as.Date would map these values to +/- Inf
+.IDate/-.IDate would need to ensure that the results are still ints

In math-like jargon, the goal is that the space of IDates is closed under +, min, max and that NAs in an IDate only mean missing data (while inexpressibly large and small IDates are distinguished from each other and from NA).