Data.table: Add general warning to fread for integer64 columns

Created on 28 May 2019 · 2 Comments · Source: Rdatatable/data.table

Hi,

integer64 simply is not transparent in R, and wrong results often pop up in unexpected places. For example, there is no as.matrix method for it, so as.matrix() produces wrong results:

> library(bit64)
> x <- integer64(10)
> x[1:5] <- 1:5
> x
integer64
 [1] 1 2 3 4 5 0 0 0 0 0
> as.matrix(x)
               [,1]
 [1,] 4.940656e-324
 [2,] 9.881313e-324
 [3,] 1.482197e-323
 [4,] 1.976263e-323
 [5,] 2.470328e-323
 [6,]  0.000000e+00
 [7,]  0.000000e+00
 [8,]  0.000000e+00
 [9,]  0.000000e+00
[10,]  0.000000e+00
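
The odd values above are not random. bit64 keeps each 64-bit integer in the storage of a double, so code that ignores the integer64 class reads the integer's bit pattern as a floating-point number. Here is a minimal base-R sketch of that reinterpretation (bits_as_double() is a hypothetical helper written only for this illustration, and it assumes a little-endian byte layout):

```r
# Base R only: reinterpret the 8 bytes of a small 64-bit integer as a double.
# bits_as_double() is a hypothetical helper, not part of bit64 or data.table.
bits_as_double <- function(k) {
  stopifnot(k >= 0, k <= 255)                  # keep the sketch to a single byte
  bytes <- as.raw(c(k, 0, 0, 0, 0, 0, 0, 0))   # little-endian encoding of k
  readBin(bytes, what = "double", size = 8, endian = "little")
}

vals <- vapply(1:5, bits_as_double, numeric(1))
print(vals)
```

The five values should be the first five multiples of the smallest subnormal double (2^-1074), which is what the as.matrix() output shows for the entries 1 to 5.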

Because it is dangerous, it would be nice if fread() printed a message/warning by default whenever it creates an integer64 column. Perhaps with a function parameter (no.integer64.warning) to disable it if the user knows what they're doing.
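
To make the request concrete, here is a rough sketch of the kind of heads-up I have in mind, written as a check on the result rather than as a change to fread() itself. The names integer64_columns() and notify_integer64() are made up for this example, and the message wording is only a suggestion:

```r
# Hypothetical helpers: neither function exists in data.table or bit64.

# Return the names of all columns that carry the "integer64" class.
integer64_columns <- function(df) {
  is_i64 <- vapply(df, inherits, logical(1), what = "integer64")
  names(df)[is_i64]
}

# Emit a message (not a warning) if any integer64 columns are present.
notify_integer64 <- function(df, no.integer64.warning = FALSE) {
  cols <- integer64_columns(df)
  if (length(cols) > 0 && !no.integer64.warning) {
    message("integer64 column(s) created: ", paste(cols, collapse = ", "),
            ". Many base R functions do not handle this class.")
  }
  invisible(df)
}

# Usage, only if both packages are installed.
if (requireNamespace("data.table", quietly = TRUE) &&
    requireNamespace("bit64", quietly = TRUE)) {
  dt <- data.table::fread(text = "id,value\n9007199254740993,1\n")
  notify_integer64(dt)
}
```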

This wouldn't slow down fread() and would keep the functionality discoverable, while avoiding hard-to-track bugs in downstream packages (tsbox::ts_ts() in my case).

Thanks,
Stefan

bit64 fread

All 2 comments

fread has an argument integer64 = getOption("datatable.integer64", "integer64"). I suppose the default for this option could be changed to "warning". We'd have to run this past current users and revdeps to see what their view is. Since warnings are often turned into errors in production using options(warn=2), perhaps message() would be better? Please bear in mind that integer64 has been supported in data.table for many years, at user request. The integer64 behavior can be turned off by setting options(datatable.integer64="numeric"), although there is an outstanding bug where this isn't respected in an out-of-sample type bump (#2749).
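
As a small illustration of turning the behavior off for a single call, the sketch below passes integer64 = "double" directly to fread(). The CSV text and variable names are invented for the example, and the block only runs if data.table is installed:

```r
# Sketch: read large integers as double instead of integer64 for one call.
# The sample data is made up; 4294967296 is 2^32, too large for a 32-bit integer.
if (requireNamespace("data.table", quietly = TRUE)) {
  csv <- "id,value\n4294967296,1\n4294967297,2\n"

  # Per-call override; options(datatable.integer64 = "numeric") sets it globally.
  dt <- data.table::fread(text = csv, integer64 = "double")

  stopifnot(is.double(dt$id))   # plain double, so base R functions behave
  m <- as.matrix(dt)            # ordinary numeric matrix
  print(m)
}
```

Note that doubles represent integers exactly only up to 2^53, so this trades the as.matrix() problem for possible precision loss on very large ids.
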
Looks like bit64::integer64 needs an as.matrix method adding. I guess it would be best to coerce to numeric for onward processing as a matrix, rather than produce an integer64 matrix. I could ask about that on the bit64 issues list. In data.table, integer64 is most often used for ids (like UPC).
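
Until such a method exists, a user-side workaround along those lines might look like the sketch below. integer64_to_matrix() is a hypothetical helper, not a bit64 function, and it assumes the values fit in a double without loss (magnitude up to 2^53):

```r
# Hypothetical helper: coerce integer64 to double first, then build the matrix.
integer64_to_matrix <- function(x) {
  stopifnot(inherits(x, "integer64"))
  as.matrix(as.double(x))   # as.double() dispatches to bit64's integer64 method
}

# Usage, only if bit64 is installed (loading its namespace registers the methods).
if (requireNamespace("bit64", quietly = TRUE)) {
  x <- bit64::as.integer64(c(1:5, rep(0L, 5)))
  m64 <- integer64_to_matrix(x)
  print(m64)
}
```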

I'd be completely happy with a message instead of a warning.

I think it's great that the option to parse integers as integer64 exists. It just took me quite a while to debug the problem (especially as it happened in third-party code) and a quick heads-up from fread() would have helped a lot.

For me the single biggest advantage of fread() over readr is that I don't trust read_csv() not to silently corrupt my data and that data.table in general has the best message policy in all of R. Your messages are helpful, proactive and verbose and offer options instead of just errors. And that's why I'd be happy if there was a message of some kind to keep the user in the loop with integer64.

The best argument for a warning is imho that the kind of code that uses options(warn = 2) is the kind of code that should specifically opt in to these kinds of features.
