@jangorecki this came up earlier in #3168: https://github.com/Rdatatable/data.table/pull/3168#issuecomment-443395054
The output on my machine differs from that on yours and on Travis/Codecov...
It's still happening. If we can figure it out and squash it before the CRAN release, that'd be great.
```r
DT = data.table(
  color = c("yellow", "red", "green", "red", "green", "red",
            "yellow", "yellow", "green", "green", "green", "yellow",
            "red", "yellow", "red", "green", "yellow", "red", "yellow",
            "red", "green", "yellow", "green", "green"),
  year = structure(c(15340, 15340, 14975, 15706, 15706, 15340,
                     16436, 15340, 15340, 14975, 16436, 15706,
                     16436, 15340, 14975, 14975, 16071, 15340,
                     15706, 16071, 15706, 15340, 16436, 16071), class = "Date"),
  status = structure(c(4L, 3L, 4L, 3L, 2L, 1L, 3L, 4L, 4L, 3L, 4L, 4L,
                       4L, 4L, 1L, 3L, 3L, 2L, 1L, 2L, 3L, 4L, 2L, 4L),
                     .Label = c("active", "archived", "inactive", "removed"),
                     class = "factor"),
  amount = c(1L, 4L, 2L, 3L, 1L, 5L, 1L, 1L, 4L, 2L, 3L, 1L,
             5L, 4L, 2L, 2L, 4L, 3L, 3L, 2L, 4L, 4L, 1L, 2L),
  value = c(2.5, 2, 3, 3, 2.5, 3.5, 2.5, 3.5, 3, 2.5, 3.5, 2.5, 2,
            2.5, 3, 3, 3, 3, 3, 3, 2, 2.5, 3, 3)
)
groupingsets(DT[ , .(amount, value)], j = 5, by = character(0L),
             sets = list(character()), id=TRUE)
```
Output in `test.data.table()` (also under `R --vanilla`):
```
Running test id 1960
Test 1960 ran without errors but failed check that x equals y:
> x = ans
   grouping V1 [Key= Types=int,dou Classes=int,num]
1:       NA  5
> y = data.table(grouping = 0L, V1 = 5)
   grouping V1 [Key= Types=int,dou Classes=int,num]
1:        0  5
Column 'grouping': 'is.NA' value mismatch: 0 in current 1 in target
```
My `sessionInfo()`:
```
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xts_0.11-2 zoo_1.8-4 bit64_0.9-7 bit_1.1-14
[5] data.table_1.11.9
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 grid_3.5.1 lattice_0.20-38
```
Related: #3173
I don't think this is urgent for 1.12.0; using grouping sets with a constant `j` doesn't make much sense.
You seem to have some extra packages attached; be sure to try in a clean, data.table-only session.
The grouping ID column is created with the following code; check whether you already get NA or 0 there:
```r
by = character(0L)
sets = list(character())
lapply(sets, function(by.set) strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
#[[1]]
#[1] 0
```
You can also try putting `browser()` in this branch to investigate:
https://github.com/Rdatatable/data.table/blob/6346a88ab76dbdbb60746c28a4f5c9338e29260c/R/groupingsets.R#L79-L82
and finally also in this one:
https://github.com/Rdatatable/data.table/blob/6346a88ab76dbdbb60746c28a4f5c9338e29260c/R/groupingsets.R#L108-L111
Removing the bug label for now, as this may be machine dependent.
@jangorecki the extra packages are attached by `test.data.table()`; are they supposed to be detached by the time `test.data.table()` finishes?
This was run via:
```
$ R --vanilla
> library(data.table)
> test.data.table()
```
And yes, indeed, we've narrowed down the issue:
```r
by = character(0L)
sets = list(character())
lapply(sets, function(by.set) strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
#[[1]]
# [1] NA
```
And in fact:
```r
by = character(0L)
sets = list(character())
lapply(sets, function(by.set) {
  dput(by.set)
  dput(by %chin% by.set + 1L)
  dput(c("1", "0")[by %chin% by.set + 1L])
  dput(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""))
  dput(strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
  strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L)
})
# character(0)
# integer(0)
# character(0)
# ""
# NA_integer_
# [[1]]
# [1] NA
```
From what I see, I'm amazed other machines are coming out with 0!
From the `strtoi` manual:
> An integer vector of the same length as x. Values which cannot be interpreted as integers or would overflow are returned as NA_integer_.

Maybe the `strtol` on macOS behaves differently; you can try to debug that outside of R.
I have a remote machine where `0L` comes out:
```r
by = character(0L)
sets = list(character())
lapply(sets, function(by.set) {
  dput(by.set)
  dput(by %chin% by.set + 1L)
  dput(c("1", "0")[by %chin% by.set + 1L])
  dput(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""))
  dput(strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
  strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L)
})
# character(0)
# integer(0)
# character(0)
# ""
# 0L
# [[1]]
# [1] 0
```
So it's really down to the output of `strtoi`, not data.table.
`sessionInfo()` on a successful machine:
```
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Matrix products: default
BLAS: /opt/conda/lib/R/lib/libRblas.so
LAPACK: /opt/conda/lib/R/lib/libRlapack.so
locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.9
loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1
```
Yes:
```r
strtoi("", base=2L)
#[1] 0
```
Try pasting this code into `strtol.c`:
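```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char str[5] = "";                 /* empty input, like strtoi("") */
  char *ptr;                        /* strtol points this at the first unparsed char */
  long ans = strtol(str, &ptr, 2);  /* strtol returns long, not int */
  printf("ans: %ld\n", ans);
  printf("leftover: %s\n", ptr);
  return 0;
}
```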
Then compile and run it:
```sh
gcc strtol.c -o strtol.out
./strtol.out
#ans: 0
#leftover:
```
Hmm, I get `ans: 0` from the C program too...
Maybe it's time to raise a bug on R-devel. At the very least the documentation needs to be clarified:
> Values which cannot be interpreted as integers or would overflow are returned as NA_integer_.

The doc is correct; the problem is a lack of consistency with the C library.
Anyway, we can ignore that and just add an extra branch for `!nzchar`.
Yes, that sounds like the right solution. I'll follow up with R-devel separately.
I'm not fully following. It sounds like some sort of change is needed in data.table for 1.12.0 to ensure tests pass consistently on all platforms?
@mattdowle yes, a very simple fix I think; testing now.
r-devel thread for the record: