Data.table: test 1960 still failing (grouping sets output mismatch)

Created on 11 Jan 2019 · 13 Comments · Source: Rdatatable/data.table

@jangorecki this came up earlier in #3168: https://github.com/Rdatatable/data.table/pull/3168#issuecomment-443395054

The output on my machine differs from that on yours and on Travis/Codecov...

It's still happening. It would be great if we could figure it out and squash it before the CRAN release.

DT = data.table(
  color = c("yellow", "red", "green", "red", "green", "red",
            "yellow", "yellow", "green", "green", "green", "yellow",
            "red", "yellow", "red", "green", "yellow", "red", "yellow",
            "red", "green", "yellow", "green", "green"),
  year = structure(c(15340, 15340, 14975, 15706, 15706, 15340,
                     16436, 15340, 15340, 14975, 16436, 15706,
                     16436, 15340, 14975, 14975, 16071, 15340,
                     15706, 16071, 15706, 15340, 16436, 16071), class = "Date"),
  status = structure(c(4L, 3L, 4L, 3L, 2L, 1L, 3L, 4L, 4L, 3L, 4L, 4L,
                       4L, 4L, 1L, 3L, 3L, 2L, 1L, 2L, 3L, 4L, 2L, 4L),
                     .Label = c("active", "archived", "inactive", "removed"),
                     class = "factor"),
  amount = c(1L, 4L, 2L, 3L, 1L, 5L, 1L, 1L, 4L, 2L, 3L, 1L,
             5L, 4L, 2L, 2L, 4L, 3L, 3L, 2L, 4L, 4L, 1L, 2L),
  value = c(2.5, 2, 3, 3, 2.5, 3.5, 2.5, 3.5, 3, 2.5, 3.5, 2.5, 2,
            2.5, 3, 3, 3, 3, 3, 3, 2, 2.5, 3, 3)
)
groupingsets(DT[ , .(amount, value)], j = 5, by = character(0L),
             sets = list(character()), id=TRUE)

Output in test.data.table() (also for R --vanilla):

Running test id 1960      Test 1960 ran without errors but failed check that x equals y:
> x = ans 
   grouping V1  [Key= Types=int,dou Classes=int,num]
1:       NA  5                                      
> y = data.table(grouping = 0L, V1 = 5) 
   grouping V1  [Key= Types=int,dou Classes=int,num]
1:        0  5                                      
Column 'grouping': 'is.NA' value mismatch: 0 in current 1 in target

My sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xts_0.11-2        zoo_1.8-4         bit64_0.9-7       bit_1.1-14       
[5] data.table_1.11.9

loaded via a namespace (and not attached):
[1] compiler_3.5.1  tools_3.5.1     grid_3.5.1      lattice_0.20-38

Related: #3173

All 13 comments

I don't think it is an urgent issue for 1.12.0; using grouping sets with a constant j doesn't make much sense.
You seem to have some extra packages attached, so be sure to try in a clean, data.table-only session.
The grouping ID column is created using the following; check whether you are already getting NA or 0 there:

by = character(0L)
sets = list(character())
lapply(sets, function(by.set) strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
#[[1]]
#[1] 0

You can also try inserting browser() to investigate this branch:
https://github.com/Rdatatable/data.table/blob/6346a88ab76dbdbb60746c28a4f5c9338e29260c/R/groupingsets.R#L79-L82
and then finally also
https://github.com/Rdatatable/data.table/blob/6346a88ab76dbdbb60746c28a4f5c9338e29260c/R/groupingsets.R#L108-L111
Removing the bug label for now, as it is possible this is machine dependent.

@jangorecki extra packages are attached by test.data.table(), are they supposed to be un-attached by the time test.data.table() finishes?

This was run via

$ R --vanilla
> library(data.table)
> test.data.table()

And yes, indeed we've narrowed the issue:

by = character(0L)
sets = list(character())
lapply(sets, function(by.set) strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
#[[1]]
# [1] NA

and in fact:

by = character(0L)
sets = list(character())
lapply(sets, function(by.set) {
    dput(by.set)
    dput(by %chin% by.set + 1L)
    dput(c("1", "0")[by %chin% by.set + 1L])
    dput(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""))
    dput(strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
    strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L)
})
# character(0)
# integer(0)
# character(0)
# ""
# NA_integer_
# [[1]]
# [1] NA

From what I see I'm amazed other machines are coming out with 0!

From the strtoi manual:

An integer vector of the same length as x. Values which cannot be interpreted as integers or would overflow are returned as NA_integer_.

Maybe the strtol in macOS's C library behaves differently; you can try to debug that outside of R.

I have a remote machine where 0L is coming out:

by = character(0L)
sets = list(character())
lapply(sets, function(by.set) {
    dput(by.set)
    dput(by %chin% by.set + 1L)
    dput(c("1", "0")[by %chin% by.set + 1L])
    dput(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""))
    dput(strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
    strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L)
})

# character(0)
# integer(0)
# character(0)
# ""
# 0L
# [[1]]
# [1] 0

so it's really down to strtoi output, not data.table...

sessionInfo() on a successful machine:

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /opt/conda/lib/R/lib/libRblas.so
LAPACK: /opt/conda/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.11.9

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1

yes,

strtoi("", base=2L)
#[1] 0

Try saving this code as strtol.c:

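#include <stdio.h>
#include <stdlib.h>

int main(void) {
   char str[5] = "";   /* empty input, like paste(character(0), collapse="") in R */
   char *ptr;          /* strtol sets this to the first unparsed character */
   long ans = strtol(str, &ptr, 2);   /* strtol returns long, not int */
   printf("ans: %ld\n", ans);
   printf("leftover: %s\n", ptr);
   return 0;
}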

and then compile and run it:

gcc strtol.c -o strtol.out
./strtol.out
#ans: 0
#leftover: 

Hmm I get ans 0 from the C program...
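
The C program above only prints the return value. A slightly extended probe, sketched below, also records errno and how many characters strtol consumed, which might show a difference between the two C libraries even when both return 0. The struct and function names are made up for illustration. The C standard says that when no conversion can be performed, strtol returns 0 and stores the start of the string in the end pointer; POSIX additionally allows errno to be set to EINVAL in that case. Whether R's strtoi turns such an errno into NA on macOS is an assumption that would need checking against the R sources.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of R or data.table. */
struct strtol_probe {
    long value;       /* value returned by strtol */
    int err;          /* errno after the call (0 if strtol left it alone) */
    size_t consumed;  /* number of characters strtol parsed */
};

static struct strtol_probe probe_strtol(const char *s, int base) {
    struct strtol_probe p;
    char *end = NULL;
    errno = 0;                       /* strtol never clears errno itself */
    p.value = strtol(s, &end, base);
    p.err = errno;
    p.consumed = (size_t)(end - s);  /* 0 means no conversion happened */
    return p;
}
```

Calling probe_strtol("", 2) on the macOS and Ubuntu machines and printing all three fields would show whether they disagree on errno for the empty string.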

Maybe it's time to raise a bug on R-devel... at the very least the documentation needs to be clarified...

Values which cannot be interpreted as integers or would overflow are returned as NA_integer_.

The doc is correct; the problem is the inconsistency with the C library.

Anyway, we can ignore that and just add an extra branch for !nzchar.

Yes sounds like the right solution. And I'll follow up with r-devel separately.
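
As an illustration of the extra branch proposed above, here is the same guard written in C, the language of the test program in this thread. The actual change in data.table would be made at the R level around the strtoi call, and is not shown here. The function name binary_id and its out-parameter convention are hypothetical. The idea is only to handle the empty string explicitly as 0, so the result no longer depends on how the platform treats it.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-2 string into *out.
   Returns 1 on success, 0 on failure. */
static int binary_id(const char *s, int *out) {
    char *end = NULL;
    long v;
    if (s[0] == '\0') {   /* the "!nzchar" branch: empty string means id 0 */
        *out = 0;
        return 1;
    }
    errno = 0;
    v = strtol(s, &end, 2);
    if (errno != 0 || *end != '\0' || v < INT_MIN || v > INT_MAX) {
        return 0;         /* trailing junk, error, or does not fit in int */
    }
    *out = (int)v;
    return 1;
}
```

With this guard the empty grouping set maps to 0 without ever reaching strtol, which is the behaviour test 1960 expects.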

I'm not fully following. It sounds like some sort of change is needed to data.table in this 1.12.0 to ensure tests pass consistently on all platforms?

@mattdowle yes, a very simple fix I think, testing now
