Data.table: the class of objects inherited from data.table in list column may get changed

Created on 27 Mar 2020  Â·  12Comments  Â·  Source: Rdatatable/data.table

I am developing a class inheriting data.table but having trouble with the derived class being dropped. As can be seen in the example, when assigning the derived class to a list column, the class is initially retained but is subsequently dropped. Interestingly, it is only dropped for elements upon which an evaluation is performed.

I found one SO thread discussing similar but stated the issue had been resolved in 1.9.4. I am using 1.12.8.
https://stackoverflow.com/questions/24881842/deriving-a-new-class-from-data-table

Is this expected behavior?

# Minimal reproducible example

> tmp <- data.table(A = 1:5, B = letters[1:5])
> 
> tmp[, obj := .(.({
+   data.table(C = 1:5) %>%
+     setattr("class", c("newclass", class(.)))
+ })), by = A]
> tmp
   A B        obj
1: 1 a <newclass>
2: 2 b <newclass>
3: 3 c <newclass>
4: 4 d <newclass>
5: 5 e <newclass>
> 
> str(tmp)
Classes ‘data.table’ and 'data.frame':  5 obs. of  3 variables:
 $ A  : int  1 2 3 4 5
 $ B  : chr  "a" "b" "c" "d" ...
 $ obj:List of 5
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, ".internal.selfref")=<externalptr> 
> 
> class(tmp[1, obj[[1]]])
[1] "data.table" "data.frame"```
> tmp
   A B          obj
1: 1 a <data.table>
2: 2 b   <newclass>
3: 3 c   <newclass>
4: 4 d   <newclass>
5: 5 e   <newclass>

# Output of sessionInfo()

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] exoR_0.0.0.9000   data.table_1.12.8 purrr_0.3.3       tidyr_1.0.2       ggplot2_3.3.0    
 [6] dplyr_0.8.5       lubridate_1.7.4   LHfuns_0.0.0.9000 gfuns_0.0.0.9000  magrittr_1.5     

loaded via a namespace (and not attached):
 [1] xfun_0.12         zoo_1.8-7         tidyselect_1.0.0  remotes_2.1.1     pander_0.6.3     
 [6] lattice_0.20-40   RMariaDB_1.0.8    colorspace_1.4-1  vctrs_0.2.4       testthat_2.3.2   
[11] usethis_1.5.1     rlang_0.4.5       pkgbuild_1.0.6    pillar_1.4.3      glue_1.3.2       
[16] withr_2.1.2       DBI_1.1.0         bit64_0.9-7       readxl_1.3.1      sessioninfo_1.1.1
[21] lifecycle_0.2.0   stringr_1.4.0     cellranger_1.1.0  munsell_0.5.0     gtable_0.3.0     
[26] devtools_2.2.2    memoise_1.1.0     knitr_1.28        strucchange_1.5-2 callr_3.4.2      
[31] ps_1.3.2          fansi_0.4.1       Rcpp_1.0.3        scales_1.1.0      backports_1.1.5  
[36] desc_1.2.0        pkgload_1.0.2     fs_1.3.2          bit_1.1-15.2      hms_0.5.3        
[41] packrat_0.5.0     digest_0.6.25     stringi_1.4.6     processx_3.4.2    grid_3.5.3       
[46] rprojroot_1.3-2   cli_2.0.2         tools_3.5.3       sandwich_2.5-1    tibble_2.1.3     
[51] crayon_1.3.4      pkgconfig_2.0.3   ellipsis_0.3.0    xml2_1.2.5        prettyunits_1.1.1
[56] roxygen2_7.1.0    assertthat_0.2.1  rstudioapi_0.11   R6_2.4.1          compiler_3.5.3   
bug non-atomic column

All 12 comments

Thanks for filling out the report.
Are all those attached and loaded packages required to reproduce your issue? if not, then your example is definietly not a minimal one. It is also not obvious how they have been loaded or attached, as that part of the code seems to be missing.

Sorry. I ran it again without loading everything. Same result.

Code:

library(data.table)

tmp <- data.table(A = 1:5, B = letters[1:5])
tmp[, obj := .(.({
  x <- data.table(C = 1:5)
  setattr(x, "class", c("newclass", class(x)))
  x
})), by = A]

tmp

str(tmp)

class(tmp[1, obj[[1]]])

tmp

sessionInfo()

Result:

> tmp <- data.table(A = 1:5, B = letters[1:5])
> tmp[, obj := .(.({
+   x <- data.table(C = 1:5)
+   setattr(x, "class", c("newclass", class(x)))
+   x
+ })), by = A]
> 
> tmp
   A B        obj
1: 1 a <newclass>
2: 2 b <newclass>
3: 3 c <newclass>
4: 4 d <newclass>
5: 5 e <newclass>
> 
> str(tmp)
Classes ‘data.table’ and 'data.frame':  5 obs. of  3 variables:
 $ A  : int  1 2 3 4 5
 $ B  : chr  "a" "b" "c" "d" ...
 $ obj:List of 5
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':   5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, ".internal.selfref")=<externalptr> 
> 
> class(tmp[1, obj[[1]]])
[1] "data.table" "data.frame"
> 
> tmp
   A B          obj
1: 1 a <data.table>
2: 2 b   <newclass>
3: 3 c   <newclass>
4: 4 d   <newclass>
5: 5 e   <newclass>
> 
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8

loaded via a namespace (and not attached):
[1] compiler_3.5.3 tools_3.5.3    

If the element inside the list is also a data.table object, the other class attributes will be removed when the element is touched inside data.table::[. It looks like a bug to me.

The mini repro example


library(data.table)
val_dt <- data.table(COL = 'dt')
class(val_dt) <- c('aaa', class(val_dt))
val_df <- data.frame(COL = 'df', stringsAsFactors = FALSE)
class(val_df) <- c('aaa', class(val_df))
dt <- data.table(
  A = 1:2,
  B = list(val_dt, val_dt),
  C = list(val_df, val_df)
)
dt
#>        A          B          C
#>    <int>     <list>     <list>
#> 1:     1 <aaa[1x1]> <aaa[1x1]>
#> 2:     2 <aaa[1x1]> <aaa[1x1]>
dt[1, B[[1]]]
#>       COL
#>    <char>
#> 1:     dt
dt[1, C[[1]]]
#>   COL
#> 1  df
dt
#>        A                 B          C
#>    <int>            <list>     <list>
#> 1:     1 <data.table[1x1]> <aaa[1x1]>
#> 2:     2        <aaa[1x1]> <aaa[1x1]>

Created on 2020-03-28 by the reprex package (v0.3.0)

Just a note to myself, there're actually two issues, considering DT[,j] and when j returns a data.table:

  1. j will be forced to share the same class as DT(a fix for #64), but it should do that only when DT is a "plain" data.table (no other classes)

  2. j actually returns the reference of the stored data.table, when j is a list of data.tables, but it should copy the result.


After a 2nd though, the no.2 point may be expected behavior, as the user should be aware of that. I'm looking forward to your inputs.

Any progress on this? I see shrektan noted "looking forward to your inputs". Does this mean you are waiting for me or someone on the data.table team? I'm not intending to be pushy but am having to work around the issue in a package under development and my life would be much easier if I could rely on the attribute being retained.

@HJAllen The issue you bring up should have been addressed by #4354 but I think the PR needs @mattdowle 's review before merging it. In the meantime, you could use the PR version via remotes::install_github('rdatatable/data.table#4354').

Thanks! I will give it a try.

I confirm that the fix solves the problem. Thanks!

lets not close this issue till the fix is not yet merged to master branch

Sorry to bother, but any idea when this might get merged?

@jangorecki Any updates on why the fix by @shrektan can not be merged? I just ran into the same problem in a real-world scenario. I see there are some merge conflicts but all of them is trivial to resolve (willing to help if required).

@tdeenes AFAIK the fix can be merged, it is just that we have a long queue of open PRs.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jangorecki picture jangorecki  Â·  3Comments

arunsrinivasan picture arunsrinivasan  Â·  3Comments

DavidArenburg picture DavidArenburg  Â·  3Comments

alex46015 picture alex46015  Â·  3Comments

tcederquist picture tcederquist  Â·  3Comments