I found this bug after freshly installing the newest available versions of R, RStudio, data.table, and dplyr on a new Mac. A simple script that worked on my old computer now fails. The error appeared somewhat unpredictably after running several lines of dplyr; restarting the session occasionally fixed it, but only temporarily. Once the bug had "hit" a specific object, any later interaction with that object raised the error, including RStudio View() and printing the object to the console. I'm a novice, but I worked with a friend to create this reproducible example, which will hopefully work on other computers. In my actual script, the bug did not always occur in the same part of the code.
library(dplyr)
library(data.table)
exp <-
structure(list(a = 1L, b = 1L, c = 1L, d = 1L, e = 1L, f = 1L, g = 1L, h = 1L, i = 1L, j = 1L, k = 1L, l = 1L, m = 1L, n = 1L, o = 1L, p = 1L, q = 1L, r = 1L, s = 1L, t = 1L, u = 1L, v = 1L, w = 1L, x = 1L, y = 1L, z = 1L,
aa = 1L, ab = 1L, ac = 1L, ad = 1L, ae = 1L, af = 1L, ag = 1L, ah = 1L, ai = 1L, aj = 1L, ak = 1L, al = 1L, am = 1L, an = 1L, ao = 1L, ap = 1L, aq = 1L, ar = 1L, as = 1L, at = 1L, au = 1L, av = 1L, aw = 1L, ax = 1L, ay = 1L, az = 1L,
ba = 1L, bb = 1L, bc = 1L, bd = 1L, be = 1L, bf = 1L, bg = 1L, bh = 1L, bi = 1L, bj = 1L, bk = 1L, bl = 1L, bm = 1L), class = "data.frame", row.names = c(NA, -1L))
expadj <- as.data.table(exp)
expadj <- expadj %>% mutate(a = 11) # overwrite an existing column (which column does not seem to matter)
expadj$NewColumn <- expadj$b
# fails here with error "Error in setalloccol(y) : can't set ALTREP truelength"
It seems possibly similar to #2990 or #3051 and in my case seems related to having at least 64 columns before the dplyr line. Changing the existing column without using dplyr prevents the bug from occurring.
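For comparison, here is a minimal sketch of the non-dplyr route mentioned above, assuming data.table's own `:=` operator is used to change the existing column. The 65-column table and the name `dt` are illustrative and not taken from the original script.

```r
library(data.table)

# Illustrative one-row data.table with 65 columns (V1 ... V65),
# i.e. above the 64-column threshold described in this report.
dt <- setDT(as.list(1:65))

# Modify an existing column by reference with data.table's := operator
# instead of dplyr::mutate().
dt[, V1 := 11]

# Adding a new column afterwards, the step that failed in the dplyr version.
dt[, NewColumn := V2]
```

Because `:=` updates the table in place, the object never passes through dplyr before data.table touches it again.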
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.13.0 dplyr_1.0.2
loaded via a namespace (and not attached):
[1] crayon_1.3.4 R6_2.4.1 lifecycle_0.2.0 magrittr_1.5
[5] pillar_1.4.6 rlang_0.4.7 rstudioapi_0.11 vctrs_0.3.4
[9] generics_0.0.2 ellipsis_0.3.1 tools_4.0.2 glue_1.4.2
[13] purrr_0.3.4 compiler_4.0.2 pkgconfig_2.0.3 tidyselect_1.1.0
[17] tibble_3.0.3
Could you provide an example that uses only data.table and base R? Installing dplyr takes a long time, so it is better not to force readers of your report to install it.
I can reproduce with R 4.0.2 / data.table 1.13.0 / dplyr 1.0.0 / Windows 10.
I cannot reproduce using base methods. However, here is a more minimal dplyr example:
library(dplyr)
library(data.table)
mutate(setDT(as.list(1:64)), V1 = 11)
##Error in .shallow(x, cols = cols, retain.key = TRUE) :
## can't set ALTREP truelength
The traceback points to print.data.table as the culprit here, which means the object returned by the mutate call is already bad. For example, switching to print.data.frame works:
print.data.frame(mutate(setDT(as.list(1:64)), V1 = 11))
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
#> 1 11 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
#> V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
#> 1 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
#> V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51 V52 V53 V54 V55 V56 V57 V58 V59
#> 1 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
#> V60 V61 V62 V63 V64
#> 1 60 61 62 63 64
Thus far the requirements to reproduce:
- a data.table with at least 64 columns
- a dplyr::mutate call which modifies an existing column in the data.table

Just returned to the project where I first encountered this bug, and wanted to say the code now runs perfectly with data.table 1.13.4. Thanks so much for the fix!
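Since the report above says the code runs with data.table 1.13.4, a quick check of the installed version can tell whether an update is worth trying. This is only a sketch: the 1.13.4 threshold comes from that comment, and the variable names are illustrative.

```r
# Compare the installed data.table version against 1.13.4,
# the version reported in this thread as working.
installed <- packageVersion("data.table")
needs_update <- installed < "1.13.4"

if (needs_update) {
  message("data.table ", format(installed),
          " is older than 1.13.4; updating may resolve the error.")
}
```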