Hi
I get the warning:
Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
when creating a new variable by reference, on a data.table that, at least as far as I can tell, is just a normal data.table:
library(dplyr)
library(data.table)
a1 <- data.table(v1=c(1:10), v2=rep('A'))
a2 <- data.table(v1=c(1:10), v2=rep('B'))
a3 <- Reduce(bind_rows,list(a1,a2))
a3[, n_max:=.N, by=v2]
I do not get the same warning when using the '[, n_max:=.N, by=v2]' syntax on either a1 or a2. This does not make any sense, does it?
Thank you for your time,
Emil
UPDATE: Doesn't happen when using rbind() instead.
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.10
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
locale:
[1] LC_CTYPE=en_DK.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8 LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8 LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=en_DK.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.8 dplyr_0.7.8
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 crayon_1.3.4 assertthat_0.2.0 R6_2.3.0
[5] magrittr_1.5 pillar_1.3.1 rlang_0.3.0.1 bindrcpp_0.2.2
[9] glue_1.3.0 purrr_0.2.5 compiler_3.5.1 pkgconfig_2.0.2
[13] colorspace_1.3-2 bindr_0.1.1 tidyselect_0.2.5 tibble_1.4.2
A quick fix may be to use rbindlist instead of bind_rows.
Please make your code reproducible, including the calls that attach the required libraries, to avoid errors like:
Error in match.fun(f) : object 'bind_rows' not found
Oh sorry, yes. I forgot about the libraries part. Done.
Michael, yes thank you. It just seems strange that it does not work.
Using bind_rows loses data.table's over-allocation of column slots. That results in the warning later on when using :=. You can check this with the truelength function:
library(dplyr)
library(data.table)
a1 <- data.table(v1=c(1:10), v2=rep('A'))
a2 <- data.table(v1=c(1:10), v2=rep('B'))
a3 <- Reduce(bind_rows,list(a1,a2))
a4 <- Reduce(rbind,list(a1,a2))
a5 <- rbindlist(list(a1,a2))
truelength(a3)
#[1] 0
truelength(a4)
#[1] 1026
truelength(a5)
#[1] 1026
I suggest using rbindlist instead.
Thank you for the clarification. Kind of unexpected behaviour, but not on data.table's side, I gather.
@emilBeBri you might raise this over at dplyr, not sure it's something they'll fix though.
You could also use alloc.col or setDT on the result of bind_rows and that should help as well, if you're married to using bind_rows.
But yes, canonical approach is to use rbindlist, and in fact this should be more efficient than bind_rows anyway :)
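For anyone who wants to keep bind_rows, the setDT route mentioned above could look roughly like the sketch below. It reuses the a1/a2/a3 objects from the original report and assumes that setDT restores the over-allocation on the bind_rows result, as suggested in this thread; exact truelength values may differ between data.table versions, so none are asserted here.

```r
library(dplyr)
library(data.table)

a1 <- data.table(v1 = 1:10, v2 = "A")
a2 <- data.table(v1 = 1:10, v2 = "B")

# bind_rows() builds the result outside data.table, so the over-allocated
# column slots that := relies on may be missing from a3.
a3 <- Reduce(bind_rows, list(a1, a2))

# setDT() re-establishes the data.table internals by reference
# (alloc.col(a3) is the other option mentioned in the thread).
setDT(a3)

# Check that spare column slots are available again.
truelength(a3) >= length(a3)

# Adding a column by reference should now proceed without the
# .internal.selfref warning.
a3[, n_max := .N, by = v2]
```

If the only goal is to stack the tables, rbindlist(list(a1, a2)) avoids the extra step entirely.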
It is probably more appropriate to raise this on dtplyr: https://github.com/hadley/dtplyr
That project looks a bit dead unfortunately ☠️ no updates in 2 years+
It was just not actively maintained, but issues like this are good reasons to reactivate that project.
All right, I have done so.