I was trying to provide a data.table solution to this SO question: https://stackoverflow.com/questions/50881925/r-expand-nested-dataframe-into-parent but instead caused a segfault that forced me to exit R altogether.
Reproducible data and actions:
library(data.table)
id <- c(1551, 1033, 1061, 1262, 1032, 1896, 1080, 1099, 1679, 1690)
fname <- list("Jack","Yogesh","Steven","Richard","Thomas","Craig","David","Aman","Frank","Robert")
mname <- list("B",NULL,"J","I","E","A","R",NULL,"J","E")
sub <- as.data.frame(cbind(fname, mname))
master <- as.data.frame(id)
master$personalInfo <- sub
setDT(master)
And the action that caused the segfault:
master[, unlist(personalInfo), by = id]
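For the underlying task (expanding the nested data frame into the parent), one possible workaround is to flatten the nested columns into atomic vectors before a data.table is ever built, so that no column has dimensions. This is only a sketch on a three-row subset of the data above; `null_to_na` is a hypothetical helper written for this example, not a data.table function.

```r
library(data.table)

# Three-row subset of the data from the report
id <- c(1551, 1033, 1061)
fname <- list("Jack", "Yogesh", "Steven")
mname <- list("B", NULL, "J")
sub <- as.data.frame(cbind(fname, mname))
master <- as.data.frame(id)
master$personalInfo <- sub

# Hypothetical helper: convert a list column to a character vector,
# replacing NULL entries with NA
null_to_na <- function(x) {
  vapply(
    x,
    function(el) if (is.null(el)) NA_character_ else as.character(el),
    character(1),
    USE.NAMES = FALSE
  )
}

# Build a flat data.table directly instead of calling setDT() on the
# nested data frame
flat <- data.table(
  id = master$id,
  fname = null_to_na(master$personalInfo$fname),
  mname = null_to_na(master$personalInfo$mname)
)
```

Here `flat` has three plain columns (`id`, `fname`, `mname`), with the missing middle name stored as `NA`.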
# Output of sessionInfo()
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0
locale:
[1] LC_CTYPE=es_CO.UTF-8 LC_NUMERIC=C
[3] LC_TIME=es_CO.UTF-8 LC_COLLATE=es_CO.UTF-8
[5] LC_MONETARY=es_CO.UTF-8 LC_MESSAGES=es_CO.UTF-8
[7] LC_PAPER=es_CO.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_CO.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.7.4 data.table_1.11.2
loaded via a namespace (and not attached):
[1] compiler_3.4.4 magrittr_1.5 tools_3.4.4 Rcpp_0.12.16 stringi_1.2.2
[6] stringr_1.3.1
Thanks for the report; reproduced on 1.11.5.
Thanks @jangorecki. I really don't know whether it's a valid expectation for a data.table to work with nested data frames. When I try to view the data.table, it gives a gentle error:
> master
Error in FUN(X[[i]], ...) :
Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.
Perhaps it'd be better if it complained at the time of converting it to a data.table, i.e. in setDT()?
I've been thinking we should fix that "Invalid column: it has dimensions" error.
It's not very helpful: it doesn't name the offending column, and I have never encountered it as a "result of data.table(table())"; usually it's because a data.frame had a data.frame as a column. We could offer better alternatives for the most common cases.
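As an illustration of what naming the column could look like, here is a minimal base R sketch that reports which columns carry dimensions. Both `find_dim_columns` and `check_no_dim_columns` are hypothetical names invented for this example; this is not how data.table implements its check.

```r
# Hypothetical helper: names of columns that have a non-NULL dim(),
# e.g. a data.frame or matrix stored as a column
find_dim_columns <- function(x) {
  has_dim <- vapply(x, function(col) !is.null(dim(col)), logical(1))
  names(x)[has_dim]
}

# Hypothetical check: fail early with a message that names the columns
check_no_dim_columns <- function(x) {
  bad <- find_dim_columns(x)
  if (length(bad) > 0L) {
    stop(
      "Column(s) with dimensions: ", paste(bad, collapse = ", "),
      ". A data.frame or matrix stored as a column must be flattened first."
    )
  }
  invisible(x)
}

# A small data frame with a nested data frame as a column
master <- data.frame(id = c(1551, 1033))
master$personalInfo <- data.frame(fname = c("Jack", "Yogesh"),
                                  mname = c("B", NA))

bad_cols <- find_dim_columns(master)  # "personalInfo"
```

Calling `check_no_dim_columns(master)` before `setDT(master)` would then stop with an error that mentions `personalInfo` by name.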
Not a segfault anymore. #3770 would have given the error at setDT. Will tag this to close with that PR.