Data.table: segfault unlisting a nested data.frame

Created on 16 Jun 2018 · 4 Comments · Source: Rdatatable/data.table

I was trying to provide a data.table solution to this SO question: https://stackoverflow.com/questions/50881925/r-expand-nested-dataframe-into-parent but instead caused a segfault that forced me to exit R altogether.

Reproducible data and actions:

library(data.table)

id <- c(1551, 1033, 1061, 1262, 1032, 1896, 1080, 1099, 1679, 1690)
fname <- list("Jack","Yogesh","Steven","Richard","Thomas","Craig","David","Aman","Frank","Robert")
mname <- list("B",NULL,"J","I","E","A","R",NULL,"J","E")

# sub is a data.frame whose two columns are list columns
sub <- as.data.frame(cbind(fname, mname))
master <- as.data.frame(id)
# store the whole data.frame as a single (nested) column of master
master$personalInfo <- sub
setDT(master)

And the action that caused the segfault:

master[, unlist(personalInfo), by = id]
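For anyone hitting the same crash, one possible way around it is to flatten the nested data.frame into ordinary columns before calling setDT(). The following is only a sketch on a shortened version of the data above; `flatten_col` is a hypothetical helper name, not part of data.table, and it assumes every list element is either NULL or a single string.

```r
library(data.table)

# Shortened version of the data from the report
id <- c(1551, 1033, 1061)
fname <- list("Jack", "Yogesh", "Steven")
mname <- list("B", NULL, "J")

sub <- as.data.frame(cbind(fname, mname))
master <- as.data.frame(id)
master$personalInfo <- sub

# Hypothetical helper: turn a list column into a character vector,
# mapping NULL entries to NA (assumes each entry is NULL or length 1)
flatten_col <- function(x) {
  vapply(x, function(v) if (is.null(v)) NA_character_ else as.character(v),
         character(1))
}

# Build a flat data.frame first, then convert it by reference
flat <- data.frame(
  id = master$id,
  fname = flatten_col(master$personalInfo$fname),
  mname = flatten_col(master$personalInfo$mname),
  stringsAsFactors = FALSE
)
setDT(flat)
```

After this, `flat` has one plain column per field, so grouping by `id` no longer touches a column that has dimensions.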

# Output of sessionInfo()

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0

locale:
 [1] LC_CTYPE=es_CO.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_CO.UTF-8        LC_COLLATE=es_CO.UTF-8    
 [5] LC_MONETARY=es_CO.UTF-8    LC_MESSAGES=es_CO.UTF-8   
 [7] LC_PAPER=es_CO.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_CO.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.7.4   data.table_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.4.4 magrittr_1.5   tools_3.4.4    Rcpp_0.12.16   stringi_1.2.2 
[6] stringr_1.3.1 

bug segfault

All 4 comments

Thanks for the report; reproduced on 1.11.5.

Thanks @jangorecki. I really don't know if it's a valid expectation for a data.table to work with nested data frames. When I try to print the data.table, it gives a gentle error:

> master
Error in FUN(X[[i]], ...) : 
  Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.

Perhaps it would be better if it complained at the time the object is converted to a data.table, i.e. in setDT()?

I've been thinking we should fix that "Invalid column: it has dimensions" error...

It's not very helpful: it doesn't name which column is at fault, and I have never encountered it as a "result of data.table(table())". Usually it's because a data.frame had a data.frame as a column. We could offer better alternatives for the most common cases...
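To illustrate what naming the offending column could look like from the user side, here is a small base R sketch that lists the columns that have dimensions (nested data.frames or matrices) before any conversion is attempted. `find_dim_columns` is a hypothetical name used for illustration; it is not a data.table function and does not reflect how the package performs its own check.

```r
# Small example with a data.frame stored as a column
id <- c(1551, 1033, 1061)
master <- as.data.frame(id)
master$personalInfo <- data.frame(
  fname = c("Jack", "Yogesh", "Steven"),
  mname = c("B", NA, "J"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the names of columns whose dim() is not NULL
find_dim_columns <- function(df) {
  has_dims <- vapply(df, function(col) !is.null(dim(col)), logical(1))
  names(df)[has_dims]
}

bad <- find_dim_columns(master)
if (length(bad) > 0) {
  message("Columns with dimensions: ", paste(bad, collapse = ", "))
}
```

Running a check like this before setDT() tells you which columns need to be flattened or dropped first.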

Not a segfault anymore... #3770 would have given the error at setDT. Will tag this to close with that PR
