Running unnest() on the outer data.frame below (probdf) crashes R, while running unnest() on the nested data.frame inside it (probdf$jsoninfo[[1]]) gives an error (correctly?).
library(tidyr)
probdf <-
structure(list(ceref = "AUF080",
jsoninfo = list(structure(list(regulatedActivity = structure(list(
status = c("R", "R", "R", "R", "A", "A", "A", "A"), actType = c(1L, 2L,
2L, 2L, 1L, 2L, 4L, 5L)), .Names = c("status", "actType"
), class = "data.frame", row.names = c(NA, 8L)), effectivePeriodList = list(structure(list(endDate = "Apr 25, 2013 12:00:00 AM",
effectiveDate = "Dec 1, 2009 12:00:00 AM"), .Names = c("endDate",
"effectiveDate"), class = "data.frame", row.names = 1L),
structure(list(endDate = "Apr 25, 2013 12:00:00 AM",
effectiveDate = "Dec 1, 2009 12:00:00 AM"), .Names = c("endDate",
"effectiveDate"), class = "data.frame", row.names = 1L),
structure(list(endDate = "Apr 23, 2013 12:00:00 AM",
effectiveDate = "Feb 4, 2010 12:00:00 AM"), .Names = c("endDate",
"effectiveDate"), class = "data.frame", row.names = 1L),
structure(list(endDate = "Feb 19, 2010 12:00:00 AM",
effectiveDate = "Dec 1, 2009 12:00:00 AM"), .Names = c("endDate",
"effectiveDate"), class = "data.frame", row.names = 1L),
structure(list(endDate = NA, effectiveDate = "Apr 25, 2013 12:00:00 AM"), .Names = c("endDate",
"effectiveDate"), class = "data.frame", row.names = 1L),
structure(list(endDate = NA, effectiveDate = "Apr 25, 2013 12:00:00 AM"), .Names = c("endDate",
"effectiveDate"), class = "data.frame", row.names = 1L),
structure(list(endDate = NA, effectiveDate = "Mar 17, 2015 12:00:00 AM"), .Names = c("endDate",
"effectiveDate"), class = "data.frame", row.names = 1L),
structure(list(endDate = NA, effectiveDate = "Mar 17, 2015 12:00:00 AM"), .Names = c("endDate",
"effectiveDate"), class = "data.frame", row.names = 1L))), .Names = c("regulatedActivity", "effectivePeriodList"), class = "data.frame", row.names = c(NA,
8L)))), .Names = c("ceref", "jsoninfo"), row.names = 5L, class = "data.frame")
unnest(probdf$jsoninfo[[1]])
#> Error: Each variable must be a 1d atomic vector or list.
#> Problem variables: 'regulatedActivity'
### NOT RUN
unnest(probdf)
Session info
devtools::session_info()
#> Session info--------------------------------------------------------------
#> setting value
#> version R version 3.3.3 (2017-03-06)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz Asia/Hong_Kong
#> Packages------------------------------------------------------------------
#> package * version date source
#> assertthat 0.1 2013-12-06 CRAN (R 3.3.1)
#> backports 1.0.5 2017-01-18 CRAN (R 3.3.2)
#> DBI 0.6 2017-03-09 CRAN (R 3.3.3)
#> devtools 1.6.1 2014-10-07 CRAN (R 3.1.1)
#> digest 0.6.10 2016-08-02 CRAN (R 3.3.1)
#> dplyr 0.5.0 2016-06-24 CRAN (R 3.3.1)
#> evaluate 0.10 2016-10-11 CRAN (R 3.3.3)
#> htmltools 0.3.5 2016-03-21 CRAN (R 3.3.1)
#> knitr 1.15.1 2016-11-22 CRAN (R 3.3.3)
#> lazyeval 0.2.0 2016-06-12 CRAN (R 3.3.1)
#> magrittr 1.5 2014-11-22 CRAN (R 3.3.1)
#> R6 2.1.3 2016-08-19 CRAN (R 3.3.1)
#> Rcpp 0.12.7 2016-09-05 CRAN (R 3.3.1)
#> rmarkdown 1.3 2016-12-21 CRAN (R 3.3.2)
#> rprojroot 1.2 2017-01-16 CRAN (R 3.3.2)
#> rstudioapi 0.6 2016-06-27 CRAN (R 3.3.3)
#> stringi 1.1.2 2016-10-01 CRAN (R 3.3.2)
#> stringr 1.1.0 2016-08-19 CRAN (R 3.3.1)
#> tibble 1.2 2016-08-26 CRAN (R 3.3.1)
#> tidyr * 0.6.1 2017-01-10 CRAN (R 3.3.3)
#> yaml 2.1.13 2014-06-12 CRAN (R 3.1.1)
Thanks for the report. Could you turn this into a minimal reprex, please? dput() dumps are not the most readable form for an example.
It's hard to minimise it much further. I did find a slightly simpler structure than the convoluted one above that still crashes unnest(): the nested data.frame now consists of just two data.frames. For context, the data.frame comes from jsonlite::fromJSON. Below is the simplest JSON I could find that causes the crash, along with a demonstration that even a small change avoids it:
library(tidyr)
library(jsonlite)
probdf <- fromJSON('[{"ceref":"AUF080","jsoninfo":[{"regulatedActivity":{"status":"R","actType":2},"effectivePeriodList":{"endDate":"Apr 25, 2013","effectiveDate":"Dec 1, 2009"}},{"regulatedActivity":{"status":"R","actType":2},"effectivePeriodList":{"endDate":"Apr 23, 2013","effectiveDate":"Feb 4, 2010"}},{"regulatedActivity":{"status":"R","actType":2},"effectivePeriodList":{"endDate":"Feb 19, 2010","effectiveDate":"Dec 1, 2009"}}]}]')
str(probdf)
#> 'data.frame': 1 obs. of 2 variables:
#> $ ceref : chr "AUF080"
#> $ jsoninfo:List of 1
#> ..$ :'data.frame': 3 obs. of 2 variables:
#> .. ..$ regulatedActivity :'data.frame': 3 obs. of 2 variables:
#> .. .. ..$ status : chr "R" "R" "R"
#> .. .. ..$ actType: int 2 2 2
#> .. ..$ effectivePeriodList:'data.frame': 3 obs. of 2 variables:
#> .. .. ..$ endDate : chr "Apr 25, 2013" "Apr 23, 2013" "Feb 19, 2010"
#> .. .. ..$ effectiveDate: chr "Dec 1, 2009" "Feb 4, 2010" "Dec 1, 2009"
# deleting any one row from the nested data.frame avoids the crash
probdf12 <- probdf
probdf13 <- probdf
probdf23 <- probdf
probdf12$jsoninfo[[1]] <- probdf12$jsoninfo[[1]][c(1,2),]
probdf13$jsoninfo[[1]] <- probdf13$jsoninfo[[1]][c(1,3),]
probdf23$jsoninfo[[1]] <- probdf23$jsoninfo[[1]][c(2,3),]
str(probdf12)
#> 'data.frame': 1 obs. of 2 variables:
#> $ ceref : chr "AUF080"
#> $ jsoninfo:List of 1
#> ..$ :'data.frame': 2 obs. of 2 variables:
#> .. ..$ regulatedActivity :'data.frame': 2 obs. of 2 variables:
#> .. .. ..$ status : chr "R" "R"
#> .. .. ..$ actType: int 2 2
#> .. ..$ effectivePeriodList:'data.frame': 2 obs. of 2 variables:
#> .. .. ..$ endDate : chr "Apr 25, 2013" "Apr 23, 2013"
#> .. .. ..$ effectiveDate: chr "Dec 1, 2009" "Feb 4, 2010"
unnest(probdf12)
#> ceref regulatedActivity effectivePeriodList
#> 1 AUF080 R, R Apr 25, 2013, Apr 23, 2013
#> 2 AUF080 2, 2 Dec 1, 2009, Feb 4, 2010
unnest(probdf13)
#> ceref regulatedActivity effectivePeriodList
#> 1 AUF080 R, R Apr 25, 2013, Feb 19, 2010
#> 2 AUF080 2, 2 Dec 1, 2009, Dec 1, 2009
unnest(probdf23)
#> ceref regulatedActivity effectivePeriodList
#> 1 AUF080 R, R Apr 23, 2013, Feb 19, 2010
#> 2 AUF080 2, 2 Feb 4, 2010, Dec 1, 2009
### NOT RUN
unnest(probdf) # crashes R
I have a similar problem with some tweet analysis.

I've put the offending file up for download at https://storage.googleapis.com/mark-edmondson-public-files/tidyr_bug.rds
Reproduce via:
download.file("https://storage.googleapis.com/mark-edmondson-public-files/tidyr_bug.rds", "tidyr_bug.rds")
problem <- readRDS("tidyr_bug.rds")
problem
## A tibble: 5 × 5
# status_id nlp sentiment_mag sentiment_score entities
#* <chr> <list> <dbl> <dbl> <list>
#1 861213541776449540 <list [5]> 0.1 -0.1 <data.frame [5 × 6]>
#2 861213390349496320 <list [5]> 0.3 0.3 <data.frame [5 × 6]>
#3 861211688015732736 <list [5]> 0.0 0.0 <data.frame [5 × 6]>
#4 861211516426608640 <list [5]> 0.0 0.0 <data.frame [4 × 6]>
#5 861211458419527680 <list [5]> 0.1 0.1 <data.frame [5 × 6]>
library(tidyr)
unnest(problem, entities)
# RStudio crashes
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_0.6.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.9 XML_3.98-1.4 assertthat_0.1 digest_0.6.12
[5] withr_1.0.2 mime_0.5 bitops_1.0-6 R6_2.2.0
[9] xtable_1.8-2 magrittr_1.5 httr_1.2.1.9000 googleAuthR_0.5.1.9000
[13] devtools_1.12.0.9000 RJSONIO_1.3-0 tools_3.3.2 RSelenium_1.4.2
[17] RCurl_1.95-4.8 shiny_1.0.0 httpuv_1.3.3 pkgload_0.0.0.9000
[21] pkgbuild_0.0.0.9000 caTools_1.17.1 memoise_1.0.0 htmltools_0.3.5
[25] tibble_1.2
I spotted the same crash. In my case, the tibbles within the list-column have identical column names but different column types. The following code crashes R:
WARNING ⚠️ CODE BELOW CRASHES R
library(tibble)
library(magrittr)
tib_one <- tibble(a = runif(10), b = runif(10))
tib_two <- tibble(a = runif(10), b = LETTERS[1:10])
tib_lc <- tibble(lc = list(tib_one, tib_two))
tib_lc %>% tidyr::unnest(lc)
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] purrr_0.2.2.2 tibble_1.3.1 magrittr_1.5
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 influxdbr_0.11.12 munsell_0.4.3 colorspace_1.3-2 lattice_0.20-35 R6_2.2.1
[7] rlang_0.1.1 httr_1.2.1 plyr_1.8.4 dplyr_0.5.0 tools_3.4.0 xts_0.9-7
[13] grid_3.4.0 gtable_0.2.0 DBI_0.6-1 yaml_2.1.14 lazyeval_0.2.0 assertthat_0.2.0
[19] crayon_1.3.2 ggplot2_2.2.1 tidyr_0.6.3 microbenchmark_1.4-2.1 testthat_1.0.2 curl_2.6
[25] compiler_3.4.0 scales_0.4.1 jsonlite_1.4 zoo_1.8-0
Edit: executing the code above on an RStudio Server instance results in an error instead of a crash:
Error in bind_rows_(x, .id) :
Can not automatically convert from numeric to character in column "b".
So maybe it's OS specific?
That narrows it down a bit for me: I also had duplicate names, but within the nest, e.g. data.frame(entities = nlp) where names(nlp) is "sentiment", "entities", etc. So perhaps renaming one column will stop the crash.
I can't replicate any of the crashes with dplyr 0.7.1. Can you please confirm?
Both macOS and Debian with dplyr 0.7.1 now give an error (as expected). Thanks!
Error in bind_rows_(x, .id) :
Column `b` can't be converted from numeric to character
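For anyone who lands here because of this conversion error: a minimal base-R sketch for spotting the offending columns before calling unnest(). The helper name mismatched_columns is hypothetical (it is not part of tidyr or dplyr), and it only compares the first class of each column, so treat it as a rough pre-check rather than a faithful copy of what bind_rows does internally.

```r
# Hypothetical helper (not part of tidyr/dplyr): for a list of data frames,
# return the columns whose (first) class differs between the data frames.
mismatched_columns <- function(dfs) {
  cols <- unique(unlist(lapply(dfs, names)))
  classes <- lapply(cols, function(col) {
    # collect the class of this column from every data frame that has it
    unique(unlist(lapply(dfs, function(d) {
      if (col %in% names(d)) class(d[[col]])[1]
    })))
  })
  names(classes) <- cols
  # keep only the columns that appear with more than one class
  classes[lengths(classes) > 1]
}

# Same shape as the example above: column b is numeric in one, character in the other
tib_one <- data.frame(a = runif(10), b = runif(10))
tib_two <- data.frame(a = runif(10), b = LETTERS[1:10], stringsAsFactors = FALSE)

bad <- mismatched_columns(list(tib_one, tib_two))
names(bad)
```

One possible follow-up is to coerce the reported columns to a common type (for example with as.character()) in each nested data frame before unnesting; whether that is appropriate depends on the data.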
I can confirm I get an error instead of a crash now:
> library(tidyr)
> library(jsonlite)
> probdf <- fromJSON('[{"ceref":"AUF080","jsoninfo":[{"regulatedActivity":{"status":"R","actType":2},"effectivePeriodList":{"endDate":"Apr 25, 2013","effectiveDate":"Dec 1, 2009"}},{"regulatedActivity":{"status":"R","actType":2},"effectivePeriodList":{"endDate":"Apr 23, 2013","effectiveDate":"Feb 4, 2010"}},{"regulatedActivity":{"status":"R","actType":2},"effectivePeriodList":{"endDate":"Feb 19, 2010","effectiveDate":"Dec 1, 2009"}}]}]')
> unnest(probdf)
Error in bind_rows_(x, .id) :
Argument 1 can't be a list containing data frames
However, data frames that could be unnested before now cannot:
> probdf12 <- probdf
> probdf12$jsoninfo[[1]] <- probdf12$jsoninfo[[1]][c(1,2),]
> unnest(probdf12)
Error in bind_rows_(x, .id) :
Argument 1 can't be a list containing data frames
In conclusion, the bug I filed is fixed, with some regression.
@slygent: Would you mind filing a new issue with some more context? What is the expected output?
I think I have a similar issue to the one presented in the previous comment.
df <- structure(list(location = "SPARTAN - CITEDEF", measurements = list(
structure(list(parameter = "pm25", value = 18.1, lastUpdated = "2015-04-15T00:00:00.000Z",
unit = "µg/m³", sourceName = "Spartan", averagingPeriod = structure(list(
unit = "hours", value = 1L), .Names = c("unit", "value"
), class = "data.frame", row.names = 1L)), .Names = c("parameter",
"value", "lastUpdated", "unit", "sourceName", "averagingPeriod"
), class = "data.frame", row.names = 1L))), .Names = c("location",
"measurements"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"))
df
#> location
#> 1 SPARTAN - CITEDEF
#> measurements
#> 1 pm25, 18.1, 2015-04-15T00:00:00.000Z, µg/m³, Spartan, hours, 1
class(df$measurements[[1]])
#> [1] "data.frame"
class(df$measurements[[1]]$averagingPeriod)
#> [1] "data.frame"
tidyr::unnest_(df, "measurements")
#> Error in bind_rows_(x, .id): Argument 6 can't be a list containing data frames
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.0 Patched (2017-05-10 r72669)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> tz Europe/Paris
#> date 2017-07-23
#> Packages -----------------------------------------------------------------
#> package * version date source
#> assertthat 0.2.0 2017-04-11 CRAN (R 3.4.0)
#> backports 1.0.5 2017-01-18 CRAN (R 3.4.0)
#> base * 3.4.0 2017-05-13 local
#> bindr 0.1 2016-11-13 CRAN (R 3.4.1)
#> bindrcpp 0.2 2017-06-17 CRAN (R 3.4.1)
#> compiler 3.4.0 2017-05-13 local
#> datasets * 3.4.0 2017-05-13 local
#> devtools 1.13.2 2017-06-02 CRAN (R 3.4.1)
#> digest 0.6.12 2017-01-27 CRAN (R 3.4.0)
#> dplyr 0.7.1 2017-06-22 CRAN (R 3.4.1)
#> evaluate 0.10 2016-10-11 CRAN (R 3.4.0)
#> glue 1.1.1 2017-06-21 CRAN (R 3.4.1)
#> graphics * 3.4.0 2017-05-13 local
#> grDevices * 3.4.0 2017-05-13 local
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
#> knitr 1.16 2017-05-18 CRAN (R 3.4.1)
#> magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
#> methods * 3.4.0 2017-05-13 local
#> pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.0)
#> R6 2.2.1 2017-05-10 CRAN (R 3.4.0)
#> Rcpp 0.12.11 2017-05-22 CRAN (R 3.4.0)
#> rlang 0.1.1 2017-05-18 CRAN (R 3.4.0)
#> rmarkdown 1.6 2017-06-15 CRAN (R 3.4.1)
#> rprojroot 1.2 2017-01-16 CRAN (R 3.4.0)
#> stats * 3.4.0 2017-05-13 local
#> stringi 1.1.5 2017-04-07 CRAN (R 3.4.0)
#> stringr 1.2.0 2017-02-18 CRAN (R 3.4.0)
#> tibble 1.3.3 2017-05-28 CRAN (R 3.4.0)
#> tidyr 0.6.3 2017-05-15 CRAN (R 3.4.0)
#> tools 3.4.0 2017-05-13 local
#> utils * 3.4.0 2017-05-13 local
#> withr 1.0.2 2016-06-20 CRAN (R 3.4.0)
#> yaml 2.1.14 2016-11-12 CRAN (R 3.4.0)
I get an error because I'm trying to unnest a data.frame with two levels of nested data.frames: measurements is a list-column of data.frames inside df, and each of those data.frames has a column (averagingPeriod) that is itself a data.frame.
I'd expect the unnest_(df, "measurements") to give me a data.frame df2 with a list-column averagingPeriod that contains data.frames, and then I'd apply unnest_(df2, "averagingPeriod") on it. Or maybe I'm thinking about it the wrong way. :-)
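A possible workaround, sketched under the assumption that flattening the inner data.frame column into prefixed columns is acceptable: jsonlite::flatten() turns a data.frame column such as averagingPeriod into plain columns (averagingPeriod.unit, averagingPeriod.value), after which the list-column only holds flat data.frames. The data below is a cut-down stand-in for the example above, not the original object, and this is not necessarily how tidyr is meant to handle this case.

```r
library(jsonlite)
library(tidyr)

# Cut-down stand-in for df$measurements[[1]]: a data.frame with a data.frame column
inner <- data.frame(parameter = "pm25", value = 18.1, stringsAsFactors = FALSE)
inner$averagingPeriod <- data.frame(unit = "hours", value = 1L,
                                    stringsAsFactors = FALSE)

# Outer data.frame with a list-column of data.frames
df <- data.frame(location = "SPARTAN - CITEDEF", stringsAsFactors = FALSE)
df$measurements <- list(inner)

# Flatten every nested data.frame so that no data.frame columns remain inside
df$measurements <- lapply(df$measurements, jsonlite::flatten)

# The list-column now contains only flat data.frames and can be unnested
res <- tidyr::unnest(df, measurements)
names(res)
```

The trade-off is that the second level of nesting is lost: the averagingPeriod fields come back as ordinary columns rather than as a list-column that could be unnested in a second step.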
I'm no longer seeing the crash, so I'm going to close this issue.