When a data frame contains a list-column with a subclass, unnest() throws an error, even when the column being unnested is a regular list-column of data frames.
library(tidyr)
library(tibble)
df <-
tibble(
x = 1:2,
y = list(
tibble(a = 1, b = 2),
tibble(a = 1:3, b = 3:1)
),
z = list(
c(2, 5),
c(3, 6)
)
)
class(df$z) <- c("custom_subclass", "list")
unnest(df, y)
#> Error: Can't slice a scalar
This used to work with tidyr < v1.0 (and still does with unnest_legacy()), and I don't understand why it shouldn't anymore:
unnest_legacy(df, y, .drop = FALSE)
#> # A tibble: 4 x 4
#> x z a b
#> <int> <list> <dbl> <dbl>
#> 1 1 <dbl [2]> 1 2
#> 2 2 <dbl [2]> 1 3
#> 3 2 <dbl [2]> 2 2
#> 4 2 <dbl [2]> 3 1
Is this intentional (and if it is, what am I missing?) or is it a bug?
As far as I can tell this is due to vctrs::vec_slice() refusing to slice a dataframe with said custom subclass list column.
vctrs::vec_slice(df, 1)
#> Error: Can't slice a scalar
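To narrow down which column makes the slicing fail, one option is to try vctrs::vec_slice() on each column separately. This is only a sketch: can_slice() is a hypothetical helper, not a vctrs function, and the lm fit is included purely as an example of an object vctrs treats as a scalar. Whether the custom_subclass column is reported depends on the installed vctrs version, so no output is shown.

```r
library(vctrs)

# Hypothetical helper (not part of vctrs): TRUE if vec_slice() accepts
# `col`, FALSE if it signals an error.
can_slice <- function(col) {
  tryCatch(
    {
      vec_slice(col, 1L)
      TRUE
    },
    error = function(e) FALSE
  )
}

# Candidate columns, kept in a plain list so each can be checked on its own.
cols <- list(
  x = 1:2,
  y = list(1, 2),
  z = structure(list(c(2, 5), c(3, 6)), class = c("custom_subclass", "list")),
  m = lm(mpg ~ wt, data = mtcars)  # a scalar object as far as vctrs is concerned
)

sliceable <- vapply(cols, can_slice, logical(1))

# Names of the columns that vec_slice() rejects
names(sliceable)[!sliceable]
```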
FWIW, the same problem occurs when z has a custom class (which is still of type list)
class(df$z) <- c("custom_class")
typeof(df$z)
#> [1] "list"
unnest(df, y)
#> Error: Can't slice a scalar
devtools::session_info()
#> - Session info ----------------------------------------------------------
#> setting value
#> version R version 3.6.1 (2019-07-05)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2019-09-24
#>
#> - Packages --------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
#> backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
#> callr 3.3.2 2019-09-22 [1] CRAN (R 3.6.1)
#> cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.1)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.1)
#> devtools 2.2.0 2019-09-07 [1] CRAN (R 3.6.1)
#> digest 0.6.21 2019-09-20 [1] CRAN (R 3.6.1)
#> dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.1)
#> DT 0.9 2019-09-17 [1] CRAN (R 3.6.1)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
#> fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.1)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.1)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.1)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.1)
#> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.1)
#> htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.6.1)
#> knitr 1.25 2019-09-18 [1] CRAN (R 3.6.1)
#> lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.1)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.1)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.1)
#> pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.1)
#> pkgbuild 1.0.5 2019-08-26 [1] CRAN (R 3.6.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.1)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.1)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.1)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.1)
#> purrr 0.3.2 2019-03-15 [1] CRAN (R 3.6.1)
#> R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.1)
#> Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.1)
#> remotes 2.1.0.9000 2019-07-22 [1] Github (r-lib/remotes@6e9eaa9)
#> rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.1)
#> rmarkdown 1.15 2019-08-21 [1] CRAN (R 3.6.1)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
#> testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.1)
#> tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.6.1)
#> tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.6.1)
#> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.1)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.1)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.1)
#> vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.1)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.1)
#> xfun 0.9 2019-08-21 [1] CRAN (R 3.6.1)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#> zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.1)
#>
#> [1] C:/Users/daniel/Documents/.R/win-library
#> [2] C:/Program Files/R/R-3.6.1/library
This is obviously a bug
Or is a vec_proxy() method required? If I remember right we treat subclassed lists as scalars to prevent slicing of things like lm objects.
Maybe if it directly subclasses "list", rather than implicitly inheriting from it, it should be treated as a vector?
library(tidyr)
library(tibble)
library(vctrs)
df <- tibble(
x = 1,
y = list(tibble(a = 1)),
z = list(1)
)
class(df$z) <- c("custom_subclass", "list")
vec_slice(df$z, 1)
#> Expected a vector, not a `custom_subclass/list` object
unnest(df, y)
#> Error: Can't slice a scalar
vec_proxy.custom_subclass <- function(x, ...) {
unclass(x)
}
vec_slice(df$z, 1)
#> [[1]]
#> [1] 1
#>
#> attr(,"class")
#> [1] "custom_subclass" "list"
unnest(df, y)
#> # A tibble: 1 x 3
#> x a z
#> <dbl> <dbl> <list>
#> 1 1 1 <dbl [1]>
Created on 2019-09-24 by the reprex package (v0.2.1)
I tried the vec_proxy() method in my setting before opening this issue and it did not work. Don't really know why, because I'd say I did the same things you did here, @DavisVaughan, but apparently not. Will look into it more tomorrow.
But on a larger scale this would mean that one would only be able to use unnest() if one has full control of custom_subclass, right? (Disregarding defining the vec_proxy() method manually every time you want to use unnest).
Do I understand @hadley's comment "presence of the list class as a signal that the object is a vector" correctly to mean that a vec_proxy() method wouldn't be required in the package owning the custom_subclass?
Correct. We are starting to think that if you _explicitly_ inherit from "list" as you did above with "custom_subclass", then your example above should work without requiring a vec_proxy() method. The changes for this would be made in vctrs.
An _implicit_ inheritance from list would be something like an lm object. These would still be treated as scalar objects by default.
class(lm(1 ~ 1))
#> [1] "lm"
is.list(lm(1 ~ 1))
#> [1] TRUE
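The explicit/implicit distinction can be checked in base R by looking at the class vector. This is a minimal sketch; names_list_in_class() is a hypothetical helper for illustration and is not how vctrs implements its check.

```r
# An object that explicitly lists "list" in its class vector
custom <- structure(list(1), class = c("custom_subclass", "list"))

# An object that is only implicitly a list: built on a list, but its class
# vector is just "lm"
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical helper: does the class vector mention "list"?
names_list_in_class <- function(x) "list" %in% class(x)

# Both are lists as far as base R is concerned ...
is.list(custom)
is.list(fit)

# ... but only the first one says so in its class vector
names_list_in_class(custom)
names_list_in_class(fit)
```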
This might be known/obvious to you, but this bug extends to (character) vector columns with subclasses (so not only list() columns and not only custom subclasses).
library(tidyr)
library(glue)
g <- glue("some glue string")
class(g)
#> [1] "glue" "character"
df <- tibble(
x = 1:2,
y = list(
tibble(a = 1, b = 2, g = g),
tibble(a = 1:3, b = 3:1, g = rep.int(g, 3))
)
)
df %>% unnest(y)
#> Only bare vectors have shapes.
devtools::session_info()
#> - Session info ----------------------------------------------------------
#> setting value
#> version R version 3.6.1 (2019-07-05)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2019-10-10
#>
#> - Packages --------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
#> callr 3.3.2 2019-09-22 [1] CRAN (R 3.6.1)
#> cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.1)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.1)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.1)
#> digest 0.6.21 2019-09-20 [1] CRAN (R 3.6.1)
#> dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.1)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.1)
#> glue * 1.3.1 2019-03-12 [1] CRAN (R 3.6.1)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.1)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
#> knitr 1.25 2019-09-18 [1] CRAN (R 3.6.1)
#> lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.1)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.1)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.1)
#> pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.1)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.1)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.1)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.1)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.1)
#> purrr 0.3.2 2019-03-15 [1] CRAN (R 3.6.1)
#> R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.1)
#> Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.1)
#> remotes 2.1.0.9000 2019-07-22 [1] Github (r-lib/remotes@6e9eaa9)
#> rlang 0.4.0.9004 2019-10-10 [1] Github (r-lib/rlang@a7d8177)
#> rmarkdown 1.16 2019-10-01 [1] CRAN (R 3.6.1)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
#> testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.1)
#> tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.1)
#> tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.6.1)
#> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.1)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.1)
#> vctrs 0.2.0.9005 2019-10-10 [1] Github (r-lib/vctrs@1a96680)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.1)
#> xfun 0.10 2019-10-01 [1] CRAN (R 3.6.1)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#> zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.1)
#>
#> [1] C:/Users/daniel/Documents/.R/win-library
#> [2] C:/Program Files/R/R-3.6.1/library
We have seen that one here: https://github.com/r-lib/vctrs/issues/497
This in particular is a different problem, but I do wonder if there are any advantages / disadvantages to letting a glue string be a vctrs_vctr with extends_type = TRUE set in new_vctr(), as discussed in https://github.com/r-lib/vctrs/pull/593#discussion_r331987167
I encountered a similar problem when trying to convert the result of a query to the GitHub API with {gh}. Adding it here in case somebody searches for issues with unnest_wider().
unnest_wider() fails as long as the column has a custom class
library(gh)
library(dplyr)
repo_api <- gh("/users/:username/repos", username = "jeroen")
tibble(repos = repo_api) %>%
purrr::walk(~print(class(.))) %>%
tidyr::unnest_wider(col = repos)
[1] "gh_response" "list" # repos column has a custom class
Error: Can't slice a scalar
unnest_wider() works as soon as the column is back to the default list class
Looks like using head() or dplyr::slice() converts the column back to the default list class.
tibble(repos = repo_api) %>%
purrr::walk(~print(class(.))) %>%
slice(1:length(repo_api)) %>%
purrr::walk(~print(class(.))) %>%
tidyr::unnest_wider(col = repos) %>%
glimpse()
[1] "gh_response" "list" # before slice()
[1] "list" # after slice(), repos column lost its custom class
Observations: 30
Variables: 73
$ id <int> 130482052, 9…
$ node_id <chr> "MDEwOlJlcG9…
$ name <chr> "2018.erum.i…
$ full_name <chr> "jeroen/2018…
$ private <lgl> FALSE, FALSE…
$ owner <list> [["jeroen",…
$ html_url <chr> "https://git…
$ description <chr> "Homepage of…
... more columns ...
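A more direct way to get the same effect may be to drop the class attribute with unclass() before building the tibble. The sketch below uses a small made-up stand-in for a gh() result (fake_response, with invented fields) so it runs without a network call. A real gh_response carries additional attributes that unclass() leaves in place, so treat this as an assumption to verify on real data.

```r
library(tidyr)
library(tibble)

# Made-up stand-in for a gh() result: a list of records with the same
# class vector as in the output above. No API call is made.
fake_response <- structure(
  list(
    list(id = 1L, name = "repo-a"),
    list(id = 2L, name = "repo-b")
  ),
  class = c("gh_response", "list")
)

# unclass() removes the class attribute, leaving a bare list column
repos_tbl <- tibble(repos = unclass(fake_response))

# With a bare list column, each record becomes one row with one column
# per field
wide <- unnest_wider(repos_tbl, repos)
wide
```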
The issue with classes based on atomic vectors is fixed with dev tidyr (via dev vctrs).
The issue with classes based on lists is tracked at https://github.com/r-lib/vctrs/issues/666
I'm going to close this issue because it's tracked in vctrs, it'll be fixed in the next vctrs release, and the next tidyr release will bump the version of the vctrs dependency.