Tidyr: unnest errors when df contains a column with subclass or custom class

Created on 24 Sep 2019 · 10 Comments · Source: tidyverse/tidyr

When a data frame contains a list-column with a subclass, unnest() throws an error, even when the column being unnested is a regular list of data frames.

library(tidyr)
library(tibble)

df <-
  tibble(
    x = 1:2,
    y = list(
      tibble(a = 1, b = 2),
      tibble(a = 1:3, b = 3:1)
    ),
    z = list(
      c(2, 5),
      c(3, 6)
    )
  )

class(df$z) <- c("custom_subclass", "list")

unnest(df, y)
#> Error: Can't slice a scalar

This used to work with tidyr < v1.0 (and still does with unnest_legacy()), and I don't understand why it shouldn't anymore:

unnest_legacy(df, y, .drop = FALSE)
#> # A tibble: 4 x 4
#>       x z             a     b
#>   <int> <list>    <dbl> <dbl>
#> 1     1 <dbl [2]>     1     2
#> 2     2 <dbl [2]>     1     3
#> 3     2 <dbl [2]>     2     2
#> 4     2 <dbl [2]>     3     1

Is this intentional (and if it is, what am I missing?) or is it a bug?

As far as I can tell this is due to vctrs::vec_slice() refusing to slice a dataframe with said custom subclass list column.

vctrs::vec_slice(df, 1)
#> Error: Can't slice a scalar

FWIW, the same problem occurs when z has a custom class (while still being of type list):

class(df$z) <- c("custom_class")

typeof(df$z)
#> [1] "list"

unnest(df, y)
#> Error: Can't slice a scalar

Session info

devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language en                          
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2019-09-24                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib source                        
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.1)                
#>  backports     1.1.4      2019-04-10 [1] CRAN (R 3.6.0)                
#>  callr         3.3.2      2019-09-22 [1] CRAN (R 3.6.1)                
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.1)                
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.1)                
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.1)                
#>  devtools      2.2.0      2019-09-07 [1] CRAN (R 3.6.1)                
#>  digest        0.6.21     2019-09-20 [1] CRAN (R 3.6.1)                
#>  dplyr         0.8.3      2019-07-04 [1] CRAN (R 3.6.1)                
#>  DT            0.9        2019-09-17 [1] CRAN (R 3.6.1)                
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)                
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.1)                
#>  fansi         0.4.0      2018-10-05 [1] CRAN (R 3.6.1)                
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.1)                
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.1)                
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.1)                
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.6.1)                
#>  htmlwidgets   1.3        2018-09-30 [1] CRAN (R 3.6.1)                
#>  knitr         1.25       2019-09-18 [1] CRAN (R 3.6.1)                
#>  lifecycle     0.1.0      2019-08-01 [1] CRAN (R 3.6.1)                
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.1)                
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.1)                
#>  pillar        1.4.2      2019-06-29 [1] CRAN (R 3.6.1)                
#>  pkgbuild      1.0.5      2019-08-26 [1] CRAN (R 3.6.1)                
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.1)                
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.1)                
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.1)                
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.1)                
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.1)                
#>  purrr         0.3.2      2019-03-15 [1] CRAN (R 3.6.1)                
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.6.1)                
#>  Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.6.1)                
#>  remotes       2.1.0.9000 2019-07-22 [1] Github (r-lib/remotes@6e9eaa9)
#>  rlang         0.4.0      2019-06-25 [1] CRAN (R 3.6.1)                
#>  rmarkdown     1.15       2019-08-21 [1] CRAN (R 3.6.1)                
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.1)                
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.1)                
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.1)                
#>  testthat      2.2.1      2019-07-25 [1] CRAN (R 3.6.1)                
#>  tibble      * 2.1.3      2019-06-06 [1] CRAN (R 3.6.1)                
#>  tidyr       * 1.0.0      2019-09-11 [1] CRAN (R 3.6.1)                
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.6.1)                
#>  usethis       1.5.1      2019-07-04 [1] CRAN (R 3.6.1)                
#>  utf8          1.1.4      2018-05-24 [1] CRAN (R 3.6.1)                
#>  vctrs         0.2.0      2019-07-05 [1] CRAN (R 3.6.1)                
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.1)                
#>  xfun          0.9        2019-08-21 [1] CRAN (R 3.6.1)                
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.6.0)                
#>  zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.6.1)                
#> 
#> [1] C:/Users/daniel/Documents/.R/win-library
#> [2] C:/Program Files/R/R-3.6.1/library

bug vctrs ↗️

All 10 comments

This is obviously a bug

Or is a vec_proxy() method required? If I remember right we treat subclassed lists as scalars to prevent slicing of things like lm objects.

Maybe if it directly subclasses "list", rather than implicitly inheriting from it, it should be treated as a vector?

library(tidyr)
library(tibble)
library(vctrs)

df <- tibble(
  x = 1,
  y = list(tibble(a = 1)),
  z = list(1)
)

class(df$z) <- c("custom_subclass", "list")

vec_slice(df$z, 1)
#> Expected a vector, not a `custom_subclass/list` object

unnest(df, y)
#> Error: Can't slice a scalar

vec_proxy.custom_subclass <- function(x, ...) {
  unclass(x)
}

vec_slice(df$z, 1)
#> [[1]]
#> [1] 1
#> 
#> attr(,"class")
#> [1] "custom_subclass" "list"

unnest(df, y)
#> # A tibble: 1 x 3
#>       x     a z        
#>   <dbl> <dbl> <list>   
#> 1     1     1 <dbl [1]>

Created on 2019-09-24 by the reprex package (v0.2.1)

I tried the vec_proxy() method in my setting before opening this issue and it did not work. Don't really know why, because I'd say I did the same things you did here, @DavisVaughan, but apparently not. Will look into it more tomorrow.

But on a larger scale this would mean that one would only be able to use unnest() if one has full control of custom_subclass, right? (Disregarding defining the vec_proxy() method manually every time you want to use unnest).
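For readers who do not control custom_subclass, one possible stopgap is to strip the class from the column before calling unnest(). This is not something proposed in the thread, and it assumes that losing the class attribute on z is acceptable. A minimal sketch using the data from the original report (df_plain and out are illustrative names):

```r
library(tidyr)
library(tibble)

df <- tibble(
  x = 1:2,
  y = list(
    tibble(a = 1, b = 2),
    tibble(a = 1:3, b = 3:1)
  ),
  z = list(
    c(2, 5),
    c(3, 6)
  )
)
class(df$z) <- c("custom_subclass", "list")

# Assumption: the custom class on z is not needed downstream,
# so the column can be turned back into a bare list.
df_plain <- df
df_plain$z <- unclass(df_plain$z)

class(df_plain$z)
#> [1] "list"

# z is now a bare list, so it can be sliced while y is unnested.
out <- unnest(df_plain, y)
nrow(out)
#> [1] 4
```

If the class is needed afterwards, it would have to be reattached to the result by hand, which is why this is only a stopgap.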

Do I understand @hadley's comment "presence of the list class as a signal that the object is a vector" correctly to mean that a vec_proxy() method wouldn't be required in the package owning the custom_subclass?

Correct. We are starting to think that if you _explicitly_ inherit from "list" as you did above with "custom_subclass", then your example above should work without requiring a vec_proxy() method. The changes for this would be made in vctrs.

An _implicit_ inheritance from list would be something like an lm object. These would still be treated as scalar objects by default.

class(lm(1 ~ 1))
#> [1] "lm"

is.list(lm(1 ~ 1))
#> [1] TRUE
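The explicit versus implicit distinction can be illustrated in base R. The helper explicitly_list() below is hypothetical and is not a vctrs function; it only mirrors the heuristic being discussed and says nothing about how vctrs implements it.

```r
# Hypothetical helper: TRUE when the object is a list and "list"
# appears in its class vector (bare lists count, since class() reports "list").
explicitly_list <- function(x) {
  is.list(x) && inherits(x, "list")
}

z <- structure(list(c(2, 5), c(3, 6)), class = c("custom_subclass", "list"))
fit <- lm(1 ~ 1)

explicitly_list(list(1, 2))
#> [1] TRUE

explicitly_list(z)
#> [1] TRUE

# An lm object is built on a list, but "list" is not in its class vector.
explicitly_list(fit)
#> [1] FALSE
```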

This might be known/obvious to you, but this bug extends to (character) vector columns with subclasses (so not only list() columns and not only custom subclasses).

library(tidyr)
library(glue)

g <- glue("some glue string")
class(g)
#> [1] "glue"      "character"

df <- tibble(
  x = 1:2,
  y = list(
    tibble(a = 1, b = 2, g = g),
    tibble(a = 1:3, b = 3:1, g = rep.int(g, 3))
  )
)

df %>% unnest(y)
#> Only bare vectors have shapes.

Session info

devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language en                          
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2019-10-10                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib source                        
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.1)                
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.1)                
#>  callr         3.3.2      2019-09-22 [1] CRAN (R 3.6.1)                
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.1)                
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.1)                
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.1)                
#>  devtools      2.2.1      2019-09-24 [1] CRAN (R 3.6.1)                
#>  digest        0.6.21     2019-09-20 [1] CRAN (R 3.6.1)                
#>  dplyr         0.8.3      2019-07-04 [1] CRAN (R 3.6.1)                
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)                
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.1)                
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.1)                
#>  glue        * 1.3.1      2019-03-12 [1] CRAN (R 3.6.1)                
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.1)                
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.1)                
#>  knitr         1.25       2019-09-18 [1] CRAN (R 3.6.1)                
#>  lifecycle     0.1.0      2019-08-01 [1] CRAN (R 3.6.1)                
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.1)                
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.1)                
#>  pillar        1.4.2      2019-06-29 [1] CRAN (R 3.6.1)                
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.6.1)                
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.1)                
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.1)                
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.1)                
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.1)                
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.1)                
#>  purrr         0.3.2      2019-03-15 [1] CRAN (R 3.6.1)                
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.6.1)                
#>  Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.6.1)                
#>  remotes       2.1.0.9000 2019-07-22 [1] Github (r-lib/remotes@6e9eaa9)
#>  rlang         0.4.0.9004 2019-10-10 [1] Github (r-lib/rlang@a7d8177)  
#>  rmarkdown     1.16       2019-10-01 [1] CRAN (R 3.6.1)                
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.1)                
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.1)                
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.1)                
#>  testthat      2.2.1      2019-07-25 [1] CRAN (R 3.6.1)                
#>  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.6.1)                
#>  tidyr       * 1.0.0      2019-09-11 [1] CRAN (R 3.6.1)                
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.6.1)                
#>  usethis       1.5.1      2019-07-04 [1] CRAN (R 3.6.1)                
#>  vctrs         0.2.0.9005 2019-10-10 [1] Github (r-lib/vctrs@1a96680)  
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.1)                
#>  xfun          0.10       2019-10-01 [1] CRAN (R 3.6.1)                
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.6.0)                
#>  zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.6.1)                
#> 
#> [1] C:/Users/daniel/Documents/.R/win-library
#> [2] C:/Program Files/R/R-3.6.1/library

We have seen that one here: https://github.com/r-lib/vctrs/issues/497

This in particular is a different problem, but I do wonder whether there are any advantages or disadvantages to letting a glue string be a vctrs_vctr with extends_type = TRUE set in new_vctr(), as discussed in https://github.com/r-lib/vctrs/pull/593#discussion_r331987167

I encountered a similar problem when trying to convert the result of a query to the GitHub API with {gh}. Adding it here in case somebody searches for issues with unnest_wider().

unnest_wider() fails as long as the column has a custom class

library(gh)
library(dplyr)

repo_api <- gh("/users/:username/repos", username = "jeroen")

tibble(repos = repo_api) %>% 
  purrr::walk(~print(class(.))) %>% 
  tidyr::unnest_wider(col = repos)
[1] "gh_response" "list"       # repos column has a custom class
Error: Can't slice a scalar

unnest_wider() works as soon as the column is back to default list class

Looks like using head() or dplyr::slice() will convert the column back to a default list class.

tibble(repos = repo_api) %>% 
    purrr::walk(~print(class(.))) %>% 
    slice(1:length(repo_api)) %>% 
    purrr::walk(~print(class(.))) %>%
    tidyr::unnest_wider(col = repos) %>% 
    glimpse()
[1] "gh_response" "list" # before slice()
[1] "list" # after slice(), repos column lost its custom class
Observations: 30
Variables: 73
$ id                <int> 130482052, 9…
$ node_id           <chr> "MDEwOlJlcG9…
$ name              <chr> "2018.erum.i…
$ full_name         <chr> "jeroen/2018…
$ private           <lgl> FALSE, FALSE…
$ owner             <list> [["jeroen",…
$ html_url          <chr> "https://git…
$ description       <chr> "Homepage of…
... more columns ...

The issue with classes based on atomic vectors is fixed with dev tidyr (via dev vctrs).

The issue with classes based on lists is tracked at https://github.com/r-lib/vctrs/issues/666

I'm going to close this issue because it's tracked in vctrs, it'll be fixed in the next vctrs release, and the next tidyr release will bump the version of the vctrs dependency.
