Sf: `nest()` and `unnest()` fail with tidyr (development version 0.8.3.9000)

Created on 4 Jun 2019  ·  24 Comments  ·  Source: r-spatial/sf

The tests fail when the development version of tidyr is installed. Specifically, they fail in the following two places.

In addition, separate_rows() cannot be executed. (The first report is as it was here.)

library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.1.2, PROJ 4.9.3
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
lapply(c("sf", "dplyr", "tidyr"), packageVersion)
#> [[1]]
#> [1] '0.7.5'
#> 
#> [[2]]
#> [1] '0.8.1'
#> 
#> [[3]]
#> [1] '0.8.3.9000'

d <- st_as_sf(data.frame(
    x = seq_len(3),
    y = c("a", "d,e,f", "g,h"),
    geometry = st_sfc(st_point(c(1, 1)),
                      st_point(c(2, 2)),
                      st_point(c(3, 3))),
    stringsAsFactors = FALSE))

d %>% 
    separate_rows(y, convert = TRUE)
#> Error: Can't slice a scalar

Created on 2019-06-04 by the reprex package (v0.3.0)

Session info

devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2019-04-26)
#>  os       Debian GNU/Linux 9 (stretch)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Etc/UTC                     
#>  date     2019-06-04                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                          
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                  
#>  backports     1.1.4      2019-04-10 [1] CRAN (R 3.6.0)                  
#>  callr         3.2.0      2019-03-15 [1] CRAN (R 3.6.0)                  
#>  class         7.3-15     2019-01-01 [2] CRAN (R 3.6.0)                  
#>  classInt      0.3-3      2019-04-26 [1] CRAN (R 3.6.0)                  
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.0)                  
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.0)                  
#>  DBI           1.0.0      2018-05-02 [1] CRAN (R 3.6.0)                  
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.0)                  
#>  devtools      2.0.2      2019-04-08 [1] CRAN (R 3.6.0)                  
#>  digest        0.6.19     2019-05-20 [1] CRAN (R 3.6.0)                  
#>  dplyr       * 0.8.1      2019-05-14 [1] CRAN (R 3.6.0)                  
#>  e1071         1.7-1      2019-03-19 [1] CRAN (R 3.6.0)                  
#>  ellipsis      0.1.0.9000 2019-06-02 [1] Github (r-lib/ellipsis@d8bf8a3) 
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)                  
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)                  
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)                  
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.6.0)                  
#>  KernSmooth    2.23-15    2015-06-29 [2] CRAN (R 3.6.0)                  
#>  knitr         1.23       2019-05-18 [1] CRAN (R 3.6.0)                  
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)                  
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                  
#>  pillar        1.4.1      2019-05-28 [1] CRAN (R 3.6.0)                  
#>  pkgbuild      1.0.3      2019-03-20 [1] CRAN (R 3.6.0)                  
#>  pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.6.0)                  
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)                  
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.0)                  
#>  processx      3.3.1      2019-05-08 [1] CRAN (R 3.6.0)                  
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)                  
#>  purrr         0.3.2      2019-03-15 [1] CRAN (R 3.6.0)                  
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.6.0)                  
#>  Rcpp          1.0.1      2019-03-17 [1] CRAN (R 3.6.0)                  
#>  remotes       2.0.4      2019-04-10 [1] CRAN (R 3.6.0)                  
#>  rlang         0.3.4.9003 2019-06-02 [1] Github (r-lib/rlang@6a232c0)    
#>  rmarkdown     1.13       2019-05-22 [1] CRAN (R 3.6.0)                  
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)                  
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                  
#>  sf          * 0.7-5      2019-06-02 [1] Github (r-spatial/sf@20c6292)   
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                  
#>  testthat      2.1.1      2019-04-23 [1] CRAN (R 3.6.0)                  
#>  tibble        2.1.2      2019-05-29 [1] CRAN (R 3.6.0)                  
#>  tidyr       * 0.8.3.9000 2019-06-02 [1] Github (tidyverse/tidyr@56fb136)
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.6.0)                  
#>  units         0.6-3      2019-05-03 [1] CRAN (R 3.6.0)                  
#>  usethis       1.5.0      2019-04-07 [1] CRAN (R 3.6.0)                  
#>  vctrs         0.1.0.9003 2019-06-02 [1] Github (r-lib/vctrs@e0c0ed4)    
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.0)                  
#>  xfun          0.7        2019-05-14 [1] CRAN (R 3.6.0)                  
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.6.0)                  
#>  zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.6.0)                  
#> 
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library

All 24 comments

Similar to the discussion in r-lib/vctrs/issues/362, I can solve the problem by adding an sfc method for vctrs::vec_restore() and modifying nest.sf() as follows.

library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.1.2, PROJ 4.9.3
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
data(storms, package = "dplyr")

data <-
    storms %>%
    st_as_sf(coords = c("long", "lat"), crs = 4326) %>%
    group_by(year, name)

data %>%
    nest()
#> Quosures can only be unquoted within a quasiquotation context.
#> 
#>   # Bad:
#>   list(!!myquosure)
#> 
#>   # Good:
#>   dplyr::mutate(data, !!myquosure)

# Update nest.sf()
nest.sf = function(data, ...) {
    class(data) <- setdiff(class(data), "sf")
    if (!requireNamespace("rlang", quietly = TRUE))
        stop("rlang required: install first?")
    if (!requireNamespace("tidyr", quietly = TRUE))
        stop("tidyr required: install first?")
    ret = tidyr::nest(data)
    ret$data <-
        lapply(ret$data, st_sf)
    ret
}

vec_proxy.sfc <- function(x) {
    class(x) <- setdiff(class(x), "sfc")
    vctrs::vec_proxy(x)
}

x <- data %>%
    nest()
x
#> # A tibble: 426 x 3
#>    name      year data              
#>    <chr>    <dbl> <list>            
#>  1 Amy       1975 <tibble [30 × 10]>
#>  2 Caroline  1975 <tibble [33 × 10]>
#>  3 Doris     1975 <tibble [23 × 10]>
#>  4 Belle     1976 <tibble [18 × 10]>
#>  5 Gloria    1976 <tibble [34 × 10]>
#>  6 Anita     1977 <tibble [20 × 10]>
#>  7 Clara     1977 <tibble [24 × 10]>
#>  8 Evelyn    1977 <tibble [9 × 10]> 
#>  9 Amelia    1978 <tibble [6 × 10]> 
#> 10 Bess      1978 <tibble [13 × 10]>
#> # … with 416 more rows

x %>% 
    unnest(cols = data)
#> # A tibble: 10,010 x 12
#>    name   year month   day  hour status category  wind pressure ts_diameter
#>    <chr> <dbl> <dbl> <int> <dbl> <chr>  <ord>    <int>    <int>       <dbl>
#>  1 Amy    1975     6    27     0 tropi… -1          25     1013          NA
#>  2 Amy    1975     6    27     6 tropi… -1          25     1013          NA
#>  3 Amy    1975     6    27    12 tropi… -1          25     1013          NA
#>  4 Amy    1975     6    27    18 tropi… -1          25     1013          NA
#>  5 Amy    1975     6    28     0 tropi… -1          25     1012          NA
#>  6 Amy    1975     6    28     6 tropi… -1          25     1012          NA
#>  7 Amy    1975     6    28    12 tropi… -1          25     1011          NA
#>  8 Amy    1975     6    28    18 tropi… -1          30     1006          NA
#>  9 Amy    1975     6    29     0 tropi… 0           35     1004          NA
#> 10 Amy    1975     6    29     6 tropi… 0           40     1002          NA
#> # … with 10,000 more rows, and 2 more variables: hu_diameter <dbl>,
#> #   geometry <POINT [°]>

Created on 2019-06-04 by the reprex package (v0.3.0)

This seems brittle (since we don't have a full vctrs implementation), but I'm fairly new to vctrs. Maybe @krlmlr has an opinion?

Oh, and thanks @uribo for coming up with the heads up AND a solution!

I wonder if it's sufficient to implement vec_restore.sf() without adding a nest.sf() method.

CC @lionel-.

The intention is to replace all custom methods by two implementations of vec_proxy() and vec_restore(). It might still be a bit early to do so, but any feedback would be great.
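As a rough illustration of what such a pair of methods could look like, here is a minimal sketch on a toy class. Everything named `toy_pts` is hypothetical and only stands in for `sfc`; the cached `xrange` attribute loosely plays the role of a bounding box. It assumes a vctrs version in which `vec_slice()` goes through `vec_proxy()` and `vec_restore()`, and it is not sf's actual implementation.

```r
library(vctrs)

# Hypothetical constructor for a toy class standing in for sfc: a list of
# c(x, y) points that caches the range of x as an attribute.
new_toy_pts <- function(pts = list()) {
  xs <- vapply(pts, function(p) p[[1]], numeric(1))
  xrange <- if (length(xs) > 0) range(xs) else c(NA_real_, NA_real_)
  structure(pts, xrange = xrange, class = "toy_pts")
}

# Proxy: hand vctrs a bare list, dropping the class and the cached attribute.
vec_proxy_toy_pts <- function(x, ...) {
  attributes(x) <- NULL
  x
}

# Restore: rebuild the object from the manipulated proxy and recompute the
# cached attribute instead of copying a stale one.
vec_restore_toy_pts <- function(x, to, ...) {
  new_toy_pts(x)
}

# Register the methods on the vctrs generics. A package would normally export
# them as S3 methods instead of registering them by hand.
registerS3method("vec_proxy", "toy_pts", vec_proxy_toy_pts,
                 envir = asNamespace("vctrs"))
registerS3method("vec_restore", "toy_pts", vec_restore_toy_pts,
                 envir = asNamespace("vctrs"))

pts <- new_toy_pts(list(c(1, 1), c(2, 2), c(3, 3)))

# Slicing goes through the proxy and back, so the class is kept and the
# cached range is recomputed from the remaining points.
sub <- vec_slice(pts, 2:3)
class(sub)
attr(sub, "xrange")
```

The point of the sketch is only that the class-specific logic lives in the two methods, while generic operations such as slicing are left to vctrs.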

Are you planning to release these dev versions before UseR!, or can we postpone and look into this during the tidyverse dev day in Toulouse?

Yes, we plan to release at least rlang and vctrs, and hopefully tidyr, before UseR!. If we manage to release tidyr, it'll probably be right before the conference, so it's probably fine to look into it at the tidyverse dev day?

Thanks, yes that should work out.

Guessing from the discussion on https://github.com/tidyverse/tibble/issues/624#issuecomment-519786850, maybe defining vec_proxy.sfc() is enough for now?

library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 6.1.1
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
lapply(c("sf", "dplyr", "tidyr"), packageVersion)
#> [[1]]
#> [1] '0.7.7'
#> 
#> [[2]]
#> [1] '0.8.3'
#> 
#> [[3]]
#> [1] '0.8.99.9000'

d <- st_as_sf(data.frame(
  x = seq_len(3),
  y = c("a", "d,e,f", "g,h"),
  geometry = st_sfc(st_point(c(1, 1)),
                    st_point(c(2, 2)),
                    st_point(c(3, 3))),
  stringsAsFactors = FALSE))

d %>% 
  separate_rows(y, convert = TRUE)
#> Error: Can't slice a scalar

vec_proxy.sfc <- function(x, ...) {
  x
}

# I don't know what this warning is... (it disappears when convert = FALSE)
d %>% 
  separate_rows(y, convert = TRUE)
#> Warning in `[<-.data.frame`(`*tmp*`, vars, value = list(y = c("a", "d", :
#> provided 2 variables to replace 1 variables
#> Simple feature collection with 6 features and 2 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 1 ymin: 1 xmax: 3 ymax: 3
#> epsg (SRID):    NA
#> proj4string:    NA
#>   x y    geometry
#> 1 1 a POINT (1 1)
#> 2 2 d POINT (2 2)
#> 3 2 e POINT (2 2)
#> 4 2 f POINT (2 2)
#> 5 3 g POINT (3 3)
#> 6 3 h POINT (3 3)

Created on 2019-08-15 by the reprex package (v0.3.0)

@yutannihilation Thank you for updating the information.

The warning occurs because there is no sf method for separate_rows(). It is resolved by combining this with my PR #1065.

library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.1.2, PROJ 4.9.3
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

d <- st_as_sf(data.frame(
    x = seq_len(3),
    y = c("a", "d,e,f", "g,h"),
    geometry = st_sfc(st_point(c(1, 1)),
                      st_point(c(2, 2)),
                      st_point(c(3, 3))),
    stringsAsFactors = FALSE))

vec_proxy.sfc <- function(x) {
    x
}

d %>% 
    separate_rows(y, convert = TRUE)
#> Warning in `[<-.data.frame`(`*tmp*`, vars, value = list(y = c("a", "d", :
#> provided 2 variables to replace 1 variables
#> Simple feature collection with 6 features and 2 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 1 ymin: 1 xmax: 3 ymax: 3
#> epsg (SRID):    NA
#> proj4string:    NA
#>   x y    geometry
#> 1 1 a POINT (1 1)
#> 2 2 d POINT (2 2)
#> 3 2 e POINT (2 2)
#> 4 2 f POINT (2 2)
#> 5 3 g POINT (3 3)
#> 6 3 h POINT (3 3)

# FROM #1065
# https://github.com/r-spatial/sf/pull/1065/files#diff-93435dd5acbfc85b09ff75ef39510db5
separate_rows.sf <- function(data, ..., sep = "[^[:alnum:]]+", convert = FALSE) {
    if (!requireNamespace("tidyr", quietly = TRUE))
        stop("tidyr required: install first?")
    class(data) <- setdiff(class(data), "sf")
    ret = tidyr::separate_rows(data, ..., sep = sep, convert = convert)
    st_as_sf(ret, sf_column_name = attr(data, "sf_column"))
}

d %>% 
    separate_rows(y, convert = TRUE)
#> Simple feature collection with 6 features and 2 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 1 ymin: 1 xmax: 3 ymax: 3
#> epsg (SRID):    NA
#> proj4string:    NA
#>   x y    geometry
#> 1 1 a POINT (1 1)
#> 2 2 d POINT (2 2)
#> 3 2 e POINT (2 2)
#> 4 2 f POINT (2 2)
#> 5 3 g POINT (3 3)
#> 6 3 h POINT (3 3)

Created on 2019-08-16 by the reprex package (v0.3.0)

Thanks, I commented there with a bit more detail about this warning (I agree the PR will fix it).

I'll close the issue since #1065 is merged.

@etiennebr Aw, sorry, my comment was not clear... the PR fixes the separate_rows() warning, not this issue; nest() and unnest() still fail until we define vec_proxy() or vctrs introduces some workaround. Could you reopen this?

@krlmlr @lionel-
On https://github.com/tidyverse/tibble/issues/624, you said there is "No hurry" to support vctrs, but I think we need to hurry a bit because I heard the next version of tidyr is planned to be submitted to CRAN soon. You proposed a workaround on the vctrs side. Will this happen before tidyr's release? If not, I think sf needs to implement vec_proxy.sfc().

@lionel-: Do you think it's worthwhile to build exceptions ("known vctrs-y classes") into vctrs, to ease the transition pain?

Sorry for closing it too rapidly!

Thanks.

Sorry, I misunderstood this issue...

nest() and unnest() still fail until we define vec_proxy() or vctrs introduces some workaround.

vec_proxy.sfc() fixes only separate_rows(), and the error of nest()/unnest() is another story.

library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 6.1.1
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

storms.sf <- st_as_sf(storms, coords = c("long", "lat"), crs = 4326)

vec_proxy.sfc <- function(x, ...) {
  x
}

storms.sf %>%
  group_by(name, year) %>%
  nest()
#> Quosures can only be unquoted within a quasiquotation context.
#> 
#>   # Bad:
#>   list(!!myquosure)
#> 
#>   # Good:
#>   dplyr::mutate(data, !!myquosure)

Created on 2019-09-10 by the reprex package (v0.3.0)

The problem seems to be that sf::unnest.sf() uses a deprecated .key here:

https://github.com/r-spatial/sf/blob/3dd87621ca42a8fc18ff84b4698668466aae7a1e/R/tidyverse.R#L285

Good catch @yutannihilation. This seems to be an issue in the compatibility layer in nest(). I'm looking into it now, but unfortunately tidyr was just sent to CRAN.

Regarding your other solution of defining an identity proxy:

#' @export
vec_proxy.sfc <- function(x, ...) {
  x
}

This seems like the right thing to do. Defining a proxy allows S3 lists to be treated as vectors by the vctrs package, rather than scalars.
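The scalar-versus-vector distinction can be checked directly. The following is a minimal sketch, assuming a vctrs version in which an S3 list with no proxy method (and no explicit "list" class) is treated as a scalar. The class `toy_geom` is hypothetical and only stands in for `sfc`.

```r
library(vctrs)

# Hypothetical toy class standing in for sfc: an S3 list of coordinate pairs.
x <- structure(list(c(1, 1), c(2, 2), c(3, 3)), class = "toy_geom")

# Without a proxy method, vctrs cannot tell that this list is a vector.
before <- vec_is(x)

# Identity proxy, mirroring the vec_proxy.sfc() definition discussed above.
# It is registered explicitly so the sketch does not depend on where it is
# run; a package would export the method instead.
registerS3method("vec_proxy", "toy_geom", function(x, ...) x,
                 envir = asNamespace("vctrs"))

# With the proxy in place, the same object is recognised as a vector.
after <- vec_is(x)
n <- vec_size(x)

c(before = before, after = after)
n
```

Under the stated assumption, `before` is `FALSE` and `after` is `TRUE`, which is the change that lets functions built on vctrs slice the geometry column.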

So, let me summarise the problems.

nest()

Error: Quosures can only be unquoted within a quasiquotation context.

  • nest() doesn't accept a quosure .key, but this will be fixed on tidyr's side
  • Now that .key is deprecated, nest.sf() should follow the change (change the default argument to lifecycle::deprecated()?).
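
The second point could look roughly like the sketch below. `nest_compat()` is a hypothetical stand-in, not the real nest.sf(), and shows only the argument handling; the warning text is illustrative. It assumes a lifecycle version that exports `deprecated()` and `is_present()`.

```r
# Hypothetical stand-in for nest.sf(); only the handling of `.key` is shown.
nest_compat <- function(data, ..., .key = lifecycle::deprecated()) {
  # is_present() is TRUE only when the caller actually supplied `.key`.
  if (lifecycle::is_present(.key)) {
    warning("`.key` is deprecated.", call. = FALSE)
  }
  # A real method would drop the sf class and call tidyr::nest() here;
  # this sketch simply returns its input unchanged.
  data
}

# No warning when `.key` is left at its default.
res <- nest_compat(mtcars)

# A warning is raised when `.key` is supplied explicitly.
warned <- tryCatch({
  nest_compat(mtcars, .key = "data")
  FALSE
}, warning = function(w) TRUE)
```

Using `lifecycle::deprecated()` as the default keeps the argument in the signature for backward compatibility while making it possible to detect when it was actually passed.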

unnest()

Error: Can't slice a scalar

  • Defining vec_proxy.sfc() solves the problem.
  • cols is now a required argument, which means this test always fails and should be removed:

https://github.com/r-spatial/sf/blob/3dd87621ca42a8fc18ff84b4698668466aae7a1e/tests/testthat/test_tidy.R#L82

  • The column order now seems to match the original, so this line no longer seems necessary:

https://github.com/r-spatial/sf/blob/3dd87621ca42a8fc18ff84b4698668466aae7a1e/tests/testthat/test_tidy.R#L85

separate_rows()

Error: Can't slice a scalar

  • Defining vec_proxy.sfc() solves the problem.

I'm still getting errors related to this in both the CRAN and GitHub versions, but I think a minimal vec_cast() implementation fixes it:

library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.1, PROJ 6.1.0
library(vctrs)

sfc <- st_sfc(st_point(c(1, 1)), st_point(c(0, 0)))
sfc_df <-  tibble::tibble(x = 1, geometry = sfc)

# minimal vec_cast implementation
vec_cast.sfc <- function(x, to, ...) UseMethod("vec_cast.sfc")
vec_cast.sfc.sfc <- function(x, to, ...) {
  st_cast(x, gsub("sfc_", "", class(to)[1]))
}
vec_cast.sfc.default <- function(x, to, ...) vec_default_cast(x, to)

c(sfc, sfc)
#> Geometry set for 4 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#> POINT (1 1)
#> POINT (0 0)
#> POINT (1 1)
#> POINT (0 0)
vec_c(sfc, sfc)
#> Geometry set for 4 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#> POINT (1 1)
#> POINT (0 0)
#> POINT (1 1)
#> POINT (0 0)

rbind(sfc_df, sfc_df)
#> # A tibble: 4 x 2
#>       x geometry
#>   <dbl>  <POINT>
#> 1     1    (1 1)
#> 2     1    (0 0)
#> 3     1    (1 1)
#> 4     1    (0 0)
vec_rbind(sfc_df, sfc_df)
#> # A tibble: 4 x 2
#>       x geometry
#>   <dbl>  <POINT>
#> 1     1    (1 1)
#> 2     1    (0 0)
#> 3     1    (1 1)
#> 4     1    (0 0)

sfc_df_list <- tibble::tibble(y = 1, data = list(sfc_df, sfc_df))
tidyr::unnest(sfc_df_list, data)
#> # A tibble: 4 x 3
#>       y     x geometry
#>   <dbl> <dbl>  <POINT>
#> 1     1     1    (1 1)
#> 2     1     1    (0 0)
#> 3     1     1    (1 1)
#> 4     1     1    (0 0)

Created on 2019-10-23 by the reprex package (v0.2.1)

Thanks a lot, @paleolimbot !

By the way, I was wondering if we should add third party methods such as sf to vctrs itself while the API is maturing. This way we can update the API without breaking downstream packages.

Sounds like a good idea, @lionel- !

I lost track, but the initial issue raised here seems to work now.
