Sf: Add as_sf_tibble?

Created on 17 Jan 2019 · 9 Comments · Source: r-spatial/sf

Right now there are two ways to read simple features from a file:

  • sf::st_read() returns an sf-data.frame, an object of class c("sf", "data.frame")
  • sf::read_sf() returns an sf-tibble, an object of class c("sf", "tbl_df", "tbl", "data.frame")

But for reading in-memory foreign objects, sf::st_as_sf() only returns an sf-data.frame.

Wouldn't it make sense to provide an as_sf_tibble() function to convert an sf-data.frame into an sf-tibble?

The issue has been raised before: issue #404, PR #405, PR #927.

PR #405 suggests adding as_data_frame.sf() and as_tibble.sf(). PR #927 also suggests adding an as_tibble.sf().

@edzer provided two reasons for not supporting such functions. The first one is that sf-data.frame is very similar to sf-tibble:

I don't see so much point in moving between the two (very similar) representations when they're already converted to sf - that should better happen before that.

But an sf-tibble prints much better than an sf-data.frame.

The second reason is related to the name of the proposed method:

Isn't as_tibble supposed to return a tbl_df, rather than an sf object? Wouldn't you expect it to de-sf an object? as.data.frame.sf now de-sf-s an sf,data.frame object.

I agree that maybe one would expect as_tibble.sf() to de-sf an object. So how about renaming the function to be as_sf_tibble.sf()?
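
A minimal sketch of what such a helper could look like. The name as_sf_tibble() is hypothetical (it is not part of sf), and the sketch assumes that tibble::as_tibble() keeps the geometry list-column so that st_as_sf() can pick it up again:

```r
library(sf)
library(tibble)

# Hypothetical helper, not part of sf: convert anything st_as_sf() accepts
# into an sf object backed by a tibble.
as_sf_tibble <- function(x, ...) {
  x_sf <- st_as_sf(x, ...)    # sf-data.frame (or sf-tibble if x already was one)
  st_as_sf(as_tibble(x_sf))   # drop to a plain tibble, then re-attach the sf class
}

# Usage with the nc shapefile shipped with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_tbl <- as_sf_tibble(nc)
class(nc_tbl)
```

Giving the helper its own name, rather than making it an as_tibble() method, sidesteps the question of whether as_tibble() should de-sf an object.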

All 9 comments

Aren't these "in-memory foreign objects" data frames or similar, which you can convert to a tibble before calling st_as_sf()?

As @kendonB mentioned, I believe the best way to do it is to use as_tibble() %>% st_as_sf()

library(sf)
#> Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0
library(tibble)

x <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `/home/etienne/R/x86_64-pc-linux-gnu-library/3.5/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
x %>% 
  as_tibble() %>% 
  st_as_sf()
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> # A tibble: 100 x 15
#>     AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS  FIPSNO CRESS_ID BIR74 SID74
#>    <dbl>     <dbl> <dbl>   <dbl> <fct> <fct>  <dbl>    <int> <dbl> <dbl>
#>  1 0.114      1.44  1825    1825 Ashe  37009  37009        5  1091     1
#>  2 0.061      1.23  1827    1827 Alle… 37005  37005        3   487     0
#>  3 0.143      1.63  1828    1828 Surry 37171  37171       86  3188     5
#>  4 0.07       2.97  1831    1831 Curr… 37053  37053       27   508     1
#>  5 0.153      2.21  1832    1832 Nort… 37131  37131       66  1421     9
#>  6 0.097      1.67  1833    1833 Hert… 37091  37091       46  1452     7
#>  7 0.062      1.55  1834    1834 Camd… 37029  37029       15   286     0
#>  8 0.091      1.28  1835    1835 Gates 37073  37073       37   420     0
#>  9 0.118      1.42  1836    1836 Warr… 37185  37185       93   968     4
#> 10 0.124      1.43  1837    1837 Stok… 37169  37169       85  1612     1
#> # … with 90 more rows, and 5 more variables: NWBIR74 <dbl>, BIR79 <dbl>,
#> #   SID79 <dbl>, NWBIR79 <dbl>, geometry <MULTIPOLYGON [°]>

Created on 2019-01-17 by the reprex package (v0.2.1)

I gave this a second thought because, as you mentioned, the issue is recurrent. So I had a discussion with @edzer (we're both at rstudio::conf; come say hi if you are there too). I think there are multiple possible solutions. Supporting both data.frame and tibble makes things slightly more complicated for both usage and development, and it probably means we need to change some of the package design. I see a few solutions:

  • Force users to use tibble.
  • Split sf into two packages: one that provides basic spatial support on data.frame, and another that adds a tidyverse layer on top.

Dropping tibble support is not really an option, because the integration with the tidyverse makes a lot of sense and we get a lot of benefits out of it. These are just thoughts, but I think it could make sense to explore this.

There is still the option of doing nothing, just like UK and USA politics. I am in favor of solution 1, after user consultation / discussion. tibbles are now well understood, well supported, established, and mostly harmless. Does anyone see real disadvantages, beyond the additional dependency?

This might break old code because some functions behave differently on tbl_df and data.frame. The [ operator, for example, always yields a tibble when used on a tibble, but does not always yield a data.frame when used on a data.frame. My view is that sf is new enough that there's not enough old code out there to worry about for something like this.
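
To illustrate the difference, here is a small sketch on plain (non-sf) tables. The toy data are made up for the example, and sf objects have their own subsetting method, so this only shows the underlying data.frame vs. tibble behaviour:

```r
library(tibble)

# Toy data, invented for illustration
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
tb <- as_tibble(df)

# Selecting a single column with [ drops a data.frame to a bare vector ...
is.data.frame(df[, "a"])
#> [1] FALSE

# ... whereas a tibble stays a tibble.
is_tibble(tb[, "a"])
#> [1] TRUE
```

Code that relies on the data.frame behaviour (for instance, passing df[, "a"] to a function that expects a vector) would need drop = TRUE or [[ after a switch to tibble.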

In sorrow and anger:

If sf goes to tibble only, all the many more users of sp who need a non-"tidy" transition to post-sp representations of spatial data will be betrayed. All who model will be betrayed. Making sf non-diverse will split a 20-year-old community. We do not need tibble for most work, certainly not modelling using most existing core and contributed code. So don't let this tiny discussion in this very limited group of cool people, not exposed to the much larger number of users, even say on R-sig-geo (>3500 subscribers), let you think you can take these kinds of very destructive steps. This is an absolute veto from me.

If you do this I will stop shifting spatial representations in spdep to sf and elsewhere, and advise against migration from sp to sf. sf and stars are only about more modern representations in R of spatial objects, not about forcing users into the "tidy" monoculture.

OK, thanks - I agree we don't want to split communities, and the current setup is actually quite nice, and the code clobber rather small; I will close this discussion here. In addition, this is not the right place to suggest strong design decisions.

As for the original issue raised: the answers by @kendonB and @etiennebr should work.

And a recent tibble wobbler: https://stat.ethz.ch/pipermail/r-package-devel/2019q1/003403.html - forcing users to avoid cli.unicode (glyph instead of ...) themselves, breaking testing diffs and backward compatibility.

Dear @kendonB, @etiennebr, @edzer, and @rsbivand,

Thanks for all your input. To provide a bit more context, I was hoping to convert a Spatial object into an sf object. For example, the following function in tigris returns a "SpatialPolygonsDataFrame".

# install.packages("tigris")
nc = tigris::counties("nc", cb = TRUE) 
class(nc)
#> [1] "SpatialPolygonsDataFrame"
#> attr(,"package")
#> [1] "sp"

Running sf::st_as_sf(nc) returns an sf-data.frame:

nc_sf = sf::st_as_sf(nc)
class(nc_sf)
#> [1] "sf"         "data.frame"

Because I found sf-tibbles to be easier to work with, I tended just to use sf::read_sf() to read sf objects from a file.

So when I saw sf::st_as_sf() only converts a foreign object to an sf-data.frame, I naturally thought either there would be an as_tibble argument in the sf::st_as_sf() function, or there would be an as_tibble.sf() method (or as_sf_tibble.sf()) in the sf package. I wasn't able to find either one of the two options.

Because similar issues have been raised before, I thought I'd try again. I didn't know that there was a tension between supporting sf-data.frame and sf-tibble. I think the current implementation strikes a great balance, and my hope was to point out a possible small fix that might be useful for other users in the future.

In case other people might find it useful, here's what I came up with based on the answers by @kendonB and @etiennebr:

library(sf)
library(tibble)

# install.packages("tigris")
nc = tigris::counties("nc", cb = TRUE) 

nc %>% st_as_sf() %>% as_tibble() %>% st_as_sf()

#> Simple feature collection with 100 features and 9 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32187 ymin: 33.84232 xmax: -75.46062 ymax: 36.58812
#> epsg (SRID):    4269
#> proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
#> # A tibble: 100 x 10
#>    STATEFP COUNTYFP COUNTYNS AFFGEOID GEOID NAME  LSAD  ALAND AWATER
#>    <chr>   <chr>    <chr>    <chr>    <chr> <chr> <chr> <chr> <chr> 
#>  1 37      009      01008535 0500000… 37009 Ashe  06    1100… 80931…
#>  2 37      017      01026336 0500000… 37017 Blad… 06    2265… 33041…
#>  3 37      023      01008539 0500000… 37023 Burke 06    1311… 20699…
#>  4 37      027      01008541 0500000… 37027 Cald… 06    1222… 70384…
#>  5 37      047      01026339 0500000… 37047 Colu… 06    2428… 43651…
#>  6 37      091      01026127 0500000… 37091 Hert… 06    9146… 18740…
#>  7 37      095      01008564 0500000… 37095 Hyde  06    1585… 21929…
#>  8 37      105      01008567 0500000… 37105 Lee   06    6605… 10752…
#>  9 37      115      01025834 0500000… 37115 Madi… 06    1164… 48407…
#> 10 37      121      01008571 0500000… 37121 Mitc… 06    5730… 16295…
#> # … with 90 more rows, and 1 more variable: geometry <MULTIPOLYGON [°]>