Right now there are two ways to read simple features from a file:
sf::st_read() returns an sf-data.frame, an object of class c("sf", "data.frame")
sf::read_sf() returns an sf-tibble, an object of class c("sf", "tbl_df", "tbl", "data.frame")
But for reading in-memory foreign objects, sf::st_as_sf() only returns an sf-data.frame.
Wouldn't it make sense to provide an as_sf_tibble() function to convert an sf-data.frame into an sf-tibble?
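For illustration, the class difference between the two readers can be checked directly. This is a minimal sketch using the nc.shp example file shipped with sf; the object names nc_df and nc_tbl are placeholders, and read_sf() assumes the tibble package is installed.

```r
library(sf)

# Example shapefile shipped with the sf package
path <- system.file("shape/nc.shp", package = "sf")

nc_df  <- st_read(path, quiet = TRUE)  # sf-data.frame
nc_tbl <- read_sf(path)                # sf-tibble

class(nc_df)
#> [1] "sf"         "data.frame"
class(nc_tbl)
#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"
```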
The issue has been raised before: issue #404, PR #405, PR #927.
PR #405 suggests adding as_data_frame.sf() and as_tibble.sf(). PR #927 also suggests adding an as_tibble.sf().
@edzer provided two reasons for not supporting such functions. The first one is that sf-data.frame is very similar to sf-tibble:
I don't see so much point in moving between the two (very similar) representations when they're already converted to sf - that should better happen before that.
But the printing of tibble is much better.
The second reason is related to the name of the proposed method:
Isn't as_tibble supposed to return a tbl_df, rather than an sf object? Wouldn't you expect it to de-sf an object? as.data.frame.sf now de-sf-s an sf,data.frame object.
I agree that maybe one would expect as_tibble.sf() to de-sf an object. So how about renaming the function to be as_sf_tibble.sf()?
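To make the naming concern concrete, here is a small sketch of what as.data.frame() does to an sf-data.frame today, using the nc.shp file shipped with sf. It only illustrates the existing behaviour described in the quote; the object names are placeholders, and it says nothing about how an as_tibble() method would or should behave.

```r
library(sf)

nc_df <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# as.data.frame() drops the "sf" class ...
plain <- as.data.frame(nc_df)
inherits(nc_df, "sf")            # TRUE
inherits(plain, "sf")            # FALSE

# ... but the geometry column itself is still a simple feature column
inherits(plain$geometry, "sfc")  # TRUE
```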
Are these "in-memory foreign objects" not dataframes or similar that you can convert to tibble before calling st_as_sf?
As @kendonB mentioned, I believe the best way to do it is to use as_tibble() %>% st_as_sf():
library(sf)
#> Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0
library(tibble)
x <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `/home/etienne/R/x86_64-pc-linux-gnu-library/3.5/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
x %>%
as_tibble() %>%
st_as_sf()
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
#> # A tibble: 100 x 15
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <dbl> <int> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1
#> # … with 90 more rows, and 5 more variables: NWBIR74 <dbl>, BIR79 <dbl>,
#> # SID79 <dbl>, NWBIR79 <dbl>, geometry <MULTIPOLYGON [°]>
Created on 2019-01-17 by the reprex package (v0.2.1)
I gave this a second thought, because as you mentioned the issue is recurrent. So I had a discussion with @edzer (since we're both at rstudio::conf --come say hi if you are). I think we could think of multiple solutions. Supporting both data.frame and tibble makes things slightly more complicated both for usage and development, and it probably reflects that we need to change some of the package design. I see a few solutions:
1. Support only tibble.
2. Support only data.frame.
3. Split into two packages: one that would support only data.frame and another one that would add a tidyverse layer on top.
Dropping tibble support is not really an option, because the integration with the tidyverse makes a lot of sense and we get a lot of benefits out of it. These are just thoughts, but I think it could make sense to explore this.
There is still the option of doing nothing, just like UK and USA politics. I am in favor of solution 1, after user consultation / discussion. tibbles are now well understood, well supported, established, and mostly harmless. Does anyone see real disadvantages, beyond the additional dependency?
This might break old code due to the difference in how some functions behave between tbl_df and data.frame. [, for example, always yields a tibble when used on a tibble but not always when used on a data.frame. My view is that sf is new enough that there's not enough old code out there to worry about for something like this.
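A minimal sketch of that difference, using a plain data.frame and tibble rather than sf objects (the names df and tb are placeholders):

```r
library(tibble)

df <- data.frame(a = 1:3, b = c("x", "y", "z"))
tb <- as_tibble(df)

# Selecting a single column with `[`:
# a data.frame drops to a bare vector by default (drop = TRUE) ...
is.data.frame(df[, "a"])  # FALSE
# ... while a tibble always stays a tibble
is.data.frame(tb[, "a"])  # TRUE
```

Code that relies on the data.frame behaviour, for example passing df[, "a"] to a function that expects a vector, would need tb[["a"]] or tb$a instead.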
In sorrow and anger:
If sf goes to tibble only, all the many more users of sp who need a non-"tidy" transition to post-sp representations of spatial data will be betrayed. All who model will be betrayed. Making sf non-diverse will split a 20-year-old community. We do not need tibble for most work, certainly not modelling using most existing core and contributed code. So don't let this tiny discussion in this very limited group of cool people, not exposed to the much larger number of users, even say on R-sig-geo (>3500 subscribers), let you think you can take these kinds of very destructive steps. This is an absolute veto from me.
If you do this I will stop shifting spatial representations in spdep to sf and elsewhere, and advise against migration from sp to sf. sf and stars are only about more modern representations in R of spatial objects, not about forcing users into the "tidy" monoculture.
OK, thanks - I agree we don't want to split communities, and the current setup is actually quite nice, and the code clobber rather small; I will close this discussion here. In addition, this is not the right place to suggest strong design decisions.
As for the original issue raised: the answers by @kendonB and @etiennebr should work.
And a recent tibble wobbler: https://stat.ethz.ch/pipermail/r-package-devel/2019q1/003403.html - forcing users to avoid cli.unicode (glyph instead of ...) themselves, breaking testing diffs and backward compatibility.
Dear @kendonB, @etiennebr, @edzer, and @rsbivand,
Thanks for all your input. To provide a bit more context, I was hoping to convert a Spatial object into an sf object. For example, the following function in tigris returns a "SpatialPolygonsDataFrame".
# install.packages("tigris")
nc = tigris::counties("nc", cb = TRUE)
class(nc)
#> [1] "SpatialPolygonsDataFrame"
#> attr(,"package")
#> [1] "sp"
Running sf::st_as_sf(nc) returns an sf-data.frame:
nc_sf = sf::st_as_sf(nc)
class(nc_sf)
#> [1] "sf" "data.frame"
Because I found sf-tibbles to be easier to work with, I tended just to use sf::read_sf() to read sf objects from a file.
So when I saw that sf::st_as_sf() only converts a foreign object to an sf-data.frame, I naturally thought there would be either an as_tibble argument in the sf::st_as_sf() function or an as_tibble.sf() method (or as_sf_tibble.sf()) in the sf package. I wasn't able to find either of the two options.
Because similar issues have been raised before, I thought I'd try again. I didn't know that there's a tension between supporting sf-data.frame and sf-tibble. I think the current implementation strikes a great balance, and my hope is to point out a possible small fix that might be useful for other users in the future.
In case other people might find it useful, here's what I came up with based on the answers by @kendonB and @etiennebr:
library(sf)
library(tibble)
# install.packages("tigris")
nc = tigris::counties("nc", cb = TRUE)
nc %>% st_as_sf() %>% as_tibble() %>% st_as_sf()
#> Simple feature collection with 100 features and 9 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32187 ymin: 33.84232 xmax: -75.46062 ymax: 36.58812
#> epsg (SRID): 4269
#> proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
#> # A tibble: 100 x 10
#> STATEFP COUNTYFP COUNTYNS AFFGEOID GEOID NAME LSAD ALAND AWATER
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 37 009 01008535 0500000… 37009 Ashe 06 1100… 80931…
#> 2 37 017 01026336 0500000… 37017 Blad… 06 2265… 33041…
#> 3 37 023 01008539 0500000… 37023 Burke 06 1311… 20699…
#> 4 37 027 01008541 0500000… 37027 Cald… 06 1222… 70384…
#> 5 37 047 01026339 0500000… 37047 Colu… 06 2428… 43651…
#> 6 37 091 01026127 0500000… 37091 Hert… 06 9146… 18740…
#> 7 37 095 01008564 0500000… 37095 Hyde 06 1585… 21929…
#> 8 37 105 01008567 0500000… 37105 Lee 06 6605… 10752…
#> 9 37 115 01025834 0500000… 37115 Madi… 06 1164… 48407…
#> 10 37 121 01008571 0500000… 37121 Mitc… 06 5730… 16295…
#> # … with 90 more rows, and 1 more variable: geometry <MULTIPOLYGON [°]>
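The pipeline above could also be wrapped in a small user-side helper. The sketch below is only an illustration: as_sf_tibble() is a hypothetical name taken from this discussion and is not part of sf, and the example uses the nc.shp file shipped with sf instead of tigris so that it runs without a download.

```r
library(sf)
library(tibble)

# Hypothetical helper (NOT an sf function): convert anything st_as_sf()
# understands into an sf-tibble via the as_tibble() %>% st_as_sf() route.
as_sf_tibble <- function(x, ...) {
  x <- st_as_sf(x, ...)   # foreign object -> sf (sf objects pass through)
  st_as_sf(as_tibble(x))  # sf-data.frame -> tibble -> sf-tibble
}

nc_df  <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_tbl <- as_sf_tibble(nc_df)

inherits(nc_tbl, "tbl_df")          # TRUE
inherits(nc_tbl, "sf")              # TRUE
st_crs(nc_tbl) == st_crs(nc_df)     # TRUE: the CRS lives in the geometry column
```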