gpkg gives geometry the column name "geom", geojson and shp give "geometry".
I would expect all backends to give the same geometry column name so that e.g. rbind() is easier.
Maybe all new backends could standardise on "geometry". I can't think of a good deprecation path off the top of my head if you wanted to change the gpkg writer/reader behaviour.
Slightly related to #282.
I've also noticed this and wanted to open an issue. As a note, @cmcaine: referring to the geometry column with st_geometry(x) is more failsafe than x$geometry, as the code below illustrates. It also benchmarks different file formats and point creation (related to #700 and #716), showing that a write/read round trip to disk is not much slower than creating points from a data frame (the first tic/toc times only point creation, with no file I/O):
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.1.2, proj.4 4.9.3
library(tictoc)
n = 10000
d = data.frame(v1 = 1:n, v2 = rep(letters, n), x = runif(n), y = runif(n))
tic()
x = st_as_sf(d, coords = c("x", "y"))
toc()
#> 3.754 sec elapsed
f = c("x.shp", "x.geojson", "x.gpkg")
for (i in f) {
tic()
write_sf(x, i)
x_new = read_sf(i)
# x_new$geometry # fails for gpkg
st_geometry(x_new)
toc()
}
#> 7.13 sec elapsed
#> 11.639 sec elapsed
#> 12.074 sec elapsed
file.remove(f)
#> [1] TRUE TRUE TRUE
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.4 (2018-03-15)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz Etc/UTC
#> date 2018-04-22
#> Packages -----------------------------------------------------------------
#> package * version date source
#> backports 1.1.2 2017-12-13 CRAN (R 3.4.4)
#> base * 3.4.4 2018-04-21 local
#> class 7.3-14 2015-08-30 CRAN (R 3.4.4)
#> classInt 0.2-3 2018-04-16 CRAN (R 3.4.4)
#> compiler 3.4.4 2018-04-21 local
#> datasets * 3.4.4 2018-04-21 local
#> DBI 0.8 2018-03-02 CRAN (R 3.4.4)
#> devtools 1.13.5 2018-02-18 CRAN (R 3.4.4)
#> digest 0.6.15 2018-01-28 CRAN (R 3.4.4)
#> e1071 1.6-8 2017-02-02 CRAN (R 3.4.4)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.4.4)
#> formatR 1.5 2017-04-25 CRAN (R 3.4.4)
#> graphics * 3.4.4 2018-04-21 local
#> grDevices * 3.4.4 2018-04-21 local
#> grid 3.4.4 2018-04-21 local
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.4)
#> knitr 1.20 2018-02-20 CRAN (R 3.4.4)
#> magrittr 1.5 2014-11-22 CRAN (R 3.4.4)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.4.4)
#> methods * 3.4.4 2018-04-21 local
#> pillar 1.2.1 2018-02-27 CRAN (R 3.4.4)
#> Rcpp 0.12.16 2018-03-13 CRAN (R 3.4.4)
#> rlang 0.2.0.9001 2018-04-21 Github (r-lib/rlang@82b2727)
#> rmarkdown 1.9 2018-03-01 CRAN (R 3.4.4)
#> RPostgreSQL 0.6-2 2017-06-24 CRAN (R 3.4.4)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.4)
#> sf * 0.6-2 2018-04-21 Github (r-spatial/sf@b69c835)
#> spData 0.2.8.8 2018-04-21 Github (nowosad/spData@8edd206)
#> spDataLarge 0.2.6.3 2018-04-21 Github (nowosad/spDataLarge@22c826a)
#> stats * 3.4.4 2018-04-21 local
#> stringi 1.1.7 2018-03-12 CRAN (R 3.4.4)
#> stringr 1.3.0 2018-02-19 CRAN (R 3.4.4)
#> tibble 1.4.2 2018-01-22 CRAN (R 3.4.4)
#> tictoc * 1.0 2014-06-17 CRAN (R 3.4.4)
#> tools 3.4.4 2018-04-21 local
#> udunits2 0.13 2016-11-17 CRAN (R 3.4.4)
#> units 0.5-1 2018-01-08 CRAN (R 3.4.4)
#> utils * 3.4.4 2018-04-21 local
#> withr 2.1.2 2018-04-21 Github (jimhester/withr@79d7b0d)
#> yaml 2.1.18 2018-03-08 CRAN (R 3.4.4)
Interesting that the flawed shapefile still seems to be the fastest, at least for these points on the Docker set-up I'm on at the moment (apologies for mixing issues).
@Robinlovelace please start a new issue if you have one, or help with this one here.
@cmcaine: no, we interface GDAL, and GDAL provides names of geometries for those drivers for which this is meaningful. Even if it is not meaningful for shapefile or GeoJSON, some of the other 91 interfaced vector formats may have meaningful names for geometry columns. Also, files to import, and hence sf objects, may have more than one geometry column, in which case giving one of them the name geometry would be pointless. Think of database tables (e.g. PostGIS): there, geometry columns are named, and we inherit those names in R so that we can make a clean roundtrip.
Sure, was trying to help by suggesting st_geometry() as a failsafe way and providing code that illustrated the issue - the benchmarking being an afterthought. I think the fact that common open source formats seem to be slower than shapefiles is an issue, but am not sure it merits opening an issue. Cheers for the explanation in any case, sounds like the answer is 'wontfix' or rather 'not broke in the first place'!
@Robinlovelace I've found st_geometry, but it doesn't really help for my use case. Thanks for mentioning it, though.
@edzer Cool, so sounds like the bug is in gdal for choosing different arbitrary names for geometry in formats for which there is not a meaningful name?
Are there any non-db backends that support multiple geometry columns?
If gdal does not provide a geometry name, sf sets it to geometry; I see no bug in gdal in this respect.
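To make the naming point concrete, here is a minimal sketch that needs no file I/O. It builds two sf objects that differ only in the geometry column name (standing in for what a gpkg read and a geojson read return) and inspects them. The object names a, b and pts are just for illustration.

```r
library(sf)

# Two sf objects with the same points but different geometry column names,
# standing in for a GeoPackage read ("geom") and a GeoJSON read ("geometry").
pts = st_sfc(st_point(c(1, 1)), st_point(c(2, 2)))
a = st_sf(v1 = 1:2, geom = pts)
b = st_sf(v1 = 1:2, geometry = pts)

# The name of the active geometry column is stored in the "sf_column" attribute.
attr(a, "sf_column")
attr(b, "sf_column")

# st_geometry() returns the active geometry column whatever it is called,
# whereas a$geometry only works if the column happens to have that name.
st_geometry(a)
st_geometry(b)
```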
@cmcaine in case it helps, see the reprex below, which lets you rbind two sf objects with different geometry column names. I'd be interested to see other solutions:
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.2, proj.4 4.9.2
d = data.frame(v1 = 1:3, x = 1:3, y = 1:3)
x1 = st_as_sf(d, coords = c("x", "y"))
write_sf(x1, "/tmp/x.gpkg")
x2 = read_sf("/tmp/x.gpkg")
# rbind(x1, x2) # fails: the geometry columns have different names ("geometry" vs "geom")
x2 = st_sf(st_set_geometry(x2, NULL), geometry = st_geometry(x2))
rbind(x1, x2)
#> Simple feature collection with 6 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 1 ymin: 1 xmax: 3 ymax: 3
#> epsg (SRID): NA
#> proj4string: NA
#> v1 geometry
#> 1 1 POINT (1 1)
#> 2 2 POINT (2 2)
#> 3 3 POINT (3 3)
#> 4 1 POINT (1 1)
#> 5 2 POINT (2 2)
#> 6 3 POINT (3 3)
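The rebuild step in the reprex above could be wrapped in a small helper so it can be applied to any number of objects before rbind(). This is only a sketch: standardise_geometry_name is a hypothetical name, not an sf function, and it assumes each object has a single geometry column and no attribute column already using the target name.

```r
library(sf)

# Hypothetical helper (not part of sf): rebuild an sf object so that its
# active geometry column has a fixed name.
standardise_geometry_name = function(x, name = "geometry") {
  attrs = st_set_geometry(x, NULL)  # attribute table without the geometry
  attrs[[name]] = st_geometry(x)    # re-attach the geometry under the new name
  st_sf(attrs, sf_column_name = name)
}

# Usage: two objects whose geometry columns are named differently.
pts = st_sfc(st_point(c(1, 1)), st_point(c(2, 2)))
a = st_sf(v1 = 1:2, geom = pts)
b = st_sf(v1 = 3:4, geometry = pts)
ab = rbind(standardise_geometry_name(a), standardise_geometry_name(b))
```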
@Robinlovelace
I'd just been using rename() manually because it didn't come up much, but that's a nice name agnostic solution :)
@edzer,
I respect your decision not to interpret this behaviour as buggy. I'll just state my reasoning once, clearly, in case we misunderstand each other, and leave it at that:
It is regrettable that the mysql and gpkg drivers for gdal pick different default geometry column names ("SHAPE" and "geom"). Given that these names are not meaningful to users who have not manually set a name they should not be named differently to other geometries.
It also seems like gpkg only supports one geometry column per table anyway, so the geom column name is never interesting at the sf level. (I might be misunderstanding the spec)
If I were maintainer I would consider something like adding a flag to st_read/st_write to rename the columns and then do a deprecation cycle: vX: warning when not using flag that default behaviour will change in next version, vX+1: default behaviour changes. If people tend to ignore warnings then I'd make it an error not to include the flag for one version.
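A rough sketch of what the first stage of such a cycle might look like, written as a wrapper around read_sf() rather than as a change to sf itself. The function name read_sf_named and the geometry_name flag are invented for illustration, and the sketch assumes a single geometry column per layer.

```r
library(sf)

# Hypothetical wrapper (not part of sf); `geometry_name` is an invented flag.
read_sf_named = function(dsn, ..., geometry_name = NULL) {
  x = read_sf(dsn, ...)
  if (is.null(geometry_name)) {
    # "vX" stage: keep the driver's column name, but warn about the planned change.
    warning("geometry_name not set; the default may become \"geometry\" in a future version")
    return(x)
  }
  # Opt-in behaviour: rebuild the object with the requested geometry column name.
  attrs = st_set_geometry(x, NULL)
  attrs[[geometry_name]] = st_geometry(x)
  st_sf(attrs, sf_column_name = geometry_name)
}
```

Calling read_sf_named("x.gpkg", geometry_name = "geometry") would then give the same geometry column name as a geojson or shp read, while calls without the flag keep today's behaviour plus a warning.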
Anyway, thanks for your work on sf :)