Sf: New representation of crs objects

Created on 22 Dec 2019  路  7Comments  路  Source: r-spatial/sf

TL;DR:

  • current crs objects are hard to move to the post-proj4string world we now live in
  • I suggest to have crs objects with two fields: input for user input / short description, and wkt for communicating to/from GDAL
  • this still supports specifying CRS by EPSG ID or proj4string, but also with 8 other CRS formats, widening the interoperability dramatically

Currently, crs objects have two fields: epsg (integer) and proj4string (character). Branch wkt2 added a third, wkt2 (character) with the WKT-2 representation. This tries to merge the PROJ.4 world view. where CRS are expressed as proj4strings, and the PROJ ( >= 6) world, where proj4strings are considered legacy (if not to be deprecated). As expected, this turned out to be very messy; see #1146.

Spawning from branch wkt2, I created a branch SetFromUserInput which rethinks crs objects. In short, in this branch crs objects have two character fields:

  • input: the string with which the object was initialised by the user, or the string that characterizes the CRS briefly when an object was read through GDAL; it is meant to be the human-readable version of the CRS (as opposed to WKT).
  • wkt: the wkt (or wkt2 in GDAL3/PROJ6) description of the CRS, which is meant to be communicate the entire CRS to and from GDAL

Only input is ever to be set by users, e.g. as in
```{r}

st_crs(4326)
Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["unknown"],
AREA["World"],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]

(where 4326 is converted to `EPSG:4326`) and input is passed on to [OGRSpatialReference::SetFromUserInput](https://gdal.org/doxygen/classOGRSpatialReference.html#aec3c6a49533fe457ddc763d699ff8796). This means that not only proj4strings and EPSG IDs can be input, but also WKT(2), OGC urns, PROJJSON and even non-EPSG CRS's (in total 10 formats), e.g.:
```r
> st_crs("urn:ogc:def:crs:EPSG::4326")
Coordinate Reference System:
  User input: urn:ogc:def:crs:EPSG::4326 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["unknown"],
        AREA["World"],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

This broadens the scope of crs objects in terms of interoperability, can also work with GDAL 2, and allows everyone to move away from proj4strings on their own pace. For convenience, the $ method for crs objects does a bit more than extracing input or wkt, but can generate some fields that help backward compatibility:

 > x = st_crs("urn:ogc:def:crs:EPSG::4326")
> x$epsg # now a character:
[1] "4326"
> x$proj4string # generated on-the-fly
[1] "+proj=longlat +datum=WGS84 +no_defs"

What messes up backward compatibility to reverse dependencies is stored legacy objects, e.g. in package tmap, and possibly data packages such as bcmaps. E.g.

> library(bcmaps)
Loading required package: sf
Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
> bc_bound <- get_layer("bc_bound")
> str(st_crs(bc_bound))
List of 2
 $ epsg       : int 3005
 $ proj4string: chr "+proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +"| __truncated__
 - attr(*, "class")= chr "crs"
> st_crs(bc_bound)
Coordinate Reference System:
  User input: +proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
  wkt:
BOUNDCRS[
... # long text skipped
]Warning messages:
1: In `$.crs`(x, "input") : old-style crs object found: please update code
2: In `$.crs`(x, "input") : old-style crs object found: please update code
3: In `$.crs`(x, "wkt") : old-style crs object found: please update code
4: In `$.crs`(x, "wkt") : old-style crs object found: please update code

These should ideally have been generated at package load rather than stored, to be robust against changes like this. I might be able to do an auto-conversion + warning for backward compatibility in packages that I control and that use them (sf, lwgeom).

This is related to #1033, #1146 and #1187.

breaking change

Most helpful comment

Branch SetFromUserInput has now been merged into master. CRAN release planned within 2 weeks.

All 7 comments

My plan is to merge this into branch wkt2, merge that into master, and release when CRAN returns, after managing reverse dependencies fallout.

This looks like a great idea @edzer. What would be the best approach for data packages such as bcmaps? Just recreate the stored objects with the new version of sf?

Thanks Andy! In the end I guess yes; a more flexible solution that may work with both old and new sf could be to have bcmaps::get_layer set the crs with a call to st_set_crs(3005). But I still need to look into all rev deps.

I agree; packages with canned data can revise defensively in sf and sp workflows by re-creating the crs/CRS on load. Please explore how this might be done, it'll be very useful in giving other packages in the same position a working example. I'm unsure how to add an R command to a straight data() or particularly lazy-loaded data, but with a get_...() function, it should be feasible. load() from GADM isn't obvious, as GADM needs to serve to old and new versions of sf (and sp).

Oh that's a more flexible idea... thanks!

Implemented in bcmaps here. Seems simple, but will test with dev sf whenever it's ready...

Branch SetFromUserInput has now been merged into master. CRAN release planned within 2 weeks.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

faridcher picture faridcher  路  4Comments

thiagoveloso picture thiagoveloso  路  3Comments

dpprdan picture dpprdan  路  4Comments

gregmacfarlane picture gregmacfarlane  路  4Comments

kendonB picture kendonB  路  3Comments