Sf: st_simplify behaviour

Created on 12 Jun 2017  路  19Comments  路  Source: r-spatial/sf

Per discussion at gis_se - https://gis.stackexchange.com/questions/243569/simplify-polygons-of-sf-object

I've been wondering for a while if 'topology preserving' means the same thing to different people. Thus far, the only simplification algorithm I'd consider properly topology preserving (as in, keeps interlocking polygons interlocking after simplification) is the one rmapshaper::ms_simplify uses by default. Whatever st_simplify uses, it can't seem to manage the same thing for the test datasets I've used (see test data here). Not sure if that's expected behaviour or not?

Most helpful comment

So I got curious about using topology in postGIS, which I had never done before. A brief example of loading an sf object into postgis, creating topologies, simplifying, and reading back into R:

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
library(odbc)
library(RPostgreSQL)
#> Loading required package: DBI

nc <- read_sf(system.file("shape/nc.shp", package = "sf")) %>% 
   st_transform(2163) # transform to projected crs with units in metres

nc <- setNames(nc, tolower(names(nc)))

plot(nc[, "name"])

conn <- DBI::dbConnect(RPostgreSQL::PostgreSQL(), 
                  dbname = "postgis",
                  user = "postgres", 
                  host = "localhost", 
                  port = 5432)

# Write nc to the db
st_write(nc, conn, "nc")
#> [1] TRUE

# Enaable topology in the postgres db
dbExecute(conn, "CREATE EXTENSION IF NOT EXISTS postgis_topology;")
#> [1] 0

# Create an empty topology schema with the crs of the input (nc)
dbExecute(conn, "SELECT topology.CreateTopology('nc_topology', 2163);")
#> [1] -1

# Create a table to hold the topology
dbExecute(conn, "CREATE TABLE nc_topo(cnty_id integer primary key);")
#> [1] 0

# Add a topogeometry column to the nc_topo table
dbExecute(conn, 
   "SELECT topology.AddTopoGeometryColumn('nc_topology', 'public', 'nc_topo', 
                                      'topo', 'MULTIPOLYGON');"
)
#> [1] -1

# load the original geometry into the newly created table as topogeom 
# (zero snapping tolerance set here, you can adjust)
dbExecute(conn, 
  "INSERT INTO nc_topo(cnty_id, topo) 
   SELECT cnty_id, topology.toTopoGeom(geometry, 'nc_topology', 1) 
   FROM nc;"
)
#> [1] 100

# Add a new geometry column to the original table to hold the simplified geometry
dbExecute(conn, "ALTER TABLE nc ADD COLUMN geom_simp geometry;")
#> [1] 0

# Simplify the geometry (10km tolerance) and load it into the newly created 
# geometry column in the original table
dbExecute(
   conn, 
   "UPDATE nc
    SET geom_simp = st_simplify(t.topo, 10000)
    FROM nc_topo t
    WHERE nc.cnty_id = t.cnty_id;"
)
#> [1] 100

# Read it back, plot, and compare sizes to original
nc_simp <- read_sf(conn, query = "SELECT cnty_id, name, geom_simp as geometry from nc;") %>% 
   st_sf(sf_column_name = "geometry")

plot(nc_simp[, "name"])

object.size(nc)
#> 142912 bytes
object.size(nc_simp)
#> 97992 bytes

Created on 2018-12-11 by the reprex package (v0.2.1)

All 19 comments

FWIW, on the data set below the topology preserving behaviour breaks between tol of 1,000 and 10,000 (I was going up in orders of magnitude; I can count!)

download.file("https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_gor_2011.zip",
              destfile = "regions.zip")
unzip("regions.zip")

regions <- sf::read_sf("england_gor_2011.shp")
pryr::object_size(regions)

regions_1k <- sf::st_simplify(regions, preserveTopology = TRUE, dTolerance = 1000)
pryr::object_size(regions_1k)

plot(regions_1k)

regions_10k <- sf::st_simplify(regions, preserveTopology = TRUE, dTolerance = 10000)
pryr::object_size(regions_10k)
plot(regions_10k)

I believe it is the underlying GEOS simplification algorithm that doesn't preserve topology in neighbouring polygons - and I believe both sf::st_simplify and rgeos::gSimplify use the same GEOS algorithm.

rmapshaper::ms_simplify uses the algorithm implemented in mapshaper, for which I can take no credit - I just wrapped it in R.

Also, sf support is coming to rmapshaper soon. I'm working on it here. So as not to hijack the issue in this repo, please open an issue over there if you have questions.

Yes, like geos::gSimplfy, sf::st_simplify does a simplification on a per-geometry basis, meaning that any topology of pairs of geometries might change. Also, as the gSimplify docs describe, simplified features may become non-simple or non-valid. Will add to the docs todo list.

This works, assuming that GRASS is installed, and here using the very untried sf version of rgrass7:

library(sf)
regions <- sf::read_sf("england_gor_2011.shp")
regions_1k <- sf::st_simplify(regions, preserveTopology = TRUE, dTolerance = 1000)
plot(regions_1k)
regions_10k <- sf::st_simplify(regions, preserveTopology = TRUE, dTolerance = 10000)
plot(regions_10k)
devtools::install_github("rsbivand/rgrass7")
library(rgrass7sf)
td <- tempdir()
library(maptools)
SG <- Sobj_SpatialGrid(as(regions, "Spatial"))$SG
initGRASS("/home/rsb/topics/grass/g720/grass-7.2.0", td, SG)
writeVECT(regions, "regions", v.in.ogr_flags="o")
execGRASS("v.generalize", input="regions", type="area", output="regions10k", threshold=10000, method="douglas")
regions_10k_GRASS <- readVECT("regions10k")
plot(regions_10k_GRASS)

yielding (without the CRS here for briefness, initGRASS needs fixing):

screenshot from 2017-06-12 22-21-37

I've been trying to figure out what the preserve topology flag in gSimplify or [st_simplify0(https://r-spatial.github.io/sf/reference/geos_unary.html) even does. It seems like has more to do with the linear topology of line or ring nodes rather than the topology of feature geometries?

I have found that the ms_simplify function does preserve feature geometry edge topology and would point anyone who lands here over there for that kind of functionality.

Might I suggest an update to the documentation of st_simplify to make it clear that preserveTopology = TRUE will not keep feature-coverages intact? Also, seems like that should be available from sf() no?

See https://github.com/r-spatial/sf/commit/bb53f029341c15fb4f58fd2f613e6be9ca2ac9fe.

I agree that that would be nice to have, but I don't see how the underlying machinery (GEOS, GDAL) can provide this.

Thanks for adding that to the docs. I guess we can continue to rely on rmapshaper as a CRAN available solution.

Yes; some further discussion here.

Further: this suggests PostGIS, which implies that this should be possible by interfacing more liblwgeom code through package lwgeom.

Ahh there it is! I'd been looking for that from PostGIS. TopoGeometry looks to be the ticket. An explicit type for feature coverages makes a lot of sense. I've always been uncomfortable with the typical (modern) GIS practice of treating a coverage of features the same as any old feature collection. Arc/Info had a feature coverage but I think the data type has been watered down in more recent days. I suppose this may go a bit beyond the expected scope of sf? Would be useful in lwgeom.

Yes, the workflow using GRASS and rgrass works, but typically needs post-processing to handle invalid geometries, or artefacts (odd points) created when reading simple features into GRASS from file, which then do not generalise with v.generalize, and need to be removed on return. Examples with slides and code/data from a course last week (slides 7-16). There are still oddities in the output (some fields of municipality 3020033 lie in 3012052 in the input, and in the same area (southern Wielkopolska) part of a wood drops out of all municipalities. Thanks to @tim-salabim for mapview, makes finding anomalies fun!

@dblodgett-usgs I see sf and lwgeom as one and the same project (spatial vector data in R by simple features), with sf holding the classes + methods + glue to GEOS and GDAL, and lwgeom just the glue to liblwgeom. I remember looking at the topogeometry stuff in liblwgeom more than once, but found it hard to comprehend.

The core issue is that the Simple Features spec simply doesn't have a concept of topology (shared boundaries and vertices between neighbouring features). It's great that the spatial world has largely embraced a standard for vector data (Simple Features), but this is a major limitation of it. Presumably because of the ubiquity of web mapping and the data size/performance constraints that go with it, there has been a lot of work in the javascript world around this with topoJSON and related projects such as mapshaper. I believe @mdsumner has been doing a lot of thinking about this on the R side.

It looks like the topology module for postGIS lives here: https://github.com/postgis/postgis/tree/svn-trunk/topology

Thanks @ateucher ; this is the figure in the ER directory there:
topology

# 馃く

So I got curious about using topology in postGIS, which I had never done before. A brief example of loading an sf object into postgis, creating topologies, simplifying, and reading back into R:

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
library(odbc)
library(RPostgreSQL)
#> Loading required package: DBI

nc <- read_sf(system.file("shape/nc.shp", package = "sf")) %>% 
   st_transform(2163) # transform to projected crs with units in metres

nc <- setNames(nc, tolower(names(nc)))

plot(nc[, "name"])

conn <- DBI::dbConnect(RPostgreSQL::PostgreSQL(), 
                  dbname = "postgis",
                  user = "postgres", 
                  host = "localhost", 
                  port = 5432)

# Write nc to the db
st_write(nc, conn, "nc")
#> [1] TRUE

# Enaable topology in the postgres db
dbExecute(conn, "CREATE EXTENSION IF NOT EXISTS postgis_topology;")
#> [1] 0

# Create an empty topology schema with the crs of the input (nc)
dbExecute(conn, "SELECT topology.CreateTopology('nc_topology', 2163);")
#> [1] -1

# Create a table to hold the topology
dbExecute(conn, "CREATE TABLE nc_topo(cnty_id integer primary key);")
#> [1] 0

# Add a topogeometry column to the nc_topo table
dbExecute(conn, 
   "SELECT topology.AddTopoGeometryColumn('nc_topology', 'public', 'nc_topo', 
                                      'topo', 'MULTIPOLYGON');"
)
#> [1] -1

# load the original geometry into the newly created table as topogeom 
# (zero snapping tolerance set here, you can adjust)
dbExecute(conn, 
  "INSERT INTO nc_topo(cnty_id, topo) 
   SELECT cnty_id, topology.toTopoGeom(geometry, 'nc_topology', 1) 
   FROM nc;"
)
#> [1] 100

# Add a new geometry column to the original table to hold the simplified geometry
dbExecute(conn, "ALTER TABLE nc ADD COLUMN geom_simp geometry;")
#> [1] 0

# Simplify the geometry (10km tolerance) and load it into the newly created 
# geometry column in the original table
dbExecute(
   conn, 
   "UPDATE nc
    SET geom_simp = st_simplify(t.topo, 10000)
    FROM nc_topo t
    WHERE nc.cnty_id = t.cnty_id;"
)
#> [1] 100

# Read it back, plot, and compare sizes to original
nc_simp <- read_sf(conn, query = "SELECT cnty_id, name, geom_simp as geometry from nc;") %>% 
   st_sf(sf_column_name = "geometry")

plot(nc_simp[, "name"])

object.size(nc)
#> 142912 bytes
object.size(nc_simp)
#> 97992 bytes

Created on 2018-12-11 by the reprex package (v0.2.1)

A few notes on the above:

  1. I tried it with the Engand regions in the original example, and the conversion to a topology took a VERY long time. I didn't time it, but I would bet close to an hour! I've seen around that is notoriously slow.
  2. The algorithm used by ST_Simplify is the Douglas-Peuker algorithm. maphaper (and thus rmapshaper::ms_simplify() uses the Visvalingam algorithm, which in my humble opinion does a much nicer job.

And contrast with sf::st_simplify:

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3

nc <- read_sf(system.file("shape/nc.shp", package = "sf")) %>% 
   st_transform(2163) # transform to projected crs with units in metres

nc_simp_no_topo <- st_simplify(nc, TRUE, 10000)
plot(nc_simp_no_topo[, "NAME"])

Created on 2018-12-11 by the reprex package (v0.2.1)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kendonB picture kendonB  路  4Comments

tiernanmartin picture tiernanmartin  路  3Comments

Nosferican picture Nosferican  路  3Comments

kendonB picture kendonB  路  4Comments

gregmacfarlane picture gregmacfarlane  路  4Comments