Sf: problem with st_difference and st_sym_difference with polygons

Created on 9 Aug 2017  路  7Comments  路  Source: r-spatial/sf

Examples will be better than my explanations. I face two cases for which the results, in my opinion, should be different.

Sym_difference between two overlapping polygons shapefiles

Let's create two overlapping sets of polygons. I created them with library sp to show the difference with sf.

library(sp)
library(rgdal)
library(sf)
library(dplyr)
library(ggplot2)

# Create Polygons
p1 <- rbind(c(-180,-20), c(-140,55), c(-50, 0), c(-140,-60), c(-180,-20))
p2 <- rbind(c(-10,0), c(140,60), c(160,0), c(140,-55), c(-10,0))
data1 <- SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(p1)), 1), Polygons(list(Polygon(p2)), 2))),
                                  data = data.frame(col1 = 1:2), match.ID = FALSE)

p3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45), c(-125,0))
data2 <- SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(p3)), 3))), 
                                  data = data.frame(col2 = "test"), match.ID = FALSE)

overlapping_polygons

With library sp, the symdifference (+disaggregate) creates 3 polygons.

diff.sp <- rgeos::gSymdifference(data1,data2) %>% sp::disaggregate()

sp_symdifference

If I realise the same st_sym_difference (+st_cast) with library sf, I get 4 polygons.

# Transform as sf
data1.sf <- sf::st_as_sf(data1)
data2.sf <- sf::st_as_sf(data2)

diff.sim.sf <- st_sym_difference(data2.sf, data1.sf) %>%
  st_cast("POLYGON") %>%
  mutate(id = 1:n())

ggplot() +
  geom_sf(data = diff.sim.sf, aes(colour = factor(id), fill = factor(id)), alpha = 0.25)

st_symdifference

Indeed, the polygon in the middle is composed of two polygons, I suppose that polygon 'id=3' should be the polygon hole of the other polygon 'id=2'. But it not accounted as is with geom_sf. If I am correct, to be accounted as a hole in ggplot, coordinates need to be ordered anti-clockwise.
However, even if this drawing problem is solved, the visual output would be good but not the geometry. The sym_difference should give three polygons like in library sp and not four.
Of course, this also affect st_difference when realized on my data2 like this:

diff.yx.sf <- sf::st_difference(data2.sf, data1.sf)

ggplot() +
  geom_sf(data = diff.yx.sf, colour = "green", fill = "green", alpha = 0.25)

st_difference

sim_difference with multiple polygons included in other

Another case is the difference of one polygon included in an other one. Here, the st_difference seems to work as expected if there is only one polygon. However, when there are multiple polygons, the inner polygon is strangely accounted. This problem is similar than above as holes polygons seem to not be accounted as is.

# Create an inner polygon
p1.2 <- rbind(c(-150,0), c(-100,10), c(-60, -5))
p2.2 <- rbind(c(50,0), c(120,40), c(100,0), c(120,-20))
data1.2 <- SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(p1.2)), 1), Polygons(list(Polygon(p2.2)), 2))),
                                  data = data.frame(col1 = 1:2), match.ID = FALSE)

data1.2.sf <- sf::st_as_sf(data1.2)

diff.1.2.sf <- sf::st_difference(data1.sf, data1.2.sf) %>%
  mutate(id = 1:n())

ggplot() +
  geom_sf(data = diff.1.2.sf, aes(colour = factor(id), fill = factor(col1.1)), alpha = 0.25)

st_inner_polygons_difference

As if there is only one polygon, the st_difference function is working as expected, I found a workaround as follows. However, this only works because there are as many holes as polygons. In a case of undefined number of intersections, we should probably repeat the operation for each pairs of polygons.

geom.diff <- purrr::map(1:nrow(data1.sf), ~st_difference(data1.sf[.x,], data1.2.sf[.x,]))
geom.diff.sf <- do.call(rbind, geom.diff)
geom.diff.sf <- st_cast(geom.diff.sf) %>%
  mutate(id = 1:n())

ggplot() +
  geom_sf(data = geom.diff.sf, aes(colour = factor(id), fill = factor(col1.1)), alpha = 0.25)

st_holes_difference_expected

I hope you can find a solution.
PS: Thank you for this package, this greatly renew the work with spatial data in R

Most helpful comment

@birbritto Sorry but a github issue is not the place to ask for help. An issue on github is more to present to the package maintainers that you found a bug or had a problem when using some functions on a specific case.
If you are stuck with some code you already tried, please use a website like https://stackoverflow.com/, where you will need to show a reproducible example of your data, show what code you tried and where you are stuck. By the way, for any question, a reproducible example will help people help you. Without, we can only guess what you really want.
If you did not try to write code, you may find your answer in the documentation. Maybe here: https://r-spatial.github.io/sf/ or here: https://geocompr.robinlovelace.net/

All 7 comments

Is this with the CRAN version, or the GitHub one?

On 9 August 2017 17:04:52 CEST, statnmap notifications@github.com wrote:

Examples will be better than my explanations. I face two cases for
which the results, in my opinion, should be different.

Sym_difference between two overlapping polygons shapefiles

Let's create two overlapping sets of polygons. I created them with
library sp to show the difference with sf.

library(sp)
library(rgdal)
library(sf)
library(dplyr)
library(ggplot2)

# Create Polygons
p1 <- rbind(c(-180,-20), c(-140,55), c(-50, 0), c(-140,-60),
c(-180,-20))
p2 <- rbind(c(-10,0), c(140,60), c(160,0), c(140,-55), c(-10,0))
data1 <-
SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(p1)),
1), Polygons(list(Polygon(p2)), 2))),
                      data = data.frame(col1 = 1:2), match.ID = FALSE)

p3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45), c(-125,0))
data2 <-
SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(p3)),
3))), 
                   data = data.frame(col2 = "test"), match.ID = FALSE)

overlapping_polygons

With library sp, the symdifference (+disaggregate) creates 3
polygons.

diff.sp <- rgeos::gSymdifference(data1,data2) %>% sp::disaggregate()

sp_symdifference

If I realise the same st_sym_difference (+st_cast) with library sf, I
get 4 polygons.

# Transform as sf
data1.sf <- sf::st_as_sf(data1)
data2.sf <- sf::st_as_sf(data2)

diff.sim.sf <- st_sym_difference(data2.sf, data1.sf) %>%
 st_cast("POLYGON") %>%
 mutate(id = 1:n())

ggplot() +
geom_sf(data = diff.sim.sf, aes(colour = factor(id), fill =
factor(id)), alpha = 0.25)

st_symdifference

Indeed, the polygon in the middle is composed of two polygons, I
suppose that polygon 'id=3' should be the polygon hole of the other
polygon 'id=2'. But it not accounted as is with geom_sf. If I am
correct, to be accounted as a hole in ggplot, coordinates need to be
ordered anti-clockwise.
However, even if this drawing problem is solved, the visual output
would be good but not the geometry. The sym_difference should give
three polygons like in library sp and not four.
Of course, this also affect st_difference when realized on my data2
like this:

diff.yx.sf <- sf::st_difference(data2.sf, data1.sf)

ggplot() +
geom_sf(data = diff.yx.sf, colour = "green", fill = "green", alpha =
0.25)

st_difference

sim_difference with multiple polygons included in other

Another case is the difference of one polygon included in an other one.
Here, the st_difference seems to work as expected if there is only
one polygon. However, when there are multiple polygons, the inner
polygon is strangely accounted. This problem is similar than above as
holes polygons seem to not be accounted as is.

# Create an inner polygon
p1.2 <- rbind(c(-150,0), c(-100,10), c(-60, -5))
p2.2 <- rbind(c(50,0), c(120,40), c(100,0), c(120,-20))
data1.2 <-
SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(p1.2)),
1), Polygons(list(Polygon(p2.2)), 2))),
                      data = data.frame(col1 = 1:2), match.ID = FALSE)

data1.2.sf <- sf::st_as_sf(data1.2)

diff.1.2.sf <- sf::st_difference(data1.sf, data1.2.sf) %>%
 mutate(id = 1:n())

ggplot() +
geom_sf(data = diff.1.2.sf, aes(colour = factor(id), fill =
factor(col1.1)), alpha = 0.25)

st_inner_polygons_difference

As if there is only one polygon, the st_difference function is
working as expected, I found a workaround as follows. However, this
only works because there are as many holes as polygons. In a case of
undefined number of intersections, we should probably repeat the
operation for each pairs of polygons.

geom.diff <- purrr::map(1:nrow(data1.sf), ~st_difference(data1.sf[.x,],
data1.2.sf[.x,]))
geom.diff.sf <- do.call(rbind, geom.diff)
geom.diff.sf <- st_cast(geom.diff.sf) %>%
 mutate(id = 1:n())

ggplot() +
geom_sf(data = geom.diff.sf, aes(colour = factor(id), fill =
factor(col1.1)), alpha = 0.25)

st_holes_difference_expected

I hope you can find a solution.
PS: Thank you for this package, this greatly renew the work with
spatial data in R

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/r-spatial/sf/issues/459

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

The github one. I just downloaded the last version.

You forgot about the sp::disaggregate equivalent in sf world. Does rgeos::gsymdifference have a byid argument? Think of that as set to TRUE by sf. Try combining the two polygons in data1 to a MULTIPOLYGON first, to get the sp behaviour you want.

On 9 August 2017 17:37:11 CEST, statnmap notifications@github.com wrote:

The github one. I just downloaded the last version.

--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/r-spatial/sf/issues/459#issuecomment-321293364

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

  • In the first example, I used st_cast("POLYGON")which did the equivalent of sp::disaggregate as far as I see.
  • rgeos::gSymdifference has a by_id argument, which is set to FALSE by default.

Transformation in multipolygons

I did new tests based on your remark.
There are two ways we can transform the polygons as Multipolygons. (1) Using st_combine(x) combines all polygons into a Multipolygon with a unique feature of multipolygon. This looses the attributes. (2) Using st_cast(x, "MULTIPOLYGON") transforms the polygon object as a multipolygon with multiple feature of multipolygons. This keeps the attributes.

Sym_difference with overlapping polygons

  • [OK] Combining into a unique Multipolygon using st_combine looses the attributes but work as rgeos. (By the way, rgeos also looses the attributes). Strangely, although this is a symetric difference, this works if the first one is st_combine and the second one is st_cast, but not inversely.
# Union of polygons
## Correct geometry but no attributes
data1.sf <- sf::st_as_sf(data1) %>% st_combine() %>% st_sf()
data2.sf <- sf::st_as_sf(data2) %>% st_cast("MULTIPOLYGON")

diff.sim.sf <- st_sym_difference(data2.sf, data1.sf) %>%
  st_cast("POLYGON") %>%
  mutate(id = 1:n())

ggplot() +
  geom_sf(data = diff.sim.sf, aes(colour = factor(id), fill = factor(id)), alpha = 0.25)

union_correct

  • [not OK] Transforming as Multipolygon using st_cast only does not have an effect.
## Keep attributes by wrong geometry
data1.sf <- sf::st_as_sf(data1) %>% st_cast("MULTIPOLYGON")
data2.sf <- sf::st_as_sf(data2) %>% st_cast("MULTIPOLYGON")

diff.sim.sf <- st_sym_difference(data2.sf, data1.sf) %>%
  st_cast("POLYGON") %>%
  mutate(id = 1:n())

ggplot() +
  geom_sf(data = diff.sim.sf, aes(colour = factor(id), fill = factor(id)), alpha = 0.25)

union_incorrect

Difference with inner polygons

  • [OK] st_combine needs to be applied to the inner polygons and it works as expected.
# Inner polygons
## Correct geometry with attributes
data1.sf <- sf::st_as_sf(data1)
data1.2.sf <- sf::st_as_sf(data1.2) %>% st_combine() %>% st_sf()

diff.1.2.sf <- sf::st_difference(data1.sf, data1.2.sf) %>%
  mutate(id = 1:n()) %>%
  st_cast('POLYGON')

ggplot() +
  geom_sf(data = diff.1.2.sf, aes(colour = factor(id), fill = factor(id)), alpha = 0.25)

inner_correct

  • [Not OK] When only using st_cast to transform as Multiple Multipolygon, the output is not the expected one
## Wrong geometry and keep attributes
data1.sf <- sf::st_as_sf(data1)
data1.2.sf <- sf::st_as_sf(data1.2) %>% st_cast("MULTIPOLYGON")

diff.1.2.sf <- sf::st_difference(data1.sf, data1.2.sf) %>%
  mutate(id = 1:n()) %>%
  st_cast('POLYGON')

ggplot() +
  geom_sf(data = diff.1.2.sf, aes(colour = factor(id), fill = factor(id)), alpha = 0.25)

inner_incorrect

Difference with inner and outer polygons

  • [OK] When having a combination of inner and outer polygons, the first layer needs to be st_cast as Multiple Multipolygon only, and the second layer needs to be st_combine as a unique Multipolygon. Because the first one is a Multiple Multipolygon, the attributes are kept.
## Create an inner and outer polygon
p3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45), c(-125,0))
p4 <- rbind(c(-120,-40), c(-130,-40), c(-130,-20), c(-120,-20), c(-120,-40))
data1.3 <- SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(p3)), 1), Polygons(list(Polygon(p4)), 2))),
                                  data = data.frame(col1 = 1:2), match.ID = FALSE)

### The difference one is multipolygon
data1.sf <- sf::st_as_sf(data1) %>% st_cast("MULTIPOLYGON")
data1.3.sf <- sf::st_as_sf(data1.3) %>% st_combine() %>% st_sf()

diff.1.3.sf <- sf::st_difference(data1.sf, data1.3.sf) %>%
  mutate(id = 1:n()) %>%
  st_cast('POLYGON')

ggplot() +
  geom_sf(data = diff.1.3.sf, aes(colour = factor(id), fill = factor(id)), alpha = 0.25)

inner_outer_correct

So, in st_sym_difference, the first layer needs to be a st_combine, but in st_difference, this should be the second one. Don't you think, this should be clarified in the documentation ? Or maybe integrate this directly in the functions depending on cases ?

Please feel welcome to suggest PRs for improved documentation, regarding this issue.

I have a single polygon with 2 nexted inner polygons and I'm trying to split that one shape into 3 separate polygons. Is there a way to do this?

@birbritto Sorry but a github issue is not the place to ask for help. An issue on github is more to present to the package maintainers that you found a bug or had a problem when using some functions on a specific case.
If you are stuck with some code you already tried, please use a website like https://stackoverflow.com/, where you will need to show a reproducible example of your data, show what code you tried and where you are stuck. By the way, for any question, a reproducible example will help people help you. Without, we can only guess what you really want.
If you did not try to write code, you may find your answer in the documentation. Maybe here: https://r-spatial.github.io/sf/ or here: https://geocompr.robinlovelace.net/

Was this page helpful?
0 / 5 - 0 ratings

Related issues

duleise picture duleise  路  3Comments

matthewpaulking picture matthewpaulking  路  4Comments

kendonB picture kendonB  路  3Comments

faridcher picture faridcher  路  4Comments

kendonB picture kendonB  路  3Comments