Sf: Dissolving multiple detailed polygons seems to be very slow

Created on 18 Mar 2019  路  9Comments  路  Source: r-spatial/sf

I am trying to dissolve multiple polygons into a single one, using the following code

library(data.table)
library(dplyr)
library(sf)

df_path <- fread("df_path.txt", stringsAsFactors = F)

  sf.object.Q1 <- df_path[QQscore == "1", 1:3] %>%
    st_as_sf(coords = c("lx", "ly")) %>%
    group_by(ID) %>%
    summarise(geometry = st_combine(geometry)) %>%
    st_cast("POLYGON") %>%
    ungroup() %>%
    st_union()

It seems to be quite fast for small data set (i.e. when filtering for QQscore 1, 2, or 3),
with a run time between 1s to 25s, but it starts to be very slow when the dataset is quite large and complex, like when QQscore is 4 or 5.

  sf.object.Q4 <- df_path[QQscore == "4", 1:3] %>%
    st_as_sf(coords = c("lx", "ly")) %>%
    group_by(ID) %>%
    summarise(geometry = st_combine(geometry)) %>%
    st_cast("POLYGON") %>%
    ungroup() %>%
    st_union()

Is this somehow expected due to the nature of my data, or can it be speed up?

Here my input data
https://drive.google.com/open?id=1mTTHM3_zZUzfkjpSOFu12YhG_9OAKPtY

All 9 comments

The majority of the runtime (60% for QQscore == "3") is spent in the GEOS code that backs st_union(). Improving this would require improving GEOS (there are some possibilities), reducing the complexity of the polygons before unioning (st_simplify, if that is acceptable) or finding a different way to accomplish the task represented by this snippet.

Profile output attached, primarily as documentation for GEOS developers.
callgrind.out.25518.gz

Hi tried to use st_simplify before st_union, however I do not see any 'real' improvement

system.time(df_path[QQscore == "3", 1:3] %>%
  st_as_sf(coords = c("lx", "ly")) %>%
  group_by(ID) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>%
  ungroup() %>%
  st_union())

   user  system elapsed 
  25.51    9.80   24.21 

system.time(df_path[QQscore == "3", 1:3] %>%
  st_as_sf(coords = c("lx", "ly")) %>%
  group_by(ID) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>%
  ungroup() %>%
  st_simplify() %>%
  st_union())

   user  system elapsed 
  25.19    9.74   23.77 

I guess the time saved for st_union when the complexity is reduced is taken up by the computing time for st_simplify

The default tolerance for st_simplify is zero; to get any benefit you'd need to specify a nonzero tolerance that's acceptable for your data.

How about using ms_simplify() from the rmapshaper package? It should behave better on polygons...

How about using ms_simplify() from the rmapshaper package? It should behave better on polygons...

I was writing this exact same reply. I have used rmapshaper in the past and it works very well.

Does that dataset consist of 16,438 polygons containing a total of 3.3 M vertices? And they are all long skinny curved arcs which mostly all intersect? Then that's the problem - that is a seriously complex dataset. I'm surprised GEOS can even complete, let alone be performant.

What is the goal of performing the union? is there any other way to compute the desired result?

Also, since it looks like most polygons intersect most others, if the result could be computed it would have on the order of (# polygons)^2 holes - or about 100M holes! It's highly unlikely this could be computed or represented.

For comparison, the dataset with Q <= 2 has only 71 polygons and 14K vertices. The union is a polygon with about 14K vertices as well, but with 822 holes (which as expected is a similar order of magnitude to 71^2).

Since the polygons are already quite smooth, it's not surprising that simplifcation doesn't make much difference. And if my understanding of MapShaper is correct, it will only perform unions on non-overlapping coverages, which this dataset is definitely not.

This issue of performance could be solved once there sf and data.table become fully compatible. Once we have that, it dissolving polygons should be quite fast using a solution like df_path[, .(geometry = st_union(geometry)), by=ID]

Was this page helpful?
0 / 5 - 0 ratings

Related issues

happyshows picture happyshows  路  3Comments

kendonB picture kendonB  路  4Comments

dkyleward picture dkyleward  路  4Comments

kendonB picture kendonB  路  3Comments

dpprdan picture dpprdan  路  4Comments