I am trying to dissolve multiple polygons into a single one, using the following code:
library(data.table)
library(dplyr)
library(sf)
df_path <- fread("df_path.txt", stringsAsFactors = FALSE)
sf.object.Q1 <- df_path[QQscore == "1", 1:3] %>%
  st_as_sf(coords = c("lx", "ly")) %>%
  group_by(ID) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>%
  ungroup() %>%
  st_union()
It seems to be quite fast for small data sets (i.e. when filtering for QQscore 1, 2, or 3), with run times between 1 s and 25 s, but it becomes very slow when the dataset is large and complex, as when QQscore is 4 or 5.
sf.object.Q4 <- df_path[QQscore == "4", 1:3] %>%
  st_as_sf(coords = c("lx", "ly")) %>%
  group_by(ID) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>%
  ungroup() %>%
  st_union()
Is this somehow expected due to the nature of my data, or can it be sped up?
Here is my input data:
https://drive.google.com/open?id=1mTTHM3_zZUzfkjpSOFu12YhG_9OAKPtY
The majority of the runtime (60% for QQscore == "3") is spent in the GEOS code that backs st_union(). Improving this would require improving GEOS (there are some possibilities), reducing the complexity of the polygons before unioning (st_simplify, if that is acceptable) or finding a different way to accomplish the task represented by this snippet.
Profile output attached, primarily as documentation for GEOS developers.
callgrind.out.25518.gz
Hi, I tried to use st_simplify() before st_union(); however, I do not see any 'real' improvement:
system.time(df_path[QQscore == "3", 1:3] %>%
  st_as_sf(coords = c("lx", "ly")) %>%
  group_by(ID) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>%
  ungroup() %>%
  st_union())
   user  system elapsed
  25.51    9.80   24.21
system.time(df_path[QQscore == "3", 1:3] %>%
  st_as_sf(coords = c("lx", "ly")) %>%
  group_by(ID) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>%
  ungroup() %>%
  st_simplify() %>%
  st_union())
   user  system elapsed
  25.19    9.74   23.77
I guess the time saved in st_union() when the complexity is reduced is taken up by the computing time of st_simplify().
The default tolerance for st_simplify is zero; to get any benefit you'd need to specify a nonzero tolerance that's acceptable for your data.
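For example, a minimal sketch with an explicit tolerance (the value 10 is only a placeholder, in the units of your CRS; you would tune it to whatever is acceptable for your data):

system.time(df_path[QQscore == "3", 1:3] %>%
  st_as_sf(coords = c("lx", "ly")) %>%
  group_by(ID) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>%
  ungroup() %>%
  # dTolerance is in CRS units; 10 is a placeholder, not a recommendation
  st_simplify(dTolerance = 10) %>%
  st_union())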
How about using ms_simplify() from the rmapshaper package? It should behave better on polygons...
I was writing this exact same reply. I have used rmapshaper in the past and it works very well.
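For reference, a minimal sketch of what that could look like (keep = 0.05 is rmapshaper's default proportion of vertices to retain, and only a starting point):

library(rmapshaper)

polys <- df_path[QQscore == "3", 1:3] %>%
  st_as_sf(coords = c("lx", "ly")) %>%
  group_by(ID) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>%
  ungroup()

# simplify first (keep = proportion of vertices retained), then union
polys_simplified <- ms_simplify(polys, keep = 0.05)
result <- st_union(polys_simplified)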
Does that dataset consist of 16,438 polygons containing a total of 3.3 M vertices? And they are all long skinny curved arcs which mostly all intersect? Then that's the problem - that is a seriously complex dataset. I'm surprised GEOS can even complete, let alone be performant.
What is the goal of performing the union? Is there any other way to compute the desired result?
Also, since it looks like most polygons intersect most others, if the result could be computed it would have on the order of (# polygons)^2 holes - or about 100M holes! It's highly unlikely this could be computed or represented.
For comparison, the dataset with Q <= 2 has only 71 polygons and 14K vertices. The union is a polygon with about 14K vertices as well, but with 822 holes (which as expected is a similar order of magnitude to 71^2).
Since the polygons are already quite smooth, it's not surprising that simplification doesn't make much difference. And if my understanding of MapShaper is correct, it will only perform unions on non-overlapping coverages, which this dataset is definitely not.
This performance issue could be solved once sf and data.table become fully compatible. Once we have that, dissolving polygons should be quite fast using a solution like df_path[, .(geometry = st_union(geometry)), by = ID].
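In the meantime, a per-ID dissolve can already be written with the current sf + dplyr methods; a minimal sketch, assuming polys is an sf polygon object with an ID column (e.g. the output of the st_cast("POLYGON") step above):

# st_union() applied within each group is the dplyr counterpart
# of the data.table expression above
dissolved <- polys %>%
  group_by(ID) %>%
  summarise(geometry = st_union(geometry))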