Sf: Make st_voronoi() retain the point order

Created on 16 Apr 2020  路  10Comments  路  Source: r-spatial/sf

I have a very simple example in which I simulate 100 points, then create the Voronoi polygons associated with this partition and assigne it to another column. My problem is that the order of the polygons created by st_voronoi won't have the same order than the points used in argument.

library(sf)
library(data.table)
library(ggplot2)

set.seed(20200415L)

border <- st_sfc(st_polygon(list(matrix(c(0, 0, 1, 1, 0, 0, 1, 1, 0, 0), ncol=2))))

data <- data.table(
  point = st_sfc(lapply(replicate(100, runif(2), simplify=FALSE), st_point))
)

data[, polygon:=st_intersection(st_collection_extract(st_voronoi(do.call(c, point))), border)]

sample_data <- data[sample(1:.N, 10)]

sample_data[]
##                             point polygon
##   1:  POINT (0.4209446 0.4083639)    <XY>
##   2:    POINT (0.0919243 0.40189)    <XY>
##   3:  POINT (0.7040549 0.1799017)    <XY>
##   4:  POINT (0.5056223 0.7427985)    <XY>
##   5:   POINT (0.2678713 0.343384)    <XY>
##   6:  POINT (0.9404894 0.6390899)    <XY>
##   7:   POINT (0.188111 0.2670979)    <XY>
##   8:  POINT (0.5343007 0.3091294)    <XY>
##   9: POINT (0.01642705 0.8337068)    <XY>
##  10: POINT (0.4475015 0.05526273)    <XY>

ggplot(sample_data) +
  geom_sf(
    mapping = aes(
      geometry = point
    )
  ) +
  geom_sf(
    mapping = aes(
      geometry = polygon
    ),
    fill = NA
  ) +
  theme_minimal()

image

As one could see with the image above, the point and the polygon are not associated correctly and this is due to the st_voronoi function which do not preserve the order of the points.

Is it possible to improve this on your side or my only solution is to wrap st_voronoi into a personal function that will reorder the polygon to follow the point pattern correctly?


sessionInfo()
##  R version 3.5.0 (2018-04-23)
##  Platform: x86_64-apple-darwin15.6.0 (64-bit)
##  Running under: macOS  10.15.4
##  
##  Matrix products: default
##  BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
##  LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##  
##  locale:
##  [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
##  
##  attached base packages:
##  [1] stats     graphics  grDevices utils     datasets  methods   base     
##  
##  other attached packages:
##  [1] ggplot2_3.0.0     data.table_1.11.4 sf_0.9-2         
##  
##  loaded via a namespace (and not attached):
##   [1] Rcpp_1.0.1         rstudioapi_0.7     magrittr_1.5       units_0.6-2       
##   [5] tidyselect_0.2.5   munsell_0.5.0      colorspace_1.3-2   R6_2.2.2          
##   [9] rlang_0.4.5        plyr_1.8.4         dplyr_0.8.0.1      tools_3.5.0       
##  [13] grid_3.5.0         gtable_0.2.0       KernSmooth_2.23-15 e1071_1.7-1       
##  [17] DBI_1.0.0          withr_2.1.2        class_7.3-14       assertthat_0.2.0  
##  [21] lazyeval_0.2.1     tibble_2.1.1       crayon_1.3.4       purrr_0.3.3       
##  [25] glue_1.3.1         compiler_3.5.0     pillar_1.3.1       scales_1.0.0      
##  [29] classInt_0.4-3     pkgconfig_2.0.2   

Most helpful comment

I think GEOS should retain the order when returning the polygons. The only reason I can think of for not doing so is ambiguity in the case of duplicates. But I like @jplecavalier 's suggestion:

to only match the order of points and polygons when the length of both is the same.

I don't see that this functionality has ever been requested/ticketed in GEOS, so I opened one at https://trac.osgeo.org/geos/ticket/1030#ticket

All 10 comments

This is a "feature" of the implementation of Voronoi polygons in JTS/GEOS. You should simplify your notation, neither ggplot2 nor data.table are needed. Do everything step by step.

plot(st_geometry(border))
plot(data$polygon[1:10,], add=TRUE)
plot(data$point[1:10,], add=TRUE)

My approach (not putting the two sfc objects in the same object, which assumes that they are sorted in the same way), is:

library(sf)
border <- st_sfc(st_polygon(list(matrix(c(0, 0, 1, 1, 0, 0, 1, 1, 0, 0), ncol=2))))
set.seed(20200415L)
point <- st_sfc(lapply(replicate(100, runif(2), simplify=FALSE), st_point))
polygon <- st_intersection(st_collection_extract(st_voronoi(do.call(c, point))), border)
o <- unlist(st_intersects(point, polygon)) # find the unique sort of polygons to points
plot(border)
plot(point[1:10], add=TRUE)
plot(polygon[1:10], add=TRUE) # start from west
plot(polygon[o][1:10], border="red", add=TRUE)

@rsbivand I'm sorry for the use of data.table and ggplot2, I just tried to make the example the more visual as possible. Thanks for simplifying my example using base-R only functions. The point is that we both observe the same behaviour of st_voronoi.

I did the same as you to reorder the polygon according to the point order, but it is time consuming since I work with A LOT of points. I still feel there is no good reason to reorder the polygons when outputting from st_voronoi.

The only reason I see is, what if more than one points are exactly at the same coordinates. In that case, these points would share the same Voronoi polygon. In that particular case, there would not be the same number of polygons than points, thus it would not be possible to match the order perfectly. The two solutions I see would be to throw a warning (about duplicate points) and remove the duplicated points before performing tessellation OR to only match the order of points and polygons when the length of both is the same.

Not a big problem that you use complicating extra packages, it just makes it a lot harder to debug problems.

Wrt. identical points - you'll find that many problems in computational geometry get much harder for such duplicated observations. The excellent deldir package drops them silently:

library(deldir)
set.seed(1)
x <- runif(50)
y <- runif(50)
o <- deldir(x, y)
plot(o)
str(o)
o1 <- deldir(c(x, x[1:10]), c(y, y[1:10]))
str(o1)

This should really be done prior to running the function. A method of st_as_sf for deldir objects might be fun to write. voronoiPolygons in the cholera package might help here, https://cran.r-project.org/web/packages/cholera/vignettes/tiles.polygons.html.

So, my dream of having an optional keep_order argument in st_voronoi is impossible since the primary goal is to replicate the exact behaviour of GEOS I guess?

Personally, I do not trust JTS/GEOS and trust deldir more on this. A lot more work would be needed to eliminate duplicates, then re-order, and possibly copy geometries out to the duplicated points in any case. You have not described a use case - that in deldir is point pattern, so silently dropping duplicates makes sense (or throwing an error, as in other analytical functions). st_voronoi simply does what GEOS does, which is to create the computational geometry tesselation, which it does, but not preserving the order.

You could contribute the code to set aside duplicates, noting which retained point is their equal, run st_voronoi, run st_intersects on the unique input points and the output polygons, re-order the output polygons, and add back the duplicate points/rows with copies of the polygon for their included equal point (to facilitate subsequent possible subsetting). All of this can be done in R, not compiled code, and only requires care and handling edge cases.

Yes, all of this can be done in R easily. Since it won't be integrated in the st_voronoi interface, I'm not sure this should be done in sf. I'll suggest @etiennebr to integrate this in his experimental geotidy, I think my suggestion is more aligned with the motivation of this package.

Thanks @jplecavalier, I agree things should be aligned! As @rsbivand mentioned, it is GEOS that returns the set unordered. Apparently, we're not alone: https://trac.osgeo.org/postgis/ticket/4392

Note that you could use st_union to merge points (rather than c or st_combine) since this would remove duplicates. I would be really happy to look at it within geotidy.

If your data is very large, maybe it might be worth using a database. Since this solution is not very well documented, and little used, here's an example using a docker container (and only ten points). Note that the SQL code is not prettier than the R code here.

# setup
system2("docker run --rm -p 25432:5432 -d -t kartoza/postgis")
Sys.sleep(10)

con <- DBI::dbConnect(
  RPostgres::Postgres(), 
  host = "localhost", 
  port = 25432, 
  dbname = "gis", 
  user = "docker",
  password= "docker"
)
st_write(data, con, layer = "data")
st_write(border, con, layer = "border")

q <- "
SELECT point as geom, voronoi
FROM
  (SELECT (st_dump(ST_VoronoiPolygons(g.geom, 0.001, b.geom))).geom as voronoi
  FROM 
    (SELECT st_collect(point) as geom FROM data) as g, border as b) as v, 
    data
WHERE st_intersects(point, voronoi)"

x <- st_read(con, query = q)

x %>% 
  mutate(
    row = row_number()
  ) %>% 
  ggplot() +
  geom_sf(aes(geometry = voronoi)) +
  geom_sf(aes(geometry = geom)) +
  facet_wrap(~row)

image

I think GEOS should retain the order when returning the polygons. The only reason I can think of for not doing so is ambiguity in the case of duplicates. But I like @jplecavalier 's suggestion:

to only match the order of points and polygons when the length of both is the same.

I don't see that this functionality has ever been requested/ticketed in GEOS, so I opened one at https://trac.osgeo.org/geos/ticket/1030#ticket

Thanks! Follow-ups on the referenced GEOS tickets.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

faridcher picture faridcher  路  4Comments

gregmacfarlane picture gregmacfarlane  路  4Comments

dkyleward picture dkyleward  路  4Comments

kendonB picture kendonB  路  3Comments

jsta picture jsta  路  4Comments