Sf: Speeding up st_intersection

Created on 20 Jul 2018  路  4Comments  路  Source: r-spatial/sf

I noticed while doing some work with st_intersection that I could speed it up by applying st_intersects first and then only calling st_intersection on the pairs of geometries that actually intersect. In light of this, I tried implementing a fork of sf that builds this check into the st_intersection function.

I am a novice C++ programmer, so I'm sure my implementation isn't ideal, but I have linked it here around line 730.

Below is some reproducible example code to test the speed of the new version vs. the old version. The speed improvement isn't insane (~ 20%), but it's nontrivial. I am intersecting North Carolina with some 3mi by 3mi grids.

## Checking to see if new version of library is faster

# devtools is broken so you have to do this, sorry :/
library(devtools)
install_github("r-lib/devtools")
library(devtools)
install_github("ekarsten/sf", force = T)
library(sf)
library(dplyr)

utm_crs <- 32614
mile_2_meter <- 1609.344

nc <-
  system.file("shape/nc.shp", package="sf") %>%
  st_read() %>%
  st_transform(utm_crs) %>%
  select()

grids <- 
  nc %>%
  st_make_grid(cellsize = 3*c(mile_2_meter, mile_2_meter)) %>%
  st_sf(seq(1, length(.)), ., crs = utm_crs) %>%
  select(Grid = starts_with("X1"),
         geometry = ".")

# check-time
system.time({new = st_intersection(nc, grids)})

remove.packages("sf")
install.packages("sf")
library(sf)

# check-time
system.time({old = st_intersection(nc, grids)})

I am using GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
Session info:

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.4    mapview_2.4.0        dplyr_0.7.6          sf_0.6-4             usethis_1.3.0        devtools_1.13.6.9000

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17       lattice_0.20-35    png_0.1-7          class_7.3-14       assertthat_0.2.0   rprojroot_1.3-2    digest_0.6.15     
 [8] foreach_1.4.4      mime_0.5           R6_2.2.2           plyr_1.8.4         backports_1.1.2    stats4_3.5.1       e1071_1.6-8       
[15] httr_1.3.1         pillar_1.2.3       rlang_0.2.1        curl_3.2           rstudioapi_0.7     callr_2.0.4        raster_2.6-7      
[22] R.utils_2.6.0      R.oo_1.22.0        desc_1.2.0         webshot_0.5.0      rgdal_1.3-3        htmlwidgets_1.2    munsell_0.5.0     
[29] shiny_1.1.0        compiler_3.5.1     httpuv_1.4.4.2     pkgconfig_2.0.1    base64enc_0.1-3    pkgbuild_1.0.0     htmltools_0.3.6   
[36] tidyselect_0.2.4   tibble_1.4.2       codetools_0.2-15   viridisLite_0.3.0  crayon_1.3.4       withr_2.1.2        later_0.7.3       
[43] R.methodsS3_1.7.1  grid_3.5.1         jsonlite_1.5       spData_0.2.9.0     satellite_1.0.1    xtable_1.8-2       DBI_1.0.0         
[50] git2r_0.22.1       magrittr_1.5       units_0.6-0        scales_0.5.0       cli_1.0.0          promises_1.0.1     leaflet_2.0.1     
[57] bindrcpp_0.2.2     sp_1.3-1           testthat_2.0.0     iterators_1.0.9    tools_3.5.1        gdalUtils_2.0.1.14 glue_1.2.0        
[64] purrr_0.2.5        crosstalk_1.0.0    processx_3.1.0     pkgload_1.0.0      yaml_2.1.19        colorspace_1.3-2   classInt_0.2-3    
[71] memoise_1.1.0      knitr_1.20         bindr_0.1.1 

All 4 comments

Thanks, this makes a lot of sense; postgis also benefits from a st_intersects before an st_intersection.

I think to make it easier for @edzer to review your changes would be to squash the commits and create a PR.

Thanks for the feedback. I squashed into one commit and submitted PR #803.

I haven't had time to investigate, but I suspect similar speed gains could be achieved in st_difference by performing similar checks (but maybe not) if two objects don't intersect, then st_difference should just return the first, if they do, maybe call st_within to see if st_difference shouldn't return output and then finally call st_difference if a new geometry needs to be returned.

I think you're right. It looks like another layer, after indexing, to speedup computations. I wonder when or if it slows it down in certain cases. I guess looking at the postgis execution plans for these operations should be pretty informative.

Quick follow-up on this, did this get implemented @ekarsten? It seems from #803 that the PR didn't go in. Could be very useful for my work at the moment! Thanks.

Was this page helpful?
0 / 5 - 0 ratings