I noticed while doing some work with st_intersection that I could speed it up by applying st_intersects first and then only calling st_intersection on the pairs of geometries that actually intersect. In light of this, I tried implementing a fork of sf that builds this check into the st_intersection function.
I am a novice C++ programmer, so I'm sure my implementation isn't ideal, but I have linked it here around line 730.
Below is some reproducible example code to test the speed of the new version vs. the old version. The speed improvement isn't insane (~ 20%), but it's nontrivial. I am intersecting North Carolina with some 3mi by 3mi grids.
## Checking to see if new version of library is faster
# devtools is broken so you have to do this, sorry :/
library(devtools)
install_github("r-lib/devtools")
library(devtools)
install_github("ekarsten/sf", force = T)
library(sf)
library(dplyr)
utm_crs <- 32614
mile_2_meter <- 1609.344
nc <-
system.file("shape/nc.shp", package="sf") %>%
st_read() %>%
st_transform(utm_crs) %>%
select()
grids <-
nc %>%
st_make_grid(cellsize = 3*c(mile_2_meter, mile_2_meter)) %>%
st_sf(seq(1, length(.)), ., crs = utm_crs) %>%
select(Grid = starts_with("X1"),
geometry = ".")
# check-time
system.time({new = st_intersection(nc, grids)})
remove.packages("sf")
install.packages("sf")
library(sf)
# check-time
system.time({old = st_intersection(nc, grids)})
I am using GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
Session info:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.4 mapview_2.4.0 dplyr_0.7.6 sf_0.6-4 usethis_1.3.0 devtools_1.13.6.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 lattice_0.20-35 png_0.1-7 class_7.3-14 assertthat_0.2.0 rprojroot_1.3-2 digest_0.6.15
[8] foreach_1.4.4 mime_0.5 R6_2.2.2 plyr_1.8.4 backports_1.1.2 stats4_3.5.1 e1071_1.6-8
[15] httr_1.3.1 pillar_1.2.3 rlang_0.2.1 curl_3.2 rstudioapi_0.7 callr_2.0.4 raster_2.6-7
[22] R.utils_2.6.0 R.oo_1.22.0 desc_1.2.0 webshot_0.5.0 rgdal_1.3-3 htmlwidgets_1.2 munsell_0.5.0
[29] shiny_1.1.0 compiler_3.5.1 httpuv_1.4.4.2 pkgconfig_2.0.1 base64enc_0.1-3 pkgbuild_1.0.0 htmltools_0.3.6
[36] tidyselect_0.2.4 tibble_1.4.2 codetools_0.2-15 viridisLite_0.3.0 crayon_1.3.4 withr_2.1.2 later_0.7.3
[43] R.methodsS3_1.7.1 grid_3.5.1 jsonlite_1.5 spData_0.2.9.0 satellite_1.0.1 xtable_1.8-2 DBI_1.0.0
[50] git2r_0.22.1 magrittr_1.5 units_0.6-0 scales_0.5.0 cli_1.0.0 promises_1.0.1 leaflet_2.0.1
[57] bindrcpp_0.2.2 sp_1.3-1 testthat_2.0.0 iterators_1.0.9 tools_3.5.1 gdalUtils_2.0.1.14 glue_1.2.0
[64] purrr_0.2.5 crosstalk_1.0.0 processx_3.1.0 pkgload_1.0.0 yaml_2.1.19 colorspace_1.3-2 classInt_0.2-3
[71] memoise_1.1.0 knitr_1.20 bindr_0.1.1
Thanks, this makes a lot of sense; postgis also benefits from a st_intersects before an st_intersection.
I think to make it easier for @edzer to review your changes would be to squash the commits and create a PR.
Thanks for the feedback. I squashed into one commit and submitted PR #803.
I haven't had time to investigate, but I suspect similar speed gains could be achieved in st_difference by performing similar checks (but maybe not) if two objects don't intersect, then st_difference should just return the first, if they do, maybe call st_within to see if st_difference shouldn't return output and then finally call st_difference if a new geometry needs to be returned.
I think you're right. It looks like another layer, after indexing, to speedup computations. I wonder when or if it slows it down in certain cases. I guess looking at the postgis execution plans for these operations should be pretty informative.
Quick follow-up on this, did this get implemented @ekarsten? It seems from #803 that the PR didn't go in. Could be very useful for my work at the moment! Thanks.