Sf: Article suggestion: demonstrate multi-step geometry manipulation

Created on 1 Dec 2017  路  7Comments  路  Source: r-spatial/sf

I recently found myself scratching my head over how to do a pretty common GIS procedure using the sf package: join one set of multipolygon data to another based on the highest amount of spatial intersection.

A quick Stackoverflow search turned up a postgis solution, but reverse engineering it with sf took me quite a bit of time.

That made me wish that the sf documentation had an example showing how to apply a sequence of steps that combine verbs from sf and other tidyverse packages like dplyr, tidyr, and purrr.

After my project I threw together a reprex to illustrate the approach that I settled on - my hope is that it sparks a discussion that then informs a new section of the geometry manipulation article (or perhaps a separate article entirely).

Here's an image (reprex below) that uses the nc dataset to illustrate the result I was looking for:

Update:
Here's my approach wrapped up in a function:

st_intersects_most <- function(x, y){ 
  ints <- st_intersects(x,y)

  tibble(
    ROW_ID = imap(ints, ~rep(.y, times = length(.x))) %>% flatten_chr(),
    INTERSECT_DF = flatten_int(ints) %>% map(~y[.x,] %>% st_drop_geometry), # note: see the reprex for st_drop_geometry's definition
    INTERSECT_AREA = st_intersection(x,y) %>% st_area %>% map_dbl(1)
  ) %>%  
    unnest %>%  
    mutate(geometry = st_geometry(x[ROW_ID,])) %>% 
    st_as_sf
}

It requires the user to follow up with group_by %>% top_n(1, INTERSECT_AREA) %>% ungroup - that could probably be improved.

note: the reprex below has been updated to show the above function in action

Reprex + Session Info

# SETUP ----
library(magrittr)
library(tidyverse)  
library(ggplot2)     # devtools::install_github('tidyverse/ggplot2) for `geom_sf()`
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3
library(scales) 
library(gridExtra) 

# LOAD DATA ----

nc_sf <- 
  st_read(system.file("shape/nc.shp", package="sf")) %>% 
  st_transform(2264) # NC state plane, US foot 

# CREATE GRID ----

grid_sf <- 
  tibble(geometry = st_make_grid(nc_sf)) %>% 
  st_sf %>% 
  bind_cols(crossing(X=1:10,Y=1:10)) %>% 
  mutate(CENTROID = map(geometry, st_centroid),
         COORDS = map(CENTROID, st_coordinates),
         COORDS_X = map_dbl(COORDS,1),
         COORDS_Y = map_dbl(COORDS,2),
         COORD_COMB = map2_dbl(COORDS_X,COORDS_Y,~ sum(abs(.x)*1e5,abs(.y)))) %>% 
  arrange(desc(COORD_COMB)) %>% 
  mutate(GRID_ID = purrr::cross2(LETTERS[1:10],str_pad(1:10,2,side = "left",pad = "0")) %>% map_chr(str_c, collapse = "")) %>% 
  as_tibble() %>% 
  st_as_sf() 

# SPATIAL JOIN FUNCTION ----

st_drop_geometry <- function (x) 
{
  if (inherits(x, "sf")) {
    x <- st_set_geometry(x, NULL)
    class(x) <- "data.frame"
    x <- as_tibble(x)
  }
  return(x)
}

st_intersects_most <- function(x, y){ 
  ints <- st_intersects(x,y)

  tibble(
    ROW_ID = imap(ints, ~rep(.y, times = length(.x))) %>% flatten_chr(),
    INTERSECT_DF = flatten_int(ints) %>% map(~y[.x,] %>% st_drop_geometry),
    INTERSECT_AREA = st_intersection(x,y) %>% st_area %>% map_dbl(1)
  ) %>%  
    unnest %>%  
    mutate(geometry = st_geometry(x[ROW_ID,])) %>% 
    st_as_sf
}

# SPATIAL JOIN ----

nc_grid_sf <- 
  st_intersects_most(nc_sf, grid_sf) %>% 
  group_by(ROW_ID) %>% 
  top_n(1, INTERSECT_AREA) %>% 
  ungroup %>% 
  select(-CENTROID,-COORDS,-COORDS_X,-COORDS_Y,-COORD_COMB) %>%   
  mutate(CENTROID = map(geometry, st_centroid),
         COORDS = map(CENTROID, st_coordinates),
         COORDS_X = map_dbl(COORDS,1),
         COORDS_Y = map_dbl(COORDS,2),
         COORD_COMB = map2_dbl(COORDS_X,COORDS_Y,~ sum(abs(.x)*1e5,abs(.y))))
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries


# PLOT RESULTS ----

# Create a discrete color scale for each grid rectangle 

scale_fill_grid <- function(...){

  cols <- rep(brewer_pal(type = "qual", palette = 3)(12),times = 9)[1:100]

  vals <- grid_sf$GRID_ID

  ggplot2::scale_fill_manual(..., 
                             values = setNames(cols,
                                               vals))
}


# Grid beside counties
g1 <- ggplot()
g1 <- g1 + geom_sf(data = grid_sf,  color = "black", fill = "transparent") 
g1 <- g1 + geom_text(data = grid_sf, mapping = aes(COORDS_X, COORDS_Y, label = GRID_ID))
g1 <- g1 + coord_sf(crs = st_crs(grid_sf), datum = NA)
g1 <- g1 + theme_void()
g1 <- g1 + theme(legend.position = "none")

g2 <- ggplot() 
g2 <- g2 + geom_sf(data = nc_sf, color = "white")  
g2 <- g2 + coord_sf(crs = st_crs(nc_sf), datum = NA)
g2 <- g2 + theme_void()
g2 <- g2 + theme(legend.position = "none")

# Grid over counties

g3 <- ggplot()
g3 <- g3 + geom_sf(data = grid_sf, mapping = aes(fill = GRID_ID), alpha = .5, color = "transparent")
g3 <- g3 + geom_sf(data = nc_sf, color = "black",  fill = "transparent")
g3 <- g3 + scale_fill_grid()
g3 <- g3 + geom_text(data = grid_sf, mapping = aes(COORDS_X, COORDS_Y, label = GRID_ID))
g3 <- g3 + coord_sf(crs = st_crs(grid_sf), datum = NA)
g3 <- g3 + theme_void()
g3 <- g3 + theme(legend.position = "none")

g4  <- ggplot()
g4  <- g4  + geom_sf(data = nc_grid_sf, aes(fill = GRID_ID), alpha = .5, color = "white")
g4  <- g4  + scale_fill_grid() 
g4  <- g4  + geom_text(data = nc_grid_sf, mapping = aes(COORDS_X, COORDS_Y, label = GRID_ID), size = 2)
g4  <- g4  + geom_sf(data = grid_sf, fill = "transparent", color = "black")
g4  <- g4  + coord_sf(crs = st_crs(grid_sf), datum = NA)
g4  <- g4  + theme_void()
g4  <- g4  + theme(legend.position = "none")


grid.arrange(g2, g1, ncol=1)

grid.arrange(g3, g4, ncol=1)
devtools::session_info()
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.0 (2017-04-21)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/Los_Angeles         
##  date     2017-12-02
## Packages -----------------------------------------------------------------
##  package      * version    date       source                            
##  assertthat     0.2.0      2017-04-11 CRAN (R 3.4.2)                    
##  backports      1.1.0      2017-05-22 CRAN (R 3.4.0)                    
##  base         * 3.4.0      2017-04-21 local                             
##  bindr          0.1        2016-11-13 CRAN (R 3.4.2)                    
##  bindrcpp     * 0.2        2017-06-17 CRAN (R 3.4.2)                    
##  bitops         1.0-6      2013-08-17 CRAN (R 3.4.0)                    
##  broom          0.4.2      2017-02-13 CRAN (R 3.4.0)                    
##  cellranger     1.1.0      2016-07-27 CRAN (R 3.4.2)                    
##  class          7.3-14     2015-08-30 CRAN (R 3.4.0)                    
##  classInt       0.1-24     2017-04-16 CRAN (R 3.4.2)                    
##  cli            1.0.0      2017-11-05 CRAN (R 3.4.2)                    
##  colorspace     1.3-2      2016-12-14 CRAN (R 3.4.2)                    
##  compiler       3.4.0      2017-04-21 local                             
##  crayon         1.3.4      2017-10-30 Github (r-lib/crayon@b5221ab)     
##  datasets     * 3.4.0      2017-04-21 local                             
##  DBI            0.7        2017-06-18 CRAN (R 3.4.2)                    
##  devtools       1.13.2     2017-06-02 CRAN (R 3.4.0)                    
##  digest         0.6.12     2017-01-27 CRAN (R 3.4.0)                    
##  dplyr        * 0.7.4      2017-09-28 CRAN (R 3.4.2)                    
##  e1071          1.6-8      2017-02-02 CRAN (R 3.4.2)                    
##  evaluate       0.10       2016-10-11 CRAN (R 3.4.0)                    
##  forcats      * 0.2.0      2017-01-23 CRAN (R 3.4.2)                    
##  foreign        0.8-67     2016-09-13 CRAN (R 3.4.0)                    
##  ggplot2      * 2.2.1.9000 2017-12-02 Github (tidyverse/ggplot2@7b5c185)
##  glue           1.1.1.9000 2017-10-18 Github (tidyverse/glue@808054b)   
##  graphics     * 3.4.0      2017-04-21 local                             
##  grDevices    * 3.4.0      2017-04-21 local                             
##  grid           3.4.0      2017-04-21 local                             
##  gridExtra    * 2.3        2017-09-09 CRAN (R 3.4.2)                    
##  gtable         0.2.0      2016-02-26 CRAN (R 3.4.2)                    
##  haven          1.1.0      2017-07-09 CRAN (R 3.4.2)                    
##  hms            0.3        2016-11-22 CRAN (R 3.4.2)                    
##  htmltools      0.3.6      2017-04-28 CRAN (R 3.4.0)                    
##  httr           1.3.1      2017-08-20 CRAN (R 3.4.2)                    
##  jsonlite       1.5        2017-06-01 CRAN (R 3.4.0)                    
##  knitr          1.16       2017-05-18 CRAN (R 3.4.0)                    
##  lattice        0.20-35    2017-03-25 CRAN (R 3.4.0)                    
##  lazyeval       0.2.1      2017-10-29 CRAN (R 3.4.2)                    
##  lubridate      1.7.1      2017-11-03 CRAN (R 3.4.2)                    
##  magrittr     * 1.5        2014-11-22 CRAN (R 3.4.0)                    
##  memoise        1.1.0      2017-04-21 CRAN (R 3.4.0)                    
##  methods      * 3.4.0      2017-04-21 local                             
##  mnormt         1.5-5      2016-10-15 CRAN (R 3.4.1)                    
##  modelr         0.1.1      2017-07-24 CRAN (R 3.4.2)                    
##  munsell        0.4.3      2016-02-13 CRAN (R 3.4.2)                    
##  nlme           3.1-131    2017-02-06 CRAN (R 3.4.0)                    
##  parallel       3.4.0      2017-04-21 local                             
##  pkgconfig      2.0.1      2017-03-21 CRAN (R 3.4.2)                    
##  plyr           1.8.4      2016-06-08 CRAN (R 3.4.2)                    
##  psych          1.7.8      2017-09-09 CRAN (R 3.4.2)                    
##  purrr        * 0.2.4.9000 2017-10-24 Github (tidyverse/purrr@fc8bea1)  
##  R6             2.2.2      2017-06-17 CRAN (R 3.4.0)                    
##  RColorBrewer   1.1-2      2014-12-07 CRAN (R 3.4.1)                    
##  Rcpp           0.12.14    2017-11-23 CRAN (R 3.4.2)                    
##  RCurl          1.95-4.8   2016-03-01 CRAN (R 3.4.1)                    
##  readr        * 1.1.1      2017-05-16 CRAN (R 3.4.2)                    
##  readxl         1.0.0      2017-04-18 CRAN (R 3.4.2)                    
##  reshape2       1.4.2      2016-10-22 CRAN (R 3.4.2)                    
##  rlang          0.1.4      2017-11-05 CRAN (R 3.4.2)                    
##  rmarkdown      1.8        2017-11-17 CRAN (R 3.4.2)                    
##  rprojroot      1.2        2017-01-16 CRAN (R 3.4.0)                    
##  rvest          0.3.2      2016-06-17 CRAN (R 3.4.2)                    
##  scales       * 0.5.0.9000 2017-12-02 Github (hadley/scales@d767915)    
##  sf           * 0.5-5      2017-10-31 CRAN (R 3.4.0)                    
##  stats        * 3.4.0      2017-04-21 local                             
##  stringi        1.1.5      2017-04-07 CRAN (R 3.4.0)                    
##  stringr      * 1.2.0      2017-02-18 CRAN (R 3.4.0)                    
##  tibble       * 1.3.4      2017-08-22 CRAN (R 3.4.2)                    
##  tidyr        * 0.7.2      2017-10-16 CRAN (R 3.4.2)                    
##  tidyverse    * 1.2.1      2017-11-14 CRAN (R 3.4.2)                    
##  tools          3.4.0      2017-04-21 local                             
##  udunits2       0.13       2016-11-17 CRAN (R 3.4.1)                    
##  units          0.4-6      2017-08-27 CRAN (R 3.4.2)                    
##  utils        * 3.4.0      2017-04-21 local                             
##  withr          2.1.0.9000 2017-12-02 Github (jimhester/withr@fe81c00)  
##  XML            3.98-1.9   2017-06-19 CRAN (R 3.4.1)                    
##  xml2           1.1.1      2017-01-24 CRAN (R 3.4.2)                    
##  yaml           2.1.14     2016-11-12 CRAN (R 3.4.0)

Most helpful comment

@tiernanmartin something went wrong, it seems, e.g. when assigning F08 and H04, in your plot. I now integrated a somewhat different approach in st_join, honoring the rest of st_join's conventions wrt renaming duplicate columns, and left option. Output from ?st_join below:
x

All 7 comments

That's a nice article, and IMO a candidate to be wrapped into something simpler: a new function or an option under an existing one. Does this activity have a name in GIS or spatial database world?

I wrote sf::st_interpolate_aw for area-weighted interpolation, which is what you'd do when the labels are continuous variables. For me, intuitively I would search for such functionality under st_join.

I revised the example to show how this might work as a function instead of a series of mutate steps.

I'm not sure if this activity has a formal name in GIS. I've been calling it an "area-weighted spatial join" but others have different names for it:

Sometimes called majority fill or maximum area overlay -- but I believe a standard (faster?) approach is to rasterize the base layer to an appropriate resolution and then use raster::extract() to return the mode/median raster value in each polygon.

@tiernanmartin something went wrong, it seems, e.g. when assigning F08 and H04, in your plot. I now integrated a somewhat different approach in st_join, honoring the rest of st_join's conventions wrt renaming duplicate columns, and left option. Output from ?st_join below:
x

@mbacou "maximum area overlay" sounds good to me and thanks for the the tip about using raster::extract()

@edzer You're right - the function version above contains a mistake. I'll see if I can sleuth it out...

I played around with st_join(..., largest = TRUE) and I'm really pleased with the results. The example in ?st_join is quite clear. Thanks for the quick turn-around with f16249d and I'll let you know if I run into any unexpected behavior.

I haven't tested it with large datasets yet, but I'm wondering if this may be another instance where st_intersection would benefit from vectorization (#575)?

"largest" parameter as is now in st_join would also add a lot to raster::extract(), as an additional option for "fun" https://www.rdocumentation.org/packages/raster/versions/3.1-5/topics/extract

But that is an issue for raster, right?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dpprdan picture dpprdan  路  4Comments

adrfantini picture adrfantini  路  4Comments

kendonB picture kendonB  路  4Comments

kendonB picture kendonB  路  3Comments

matthewpaulking picture matthewpaulking  路  4Comments