Sf: Joins between sf objects and data.frames

Created on 2 Jun 2017 · 3 Comments · Source: r-spatial/sf

Lately I've been joining sf objects with data.frames, and I came across two problematic groups of joins:

  1. When a row exists in the data.frame but not in the sf object, a new row with an empty GEOMETRYCOLLECTION geometry is added.

  2. When the data.frame is the main (first) object in the join, the result has a geometry column but doesn't have the sf class.

My ideas are:

  • in the first group - the new row should preserve the geometry type of the sf object, in this example an empty MULTIPOLYGON
  • in the second group - the geom column should be removed

I'm not sure if those ideas are the best. What do you think? @edzer @hadley @Robinlovelace

library(tidyverse)
library(sf)

sf_obj = st_read(system.file("shape/nc.shp", package="sf")) %>% 
  filter(NAME %in% c("Ashe", "Surry")) %>% 
  select(NAME)

df_obj = data.frame(NAME = c("Ashe", "Surry", "Rowan"), VALUE = c(1, 4, 6))

## 1st group --------------------------

# error: empty GEOMETRYCOLLECTION() added to geom
right_join1 = sf_obj %>% 
  right_join(df_obj, by = "NAME")
right_join1

# error: empty GEOMETRYCOLLECTION
full_join1 = sf_obj %>% 
  full_join(df_obj, by = "NAME") 
full_join1

## 2nd group ------------------------

# error: keeps geom col
left_join1 = df_obj %>% 
  left_join(sf_obj, by = "NAME")
left_join1

# error: unwanted geom column added
right_join2 = df_obj %>% 
  right_join(sf_obj, by = "NAME") 
right_join2

# error: geom column added
inner_join1 =  df_obj %>% 
  inner_join(sf_obj, by = "NAME") 
inner_join1

# error: null geom
full_join2 = df_obj %>% 
  full_join(sf_obj, by = "NAME") 
full_join2
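
For the second group, when the geometry is not wanted in the result at all, one possible workaround is to drop it from the sf object before joining. This is a minimal sketch rather than a fix in sf or dplyr; it assumes a version of sf that provides st_drop_geometry(), uses left_join() as the example join, and the names attrs and left_join_plain are made up for illustration:

```r
library(sf)
library(dplyr)

# Same example data as in the post above
sf_obj = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  filter(NAME %in% c("Ashe", "Surry")) %>%
  select(NAME)
df_obj = data.frame(NAME = c("Ashe", "Surry", "Rowan"), VALUE = c(1, 4, 6))

# Drop the geometry first, so the join only sees two plain data.frames
attrs = st_drop_geometry(sf_obj)
left_join_plain = left_join(df_obj, attrs, by = "NAME")

class(left_join_plain)  # no "sf" class expected
names(left_join_plain)  # no geometry column expected
```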

All 3 comments

Thanks; the first problem is not an error, but it is indeed annoying. I will look into it.

The second problem is not an sf issue; the data.frame methods for these joins are in dplyr.
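
The point about methods can be checked directly: the join verbs dispatch on the class of their first argument, so the sf method only runs when the sf object comes first. A small sketch, assuming reasonably recent sf and dplyr releases and reusing the example data from the original post (sf_first and df_first are illustrative names):

```r
library(sf)
library(dplyr)

sf_obj = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  filter(NAME %in% c("Ashe", "Surry")) %>%
  select(NAME)
df_obj = data.frame(NAME = c("Ashe", "Surry", "Rowan"), VALUE = c(1, 4, 6))

# sf object first: the join method for class "sf" (provided by sf) is used
sf_first = inner_join(sf_obj, df_obj, by = "NAME")
inherits(sf_first, "sf")

# data.frame first: dplyr's data.frame method is used, so the sf class is lost
df_first = inner_join(df_obj, sf_obj, by = "NAME")
inherits(df_first, "sf")
```

If the result should be an sf object, putting the sf object on the left-hand side of the join is therefore the simplest option.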

Thank you @edzer. I tested it a little bit and it works great. I've also opened a new issue in the dplyr package: https://github.com/tidyverse/dplyr/issues/2833

Now,

> full_join2 %>% st_sf
Simple feature collection with 3 features and 2 fields (with 1 geometry empty)
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
   NAME VALUE                       geometry
1  Ashe     1 MULTIPOLYGON(((-81.47275543...
2 Surry     4 MULTIPOLYGON(((-80.45634460...
3 Rowan     6                 MULTIPOLYGON()

substitutes the NULL list column value returned by dplyr's join with the appropriate empty geometry.
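
To check this behaviour on your own installation, the joined data.frame can be converted back with st_sf() and inspected with st_is_empty(). This is a hedged sketch, assuming recent sf and dplyr versions; the geometry type reported for the empty row may differ between versions, so no expected output is shown:

```r
library(sf)
library(dplyr)

sf_obj = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  filter(NAME %in% c("Ashe", "Surry")) %>%
  select(NAME)
df_obj = data.frame(NAME = c("Ashe", "Surry", "Rowan"), VALUE = c(1, 4, 6))

# data.frame first, then convert the result back to an sf object
full_join2 = df_obj %>%
  full_join(sf_obj, by = "NAME") %>%
  st_sf()

# Which rows have an empty geometry, and what type is each geometry?
st_is_empty(full_join2)
st_geometry_type(full_join2)
```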
