Lately I've been trying to use joins on pairs of sf objects and data.frames, and I came across two problematic groups of joins:
When a row exists in the data.frame but not in the sf object, a new row with an empty GEOMETRYCOLLECTION geometry is added.
When the data.frame is the main (first) object in the join, the result has a geometry column but doesn't have the sf class.
My idea is:
MULTIPOLYGON
I'm not sure if those ideas are the best. What do you think? @edzer @hadley @Robinlovelace
library(tidyverse)
library(sf)
sf_obj = st_read(system.file("shape/nc.shp", package = "sf")) %>%
  filter(NAME %in% c("Ashe", "Surry")) %>%
  select(NAME)
df_obj = data.frame(NAME = c("Ashe", "Surry", "Rowan"), VALUE = c(1, 4, 6))
## 1st group --------------------------
# error: empty GEOMETRYCOLLECTION() added to geom
right_join1 = sf_obj %>%
  right_join(df_obj, by = "NAME")
right_join1
# error: empty GEOMETRYCOLLECTION
full_join1 = sf_obj %>%
  full_join(df_obj, by = "NAME")
full_join1
## 2nd group ------------------------
# error: keeps geom col
left_join1 = df_obj %>%
  left_join(sf_obj, by = "NAME")
left_join1
# error: unwanted geom column added
right_join2 = df_obj %>%
  right_join(sf_obj, by = "NAME")
right_join2
# error: geom column added
inner_join1 = df_obj %>%
  inner_join(sf_obj, by = "NAME")
inner_join1
# error: null geom
full_join2 = df_obj %>%
  full_join(sf_obj, by = "NAME")
full_join2
Thanks; the first problem is not an error, but it is indeed annoying, and I will look into it.
The second problem is not an sf issue: when the data.frame is the first argument, the join methods that get called are dplyr's data.frame methods, not sf's.
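One way to sidestep the second group, assuming the desired result is an sf object, is to keep the sf object as the first argument, so that the join methods provided by sf are dispatched rather than dplyr's data.frame methods. A minimal sketch with the objects from the example above (`sf_first` is only an illustrative name, and `quiet = TRUE` is added just to suppress the reading message):

```r
library(dplyr)
library(sf)

# Same example data as in the original report
sf_obj = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  filter(NAME %in% c("Ashe", "Surry")) %>%
  select(NAME)
df_obj = data.frame(NAME = c("Ashe", "Surry", "Rowan"), VALUE = c(1, 4, 6))

# sf object first: the join method for sf objects is used,
# so the result should keep the sf class
sf_first = sf_obj %>%
  inner_join(df_obj, by = "NAME")
class(sf_first)
```

This only helps when the join can be written with the sf object on the left; a join that must keep unmatched data.frame rows (such as "Rowan" here) still runs into the empty-geometry question from the first group.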
Thank you @edzer. I tested it a little bit and it works great. I've also opened a new issue in the dplyr package: https://github.com/tidyverse/dplyr/issues/2833
Now,
> full_join2 %>% st_sf
Simple feature collection with 3 features and 2 fields (with 1 geometry empty)
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
   NAME VALUE                       geometry
1  Ashe     1 MULTIPOLYGON(((-81.47275543...
2 Surry     4 MULTIPOLYGON(((-80.45634460...
3 Rowan     6                 MULTIPOLYGON()
substitutes the NULL list column value returned by dplyr's join with the appropriate empty geometry.
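To see which rows ended up with an empty geometry after the conversion, `st_is_empty()` can be applied to the result. A small sketch, assuming the same example data as in the original report; `full_join2_sf` is only an illustrative name, and the class of the geometry column that the join returns before `st_sf()` may differ between dplyr versions:

```r
library(dplyr)
library(sf)

# Same example data as in the original report
sf_obj = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  filter(NAME %in% c("Ashe", "Surry")) %>%
  select(NAME)
df_obj = data.frame(NAME = c("Ashe", "Surry", "Rowan"), VALUE = c(1, 4, 6))

# data.frame first, then convert the joined result back to an sf object
full_join2_sf = df_obj %>%
  full_join(sf_obj, by = "NAME") %>%
  st_sf()

# Logical vector, one element per row; rows of df_obj without a match
# in sf_obj are the ones expected to be flagged as empty
st_is_empty(full_join2_sf)
```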