library(sf)
library(tidyverse)
set.seed(42)
nc_sf <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_df <- data.frame(
CNTY_ID = nc_sf$CNTY_ID,
choosen_one = sample(c(TRUE, FALSE), nrow(nc_sf), replace=TRUE)
)
This will work:
nc_sf %>%
left_join(nc_df, by="CNTY_ID") %>%
ggplot() +
geom_sf(aes(fill=choosen_one))

This will not work:
nc_df %>%
right_join(nc_sf, by="CNTY_ID") %>%
ggplot() +
geom_sf(aes(fill=choosen_one))
Gives: Error: stat_sf requires the following missing aesthetics: geometry
Even if I explicitly put the geometry in the aes, it does not work:
nc_df %>%
right_join(nc_sf, by="CNTY_ID") %>%
ggplot() +
geom_sf(aes(geometry=geometry, fill=choosen_one))
Gives: Error in if (type == "point") { : argument is of length zero
Is this the expected behavior, or is it a bug?
My intuition is that left_join(a, b) should have the same semantics, and return the same result, as right_join(b, a).
I noticed that the objects returned by left_join and right_join are different.
The first command, with left_join, returns an object with both the sf and data.frame classes. The second command, with right_join, returns an object with only the data.frame class.
I tested a couple of function calls, changing the order of the parameters and also using the base merge function.
left_join(nc_sf, nc_df, by="CNTY_ID") %>% class # works, returning `sf` and `data.frame`
right_join(nc_sf, nc_df, by="CNTY_ID") %>% class # works, returning `sf` and `data.frame`
left_join(nc_df, nc_sf, by="CNTY_ID") %>% class # does not work, returning only the `data.frame`
right_join(nc_df, nc_sf, by="CNTY_ID") %>% class # does not work, returning only the `data.frame`
merge(nc_sf, nc_df, by="CNTY_ID") %>% class # works, returning `sf` and `data.frame`
merge(nc_df, nc_sf, by="CNTY_ID") %>% class # does not work, returning only the `data.frame`
So the merge/join functions work only if the sf object is the first argument of the call, regardless of whether it is a left join or a right join.
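One possible workaround when the data frame has to come first is to convert the join result back with sf::st_as_sf(). This is only a sketch: it assumes the joined data frame still carries the geometry as an sfc list-column (which may depend on the dplyr version), and the name joined_sf is made up for illustration.

```r
library(sf)
library(dplyr)

set.seed(42)
nc_sf <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_df <- data.frame(
  CNTY_ID = nc_sf$CNTY_ID,
  choosen_one = sample(c(TRUE, FALSE), nrow(nc_sf), replace = TRUE)
)

# Data frame first: the data.frame method is used, so the result is not sf.
joined <- right_join(nc_df, nc_sf, by = "CNTY_ID")
class(joined)

# Assumption: `joined` still has the geometry as an sfc list-column,
# so st_as_sf() can locate it and rebuild an sf object.
joined_sf <- st_as_sf(joined)
class(joined_sf)
```

If the conversion succeeds, joined_sf should be usable with geom_sf() in the same way as the result of the sf-first join.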
Library versions:
> library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
> library(tidyverse)
── Attaching packages ── tidyverse 1.3.0 ──
✔ ggplot2 3.3.0     ✔ purrr   0.3.4
✔ tibble  3.0.1     ✔ dplyr   0.8.5
✔ tidyr   1.0.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0
── Conflicts ── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
This simply can't work because of how S3 classes work in R in general. Method dispatch is based only on the first argument, and consequently:
left_join(sf_object, data_frame_object): a method for sf objects is dispatched (defined in the _sf_ package), which treats the sf object properly and returns an sf object;
right_join(data_frame_object, sf_object): a method for data.frame objects is dispatched (the regular _dplyr_ right_join), which doesn't care at all that sf_object is somewhat special and returns a regular data frame (to be precise, a tibble).

@tzoltak that's true, but not complete: in dplyr, we certainly could choose to implement double dispatch so that left_join(x, y) and right_join(y, x) return the same type of thing, but we deliberately chose not to.
We did this partly because getting all the details requires a complicated algorithm (so it's much harder to explain than returning the same type as the left-hand side), and partly because it's not obvious what you should get if you (say) join a sf and a data table, or a data table and a grouped df, or ...
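The first-argument dispatch described above can be reproduced in base R without sf or dplyr. The sketch below is a toy illustration only: toy_join and the toy_sf class are hypothetical names standing in for the join generics and the sf class, not anything either package defines.

```r
# Hypothetical generic: UseMethod() dispatches on the class of `x` only.
toy_join <- function(x, y) UseMethod("toy_join")

# Method for the made-up "toy_sf" class: restores the special class on the result.
toy_join.toy_sf <- function(x, y) {
  out <- merge(as.data.frame(x), as.data.frame(y), by = "id")
  structure(out, class = c("toy_sf", "data.frame"))
}

# Method for plain data frames: knows nothing about "toy_sf".
toy_join.data.frame <- function(x, y) {
  merge(as.data.frame(x), as.data.frame(y), by = "id")
}

plain <- data.frame(id = 1:3, flag = c(TRUE, FALSE, TRUE))
special <- structure(
  data.frame(id = 1:3, shape = c("a", "b", "c")),
  class = c("toy_sf", "data.frame")
)

class(toy_join(special, plain))  # "toy_sf" "data.frame"
class(toy_join(plain, special))  # "data.frame"
```

Swapping the arguments changes which method runs, and with it the class of the result, which mirrors the sf-first versus data.frame-first behavior reported in the question.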