Sf: tidyverse::*_join returns different results depending on the order of the parameters

Created on 12 May 2020 · 2 Comments · Source: r-spatial/sf

Example dataset

library(sf)
library(tidyverse)

set.seed(42)

nc_sf <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

nc_df <- data.frame(
  CNTY_ID     = nc_sf$CNTY_ID,
  choosen_one = sample(c(TRUE, FALSE), nrow(nc_sf), replace=TRUE)
)

Problem

This will work:

nc_sf %>%
  left_join(nc_df, by="CNTY_ID") %>%
  ggplot() +
  geom_sf(aes(fill=choosen_one))

This will not work:

nc_df %>%
  right_join(nc_sf, by="CNTY_ID") %>%
  ggplot() +
  geom_sf(aes(fill=choosen_one))

Gives: Error: stat_sf requires the following missing aesthetics: geometry

Even if I explicitly put the geometry in the aes, it does not work:

nc_df %>%
  right_join(nc_sf, by="CNTY_ID") %>%
  ggplot() +
  geom_sf(aes(geometry=geometry, fill=choosen_one))

Gives: Error in if (type == "point") { : argument is of length zero

Expected behaviour

Is this the expected behavior, or is it a bug?

My intuition is that left_join(a, b) has the same semantics as right_join(b, a) and should return the same result.

Tests

I noticed that the objects returned by left_join and right_join are different.
The first command, with left_join, returns an object with both the sf and data.frame classes. The second command, with right_join, returns an object with only the data.frame class.
I tested a few function calls, changing the order of the arguments and also using the base merge function.

left_join(nc_sf, nc_df, by="CNTY_ID") %>% class # works, returning `sf` and `data.frame`
right_join(nc_sf, nc_df, by="CNTY_ID") %>% class # works, returning `sf` and `data.frame`
left_join(nc_df, nc_sf, by="CNTY_ID") %>% class # does not work, returning only the `data.frame`
right_join(nc_df, nc_sf, by="CNTY_ID") %>% class # does not work, returning only the `data.frame`
merge(nc_sf, nc_df, by="CNTY_ID") %>% class # works, returning `sf` and `data.frame`
merge(nc_df, nc_sf, by="CNTY_ID") %>% class # does not work, returning only the `data.frame`

Conclusion

Merge/join functions return an sf object only if the sf object is the first argument of the call, regardless of whether it is a left join or a right join.
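
If the data frame has to come first, one possible workaround is to convert the join result back with sf::st_as_sf(). This is only a sketch: it assumes the geometry column comes through the join as a valid sfc list-column, which may depend on the dplyr version and has not been checked against dplyr 0.8.5 listed below. The names joined_plain and joined_sf are made up for illustration.

```r
library(sf)
library(dplyr)

set.seed(42)

nc_sf <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

nc_df <- data.frame(
  CNTY_ID     = nc_sf$CNTY_ID,
  choosen_one = sample(c(TRUE, FALSE), nrow(nc_sf), replace = TRUE)
)

# Data frame first: the data.frame join method runs, so the sf class is lost.
joined_plain <- right_join(nc_df, nc_sf, by = "CNTY_ID")

# Assumption: the geometry column is still an sfc column at this point,
# so st_as_sf() can pick it up and restore the sf class.
joined_sf <- sf::st_as_sf(joined_plain)

class(joined_plain)
class(joined_sf)
```

If the conversion succeeds, joined_sf can be passed to ggplot() + geom_sf() in the same way as the result of left_join(nc_sf, nc_df, by="CNTY_ID").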

System Environment

Library versions:

> library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
> library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.0     ✔ purrr   0.3.4
✔ tibble  3.0.1     ✔ dplyr   0.8.5
✔ tidyr   1.0.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

All 2 comments

This simply can't work because of how S3 classes work in general in R. Method dispatch is based only on the first argument and consequently:

  • if you call left_join(sf_object, data_frame_object), the method for sf objects is dispatched (it is defined in the _sf_ package); it handles the sf object properly and returns an sf object;
  • if you call right_join(data_frame_object, sf_object), the method for data.frame objects is dispatched (the regular _dplyr_ right_join); it does not care that sf_object is special in any way and returns a regular data frame (to be precise, a tibble).
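
The dispatch rule can be seen in a small base-R sketch that involves neither sf nor dplyr. The generic combine() and the class "special" are hypothetical names invented for this illustration; they only mimic how a join generic picks a method from its first argument.

```r
# Hypothetical S3 generic: UseMethod() dispatches on the class of the
# first argument (x) only; the class of y is never consulted.
combine <- function(x, y) UseMethod("combine")

# Method for plain data frames: returns a label for illustration.
combine.data.frame <- function(x, y) "data.frame method"

# Method for the hypothetical "special" class, standing in for sf.
combine.special <- function(x, y) "special method"

plain   <- data.frame(id = 1:2)
special <- structure(data.frame(id = 1:2), class = c("special", "data.frame"))

res_special_first <- combine(special, plain)  # first argument is "special"
res_plain_first   <- combine(plain, special)  # first argument is a data.frame

print(res_special_first)
print(res_plain_first)
```

Swapping the arguments changes which method runs, even though the same two objects are involved. That is the same asymmetry as left_join(nc_sf, nc_df) versus right_join(nc_df, nc_sf).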

@tzoltak that's true, but not complete: in dplyr, we certainly could choose to implement double dispatch so that left_join(x, y) and right_join(y, x) return the same type of thing, but we deliberately chose not to.

We did this partly because getting all the details right requires a complicated algorithm (so it's much harder to explain than returning the same type as the left-hand side), and partly because it's not obvious what you should get if you (say) join an sf and a data table, or a data table and a grouped df, or ...
