sf: Add support for left_join() in dplyr where the second argument can be an sf object

Created on 9 Apr 2018 · 4 Comments · Source: r-spatial/sf

I spent a good amount of time today wondering why this code:
left_join(df, sf_object, by = "column")

produced a tbl_df object instead of an sf one, and then I went to the package documentation on joining dataframes and sf objects and realized that the second argument had to be a data.frame object.

So then I wrote left_join(sf_object, df, by = "column") and got what I wanted.

I don't think it's necessarily intuitive for one of these to produce an sf object, and the other one not to - I do understand that the final object is taking the class of the thing that is being joined TO, but in most cases, if I'm joining an sf object to a normal dataframe, I generally want that final thing to be an sf object.

Is there any way to make any type of join involving an sf object produce an sf object, so that the order of the arguments doesn't matter? Or could a warning pop up mentioning that the first example above produces a data.frame object, and that you need to switch the order of the join if you want an sf object?

All 4 comments

Doesn't the order of the arguments already matter for left_join, though? left_join(x, y) doesn't do the same thing as left_join(y, x). So even producing a warning to switch the join order wouldn't be correct for this operation, though I could see an argument for full_join. You can always use right_join, or follow up with st_as_sf.
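As an illustration of the two workarounds mentioned above, here is a minimal sketch. It assumes reasonably recent versions of sf and dplyr, and the objects `pts` and `df` and their columns are made up for the example. Every key matches in both tables, so no empty geometries are involved.

```r
library(sf)
library(dplyr)

# Hypothetical example data: two points and a plain data frame sharing an "id" key
pts <- st_as_sf(
  data.frame(id = c("a", "b"), x = c(0, 1), y = c(0, 1)),
  coords = c("x", "y"), crs = 4326
)
df <- data.frame(id = c("a", "b"), value = c(10, 20))

# Plain data frame first: dplyr's own method handles the call,
# so the result carries a geometry column but is not of class sf
joined <- left_join(df, pts, by = "id")
class(joined)

# Workaround 1: convert the result back to sf;
# st_as_sf() picks up the geometry list-column
joined_sf <- st_as_sf(joined)

# Workaround 2: put the sf object first and use right_join(),
# which still keeps every row of df
joined_sf2 <- right_join(pts, df, by = "id")
class(joined_sf2)
```

Both workarounds should give an sf object here; which one reads better is mostly a matter of taste.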

Oh yep, @Zedseayou, you're right that left_join(x, y) and left_join(y, x) produce different results, since you're taking the first argument as the basis for the join, so maybe I should have worded that differently.

But my main point is that whether you're joining an sf object to a dataframe or a dataframe to an sf object, I've generally found that I want the final product to be an sf object (why else would I be joining an sf geometry column to a normal dataframe in the first place?), so it might be good to make that the default class of the final result, no matter what order you write the join in.

I don't know how technically involved it is to change this, @edzer, but I've found this to be an issue a few times when I've been writing joins - I never remember that the first argument has to be the sf object, and it takes me some time every time to figure out why the final object isn't an sf object.

All of this uses S3 generics (type methods(left_join) to list the methods), and S3 dispatches on the class of the first argument only. That means that package sf can _only_ catch the case where x is of class sf. When y is of class sf and x is not, the call is handled by dplyr, so if you want to make a case for this you'll have to raise an issue there.
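To illustrate the dispatch behaviour described above, here is a minimal base R sketch. The generic `join_demo` and the class `spatial` are hypothetical stand-ins for left_join and sf; neither exists in sf or dplyr.

```r
# Toy S3 generic: UseMethod() picks a method from the class of the first argument
join_demo <- function(x, y) UseMethod("join_demo")

# Fallback method, standing in for the method another package (here: dplyr) provides
join_demo.default <- function(x, y) "default method"

# Method for the hypothetical "spatial" class, standing in for sf's method
join_demo.spatial <- function(x, y) "spatial method"

a <- structure(list(), class = "spatial")  # plays the role of the sf object
b <- list()                                # plays the role of the plain data frame

join_demo(a, b)  # "spatial method": x is spatial, so the spatial method runs
join_demo(b, a)  # "default method": the class of y is never consulted
```

In the second call the spatial method is never reached, which is why a package that only owns the class of y cannot change the result.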

Where in the documentation did you find something that made you think it should work the way you hoped for?

Got it, that makes sense! No need to do this then, I thought I'd just mention it in case it was an easy change.

Regarding your question, I think it's just an idea I thought up - it's not in the documentation. Your explanation of _why_ the first object needs to be the sf object is good though, so maybe it can be included in the vignette. I think it's easy to overlook the sentence that mentions the second object being a data.frame.
