sf: Merge sf object and data frame [Help]

Created on 30 Jan 2017 · 14 comments · Source: r-spatial/sf

Hi, I'm sorry if this issue doesn't apply, but I wasn't sure where else to ask.
I'm having trouble merging an sf object and a data frame by a common column, i.e. "NAME". The result is a data frame instead of an sf object, which I didn't expect since I did not change anything in the geometry. When I try to convert it back to sf with st_as_sf(), the geometry appears to have changed, because the plot is different. Here's the example I constructed below:

library(sf)
nc = st_read(system.file("shape/nc.shp", package="sf"))
# build the columns directly so RANDOM_NUMBER stays numeric (cbind() would turn it into character)
df_example = data.frame(
  NAME = c("Ashe", "Alleghany", "Surry", "Currituck", "Northhampton", "Salt Lake City", "Atlanta", "New York City"),
  RANDOM_NUMBER = c(3, 4, 5, 2, 0, 7, 6, 20)
)
merged = merge(nc, df_example, by = "NAME")
class(nc)
#> [1] "sf"         "data.frame"
class(merged)
#> [1] "data.frame"
plot(nc)
plot(merged)
#### plots are different ####

Is there a simple solution to this or am I doing something wrong? When I try merging these using a SpatialPolygonsDataFrame object and a data.frame, it works. Thank you so much!

Both merged <- st_as_sf(merged) and st_geometry(merged) <- merged$geometry after merging should work, I think.

> st_geometry(merged) <- merged$geometry
> class(merged)
[1] "sf"         "data.frame"

If by "plots are different" you mean that your merge drops all geometries for which is.na(RANDOM_NUMBER), add all.x = TRUE to your merge() (or use left_join())
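A minimal sketch of the all.x = TRUE fix on the data from the question, assuming a recent sf installation. The default merge() is an inner join, so only the counties whose NAME matches survive. Note that "Northhampton" is spelled differently from nc's "Northampton", so it does not match either. st_as_sf() is only needed on versions where merge() hands back a plain data.frame.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
df_example <- data.frame(
  NAME = c("Ashe", "Alleghany", "Surry", "Currituck", "Northhampton",
           "Salt Lake City", "Atlanta", "New York City"),
  RANDOM_NUMBER = c(3, 4, 5, 2, 0, 7, 6, 20)
)

# Default merge(): inner join, counties without a matching NAME are dropped,
# which is why the merged object plots fewer polygons than nc
inner <- merge(nc, df_example, by = "NAME")

# all.x = TRUE: keep every county of nc; unmatched ones get RANDOM_NUMBER = NA
kept <- merge(nc, df_example, by = "NAME", all.x = TRUE)

# Restore the sf class in case merge() returned a plain data.frame
kept <- st_as_sf(kept)

c(nrow(nc), nrow(inner), nrow(kept))
```

Rows are re-sorted by NAME in both results, because merge() sorts on the key by default.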

Thanks; I think we should make this automatic, by providing merge and left_join methods for sf objects.
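The idea of a class-preserving merge method can be sketched in base R with S3 dispatch. The class name `toy_sf`, its `geom_col` attribute and the helper `as_toy_sf()` are hypothetical stand-ins, not sf's implementation. In base R, merge.data.frame() calls as.data.frame() on its inputs, and that strips any subclass placed before "data.frame". A method therefore has to put the class back after calling the default.

```r
# Hypothetical helper: mark a data.frame as "toy_sf" and remember its geometry column
as_toy_sf <- function(df, geom_col) {
  attr(df, "geom_col") <- geom_col
  class(df) <- c("toy_sf", "data.frame")
  df
}

# Hypothetical S3 method: delegate to merge.data.frame, then restore the class
merge.toy_sf <- function(x, y, ...) {
  geom_col <- attr(x, "geom_col")
  out <- NextMethod()          # merge.data.frame returns a plain data.frame
  as_toy_sf(out, geom_col)     # re-attach class and geometry-column marker
}

# Usage: the geometry column here is just a character placeholder
counties <- as_toy_sf(
  data.frame(NAME = c("Ashe", "Surry"), geom = c("poly1", "poly2")),
  "geom"
)
values <- data.frame(NAME = c("Ashe", "Surry"), RANDOM_NUMBER = c(3, 5))
merged <- merge(counties, values, by = "NAME")
class(merged)
```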

Thank you very much, all.x = TRUE did the trick to preserve the geometry.

Would you add a duplicateGeoms argument to merge.sf, like in merge.Spatial?

Well, that doesn't do much, apart from suppressing a warning in case of multiple matches. My gut feeling is that it will be more useful to try to support the *_join functions in dplyr, and include spatial matches.

While it would be great to eventually have a set of spatial *_join functions, I expect that may take considerable attention and time to develop. In the meantime, could we get an interim version where the sf class isn't dropped when an sf object is included in a *_join?

Example:

library(dplyr, warn.conflicts = F, quietly = T)
library(sf)
#> Linking to GEOS 3.5.0, GDAL 2.1.1, proj.4 4.9.3
demo(nc, ask = FALSE, echo = FALSE, verbose = FALSE)

nc_df <- nc %>% unclass %>% as_data_frame() %>% select(NAME, BIR74)

nc %>% select(-BIR74) %>% left_join(nc_df, by = "NAME") %>% class
#> [1] "data.frame"
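With sf versions that provide dplyr join methods for sf objects, the same pipeline is expected to keep the sf class. This sketch assumes a recent sf and dplyr. It uses st_drop_geometry() to get the plain attribute table, in place of the `unclass %>% as_data_frame()` step above. That swap is our substitution, not the code from the comment.

```r
library(sf)
library(dplyr, warn.conflicts = FALSE)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Attribute table without geometry, reduced to the join key and one value
nc_df <- st_drop_geometry(nc) %>% select(NAME, BIR74)

# select() keeps the geometry column of an sf object; left_join() on an sf x
# dispatches to sf's method, so the result should still be an sf object
joined <- nc %>% select(-BIR74) %>% left_join(nc_df, by = "NAME")
class(joined)
```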

@brendaprallon please test.

@tiernanmartin pls test the non-spatial *_join.sf methods.

Now, empty GEOMETRYCOLLECTION geometries are inserted where no geometry is available. This could be improved, e.g. by inserting an empty LINESTRING when all the geometries are LINESTRINGs; for POINT geometries there is no empty version:

a = data.frame(a = 1:3, b = 5:7)
st_geometry(a) = st_sfc(st_point(c(0,0)), st_point(c(1,1)), st_point(c(2,2)))
b = data.frame(x = c("a", "b", "c"), b = c(2,5,6))
full_join(a, b)
# Joining, by = "b"
# Simple feature collection with 4 features and 3 fields (of which 1 is empty)
# geometry type:  GEOMETRY
# dimension:      XY
# bbox:           xmin: 0 ymin: 0 xmax: 2 ymax: 2
# epsg (SRID):    NA
# proj4string:    NA
#    a b    x             geometry
# 1  1 5    b           POINT(0 0)
# 2  2 6    c           POINT(1 1)
# 3  3 7 <NA>           POINT(2 2)
# 4 NA 2    a GEOMETRYCOLLECTION()
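Here is a sketch of the same full join in a recent sf/dplyr setup, assuming both packages are installed. The key is passed explicitly to avoid the "Joining, by" message. st_is_empty() is one way to flag the row that came only from b. The empty geometry type sf inserts there may differ between versions, so no particular type is assumed.

```r
library(sf)
library(dplyr, warn.conflicts = FALSE)

a <- data.frame(a = 1:3, b = 5:7)
st_geometry(a) <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))
b <- data.frame(x = c("a", "b", "c"), b = c(2, 5, 6))

# Full join: b = 5, 6 match; b = 7 exists only in a, b = 2 only in b
j <- full_join(a, b, by = "b")

# Which rows ended up without a real geometry
st_is_empty(j)
```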

@tiernanmartin https://github.com/edzer/sfr/commit/3f7c25dbae8972a3ce031e4823dcf943ca69b98e now adds st_join for spatial join with flexible geometry predicates, besides the *_join dplyr join methods for non-spatial joins; #200 #50 #42 -- please test!
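A minimal spatial join sketch, assuming a current sf. The two squares, the zone names and the points are made-up toy data. st_join() uses st_intersects() as its default predicate, stated explicitly here. With left = TRUE (the default), a point that falls in no polygon is kept with NA attributes.

```r
library(sf)

# Toy polygons: two unit squares side by side
west <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
east <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
zones <- st_sf(zone = c("west", "east"), geometry = st_sfc(west, east))

# Toy points: one inside each square, one outside both
pts <- st_sf(
  id = 1:3,
  geometry = st_sfc(st_point(c(0.5, 0.5)), st_point(c(1.5, 0.5)), st_point(c(5, 5)))
)

# Spatial left join: each point gets the attributes of the polygon it intersects
joined <- st_join(pts, zones, join = st_intersects)
joined$zone
```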

@edzer my limited tests of the non-spatial *_join functions all returned the expected results - thanks for the quick turn around! If I encounter anything unusual I will post a test here.

I tried st_join with a slightly modified version of your test above and got an error:

library(sf)
#> Linking to GEOS 3.5.0, GDAL 2.1.1, proj.4 4.9.3

a = data.frame(a = 1:3, b = 5:7)
st_geometry(a) = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))

b = data.frame(x = c("a", "b", "c"), b = c(2, 5, 6))
st_geometry(b) = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))

st_join(a, b)
#> Error: length(setdiff(names(value), nv)) == 0 is not TRUE

Running st_join in debug mode indicates that the error occurs when subsetting x:

        if (missing(FUN)) {
                if (left) {
                        i = lapply(i, function(x) {
                                if (length(x) == 0) 
                                        NA_integer_
                                else x
                        })
                        ix = rep(seq_len(nrow(x)), sapply(i, length))
                }
                st_sf(cbind(as.data.frame(x[ix, ]),  # <- error occurs here
                                  y[unlist(i), , drop = FALSE]))    
        }
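In sf releases after the fix, the failing example is expected to run. This is a sketch with the same toy points, assuming a current sf. Both inputs contain a column named b. Inspect names(res) to see how the duplicate is disambiguated rather than relying on a particular naming.

```r
library(sf)

a <- data.frame(a = 1:3, b = 5:7)
st_geometry(a) <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))

b <- data.frame(x = c("a", "b", "c"), b = c(2, 5, 6))
st_geometry(b) <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))

# Each point of a intersects exactly the identical point of b
res <- st_join(a, b)
names(res)
```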

@edzer Thank you so much for the quick response. However, I'm running into a lot of trouble installing the GitHub version for some reason. I'll test it as soon as I get around that. Thanks!

Things should work now with 0.3-4 on CRAN; could you pls check?

Thank you, I tested it with my data and it is working perfectly!
