Hi, I'm sorry if this issue doesn't apply, but I wasn't sure where else to ask.
I'm having trouble merging an sf object and a data frame by a common column, i.e. "NAME". The final result is a data frame instead of an sf object, which I didn't expect since I did not change anything in the geometry. When I try to convert it back to sf with st_as_sf(), the geometry appears to have changed, because the plot is different. Here's the example I constructed below:
library(sf)
nc = st_read(system.file("shape/nc.shp", package="sf"))
df_example = data.frame(cbind(c("Ashe", "Alleghany", "Surry", "Currituck", "Northhampton", "Salt Lake City", "Atlanta", "New York City"), c(3, 4, 5, 2, 0, 7, 6, 20)))
colnames(df_example) = c("NAME", "RANDOM_NUMBER")
merged = merge(nc, df_example, by = "NAME")
class(nc)
[1] "sf" "data.frame"
class(merged)
[1] "data.frame"
plot(nc)
plot(merged)
#### plots are different ####
Is there a simple solution to this or am I doing something wrong? When I try merging these using a SpatialPolygonsDataFrame object and a data.frame, it works. Thank you so much!
Both merged <- st_as_sf(merged) and st_geometry(merged) <- merged$geometry after merging should work, I think.
> st_geometry(merged) <- merged$geometry
> class(merged)
[1] "sf" "data.frame"
If by "plots are different" you mean that your merge drops all geometries for which is.na(RANDOM_NUMBER), add all.x = TRUE to your merge() (or use left_join())
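To illustrate the row-dropping behaviour, here is a minimal sketch using plain data frames as a stand-in for the attribute table. The column values are made up for illustration, and sf is not needed to see the effect: by default merge() keeps only rows whose key appears in both inputs, while all.x = TRUE keeps every row of the first input and fills the unmatched columns with NA.

```r
# Stand-in for the attribute table of an sf object (values are hypothetical)
counties <- data.frame(NAME = c("Ashe", "Alleghany", "Surry"),
                       SOME_ATTR = c(1, 2, 3))

# Lookup table with one key that matches nothing in `counties`
df_example <- data.frame(NAME = c("Ashe", "Surry", "Atlanta"),
                         RANDOM_NUMBER = c(3, 5, 6))

# Default merge: only keys present in both tables survive (inner join)
inner <- merge(counties, df_example, by = "NAME")

# all.x = TRUE: every row of `counties` is kept; unmatched rows get NA
left <- merge(counties, df_example, by = "NAME", all.x = TRUE)

nrow(inner)                   # rows kept by the default merge
nrow(left)                    # rows kept with all.x = TRUE
left[is.na(left$RANDOM_NUMBER), "NAME"]  # keys that had no match
```

With an sf object on the left-hand side, the same reasoning would apply to the features: rows that are dropped take their geometries with them, which is why the plot changes.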
Thanks; I think we should make this automatic, by providing merge and left_join methods for sf objects.
Thank you very much, all.x = TRUE did the trick to preserve the geometry.
Would you add a duplicateGeoms argument to merge.sf, like in merge.Spatial?
Well, that doesn't do much, apart from suppressing a warning in case of multiple matches. My gut feeling is that it will be more useful to try to support the *_join functions in dplyr, and include spatial matches.
While it would be great to eventually have a set of spatial *_join functions, I expect that may take considerable attention and time to develop. In the meantime, could we get an interim version where the sf class isn't dropped when an sf object is included in a *_join?
Example:
library(dplyr, warn.conflicts = F, quietly = T)
library(sf)
#> Linking to GEOS 3.5.0, GDAL 2.1.1, proj.4 4.9.3
demo(nc, ask = FALSE, echo = FALSE, verbose = FALSE)
nc_df <- nc %>% unclass %>% as_data_frame() %>% select(NAME, BIR74)
nc %>% select(-BIR74) %>% left_join(nc_df, by = "NAME") %>% class
#> [1] "data.frame"
@brendaprallon please test.
@tiernanmartin pls test the non-spatial *_join.sf methods.
Now, empty GEOMETRYCOLLECTION geometries are put in cases where no geom is available; this might be improved to e.g. put an empty LINESTRING when all the geoms are LINESTRING; for POINT geoms there is no empty version:
a = data.frame(a = 1:3, b = 5:7)
st_geometry(a) = st_sfc(st_point(c(0,0)), st_point(c(1,1)), st_point(c(2,2)))
b = data.frame(x = c("a", "b", "c"), b = c(2,5,6))
full_join(a, b)
# Joining, by = "b"
# Simple feature collection with 4 features and 3 fields (of which 1 is empty)
# geometry type: GEOMETRY
# dimension: XY
# bbox: xmin: 0 ymin: 0 xmax: 2 ymax: 2
# epsg (SRID): NA
# proj4string: NA
# a b x geometry
# 1 1 5 b POINT(0 0)
# 2 2 6 c POINT(1 1)
# 3 3 7 <NA> POINT(2 2)
# 4 NA 2 a GEOMETRYCOLLECTION()
@tiernanmartin https://github.com/edzer/sfr/commit/3f7c25dbae8972a3ce031e4823dcf943ca69b98e now adds st_join for spatial join with flexible geometry predicates, besides the *_join dplyr join methods for non-spatial joins; #200 #50 #42 -- please test!
@edzer my limited tests of the non-spatial *_join functions all returned the expected results - thanks for the quick turnaround! If I encounter anything unusual I will post a test here.
I tried st_join with a slightly modified version of your test above and got an error:
library(sf)
#> Linking to GEOS 3.5.0, GDAL 2.1.1, proj.4 4.9.3
a = data.frame(a = 1:3, b = 5:7)
st_geometry(a) = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2,
2)))
b = data.frame(x = c("a", "b", "c"), b = c(2, 5, 6))
st_geometry(b) = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2,
2)))
st_join(a, b)
#> Error: length(setdiff(names(value), nv)) == 0 is not TRUE
Running st_join in debug mode indicates that the error occurs when subsetting x:
if (missing(FUN)) {
    if (left) {
        i = lapply(i, function(x) {
            if (length(x) == 0)
                NA_integer_
            else x
        })
        ix = rep(seq_len(nrow(x)), sapply(i, length))
    }
    st_sf(cbind(as.data.frame(x[ix, ]),  # <- error occurs here
        y[unlist(i), , drop = FALSE]))
}
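For readers following the index logic in the snippet above, here is a small base-R sketch of how the left-join indices get built. The match list i is hypothetical (in st_join it would come from the geometry predicate), and x and y are plain data frames rather than sf objects, so this only illustrates the indexing, not the error itself.

```r
x <- data.frame(a = 1:3)
y <- data.frame(b = c("p", "q", "r"))

# Hypothetical match list: row 1 of x matches row 1 of y,
# row 2 matches nothing, row 3 matches rows 2 and 3 of y
i <- list(1L, integer(0), c(2L, 3L))

# Left join: replace empty matches with NA so the x row is not lost
i <- lapply(i, function(m) if (length(m) == 0) NA_integer_ else m)

# Repeat each x row once per match (or once for the NA placeholder)
ix <- rep(seq_len(nrow(x)), sapply(i, length))

# Flattened y indices; NA selects an all-NA row from y
iy <- unlist(i)

joined <- cbind(x[ix, , drop = FALSE], y[iy, , drop = FALSE])
print(joined)
```

The unmatched row of x is kept with NA in the columns coming from y, and a row of x with several matches is repeated once per match.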
@edzer Thank you so much for the quick response. However, I'm having a lot of trouble installing the GitHub version for some reason. I'll test it as soon as I get around that. Thanks!
Things should work now with 0.3-4 on CRAN; could you pls check?
Thank you, I tested it with my data and it is working perfectly!