First posted at https://github.com/tidyverse/dplyr/issues/4143; they suggested I ask over here.
With dplyr 0.8.0, group_map() on sf objects is either failing or I'm using it wrong.
The example below uses st_centroid() as a stand-in for a custom function I want to apply. That function keeps all rows and adds a new column, with each row's value calculated using only the rows in its group.
Thanks for any thoughts.
```` r
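library(sf)
library(dplyr)  # for %>%, group_by(), summarize(), group_map()
library(tidyr)  # for nest() and unnest()
nc <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `C:\Users\matt\Documents\R\win-library\sf\shape\nc.shp' using driver `ESRI Shapefile'
nc$gp <- sample(1:10, replace=T)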
cent <- st_centroid(nc)
nc_gp_area <- nc %>%
group_by(gp) %>%
summarize(area_mean = mean(AREA))
nc_gp_cent <- nc %>%
group_by(gp) %>%
group_map(st_centroid)
nc_gp_cent <- nc %>%
group_by(gp) %>%
nest() %>%
mutate(out = purrr::map(data, ~st_centroid(.x))) %>%
unnest(out) %>%
st_as_sf()
````
Created on 2019-01-31 by the reprex package (v0.2.1)
For more context, group_map is one of several new generics. dplyr 0.8.0 is scheduled to be released today (Feb 1).
tbh I'm going to need to see some demo code before group_map() and summarise() are properly distinct in my mind. As far as I can tell, you can already derive the centroids you want from nc_gp_area, as the summarise() sf method unions the group geometries:
```` r
library(sf)
library(dplyr)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc$gp <- sample(1:10, replace=T)
# Example of centroid of each polygon; works
cent <- st_centroid(nc)
nc_gp_area <- nc %>%
group_by(gp) %>%
summarize(area_mean = mean(AREA))
grp <- sample(seq(10), 1)
plot(nc_gp_area[grp, 0], axes = T, reset = F)
plot(st_centroid(nc_gp_area[grp, 0]), add = T, pch = 19, col = 'red')
````

yes/no?
Centroid was an example of an existing function. What I'm actually using is a custom function to find the nearest neighbor within each group and add the distance as a new column. Because this needs to create a unique value for each row in the group, I can't use summarize.
In the past I might have split groups into a list, but this method seems better.
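For comparison, the split-into-a-list approach mentioned above might look roughly like the sketch below. It is not the actual custom function from this thread: `add_nn_dist()` and the `nn_dist` column are hypothetical names, centroids stand in for whatever geometry the real function measures, and the projection to EPSG:32119 (NC state plane, metres) is an assumption made so that distances are planar.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# Assumption: project to a planar CRS so distances come out in metres
nc <- st_transform(nc, 32119)
set.seed(42)
nc$gp <- sample(1:10, nrow(nc), replace = TRUE)

# Hypothetical helper: for every feature, the distance from its centroid
# to the nearest other centroid within the same data frame
add_nn_dist <- function(x) {
  if (nrow(x) < 2) {
    x$nn_dist <- NA_real_  # a single-row group has no neighbour
    return(x)
  }
  cents <- st_centroid(st_geometry(x))
  d <- st_distance(cents)                     # pairwise distance matrix
  d <- matrix(as.numeric(d), nrow = nrow(x))  # drop units, keep matrix shape
  diag(d) <- NA                               # ignore distance to self
  x$nn_dist <- apply(d, 1, min, na.rm = TRUE)
  x
}

# Split by group, apply the helper to each piece, bind the pieces back together
nc_nn <- do.call(rbind, lapply(split(nc, nc$gp), add_nn_dist))
```

Because every piece stays an sf object and is recombined with sf's own rbind method, the result keeps its geometry column and CRS.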
Would be great if someone could report this works as expected!
group_nest seems to be a whole other problem, as it is currently implemented.
Not sure if related, but it seems I can't load sf if I load tidyverse first.
````
> library(tidyverse)
── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0     ✔ purrr   0.3.0
✔ tibble  2.0.1     ✔ dplyr   0.7.8
✔ tidyr   0.8.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.3.0
── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
> library(sf)
Error: package or namespace load failed for ‘sf’:
 .onLoad failed in loadNamespace() for 'sf', details:
  call: get(genname, envir = envir)
  error: object 'group_map' not found
````
````
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.3.0 stringr_1.4.0 dplyr_0.7.8 purrr_0.3.0 readr_1.3.1 tidyr_0.8.2 tibble_2.0.1
[8] ggplot2_3.1.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 cellranger_1.1.0 pillar_1.3.1 compiler_3.5.2 plyr_1.8.4 bindr_0.1.1
[7] class_7.3-15 tools_3.5.2 jsonlite_1.6 lubridate_1.7.4 nlme_3.1-137 gtable_0.2.0
[13] lattice_0.20-38 pkgconfig_2.0.2 rlang_0.3.1 DBI_1.0.0 cli_1.0.1 rstudioapi_0.9.0
[19] yaml_2.2.0 haven_2.0.0 bindrcpp_0.2.2 e1071_1.7-0.1 withr_2.1.2 xml2_1.2.0
[25] httr_1.4.0 generics_0.0.2 hms_0.4.2 classInt_0.3-1 grid_3.5.2 tidyselect_0.2.5
[31] glue_1.3.0 R6_2.3.0 readxl_1.2.0 modelr_0.1.3 magrittr_1.5 units_0.6-2
[37] backports_1.1.3 scales_1.0.0 rvest_0.3.2 assertthat_0.2.0 colorspace_1.4-0 stringi_1.3.1
[43] lazyeval_0.2.1 munsell_0.5.0 broom_0.5.1 crayon_1.3.4
````
When I load them the other way round, tidyverse reports an error on the same call for group_map and group_split, but loads nevertheless.
You'll have to update dplyr to >= 0.8-0; since sf only Suggests: dplyr, it can't enforce this by installing or loading.
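A quick way to check this before loading sf could look like the following. It is only a sketch: the `dplyr_ok` variable is a made-up name, and the message text is illustrative rather than anything sf itself prints.

```r
# TRUE only if dplyr is installed and at least version 0.8.0
dplyr_ok <- requireNamespace("dplyr", quietly = TRUE) &&
  utils::packageVersion("dplyr") >= "0.8.0"

if (!dplyr_ok) {
  # Nothing is installed automatically; this only points at the fix
  message("dplyr >= 0.8.0 is needed here; try install.packages(\"dplyr\")")
}
```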
Thanks for the fast response! Having updated dplyr, running the above code seems to work, with a bunch of warnings.
````
> nc_gp_cent <- nc %>%
+ group_by(gp) %>%
+ group_map(st_centroid)
There were 21 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In st_centroid.sf(.x, .y, ...) :
st_centroid assumes attributes are constant over geometries of x
2: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
st_centroid does not give correct centroids for longitude/latitude data
3: In st_centroid.sf(.x, .y, ...) :
st_centroid assumes attributes are constant over geometries of x
4: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
st_centroid does not give correct centroids for longitude/latitude data
5: In st_centroid.sf(.x, .y, ...) :
st_centroid assumes attributes are constant over geometries of x
6: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
st_centroid does not give correct centroids for longitude/latitude data
7: In st_centroid.sf(.x, .y, ...) :
st_centroid assumes attributes are constant over geometries of x
8: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
st_centroid does not give correct centroids for longitude/latitude data
9: In st_centroid.sf(.x, .y, ...) :
st_centroid assumes attributes are constant over geometries of x
10: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
st_centroid does not give correct centroids for longitude/latitude data
11: In st_centroid.sf(.x, .y, ...) :
st_centroid assumes attributes are constant over geometries of x
12: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
st_centroid does not give correct centroids for longitude/latitude data
13: In st_centroid.sf(.x, .y, ...) :
st_centroid assumes attributes are constant over geometries of x
14: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
st_centroid does not give correct centroids for longitude/latitude data
15: In bind_rows_(x, .id) :
Vectorizing 'sfc_POINT' elements may not preserve their attributes
16: In bind_rows_(x, .id) :
Vectorizing 'sfc_POINT' elements may not preserve their attributes
17: In bind_rows_(x, .id) :
Vectorizing 'sfc_POINT' elements may not preserve their attributes
18: In bind_rows_(x, .id) :
Vectorizing 'sfc_POINT' elements may not preserve their attributes
19: In bind_rows_(x, .id) :
Vectorizing 'sfc_POINT' elements may not preserve their attributes
20: In bind_rows_(x, .id) :
Vectorizing 'sfc_POINT' elements may not preserve their attributes
21: In bind_rows_(x, .id) :
Vectorizing 'sfc_POINT' elements may not preserve their attributes
>
````
I can confirm that the current master works, with the same results as @EhrmannS. Thanks!
The "may not preserve their attributes" warning means that the CRS information has been lost. I am working around this by copying the CRS from another variable, e.g., st_crs(nc_gp_cent) <- st_crs(nc).
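To illustrate that workaround in isolation, here is a minimal sketch. The `pts` object is a stand-in I made up for a result that came back without a CRS; it is not the actual group_map() output from above.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Stand-in for a result whose CRS was dropped along the way
pts <- suppressWarnings(st_centroid(st_geometry(nc)))
st_crs(pts) <- NA
is.na(st_crs(pts))

# Copy the CRS over from the original object; no reprojection happens,
# the coordinates are just relabelled
st_crs(pts) <- st_crs(nc)
st_crs(pts) == st_crs(nc)
```

This is only safe when the coordinates really are in the CRS being copied, which holds here because the centroids were computed from nc itself.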