sf: dplyr::group_map() usage with sf data

Created on 1 Feb 2019 · 8 comments · Source: r-spatial/sf

First posted at https://github.com/tidyverse/dplyr/issues/4143; the dplyr maintainers suggested I ask here.

With dplyr 0.8.0, group_map() on sf objects either fails or I'm using it wrong.

The example below uses st_centroid() as a stand-in for a custom function that keeps all rows and adds a new column, where each row's value is computed using only the rows in its group.

Thanks for any thoughts.

```` r
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(dplyr)
library(tidyr)  # nest() and unnest()

nc <- st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `C:\Users\matt\Documents\R\win-library\sf\shape\nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs

# Add grouping column (one random group id per feature)
nc$gp <- sample(1:10, nrow(nc), replace = TRUE)

# Example of centroid of each polygon; works
cent <- st_centroid(nc)
#> Warning in st_centroid.sf(nc): st_centroid assumes attributes are constant
#> over geometries of x
#> Warning in st_centroid.sfc(st_geometry(x), of_largest_polygon =
#> of_largest_polygon): st_centroid does not give correct centroids for
#> longitude/latitude data

# Example of summary; works
nc_gp_area <- nc %>%
  group_by(gp) %>%
  summarize(area_mean = mean(AREA))

# Get centroid of each group of polygons; does not work
nc_gp_cent <- nc %>%
  group_by(gp) %>%
  group_map(st_centroid)
#> Error in UseMethod("st_centroid") :
#>   no applicable method for 'st_centroid' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"

# This method is what dplyr::group_map() is supposed to replace; works
# (https://github.com/tidyverse/dplyr/issues/4066#issue-395061423)
nc_gp_cent <- nc %>%
  group_by(gp) %>%
  nest() %>%
  mutate(out = purrr::map(data, ~st_centroid(.x))) %>%
  unnest(out) %>%
  st_as_sf()
````

Created on 2019-01-31 by the reprex package (v0.2.1)
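For comparison, the same per-group operation can be written without group_map() by splitting the sf object into a list with base R's split(), applying the function to each chunk, and stacking the chunks with rbind(). This is a minimal sketch assuming a current sf install; `centroids_by_group` is a hypothetical helper name, and the warnings are silenced only to keep the output short.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
set.seed(1)
nc$gp <- sample(1:10, nrow(nc), replace = TRUE)

# Hypothetical helper: apply st_centroid() to each group of rows separately
# and stack the results back into a single sf object.
centroids_by_group <- function(x, group) {
  # split() falls through to split.data.frame, which subsets with `[`,
  # so every chunk is still an sf object
  chunks <- split(x, x[[group]])
  out <- lapply(chunks, function(g) {
    suppressWarnings(st_centroid(g))  # hide the attribute / lon-lat warnings
  })
  do.call(rbind, unname(out))         # rbind.sf keeps the shared CRS
}

nc_gp_cent <- centroids_by_group(nc, "gp")
```

Because every chunk carries the same CRS, the combined result keeps it, and the grouping column stays in the output.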


All 8 comments

For more context, group_map is one of several new generics. dplyr 0.8.0 is scheduled to be released today (Feb 1).

tbh I'm going to need to see some demo code before group_map() and summarise() are properly distinct in my mind. As far as I can tell, you can already derive the centroids you want from nc_gp_area, because the sf method for summarise() unions the geometries of each group:

library(sf)
library(dplyr)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc$gp <- sample(1:10, nrow(nc), replace = TRUE)
# Example of centroid of each polygon; works
cent <- st_centroid(nc)
nc_gp_area <- nc %>%
  group_by(gp) %>%
  summarize(area_mean = mean(AREA))

# Pick one group at random (nc_gp_area has one row per group)
grp <- sample(nrow(nc_gp_area), 1)
plot(nc_gp_area[grp, 0], axes = TRUE, reset = FALSE)
plot(st_centroid(nc_gp_area[grp, 0]), add = TRUE, pch = 19, col = 'red')


yes/no?

Centroid was an example of an existing function. What I'm actually using is a custom function to find the nearest neighbor within each group and add the distance as a new column. Because this needs to create a unique value for each row in the group, I can't use summarize.

In the past I might have split groups into a list, but this method seems better.
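The split-into-a-list approach can be sketched for the nearest-neighbour use case described above: for every row, the distance to the closest other feature in the same group, kept as a new column. This assumes a recent sf; the toy points have no CRS (so distances are plain planar units), and `nn_dist_within` is a hypothetical helper, not the poster's actual function.

```r
library(sf)

# Toy data: five points in a plane (no CRS), in two groups
pts <- st_sf(
  gp = c(1, 1, 1, 2, 2),
  geometry = st_sfc(
    st_point(c(0, 0)), st_point(c(3, 4)), st_point(c(0, 1)),
    st_point(c(10, 10)), st_point(c(10, 12))
  )
)

# Hypothetical helper: for every row, distance to the nearest *other*
# feature in the same chunk. A group with a single feature gets Inf.
nn_dist_within <- function(g) {
  d <- st_distance(g)                         # n x n pairwise distances
  d <- matrix(as.numeric(d), nrow = nrow(g))  # drop units, if present
  diag(d) <- Inf                              # ignore distance to itself
  g$nn_dist <- apply(d, 1, min)
  g
}

# One chunk per group, then stack back into one sf object
res <- do.call(rbind, unname(lapply(split(pts, pts$gp), nn_dist_within)))
res$nn_dist
```

All rows are kept and each value depends only on the rows of its own group, which is the behaviour summarise() cannot provide.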

Would be great if someone could report this works as expected!

group_nest() seems to be a separate problem, as it is currently implemented.

Not sure if this is related, but it seems I can't load sf after loading tidyverse first.

> library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0     ✔ purrr   0.3.0
✔ tibble  2.0.1     ✔ dplyr   0.7.8
✔ tidyr   0.8.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
> library(sf)
Error: package or namespace load failed for ‘sf’:
 .onLoad failed in loadNamespace() for 'sf', details:
  call: get(genname, envir = envir)
  error: object 'group_map' not found
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.3.0   stringr_1.4.0   dplyr_0.7.8     purrr_0.3.0     readr_1.3.1     tidyr_0.8.2     tibble_2.0.1   
[8] ggplot2_3.1.0   tidyverse_1.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0       cellranger_1.1.0 pillar_1.3.1     compiler_3.5.2   plyr_1.8.4       bindr_0.1.1     
 [7] class_7.3-15     tools_3.5.2      jsonlite_1.6     lubridate_1.7.4  nlme_3.1-137     gtable_0.2.0    
[13] lattice_0.20-38  pkgconfig_2.0.2  rlang_0.3.1      DBI_1.0.0        cli_1.0.1        rstudioapi_0.9.0
[19] yaml_2.2.0       haven_2.0.0      bindrcpp_0.2.2   e1071_1.7-0.1    withr_2.1.2      xml2_1.2.0      
[25] httr_1.4.0       generics_0.0.2   hms_0.4.2        classInt_0.3-1   grid_3.5.2       tidyselect_0.2.5
[31] glue_1.3.0       R6_2.3.0         readxl_1.2.0     modelr_0.1.3     magrittr_1.5     units_0.6-2     
[37] backports_1.1.3  scales_1.0.0     rvest_0.3.2      assertthat_0.2.0 colorspace_1.4-0 stringi_1.3.1   
[43] lazyeval_0.2.1   munsell_0.5.0    broom_0.5.1      crayon_1.3.4 

When I load them the other way round, tidyverse reports an error on the same call for group_map and group_split, but loads nevertheless.

You'll have to update dplyr to >= 0.8-0; since sf only Suggests: dplyr, it can't enforce this by installing or loading.
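Since the version requirement cannot be enforced through Suggests, a script can check it itself before calling library(sf). A minimal sketch; `dplyr_ok_for_sf` is a hypothetical helper, and 0.8.0 is the minimum named in the comment above.

```r
# Hypothetical helper: TRUE only if dplyr is installed and at least
# min_version. packageVersion() reads the installed DESCRIPTION without
# loading the dplyr namespace.
dplyr_ok_for_sf <- function(min_version = "0.8.0") {
  nzchar(system.file(package = "dplyr")) &&
    utils::packageVersion("dplyr") >= min_version
}

if (!dplyr_ok_for_sf()) {
  message("dplyr >= 0.8.0 is needed for sf's group_map() support; ",
          "update it with install.packages(\"dplyr\")")
}
```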

Thanks for the fast response! Having updated dplyr, running the above code seems to work, with a bunch of warnings.

> nc_gp_cent <- nc %>%
+     group_by(gp) %>%
+     group_map(st_centroid)
There were 21 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In st_centroid.sf(.x, .y, ...) :
  st_centroid assumes attributes are constant over geometries of x
2: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
  st_centroid does not give correct centroids for longitude/latitude data
3: In st_centroid.sf(.x, .y, ...) :
  st_centroid assumes attributes are constant over geometries of x
4: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
  st_centroid does not give correct centroids for longitude/latitude data
5: In st_centroid.sf(.x, .y, ...) :
  st_centroid assumes attributes are constant over geometries of x
6: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
  st_centroid does not give correct centroids for longitude/latitude data
7: In st_centroid.sf(.x, .y, ...) :
  st_centroid assumes attributes are constant over geometries of x
8: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
  st_centroid does not give correct centroids for longitude/latitude data
9: In st_centroid.sf(.x, .y, ...) :
  st_centroid assumes attributes are constant over geometries of x
10: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
  st_centroid does not give correct centroids for longitude/latitude data
11: In st_centroid.sf(.x, .y, ...) :
  st_centroid assumes attributes are constant over geometries of x
12: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
  st_centroid does not give correct centroids for longitude/latitude data
13: In st_centroid.sf(.x, .y, ...) :
  st_centroid assumes attributes are constant over geometries of x
14: In st_centroid.sfc(st_geometry(x), of_largest_polygon = of_largest_polygon) :
  st_centroid does not give correct centroids for longitude/latitude data
15: In bind_rows_(x, .id) :
  Vectorizing 'sfc_POINT' elements may not preserve their attributes
16: In bind_rows_(x, .id) :
  Vectorizing 'sfc_POINT' elements may not preserve their attributes
17: In bind_rows_(x, .id) :
  Vectorizing 'sfc_POINT' elements may not preserve their attributes
18: In bind_rows_(x, .id) :
  Vectorizing 'sfc_POINT' elements may not preserve their attributes
19: In bind_rows_(x, .id) :
  Vectorizing 'sfc_POINT' elements may not preserve their attributes
20: In bind_rows_(x, .id) :
  Vectorizing 'sfc_POINT' elements may not preserve their attributes
21: In bind_rows_(x, .id) :
  Vectorizing 'sfc_POINT' elements may not preserve their attributes
>

I can confirm that the current master works, with the same results as @EhrmannS. Thanks!

The "may not preserve their attributes" warning means that the CRS information has been lost. I am working around this by copying the CRS from another variable, e.g., st_crs(nc_gp_cent) <- st_crs(nc).
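The workaround can be sketched as follows, assuming a recent sf. Here the CRS is dropped deliberately with st_set_crs(res, NA) to stand in for the loss during row binding, and then copied back from the input. Note that assigning a CRS only relabels the coordinates and does not reproject them, so this is correct only when the coordinates were never transformed.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Stand-in for a group-wise result whose CRS was dropped along the way
res <- suppressWarnings(st_centroid(nc))
res <- st_set_crs(res, NA)
is.na(st_crs(res))  # the CRS is gone

# Copy the CRS back from the input (a relabel, not a reprojection)
st_crs(res) <- st_crs(nc)
```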
