I have a simple features collection of points, where each point represents the
location of an individual. I'd like to calculate the distance between successive
points in the cleanest and most efficient way possible.
library(dplyr)
library(sf)
knitr::opts_chunk$set(error = TRUE)
df <- structure(list(timestamp = structure(
c(1522540900, 1522540905, 1522540907, 1522540910, 1522540912, 1522540915,
1522540917, 1522540920, 1522540922, 1522540925),
class = c("POSIXct", "POSIXt"), tzone = ""),
geometry = structure(list(
structure(c(-84.50954, 33.89375), class = c("XY", "POINT", "sfg")),
structure(c(-84.50954, 33.89378), class = c("XY", "POINT", "sfg")),
structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg")),
structure(c(-84.50954, 33.89377), class = c("XY", "POINT", "sfg"))),
n_empty = 0L, precision = 0,
crs = structure(list(epsg = NA_integer_, proj4string = NA_character_), class = "crs"),
bbox = structure(
c(xmin = -84.81364, ymin = 33.54257, xmax = -83.84208, ymax = 34.24617), class = "bbox"),
class = c("sfc_POINT", "sfc"))),
row.names = c(NA, -10L), class = c("sf", "tbl_df", "tbl", "data.frame"),
sf_column = "geometry",
agr = structure(c(timestamp = NA_integer_), .Label = c("constant", "aggregate", "identity"), class = "factor"))
df
#> Simple feature collection with 10 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617
#> epsg (SRID): NA
#> proj4string: NA
#> # A tibble: 10 x 2
#> timestamp geometry
#> <dttm> <POINT>
#> 1 2018-04-01 00:01:40 (-84.50954 33.89375)
#> 2 2018-04-01 00:01:45 (-84.50954 33.89378)
#> 3 2018-04-01 00:01:47 (-84.50954 33.89377)
#> 4 2018-04-01 00:01:50 (-84.50954 33.89377)
#> 5 2018-04-01 00:01:52 (-84.50954 33.89377)
#> 6 2018-04-01 00:01:55 (-84.50954 33.89377)
#> 7 2018-04-01 00:01:57 (-84.50954 33.89377)
#> 8 2018-04-01 00:02:00 (-84.50954 33.89377)
#> 9 2018-04-01 00:02:02 (-84.50954 33.89377)
#> 10 2018-04-01 00:02:05 (-84.50954 33.89377)
sf::st_distance can return the matrix of distances between these points, or the element-wise distance between two different vectors. I have written a function to compute the successive distance by bootstrapping a dummy point on the end or beginning of the vector.
#' Calculate the successive distance between points
#'
#' @param x The geometry column of an `sf` object
#' @return The distance between the `x[i]` and `x[i + 1]` elements of `x`
#'
distance_between_points <- function(x) {
null_point <- st_sf(x = st_sfc(st_point(x = c(0, 0))))
x0 <- rbind(st_sf(x), null_point)
x1 <- rbind(null_point, st_sf(x))
d <- st_distance(x0, x1, by_element = TRUE) %>%
.[-1]
d[length(d)] <- NA
d
}
distance_between_points(df$geometry)
#> [1] 3e-05 1e-05 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 NA
This works, but it's a bit slow. I think I was hoping for something more along the lines of the dplyr::lead function, like I can do for the timestamp column.
df %>%
mutate(
elapsed_time = lead(timestamp) - timestamp
)
#> Simple feature collection with 10 features and 2 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617
#> epsg (SRID): NA
#> proj4string: NA
#> # A tibble: 10 x 3
#> timestamp elapsed_time geometry
#> <dttm> <time> <POINT>
#> 1 2018-04-01 00:01:40 5 (-84.50954 33.89375)
#> 2 2018-04-01 00:01:45 2 (-84.50954 33.89378)
#> 3 2018-04-01 00:01:47 3 (-84.50954 33.89377)
#> 4 2018-04-01 00:01:50 2 (-84.50954 33.89377)
#> 5 2018-04-01 00:01:52 3 (-84.50954 33.89377)
#> 6 2018-04-01 00:01:55 2 (-84.50954 33.89377)
#> 7 2018-04-01 00:01:57 3 (-84.50954 33.89377)
#> 8 2018-04-01 00:02:00 2 (-84.50954 33.89377)
#> 9 2018-04-01 00:02:02 3 (-84.50954 33.89377)
#> 10 2018-04-01 00:02:05 <NA> (-84.50954 33.89377)
df %>%
mutate(
elapsed_time = lead(timestamp) - timestamp,
distance_to_next = sf::st_distance(geometry, lead(geometry))
)
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'st_bbox' applied to an object of class "logical".
Created on 2018-07-16 by the reprex package (v0.2.0).
This is tricky and the error message isn't very helpful. The issue is caused by the fact that sf doesn't use NA for missing coordinates. It rather uses an EMPTY coordinate. lead() by default uses NA to pad the vector, which causes the obscure issue. I think this could be improved and polished.
Meanwhile, here's a solution:
empty <- st_as_sfc("POINT(EMPTY)")
df %>%
mutate(
elapsed_time = lead(timestamp) - timestamp,
distance_to_next = sf::st_distance(
geometry,
lead(geometry, default = empty),
by_element = TRUE)
)
#> Simple feature collection with 10 features and 3 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617
#> epsg (SRID): NA
#> proj4string: NA
#> # A tibble: 10 x 4
#> timestamp elapsed_time distance_to_next geometry
#> <dttm> <time> <dbl> <POINT>
#> 1 2018-03-31 20:01:40 " 5 secs" 0.00003 (-84.50954 33.89375)
#> 2 2018-03-31 20:01:45 " 2 secs" 0.00001000 (-84.50954 33.89378)
#> 3 2018-03-31 20:01:47 " 3 secs" 0 (-84.50954 33.89377)
#> 4 2018-03-31 20:01:50 " 2 secs" 0 (-84.50954 33.89377)
#> 5 2018-03-31 20:01:52 " 3 secs" 0 (-84.50954 33.89377)
#> 6 2018-03-31 20:01:55 " 2 secs" 0 (-84.50954 33.89377)
#> 7 2018-03-31 20:01:57 " 3 secs" 0 (-84.50954 33.89377)
#> 8 2018-03-31 20:02:00 " 2 secs" 0 (-84.50954 33.89377)
#> 9 2018-03-31 20:02:02 " 3 secs" 0 (-84.50954 33.89377)
#> 10 2018-03-31 20:02:05 NA secs 0 (-84.50954 33.89377)
However, the distances computed are unitless (angular distances) because the crs is undefined. Angular distances are pretty much useless. You need meters:
df %>%
st_set_crs(4326) %>% # will use great circle distance
mutate(
elapsed_time = lead(timestamp) - timestamp,
distance_to_next = sf::st_distance(
geometry,
lead(geometry, default = empty),
by_element = TRUE)
)
#> Simple feature collection with 10 features and 3 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> # A tibble: 10 x 4
#> timestamp elapsed_time distance_to_next geometry
#> <dttm> <time> <S3: units> <POINT [°]>
#> 1 2018-03-31 20:01:40 " 5 secs" 3.327614 m (-84.50954 33.89375)
#> 2 2018-03-31 20:01:45 " 2 secs" 1.109205 m (-84.50954 33.89378)
#> 3 2018-03-31 20:01:47 " 3 secs" 0.000000 m (-84.50954 33.89377)
#> 4 2018-03-31 20:01:50 " 2 secs" 0.000000 m (-84.50954 33.89377)
#> 5 2018-03-31 20:01:52 " 3 secs" 0.000000 m (-84.50954 33.89377)
#> 6 2018-03-31 20:01:55 " 2 secs" 0.000000 m (-84.50954 33.89377)
#> 7 2018-03-31 20:01:57 " 3 secs" 0.000000 m (-84.50954 33.89377)
#> 8 2018-03-31 20:02:00 " 2 secs" 0.000000 m (-84.50954 33.89377)
#> 9 2018-03-31 20:02:02 " 3 secs" 0.000000 m (-84.50954 33.89377)
#> 10 2018-03-31 20:02:05 NA secs " NA m" (-84.50954 33.89377)
Notice that the geometry column has units now (<POINT [°]>) as well as distance_to_next (m). You can (and probably should) st_set_srid() as soon as you define your df, to avoid errors.
This is tricky and the error message isn't very helpful. The issue is caused by the fact that
sfdoesn't useNAfor missing coordinates. It rather uses anEMPTYcoordinate.lead()by default usesNAto pad the vector, which causes the obscure issue. I think this could be improved and polished.Meanwhile, here's a solution:
empty <- st_as_sfc("POINT(EMPTY)") df %>% mutate( elapsed_time = lead(timestamp) - timestamp, distance_to_next = sf::st_distance( geometry, lead(geometry, default = empty), by_element = TRUE) ) #> Simple feature collection with 10 features and 3 fields #> geometry type: POINT #> dimension: XY #> bbox: xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617 #> epsg (SRID): NA #> proj4string: NA #> # A tibble: 10 x 4 #> timestamp elapsed_time distance_to_next geometry #> <dttm> <time> <dbl> <POINT> #> 1 2018-03-31 20:01:40 " 5 secs" 0.00003 (-84.50954 33.89375) #> 2 2018-03-31 20:01:45 " 2 secs" 0.00001000 (-84.50954 33.89378) #> 3 2018-03-31 20:01:47 " 3 secs" 0 (-84.50954 33.89377) #> 4 2018-03-31 20:01:50 " 2 secs" 0 (-84.50954 33.89377) #> 5 2018-03-31 20:01:52 " 3 secs" 0 (-84.50954 33.89377) #> 6 2018-03-31 20:01:55 " 2 secs" 0 (-84.50954 33.89377) #> 7 2018-03-31 20:01:57 " 3 secs" 0 (-84.50954 33.89377) #> 8 2018-03-31 20:02:00 " 2 secs" 0 (-84.50954 33.89377) #> 9 2018-03-31 20:02:02 " 3 secs" 0 (-84.50954 33.89377) #> 10 2018-03-31 20:02:05 NA secs 0 (-84.50954 33.89377)However, the distances computed are unitless (angular distances) because the
crsis undefined. Angular distances are pretty much useless. You need meters:df %>% st_set_crs(4326) %>% # will use great circle distance mutate( elapsed_time = lead(timestamp) - timestamp, distance_to_next = sf::st_distance( geometry, lead(geometry, default = empty), by_element = TRUE) ) #> Simple feature collection with 10 features and 3 fields #> geometry type: POINT #> dimension: XY #> bbox: xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617 #> epsg (SRID): 4326 #> proj4string: +proj=longlat +datum=WGS84 +no_defs #> # A tibble: 10 x 4 #> timestamp elapsed_time distance_to_next geometry #> <dttm> <time> <S3: units> <POINT [°]> #> 1 2018-03-31 20:01:40 " 5 secs" 3.327614 m (-84.50954 33.89375) #> 2 2018-03-31 20:01:45 " 2 secs" 1.109205 m (-84.50954 33.89378) #> 3 2018-03-31 20:01:47 " 3 secs" 0.000000 m (-84.50954 33.89377) #> 4 2018-03-31 20:01:50 " 2 secs" 0.000000 m (-84.50954 33.89377) #> 5 2018-03-31 20:01:52 " 3 secs" 0.000000 m (-84.50954 33.89377) #> 6 2018-03-31 20:01:55 " 2 secs" 0.000000 m (-84.50954 33.89377) #> 7 2018-03-31 20:01:57 " 3 secs" 0.000000 m (-84.50954 33.89377) #> 8 2018-03-31 20:02:00 " 2 secs" 0.000000 m (-84.50954 33.89377) #> 9 2018-03-31 20:02:02 " 3 secs" 0.000000 m (-84.50954 33.89377) #> 10 2018-03-31 20:02:05 NA secs " NA m" (-84.50954 33.89377)Notice that the
geometrycolumn has units now (<POINT [°]>) as well asdistance_to_next(m). You can (and probably should)st_set_srid()as soon as you define yourdf, to avoid errors.
Hi is this answer still valid?
I'm seeing this error about incompatibility between the attributes of df and the empty sfc. but if I use NA in the default arg of st_distance everything works fine.
Error: Problem with `mutate()` input `distance_to_next`.
x Can't combine `default` <sfc_GEOMETRY> and `x` <sfc_GEOMETRY>.
x Some attributes are incompatible.
ℹ The author of the class should implement vctrs methods.
ℹ See <https://vctrs.r-lib.org/reference/faq-error-incompatible-attributes.html>.
ℹ Input `distance_to_next` is `sf::st_distance(geometry, lead(geometry, default = empty), by_element = T)`.
Same issue here, but setting lead(geometry, default = NA) (not the argument default to NA in st_distance; perhaps that's what was meant) allows it to run without error.
This is tricky and the error message isn't very helpful. The issue is caused by the fact that
sfdoesn't useNAfor missing coordinates. It rather uses anEMPTYcoordinate.lead()by default usesNAto pad the vector, which causes the obscure issue. I think this could be improved and polished.
Meanwhile, here's a solution:empty <- st_as_sfc("POINT(EMPTY)") df %>% mutate( elapsed_time = lead(timestamp) - timestamp, distance_to_next = sf::st_distance( geometry, lead(geometry, default = empty), by_element = TRUE) ) #> Simple feature collection with 10 features and 3 fields #> geometry type: POINT #> dimension: XY #> bbox: xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617 #> epsg (SRID): NA #> proj4string: NA #> # A tibble: 10 x 4 #> timestamp elapsed_time distance_to_next geometry #> <dttm> <time> <dbl> <POINT> #> 1 2018-03-31 20:01:40 " 5 secs" 0.00003 (-84.50954 33.89375) #> 2 2018-03-31 20:01:45 " 2 secs" 0.00001000 (-84.50954 33.89378) #> 3 2018-03-31 20:01:47 " 3 secs" 0 (-84.50954 33.89377) #> 4 2018-03-31 20:01:50 " 2 secs" 0 (-84.50954 33.89377) #> 5 2018-03-31 20:01:52 " 3 secs" 0 (-84.50954 33.89377) #> 6 2018-03-31 20:01:55 " 2 secs" 0 (-84.50954 33.89377) #> 7 2018-03-31 20:01:57 " 3 secs" 0 (-84.50954 33.89377) #> 8 2018-03-31 20:02:00 " 2 secs" 0 (-84.50954 33.89377) #> 9 2018-03-31 20:02:02 " 3 secs" 0 (-84.50954 33.89377) #> 10 2018-03-31 20:02:05 NA secs 0 (-84.50954 33.89377)However, the distances computed are unitless (angular distances) because the
crsis undefined. Angular distances are pretty much useless. You need meters:df %>% st_set_crs(4326) %>% # will use great circle distance mutate( elapsed_time = lead(timestamp) - timestamp, distance_to_next = sf::st_distance( geometry, lead(geometry, default = empty), by_element = TRUE) ) #> Simple feature collection with 10 features and 3 fields #> geometry type: POINT #> dimension: XY #> bbox: xmin: -84.81364 ymin: 33.54257 xmax: -83.84208 ymax: 34.24617 #> epsg (SRID): 4326 #> proj4string: +proj=longlat +datum=WGS84 +no_defs #> # A tibble: 10 x 4 #> timestamp elapsed_time distance_to_next geometry #> <dttm> <time> <S3: units> <POINT [°]> #> 1 2018-03-31 20:01:40 " 5 secs" 3.327614 m (-84.50954 33.89375) #> 2 2018-03-31 20:01:45 " 2 secs" 1.109205 m (-84.50954 33.89378) #> 3 2018-03-31 20:01:47 " 3 secs" 0.000000 m (-84.50954 33.89377) #> 4 2018-03-31 20:01:50 " 2 secs" 0.000000 m (-84.50954 33.89377) #> 5 2018-03-31 20:01:52 " 3 secs" 0.000000 m (-84.50954 33.89377) #> 6 2018-03-31 20:01:55 " 2 secs" 0.000000 m (-84.50954 33.89377) #> 7 2018-03-31 20:01:57 " 3 secs" 0.000000 m (-84.50954 33.89377) #> 8 2018-03-31 20:02:00 " 2 secs" 0.000000 m (-84.50954 33.89377) #> 9 2018-03-31 20:02:02 " 3 secs" 0.000000 m (-84.50954 33.89377) #> 10 2018-03-31 20:02:05 NA secs " NA m" (-84.50954 33.89377)Notice that the
geometrycolumn has units now (<POINT [°]>) as well asdistance_to_next(m). You can (and probably should)st_set_srid()as soon as you define yourdf, to avoid errors.Hi is this answer still valid?
I'm seeing this error about incompatibility between the attributes of
dfand the empty sfc. but if I useNAin thedefaultarg ofst_distanceeverything works fine.Error: Problem with `mutate()` input `distance_to_next`. x Can't combine `default` <sfc_GEOMETRY> and `x` <sfc_GEOMETRY>. x Some attributes are incompatible. ℹ The author of the class should implement vctrs methods. ℹ See <https://vctrs.r-lib.org/reference/faq-error-incompatible-attributes.html>. ℹ Input `distance_to_next` is `sf::st_distance(geometry, lead(geometry, default = empty), by_element = T)`.
Thanks for reporting this. It's probably because the empty geometry doesn't have a compatible crs (NA_crs_). Try with:
empty <- st_as_sfc("POINT(EMPTY)", crs = 4326)
We should probably improve the error message, because Can't combine `default` <sfc_GEOMETRY> and `x` <sfc_GEOMETRY> doesn't make sense.
Most helpful comment
This is tricky and the error message isn't very helpful. The issue is caused by the fact that
sfdoesn't useNAfor missing coordinates. It rather uses anEMPTYcoordinate.lead()by default usesNAto pad the vector, which causes the obscure issue. I think this could be improved and polished.Meanwhile, here's a solution:
However, the distances computed are unitless (angular distances) because the
crsis undefined. Angular distances are pretty much useless. You need meters:Notice that the
geometrycolumn has units now (<POINT [°]>) as well asdistance_to_next(m). You can (and probably should)st_set_srid()as soon as you define yourdf, to avoid errors.