Sf: Provide an option to vectorize st_bbox()

Created on 21 Apr 2018 · 6 Comments · Source: r-spatial/sf

It seems st_bbox(x) returns the bounding box of the entire x object (which makes sense). In my case, I would like to get the bounding box of each feature in x. For example, I can do

library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
bboxes <- list()
for(i in seq_len(nrow(nc))) bboxes <- c(bboxes, list(st_bbox(nc[i, ])))

But this is obviously slow for large feature sets. Is there a better way and, if not, would it be difficult to vectorize st_bbox() at the C++ level? Thanks!

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

All 6 comments

Small tip - maybe it would be helpful in your case. You should not grow an object if you know its future size. The code below should be substantially faster for large datasets:

bboxes2 <- vector("list", nrow(nc))
for(i in seq_len(nrow(nc))) bboxes2[[i]] <- st_bbox(nc[i, ])

Update:
An even faster approach, but it drops the CRS:

nc2 = st_geometry(nc)
bboxes3 <- vector("list", length(nc2))
for(i in seq_len(length(nc2))) bboxes3[[i]] <- st_bbox(nc2[[i]])
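
If the CRS matters, one possible workaround is to copy it from the original object back onto each bounding box. This is only a sketch and not something proposed in this thread: the helper name bbox_with_crs is hypothetical, and it assumes that st_bbox() called on a named numeric vector accepts a crs argument, as shown in the sf help page for st_bbox.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical helper: per-feature bounding boxes that keep the CRS of `x`.
bbox_with_crs <- function(x) {
  geoms <- st_geometry(x)  # sfc: a list of sfg geometries
  crs <- st_crs(geoms)     # remember the CRS before looping over bare sfg objects
  lapply(geoms, function(g) {
    bb <- st_bbox(g)       # fast, but the result carries no CRS
    # Rebuild the bbox from its four named values and attach the CRS again.
    st_bbox(setNames(as.numeric(bb), names(bb)), crs = crs)
  })
}

bboxes4 <- bbox_with_crs(nc)
```

The extra st_bbox() call per feature adds some overhead, so this is likely to be somewhat slower than the plain loop over geometries.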

Comparison using a medium-sized dataset:

library(sf)                                                                
#> Linking to GEOS 3.6.1, GDAL 2.1.4, proj.4 4.9.3                                          

## mock up some point data                                                 
n = 1e4                                                                    

set.seed(234)                                                              
dat = data.frame(x = runif(n), y = runif(n))                               
pts = st_as_sf(dat, coords = c("x", "y"))                                  


# Grow the list one element at a time (slowest)
get_bbox1 = function(x){
  bboxes <- list()
  for(i in seq_len(nrow(x))) bboxes <- c(bboxes, list(st_bbox(x[i, ])))
  return(bboxes)
}

# Preallocate the list, still subsetting the sf object row by row
get_bbox2 = function(x){
  bboxes2 <- vector("list", nrow(x))
  for(i in seq_len(nrow(x))) bboxes2[[i]] <- st_bbox(x[i, ])
  return(bboxes2)
}

# Preallocate and loop over the bare geometries (drops the CRS)
get_bbox3 = function(x){
  x = st_geometry(x)
  bboxes3 <- vector("list", length(x))
  for(i in seq_along(x)) bboxes3[[i]] <- st_bbox(x[[i]])
  return(bboxes3)
}

system.time(get_bbox1(pts))                                                
#>    user  system elapsed 
#>  26.992   0.019  27.280
system.time(get_bbox2(pts))                                                
#>    user  system elapsed 
#>  23.446   0.005  23.675
system.time(get_bbox3(pts))                                                
#>    user  system elapsed 
#>   0.699   0.000   0.705

...and a larger one:

library(sf)                                                                
#> Linking to GEOS 3.6.1, GDAL 2.1.4, proj.4 4.9.3                                           

## mock up some point data                                                 
n = 1e5                                                                    

set.seed(234)                                                              
dat = data.frame(x = runif(n), y = runif(n))                               
pts = st_as_sf(dat, coords = c("x", "y"))                                  


# Grow the list one element at a time (slowest)
get_bbox1 = function(x){
  bboxes <- list()
  for(i in seq_len(nrow(x))) bboxes <- c(bboxes, list(st_bbox(x[i, ])))
  return(bboxes)
}

# Preallocate the list, still subsetting the sf object row by row
get_bbox2 = function(x){
  bboxes2 <- vector("list", nrow(x))
  for(i in seq_len(nrow(x))) bboxes2[[i]] <- st_bbox(x[i, ])
  return(bboxes2)
}

# Preallocate and loop over the bare geometries (drops the CRS)
get_bbox3 = function(x){
  x = st_geometry(x)
  bboxes3 <- vector("list", length(x))
  for(i in seq_along(x)) bboxes3[[i]] <- st_bbox(x[[i]])
  return(bboxes3)
}

system.time(get_bbox1(pts))                                                
#>    user  system elapsed 
#> 607.953   2.174 620.526
system.time(get_bbox2(pts))                                                
#>    user  system elapsed 
#> 265.432   3.197 272.193
system.time(get_bbox3(pts))                                                
#>    user  system elapsed 
#>   6.807   0.031   6.901

@Nowosad This is great, thanks! I still think it'd be nice to have a native, vectorized version of st_bbox() in the sf package, but your solution works great until that day comes (if it does).

@Nowosad I appreciate your help solving issues here.

@ben519 feel free to write a PR for this if you think the speed-up is worth it.

The non-for-loop version of this is

lapply(st_geometry(nc), st_bbox)
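
If a single table of coordinates is more convenient than a list of bbox objects, the same idea can be sketched with vapply(). This is an illustration rather than part of the original suggestion: the name bbox_mat is made up for the example, and the resulting matrix holds only the four numbers per feature, with no CRS attached.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# vapply() returns one column per feature; transpose to get one row per feature.
bbox_mat <- t(vapply(
  st_geometry(nc),
  function(g) as.numeric(st_bbox(g)),  # drop the bbox class, keep the 4 values
  FUN.VALUE = c(xmin = 0, ymin = 0, xmax = 0, ymax = 0)  # names become column names
))

head(bbox_mat)
```

A plain numeric matrix like this is easy to bind back onto the attribute table, for example with cbind().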

Thanks @edzer. I had tried versions of lapply(), sapply(), and apply() with no luck but didn't think about casting my sf object to type sfc first. Much appreciated.
