Sf: How can I set INTERLEAVED_READING options with PBF files

Created on 9 Dec 2019  Â·  34Comments  Â·  Source: r-spatial/sf

Hi! I'm working on a project on spatial networks in Greater London and I'm downloading the network data from Geofabrik.

When I read the data I get a warning

london_highways <- sf::st_read(
    dsn = "https://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf", 
    layer = "lines", 
    stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `https://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf' using driver `OSM'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in points layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#> Simple feature collection with 62788 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.6124681 ymin: 51.2822 xmax: 0.3760979 ymax: 51.78736
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

Created on 2019-12-09 by the reprex package (v0.3.0)

so I checked on gdal website on the meaning of that error and, after reading your answers in #1157, I tried to that set parameter. The problem is that none of these options work:

london_highways <- sf::st_read(
    dsn = "https://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf", 
    layer = "lines", 
    options = c("OGR_INTERLEAVED_READING=YES"), 
    stringsAsFactors = FALSE
)
#> options:        OGR_INTERLEAVED_READING=YES 
#> Reading layer `lines' from data source `https://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf' using driver `OSM'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in points layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#> Simple feature collection with 62788 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.6124681 ymin: 51.2822 xmax: 0.3760979 ymax: 51.78736
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
london_highways
#> Simple feature collection with 62788 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.6124681 ymin: 51.2822 xmax: 0.3760979 ymax: 51.78736
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> First 10 features:
#>    osm_id                 name     highway waterway aerialway barrier man_made
#> 1      74        Ballards Lane     primary     <NA>      <NA>    <NA>     <NA>
#> 2      75            High Road     primary     <NA>      <NA>    <NA>     <NA>
#> 3      79        East End Road     primary     <NA>      <NA>    <NA>     <NA>
#> 4     482     Cockfosters Road     primary     <NA>      <NA>    <NA>     <NA>
#> 5     488          Western Way residential     <NA>      <NA>    <NA>     <NA>
#> 6     489          The Linkway residential     <NA>      <NA>    <NA>     <NA>
#> 7     490        Sherrards Way residential     <NA>      <NA>    <NA>     <NA>
#> 8     491     Grasvenor Avenue residential     <NA>      <NA>    <NA>     <NA>
#> 9    1200          Hanger Lane       trunk     <NA>      <NA>    <NA>     <NA>
#> 10   1201 Hanger Lane Gyratory  trunk_link     <NA>      <NA>    <NA>     <NA>
#>                          geometry
#> 1  LINESTRING (-0.193124 51.60...
#> 2  LINESTRING (-0.1767883 51.6...
#> 3  LINESTRING (-0.1979855 51.5...
#> 4  LINESTRING (-0.1607372 51.6...
#> 5  LINESTRING (-0.1886468 51.6...
#> 6  LINESTRING (-0.1883204 51.6...
#> 7  LINESTRING (-0.1896293 51.6...
#> 8  LINESTRING (-0.1892364 51.6...
#> 9  LINESTRING (-0.291608 51.51...
#> 10 LINESTRING (-0.2925582 51.5...

london_highways <- sf::st_read(
    dsn = "https://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf", 
    layer = "lines", 
    options = c("INTERLEAVED_READING=YES"), 
    stringsAsFactors = FALSE
)
#> options:        INTERLEAVED_READING=YES 
#> Reading layer `lines' from data source `https://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf' using driver `OSM'
#> Simple feature collection with 0 features and 9 fields
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
london_highways
#> Simple feature collection with 0 features and 9 fields
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>  [1] osm_id     name       highway    waterway   aerialway  barrier   
#>  [7] man_made   z_order    other_tags geometry  
#> <0 rows> (or 0-length row.names)

Created on 2019-12-09 by the reprex package (v0.3.0)

How can I set that parameter? Do I have to modify the .ini config file?

This problem is related to https://github.com/ITSLeeds/geofabric/issues/12 so apologies for "crossposting".

The PBF file is approximately 55MB of data and the same problem/warning disapper if I consider a smaller PBF.

Session info

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language EN                          
#>  collate  English_United Kingdom.1252 
#>  ctype    English_United Kingdom.1252 
#>  tz       Europe/Berlin               
#>  date     2019-12-09                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
#>  callr         3.3.2   2019-09-22 [1] CRAN (R 3.6.1)
#>  class         7.3-15  2019-01-01 [2] CRAN (R 3.6.1)
#>  classInt      0.4-2   2019-10-17 [1] CRAN (R 3.6.1)
#>  cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
#>  DBI           1.0.0   2018-05-02 [1] CRAN (R 3.6.0)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.1)
#>  digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.1)
#>  e1071         1.7-2   2019-06-05 [1] CRAN (R 3.6.0)
#>  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)
#>  fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.1)
#>  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)
#>  KernSmooth    2.23-15 2015-06-29 [2] CRAN (R 3.6.1)
#>  knitr         1.26    2019-11-12 [1] CRAN (R 3.6.1)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.1)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.0)
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)
#>  Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.1)
#>  remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.1)
#>  rlang         0.4.2   2019-11-23 [1] CRAN (R 3.6.1)
#>  rmarkdown     1.17    2019-11-13 [1] CRAN (R 3.6.1)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
#>  sf            0.8-0   2019-09-17 [1] CRAN (R 3.6.1)
#>  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.3.0   2019-11-05 [1] CRAN (R 3.6.1)
#>  units         0.6-5   2019-10-08 [1] CRAN (R 3.6.1)
#>  usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.1)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.11    2019-11-12 [1] CRAN (R 3.6.1)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.0)
#> 
#> [1] C:/Users/Utente/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.1/library

All 34 comments

I see the same warning, but why do you think it is an error? Do you have an indication that not all the features are being read?

I never checked that (and I'm not sure how) but this morning I tested that hypothesis with the following expertiment (following the same ideas as in #1157). I think it shows some evidence that some features are missing with the OGR_INTERLEAVED_READING warning.

# packages
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# download some data
download.file("http://download.geofabrik.de/europe/isle-of-man-latest.osm.pbf", "isle-of-man.osm.pbf", mode = "wb")
download.file("http://download.geofabrik.de/europe/andorra-latest.osm.pbf", "andorra.osm.pbf", mode = "wb")
download.file("http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf", "greater-london.osm.pbf", mode = "wb")

# isle of man example
isle_of_man <- st_read(
  "isle-of-man.osm.pbf",
  layer = "lines", 
  stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\isle-of-man.osm.pbf' using driver `OSM'
#> Simple feature collection with 12943 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -20.05167 ymin: 48.18019 xmax: -2.913484 ymax: 56.7
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
nrow(isle_of_man %>% filter(highway == "primary"))
#> [1] 462

isle_of_man_query <- st_read(
  "isle-of-man.osm.pbf",
  layer = "lines", 
  query = "SELECT * FROM lines WHERE (highway = 'primary')", 
  stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\isle-of-man.osm.pbf' using driver `OSM'
#> Simple feature collection with 462 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -4.792454 ymin: 54.06365 xmax: -4.31848 ymax: 54.41379
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
nrow(isle_of_man_query)
#> [1] 462

# andorra example
andorra <- st_read(
  "andorra.osm.pbf",
  layer = "lines", 
  stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\andorra.osm.pbf' using driver `OSM'
#> Simple feature collection with 5439 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 0.975577 ymin: 42.32422 xmax: 1.824654 ymax: 42.78834
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
nrow(andorra %>% filter(highway == "primary"))
#> [1] 460

andorra_query <- st_read(
  "andorra.osm.pbf",
  layer = "lines", 
  query = "SELECT * FROM lines WHERE (highway = 'primary')", 
  stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\andorra.osm.pbf' using driver `OSM'
#> Simple feature collection with 460 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 1.419741 ymin: 42.43526 xmax: 1.737154 ymax: 42.63297
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
nrow(andorra_query)
#> [1] 460

# greater london example
greater_london <- st_read(
  "greater-london.osm.pbf",
  layer = "lines", 
  stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\greater-london.osm.pbf' using driver `OSM'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in points layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#> Simple feature collection with 62785 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.6124681 ymin: 51.2822 xmax: 0.3760979 ymax: 51.78736
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
nrow(greater_london %>% filter(highway == "primary"))
#> [1] 2094

greater_london_query <- st_read(
  "greater-london.osm.pbf",
  layer = "lines", 
  query = "SELECT * FROM lines WHERE (highway = 'primary')", 
  stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\greater-london.osm.pbf' using driver `OSM'
#> Simple feature collection with 12600 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.5022575 ymin: 51.28324 xmax: 0.2661656 ymax: 51.68595
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
nrow(greater_london_query)
#> [1] 12600

Created on 2019-12-10 by the reprex package (v0.3.0)

I applied the same idea in all the examples: in the first part of each example I read the data using st_read and then I applied a filter (i.e. highway == "primary") while in the second part of each example I applied the same query before reading in the data (i.e. using the query parameter of st_read). In the first two examples I get the same number of rows using both methods while in the third example I get a OGR_INTERLEAVED_READING warning and less features in the first method wrt the second method.

Just to confirm: I can reproduce this issue with GDAL 3.0.2 and have found a work-around, convert to another file format with ogr2ogr first:

# Aim: test gdal's interleaved settings

gdal-config --version

wget http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf

ogr2ogr \
  -f GPKG lnd.gpkg \
  greater-london-latest.osm.pbf \
  "lines"

Then back in R:

greater_london_gpkg = read_sf("lnd.gpkg")
nrow(greater_london_gpkg %>% filter(highway == "primary")) # 12600

Yes, or all in R:

> gdal_utils("vectortranslate", "greater-london-latest.osm.pbf", "greater-london-latest.gpkg")
> st_layers("greater-london-latest.gpkg")
Driver: GPKG 
Available layers:
        layer_name       geometry_type features fields
1           points               Point   326046     10
2            lines         Line String   352014      9
3 multilinestrings   Multi Line String     4332      4
4    multipolygons       Multi Polygon   544434     25
5  other_relations Geometry Collection    15859      4
> st_layers("greater-london-latest.osm.pbf", do_count = TRUE)
Driver: OSM 
Available layers:
        layer_name       geometry_type features fields
1           points               Point   326046     10
2            lines         Line String    62788      9
3 multilinestrings   Multi Line String        0      4
4    multipolygons       Multi Polygon        0     25
5  other_relations Geometry Collection        0      4
Warning messages:
1: In CPL_get_layers(dsn, options, do_count) :
  GDAL Error 1: Too many features have accumulated in lines layer. Use OGR_INTERLEAVED_READING=YES mode
2: In CPL_get_layers(dsn, options, do_count) :
  GDAL Error 1: Too many features have accumulated in points layer. Use OGR_INTERLEAVED_READING=YES mode

Hi @edzer not quite related to this issue, but is there a reason that srsinfo is not supported in gdal_utils?
Use-case here

Yes, it is not part of the GDAL C API for GDAL utilities.

Ah I see, thanks for clarifying!

I just tested that code and it doesn't work on my laptop since I get the following error:

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3

> gdal_utils("vectortranslate", "greater-london-latest.osm.pbf", "greater-london-latest.gpkg")
#> Error in gdal_utils("vectortranslate", "greater-london-latest.osm.pbf",  : 
#> gdal_utils vectortranslate: an error occured
#> In addition: Warning messages:
#> 1: In CPL_gdalvectortranslate(source, destination, options) :
#> GDAL Message 6: Normalized/laundered field name: 'admin_level' to 'admin_le_3'
#> 2: In CPL_gdalvectortranslate(source, destination, options) :
#> GDAL Error 6: Geometry type of `Geometry Collection' not supported in shapefiles.  Type can be #> overridden with a layer creation option of SHPT=POINT/ARC/POLYGON/MULTIPOINT/POINTZ/ARCZ/POLYGONZ/MULTIPOINTZ/MULTIPATCH.

Session info

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language EN                          
#>  collate  English_United Kingdom.1252 
#>  ctype    English_United Kingdom.1252 
#>  tz       Europe/Berlin               
#>  date     2019-12-10                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source                       
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)               
#>  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)               
#>  callr         3.4.0   2019-12-09 [1] CRAN (R 3.6.1)               
#>  class         7.3-15  2019-01-01 [2] CRAN (R 3.6.1)               
#>  classInt      0.4-2   2019-10-17 [1] CRAN (R 3.6.1)               
#>  cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.0)               
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)               
#>  DBI           1.0.0   2018-05-02 [1] CRAN (R 3.6.0)               
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)               
#>  devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.1)               
#>  digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.1)               
#>  e1071         1.7-3   2019-11-26 [1] CRAN (R 3.6.1)               
#>  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.1)               
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)               
#>  fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.1)               
#>  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)               
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.0)               
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)               
#>  KernSmooth    2.23-15 2015-06-29 [2] CRAN (R 3.6.1)               
#>  knitr         1.26    2019-11-12 [1] CRAN (R 3.6.1)               
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)               
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)               
#>  pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.1)               
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)               
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)               
#>  processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.0)               
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)               
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)               
#>  Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.1)               
#>  remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.1)               
#>  rlang         0.4.2   2019-11-23 [1] CRAN (R 3.6.1)               
#>  rmarkdown     1.18    2019-11-27 [1] CRAN (R 3.6.1)               
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)               
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)               
#>  sf          * 0.8-1   2019-12-10 [1] Github (r-spatial/sf@60208b7)
#>  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.0)               
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)               
#>  testthat      2.3.1   2019-12-01 [1] CRAN (R 3.6.1)               
#>  units         0.6-5   2019-10-08 [1] CRAN (R 3.6.1)               
#>  usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.1)               
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)               
#>  xfun          0.11    2019-11-12 [1] CRAN (R 3.6.1)               
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.0)               
#> 
#> [1] C:/Users/Utente/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.1/library

so I think that it's a problem related to my GDAL version. I followed the instructions here for updating the GDAL distribution on windows using Conda and I completed all the steps but I still get the same "old" GDAL version linked to sf package. Sorry for the stupid question, but how can I update that?

Switch to Linux ; )

Support for auto-recognizing target driver from file name extension (i.e. .gpkg means geopackage) was apparently added to GDAL after 2.2.3; you'd have to specify the output format, which you can do. Anyway, this is getting of topic, so closing now.

Thanks, and just for future reference, I solved that problem as follows:

gdal_utils(
  util = "vectortranslate", 
  source = "greater-london-latest.osm.pbf", 
  destination = "greater-london-latest.gpkg", 
  options = c("-f", "GPKG")
)

If you want I could create a PR and add this example to gdal_utils help page. I've never used GDAL (so maybe it's an obvious example) but it took me a lot of time to understand that options = "-f GPKG" was not the correct syntax.

PR welcome!

I'm not sure the original issue is solved. @edzer do you think the best solution in this situation, building on the reprex above, is to first convert the object and then read it in, e.g.:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
download.file("http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf", "greater-london-latest.osm.pbf", mode = "wb")

gdal_utils(
  util = "vectortranslate", 
  source = "greater-london-latest.osm.pbf", 
  destination = "greater-london-latest.gpkg", 
  options = c("-f", "GPKG")
)

greater_london_gpkg = read_sf("greater-london-latest.gpkg", layer = "lines")
nrow(greater_london_gpkg)
#> [1] 352158

Created on 2019-12-10 by the reprex package (v0.3.0)

I imagine there could be a faster way, e.g. selecting the layer in the translate stage, but not sure.

Out of interest I did a benchmark. An interesting question is 'at what point is interleaving/translate needed? For context, @agila5 and I are working on a package that aims to make access to large OSM datasets quicker and easier for R users (comments/suggestions on this welcome): https://github.com/ITSLeeds/geofabric

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
setwd("~/atfutures/who-data/")

get_highways1 = function(x, highway_type) {
  x_gpkg = paste0(x, ".gpkg")
  gdal_utils(
    util = "vectortranslate", 
    source = x, 
    destination = x_gpkg, 
    options = c("-f", "GPKG")
  )
  full_res = read_sf(x_gpkg, layer = "lines")
  full_res[full_res$highway == highway_type, ]
}

get_highways2 = function(x, highway_type) {
  q = paste0("SELECT * FROM lines WHERE (highway = '", highway_type, "')")
  read_sf(x, layer = "lines", query = q)
}

h1 = get_highways1("greater-london-latest.osm.pbf", "primary")
h2 = get_highways2("greater-london-latest.osm.pbf", "primary")

plot(st_geometry(h1))
plot(st_geometry(h2))

bench::mark(iterations = 1, check = F,
            get_highways1("greater-london-latest.osm.pbf", "primary"),
            get_highways2("greater-london-latest.osm.pbf", "primary")
            )
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression                                                  min median
#>   <bch:expr>                                                <bch> <bch:>
#> 1 get_highways1("greater-london-latest.osm.pbf", "primary") 30.1s  30.1s
#> 2 get_highways2("greater-london-latest.osm.pbf", "primary")  2.7s   2.7s
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>, `gc/sec` <dbl>

Created on 2019-12-11 by the reprex package (v0.3.0)

I'm not sure the original issue is solved.

IMO the issue is not solved from the geofabric package pov and for pbf much bigger than my original example. For example, the PBF file of the region where I live is here and it's almost 400MB. The corresponding GPKG file is 2.2GB and the conversion time is more than 15 minutes.

> system.time({
+   gdal_utils(
+     util = "vectortranslate", 
+     source = "nord-ovest-latest.osm.pbf", 
+     destination = "nord-ovest-latest.gpkg", 
+     options = c("-f", "GPKG")
+   )
+ })
   user  system elapsed 
 964.39  425.41 1464.15 
> file.size("nord-ovest-latest.gpkg") / 1e9
[1] 2.404442

Still I'm not sure if it's possible to read the PBF file without the conversion e.g. by properly setting the INTERLEAVED_READING option (and check the timings) since I don't know how to do that.

An interesting question is 'at what point is interleaving/translate needed?

I read the GDAL documentation of the PBF format but they did not mention anything about the "maximum size" (and I'm not sure it does make sense to talk about a max size). They explain the problem in the interleaved reading section.

Interesting, does that section mean that OGR_INTERLEAVED_READING=YES is outdated on recent versions of GDAL?

Starting with GDAL 2.2, applications should use the GDALDataset::GetNextFeature() API to iterate over features in the order they are produced.

@edzer this issue does not seem resolved and am not sure why it was closed, other than it briefly going off topic.

It seems OGR_... is, but INTERLEAVED_READING=YES is not outdated:

edzer@gin-edzer:/tmp$ ogrinfo -oo INTERLEAVED_READING=YES *pbf points > x2
edzer@gin-edzer:/tmp$ ogrinfo -oo OGR_INTERLEAVED_READING=YES *pbf points > x1
Warning 6: driver OSM does not support open option OGR_INTERLEAVED_READING
ERROR 1: Too many features have accumulated in lines layer. Use OGR_INTERLEAVED_READING=YES mode

but that the effect is the same (x1 and x2 are identical). My idea behind closing this was that I felt it was not an sf issue, but if anything, a GDAL issue. What do you think?

I think it may be a GDAL issue and was considering raising an issue. If you're up for mentoring us through the process and help us develop a decent reprex (e.g. based on the rich examples above) I would happily submit one. Note, thought, that I've commented on a closed related issue here: https://github.com/OSGeo/gdal/issues/1785

From what I can gather a good starting point for a GDAL reprex could be something like:

wget http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf
ogrinfo -oo OGR_INTERLEAVED_READING=YES *pbf points > x1

Although I must confess I'm not 100% sure what the second line does and suspect we may get a better response if you raise the issue on our (and sf's) behalf.

I think the warning & error message sufficiently illustrate the problem, and that the first command

ogrinfo -oo INTERLEAVED_READING=YES *pbf points > x2

suggests the docs are correct, but that the error message gives the wrong hint.

OK, shall I open a ticket on the GitHub Issue tracker?

And will the solution to that issue resolve this issue sf side (or do we just need to add some incantation like options = c("INTERLEAVED_READING", "YES") (I've not tested this)?

And will the solution to that issue resolve this issue sf side (or do we just need to add some incantation like options = c("INTERLEAVED_READING", "YES") (I've not tested this)?

What makes you think so?

Tested it and no it doesn't work:

greater_london <- st_read(
+   "greater-london-latest.osm.pbf",
+   layer = "lines", 
+   stringsAsFactors = FALSE,
+   options = c("INTERLEAVED_READING", "YES")
+ )
options:        INTERLEAVED_READING YES 
Reading layer `lines' from data source `/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/atfutures/creds2/od-data/greater-london-latest.osm.pbf' using driver `OSM'
Simple feature collection with 0 features and 9 fields
bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
Warning messages:
1: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Message 6: open option 'INTERLEAVED_READING' is not formatted with the key=value format
2: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Message 6: open option 'YES' is not formatted with the key=value format

This one did though, issue solved I think :tada:

greater_london <- st_read(
+   "greater-london-latest.osm.pbf",
+   layer = "lines", 
+   stringsAsFactors = FALSE,
+   options = "INTERLEAVED_READING=YES")
options:        INTERLEAVED_READING=YES 
Reading layer `lines' from data source `/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/atfutures/creds2/od-data/greater-london-latest.osm.pbf' using driver `OSM'
Simple feature collection with 0 features and 9 fields
bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs

Can you reproduce this @agila5 ?

I think it's the same example as in my first comment and the problem is that with that option it reads 0 features. It doesn't make sense, right?

I agree, but I get the same when I try

edzer@gin-edzer:/tmp$ ogrinfo -oo INTERLEAVED_READING=YES *pbf lines > l2
edzer@gin-edzer:/tmp$ ogrinfo *pbf lines > l1
ERROR 1: Too many features have accumulated in points layer. Use OGR_INTERLEAVED_READING=YES mode
edzer@gin-edzer:/tmp$ wc l1 l2
  470818  1744967 19993830 l1
      30       74      831 l2
  470848  1745041 19994661 total

suggesting it is a GDAL issue, not an sf issue.

Correct, it returns 0 features, sorry for going round in circles. Looks like a GDAL issue that could solve this issue when closed (it may be worth keeping this issue open as an issue that requires changes in GDAL to be closed). Final question: how to I demonstrate that

ogrinfo -oo OGR_INTERLEAVED_READING=YES *pbf points > x1

yields 0 features?

Open x1 in a text editor?

Please proof-read this issue:

Dear GDAL dev team, I think there is an issue with the osm driver. The error message it produces does not seem to be useful (it says Use OGR_INTERLEAVED_READING=YES when that option is specified) and when using INTERLEAVED_READING=YES, which I think is correctly documented here https://gdal.org/drivers/vector/osm.html#interleaved-reading , the results seem to have no features.

Please could you try to reproduce the (hopefully reproducible) example below?

# Aim: demonstrate issues with the OSM driver

gdalinfo --version
# $ GDAL 2.4.2, released 2019/06/28

# get data
wget http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf

ogrinfo -oo OGR_INTERLEAVED_READING=YES *pbf lines > x2
# $ Warning 6: driver OSM does not support open option OGR_INTERLEAVED_READING
# $ ERROR 1: Too many features have accumulated in points layer. Use OGR_INTERLEAVED_READING=YES mode
ogrinfo -oo INTERLEAVED_READING=YES *pbf lines > x3
wc x3
# $  27  65 740 x3 # contains no features

This may be related to https://github.com/OSGeo/gdal/issues/1785.

See here for use case and some additional tests from within R: https://github.com/r-spatial/sf/issues/1213

(If you post that comment as a github issue I would also cite the other related github issue, i.e. https://github.com/OSGeo/gdal/issues/1785, maybe helps with the context.)

Thanks @agila5, I've added the link to that closed issue. @edzer good to go from your perspective?

I think we have a solution. In https://github.com/OSGeo/gdal/issues/2100 Even Rouault suggests converting only the layer we need to a GeoPackage before reading. I was sceptical about this 2 stage process but it seems it is indeed the quickest way, as benchmarked below (with conversion only needing to run once, hence 2 get3() calls):

get1 = function(x = "greater-london-latest.osm.pbf", highway_type = "residential") {
  x_gpkg = paste0(x, ".gpkg")
  gdal_utils(
    util = "vectortranslate",
    source = x,
    destination = x_gpkg,
    options = c("-f", "GPKG")
  )
  full_res = read_sf(x_gpkg, layer = "lines")
  full_res[full_res$highway == highway_type, ]
}

get2 = function(x = "greater-london-latest.osm.pbf", highway_type = "residential") {
  q = paste0("SELECT * FROM lines WHERE (highway = '", highway_type, "')")
  read_sf(x, layer = "lines", query = q)
}

get3 = function(x = "greater-london-latest.osm.pbf", highway_type = "residential") {
  q = paste0("SELECT * FROM lines WHERE (highway = '", highway_type, "')")
  dest_filename = paste0(x, ".gpkg")
  if(!file.exists(dest_filename)) {
    sf::gdal_utils(
      util = "vectortranslate",
      source = "greater-london-latest.osm.pbf",
      destination = dest_filename,
      options = c("-f", "GPKG", "lines")
    )
  }
  st_read(dest_filename, query = q)
}

h1 = get1()
h2 = get2()
h3 = get3()
nrow(h1)
nrow(h2)
nrow(h3)

file.remove("greater-london-latest.osm.pbf.gpkg")

bench::mark(iterations = 1, check = FALSE, get1(), get2(), get3(), get3())

Could you try to reproduce this @agila5 to check it works on Windows? If so I suggest we go with this solution in geofabric.

I get slighlty different results between h1 and h2 and h3 because full_res[full_res$highway == highway_type, ] includes only the rows in which highway is NA. In fact,

> table(h1$highway, useNA = "ifany")
residential        <NA> 
      70406       74234 

Another thing that I noticed is that the geometry column in h2 isn't called geom or geometry

> colnames(h2)
 [1] "osm_id"         "name"           "highway"        "waterway"       "aerialway"     
 [6] "barrier"        "man_made"       "z_order"        "other_tags"     "_ogr_geometry_"

If I correct those "problems", then I get:

> bench::mark(iterations = 10, get1(), get2(), get3())
# A tibble: 3 x 13
  expression    min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory time 
  <bch:expr> <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list> <lis>
1 get1()      1.61m  1.67m    0.0100   193.4MB   0.0631    10    63      16.6m <tibb~ <df[,~ <bch~
2 get2()     20.96s 21.38s    0.0463    26.5MB   0.0370    10     8       3.6m <tibb~ <df[,~ <bch~
3 get3()      1.45s   1.5s    0.658     26.5MB   0.592     10     9      15.2s <tibb~ <df[,~ <bch~
# ... with 1 more variable: gc <list>
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled. 

Anyway I will add more tests and I think that now we can move the discussion to geofabrik repo.

Great, all that is pointing towards the 'h3' solution for me, main thing, it works (touch wood)! Great thread, good idea to move it back to geofabric repo.

Was this page helpful?
0 / 5 - 0 ratings