I am having trouble reading in a GML file with st_read. I get the following message:
Reading layer `Road' from data source `D:\Data\OS\MasterMapRoads\MasterMap Highways Network_rami_2408772\Highways_Rrami_Road_FULL_001.gml.gz' using driver `GML'
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0, 2
In addition: Warning message:
no simple feature geometries present: returning a data.frame or tbl_df
These files come from Ordnance Survey https://www.ordnancesurvey.co.uk/business-and-government/products/os-mastermap-highways-network.html
The file is valid and can be opened in QGIS. Unfortunately, the files are copyrighted so I can't post an example. I can provide the data structure:
<GMLFeatureClassList>
<GMLFeatureClass>
<Name>Road</Name>
<ElementPath>Road</ElementPath>
<GeometryType>100</GeometryType>
<DatasetSpecificInfo>
<FeatureCount>88000</FeatureCount>
</DatasetSpecificInfo>
<PropertyDefn>
<Name>identifier</Name>
<ElementPath>identifier</ElementPath>
<Type>String</Type>
<Width>37</Width>
</PropertyDefn>
<PropertyDefn>
<Name>beginLifespanVersion</Name>
<ElementPath>beginLifespanVersion</ElementPath>
<Type>String</Type>
<Width>23</Width>
</PropertyDefn>
<PropertyDefn>
<Name>localId</Name>
<ElementPath>inspireId|Identifier|localId</ElementPath>
<Type>Integer</Type>
<Subtype>Integer64</Subtype>
</PropertyDefn>
<PropertyDefn>
<Name>namespace</Name>
<ElementPath>inspireId|Identifier|namespace</ElementPath>
<Type>String</Type>
<Width>18</Width>
</PropertyDefn>
<PropertyDefn>
<Name>validFrom</Name>
<ElementPath>validFrom</ElementPath>
<Type>String</Type>
<Width>18</Width>
</PropertyDefn>
<PropertyDefn>
<Name>name</Name>
<ElementPath>designatedName|DesignatedNameType|name</ElementPath>
<Type>StringList</Type>
</PropertyDefn>
<PropertyDefn>
<Name>designatedName|DesignatedNameType|namingAuthority|ResponsibleAuthority|identifier</Name>
<ElementPath>designatedName|DesignatedNameType|namingAuthority|ResponsibleAuthority|identifier</ElementPath>
<Type>IntegerList</Type>
</PropertyDefn>
<PropertyDefn>
<Name>authorityName</Name>
<ElementPath>designatedName|DesignatedNameType|namingAuthority|ResponsibleAuthority|authorityName</ElementPath>
<Type>StringList</Type>
</PropertyDefn>
<PropertyDefn>
<Name>reasonForChange</Name>
<ElementPath>reasonForChange</ElementPath>
<Type>String</Type>
<Width>19</Width>
</PropertyDefn>
<PropertyDefn>
<Name>nationalRoadCode</Name>
<ElementPath>nationalRoadCode</ElementPath>
<Type>String</Type>
<Width>6</Width>
</PropertyDefn>
<PropertyDefn>
<Name>roadClassification</Name>
<ElementPath>roadClassification</ElementPath>
<Type>String</Type>
<Width>8</Width>
</PropertyDefn>
</GMLFeatureClass>
</GMLFeatureClassList>
I wonder if the problem is that this file contains Integer and String Lists as some of the field types.
When using rgdal::readOGR I get the following message:
Error in rgdal::readOGR(path) : no features found
In addition: Warning message:
In ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
ogrInfo: all features NULL
Please provide a file offline.
I have sent a link by email, to the address on your GitHub profile
Thanks. Your intuition was right: this was caused by a list column that mostly contains 1, but sometimes 0 or 2 elements. data.frames don't like that when you use the data.frame() or as.data.frame() constructors. OTOH, tibbles don't have problems with this. Until now, read_sf() read the layer into a data.frame via st_read() and then converted the result to a tibble. With this patch, you can use read_sf() to read your GML: it no longer tries to mold the data into a data.frame first but creates a tibble directly.
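For anyone wanting to see the failure without the copyrighted file, here is a minimal base R sketch. The values are made up and the variable names are only illustrative; it shows that data.frame() spreads a plain list over several columns and fails on the differing lengths, while a list protected with I() is kept as a single list column.

```r
# Made-up list field with 1, 0 and 2 entries per feature, like the GML list fields
vals <- list("A1", character(0), c("B1", "B2"))

# data.frame() turns each list element into its own column and fails
err <- tryCatch(
  data.frame(id = 1:3, name = vals),
  error = function(e) conditionMessage(e)
)
print(err)

# Wrapping the list in I() keeps it as one list column with one entry per row
df <- data.frame(id = 1:3, name = I(vals))
print(lengths(df$name))
```

Tibbles accept such a list as a column without the I() wrapper, which is why reading straight into a tibble avoids the error.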
Confirmed that works for me too, thanks for the quick fix.
@edzer @mem48 could you please share the file with me so that I can see what rgdal does with it?
Unfortunately, the file seems to report wkbUnknown for all features (coded 0), so rgdal::ogrInfo() reports no valid features:
> Road <- ogrInfo("Highways_Rrami_Road_FULL_001.gml", "Road")
Warning message:
In ogrInfo("Highways_Rrami_Road_FULL_001.gml", "Road") :
ogrInfo: all features NULL
> str(Road)
List of 11
$ nrows : int 88000
$ nitems : int 12
$ iteminfo :List of 5
..$ name : chr [1:12] "gml_id" "identifier" "beginLifespanVersion" "localId" ...
..$ type : int [1:12] 4 4 4 12 4 4 5 1 5 4 ...
..$ length : int [1:12] 0 37 23 0 18 0 0 0 0 19 ...
..$ typeName : chr [1:12] "String" "String" "String" "Integer64" ...
..$ maxListCount: int [1:12] 0 0 0 0 0 0 2 2 2 0 ...
$ driver : chr "GML"
$ extent : NULL
$ nListFields : int 3
$ have_features : logi FALSE
$ null_geometries: chr "Null geometry IDs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 2"| __truncated__
$ dsn : chr "/home/rsb/tmp/bigshape/Highways_Rrami_Road_FULL_001.gml"
$ layer : chr "Road"
$ p4s : chr NA
- attr(*, "class")= chr "ogrinfo"
This comes from:
> res <- .Call("R_OGR_types", "Highways_Rrami_Road_FULL_001.gml", "Road")
> str(res)
List of 6
$ dsn : chr "Highways_Rrami_Road_FULL_001.gml"
$ layer : chr "Road"
$ proj4string: int 858994740
$ geomTypes : int [1:88000] 0 0 0 0 0 0 0 0 0 0 ...
$ with_z : int [1:88000] 0 0 0 0 0 0 0 0 0 0 ...
$ isNULL : int [1:88000] 1 1 1 1 1 1 1 1 1 1 ...
> table(res$geomTypes)
0
88000
If the file has all geometries of type unknown (this seems to be a network structure), then rgdal will not read it by design, as there are no geometries/features to read.
I can't get current sf to read the file anyway - what was your incantation?
I installed the development version with devtools::install_github("r-spatial/sf") and then used read_sf(), which worked.
This may be a bit of an Ordnance Survey specific case but they provide all their data as GML whether it contains geometries or not. They are the UK national mapping agency so there is a lot of demand in the UK for their data.
Thanks, I'd used st_read(). With Road <- read_sf("Highways_Rrami_Road_FULL_001.gml") it does work. rgdal::readOGR() only uses the dropNULLGeometries= argument when some geometries are not NULL - I'll add a note to the documentation and probably modify the error message; maybe even branch on all NULL to return a data.frame - rgdal already propagates list fields.
Thanks for this case - it makes sense to handle it properly.
As of r 768, rgdal::readOGR() (on R-Forge) handles this as:
> Roadxx <- ogrInfo("Highways_Rrami_Road_FULL_001.gml", "Road")
Warning message:
In ogrInfo("Highways_Rrami_Road_FULL_001.gml", "Road") :
ogrInfo: all features NULL
> Roadxxdf <- readOGR("Highways_Rrami_Road_FULL_001.gml", "Road", dropNULLGeometries=FALSE)
OGR data source with driver: GML
Source: "/home/rsb/tmp/bigshape/Highways_Rrami_Road_FULL_001.gml", layer: "Road"
with 88000 features
It has 12 fields, of which 3 list fields
Integer64 fields read as strings: localId
Warning messages:
1: In ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
ogrInfo: all features NULL
2: In readOGR("Highways_Rrami_Road_FULL_001.gml", "Road", dropNULLGeometries = FALSE) :
no features found; proceeding to atttributes only
> names(Roadxxdf)
[1] "gml_id"
[2] "identifier"
[3] "beginLifespanVersion"
[4] "localId"
[5] "namespace"
[6] "validFrom"
[7] "name1"
[8] "name2"
[9] "designatedName.DesignatedNameType.namingAuthority.ResponsibleAuthority.identifier1"
[10] "designatedName.DesignatedNameType.namingAuthority.ResponsibleAuthority.identifier2"
[11] "authorityName1"
[12] "authorityName2"
[13] "reasonForChange"
[14] "nationalRoadCode"
[15] "roadClassification"
I'm having the same problem reading a layer from a NAS file (which is, to the best of my knowledge, a special form of GML and only used in Germany).
Using devtools::install_github("r-spatial/sf") and then read_sf(), as suggested by @mem48, did not work.
The NAS example data can be downloaded here
devtools::install_github("r-spatial/sf")
library(sf)
alkisData <- "Downloads/ALKIS_NAS_Bestandsdatenauszug/Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml"
## print available layers
st_layers(alkisData)
## read layer "AX_Anschrift"
st_read(alkisData, layer = "AX_Anschrift")
returns:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 359, 2
I'm grateful for any help.
Yes, this is the same problem, the layer has no geometries:
library(rgdal)
ogrListLayers("Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml")
ogrInfo("Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml", "AX_Anschrift")
nG <- readOGR("Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml", "AX_Anschrift", dropNULLGeometries=FALSE)
str(nG)
table(nG$advStandardModell1)
table(nG$advStandardModell2)
library(sf)
nG1 <- st_read("Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml", "AX_Anschrift")
# Error
nG2 <- read_sf("Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml", layer="AX_Anschrift")
str(nG2)
str(do.call("c", nG2$advStandardModell))
length(nG2$advStandardModell)
where you'll see the different handling of the advStandardModell string list column. Arguably, rgdal::readOGR() does this less aggressively than sf::read_sf(), and sf::st_read() doesn't give a helpful error message. If more of this kind of data is coming, it should be revisited (@edzer)?
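To make the two representations concrete, here is a small base R sketch with made-up values standing in for the advStandardModell column. The helper split_list_column() is hypothetical and only handles character lists; it imitates the numbered, NA-padded columns seen in the readOGR() output and is not rgdal's actual code.

```r
# Hypothetical stand-in for a string list column as a list-column reader returns it
adv <- list("DLKM", "DLKM", c("DLKM", "DFGM"))

# Flattening with c() gives one entry per value, not one per feature
flat <- do.call("c", adv)
print(length(adv))
print(length(flat))

# Split a character list column into numbered columns, padding with NA
split_list_column <- function(x, name) {
  width <- max(lengths(x))
  cols <- lapply(seq_len(width), function(i) {
    vapply(
      x,
      function(el) if (length(el) >= i) el[[i]] else NA_character_,
      character(1)
    )
  })
  names(cols) <- paste0(name, seq_len(width))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

wide <- split_list_column(adv, "advStandardModell")
print(wide)
```

The list column keeps every feature's values together in one cell, while the split form gives an ordinary rectangular data.frame at the cost of mostly-NA extra columns.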
Thanks for the quick response!
Using readOGR() instead of st_read() or read_sf() solves my problem for now.
Thank you very much for that!
My cumbersome workaround so far was to write all data into PostgreSQL via ogr2ogr and read it from the DB with st_read().
By the way: running @rsbivand's example code, both st_read() and read_sf() return the same error for me:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 359, 2
> sessionInfo()
...
other attached packages:
[1] sf_0.7-5 rgdal_1.4-4 sp_1.3-1
...
I was using sf 0.7-4 (released version), with GDAL 3.0.0 (sf_extSoftVersion()). Unsure where the difference originates. I only got a warning with sf::read_sf(): no simple feature geometries present: returning a data.frame or tbl_df. Which platform are you on?
R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
Platform: x86_64-apple-darwin18.5.0 (64-bit)
> library(rgdal)
Loading required package: sp
rgdal: version: 1.4-4, (SVN revision 833)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.4.1, released 2019/03/15
Path to GDAL shared files: /usr/local/Cellar/gdal/2.4.1_1/share/gdal
GDAL binary built with GEOS: TRUE
Loaded PROJ.4 runtime: Rel. 6.1.0, May 15th, 2019, [PJ_VERSION: 610]
Path to PROJ.4 shared files: (autodetected)
Linking to sp version: 1.3-1
>
> alkisData <- "~/Downloads/ALKIS_NAS_Bestandsdatenauszug/Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml"
>
> ogrListLayers(alkisData)
[1] "AP_PTO"
[2] "AP_LTO"
[3] "AP_Darstellung"
[4] "AX_Flurstueck"
...
[59] "AX_LagebezeichnungKatalogeintrag"
[60] "ALKIS_beziehungen"
attr(,"driver")
[1] "NAS"
attr(,"nlayers")
[1] 60
> ogrInfo(alkisData, "AX_Anschrift")
Source: "/Users/markolipka/Downloads/ALKIS_NAS_Bestandsdatenauszug/Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml", layer: "AX_Anschrift"
Driver: NAS, no features found
Null geometry IDs: 1, 2, 3, ..., 356, 357, 358, 359
Number of fields: 12
Number of list fields: 1
name type length typeName
1 gml_id 4 16 String
2 identifier 4 28 String
...
12 weitereAdressen 4 29 String
Warning message:
In ogrInfo(alkisData, "AX_Anschrift") : ogrInfo: all features NULL
> nG <- readOGR(alkisData, "AX_Anschrift", dropNULLGeometries=FALSE)
OGR data source with driver: NAS
Source: "/Users/markolipka/Downloads/ALKIS_NAS_Bestandsdatenauszug/Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml", layer: "AX_Anschrift"
with 359 features
It has 12 fields, of which 1 list fields
Warning messages:
1: In ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
ogrInfo: all features NULL
2: In readOGR(alkisData, "AX_Anschrift", dropNULLGeometries = FALSE) :
no features found; proceeding to atttributes only
> str(nG)
'data.frame': 359 obs. of 13 variables:
$ gml_id : Factor w/ 359 levels "DESHLFS300000017",..: 4 5 6 7 8 9 10 16 17 19 ...
$ identifier : Factor w/ 359 levels "urn:adv:oid:DESHLFS300000017",..: 4 5 6 7 8 9 10 16 17 19 ...
$ beginnt : Factor w/ 4 levels "2011-02-09T12:11:09Z",..: 4 4 4 4 4 4 4 4 4 4 ...
$ advStandardModell1 : Factor w/ 1 level "DLKM": 1 1 1 1 1 1 1 1 1 1 ...
$ advStandardModell2 : Factor w/ 1 level "DFGM": NA NA NA NA NA NA NA NA NA NA ...
$ ort_Post : Factor w/ 30 levels "Altdorf","Aue",..: 5 1 19 12 19 16 29 5 6 29 ...
$ postleitzahlPostzustellung: int 53097 NA 99210 96600 74201 10081 22780 92136 11611 88629 ...
$ strasse : Factor w/ 30 levels "Adlerweg","Am Hofplatz",..: 9 NA 3 15 NA 8 19 18 NA NA ...
$ hausnummer : int 1 NA 52 101 NA 140 52 101 NA NA ...
$ bestimmungsland : Factor w/ 1 level "DEU": NA NA NA NA NA NA NA NA NA NA ...
$ fax : int NA NA NA NA NA NA NA NA NA NA ...
$ telefon : int NA NA NA NA NA NA NA NA NA NA ...
$ weitereAdressen : Factor w/ 1 level "[email protected]": NA NA NA NA NA NA NA NA NA NA ...
> table(nG$advStandardModell1)
DLKM
359
> table(nG$advStandardModell2)
DFGM
1
>
> library(sf)
Linking to GEOS 3.7.2, GDAL 2.4.1, PROJ 6.1.0
> nG1 <- st_read(alkisData, "AX_Anschrift")
Reading layer `AX_Anschrift' from data source `/Users/markolipka/Downloads/ALKIS_NAS_Bestandsdatenauszug/Bestandsdatenauszug_NAS_ETRS89_UTM_0348.xml' using driver `NAS'
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 359, 2
In addition: Warning message:
no simple feature geometries present: returning a data.frame or tbl_df
> nG2 <- read_sf(alkisData, layer="AX_Anschrift")
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 359, 2
In addition: Warning message:
no simple feature geometries present: returning a data.frame or tbl_df
> str(nG2)
Error in str(nG2) : object 'nG2' not found
Maybe that helps ...
So two differences, GDAL and sf itself. It is a regression in sf (or possibly upstream as the tbl now doesn't handle the string list without error) between 0.7-4 and 0.7-5 @edzer - comments? It isn't the GDAL 2.4.1/3.0.0 version difference. Maybe install sf from CRAN to get 0.7-4 for your problem, or stay with rgdal to get a vanilla correctly columned data.frame?
I switched back to sf 0.7-4 again and now read_sf() works. Thanks for the hint!
Yes, thanks; this is due to one of the columns being a list-column, which doesn't go down well with as.data.frame, but does work with as_tibble. I now added a warning message for this case, suggesting using as_tibble=TRUE.
Another option may be ogr2ogr with -splitlistfields or, equivalently, sf::gdal_utils(util = 'vectortranslate', source = ..., destination = ..., options = '-splitlistfields') before reading, to create a new file without list fields (untried). This is what rgdal::readOGR() does internally.
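A rough, untried sketch of that translate-then-read route follows. The function name read_without_list_fields(), its arguments and the temporary GeoPackage destination are all assumptions made for illustration; it needs the sf package and a GDAL build that includes the driver for the source file.

```r
# Untried sketch: requires sf and a GDAL build with the driver for `src`.
# `src`, `layer` and the temporary GeoPackage destination are illustrative.
read_without_list_fields <- function(src, layer) {
  dest <- tempfile(fileext = ".gpkg")
  # Roughly the same as: ogr2ogr -splitlistfields <dest> <src>
  sf::gdal_utils(
    util = "vectortranslate",
    source = src,
    destination = dest,
    options = "-splitlistfields"
  )
  # The translated file has numbered columns in place of list fields
  sf::read_sf(dest, layer = layer)
}
```

It would be called as read_without_list_fields(alkisData, "AX_Anschrift"); whether every layer of a given file translates cleanly to GeoPackage has not been checked here.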