I am trying to read a large ESRI File geodatabase. I know that geopandas can read the particular layer I'm interested in because it successfully loads the other layers in the File Geodatabase with no issues.
However, this layer is a large polygon layer containing all building footprints for a major European country. The read_file method takes an extremely long time to execute.
How can I reduce the amount of time it takes to read this dataset so that I can perform further analysis on it?
If the issue is performance, you can try pyogrio (https://github.com/brendan-ward/pyogrio/). Otherwise, I am not sure there is much we can do right now. pyogrio might become the default option in geopandas in the future, but it is not yet.
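For reference, a minimal sketch of what reading a single layer through pyogrio could look like. The wrapper name `read_layer_with_pyogrio` and the path and layer name in the usage comment are hypothetical; `pyogrio.read_dataframe` and its `layer`, `columns`, and `bbox` keywords come from pyogrio itself. Whether it is actually faster for this dataset would need to be measured.

```python
def read_layer_with_pyogrio(path, layer, columns=None, bbox=None):
    """Read one layer into a GeoDataFrame using pyogrio.

    columns: optional list of attribute columns to load (fewer is cheaper).
    bbox:    optional (minx, miny, maxx, maxy) tuple to limit the read.
    """
    # Imported lazily so the module still loads when pyogrio is missing.
    import pyogrio

    return pyogrio.read_dataframe(path, layer=layer, columns=columns, bbox=bbox)


# Hypothetical usage (path and layer name are placeholders):
# gdf = read_layer_with_pyogrio("buildings.gdb", "footprints", columns=[])
```

Passing a short `columns` list is one way to cut read time when only the geometries are needed.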
I ended up using the optional bbox argument of geopandas.read_file, which improved things quite a bit. But thanks, it's good to know.
@awa5114 using geopandas, you could also try to read only a subset of the data at a time, if that is possible for your use case. See the docs here about the multiple options for this: https://geopandas.readthedocs.io/en/latest/docs/user_guide/io.html#reading-subsets-of-the-data
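As a rough illustration of the subset options in those docs, here is a small sketch. The helper name `read_subset` and the values in the usage comments are made up for the example; `bbox` and `rows` are the `geopandas.read_file` keywords described on that page. Note that `bbox` can be a plain `(minx, miny, maxx, maxy)` tuple, so a shapely geometry is not required.

```python
def read_subset(path, layer, bbox=None, rows=None):
    """Read part of a layer with geopandas.

    bbox: (minx, miny, maxx, maxy) tuple; only features intersecting it are read.
    rows: an int (first n rows) or a slice such as slice(1000, 2000).
    """
    # Imported lazily so the module still loads when geopandas is missing.
    import geopandas

    return geopandas.read_file(path, layer=layer, bbox=bbox, rows=rows)


# Hypothetical usage (path, layer name, and coordinates are placeholders):
# sample = read_subset("buildings.gdb", "footprints", rows=1000)
# window = read_subset("buildings.gdb", "footprints", bbox=(4.0, 52.0, 4.1, 52.1))
```

Reading a small `rows` sample first is a cheap way to check column names and the CRS before committing to a long read.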
Yeah, so you already discovered that option! ;)
(closing this issue then)
@jorisvandenbossche sorry, but reading using the bbox option is not foolproof and causes all sorts of geometry problems. I solved one issue (self-intersecting polygons) by applying a zero buffer, but now I'm getting a new issue:
IllegalArgumentException: Points of LinearRing do not form a closed linestring
Traceback (most recent call last):
  File "clip_datasets.py", line 32, in <module>
    gdf = geopandas.read_file(dataset_path, layer=layer, bbox=box(*buffered_trace.total_bounds))
  File "lib\site-packages\shapely\geometry\geo.py", line 59, in box
    return Polygon(coords)
  File "lib\site-packages\shapely\geometry\polygon.py", line 243, in __init__
    ret = geos_polygon_from_py(shell, holes)
  File "lib\site-packages\shapely\geometry\polygon.py", line 509, in geos_polygon_from_py
    ret = geos_linearring_from_py(shell)
  File "shapely\speedups\_speedups.pyx", line 408, in shapely.speedups._speedups.geos_linearring_from_py
ValueError: GEOSGeom_createLinearRing_r returned a NULL pointer
Why does this occur? How can I fix or circumvent it?
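Going by the traceback, the exception is raised inside `box(...)`, before `read_file` is called, so the coordinates passed to `box` look like the problem rather than the data in the geodatabase. One possible cause, which is an assumption the thread does not confirm, is that `buffered_trace.total_bounds` contains NaN values, as it does for an empty GeoDataFrame or one holding only empty geometries. A ring built from NaN coordinates cannot close, because NaN never compares equal to itself. Below is a sketch of a guard that validates the bounds first; the helper name `bbox_from_bounds` is hypothetical.

```python
import math


def bbox_from_bounds(bounds):
    """Return (minx, miny, maxx, maxy) as floats, or raise if unusable."""
    minx, miny, maxx, maxy = (float(v) for v in bounds)
    # NaN or infinite values cannot describe a valid rectangle.
    if not all(math.isfinite(v) for v in (minx, miny, maxx, maxy)):
        raise ValueError(f"bounds contain NaN or infinity: {bounds!r}")
    # A swapped min/max would describe an inverted rectangle.
    if minx > maxx or miny > maxy:
        raise ValueError(f"bounds are not ordered min/max: {bounds!r}")
    return (minx, miny, maxx, maxy)


# Usage with the names from the traceback (not executed here):
# bbox = bbox_from_bounds(buffered_trace.total_bounds)
# gdf = geopandas.read_file(dataset_path, layer=layer, bbox=bbox)
```

Since `read_file` accepts the tuple directly, this also avoids building a shapely polygon at all. If the guard raises, the next thing to inspect would be `buffered_trace` itself, for example whether it is empty after the earlier buffering step.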