Geopandas: BUG: GeometryArray and compatibility with numerical pandas operations

Created on 30 Dec 2020 · 3 Comments · Source: geopandas/geopandas

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of geopandas.
[x] (optional) I have confirmed this bug exists on the master branch of geopandas.

Code Sample, a copy-pastable example

import geopandas

world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
world = world - world.mean()

Problem description

One of the main use cases of pandas is preparing data for machine learning. An important step in ML is to standardize the dataset by subtracting the mean and dividing by the standard deviation. However, even a simple arithmetic operation like subtracting the mean fails for GeoDataFrames and GeoSeries:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/geopandas/geodataframe.py", line 1140, in __sub__
    return self.geometry.difference(other)
  File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/geopandas/base.py", line 523, in difference
    return _binary_geo("difference", self, other)
  File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/geopandas/base.py", line 60, in _binary_geo
    geoms, index = _delegate_binary_method(op, this, other)
  File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/geopandas/base.py", line 49, in _delegate_binary_method
    raise TypeError(type(this), type(other))
TypeError: (<class 'geopandas.geoseries.GeoSeries'>, <class 'pandas.core.series.Series'>)

Expected Output

I would expect all of the same features of pandas to work with geopandas. In my mind, geopandas is a superset of pandas, not a subset.

Output of `geopandas.show_versions()`

SYSTEM INFO

python : 3.8.6 (default, Dec 25 2020, 19:26:42) [Clang 12.0.0 (clang-1200.0.32.28)]
executable : /Users/Adam/.spack/.spack-env/view/bin/python
machine : macOS-10.15.7-x86_64-i386-64bit

GEOS, GDAL, PROJ INFO

GEOS : 3.8.1
GEOS lib : /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/apple-clang-12.0.0/geos-3.8.1-vlrmv4vvnmfcvabpu6t4boks5fxtllko/lib/libgeos_c.dylib
GDAL : 3.2.0
GDAL data dir: /Users/Adam/.spack/.spack-env/view/share/gdal
PROJ : 7.1.0
PROJ data dir: /Users/Adam/.spack/.spack-env/view/share/proj

PYTHON DEPENDENCIES

geopandas : 0.8.1
pandas : 1.2.0
fiona : 1.8.18
numpy : 1.19.4
shapely : 1.7.1
rtree : None
pyproj : 2.6.0
matplotlib : 3.3.3
mapclassify: None
geopy : None
psycopg2 : None
geoalchemy2: None
pyarrow : None

bug

adamjstewart

All 3 comments

Thanks for raising this. It is true that in a situation like this, a GeoSeries should behave like any other non-numerical column and return NaN. The one thing blocking this at the moment is a custom implementation of __sub__ that assumes geometric difference, but it has already been deprecated and will be removed.

https://github.com/geopandas/geopandas/blob/d1b71c4bfd81f3919aedd70948beb02ae4182bbf/geopandas/geodataframe.py#L1527-L1534

The other thing we'll have to take care of is the __sub__ behaviour of GeometryArray, which should return an array of NaNs in this case.

The vast majority of pandas operations work, so you have found one of the few that cause an issue. Note that in your case, you would probably want to drop the geometry column anyway.

martinfleis on 30 Dec 2020
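
The suggestion above, dropping the geometry column before doing arithmetic, could look roughly like the following. This is a minimal sketch, not code from the thread: it uses a plain pandas DataFrame as a stand-in for a GeoDataFrame, the "geometry" column holds WKT strings instead of real geometries, and the column names and values are made up for illustration.

```python
import pandas as pd

# Stand-in for a GeoDataFrame (assumption): "geometry" holds WKT strings
# here so that the example only needs pandas.
df = pd.DataFrame(
    {
        "pop_est": [100.0, 200.0, 300.0],
        "gdp_md_est": [1.0, 2.0, 3.0],
        "geometry": ["POINT (0 0)", "POINT (1 1)", "POINT (2 2)"],
    }
)

# Drop the non-numerical column first, then standardize what is left.
numeric = df.drop(columns="geometry")
standardized = (numeric - numeric.mean()) / numeric.std()

print(standardized)
```

`drop(columns=...)` is a pandas method, so the same call should be available on a GeoDataFrame; the result here contains only the numerical columns.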

Thanks! The other thing I've noticed is that for many use cases, I would like the "geometry" to be the "index". This would allow the geometry column to be ignored during these kinds of numerical operations, but still be available for indexing later. However, shapely.geometry.Point and friends are no longer hashable: https://github.com/Toblerity/Shapely/issues/209. This means that whenever I use an external library like sklearn, I have to copy the index, pop the geometry, and add both back later. Not sure if this is a known limitation or worth opening a new issue for.

adamjstewart on 30 Dec 2020
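
The workaround described in this comment (copy the index, pop the geometry, add both back later) might be sketched as below. This is an illustration under stated assumptions rather than the author's actual code: a plain pandas DataFrame with WKT strings stands in for a GeoDataFrame, and `external_transform` is a hypothetical placeholder for a library call (such as an sklearn transformer) that returns a bare NumPy array without index or column labels.

```python
import numpy as np
import pandas as pd

# Stand-in for a GeoDataFrame (assumption): "geometry" holds WKT strings.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0],
        "b": [10.0, 20.0, 30.0],
        "geometry": ["POINT (0 0)", "POINT (1 1)", "POINT (2 2)"],
    },
    index=["x", "y", "z"],
)


def external_transform(values: np.ndarray) -> np.ndarray:
    """Hypothetical external call that returns a bare array (centers columns)."""
    return values - values.mean(axis=0)


features = df.copy()
index = features.index.copy()          # 1. copy the index
geometry = features.pop("geometry")    # 2. pop the geometry (removes it in place)

result = external_transform(features.to_numpy())

# 3. add both back: rebuild the frame with the saved index, then reattach
#    the geometry column, which pandas aligns on that index.
restored = pd.DataFrame(result, columns=features.columns, index=index)
restored["geometry"] = geometry

print(restored)
```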

> I would like the "geometry" to be the "index"

That should be possible with the upcoming shapely 2.0, which makes geometries hashable again, so it is just a matter of time.

martinfleis on 31 Dec 2020