crs is None on GeoDataFrame.apply(...) result
The following unit test fails because result.crs is None:
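```python
from shapely.geometry import Point
from geopandas import GeoDataFrame
from geopandas.testing import assert_geodataframe_equal


def test_apply_function_to_rows():
    gdf = GeoDataFrame(
        data={'foo': ['Bar']},
        geometry=[Point(45.5089192, -73.5565212)],
        crs={'init': 'epsg:4326'}
    )

    def _lower_foo(row):
        row['foo'] = row['foo'].lower()
        return row

    result = gdf.apply(_lower_foo, axis=1, reduce=False)

    # Fails: result.crs is None, while the expected frame has EPSG:4326
    assert_geodataframe_equal(
        result,
        GeoDataFrame(
            data={'foo': ['bar']},
            geometry=[Point(45.5089192, -73.5565212)],
            crs={'init': 'epsg:4326'}
        )
    )
```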
Running with the environment below:
```
$ pip list
Package         Version
--------------- -----------
atomicwrites    1.2.1
attrs           18.2.0
Click           7.0
click-plugins   1.0.4
cligj           0.5.0
Fiona           1.8.4
geopandas       0.4.0
more-itertools  5.0.0
munch           2.3.2
numpy           1.16.0
pandas          0.23.4
pip             19.0.1
pluggy          0.8.1
py              1.7.0
pyproj          1.9.6
pytest          4.1.1
python-dateutil 2.7.5
pytz            2018.9
setuptools      40.6.3
Shapely         1.6.4.post2
six             1.12.0
```
The problem with trying to preserve the crs in general is that apply is a very generic method, so we cannot make many assumptions about what the result will be.
In your example you keep the geometry column, but often the result of an apply call will no longer have a geometry column at all, or the function might have transformed the geometry.
So I am not fully sure we should try to preserve it.
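To make the point concrete, here is a minimal sketch (assuming a recent pandas and geopandas, where the deprecated `reduce` argument no longer exists, and using a made-up single-row frame) showing that the shape of an `apply` result is decided entirely by what the function returns. Only the second call leaves a geometry column that a CRS could even be attached to.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame({"foo": ["Bar"]}, geometry=[Point(0, 0)], crs="EPSG:4326")

# The function returns a scalar per row, so the result is a plain Series
# of strings: there is no geometry column left to attach a CRS to.
lowered = gdf.apply(lambda row: row["foo"].lower(), axis=1)


def _lower_foo(row):
    # Work on a copy rather than mutating the row pandas hands us
    row = row.copy()
    row["foo"] = row["foo"].lower()
    return row


# The function returns a whole row, so the result keeps the "geometry" column.
rows = gdf.apply(_lower_foo, axis=1)
```

Whether `rows` comes back with its CRS depends on the geopandas version in use, which is exactly the behavior under discussion here.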
Mmmh... you're right.
It's a weird situation though, because the result is still a GeoDataFrame instance with the same geometry (the type and geometry are preserved here, but that depends on the function applied, doesn't it?).
I'm not sure either...
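My code currently looks like this (the enclosing function name is illustrative):

```python
from geopandas import GeoDataFrame

def apply_rows(dataframe, _my_function):  # hypothetical wrapper name
    result = dataframe.apply(_my_function, axis=1, reduce=False)
    if isinstance(dataframe, GeoDataFrame):
        # apply() drops the CRS, so copy it back from the input frame
        result.crs = dataframe.crs
    return result
```

and the result is still a GeoDataFrame whenever dataframe is a GeoDataFrame.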
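One way to reconcile this workaround with the concern that apply may drop or transform the geometry is to restore the CRS only when the result still contains the input's active geometry column. The sketch below assumes a recent pandas and geopandas; `apply_keep_crs` is a hypothetical helper name, not a geopandas API.

```python
import geopandas as gpd
import pandas as pd


def apply_keep_crs(gdf, func, **kwargs):
    """Hypothetical helper: DataFrame.apply that re-attaches the CRS.

    The CRS is restored only when the result is still a frame containing
    the input's active geometry column; any other result (a Series of
    scalars, a frame without that column) is returned untouched.
    """
    result = gdf.apply(func, **kwargs)
    geom_col = gdf.geometry.name
    if isinstance(result, pd.DataFrame) and geom_col in result.columns:
        # Re-activate the geometry column and copy the input CRS onto it
        result = gpd.GeoDataFrame(result, geometry=geom_col, crs=gdf.crs)
    return result
```

The check cannot tell whether `func` reprojected the geometries into another CRS; in that case re-attaching the input CRS would be wrong, which is the risk raised earlier in the thread. It also assumes the column still holds geometry objects.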
I just unexpectedly encountered this behavior too. I'm inclined to think users shouldn't have to re-set the crs every time they use apply(). It seems fair to assume that, on average, only a small fraction of the functions passed to apply() alter the geometry, so perhaps the burden of caution should fall on those uses. I do agree it's a contentious issue, though...
ref #1351
Like @spolloni, I just unexpectedly encountered this behavior. I was surprised to discover that gdf.apply() actually passed a pandas Series to the function rather than a geopandas GeoSeries. In my case I was working with a specifically named geometry column, so I could not write my function generically in terms of .geometry. It was easy enough to work around, but I'm now wondering why there isn't a geopandas wrapper around a common iterator like apply() that passes the GeoSeries through (or, perhaps more generally, why pandas doesn't check for a valid extension type in apply() and let the extended type pass through when it's valid).
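A workaround in the spirit described above: since each row arrives as a plain pandas Series, the named geometry column can be looked up through the active column's name (`gdf.geometry.name`) instead of `.geometry`. A minimal sketch; the column name `location` and the distance function are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical data: the active geometry column is "location", not "geometry".
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"], "location": [Point(0, 0), Point(3, 4)]},
    geometry="location",
)

geom_col = gdf.geometry.name  # name of the active geometry column


def _distance_from_origin(row):
    # `row` is a plain pandas Series, so there is no `row.geometry` here;
    # fetch the shapely object by the active column's name instead.
    return row[geom_col].distance(Point(0, 0))


distances = gdf.apply(_distance_from_origin, axis=1)
```

Reading the name from `gdf.geometry.name` keeps the function independent of whatever the geometry column happens to be called.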
I honestly thought we had fixed this in #1478, but that apparently only fixes GeoSeries.apply, not GeoDataFrame.apply. We should add a similar wrapper for GeoDataFrame as well, then.