Geopandas: BUG: crs is lost after calling GeoDataFrame.apply() method

Created on 25 Jan 2019 · 6 Comments · Source: geopandas/geopandas

The crs is None on the result of GeoDataFrame.apply(...).
The following unit test fails because result.crs is None:

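    from geopandas import GeoDataFrame
    from geopandas.testing import assert_geodataframe_equal
    from shapely.geometry import Point


    def test_apply_function_to_rows():
        gdf = GeoDataFrame(
            data={'foo': ['Bar']},
            geometry=[Point(45.5089192, -73.5565212)],
            crs={'init': 'epsg:4326'}
        )

        def _lower_foo(row):
            # Lower-case the 'foo' value and return the whole row.
            row['foo'] = row['foo'].lower()
            return row

        # reduce=False (deprecated in pandas 0.23, removed in 1.0) is not
        # needed: the function returns a full row, so apply() returns a frame.
        result = gdf.apply(_lower_foo, axis=1)

        assert_geodataframe_equal(
            result,
            GeoDataFrame(
                data={'foo': ['bar']},
                geometry=[Point(45.5089192, -73.5565212)],
                crs={'init': 'epsg:4326'}
            )
        )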

Environment used to run the test:

    $ pip list
    Package         Version
    --------------- -----------
    atomicwrites    1.2.1
    attrs           18.2.0
    Click           7.0
    click-plugins   1.0.4
    cligj           0.5.0
    Fiona           1.8.4
    geopandas       0.4.0
    more-itertools  5.0.0
    munch           2.3.2
    numpy           1.16.0
    pandas          0.23.4
    pip             19.0.1
    pluggy          0.8.1
    py              1.7.0
    pyproj          1.9.6
    pytest          4.1.1
    python-dateutil 2.7.5
    pytz            2018.9
    setuptools      40.6.3
    Shapely         1.6.4.post2
    six             1.12.0
Labels: bug, crs

All 6 comments

The problem with trying to preserve the crs, in general, is that apply is a very generic method, and we cannot make many assumptions about what its result will be.

In your example, you keep the geometry column, but often the result of an apply call will not even have the geometry anymore, or might have transformed the geometry.

So I am not fully sure we should try to preserve it.
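To illustrate the first case, here is a minimal sketch assuming a recent geopandas release (the two sample rows are hypothetical). A function that returns one number per row makes apply() return a plain pandas Series of floats, which has no geometry column and therefore nothing to attach a crs to:

```python
import pandas as pd
from geopandas import GeoDataFrame, GeoSeries
from shapely.geometry import Point

# Hypothetical sample data; the default geometry column is named "geometry".
gdf = GeoDataFrame(
    data={"foo": ["Bar", "Baz"]},
    geometry=[Point(45.5, -73.5), Point(46.8, -71.2)],
    crs="EPSG:4326",
)

# The function returns a float per row instead of a row with geometry,
# so the result is an ordinary Series of floats rather than a GeoSeries.
x_coords = gdf.apply(lambda row: row["geometry"].x, axis=1)

print(isinstance(x_coords, pd.Series), isinstance(x_coords, GeoSeries))  # True False
print(x_coords.dtype)  # float64
```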

Hmm... you're right.
It's a weird situation, though, because the result is still a GeoDataFrame instance with the same geometry (the type and geometry are preserved here, but that depends on the function applied, doesn't it?).

I'm not sure either...

My code currently looks like this:

        result = dataframe.apply(_my_function, axis=1, reduce=False)

        if isinstance(dataframe, GeoDataFrame):
            result.crs = dataframe.crs

        return result

And the result is still a GeoDataFrame whenever dataframe is a GeoDataFrame.
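The snippet above can be generalized into a small helper. This is a minimal sketch, not part of the geopandas API: the name apply_keep_crs is hypothetical, and it assumes a recent geopandas that provides set_crs. Unlike assigning result.crs unconditionally, it only re-attaches the source crs when the result still contains the original geometry column and has no crs of its own, so results without geometry (such as a Series of numbers) are returned unchanged.

```python
import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point


def apply_keep_crs(gdf, func, **kwargs):
    """Row-wise gdf.apply(func) that re-attaches gdf's crs if it was dropped.

    Hypothetical helper for illustration; not part of the geopandas API.
    """
    result = gdf.apply(func, axis=1, **kwargs)
    geom_col = gdf.geometry.name
    # Only touch frames that still carry the original geometry column;
    # Series or frames without that column are returned as-is.
    if isinstance(result, pd.DataFrame) and geom_col in result.columns:
        # (Re)activate the geometry column: older versions may return it as
        # plain object dtype, recent ones already keep it as geometry.
        result = GeoDataFrame(result, geometry=geom_col)
        if result.crs is None and gdf.crs is not None:
            result = result.set_crs(gdf.crs)
    return result


def _lower_foo(row):
    # Same row function as in the report above.
    row["foo"] = row["foo"].lower()
    return row


gdf = GeoDataFrame(
    data={"foo": ["Bar"]},
    geometry=[Point(45.5089192, -73.5565212)],
    crs="EPSG:4326",
)
result = apply_keep_crs(gdf, _lower_foo)
print(result.crs == gdf.crs)  # True
```

Note that this still copies the crs blindly when the function has transformed the geometry, which is exactly the concern raised earlier in the thread.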

I just unexpectedly encountered this behavior. I'm inclined to think users shouldn't always have to reset the crs after calling apply(). It seems fair to assume that, on average, only a small fraction of the functions passed to it alter the geometry, and perhaps the burden of caution should fall on those uses. I do agree it's a contentious issue, though...

ref #1351

Like @spolloni, I just unexpectedly encountered this behavior. I was surprised to discover that gdf.apply() passes each row to the function as a pandas Series rather than as a geopandas GeoSeries. In my case I was working with a specifically named geometry column, so I could not generalize my function to access .geometry. It was easy enough to work around, but I'm now wondering why there isn't a geopandas wrapper for a common iterator like apply() that passes the GeoSeries through (or, perhaps more generally, pandas could check for a valid extension type in apply() and let the extended type pass through if it is valid).
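One way to write such a workaround, sketched here assuming a recent geopandas (the column names label and location and the function x_of are hypothetical): because each row is indexed by column label, the geometry can be read with the name of the active geometry column, looked up once via gdf.geometry.name, instead of through a .geometry attribute.

```python
from geopandas import GeoDataFrame
from shapely.geometry import Point

# Hypothetical frame whose active geometry column is named "location".
gdf = GeoDataFrame(
    {"label": ["a", "b"], "location": [Point(1, 2), Point(3, 4)]},
    geometry="location",
    crs="EPSG:4326",
)

# Name of the active geometry column, whatever it is called.
geom_col = gdf.geometry.name


def x_of(row, geom_col):
    # The row is indexed by column label, so look the geometry up by name
    # rather than relying on a .geometry attribute.
    return row[geom_col].x


# Extra keyword arguments to apply() are forwarded to the function.
xs = gdf.apply(x_of, axis=1, geom_col=geom_col)
print(xs.tolist())  # [1.0, 3.0]
```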

I honestly thought we had fixed this in #1478, but apparently that only fixes GeoSeries.apply, not GeoDataFrame.apply. We should add a similar wrapper for GeoDataFrame as well, then.
