Geopandas: API: disable automatic alignment in spatial predicates/operations?

Created on 22 Jun 2018 · 3 comments · Source: geopandas/geopandas

A common mistake I see people make is something like:

df_points.within(df_polys['geometry'])

where they do not expect the default "alignment" behaviour (aligning the calling and the passed object on their index, and only then performing the operation element-wise).
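A minimal sketch of that default behaviour, assuming a recent geopandas release with shapely installed and made-up data. The only polygon is stored under index label 5, so after alignment no row pairs a point with the polygon:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two points, both inside the unit square, with index labels 0 and 1.
df_points = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8)], index=[0, 1]
)

# A single polygon stored under index label 5 (illustrative data).
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
df_polys = gpd.GeoDataFrame(geometry=[square], index=[5])

# The passed GeoSeries is aligned on the index first: rows 0 and 1 are
# compared against a missing geometry, and the square at row 5 against a
# missing point. All three entries come out False, although both points
# lie inside the square.
result = df_points.within(df_polys["geometry"])
print(result)
```

Depending on the geopandas version, this may also emit a warning that the indices of the two GeoSeries differ.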

In the above example, I think users are often trying to achieve something different: a) they are trying to pass a single geometry (which still needs to be unpacked from a single-element GeoSeries), or b) they expect it to check whether each point is in any of the polygons, or to determine for each point which of the polygons it falls in (in which case they actually need something like a spatial join).
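The two intentions described above map onto different calls. A sketch with illustrative data (two adjacent unit squares with index labels 10 and 11), assuming geopandas 0.10 or later for the `predicate=` keyword of `gpd.sjoin`:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Illustrative data: two unit squares side by side.
square_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
square_b = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])
df_polys = gpd.GeoDataFrame(geometry=[square_a, square_b], index=[10, 11])

# One point inside each square.
df_points = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5)], index=[0, 1]
)

# a) Test every point against one polygon: pass the shapely geometry itself,
#    not a one-element GeoSeries, so no index alignment takes place.
in_a = df_points.within(df_polys.geometry.iloc[0])

# b) Find which polygon each point falls in: that is a spatial join, not an
#    element-wise predicate. (Older geopandas versions used `op=` instead.)
joined = gpd.sjoin(df_points, df_polys, how="left", predicate="within")
print(joined["index_right"])
```

Here `in_a` is True only for the first point, while `joined` records the label of the containing polygon for each point.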

Personally, I think that needing this alignment is probably a rarer use case (compared to passing either a single geometry or an already aligned series). If that is correct, we could disable this behaviour as the default (after first deprecating it), and instead require an already aligned series when a series is passed.

We could still provide alignment as an option (e.g. with an align=True keyword), or otherwise expect users to do it themselves (e.g. df_points.within(df_polys['geometry'].reindex(df_points.index))).
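To make the proposal concrete, here is a sketch of the suggested semantics as a standalone wrapper. `within_strict` and its `align` argument are hypothetical illustrations of the idea, not a geopandas API. Treating indexes that hold the same labels in a different order as "not aligned" is one possible design choice among several.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon


def within_strict(left, other, align=False):
    """Hypothetical wrapper sketching the proposed behaviour (not geopandas API).

    - A single shapely geometry is passed through unchanged.
    - A GeoSeries must already share ``left``'s index exactly, unless
      align=True, in which case it is explicitly reindexed to ``left``.
    """
    if isinstance(other, gpd.GeoSeries):
        if align:
            other = other.reindex(left.index)
        elif not left.index.equals(other.index):
            raise ValueError("indexes differ; align explicitly or pass align=True")
    return left.within(other)


square_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
square_b = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])
df_points = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5)], index=[0, 1]
)
# Same labels as the points, but stored in reverse order.
df_polys = gpd.GeoDataFrame(geometry=[square_b, square_a], index=[1, 0])

try:
    within_strict(df_points, df_polys["geometry"])
except ValueError as exc:
    print("refused:", exc)

# Explicit opt-in: reindex so label 0 gets square_a and label 1 gets square_b.
print(within_strict(df_points, df_polys["geometry"], align=True))
```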

What do people think?



Having made, and observed, this mistake, and never having encountered a case where the alignment was desired, I think this is a good idea.

@jdmcbr +1, I have never had a use for this alignment feature. IMO, if it remains supported, it should at least check that both arrays/series have the same length.
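A sketch of the kind of check suggested here. `check_same_length` is a hypothetical helper, not part of geopandas, and the data is illustrative. The example also shows why silent alignment is easy to miss: the third point lies inside the square, yet it is compared against a missing geometry.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon


def check_same_length(left, other):
    """Hypothetical guard (not part of geopandas): refuse element-wise
    comparisons between GeoSeries/GeoDataFrames of different lengths."""
    if isinstance(other, (gpd.GeoSeries, gpd.GeoDataFrame)) and len(left) != len(other):
        raise ValueError(f"length mismatch: {len(left)} vs {len(other)}")


square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
points = gpd.GeoSeries([Point(0.5, 0.5), Point(2, 2), Point(0.1, 0.1)])
polys = gpd.GeoSeries([square])

# Without a guard, alignment pads the shorter side with missing geometries,
# so rows 1 and 2 are compared against nothing and come out False.
unguarded = points.within(polys)

try:
    check_same_length(points, polys)
except ValueError as exc:
    print(exc)
```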

@leohardtke

and maybe have another keyword argument (method) with two possible values, all or any, to check whether df_points is within ANY or ALL of the polygons of df_polys.

I think this can easily be achieved by calling .any() or .all() on the boolean Series result; there is no need for another method argument.
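A sketch of that suggestion with illustrative data: each point is tested against all polygons at once (a single geometry is broadcast, so no index alignment happens) and the boolean Series is reduced. It uses `contains` on the polygons, which is the converse of `within` on the point. The helper names `within_any` and `within_all` are made up for this example.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

square_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
square_b = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])  # contains square_a
df_polys = gpd.GeoDataFrame(geometry=[square_a, square_b])
df_points = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(1.5, 1.5), Point(5, 5)]
)


def within_any(pt):
    # True if at least one polygon contains the point.
    return bool(df_polys.contains(pt).any())


def within_all(pt):
    # True only if every polygon contains the point.
    return bool(df_polys.contains(pt).all())


any_result = [within_any(pt) for pt in df_points.geometry]
all_result = [within_all(pt) for pt in df_points.geometry]
print(any_result, all_result)
```

The first point lies in both squares, the second only in the larger one, and the third in neither.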
