Assigning a value to a single location in a DataFrame (using .loc with scalar indexers) started to fail for values that have a length.
Consider the following example:
In [1]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [(1, 2), (1, 2, 3), (3, 4)]})
In [2]: df
Out[2]:
   a          b
0  1     (1, 2)
1  2  (1, 2, 3)
2  3     (3, 4)
In [3]: df.loc[0, 'b'] = (7, 8, 9)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-a2d59e11519a> in <module>
----> 1 df.loc[0, 'b'] = (7, 8, 9)
~/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
187 key = com._apply_if_callable(key, self.obj)
188 indexer = self._get_setitem_indexer(key)
--> 189 self._setitem_with_indexer(indexer, value)
190
191 def _validate_key(self, key, axis):
~/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
604
605 if len(labels) != len(value):
--> 606 raise ValueError('Must have equal len keys and value '
607 'when setting with an iterable')
608
ValueError: Must have equal len keys and value when setting with an iterable
This raises on 0.23.4 - master, but worked in 0.20.3 - 0.22.0 (the ones I tested):
In [1]: pd.__version__
Out[1]: '0.22.0'
In [2]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [(1, 2), (1, 2, 3), (3, 4)]})
In [3]: df
Out[3]:
   a          b
0  1     (1, 2)
1  2  (1, 2, 3)
2  3     (3, 4)
In [4]: df.loc[0, 'b'] = (7, 8, 9)
In [5]: df
Out[5]:
   a          b
0  1  (7, 8, 9)
1  2  (1, 2, 3)
2  3     (3, 4)
Related to https://github.com/pandas-dev/pandas/issues/25806
We don't have very robust support for list-like values in general, but for the specific case above of updating a single value, I don't think there is anything ambiguous about it: you are updating a single location, so the passed value should simply be put in that place.
Note that the above is with tuples, but there are also custom objects, like MultiPolygons, that represent a single object yet define a __len__.
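To illustrate the point about objects that are conceptually one value but still have a length, here is a minimal sketch. `FakeMultiGeometry` is a hypothetical stand-in written for this example (it is not a shapely class), and `pandas.api.types.is_list_like` is used only as a rough indicator of which values pandas tends to treat as collections. Whether the .loc setter consults exactly this check is an assumption.

```python
from pandas.api.types import is_list_like


class FakeMultiGeometry:
    """Hypothetical stand-in for an object such as a MultiPolygon:
    conceptually one value, but it has a length and can be iterated."""

    def __init__(self, parts):
        self._parts = list(parts)

    def __len__(self):
        return len(self._parts)

    def __iter__(self):
        return iter(self._parts)


single_value = FakeMultiGeometry(["part A", "part B"])

# A plain tuple and the custom object both look like collections to pandas,
# while an ordinary scalar does not.
tuple_is_list_like = is_list_like((7, 8, 9))
custom_is_list_like = is_list_like(single_value)
scalar_is_list_like = is_list_like(7)
print(tuple_is_list_like, custom_is_list_like, scalar_is_list_like)
```

So a setter that decides "scalar or collection" from the value alone cannot tell that such an object is meant to occupy a single cell. The scalar indexers are what carry that information.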
There are several potential issues with assigning list-like values, see e.g. the issues linked here: https://github.com/pandas-dev/pandas/issues/19590#issuecomment-364079529
However, most of those cases occur when assigning to multiple elements at once (unpack the list?), while here the arguments to loc are both scalars.
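Until the scalar case works again, one way to sidestep the length check is to avoid the .loc setter and rebuild the column from a plain Python list. This is only a sketch of a possible workaround, not how pandas handles it internally: `set_single_cell` is a hypothetical helper name, it assumes unique index labels, and it copies the whole column, so it costs O(n) per assignment.

```python
import pandas as pd


def set_single_cell(df, row_label, col_label, value):
    """Hypothetical helper: store `value` as ONE object at (row_label, col_label).

    Assumes unique index labels and rebuilds the whole column."""
    items = df[col_label].tolist()
    position = df.index.get_loc(row_label)  # an int when labels are unique
    items[position] = value  # plain list assignment never unpacks `value`
    df[col_label] = pd.Series(items, index=df.index)
    return df


df = pd.DataFrame({'a': [1, 2, 3], 'b': [(1, 2), (1, 2, 3), (3, 4)]})
set_single_cell(df, 0, 'b', (7, 8, 9))
print(df)
```

Because the value is placed into a list slot before pandas sees it, its length never gets compared against the number of targeted locations.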
Probably caused by https://github.com/pandas-dev/pandas/pull/20732
I am getting the same issue: the MultiPolygon is treated as equivalent to a list of polygons.
My workaround (to assign a MultiPolygon to a single row) is to wrap it in a list first.
I am running a try/except to catch the error, something like this:
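from shapely.geometry import MultiPolygon
# `multipolygon` is a placeholder for an existing MultiPolygon instance
try:
    geopandas_dataframe_one_row['geometry'] = multipolygon
except ValueError:
    # wrap it in a list so it is treated as a single element
    geopandas_dataframe_one_row['geometry'] = [multipolygon]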
Hope that's helpful to someone.
Hi! Any solution in sight? Also encountering the issue when wanting to assign MultiPolygons and the list work-around as well as ".values" work-around do not seem to help... Thankful for any hint!
@jklatt I was able to resolve this by round-tripping a shapely geometry through geopandas:
gdf.loc[scalar_index_loc, 'geometry'] = geopandas.GeoDataFrame(geometry=[shapely_geo]).geometry.values
This seems to avoid the ValueError.
Works like a charm! Thank you Thomas :)
Thanks, @koshy1123 ! Your roundtripping shapely geo via geopandas worked for me!