import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), 'C': range(5)})
df.groupby('A').agg({'B': 'sum', 'G': 'min'})  # aggregate a non-existent column
produces
<ipython-input-5-f5ac34bf856f> in <module>
----> 1 df.groupby('A').agg({'B': 'sum', 'G': 'min'})
~/src/dgr00/.venv3/lib/python3.6/site-packages/pandas/core/groupby/generic.py in aggregate(self, func, *args, **kwargs)
938 func = _maybe_mangle_lambdas(func)
939
--> 940 result, how = self._aggregate(func, *args, **kwargs)
941 if how is None:
942 return result
~/src/dgr00/.venv3/lib/python3.6/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
364 obj.columns.intersection(keys)
365 ) != len(keys):
--> 366 raise SpecificationError("nested renamer is not supported")
367
368 from pandas.core.reshape.concat import concat
SpecificationError: nested renamer is not supported
While groupby.agg() with a dictionary for renaming was deprecated in 0.20 and removed in 1.0 (https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.20.0.html#deprecate-groupby-agg-with-a-dictionary-when-renaming), the same error message is also raised when aggregating a non-existent column, which can be confusing.
Expected: an error saying that column G does not exist.
pd.show_versions()
python : 3.6.4.final.0
OS : Linux
machine : x86_64
pandas : 1.0.1
Confirmed. The incorrect error message threw me off as well.
Same problem
Same problem here, bug is due to non-existing column
Same problem here. Does anyone have a workaround? Thanks.
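One possible workaround, as a minimal sketch: check the dictionary keys against the frame's columns before calling agg, so that a misspelled column fails with a clear message on any pandas version. The helper name checked_agg is hypothetical and is not part of pandas.

```python
import pandas as pd


def checked_agg(df, by, spec):
    """Group df by `by` and aggregate with the dict `spec`.

    Hypothetical helper (not a pandas API): raises a clear KeyError
    when `spec` names a column that is not in df.
    """
    missing = [col for col in spec if col not in df.columns]
    if missing:
        raise KeyError(f"Column(s) {missing} do not exist")
    return df.groupby(by).agg(spec)


df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), 'C': range(5)})

# Valid specification: every key is an existing column.
result = checked_agg(df, 'A', {'B': 'sum', 'C': 'min'})

# Invalid specification: 'G' is reported by name instead of "nested renamer".
error_message = None
try:
    checked_agg(df, 'A', {'B': 'sum', 'G': 'min'})
except KeyError as exc:
    error_message = str(exc)
```

The check runs before groupby, so it behaves the same regardless of which error the installed pandas version would raise.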
Can confirm, got the same exception due to a non-existing column name.
It's also inconsistent:
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), 'C': range(5)})
df.groupby('A').agg({'B': 'sum', 'G': ['min']})  # <- use ['min'] instead of 'min'
raises the correct error: KeyError: "Column 'G' does not exist!"
You wrote G instead of C. 'G' is nothing. 'C' is the column.
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), 'C': range(5)})
df.groupby('A').agg({'B': 'sum', 'C': 'min'})
Applying multiple aggregations to a column:
df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), 'C': range(5)})
df.groupby('A').agg({'B': ['sum','min'], 'C': ['sum','min']})
I think this one is also related.
I tried to rename the column right after the groupby, the way it was done with pd.__version__ < 1.0. I do not get the deprecation warning that I got with pd.__version__ < 1.0.
Here is the example:
df = pd.DataFrame({'A': [1, 1, 1, 2, 2],'B': range(5)})
df.groupby('A').agg({'B': {'foo': 'sum'}})
The error message is:
---------------------------------------------------------------------------
SpecificationError Traceback (most recent call last)
<ipython-input-22-440a616816b6> in <module>
1 df = pd.DataFrame({'A': [1, 1, 1, 2, 2],'B': range(5)})
----> 2 df.groupby('A').agg({'B': {'foo': 'sum'}})
~/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py in aggregate(self, func, *args, **kwargs)
926 func = _maybe_mangle_lambdas(func)
927
--> 928 result, how = self._aggregate(func, *args, **kwargs)
929 if how is None:
930 return result
~/anaconda3/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
340 # {'ra' : { 'A' : 'mean' }}
341 if isinstance(v, dict):
--> 342 raise SpecificationError("nested renamer is not supported")
343 elif isinstance(obj, ABCSeries):
344 raise SpecificationError("nested renamer is not supported")
SpecificationError: nested renamer is not supported
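For the renaming case above, named aggregation is the documented replacement for the nested-dictionary form. A minimal sketch, assuming pandas 0.25 or later, that reuses the frame from the example and produces an output column called foo:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5)})

# Named aggregation: keyword = (input column, aggregation function).
# This replaces the removed form .agg({'B': {'foo': 'sum'}}).
renamed = df.groupby('A').agg(foo=('B', 'sum'))
```

Each keyword becomes an output column name, so no nested dictionary is needed.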
We are expecting an appropriate error message. The current error message does not point in the right direction.
I also ran into this error and as mentioned, it was caused by trying to aggregate a non-existent column. Version 1.0.3
I got the same error when there were duplicate columns in the DataFrame.
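For the duplicate-column case, a minimal sketch of how the duplicates could be detected and dropped before aggregating. The sample frame is made up for illustration, and keeping only the first occurrence of each column name is an assumption that may not suit every dataset.

```python
import pandas as pd

# Illustrative frame with a duplicated column label 'B'.
df = pd.DataFrame([[1, 0, 10], [1, 1, 11], [2, 2, 12]], columns=['A', 'B', 'B'])

# Index.duplicated() marks the second and later occurrences of a label.
duplicated = df.columns[df.columns.duplicated()].tolist()

# Assumption: keep the first occurrence of each label and drop the rest.
deduped = df.loc[:, ~df.columns.duplicated()]

result = deduped.groupby('A').agg({'B': 'sum'})
```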
I only get this error when I run my code from the command line or Git Bash. When I run it in Jupyter it works fine. What's the best way to solve this while still using agg()?
Instead of using .agg({'B': 'sum', 'G': 'min'}), try passing it as a list of tuples like .agg([('B', 'sum'), ('G', 'min')]).
Thanks a lot to the commenter above.
I think this has been fixed on master: see issue #32755 and PR #32836.
I now get the below error message on master, rather than the "nested renamer" error:
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), 'C': range(5)})
df.groupby('A').agg({'B': 'sum', 'G': 'min'})
Traceback (most recent call last):
File "C:\Users\timhu\Anaconda3\envs\pandas-dev-2\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-08e47b9415b3>", line 4, in <module>
df.groupby('A').agg({'B': 'sum', 'G': 'min'})
File "c:\users\timhu\documents\code\pandas\pandas\core\groupby\generic.py", line 948, in aggregate
result, how = self._aggregate(func, *args, **kwargs)
File "c:\users\timhu\documents\code\pandas\pandas\core\base.py", line 354, in _aggregate
raise SpecificationError(f"Column(s) {cols} do not exist")
pandas.core.base.SpecificationError: Column(s) ['G'] do not exist
I think this issue can be closed.