Pandas: TypeError on argmax of object dtype (change from 0.20.3)

Created on 29 Oct 2017 · 17 Comments · Source: pandas-dev/pandas

>>> import pandas as pd
>>> pd.Series([0, 0], dtype='object').argmax()

I was calling action = state_action.idxmax(), where state_action is a pandas.core.series.Series. When I run it on pandas 0.21.0, I get the following error:

  File "/usr/local/lib/python3.5/dist-packages/pandas/core/series.py", line 1357, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

However, when I downgraded to pandas 0.20.3, it worked just fine. You might wanna look into this. :)

Bug Dtypes Numeric Regression

keerthanpg

All 17 comments

Can you give a reproducible example?

TomAugspurger on 29 Oct 2017

I have the same problem.
At first I used action = state_action.argmax(), which gave: FutureWarning: 'argmax' is deprecated. Use 'idxmax' instead. The behavior of 'argmax' will be corrected to return the positional maximum in the future. Use 'series.values.argmax' to get the position of the maximum now.
So I changed it to action = state_action.idxmax().
When I run it in 0.21.0, it gives the following error:

Traceback (most recent call last):
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/tkinter/__init__.py", line 1699, in __call__
    return self.func(*args)
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/tkinter/__init__.py", line 745, in callit
    func(*args)
  File "/Users/baron/PycharmProjects/HelloPython/test_Q.py", line 26, in update
    action = RL.choose_action(str(observation))
  File "/Users/baron/PycharmProjects/HelloPython/RL_brain.py", line 40, in choose_action
    action = state_action.idxmax()
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/site-packages/pandas/core/series.py", line 1357, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

barondu on 30 Oct 2017

Can you provide a copy-pastable example @barondu?

TomAugspurger on 30 Oct 2017

Sure, you can test with the code at the link below. @TomAugspurger
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/2_Q_Learning_maze

barondu on 30 Oct 2017

Do you have a minimal test-case, something that could go in a unit test?

TomAugspurger on 30 Oct 2017

@TomAugspurger

import pandas as pd
import numpy as np

q_table = pd.DataFrame(columns=['a', 'b', 'c', 'd'])
q_table = q_table.append(pd.Series([0] * 4, index=q_table.columns, name='test1', ))
q_table = q_table.append(pd.Series([0] * 4, index=q_table.columns, name='test2', ))
print(q_table)
state_action = q_table.ix['test2', :]
print(state_action)
state_action = state_action.reindex(
    np.random.permutation(state_action.index))
print(state_action)
action = state_action.idxmax()
# action = state_action.argmax()
print('\naction: ', action)

barondu on 30 Oct 2017

Here is the error message

Traceback (most recent call last):
  File "/Users/baron/PycharmProjects/HelloPython/pandas_exercise.py", line 13, in <module>
    action = state_action.idxmax()
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/site-packages/pandas/core/series.py", line 1357, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

barondu on 30 Oct 2017

Thanks, simplified a bit:

In [11]: pd.Series([0, 0], dtype='object')
Out[11]:
0    0
1    0
dtype: object

In [12]: pd.Series([0, 0], dtype='object').argmax()
/Users/taugspurger/Envs/pandas-dev/bin/ipython:1: FutureWarning: 'argmax' is deprecated. Use 'idxmax' instead. The behavior of 'argmax' will be corrected to return the positional maximum in the future. Use 'series.values.argmax' to get the position of the maximum now.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3.6
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-e0ba19c8565d> in <module>()
----> 1 pd.Series([0, 0], dtype='object').argmax()

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/util/_decorators.py in wrapper(*args, **kwargs)
     34     def wrapper(*args, **kwargs):
     35         warnings.warn(msg, klass, stacklevel=stacklevel)
---> 36         return alternative(*args, **kwargs)
     37     return wrapper
     38

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/series.py in idxmax(self, axis, skipna, *args, **kwargs)
   1355         """
   1356         skipna = nv.validate_argmax_with_skipna(skipna, args, kwargs)
-> 1357         i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
   1358         if i == -1:
   1359             return np.nan

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in _f(*args, **kwargs)
     72             if any(self.check(obj) for obj in obj_iter):
     73                 msg = 'reduction operation {name!r} not allowed for this dtype'
---> 74                 raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
     75             try:
     76                 with np.errstate(invalid='ignore'):

TypeError: reduction operation 'argmax' not allowed for this dtype

Is there a reason you're using object dtype here?

TomAugspurger on 30 Oct 2017
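
In the q_table example above, the object dtype appears to be a side effect of appending rows to an empty DataFrame rather than a deliberate choice. If the values are in fact numeric, one possible way around the error is to convert the Series before calling idxmax. The sketch below is only an illustration: state_action is a stand-in for the Series in the report, and the values are made up.

```python
import pandas as pd

# Stand-in for the object-dtype Series from the report (values are plain ints).
state_action = pd.Series([0, 1, 0, 0], index=['a', 'b', 'c', 'd'], dtype='object')

# Convert to a numeric dtype first; pd.to_numeric raises if a value cannot be parsed.
numeric = pd.to_numeric(state_action)

# idxmax on a numeric Series returns the index label of the maximum.
action = numeric.idxmax()
print(numeric.dtype, action)
```

This only helps when every value really is a number; for strings or mixed objects the conversion itself fails.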

Seems like https://github.com/pandas-dev/pandas/pull/16449 may have been the root issue (cc @DGrady)

NumPy will (somehow) handle object arrays in argmax/min, so I suppose @disallow('O') is a bit too strict.

TomAugspurger on 30 Oct 2017

We'll need to think about whether we want to emulate NumPy here though. It's nice to know ahead of time whether your function is valid or not for the type of the values being passed. With object dtype there's no way of knowing that.

TomAugspurger on 30 Oct 2017

I think for object dtype we should not, beforehand, decide whether such an operation works or not, but IMO we should defer that to the actual objects. Eg min/max works on strings, and so it seems logical that argmax/argmin does as well.

jorisvandenbossche on 30 Oct 2017

Fortunately, argmin/max didn't work on strings before :)

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.20.3'

In [3]: pd.Series(['a', 'b']).argmax()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-4747fce7cbb5> in <module>()
----> 1 pd.Series(['a', 'b']).argmax()

~/miniconda3/envs/pandas-0.20.3/lib/python3.6/site-packages/pandas/core/series.py in idxmax(self, axis, skipna, *args, **kwargs)
   1262         """
   1263         skipna = nv.validate_argmax_with_skipna(skipna, args, kwargs)
-> 1264         i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
   1265         if i == -1:
   1266             return np.nan

~/miniconda3/envs/pandas-0.20.3/lib/python3.6/site-packages/pandas/core/nanops.py in nanargmax(values, axis, skipna)
    476     """
    477     values, mask, dtype, _ = _get_values(values, skipna, fill_value_typ='-inf',
--> 478                                          isfinite=True)
    479     result = values.argmax(axis)
    480     result = _maybe_arg_null_out(result, axis, mask, skipna)

~/miniconda3/envs/pandas-0.20.3/lib/python3.6/site-packages/pandas/core/nanops.py in _get_values(values, skipna, fill_value, fill_value_typ, isfinite, copy)
    194     values = _values_from_object(values)
    195     if isfinite:
--> 196         mask = _isfinite(values)
    197     else:
    198         mask = isnull(values)

~/miniconda3/envs/pandas-0.20.3/lib/python3.6/site-packages/pandas/core/nanops.py in _isfinite(values)
    237             is_integer_dtype(values) or is_bool_dtype(values)):
    238         return ~np.isfinite(values)
--> 239     return ~np.isfinite(values.astype('float64'))
    240
    241

ValueError: could not convert string to float: 'b'

But yes, I suppose that we should attempt to support it.

TomAugspurger on 30 Oct 2017

Ah, yes :-) Although in numpy it works:

In [118]: a = np.array(['a', 'b', 'c'], dtype=object)

In [119]: a.min()
Out[119]: 'a'

In [120]: a.argmin()
Out[120]: 0

jorisvandenbossche on 30 Oct 2017

Just refreshing my memory — so in the course of tracking down the bug that prompted #16449, it turned out that argmax etc were always trying to coerce their inputs to float, which is why they used to fail with string data. They no longer do that. But, at least at the time, it seemed pretty tricky to get argmax etc to behave consistently with arbitrary object dtypes that could also contain nulls, and we decided to disallow that case. If you remove the disallow decorator, they currently work as expected with string data, as long as there are no null values, but once you start including null values or possibly using other types of objects things would not work as expected. I think that marking argmax as not allowed with object dtypes was done mainly for expediency.

DGrady on 1 Nov 2017
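
To illustrate why nulls make this awkward: in Python 3, ordering comparisons between a string and None raise TypeError, so a comparison-based argmax over mixed object data has no obvious answer. The sketch below shows that comparison failing and then one possible approach, assuming it is acceptable to drop nulls before looking for the maximum. It is not what pandas does internally.

```python
import pandas as pd

# In Python 3, ordering comparisons between str and None raise TypeError.
try:
    'a' > None
    comparable = True
except TypeError:
    comparable = False

s = pd.Series(['a', None, 'c'], index=['x', 'y', 'z'], dtype='object')

# Drop nulls first, then take the positional maximum on the NumPy array
# and map that position back to an index label.
clean = s.dropna()
label = clean.index[clean.values.argmax()]
print(comparable, label)
```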

I'm not sure I follow everything involved here, but for the example given by @barondu (the MorvanZhou code), is the only solution to downgrade pandas? Is there a simpler fix, like replacing argmax with another function?
(Sorry, I'm very new to Python.)

JB712 on 2 Nov 2017

As a workaround, you can call argmax on the underlying NumPy array:

Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 27 2017, 12:14:30) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.Series(['one', 'two']).values.argmax()
1

DGrady on 3 Nov 2017

👍6
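
Note that .values.argmax() returns a position, whereas idxmax returns an index label. If the label is what you need (as in the choose_action code earlier in the thread), the position can be mapped back through the index. A minimal sketch follows; idxmax_via_numpy is a hypothetical helper name, not part of pandas, and it does not handle null values.

```python
import pandas as pd

def idxmax_via_numpy(series):
    """Return the index label of the maximum value.

    Hypothetical helper, not a pandas API. It calls argmax on the
    underlying NumPy array and does not handle nulls.
    """
    position = series.values.argmax()  # positional maximum from NumPy
    return series.index[position]      # map the position back to a label

s = pd.Series([0, 5, 2], index=['a', 'b', 'c'], dtype='object')
label = idxmax_via_numpy(s)
print(label)
```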

I faced the same issue and tried pandas 0.19.2 and 0.18.1. Neither of them worked for me. I was only able to run it successfully after downgrading to pandas 0.20.3. Hope this helps someone. (y)