Pandas: BUG: bug in group by rank string

Created on 20 Jun 2018 · 6 Comments · Source: pandas-dev/pandas

Bug description

In[1]: import pandas as pd
In[2]: df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "value_str": ["u1", "u2", "u3", "u4", "u5"],
                   "value_int": range(5)})
In[3]: df
Out[3]:
  key value_str  value_int
0   a        u1          0
1   a        u2          1
2   b        u3          2
3   b        u4          3
4   b        u5          4

When grouping by "key" and ranking "value_str", an error is raised:

In[4]: df.groupby("key")["value_str"].rank()  # error
Out[4]: 
Traceback (most recent call last):
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-5357c8abb14f>", line 1, in <module>
    df.groupby("key")["value_str"].rank()
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1906, in rank
    na_option=na_option, pct=pct, axis=axis)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1025, in _cython_transform
    **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2630, in transform
    return self._cython_operation('transform', values, how, axis, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2590, in _cython_operation
    **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2664, in _transform
    transform_func(result, values, comp_ids, is_datetimelike, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2479, in wrapper
    return f(afunc, *args, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2431, in <lambda>
    kwargs.get('na_option', 'keep')
TypeError: 'NoneType' object is not callable

However, grouping by "key" and ranking "value_int" does not raise an error:

In[10]: df.groupby("key")["value_int"].rank()

Out[10]: 
0    1.0
1    2.0
2    1.0
3    2.0
4    3.0
Name: value_int, dtype: float64

Ranking "value_str" directly, without grouping, does not raise an error either:

In[11]: df["value_str"].rank()

Out[11]: 
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
Name: value_str, dtype: float64
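
A possible workaround, sketched here and not part of the original report: since ranking integers within groups works, the strings can first be mapped to integer codes that follow their sorted order, and those codes can then be ranked per group. The names `codes` and `ranks` are illustrative, and the sketch assumes the column has no missing values.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "value_str": ["u1", "u2", "u3", "u4", "u5"],
                   "value_int": range(5)})

# Map each string to an integer code that follows sorted (lexicographic) order.
# Assumption: no missing values; factorize marks NaN with -1, which would
# need separate handling.
codes, _ = pd.factorize(df["value_str"], sort=True)

# Rank the integer codes within each "key" group instead of the raw strings.
ranks = pd.Series(codes, index=df.index, name="value_str").groupby(df["key"]).rank()
print(ranks)
```

Equal strings receive equal codes, so ties are ranked the same way they would be for the integer column.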

Duplicate Groupby

xinai57

👍4 👎1

All 6 comments

Lexicographical ranking is not supported, hence the error with groupby. IIRC there is already an issue to make that consistent across GroupBy and Series objects.

WillAyd on 20 Jun 2018

Here's the original issue: #19560. It looks like there's a PR referenced there that hasn't been updated in a couple of months, so if you are interested, you can reach out to the author and try to push it over the finish line.

Closing this issue specifically, as it is a duplicate.

WillAyd on 20 Jun 2018

@WillAyd Thanks very much.
In fact, this error appeared after I updated pandas from v0.20 to v0.23. There was no error in version 0.20...

xinai57 on 20 Jun 2018

👍2

I'm having the exact same issue as @xinai57 has, after upgrading the pandas version to v0.23.