Pandas: BUG: bug in group by rank string

Created on 20 Jun 2018  ·  6 Comments  ·  Source: pandas-dev/pandas

Bug description

In[1]: import pandas as pd
In[2]: df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "value_str": ["u1", "u2", "u3", "u4", "u5"],
                   "value_int": range(5)})
In[3]: df
Out[3]:
  key value_str  value_int
0   a        u1          0
1   a        u2          1
2   b        u3          2
3   b        u4          3
4   b        u5          4

When grouping by "key" and ranking "value_str", an error is raised:

In[4]: df.groupby("key")["value_str"].rank()  # error
Out[4]: 
Traceback (most recent call last):
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-5357c8abb14f>", line 1, in <module>
    df.groupby("key")["value_str"].rank()
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1906, in rank
    na_option=na_option, pct=pct, axis=axis)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1025, in _cython_transform
    **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2630, in transform
    return self._cython_operation('transform', values, how, axis, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2590, in _cython_operation
    **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2664, in _transform
    transform_func(result, values, comp_ids, is_datetimelike, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2479, in wrapper
    return f(afunc, *args, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2431, in <lambda>
    kwargs.get('na_option', 'keep')
TypeError: 'NoneType' object is not callable

However, when grouping by "key" and ranking "value_int", no error is raised:

In[10]: df.groupby("key")["value_int"].rank()

Out[10]: 
0    1.0
1    2.0
2    1.0
3    2.0
4    3.0
Name: value_int, dtype: float64

Ranking "value_str" directly, without grouping, does not raise an error either:

In[11]: df["value_str"].rank()

Out[11]: 
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
Name: value_str, dtype: float64
Labels: Duplicate, Groupby

All 6 comments

Lexicographical ranking is not supported, hence the error with groupby. IIRC there is already an issue open to make that consistent across GroupBy and Series objects.

Here's the original issue: #19560. It looks like there's a PR referenced there that hasn't been updated in a couple of months, so if you are interested, you can reach out to the author and try to push it over the finish line.

Closing this issue specifically, as it is a duplicate.
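For anyone who needs per-group ranks of a string column in the meantime, one possible workaround is to rank integer codes instead of the strings themselves. The sketch below is not taken from the linked issue or PR; it assumes that `pd.factorize(..., sort=True)` assigns codes in sorted (lexicographic) order, so ranking the codes within each group gives the same ordering as ranking the strings. It reuses the `df` from the bug description.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "value_str": ["u1", "u2", "u3", "u4", "u5"],
                   "value_int": range(5)})

# factorize(sort=True) assigns integer codes in sorted (lexicographic) order;
# equal strings share a code, and missing values get the code -1.
codes, uniques = pd.factorize(df["value_str"], sort=True)
codes = pd.Series(codes, index=df.index)

# Turn the -1 sentinel into NaN so missing strings stay unranked.
codes = codes.where(codes >= 0)

# Rank the numeric codes within each "key" group (numeric ranking is supported).
ranks = codes.groupby(df["key"]).rank()
ranks.name = "value_str"
print(ranks)
```

Because tied strings share a code, the default `method="average"` tie handling carries over unchanged. Whether this is preferable to waiting for the fix tracked in #19560 depends on your use case.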

@WillAyd Thanks very much.
In fact, this error appeared after I updated pandas from v0.20 to v0.23. There was no error in version 0.20...

I'm having the exact same issue as @xinai57 after upgrading pandas to v0.23.

Out of curiosity, what's the reasoning behind removing the ability to rank lexicographically?

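Hello, I ran into the same problem. Does df.groupby("key")["value_int"].rank() mean ordering the rows of each "key" group by the numeric value of the "value_int" column? If so, strings presumably can't be ordered that way, so is only numeric ranking supported?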
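To illustrate the semantics being asked about: the call assigns each row its rank among the rows that share the same "key", and strings can be ordered too, by lexicographic comparison. The pure-Python sketch below mimics the default average-rank behaviour for illustration only; `group_rank` is a hypothetical helper, not part of pandas and not how pandas implements ranking.

```python
from collections import defaultdict


def group_rank(keys, values):
    """Average rank (1-based) of each value within its own key group."""
    # Collect the row positions belonging to each group.
    positions = defaultdict(list)
    for i, k in enumerate(keys):
        positions[k].append(i)

    ranks = [None] * len(values)
    for idx in positions.values():
        # Sort the group's rows by value; Python compares str lexicographically.
        ordered = sorted(idx, key=lambda i: values[i])
        start = 0
        while start < len(ordered):
            # Extend `end` over the run of values tied with the one at `start`.
            end = start
            while (end + 1 < len(ordered)
                   and values[ordered[end + 1]] == values[ordered[start]]):
                end += 1
            # Tied rows share the mean of the 1-based positions they occupy.
            avg = ((start + 1) + (end + 1)) / 2
            for j in range(start, end + 1):
                ranks[ordered[j]] = avg
            start = end + 1
    return ranks


keys = ["a", "a", "b", "b", "b"]
print(group_rank(keys, [0, 1, 2, 3, 4]))                 # the "value_int" column
print(group_rank(keys, ["u1", "u2", "u3", "u4", "u5"]))  # the "value_str" column
```

Both calls produce the same ranks as Out[10] in the bug description, which is the result the original poster expected from the string column as well.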
