In[1]: import pandas as pd
In[2]: df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                          "value_str": ["u1", "u2", "u3", "u4", "u5"],
                          "value_int": range(5)})
In[3]: df
Out[3]:
  key value_str  value_int
0   a        u1          0
1   a        u2          1
2   b        u3          2
3   b        u4          3
4   b        u5          4
When grouping by "key" and ranking "value_str", an error is raised:
In[4]: df.groupby("key")["value_str"].rank() # error
Out[4]:
Traceback (most recent call last):
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-5357c8abb14f>", line 1, in <module>
    df.groupby("key")["value_str"].rank()
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1906, in rank
    na_option=na_option, pct=pct, axis=axis)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1025, in _cython_transform
    **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2630, in transform
    return self._cython_operation('transform', values, how, axis, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2590, in _cython_operation
    **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2664, in _transform
    transform_func(result, values, comp_ids, is_datetimelike, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2479, in wrapper
    return f(afunc, *args, **kwargs)
  File "/Users/wangqian/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2431, in <lambda>
    kwargs.get('na_option', 'keep')
TypeError: 'NoneType' object is not callable
But if we group by "key" and rank "value_int", no error is raised:
In[10]: df.groupby("key")["value_int"].rank()
Out[10]:
0    1.0
1    2.0
2    1.0
3    2.0
4    3.0
Name: value_int, dtype: float64
If we rank "value_str" on its own, without groupby, no error is raised either:
In[11]: df["value_str"].rank()
Out[11]:
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
Name: value_str, dtype: float64
Lexicographical ranking is not supported by GroupBy.rank, hence the error with groupby. IIRC there is already an issue to make this behavior consistent across GroupBy and Series objects.
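As a possible workaround on the affected versions (a sketch, not an official fix), the per-group ranking can be routed around the failing code path. The first approach applies Series.rank, which does accept strings as In[11] shows, to each group through GroupBy.transform. The second replaces the strings with their global ranks, which preserve lexicographic order and ties, and then uses the numeric GroupBy.rank path that works for "value_int". The helper column name tmp is an arbitrary placeholder.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "value_str": ["u1", "u2", "u3", "u4", "u5"],
                   "value_int": range(5)})

# Workaround 1: call Series.rank (which handles strings) on each group.
ranks_transform = df.groupby("key")["value_str"].transform(lambda s: s.rank())

# Workaround 2: swap the strings for their global ranks (same order, same ties),
# then rank those numbers within each group via the numeric GroupBy.rank path.
# "tmp" is a hypothetical helper column name.
ranks_numeric = (df.assign(tmp=df["value_str"].rank())
                   .groupby("key")["tmp"].rank())

print(ranks_transform.tolist())  # [1.0, 2.0, 1.0, 2.0, 3.0]
print(ranks_numeric.tolist())    # [1.0, 2.0, 1.0, 2.0, 3.0]
```

The second variant keeps the vectorized groupby path, which may matter for many small groups, at the cost of one extra temporary column.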
Here's the original issue: #19560. It looks like there's a PR referenced there that hasn't been updated in a couple of months, so if you are interested, you can reach out to the author and try to push it over the finish line.
Closing this issue as a duplicate.
@WillAyd Thanks very much.
In fact, this error appeared after I updated pandas from v0.20 to v0.23. There was no error in version 0.20.
I'm having exactly the same issue as @xinai57 after upgrading pandas to v0.23.
Out of curiosity, what's the reasoning behind removing the ability to rank lexicographically?
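Hi, I've run into the same problem. Does df.groupby("key")["value_int"].rank() mean sorting the "key" column by the values in the "value_int" column? If so, strings presumably can't be sorted, right? Is only numeric sorting supported?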
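On the question above: GroupBy.rank does not reorder the "key" column; it ranks the values separately within each "key" group, so the ranks restart at 1 in every group (compare Out[10] with Out[11]). Below is a minimal plain-Python sketch of that computation, assuming the default method='average' tie handling and ignoring missing values. The helper name group_rank is hypothetical, not a pandas API. The ranking itself only needs values that can be compared, so the same logic applies to strings and numbers alike.

```python
from collections import defaultdict


def group_rank(keys, values):
    """Hypothetical helper: 1-based rank of each value within its key group.

    Tied values get the average of the ranks they span (like method='average').
    Missing values are not handled.
    """
    # Collect the row positions that belong to each group.
    groups = defaultdict(list)
    for pos, key in enumerate(keys):
        groups[key].append(pos)

    ranks = [0.0] * len(values)
    for positions in groups.values():
        # Order this group's rows by value; ranks restart at 1 for every group.
        order = sorted(positions, key=lambda p: values[p])
        start = 0
        while start < len(order):
            # Extend 'end' over a run of tied values.
            end = start
            while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
                end += 1
            # The tied rows share ranks start+1 .. end+1 equally.
            avg = (start + end + 2) / 2.0
            for j in range(start, end + 1):
                ranks[order[j]] = avg
            start = end + 1
    return ranks


keys = ["a", "a", "b", "b", "b"]
print(group_rank(keys, [0, 1, 2, 3, 4]))                 # [1.0, 2.0, 1.0, 2.0, 3.0]
print(group_rank(keys, ["u1", "u2", "u3", "u4", "u5"]))  # [1.0, 2.0, 1.0, 2.0, 3.0]
```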