In pandas 0.24.x, we had:
In [1]: import pandas as pd
In [2]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
Out[2]:
Empty DataFrame
Columns: []
Index: []
In 0.25.0, we have:
In [3]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-8152ffc0932b> in <module>
----> 1 pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
~/sandbox/pandas/pandas/core/groupby/groupby.py in quantile(self, q, interpolation)
1908 post_processing=post_processor,
1909 q=q,
-> 1910 interpolation=interpolation,
1911 )
1912
~/sandbox/pandas/pandas/core/groupby/groupby.py in _get_cythonized_result(self, how, grouper, aggregate, cython_dtype, needs_values, needs_mask, needs_ngroups, result_is_index, pre_processing, post_processing, **kwargs)
2236 vals = obj.values
2237 if pre_processing:
-> 2238 vals, inferences = pre_processing(vals)
2239 func = partial(func, vals)
2240
~/sandbox/pandas/pandas/core/groupby/groupby.py in pre_processor(vals)
1875 if is_object_dtype(vals):
1876 raise TypeError(
-> 1877 "'quantile' cannot be performed against 'object' dtypes!"
1878 )
1879
TypeError: 'quantile' cannot be performed against 'object' dtypes!
This is most relevant for mixed dataframes, where a single object column makes the whole call fail:
In [6]: df = pd.DataFrame({"A": [0, 1], 'B': ['a', 'b']})
In [7]: df.groupby([0, 1]).quantile(0.5)
...
TypeError: 'quantile' cannot be performed against 'object' dtypes!
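A possible workaround for mixed frames, until the behavior is settled, is to drop the non-numeric columns yourself before grouping. This is only a sketch and assumes you are fine with losing the object columns entirely; the variable names `numeric` and `result` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1], "B": ["a", "b"]})

# Keep only the numeric columns, so quantile never sees the string column.
numeric = df.select_dtypes(include="number")

# Same grouping keys as in the report above: one group per row.
result = numeric.groupby([0, 1]).quantile(0.5)
print(result)
```

This does not restore the 0.24.x behavior in general, since it filters by dtype up front rather than by whether the quantile actually succeeds.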
Unfortunately, we can't just exclude object dtype. We apparently used to attempt the quantile on every column and catch any exceptions:
# in 0.24.2
In [3]: pd.DataFrame({"A": ['a', 'b']}, dtype=object).groupby([0, 0]).quantile()
Out[3]:
Empty DataFrame
Columns: []
Index: []
I don't know if that behavior is worth preserving.
What is the desired behavior here? Some possibilities:
object data type;
object data type, but warn the user about the use of quantile (something similar to numpy warnings when applying log to a list that contains zero);
object data type, but return a column with NaNs;
Ideally, we would match the 0.24.2 behavior. That can roughly be described as "attempt the quantile, but skip any columns that raise an error". But that may not be easily doable with the new quantile implementation.
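To make "attempt the quantile, but skip any columns that raise an error" concrete, here is a rough user-level sketch. It is not how pandas implemented this in 0.24.2 or how a fix would look internally; `groupby_quantile_skip_errors` is a hypothetical helper, and catching only `TypeError` is an assumption based on the traceback above.

```python
import pandas as pd


def groupby_quantile_skip_errors(df, keys, q=0.5):
    """Hypothetical helper: group quantile per column, skipping columns that fail."""
    pieces = {}
    for name in df.columns:
        try:
            # Try the quantile on one column at a time.
            pieces[name] = df[name].groupby(keys).quantile(q)
        except TypeError:
            # Assumption: failing columns raise TypeError, as in the traceback.
            continue
    if not pieces:
        # Nothing survived, so return an empty frame (as 0.24.x did).
        return pd.DataFrame()
    return pd.concat(pieces, axis=1)


df = pd.DataFrame(
    {"A": [0, 1], "B": pd.Series(["a", "b"], dtype=object)}
)
result = groupby_quantile_skip_errors(df, [0, 1], q=0.5)
print(result)

# A frame with only object columns yields an empty result.
empty = groupby_quantile_skip_errors(df[["B"]], [0, 0])
print(empty)
```

Going column by column is slower than a single cythonized call, which is probably part of why this may not map easily onto the new implementation.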
I'm not sure anyone will get to this before 0.25.1. I'll leave it at that milestone, but we can push if needed.
When will 0.25.1 be released? I will try to understand what is happening with the quantile function.
0.25.1 is targeted for this Wednesday.
@WillAyd do you think this is doable for 1.0? You had a recent refactor for quantile, right?
Haven’t looked at this. I don’t object to pushing unless a community PR picks it up.
Looks like 0.25.3 still has the non-numeric issue...
Pushing.