Pandas: DataFrameGroupBy.quantile raises for non-numeric dtypes rather than dropping columns

Created on 13 Aug 2019 · 10 Comments · Source: pandas-dev/pandas

In pandas 0.24.x, we had

In [1]: import pandas as pd

In [2]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
Out[2]:
Empty DataFrame
Columns: []
Index: []

In 0.25.0, we have

In [3]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-8152ffc0932b> in <module>
----> 1 pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()

~/sandbox/pandas/pandas/core/groupby/groupby.py in quantile(self, q, interpolation)
   1908             post_processing=post_processor,
   1909             q=q,
-> 1910             interpolation=interpolation,
   1911         )
   1912

~/sandbox/pandas/pandas/core/groupby/groupby.py in _get_cythonized_result(self, how, grouper, aggregate, cython_dtype, needs_values, needs_mask, needs_ngroups, result_is_index, pre_processing, post_processing, **kwargs)
   2236                 vals = obj.values
   2237                 if pre_processing:
-> 2238                     vals, inferences = pre_processing(vals)
   2239                 func = partial(func, vals)
   2240

~/sandbox/pandas/pandas/core/groupby/groupby.py in pre_processor(vals)
   1875             if is_object_dtype(vals):
   1876                 raise TypeError(
-> 1877                     "'quantile' cannot be performed against 'object' dtypes!"
   1878                 )
   1879

TypeError: 'quantile' cannot be performed against 'object' dtypes!

This is most relevant for DataFrames with mixed dtypes:

In [6]: df = pd.DataFrame({"A": [0, 1], 'B': ['a', 'b']})

In [7]: df.groupby([0, 1]).quantile(0.5)

...

TypeError: 'quantile' cannot be performed against 'object' dtypes!
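Until this is resolved, one user-side workaround (a sketch, not something proposed in this thread) is to select the numeric columns yourself before grouping. The group labels are passed as a NumPy array here to make it explicit that they are row labels, not column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1], "B": ["a", "b"]})
keys = np.array([0, 1])  # one group label per row, as in the report

# Drop the non-numeric column "B" up front so quantile never sees it.
numeric = df.select_dtypes(include="number")
result = numeric.groupby(keys).quantile(0.5)
print(result)
```

Later pandas releases added a `numeric_only` argument to `GroupBy.quantile`; check the documentation for your version before relying on it.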

Bug Groupby Regression quantile

Source

TomAugspurger

Most helpful comment

Ideally, we would match the 0.24.2 behavior. That can roughly be described as "attempt the quantile, but skip any columns that raise an error". But that may not be easily doable with the new quantile implementation.

TomAugspurger on 19 Aug 2019
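A minimal sketch of that 0.24.2-style behavior, written as user code rather than as a change to pandas internals. `quantile_skip_errors` is a hypothetical helper; it only catches `TypeError` and `ValueError`, which is narrower than "skip any columns that raise an error", and it assumes `by` is an array of group labels (one per row):

```python
import numpy as np
import pandas as pd


def quantile_skip_errors(df, by, q=0.5):
    """Hypothetical helper: roughly mimic "attempt the quantile, but skip
    any columns that raise an error". `by` is an array of row labels."""
    grouped = df.groupby(by)
    pieces = {}
    for col in df.columns:
        try:
            pieces[col] = grouped[col].quantile(q)
        except (TypeError, ValueError):
            # e.g. string columns: skip them instead of failing the whole call.
            # (0.24.2 reportedly swallowed any exception; this is narrower.)
            continue
    # With no surviving columns this is an empty DataFrame, as in 0.24.2.
    return pd.DataFrame(pieces)


mixed = pd.DataFrame({"A": [0, 1], "B": ["a", "b"]})
out = quantile_skip_errors(mixed, np.array([0, 0]))
print(out)  # only column "A" survives
```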


All 10 comments

Unfortunately, we can't just exclude object dtype. We apparently used to attempt the quantile and catch any exceptions:

# in 0.24.2
In [3]: pd.DataFrame({"A": ['a', 'b']}, dtype=object).groupby([0, 0]).quantile()
Out[3]:
Empty DataFrame
Columns: []
Index: []

I don't know if that behavior is worth preserving.

TomAugspurger on 13 Aug 2019

What is the desired behavior here? Some possibilities:

1. Drop the columns that have an object data type;
2. Drop the columns that have an object data type, but warn the user about the use of quantile (similar to the RuntimeWarning NumPy emits when taking the log of an array that contains zero);
3. Do not drop the columns with object data type, but return them as columns of NaNs;
4. Raise an error and interrupt execution (the current 0.25.0 behavior).

Salompas on 16 Aug 2019
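To make the four options concrete, here is a sketch of a hypothetical `grouped_quantile` helper with an `on_object` switch; neither the function nor the parameter exists in pandas. It treats object columns (and pandas' `StringDtype` columns) as the ones to handle, and assumes `by` is an array of row-wise group labels rather than a column name:

```python
import warnings

import numpy as np
import pandas as pd
from pandas.api.types import is_object_dtype

_POLICIES = {"drop", "warn", "nan", "raise"}


def _is_text_like(dtype):
    # Object columns, plus pandas' dedicated string dtype.
    return is_object_dtype(dtype) or isinstance(dtype, pd.StringDtype)


def grouped_quantile(df, by, q=0.5, on_object="drop"):
    """Hypothetical helper: groupby quantile with a configurable policy
    for object (text) columns. `by` is an array of row labels."""
    if on_object not in _POLICIES:
        raise ValueError(f"on_object must be one of {sorted(_POLICIES)}")
    obj_cols = [c for c in df.columns if _is_text_like(df[c].dtype)]
    if obj_cols and on_object == "raise":
        # Option 4: fail loudly, like pandas 0.25.0.
        raise TypeError(
            f"'quantile' cannot be performed against 'object' dtypes: {obj_cols}"
        )
    if obj_cols and on_object == "warn":
        # Option 2: drop the columns, but tell the user.
        warnings.warn(
            f"dropping object-dtype columns {obj_cols} from quantile",
            UserWarning,
            stacklevel=2,
        )
    # Compute on the remaining columns (options 1 and 2, numeric part of 3).
    result = df.drop(columns=obj_cols).groupby(by).quantile(q)
    if on_object == "nan":
        # Option 3: keep the object columns, filled with NaN, in original order.
        for c in obj_cols:
            result[c] = np.nan
        result = result[list(df.columns)]
    return result
```

The "nan" policy keeps the output columns identical to the input at the cost of columns that carry no information, while "warn" is the closest match to the NumPy analogy in option 2.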

I'm not sure anyone will get to this before 0.25.1. I'll leave it at that milestone, but we can push if needed.

TomAugspurger on 19 Aug 2019

When will 0.25.1 be released? I will try to understand what is happening with the quantile function.

Salompas on 19 Aug 2019

0.25.1 is targeted for this Wednesday.

TomAugspurger on 19 Aug 2019

@WillAyd do you think this is doable for 1.0? You had a recent refactor for quantile, right?

TomAugspurger on 12 Nov 2019

Haven’t looked at this. I don’t object to pushing unless a community PR picks it up.
