Pandas: DataFrameGroupBy.quantile raises for non-numeric dtypes rather than dropping columns

Created on 13 Aug 2019 · 10 Comments · Source: pandas-dev/pandas

In pandas 0.24.x, we had

In [1]: import pandas as pd

In [2]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
Out[2]:
Empty DataFrame
Columns: []
Index: []

In 0.25.0, we have

In [3]: pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-8152ffc0932b> in <module>
----> 1 pd.DataFrame({"A": ['a', 'b']}).groupby([0, 0]).quantile()

~/sandbox/pandas/pandas/core/groupby/groupby.py in quantile(self, q, interpolation)
   1908             post_processing=post_processor,
   1909             q=q,
-> 1910             interpolation=interpolation,
   1911         )
   1912

~/sandbox/pandas/pandas/core/groupby/groupby.py in _get_cythonized_result(self, how, grouper, aggregate, cython_dtype, needs_values, needs_mask, needs_ngroups, result_is_index, pre_processing, post_processing, **kwargs)
   2236                 vals = obj.values
   2237                 if pre_processing:
-> 2238                     vals, inferences = pre_processing(vals)
   2239                 func = partial(func, vals)
   2240

~/sandbox/pandas/pandas/core/groupby/groupby.py in pre_processor(vals)
   1875             if is_object_dtype(vals):
   1876                 raise TypeError(
-> 1877                     "'quantile' cannot be performed against 'object' dtypes!"
   1878                 )
   1879

TypeError: 'quantile' cannot be performed against 'object' dtypes!

This is most relevant for mixed-dtype DataFrames, where a single object column makes the whole call fail:

In [6]: df = pd.DataFrame({"A": [0, 1], 'B': ['a', 'b']})

In [7]: df.groupby([0, 1]).quantile(0.5)

...

TypeError: 'quantile' cannot be performed against 'object' dtypes!
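Until the regression is fixed, one workaround is to select only the numeric columns before calling `quantile`. A minimal sketch, assuming a recent pandas and an explicit grouping column named `key` (the column name and data are illustrative, not from the report):

```python
import pandas as pd

# Mixed-dtype frame similar to the one in the report, plus an explicit
# grouping column ("key" is an illustrative name).
df = pd.DataFrame(
    {
        "key": ["x", "x", "y"],
        "A": [0, 1, 2],
        "B": pd.Series(["a", "b", "c"], dtype=object),
    }
)

# Keep only numeric value columns, then compute the per-group median.
numeric_cols = [c for c in df.select_dtypes(include="number").columns if c != "key"]
result = df.groupby("key")[numeric_cols].quantile(0.5)
print(result)
```

This silently leaves out `B`, which is roughly what 0.24.2 produced for this frame.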

Labels: Bug, Groupby, Regression, quantile

All 10 comments

Unfortunately, we can't just exclude object dtype. In 0.24.2 we apparently attempted the quantile and caught any exceptions:

# in 0.24.2
In [3]: pd.DataFrame({"A": ['a', 'b']}, dtype=object).groupby([0, 0]).quantile()
Out[3]:
Empty DataFrame
Columns: []
Index: []

I don't know if that behavior is worth preserving.

What is the desired behavior here? Some possibilities:

  • Drop the columns that have an object data type;
  • Drop the columns that have an object data type, but warn the user about the use of quantile (similar to the warning NumPy emits when applying log to an array that contains zero);
  • Do not drop the column with object data type, but return a column with NaNs;
  • Interrupt execution.
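The four options above can be compared side by side with a single switch. This is a hypothetical illustration only: `groupby_quantile` and its `how` parameter are not part of pandas, and the sketch decides purely by dtype (`is_object_dtype`), which is not exactly what 0.24.2 did:

```python
import warnings

import numpy as np
import pandas as pd
from pandas.api.types import is_object_dtype


def groupby_quantile(df, by, q=0.5, how="drop"):
    """Hypothetical helper (not a pandas API) illustrating the options.

    how="drop":  silently drop object-dtype columns
    how="warn":  drop them, but emit a UserWarning
    how="nan":   keep them as all-NaN columns
    how="raise": raise TypeError, like 0.25.0
    """
    if how not in {"drop", "warn", "nan", "raise"}:
        raise ValueError(f"unknown how={how!r}")

    value_cols = [c for c in df.columns if c != by]
    obj_cols = [c for c in value_cols if is_object_dtype(df[c])]
    num_cols = [c for c in value_cols if c not in obj_cols]

    if obj_cols and how == "raise":
        raise TypeError(f"'quantile' cannot be performed against 'object' dtypes: {obj_cols}")
    if obj_cols and how == "warn":
        warnings.warn(f"quantile: dropping object columns {obj_cols}", UserWarning)

    grouped = df.groupby(by)
    if num_cols:
        result = grouped[num_cols].quantile(q)
    else:
        # No numeric columns: an empty frame indexed by the group keys.
        result = pd.DataFrame(index=grouped.size().index)

    if how == "nan":
        for c in obj_cols:
            result[c] = np.nan
        result = result[value_cols]  # restore the original column order
    return result
```

A dtype check is cheap and predictable, but it cannot tell an object column holding numbers from one holding strings, which is the concern raised about simply excluding object dtype.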

Ideally, we would match the 0.24.2 behavior. That can roughly be described as "attempt the quantile, but skip any columns that raise an error". But that may not be easily doable with the new quantile implementation.
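The "attempt the quantile, but skip any columns that raise" behavior can be approximated from user code by computing the quantile column by column. A minimal sketch, assuming the failure surfaces as a `TypeError` as in the traceback above (0.24.2 reportedly caught any exception; this narrows that to `TypeError`). `quantile_skip_failures` is a hypothetical helper, not a pandas API:

```python
import pandas as pd


def quantile_skip_failures(df, by, q=0.5):
    """Hypothetical helper: try the grouped quantile on each value column
    and skip any column whose quantile raises TypeError."""
    grouped = df.groupby(by)
    pieces = {}
    for col in df.columns:
        if col == by:
            continue  # don't aggregate the grouping column itself
        try:
            pieces[col] = grouped[col].quantile(q)
        except TypeError:
            # e.g. "'quantile' cannot be performed against 'object' dtypes!"
            continue
    # An empty dict gives an empty DataFrame, like the 0.24.2 output above.
    return pd.DataFrame(pieces)
```

Looping per column is slower than a single vectorized call, which is presumably why this is hard to reproduce inside the new implementation.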

I'm not sure anyone will get to this before 0.25.1. I'll leave it at that milestone, but we can push if needed.

When will 0.25.1 be released? I will try to understand what is happening with the quantile function.

0.25.1 is targeted for this Wednesday.

@WillAyd do you think this is doable for 1.0? You had a recent refactor for quantile right?

Haven’t looked at this. I don’t object to pushing unless a community PR picks it up.

Looks like 0.25.3 still has the non-numeric issue...

Pushing.
