Dask: [Feature request] aggregate syntax and quantile computation

Created on 6 Mar 2020 · 2 comments · Source: dask/dask

Hi,

The Dask API provides a method to compute quantiles of a DataFrame or Series:
https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.quantile

This is already very useful, since computing quantiles in a distributed setting is known to be challenging.
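Why it is hard can be seen in a minimal pure-Python sketch (no Dask involved; the split of the data into two partitions is made up for illustration). A mean can be rebuilt exactly from small per-partition partials, one (sum, count) pair each, but combining per-partition medians does not give the overall median, so an exact quantile needs all of a group's values in one place (or an approximate summary of them).

```python
import math
import statistics

# Hypothetical data split across two partitions, as a distributed
# dataframe would hold it.
partitions = [[1, 2, 3, 10], [4, 5, 6]]
all_values = [v for part in partitions for v in part]

# Mean: each partition reports a tiny (sum, count) partial, and the
# partials merge into the exact global mean.
partials = [(sum(part), len(part)) for part in partitions]
total, count = map(sum, zip(*partials))
merged_mean = total / count
assert math.isclose(merged_mean, statistics.mean(all_values))

# Median (the 0.5 quantile): the median of per-partition medians is
# generally NOT the median of the full data.
median_of_medians = statistics.median(statistics.median(p) for p in partitions)
true_median = statistics.median(all_values)

print(merged_mean, median_of_medians, true_median)  # medians: 3.75 vs 4
```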

Unfortunately, quantile computation is not available through the groupby aggregate API:
https://docs.dask.org/en/latest/dataframe-groupby.html#aggregate

Would it be possible to support quantile computation in Dask's aggregate syntax?

All 2 comments

Can you provide a snippet with the input data and expected output? Is this in a groupby context, i.e., the Dask version of the following?

In [12]: df = pd.DataFrame({"A": ['a', 'a', 'b'], "B": [1, 2, 3], "C": [4, 5, 4]})

In [13]: df.groupby("A").quantile()
Out[13]:
     B    C
A
a  1.5  4.5
b  3.0  4.0

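One slight complication is that quantile isn't always an aggregation: given a list of quantiles, it returns one row per group and quantile rather than a single row per group.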

In [14]: df.groupby("A").quantile([0.5, 0.75])
Out[14]:
           B     C
A
a 0.50  1.50  4.50
  0.75  1.75  4.75
b 0.50  3.00  4.00
  0.75  3.00  4.00

It may still be doable, though.
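A minimal plain-Python sketch of why the list case is mostly a question of result shape (assumptions: `linear_quantile` is a hypothetical helper reproducing linear interpolation, the pandas default; the group values are copied from `df` above). Computing the quantiles in "long" form gives one row per (group, q), while spreading the quantiles across columns gives one row per group, which is the shape an aggregation has to return.

```python
def linear_quantile(values, q):
    """Hypothetical helper: quantile with linear interpolation between
    the two closest ranks (the default interpolation in pandas/NumPy)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Column values per group, copied from the df in the comment above.
groups = {"a": {"B": [1, 2], "C": [4, 5]}, "b": {"B": [3], "C": [4]}}
qs = [0.5, 0.75]

# "Long" shape, like groupby("A").quantile([0.5, 0.75]): one row per (group, q).
long_rows = {
    (g, q): {col: linear_quantile(vals, q) for col, vals in cols.items()}
    for g, cols in groups.items()
    for q in qs
}

# "Wide" shape: one row per group, one column per (column, q) pair,
# i.e. the shape an aggregation result needs.
wide_rows = {
    g: {(col, q): linear_quantile(vals, q) for col, vals in cols.items() for q in qs}
    for g, cols in groups.items()
}

print(long_rows[("a", 0.75)])  # {'B': 1.75, 'C': 4.75}
```

In pandas, calling `.unstack()` on the result of `df.groupby("A").quantile([0.5, 0.75])` should produce the same wide layout, with (column, quantile) pairs as columns.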

Thank you for your answer.

Correct, this is in a groupby context.

Here is a snippet of what I would do in pure pandas:

In [3]: import pandas as pd

    def q25(s):
        return s.quantile(0.25)

    df_pandas = pd.DataFrame({"car": [1, 1, 2, 4, 4, 4], "speed": [1, 2, 3, 4, 5, 6]})
    df_pandas.groupby("car").speed.agg(["mean", q25])                       

Out[3]:
     mean   q25
car
1     1.5  1.25
2     3.0  3.00
4     5.0  4.50
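For the `car`/`speed` example, an exact grouped quantile can be sketched with the three-step structure that Dask's custom aggregations use: a per-partition `chunk`, a cross-partition `agg`, and a per-group `finalize`. This is plain Python, not Dask: the partition split and the function bodies are illustrative assumptions, and `linear_quantile` is a hypothetical helper matching pandas' default linear interpolation.

```python
from collections import defaultdict


def linear_quantile(values, q):
    """Hypothetical helper: linear interpolation between closest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical split of the (car, speed) rows above into two partitions.
partitions = [
    [(1, 1), (1, 2), (2, 3), (4, 4)],
    [(4, 5), (4, 6)],
]


def chunk(rows):
    """Per partition: collect the speeds seen for each car."""
    out = defaultdict(list)
    for car, speed in rows:
        out[car].append(speed)
    return out


def agg(chunks):
    """Across partitions: concatenate the per-car lists."""
    merged = defaultdict(list)
    for part in chunks:
        for car, speeds in part.items():
            merged[car].extend(speeds)
    return merged


def finalize(merged):
    """Per group: reduce the full list of speeds to mean and 25th percentile."""
    return {
        car: {"mean": sum(s) / len(s), "q25": linear_quantile(s, 0.25)}
        for car, s in sorted(merged.items())
    }


result = finalize(agg(chunk(p) for p in partitions))
for car, row in result.items():
    print(car, row["mean"], row["q25"])
# 1 1.5 1.25
# 2 3.0 3.0
# 4 5.0 4.5
```

The design choice here is exactness: every speed of a group ends up in one list, so memory per group grows with the group size. Dask's `dd.Aggregation` takes `chunk`, `agg` and `finalize` callables in similar roles, but they receive pandas groupby objects and Series rather than dicts, so this code is not a drop-in Dask aggregation. Approximate quantile summaries such as t-digest are the usual way to bound that memory at the cost of exactness.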
