Pandas: Feature request: add median, mode & number of unique entries to pandas.DataFrame.describe()

Created on 30 Apr 2014 · 6 Comments · Source: pandas-dev/pandas

Would come in handy for a number of different applications, from basic statistics, to understanding one's data, to estimating machine learning algorithmic load.

dupe of #2749

Enhancement Numeric Reshaping

tantrev

All 6 comments

FYI, you can easily make your own and patch it in. Just put this in your startup code or application.

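from pandas import Series

def describe(self):
    """ describe  of a series """
    # (label, value) pairs, listed in the order they should appear in the result
    l = [ ('nobs'  , len(self.index)),
          ('valid' , self.count()   ),
          ('mean'  , self.mean()    ),
          ('min'   , self.min()     ),
          ('max'   , self.max()     ),
          ('std'   , self.std()     ),
          ('10%'   , self.quantile(0.10)),
          ('25%'   , self.quantile(0.25)),
          ('50%'   , self.median()  ),
          ('75%'   , self.quantile(0.75)),
          ('90%'   , self.quantile(0.90)),
          ('skew'  , self.skew()    ),
          ('kurt'  , self.kurt()    ) ]
    # passing the labels as the index keeps the order above
    s = Series(dict(l), index = [ k for k, v in l ])
    # snap values that are numerically indistinguishable from zero to exactly 0.0
    s[s.abs()<0.000001] = 0.0
    return s

# monkey-patch: replaces the built-in Series.describe for the whole session
Series.describe = describe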
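
The same idea works without patching anything, by stacking extra rows under the built-in output. What follows is only a minimal sketch: `describe_extended` is a hypothetical helper name, it assumes the frame has at least one numeric column, and it reports only the first mode when a column has several.

```python
import pandas as pd

def describe_extended(df):
    """Hypothetical helper: built-in describe() plus mode and unique counts."""
    # Restrict to numeric columns so the extra rows line up with describe().
    num = df.select_dtypes(include="number")
    stats = num.describe()
    extra = pd.DataFrame({
        "mode": num.mode().iloc[0],   # first mode per column (assumption: ties ignored)
        "nunique": num.nunique(),     # number of distinct non-null values
    }).T
    return pd.concat([stats, extra])

df = pd.DataFrame({"a": [1, 2, 2, 4], "b": [1.0, 3.0, 5.0, 5.0]})
out = describe_extended(df)
```

The median needs no extra row here, since it is whatever `describe()` reports under `50%`.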

jreback on 30 Apr 2014

❤1

Median out of the box would really be a great thing to have!
Not just for Series but also for GroupBy objects.

soupault on 10 Feb 2016

Note that median is already in describe, as in 50%
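
A quick way to check this, for both a Series and a GroupBy, is sketched below. The data is made up purely for illustration.

```python
import pandas as pd

# Series: the "50%" row of describe() is the median.
s = pd.Series([1, 2, 2, 4, 10])
series_match = s.describe()["50%"] == s.median()

# GroupBy: describe() returns one row per group with a "50%" column.
df = pd.DataFrame({"g": ["x", "x", "x", "y", "y"], "v": [1, 2, 2, 4, 10]})
by_group = df.groupby("g")["v"]
group_match = (by_group.describe()["50%"] == by_group.median()).all()
```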

jorisvandenbossche on 10 Feb 2016

👍5

@jorisvandenbossche indeed! Sorry, my bad. :baby:

soupault on 10 Feb 2016

Hijacking this issue to propose a variant (and steal more from dplyr)

Why not have a DataFrame.agg that is identical to DataFrame.groupby.agg, but with a "single group"? This also goes along with the new .resample.agg, yay for synergy. Basically it takes a function/str or list of functions/strs or a dict of column names to functions/str and aggregates accordingly.
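
For what it's worth, later pandas releases (0.20 onward) do provide a `DataFrame.agg` that accepts these three input shapes. A minimal sketch with made-up data, not tied to whatever design was being discussed at the time:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4], "b": [1.0, 3.0, 5.0, 7.0]})

# A single function name: one value per column, returned as a Series.
one = df.agg("median")

# A list of names: one row per function, returned as a DataFrame.
many = df.agg(["median", "nunique"])

# A dict of column name -> function name: a different aggregate per column.
per_col = df.agg({"a": "nunique", "b": "median"})
```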