Pandas: BUG/VIS: groupby.hist/plot() should pass group keys as labels

Created on 6 Feb 2014 · 12 Comments · Source: pandas-dev/pandas

At least wherever possible.

In [2]: df = pd.DataFrame(np.random.randn(30, 2), columns=['A', 'B'])

In [3]: df['C'] = 15 * ['a'] + 15 * ['b']

In [4]: ax = df.groupby('C')['A'].hist()

[image: hist_leg]
Ideally the group keys would be the legend for the plot.
I can take this one.

Labels: Bug, Enhancement, Groupby, Visualization

TomAugspurger

👍6

Most helpful comment

This is something I'd really like to see ... I took a look at the code, but I think the various pathways are way too confounding for me to tackle.

However, I did embark on a journey of exploration and compiled all the possible ways to do this and how they behave out of the box, plus another workaround that turns out to function nicely but isn't necessarily obvious. It might be useful for anybody who wants to tackle this issue. It's in a spreadsheet that is accessible (I think) here.

The workaround involves pivot: df.pivot(values='A', columns='C').plot.hist(stacked=True).

Jeitan on 9 Sep 2017

👍5
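For anyone who wants to try the pivot workaround from the comment above, here is a minimal sketch. It assumes the toy df from the issue description (rebuilt here with a seeded generator so the snippet is self-contained) and a non-interactive matplotlib backend; the variable names wide and legend_labels are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Same shape of toy data as in the issue description.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=['A', 'B'])
df['C'] = 15 * ['a'] + 15 * ['b']

# One column per group key; rows that belong to the other group become NaN.
wide = df.pivot(columns='C', values='A')

# DataFrame.plot.hist labels each column, so the legend shows the group keys.
ax = wide.plot.hist(stacked=True, alpha=0.75)
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

The reshaping is what makes this work: after the pivot every group is its own column, and the DataFrame plotting path already uses column names as legend entries.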

All 12 comments

I'm still working on this. The interface between the groupby and plotting methods is a bit messy.

Interestingly enough, df.groupby('C')['A'].hist() and df['A'].hist(by=df['C']) follow two completely different code paths, and produce different results.

>>> df.groupby('C')['A'].hist()
C
a    Axes(0.552174,0.15;0.347826x0.75)
b    Axes(0.552174,0.15;0.347826x0.75)
dtype: object

[image: group_then_hist]

and

>>> df['A'].hist(by=df['C'])
array([<matplotlib.axes.AxesSubplot object at 0x1111bba10>,
       <matplotlib.axes.AxesSubplot object at 0x1111ef710>], dtype=object)

[image: hist_then_group]

TomAugspurger on 13 Feb 2014
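The difference between the two code paths can be reproduced with a short sketch like the one below. It assumes the toy df from the issue description and a headless backend; the names grouped and faceted are illustrative, and the exact return types may differ between pandas versions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=['A', 'B'])
df['C'] = 15 * ['a'] + 15 * ['b']

# Path 1: group first, then call hist on each group.
# The result is indexed by the group keys.
plt.figure()
grouped = df.groupby('C')['A'].hist()

# Path 2: let Series.hist do the grouping via `by`.
# The result is a collection of axes, one subplot per group.
faceted = df['A'].hist(by=df['C'])
n_facets = len(np.ravel(faceted))
```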

Is there currently a hack to get a legend for plots like this (i.e. the first two plots where histograms are on the same axis)? At present I have no way of knowing which histogram belongs to which series.

fonnesbeck on 19 Mar 2014

Sorry, haven't gotten around to fixing this. My current workaround is to do the groupby and then iterate over the groups:

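import matplotlib.pyplot as plt

# Select the column to plot from each group.
groups = df.groupby("age_bin")['Impressions']

fig, ax = plt.subplots()

# Draw every group on the same axes, using the group key as the label.
for k, v in groups:
    v.hist(label=k, alpha=.75, ax=ax)

ax.legend()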

That will give you something like

[image: hist]

TomAugspurger on 19 Mar 2014

👍4
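The loop workaround above can be wrapped in a small function for reuse. The following is only a sketch: hist_by_group is a hypothetical helper name, not a pandas API, and it is shown against the toy df from the issue description rather than the age_bin/Impressions data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def hist_by_group(data, by, column, **hist_kwargs):
    """Hypothetical helper: overlay one histogram per group on a single axes."""
    fig, ax = plt.subplots()
    for key, values in data.groupby(by)[column]:
        # The group key becomes the label that the legend displays.
        values.hist(label=str(key), ax=ax, **hist_kwargs)
    ax.legend(title=by)
    return ax


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=['A', 'B'])
df['C'] = 15 * ['a'] + 15 * ['b']

ax = hist_by_group(df, by='C', column='A', alpha=0.75)
labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

The helper creates its own figure on every call, which keeps the axes it draws on tied to the current figure.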

@TomAugspurger fixable for 0.14? push? wont-fix?

jreback on 6 Apr 2014

@jreback pushing

TomAugspurger on 28 Apr 2014

ok......(of course if you do fix by release time, then can pull forward)

jreback on 28 Apr 2014

@TomAugspurger this is actually also true for Groupby.plot(), which does not show a legend either

jorisvandenbossche on 12 Mar 2015
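The same iterate-over-groups idea seems to carry over to line plots. This is a sketch under the assumption that plotting each group with Series.plot and an explicit label is acceptable; it uses the toy df from the issue description and is not the behavior of Groupby.plot() itself.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=['A', 'B'])
df['C'] = 15 * ['a'] + 15 * ['b']

fig, ax = plt.subplots()

# Plot each group as its own line and pass the group key as the label.
for key, values in df.groupby('C')['A']:
    values.plot(ax=ax, label=key)

ax.legend()
line_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```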

Is this issue abandoned?

mattayes on 22 Nov 2016

Still open. PRs welcome if it's something you'd use.

TomAugspurger on 24 Nov 2016

Is there a way to plot these histograms on a 3rd axis as a 3D plot or on subplots to have a better comparison of each?

judimaci on 20 Jun 2018

@judimaci I haven't looked at this again since last year, but at that time at least it was actually quite easy to make subplots, just not intuitive how to do it. If you want to plot values of a column 'A' grouped by categories in column 'C', something along the lines of df.pivot(values='A', columns='C').plot.hist(subplots=True) should work.

Jeitan on 25 Oct 2018

👍1
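A minimal sketch of the subplots variant mentioned above, again assuming the toy df from the issue description and a headless backend. The sharex=True argument is an addition for easier comparison and is not part of the original suggestion.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=['A', 'B'])
df['C'] = 15 * ['a'] + 15 * ['b']

# One column per group key, then one subplot per column.
wide = df.pivot(columns='C', values='A')
axes = wide.plot.hist(subplots=True, sharex=True)

# Each subplot carries its own legend entry with the group key.
subplot_labels = [
    [text.get_text() for text in ax.get_legend().get_texts()] for ax in axes
]
```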
