Pandas: BUG/VIS: groupby.hist/plot() should pass group keys as labels

Created on 6 Feb 2014 · 12 Comments · Source: pandas-dev/pandas

At least wherever possible.

In [1]: import numpy as np; import pandas as pd

In [2]: df = pd.DataFrame(np.random.randn(30, 2), columns=['A', 'B'])

In [3]: df['C'] = 15 * ['a'] + 15 * ['b']

In [4]: ax = df.groupby('C')['A'].hist()

[Figure: hist_leg]
Ideally the group keys would be the legend for the plot.
I can take this one.
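Later pandas releases added a `legend` keyword to `Series.hist`, which the docs list as new in 1.1.0. Assuming a pandas version that has it, a minimal sketch of the requested behaviour on the frame above might look like this. The `Agg` backend and the seeded generator are assumptions, included only so the snippet runs headless and reproducibly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; assumed so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

plt.close("all")
# Each group's sub-Series carries its group key as .name; with legend=True
# (assumes pandas >= 1.1.0) that name is used as the histogram label.
result = df.groupby("C")["A"].hist(legend=True, alpha=0.75)

# Every group is drawn on the same current Axes, so one legend lists all keys.
ax = plt.gca()
labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(labels)
```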

Labels: Bug, Enhancement, Groupby, Visualization


All 12 comments

I'm still working on this. The interface between the groupby and plotting methods is a bit messy.

Interestingly enough, df.groupby('C')['A'].hist() and df['A'].hist(by=df['C']) follow two completely different code paths, and produce different results.

>>> df.groupby('C')['A'].hist()
C
a    Axes(0.552174,0.15;0.347826x0.75)
b    Axes(0.552174,0.15;0.347826x0.75)
dtype: object

[Figure: group_then_hist]

and

>>> df['A'].hist(by=df['C'])
array([<matplotlib.axes.AxesSubplot object at 0x1111bba10>,
       <matplotlib.axes.AxesSubplot object at 0x1111ef710>], dtype=object)

[Figure: hist_then_group]
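You can check the difference between the two paths directly. This sketch assumes the non-interactive `Agg` backend and a seeded random frame shaped like the one in the issue. It counts the Axes produced by the groupby path and reads the subplot titles produced by the `by=` path. In the `by=` path pandas uses the group keys as subplot titles, not as legend entries.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; assumed so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

# Path 1: groupby, then hist. Each group is drawn onto the current Axes,
# so both histograms overlap in a single panel without a legend.
plt.close("all")
df.groupby("C")["A"].hist()
n_axes_grouped = len(plt.gcf().axes)

# Path 2: hist with by=. pandas creates one subplot per group and sets the
# group key as that subplot's title.
plt.close("all")
axes = df["A"].hist(by=df["C"])
titles = [ax.get_title() for ax in axes.ravel()]

print(n_axes_grouped, titles)
```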

Is there currently a hack to get a legend for plots like this (i.e. the first two plots where histograms are on the same axis)? At present I have no way of knowing which histogram belongs to which series.

Sorry, haven't gotten around to fixing this. My current workaround is to do the groupby and then iterate over the groups:

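import matplotlib.pyplot as plt

# df is the commenter's own frame with "age_bin" and "Impressions" columns
groups = df.groupby("age_bin")['Impressions']

fig, ax = plt.subplots()

# Draw every group on the shared Axes, labelled with its group key
for k, v in groups:
    v.hist(label=k, alpha=.75, ax=ax)

ax.legend()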

That will give you something like

[Figure: hist]
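Here is a self-contained version of the same loop, applied to the example frame from the top of the issue. It uses columns `A` and `C` in place of `Impressions` and `age_bin`. The seeded data and the `Agg` backend are assumptions, added so it runs as a script.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; assumed so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

fig, ax = plt.subplots()
# Iterating a SeriesGroupBy yields (key, sub-Series) pairs; passing the key
# as label= lets matplotlib's legend pick it up.
for key, values in df.groupby("C")["A"]:
    values.hist(label=key, alpha=0.75, ax=ax)
ax.legend()

labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(labels)
```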

@TomAugspurger fixable for 0.14? push? wont-fix?

@jreback pushing

ok... (of course, if you do fix it by release time, we can pull it forward)

@TomAugspurger this is also true for Groupby.plot(), which doesn't show a legend either.
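The loop-and-label workaround should carry over to line plots, because `Series.plot` also accepts `label=` and an `ax=` to draw on. Here is a minimal sketch, assuming the example frame from the issue (seeded) and the `Agg` backend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; assumed so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

fig, ax = plt.subplots()
# Draw each group's line on the shared Axes, using the group key as its label.
for key, values in df.groupby("C")["A"]:
    values.plot(ax=ax, label=key)
ax.legend()

labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(labels)
```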

Is this issue abandoned?

Still open. PRs welcome if it's something you'd use.

This is something I'd really like to see ... I took a look at the code, but I think the various code paths are far too convoluted for me to tackle.

However, I did embark on a journey of exploration and compiled all the possible ways to do this and how they behave out of the box, plus another workaround that turns out to function nicely but isn't necessarily obvious. It might be useful for anybody who wants to tackle this issue. It's in a spreadsheet that is accessible (I think) here.

The workaround involves pivot: df.pivot(values='A', columns='C').plot.hist(stacked=True).
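Here is the pivot workaround spelled out on the example frame. `pivot` turns each group key into its own column and leaves NaN in rows that belong to other groups. DataFrame plots then label columns in the legend by default. The seeded data and the `Agg` backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; assumed so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

# One column per group key; without index=, the existing row index is kept,
# so each row has a value in exactly one column and NaN in the other.
pivoted = df.pivot(columns="C", values="A")

# plot.hist drops the NaNs per column and labels each column in the legend.
ax = pivoted.plot.hist(stacked=True, alpha=0.75)
labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(pivoted.shape, labels)
```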

Is there a way to plot these histograms on a 3rd axis as a 3D plot or on subplots to have a better comparison of each?

@judimaci I haven't looked at this again since last year, but at that time at least it was actually quite easy to make subplots, just not intuitive how to do it. If you want to plot values of a column 'A' grouped by categories in column 'C', something along the lines of df.pivot(values='A', columns='C').plot.hist(subplots=True) should work.
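Here is the subplot variant under the same assumptions (seeded example frame, `Agg` backend). With `subplots=True`, pandas returns one Axes per pivoted column, and each Axes gets its own single-entry legend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; assumed so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

# One histogram subplot per group key instead of overlaying them.
axes = df.pivot(columns="C", values="A").plot.hist(subplots=True)

# Each subplot's legend holds just that column's (group's) label.
labels = [ax.get_legend().get_texts()[0].get_text() for ax in axes]
print(len(axes), labels)
```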
