At least wherever possible.
In [2]: df = pd.DataFrame(np.random.randn(30, 2), columns=['A', 'B'])
In [3]: df['C'] = 15 * ['a'] + 15 * ['b']
In [4]: ax = df.groupby('C')['A'].hist()
Ideally the group keys would be the legend for the plot.
I can take this one.
I'm still working on this. The interface between the groupby and plotting methods is a bit messy.
Interestingly enough, df.groupby('C')['A'].hist() and df['A'].hist(by=df['C']) follow two completely different code paths and produce different results.
>>> df.groupby('C')['A'].hist()
C
a Axes(0.552174,0.15;0.347826x0.75)
b Axes(0.552174,0.15;0.347826x0.75)
dtype: object
and
>>> df['A'].hist(by=df['C'])
array([<matplotlib.axes.AxesSubplot object at 0x1111bba10>,
<matplotlib.axes.AxesSubplot object at 0x1111ef710>], dtype=object)
Is there currently a hack to get a legend for plots like this (i.e. the first two plots where histograms are on the same axis)? At present I have no way of knowing which histogram belongs to which series.
Sorry, haven't gotten around to fixing this. My current workaround is to do the groupby and then iterate over the groups:
import matplotlib.pyplot as plt

groups = df.groupby("age_bin")['Impressions']
fig, ax = plt.subplots()
for k, v in groups:
    # draw each group on the shared axes, labelled with its group key
    v.hist(label=k, alpha=.75, ax=ax)
ax.legend()
That will give you something like
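For anyone who wants to try the loop on the example frame from the top of this issue, a minimal self-contained sketch follows. The seeded random generator, the Agg backend, and the `legend_labels` variable are assumptions added here for reproducibility and are not part of the original snippet.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the example frame from the issue (seeded for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

fig, ax = plt.subplots()
# Iterating over a SeriesGroupBy yields (group key, sub-Series) pairs.
for key, values in df.groupby("C")["A"]:
    # label= is forwarded to matplotlib, so the group key ends up in the legend.
    values.hist(label=key, alpha=0.75, ax=ax)
ax.legend(title="C")

# Collect the legend entries to confirm each histogram is labelled.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Both histograms land on the same axes, and the legend carries one entry per group key.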
@TomAugspurger fixable for 0.14? push? wont-fix?
@jreback pushing
OK (of course, if you do fix it by release time, we can pull it forward).
@TomAugspurger this is actually also true for Groupby.plot(), which also does not show a legend.
Is this issue abandoned?
Still open. PRs welcome if it's something you'd use.
This is something I'd really like to see ... I took a look at the code, but I think the various pathways are way too confounding for me to tackle.
However, I did embark on a journey of exploration and compiled all the possible ways to do this and how they behave out of the box, plus another workaround that turns out to function nicely but isn't necessarily obvious. It might be useful for anybody who wants to tackle this issue. It's in a spreadsheet that is accessible (I think) here.
The workaround involves pivot: df.pivot(values='A', columns='C').plot.hist(stacked=True).
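To show why the pivot helps, here is a rough sketch on the example frame from the top of the issue. The seeded generator, the Agg backend, and the names `wide` and `legend_labels` are assumptions for illustration. Pivoting turns each group key into its own column (with NaN wherever a row belongs to the other group), and the DataFrame plotting path then labels each column in the legend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
import pandas as pd

# Rebuild the example frame from the issue (seeded for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

# One column per group key; the original row index is kept,
# so rows belonging to the other group are NaN.
wide = df.pivot(columns="C", values="A")

# DataFrame.plot.hist draws one histogram per column on shared bins
# and adds a legend entry for each column name.
ax = wide.plot.hist(stacked=True, alpha=0.75)
legend_labels = sorted(text.get_text() for text in ax.get_legend().get_texts())
```

Dropping `stacked=True` overlays the histograms instead of stacking them; the legend is produced either way.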
Is there a way to plot these histograms on a 3rd axis as a 3D plot or on subplots to have a better comparison of each?
@judimaci I haven't looked at this again since last year, but at that time at least it was actually quite easy to make subplots, just not intuitive how to do it. If you want to plot values of a column 'A' grouped by categories in column 'C', something along the lines of df.pivot(values='A', columns='C').plot.hist(subplots=True) should work.
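A short sketch of that subplot variant on the example frame from the top of the issue is below. The seeded generator, the Agg backend, the `sharex=True` choice, and the names `wide` and `axes` are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
import pandas as pd

# Rebuild the example frame from the issue (seeded for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

# One column per group key, then one subplot per column.
wide = df.pivot(columns="C", values="A")
axes = wide.plot.hist(subplots=True, sharex=True)

# With subplots=True, pandas returns an array holding one Axes per column.
n_panels = len(axes)
```

Sharing the x axis keeps the panels directly comparable, since every group is then drawn over the same value range.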