Hi Michael,
Just curious if you ever plan to add "hue" to distplot (and maybe also jointplot)?
For some analyses, it's useful to have a histogram further segmented by another categorical variable, for instance:
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset("iris")
sns.set_style("whitegrid")
fig = plt.figure()
# Histogram using Seaborn
# ax_hist = fig.add_subplot(212)
# ax_hist.set_ylabel("Entity Count")
# sns.distplot(
#     iris.petal_length,
#     hist=True,
#     kde=False,
#     rug=False,
#     bins=range(18, 39, 1),
#     hist_kws={
#         "cumulative": False,
#         "stacked": True,
#         "label": ["setosa", "versicolor", "virginica"],
#         "color": ["b", "g", "r"]},
#     kde_kws={
#         "cumulative": True},
#     ax=ax_hist)
# Histogram using matplotlib
ax_hist = fig.add_subplot(212)
ax_hist.set_ylabel("Entity Count")
ax_hist.hist(
    [iris[iris.species == "setosa"].sepal_width,
     iris[iris.species == "versicolor"].sepal_width,
     iris[iris.species == "virginica"].sepal_width],
    stacked=True,
    rwidth=1,
    label=["setosa", "versicolor", "virginica"],
    color=["g", "b", "r"],
    alpha=0.7)
# Cumulative density curve
ax_kde = ax_hist.twinx()
ax_kde.set_ylabel("Cumulative Density (%)")
sns.distplot(
    iris.sepal_width,
    kde=True,
    hist=False,
    kde_kws={
        "cumulative": True},
    color="black",
    ax=ax_kde)
# ax_kde.yaxis.tick_right()
# ax_kde.yaxis.set_ticks_position('right')
ax_kde.grid(None)
# Box plot on top
ax_box = fig.add_subplot(211, sharex=ax_hist)
sns.boxplot(
    iris.sepal_width,
    color="r",
    ax=ax_box)
fig.set_size_inches(10, 10)
# sns.despine()
This is quite easy in matplotlib, but it's hard to maintain visual consistency when blending Seaborn and native matplotlib charts.
I have tried various approaches to tinkering with distplot, to no avail. Please advise.
Thank you!
Yes, this would definitely make for a fantastic addition to Seaborn - I was trying to do this just now, but I don't have @jameshu2008's skill with Matplotlib.
Maybe eventually, but not in the near future.
If you want a hack for this for now, you can do this by passing a single column into sns.pairplot:
df = pd.DataFrame(np.random.randint(0, 10, (100, 2)), columns=["x", "y"])
sns.pairplot(df[["x", "y"]], hue="y", size=5)
However, I've found this only works sometimes--sometimes it tries to print the hue column anyway.
Better to use a FacetGrid.
g = sns.FacetGrid(df_rtn, hue="group")
g = g.map(sns.kdeplot, "variable")
or
g = sns.FacetGrid(df_rtn, hue="group")
g = g.map(sns.distplot, "variable")
Unfortunately that solution does not generate equal-sized bins between both groups.
@twiecki just add one line to get equal-sized bins between both groups:
_, bins = np.histogram(df["variable"])
g = sns.FacetGrid(df, hue="group")
g = g.map(sns.distplot, "variable", bins=bins)
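To illustrate why sharing the edges helps, here is a minimal NumPy-only sketch. The two groups are made-up data, not taken from this thread: the bin edges are computed once from the pooled values and then reused for each group, so the per-group histograms line up bar for bar.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two groups with different locations, spreads and sizes
group_a = rng.normal(0.0, 1.0, size=200)
group_b = rng.normal(3.0, 2.0, size=100)

# Compute the bin edges once, from the pooled data
pooled = np.concatenate([group_a, group_b])
edges = np.histogram_bin_edges(pooled, bins=10)

# Reuse the same edges for every group so the bins are directly comparable
counts_a, edges_a = np.histogram(group_a, bins=edges)
counts_b, edges_b = np.histogram(group_b, bins=edges)
```

Because the edges span the pooled range, every observation of every group falls into some bin.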

@mwaskom wrote: "Maybe eventually, but not in the near future."
That would be cool. I don't know if enough time has passed yet:) And in the end the FacetGrid works nicely, it's just difficult to set up the first time.
Three years have passed. Any chance this can be added?
No. To be honest, comments like that decrease my interest in doing so, rather than increase it.
I spent some time adding some features to the workaround (robustness to missing values, legend). Maybe someone else also finds it useful:
def distplot_with_hue(data=None, x=None, hue=None, row=None, col=None, legend=True, **kwargs):
    _, bins = np.histogram(data[x].dropna())
    g = sns.FacetGrid(data, hue=hue, row=row, col=col)
    g.map(sns.distplot, x, **kwargs)
    if legend and (hue is not None) and (hue not in [x, row, col]):
        g.add_legend(title=hue)
Example usage:
titanic = sns.load_dataset('titanic')
distplot_with_hue(data=titanic, x='age', hue='sex', hist=False)
Or faceted on the class:
distplot_with_hue(data=titanic, x='age', hue='sex', hist=False, col='class')
@lbalazscs: aren't you missing the bins=bins part in map()?
Yeah, I actually made an improved version, but I didn't want to post it here until I was 100% happy with it, and then I got distracted by another project... Anyway, here is my best workaround. The biggest limitation is that the areas under the KDE curves are each normalized to one independently, which might give the wrong impression if the compared groups have different sizes. It would be nice to have something like the scale and scale_hue options of violinplot.
def distplot_fig(data, x, hue=None, row=None, col=None, legend=True, hist=False, **kwargs):
    """A figure-level distribution plot with support for hue, col, row arguments."""
    bins = kwargs.pop('bins', None)
    if (bins is None) and hist:
        # Make sure that the groups have equal-sized bins
        bins = np.histogram_bin_edges(data[x].dropna())
    g = sns.FacetGrid(data, hue=hue, row=row, col=col)
    g.map(sns.distplot, x, bins=bins, hist=hist, **kwargs)
    if legend and (hue is not None) and (hue not in [x, row, col]):
        g.add_legend(title=hue)
    return g
Example:
g = distplot_fig(data=titanic, x='age', hue='sex', col='class')
The problem is that this is IMHO a misleading plot, because in reality there were about twice as many males as females and more people in third class, but you couldn't tell that from this plot. The distortion doesn't appear with kde=False and hist=True, but the overlapping histograms might be harder to interpret.
g = distplot_fig(data=titanic, x='age', hue='sex', col='class', kde=False, hist=True)
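One possible way around the independent normalization, sketched here with NumPy only and made-up group sizes (not the titanic data): scale each group's density by that group's share of all observations, so the areas reflect relative group size. This is an assumption about what a "scaled" mode could look like, not something distplot offers. A KDE curve could presumably be multiplied by the same factor.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical groups: one is twice as large as the other
large = rng.normal(30.0, 10.0, size=400)
small = rng.normal(28.0, 10.0, size=200)

# Shared bin edges computed from the pooled data
edges = np.histogram_bin_edges(np.concatenate([large, small]), bins=16)
widths = np.diff(edges)
total = len(large) + len(small)

def scaled_density(values):
    # density=True normalizes this group's own area to 1 ...
    density, _ = np.histogram(values, bins=edges, density=True)
    # ... so multiply by the group's share of all observations
    return density * len(values) / total

dens_large = scaled_density(large)
dens_small = scaled_density(small)

# Area under each scaled histogram equals the group's share of the data
area_large = float(np.sum(dens_large * widths))
area_small = float(np.sum(dens_small * widths))
```

With this scaling the two areas add up to one in total, and the larger group visibly takes up two thirds of it.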
Stacked histograms would be fine, but I found no way to combine matplotlib's stacked histograms with FacetGrid. With pure pandas it is possible to have stacked histograms, but then there is no hue parameter:
ax = titanic.pivot(columns='sex')['age'].plot(kind = 'hist', stacked=True, bins=16)
If anyone has ideas, I would love to hear them!
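One idea, as a rough sketch rather than a tested FacetGrid integration: compute the per-group counts on shared bins yourself and stack them by hand. The helper name stacked_counts and the toy data below are hypothetical. The function returns, for each hue level, the bar heights and the "bottom" offsets needed to stack them.

```python
import numpy as np

def stacked_counts(values, labels, bins=10):
    """Per-group histogram counts on shared bins, plus offsets for stacking."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    # Drop missing values together with their labels
    keep = ~np.isnan(values)
    values, labels = values[keep], labels[keep]
    # Shared edges so every group uses identical bins
    edges = np.histogram_bin_edges(values, bins=bins)
    levels = sorted(set(labels.tolist()))
    counts = np.array(
        [np.histogram(values[labels == level], bins=edges)[0] for level in levels]
    )
    # Each group's bars start where the previous groups' bars end
    bottoms = np.cumsum(counts, axis=0) - counts
    return edges, levels, counts, bottoms

# Toy data (hypothetical), with one missing value
values = [1, 2, 2, 3, np.nan, 4, 4, 5]
labels = ["a", "b", "a", "b", "a", "a", "b", "b"]
edges, levels, counts, bottoms = stacked_counts(values, labels, bins=4)
```

Each row i could then be drawn with matplotlib as ax.bar(edges[:-1], counts[i], width=np.diff(edges), bottom=bottoms[i], align="edge", label=levels[i]), once per facet.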
The following also works, using the iris dataset as an example:
sns.FacetGrid(iris, hue="species", size=5).map(sns.distplot, "petal_length")
Closed with #2125.
I'd like to remind everyone that dropping into a stale issue on an open source project and demanding that other people work to resolve it is rude behavior.