Seaborn: Distplot Supporting "Hue"?

Created on 12 Feb 2016 · 15 Comments · Source: mwaskom/seaborn

Hi Michael,

Just curious if you ever plan to add "hue" to distplot (and maybe also jointplot)?

For some analyses, it's useful to have a histogram further segmented by another categorical variable, for instance:

import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")

sns.set_style("whitegrid")
fig = plt.figure()

# Histogram using Seaborn
# ax_hist = fig.add_subplot(212)
# ax_hist.set_ylabel("Entity Count")
# sns.distplot(
#     iris.petal_length,
#     hist=True,
#     kde=False,
#     rug=False,
#     bins=range(18,39,1),
#     hist_kws={
#         "cumulative": False,
#         "stacked": True,
#         "label": ["setosa", "versicolor", "virginica"],
#         "color": ["b", "g", "r"]},
#     kde_kws={
#         "cumulative": True},
#     ax = ax_hist)

# Histogram using matplotlib
ax_hist = fig.add_subplot(212)
ax_hist.set_ylabel("Entity Count")
ax_hist.hist(
    [iris[iris.species == "setosa"].sepal_width, 
     iris[iris.species == "versicolor"].sepal_width,
     iris[iris.species == "virginica"].sepal_width],
    stacked = True,
    rwidth = 1,
    label = ["setosa", "versicolor", "virginica"],
    color = ["g", "b", "r"],
    alpha = 0.7)

# Cumulative density curve
ax_kde = ax_hist.twinx()
ax_kde.set_ylabel("Cumulative Density (%)")
sns.distplot(
    iris.sepal_width, 
    kde=True,
    hist=False,
    kde_kws={
        "cumulative": True},
    color = "black",
    ax = ax_kde)
# ax_kde.yaxis.tick_right()
# ax_kde.yaxis.set_ticks_position('right')
ax_kde.grid(None)

# Box plot on top
ax_box = fig.add_subplot(211, sharex=ax_hist)
sns.boxplot(
    iris.sepal_width, 
    color = "r",
    ax = ax_box)

fig.set_size_inches(10, 10)

# sns.despine()

This is quite easy in matplotlib, but it's hard to maintain visual consistency when blending Seaborn and native matplotlib charts.

I have tried various approaches to tinkering with distplot, to no avail. Please kindly advise.

Thank you!

jameshu2008

👍27

All 15 comments

Yes, this would definitely make for a fantastic addition to Seaborn - I was trying to do this just now, but I don't have @jameshu2008's skill with Matplotlib.

nsanthanam on 12 Jun 2016

Maybe eventually, but not in the near future.

mwaskom on 12 Jun 2016

If you want a hack for this for now, you can do this by passing a single column into sns.pairplot:

df = pd.DataFrame(np.random.randint(0, 10, (100, 2)), columns=["x", "y"])
sns.pairplot(df[["x", "y"]], hue="y", size=5)

However, I've found this only works sometimes--sometimes it tries to print the hue column anyway.

quantology on 27 Jul 2016

👍4 ❤1

Better to use a FacetGrid.

mwaskom on 27 Jul 2016

g = sns.FacetGrid(df_rtn, hue="group")
g = g.map(sns.kdeplot, "variable")

g = sns.FacetGrid(df_rtn, hue="group")
g = g.map(sns.distplot, "variable")

citynorman on 3 Mar 2017

👍29

Unfortunately that solution does not generate equal-sized bins between both groups.

twiecki on 17 Jan 2018

👍6

@twiecki just add one line to get equal-sized bins between both groups:

_, bins = np.histogram(df["variable"])
g = sns.FacetGrid(df, hue="group")
g = g.map(sns.distplot, "variable", bins=bins)

adrienrenaud on 19 Jan 2019

👍22
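
The shared-bins trick above can be checked with plain NumPy. This is a minimal sketch on synthetic data (the arrays group_a and group_b are made up for illustration): bin edges computed once from the pooled values are reused for every group, so the per-group histograms line up bar for bar.

```python
import numpy as np

# Synthetic stand-ins for two hue groups (illustrative data only).
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, 200)
group_b = rng.normal(3.0, 0.5, 100)

# Compute the bin edges once, from the pooled data ...
edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=10)

# ... and reuse them for each group so the bins coincide.
counts_a, edges_a = np.histogram(group_a, bins=edges)
counts_b, edges_b = np.histogram(group_b, bins=edges)

# For contrast: letting each group choose its own edges gives bins
# that cover different ranges and have different widths.
_, own_edges_a = np.histogram(group_a, bins=10)
_, own_edges_b = np.histogram(group_b, bins=10)

print("shared edges identical:", np.array_equal(edges_a, edges_b))
print("independent edges identical:", np.allclose(own_edges_a, own_edges_b))
```

Passing the resulting edges as bins=... to each per-group plotting call is what the FacetGrid snippet above does through g.map.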

@mwaskom wrote: "Maybe eventually, but not in the near future."

That would be cool. I don't know if enough time has passed yet :) In the end, FacetGrid works nicely; it's just difficult to set up the first time.

ajasja on 7 Feb 2019

Three years have passed. Any chance this can be added?

eyadsibai on 27 Mar 2019

👍15 👎1

No. To be honest, comments like that decrease my interest in doing so, rather than increase it.

mwaskom on 27 Mar 2019

👎30 😕20 ❤4 👀2

I spent some time adding some features to the workaround (robustness to missing values, legend). Maybe someone else also finds it useful:

def distplot_with_hue(data=None, x=None, hue=None, row=None, col=None, legend=True, **kwargs):
    _, bins = np.histogram(data[x].dropna())
    g = sns.FacetGrid(data, hue=hue, row=row, col=col)
    g.map(sns.distplot, x, **kwargs)
    if legend and (hue is not None) and (hue not in [x, row, col]):
        g.add_legend(title=hue)

Example usage:

titanic = sns.load_dataset('titanic')
distplot_with_hue(data=titanic, x='age', hue='sex', hist=False)

Or faceted on the class:

distplot_with_hue(data=titanic, x='age', hue='sex', hist=False, col='class')

lbalazscs on 7 Aug 2019

👍10 ❤2

@lbalazscs: aren't you missing the bins=bins part in map()?

StefanUlbrich on 2 Nov 2019

Yeah, I actually made an improved version, but I didn't want to post it here until I was 100% happy with it, and then I got distracted by another project... Anyway, here is my best workaround. The biggest limitation is that the areas under the KDE curves are each normalized independently to one, which might give the wrong impression if the compared groups have different sizes. It would be nice to have something like the scale and scale_hue options of violinplot.

def distplot_fig(data, x, hue=None, row=None, col=None, legend=True, hist=False, **kwargs):
    """A figure-level distribution plot with support for hue, col, row arguments."""
    bins = kwargs.pop('bins', None)
    if (bins is None) and hist: 
        # Make sure that the groups have equal-sized bins
        bins = np.histogram_bin_edges(data[x].dropna())
    g = sns.FacetGrid(data, hue=hue, row=row, col=col)
    g.map(sns.distplot, x, bins=bins, hist=hist, **kwargs)
    if legend and (hue is not None) and (hue not in [x, row, col]):
        g.add_legend(title=hue) 
    return g

Example:

g = distplot_fig(data=titanic, x='age', hue='sex', col='class')

The problem is that this is, IMHO, a misleading plot: in reality there were twice as many males as females, and more people in third class, but you wouldn't be able to tell from this plot. The distortion doesn't appear with kde=False and hist=True, but the overlapping histograms might be harder to interpret.

g = distplot_fig(data=titanic, x='age', hue='sex', col='class', kde=False, hist=True)

Stacked histograms would be fine, but I found no way to combine matplotlib's stacked histograms with FacetGrid. With pure pandas it is possible to have stacked histograms, but then there is no hue parameter:

ax = titanic.pivot(columns='sex')['age'].plot(kind = 'hist', stacked=True, bins=16)

If anyone has ideas, I would love to hear them!

lbalazscs on 2 Nov 2019

👍14
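
The independent-normalization issue described in the comment above can be illustrated with plain NumPy. This is only a sketch on synthetic data (two made-up groups of 200 and 100 values, not the titanic dataset): each group's density histogram integrates to one on its own, so group size is invisible, while weighting each density by the group's share of all observations makes the areas reflect the sizes.

```python
import numpy as np

# Synthetic groups with a 2:1 size ratio (illustrative data only).
rng = np.random.default_rng(1)
large_group = rng.normal(30.0, 10.0, 200)
small_group = rng.normal(28.0, 10.0, 100)

# Shared bin edges so both histograms are directly comparable.
pooled = np.concatenate([large_group, small_group])
edges = np.histogram_bin_edges(pooled, bins=16)
widths = np.diff(edges)

# Independent normalization: each density integrates to one,
# regardless of how many observations the group contains.
dens_large, _ = np.histogram(large_group, bins=edges, density=True)
dens_small, _ = np.histogram(small_group, bins=edges, density=True)
area_large = (dens_large * widths).sum()
area_small = (dens_small * widths).sum()

# Common normalization: weight each density by the group's share of
# all observations, so the areas sum to one across groups instead.
share_large = len(large_group) / len(pooled)
share_small = len(small_group) / len(pooled)
common_area_large = (dens_large * share_large * widths).sum()
common_area_small = (dens_small * share_small * widths).sum()

print("independent areas:", area_large, area_small)
print("common-norm areas:", common_area_large, common_area_small)
```

With the common normalization, the larger group's curve encloses twice the area of the smaller one, which is the information the independently normalized plot hides.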

The following could be done, using the iris dataset as an example:
sns.FacetGrid(iris, hue="species", height=5).map(sns.distplot, "petal_length")

2praveen on 14 Jun 2020

Closed with #2125.

I'd like to remind everyone that dropping into a stale issue on an open source project and demanding that other people work to resolve it is rude behavior.

mwaskom on 14 Jun 2020

👍6 👎1
