I would like to generate violin plots for truncated distributions, e.g. for efficiency scores which are always between 0 and 100%. My current approach is to use the parameter cut=0 when calling sns.violinplot, but I think that a more informative approach is to reflect the density at the truncation point, so that, for example, the area which would be drawn below zero in an unrestricted kde will appear above zero in the truncated version.
Here is a little example that illustrates my concern and a potential solution:
import numpy as np, pandas as pd, pymc as pm, matplotlib.pyplot as plt, seaborn as sns
%matplotlib inline
np.random.seed(12345)
df = pd.DataFrame(np.random.normal(size=(10,3)).clip(0,5))
sns.violinplot(data=df)

Note the disturbing non-zero density on negative values. Fixing this with cut=0 looks like this:
sns.violinplot(data=df, cut=0)

No more positive density outside the support of the data. But this truncated normal should have maximum density at zero, and the feature I am requesting is a way to ask for that. Here is a very hacky way to get something that would satisfy me:
# keep a handle on seaborn's original KDE fit so the patch can delegate to it
t = sns.categorical._ViolinPlotter.fit_kde
def reflected_once_kde(self, x, bw):
    kde, bw_used = t(self, x, bw)
    kde_evaluate = kde.evaluate
    def zero_to_five_truncated_kde_evaluate(x):
        val = kde_evaluate(x)
        val += kde_evaluate(-x)        # mass reflected about the lower bound 0
        val += kde_evaluate(5-(x-5))   # mass reflected about the upper bound 5
        return np.where((x<0)|(x>5), 0, val)
    kde.evaluate = zero_to_five_truncated_kde_evaluate
    return kde, bw_used
sns.categorical._ViolinPlotter.fit_kde = reflected_once_kde
sns.violinplot(data=df, cut=0)
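
For reference, the reflection idea can also be written without touching seaborn internals. The following is only a minimal sketch using plain NumPy: the function names are made up for illustration, and the normal-reference bandwidth rule is an assumption, not necessarily what seaborn uses by default.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bw):
    """Plain (unbounded) Gaussian KDE evaluated on grid with fixed bandwidth bw."""
    z = (grid[:, None] - samples[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * bw * np.sqrt(2 * np.pi))

def reflected_kde_1d(samples, grid, bw, lb, ub):
    """KDE restricted to [lb, ub], with the mass outside each bound reflected back once."""
    dens = (gaussian_kde_1d(samples, grid, bw)
            + gaussian_kde_1d(samples, 2 * lb - grid, bw)   # reflection about lb
            + gaussian_kde_1d(samples, 2 * ub - grid, bw))  # reflection about ub
    return np.where((grid >= lb) & (grid <= ub), dens, 0.0)

rng = np.random.default_rng(12345)
samples = rng.normal(size=200).clip(0, 5)
grid = np.linspace(0, 5, 2001)
# normal-reference rule of thumb for the bandwidth (an assumption for this sketch)
bw = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)
density = reflected_kde_1d(samples, grid, bw, lb=0.0, ub=5.0)
# trapezoidal estimate of the area under the reflected density
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
```

With a bandwidth that is small relative to the width of the interval, one reflection per bound is enough for the estimate to integrate to approximately one.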

There is a previous feature request that asks for something similar at #244, which was closed when the implementation was overhauled in #410. Perhaps @PierreBdR or @mwaskom has some input on whether and how this feature should be implemented.
I am up for doing some amount of work on this if it would be a welcome addition to Seaborn.
In general, I would prefer that this kind of complexity in statistical estimation live upstream in the actual statistics libraries. Is truncated kernel density estimation only useful for visualization? Seems like it could be a nice addition to statsmodels.
I don't know if it is used widely, but I did find a description of the approach I've described in a book: Bernard W. Silverman, Density Estimation for Statistics and Data Analysis, 1986 (p. 30). This is the approach used in the benchmarking R package in the eff.dens.plot function.
Perhaps a middle road is for seaborn to expose a way to pass in a user-specified density estimator instead of the default now used in .fit_kde.
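
To make the "user-specified density estimator" idea concrete, here is a rough sketch of what such a hook could look like outside seaborn. Everything in it is hypothetical: `violin_outline`, `draw_violin`, and the `density_fn(samples, grid)` signature are invented for illustration and are not part of seaborn's API. The only matplotlib call used is `Axes.fill_betweenx`.

```python
import numpy as np

def violin_outline(samples, density_fn, lb, ub, n_grid=200, max_half_width=0.4):
    """Return (grid, half_width) for one violin, using any density estimator.

    density_fn(samples, grid) must return density values on grid
    (a hypothetical signature chosen for this sketch).
    """
    grid = np.linspace(lb, ub, n_grid)
    density = np.asarray(density_fn(samples, grid), dtype=float)
    peak = density.max()
    # scale so the widest point of the violin has the requested half width
    half_width = max_half_width * density / peak if peak > 0 else np.zeros_like(density)
    return grid, half_width

def draw_violin(ax, position, grid, half_width, **kwargs):
    """Draw a vertical violin centred at x = position on a matplotlib Axes."""
    return ax.fill_betweenx(grid, position - half_width, position + half_width, **kwargs)

def reflected_gaussian_density(samples, grid, bw=0.3, lb=0.0, ub=5.0):
    """Example estimator: Gaussian KDE with one reflection about each bound."""
    def kde(points):
        z = (points[:, None] - samples[None, :]) / bw
        return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * bw * np.sqrt(2 * np.pi))
    dens = kde(grid) + kde(2 * lb - grid) + kde(2 * ub - grid)
    return np.where((grid >= lb) & (grid <= ub), dens, 0.0)

rng = np.random.default_rng(12345)
samples = rng.normal(size=50).clip(0, 5)
grid, half_width = violin_outline(samples, reflected_gaussian_density, lb=0.0, ub=5.0)
```

Given a matplotlib Axes `ax`, `draw_violin(ax, 0, grid, half_width)` would draw the shape. Keeping the estimator as a plain callable means the statistics can live in any library while the plotting code stays unchanged.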
Not criticizing the approach, just saying that complicated stats should live in a stats package, not a visualization package.
No worries, although it is really not too complicated. But I understand the desire to keep the stats code out of the viz package.
Sure, the stats themselves might not be too complicated, but the implementation here (with the monkey patching) certainly is a hack. Getting this working properly in seaborn itself would likely mean implementing a full kde fit in seaborn, which I wouldn't be in favor of.
But this really does seem like it would be useful in statsmodels, and I would certainly add some compatibility in seaborn to allow for plots with bounded density estimation.
It looks like this work is already under way in statsmodels, although I'm not sure how the domain bounds will be specified: https://github.com/statsmodels/statsmodels/pull/2318
Closing this issue but feel free to poke at me when the truncated KDE lands in statsmodels.
Thanks for your work on this. In case anyone needs this sort of plot before the truncated KDE is finished, here is the monkey patch madness that I used in the end:
fit_kde_func = sns.categorical._ViolinPlotter.fit_kde
def reflected_once_kde(self, x, bw):
    lb = 0  # lower bound of the support
    ub = 1  # upper bound of the support
    kde, bw_used = fit_kde_func(self, x, bw)
    kde_evaluate = kde.evaluate
    def truncated_kde_evaluate(x):
        inside = (x>=lb)&(x<=ub)
        val = np.where(inside, kde_evaluate(x), 0)
        val += np.where(inside, kde_evaluate(lb-(x-lb)), 0)  # reflect about lb
        val += np.where(inside, kde_evaluate(ub-(x-ub)), 0)  # reflect about ub
        return val
    kde.evaluate = truncated_kde_evaluate
    return kde, bw_used
sns.categorical._ViolinPlotter.fit_kde = reflected_once_kde
sns.violinplot(np.random.normal(size=10).clip(0,np.inf), cut=0, inner=None)
It made my violins look like gyro meat, which I kind of like:

@aflaxman I hope the bounded KDE estimation will be accepted very soon, as it provides more flexibility in how bounds are processed. As for your solution, although it is a (good) way to do it, it depends on the kind of data you have. You need to think about the nature of the bounds: do they exist because negative values (for example) cannot occur, or because negative values are equivalent to positive ones? Here, you are assuming that negative values are simply equivalent to their positive counterparts, which is why a reflective boundary condition applies. If this is not the case, you might need another method.
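
To illustrate how much the choice of boundary treatment can matter, here is a small sketch comparing reflection with one alternative, truncating the unbounded KDE and renormalizing it. This is only one of several possible methods and says nothing about how the statsmodels work handles bounds; the function names, the Beta-distributed test data, and the bandwidth rule are all assumptions made for the example.

```python
import numpy as np
from math import erf, sqrt

def gaussian_kde_1d(samples, grid, bw):
    """Plain (unbounded) Gaussian KDE evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * bw * np.sqrt(2 * np.pi))

def reflected_kde(samples, grid, bw, lb, ub):
    """Reflective boundary: fold the mass outside [lb, ub] back inside once."""
    dens = (gaussian_kde_1d(samples, grid, bw)
            + gaussian_kde_1d(samples, 2 * lb - grid, bw)
            + gaussian_kde_1d(samples, 2 * ub - grid, bw))
    return np.where((grid >= lb) & (grid <= ub), dens, 0.0)

def renormalized_kde(samples, grid, bw, lb, ub):
    """Truncate the unbounded KDE to [lb, ub] and rescale it to integrate to one."""
    norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
    # fraction of the total kernel mass that falls inside the bounds
    mass_inside = np.mean(norm_cdf((ub - samples) / bw) - norm_cdf((lb - samples) / bw))
    dens = gaussian_kde_1d(samples, grid, bw) / mass_inside
    return np.where((grid >= lb) & (grid <= ub), dens, 0.0)

def trapezoid_area(y, x):
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

rng = np.random.default_rng(0)
samples = rng.beta(1, 3, size=500)            # true density is highest at 0
grid = np.linspace(0.0, 1.0, 2001)
bw = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)

reflected = reflected_kde(samples, grid, bw, 0.0, 1.0)
renormalized = renormalized_kde(samples, grid, bw, 0.0, 1.0)
area_reflected = trapezoid_area(reflected, grid)
area_renormalized = trapezoid_area(renormalized, grid)
```

Both estimates integrate to roughly one, but they disagree near the lower bound: reflection pushes the density at 0 up, while renormalization keeps the dip that the unbounded KDE shows there.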
Are there any updates on this, or ways to do it? I would like to have something like the clip option in kdeplot for violinplot so that I don't have to move to ggplot.
Thanks