Seaborn: Scale violinplot area with count?

Created on 1 Jul 2016 · 3 comments · Source: mwaskom/seaborn

Thank you, Michael, for the very beautiful Seaborn package.

I am creating split violinplots, like the sixth figure on this page:
https://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.violinplot.html

There is an option (`scale="count"`) to scale the violin _width_ by the number of observations in each bin, but, surprisingly, no option to scale the violin _area_ by the number of observations. Could this option be added?


wenhaosun



All 3 comments

No, all scaling is done by width. I'm not sure why you'd want to scale by area.

mwaskom on 1 Jul 2016


I am comparing datasets in which the counts in each bin differ, and the count is itself a relevant quantity to visualize.

For example, on a split violin plot I have Set A on the left, which is broadly distributed (high variance) but has a small count, and Set B, which is narrowly distributed but has a large count. If I plot the split violinplot with the width scaled by count, the area for Set A is much larger than for Set B, which gives the misleading impression that Set A has many more observations than Set B.

wenhaosun on 1 Jul 2016
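The trade-off described in this comment can be made concrete with a small NumPy sketch. It does not call seaborn; it mimics the two normalizations on one half of a split violin, using a hypothetical hand-rolled Gaussian KDE (`gaussian_kde_1d`, Scott's-rule bandwidth), a hypothetical `trapezoid` helper, and made-up data shaped like Set A (broad, 100 points) and Set B (narrow, 500 points). In this example, width-by-count scaling (each curve divided by its own peak, then multiplied by count / max count) leaves the low-count Set A with the larger area, while dividing count-weighted densities by one shared peak makes the area ratio track the count ratio (100 / 500).

```python
import numpy as np


def gaussian_kde_1d(data, grid):
    """Gaussian KDE evaluated on `grid`; bandwidth from Scott's rule (std * n**-0.2)."""
    n = data.size
    bw = data.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))


def trapezoid(y, x):
    """Trapezoidal-rule integral of y over x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


# Synthetic data (made up for illustration): A is broad with few points,
# B is narrow with many points
rng = np.random.default_rng(0)
samples = {"A": rng.normal(0.0, 5.0, size=100), "B": rng.normal(0.0, 0.5, size=500)}
grid = np.linspace(-30, 30, 2001)
dens = {k: gaussian_kde_1d(v, grid) for k, v in samples.items()}
counts = {k: v.size for k, v in samples.items()}
max_count = max(counts.values())

# Width-by-count: every curve peaks at 1, then shrinks by count / max_count
width_scaled = {k: d / d.max() * counts[k] / max_count for k, d in dens.items()}

# Area-by-count: weight each density by its count, then divide by one shared peak
weighted = {k: d * counts[k] for k, d in dens.items()}
shared_peak = max(w.max() for w in weighted.values())
area_scaled = {k: w / shared_peak for k, w in weighted.items()}

# Half-violin area for each set under each normalization
results = {}
for name, curves in [("width-by-count", width_scaled), ("area-by-count", area_scaled)]:
    results[name] = {k: trapezoid(c, grid) for k, c in curves.items()}
    ratio = results[name]["A"] / results[name]["B"]
    print(f"{name}: half-violin area A/B = {ratio:.2f}")
```

Because each KDE integrates to about 1 over the grid, multiplying by the count and dividing by a single constant is what makes the areas proportional to the counts; the shared peak only sets the overall width so that the widest curve reaches 1.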


Sorry for the spam. I implemented this in my local `_ViolinPlotter` as follows; it may be useful to others in situations like the one I described above:

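```python
def scale_count_area(self, density, max_density, counts, scale_hue):
    """Scale each density curve so the violin *area* is proportional to its count.

    Dividing every curve by one shared peak density keeps the KDE areas
    comparable; multiplying by count / max count then makes each area
    proportional to the number of observations.
    """
    if self.hue_names is None:
        # Use one shared normalizer; dividing each curve by its own peak
        # (d.max()) would scale the width by count rather than the area
        peak = max_density.max()
        for count, d in zip(counts, density):
            if d.size > 1:  # length-1 arrays are placeholders, not real KDEs
                d /= peak
                d *= count / counts.max()
    else:
        for i, group in enumerate(density):
            for j, d in enumerate(group):
                if d.size <= 1:  # length-1 arrays are placeholders, not real KDEs
                    continue
                if scale_hue:
                    # Compare hue levels within category i only
                    scaler = counts[i, j] / counts[i].max()
                    peak = max_density[i].max()
                else:
                    # Compare across all categories and hue levels
                    scaler = counts[i, j] / counts.max()
                    peak = max_density.max()
                # In-place operations modify the arrays stored in `density`
                d /= peak
                d *= scaler
```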

wenhaosun on 1 Jul 2016
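The nested-loop logic of the hue branch above can be checked outside seaborn. The sketch below is a hypothetical standalone helper (`scale_area_by_count`), not part of seaborn's API: it takes plain NumPy arrays, computes the per-group peaks itself instead of receiving `max_density`, returns scaled copies rather than modifying in place, and uses normal densities as stand-ins for KDEs with made-up spreads and counts.

```python
import numpy as np


def scale_area_by_count(density, counts, scale_hue=True):
    """Return scaled copies of density[i][j] whose areas are proportional to counts[i, j].

    Hypothetical standalone helper: density is a nested list of 1-D arrays
    (category i, hue level j) and counts is a 2-D array with the same layout.
    """
    counts = np.asarray(counts, dtype=float)
    max_density = np.array([[d.max() for d in group] for group in density])
    scaled = []
    for i, group in enumerate(density):
        row = []
        for j, d in enumerate(group):
            if scale_hue:
                # Normalize hue levels against each other within category i only
                peak, max_count = max_density[i].max(), counts[i].max()
            else:
                # Normalize against every category and hue level
                peak, max_count = max_density.max(), counts.max()
            row.append(d / peak * counts[i, j] / max_count)
        scaled.append(row)
    return scaled


def normal_pdf(x, sd):
    """Zero-mean normal density, used here as a stand-in for a KDE."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


# Two categories, each with two hue levels (made-up spreads and counts)
x = np.linspace(-20, 20, 4001)
density = [[normal_pdf(x, 1.0), normal_pdf(x, 2.0)],
           [normal_pdf(x, 0.5), normal_pdf(x, 3.0)]]
counts = np.array([[100, 400], [50, 50]])

scaled = scale_area_by_count(density, counts, scale_hue=True)
dx = x[1] - x[0]
areas = [[float(c.sum() * dx) for c in row] for row in scaled]
print(areas[0][0] / areas[0][1])  # area ratio within category 0 (counts 100 vs 400)
```

One consequence of this design: the violin with the most observations is not guaranteed to reach full width, because its own peak density may be lower than the shared peak. Normalizing `d * count` by the largest value of `d * count` across the compared groups would still keep areas proportional to counts while guaranteeing that the widest curve reaches 1.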

