Seaborn: Scale violinplot area with count?

Created on 1 Jul 2016 · 3 Comments · Source: mwaskom/seaborn

Thank you Michael for the very beautiful Seaborn package.

I am creating split violinplots, like the 6th figure on this page:
https://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.violinplot.html

There is an option to scale the violin _width_ by the number of observations in each bin, but surprisingly, no option to scale the violin area by the number of observations in each bin. Can this option be added?

All 3 comments

No, all scaling is done by width. I'm not sure why you'd want to scale by area.

I am comparing datasets where the count in each bin differs, and the count is a relevant parameter to visualize.

For example, on a split violin plot, I have on the left Set A, which is broadly distributed (high variance) but has a small count, and on the right Set B, which is narrowly distributed but has a large count. If I plot the split violinplot with the widths scaled by count, the area for Set A is still much larger than for Set B, which gives the misleading impression that Set A contains many more observations than Set B.
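The effect described above can be reproduced with a small standalone sketch. It uses only the standard library and two made-up Gaussian densities in place of real kernel density estimates. The function names, counts, and standard deviations are hypothetical and are not part of seaborn. It compares dividing each curve by its own peak (width tracks count) with dividing all curves by one shared peak (area tracks count).

```python
import math


def gaussian_density(grid, mu, sigma):
    """Evaluate a normal probability density on every grid point."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return [math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm for x in grid]


def trapezoid_area(grid, values):
    """Approximate the area under a curve with the trapezoid rule."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def scale_width_by_count(densities, counts):
    """Divide each curve by its own peak, so the peak width tracks the count."""
    max_count = max(counts)
    scaled = []
    for d, n in zip(densities, counts):
        peak = max(d)
        scaled.append([v / peak * n / max_count for v in d])
    return scaled


def scale_area_by_count(densities, counts):
    """Divide all curves by one shared peak, so the area tracks the count."""
    max_count = max(counts)
    shared_peak = max(max(d) for d in densities)
    return [
        [v / shared_peak * n / max_count for v in d]
        for d, n in zip(densities, counts)
    ]


grid = [-10.0 + 0.01 * i for i in range(2001)]
counts = [50, 200]  # made-up counts: Set A is small, Set B is large
densities = [
    gaussian_density(grid, 0.0, 2.0),   # Set A: broad (assumed sigma)
    gaussian_density(grid, 0.0, 0.25),  # Set B: narrow (assumed sigma)
]

width_areas = [
    trapezoid_area(grid, d) for d in scale_width_by_count(densities, counts)
]
area_areas = [
    trapezoid_area(grid, d) for d in scale_area_by_count(densities, counts)
]

print("width-scaled area ratio A/B:", width_areas[0] / width_areas[1])
print("area-scaled area ratio A/B: ", area_areas[0] / area_areas[1])
```

With these assumed numbers, width scaling gives Set A about twice the area of Set B even though it has a quarter of the observations, while the shared-peak scaling gives an area ratio of about 0.25, matching the count ratio.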

Sorry for the spam -- I implemented it in my local _ViolinPlotter_ as follows. I think it may be of use to the general community for the situation I described previously:

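def scale_count_area(self, density, max_density, counts, scale_hue):
    """Scale each density curve by the number of observations."""
    if self.hue_names is None:
        # NOTE: each curve is divided by its own peak here, so without hue
        # this branch scales the width, not the area, by count.
        for count, d in zip(counts, density):
            d /= d.max()
            d *= count / counts.max()
    else:
        for i, group in enumerate(density):
            for j, d in enumerate(group):
                count = counts[i, j]
                if scale_hue:
                    # Normalize within the current group of hue levels.
                    scaler = count / counts[i].max()
                    peak = max_density[i].max()
                else:
                    # Normalize across every violin on the plot.
                    scaler = count / counts.max()
                    peak = max_density.max()
                # Skip densities that are not real curves (size <= 1).
                if d.size > 1:
                    # Dividing by a shared peak keeps relative areas;
                    # the in-place ops modify the arrays in `density`.
                    d /= peak
                    d *= scaler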