Seaborn: Weighting the samples of a kdeplot

Created on 26 Jan 2018 · 4 Comments · Source: mwaskom/seaborn

It would be nice to be able to create a kdeplot where a weight can be assigned to each sample. Even though weights can be passed to the histogram through the hist_kws argument, they are not taken into account by the kernel density estimate.

Here's a hypothetical use-case: you have data recording the (income, donation) of a number of people to some charity, and now you want to visualize the univariate distribution of donation (y-axis) against income (x-axis). If you try to do this with a kdeplot, currently, the amount donated cannot be included in the density, i.e., each donation is treated uniformly.

My work-around has been to use the weights+observations to generate a million-or-so samples according to that distribution, and then pass those samples to the kdeplot. This tends to be slow/noisy, and awkward to implement.
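
For readers who want to see the shape of that work-around, here is a minimal sketch using only the Python standard library. The function name `resample_by_weight`, the seed, and the income/donation numbers are all hypothetical and are not taken from the original report; the real code may well have looked different.

```python
import random

def resample_by_weight(values, weights, n_samples=1_000_000, seed=0):
    """Draw proxy samples from `values`, each picked with probability
    proportional to its weight (sampling with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.choices(values, weights=weights, k=n_samples)

# Hypothetical data: three incomes, weighted by the amount each person donated.
income = [20_000.0, 50_000.0, 120_000.0]
donation = [10.0, 100.0, 890.0]

# A smaller sample count than "a million or so" keeps the example fast.
proxy = resample_by_weight(income, donation, n_samples=10_000)
```

The resulting `proxy` list could then be passed to the kdeplot in place of the raw observations. The sampling noise and the extra pass over the data are what make this approach slow and awkward compared with native weight support.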

By the way... thank you to the community for the very useful toolkit. I have used it in a recent publication (http://ieeexplore.ieee.org/document/7378880/) and cited seaborn accordingly. Incidentally, I had to use the above work-around of generating proxy samples for that paper in order to have the kdeplot weighted correctly.

All 4 comments

The KDE is fit with scipy, which doesn't seem to support weights.

Your hypothetical use-case seems like something you want to be doing with a lowess regression, not a weighted KDE, anyway.

It looks like scipy will support weights for KDE as of the 1.2.0 release. The relevant pull request is here. Any chance of reopening this issue?
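
As a rough illustration of what that scipy support looks like, the sketch below compares an unweighted and a weighted estimate using the `weights` argument of `scipy.stats.gaussian_kde`. It assumes scipy 1.2.0 or later is installed, and the income/donation values are made up for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical data: incomes, with donation amounts used as sample weights.
income = np.array([20_000.0, 35_000.0, 50_000.0, 80_000.0, 120_000.0])
donation = np.array([10.0, 25.0, 100.0, 300.0, 890.0])

# Unweighted estimate: every observation counts equally.
unweighted = gaussian_kde(income)

# Weighted estimate: each observation contributes in proportion to its weight.
weighted = gaussian_kde(income, weights=donation)

# Evaluate both densities on a common grid for plotting or comparison.
grid = np.linspace(0.0, 160_000.0, 200)
density_unweighted = unweighted(grid)
density_weighted = weighted(grid)
```

With these numbers the weighted density shifts toward the high-income end, since the largest donations come from there.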

Agreed, I think this should be reopened, given that it's being worked on in #1747.

Closed in #2104 with the addition of weights to kdeplot as part of a very thorough rewrite.
