I noticed that sometimes, depending on the data, KDE fitting takes quite a while to run (~3 minutes per plot). Are there any options that could limit the number of iterations, or otherwise simplify the fitting process at the expense of accuracy?
Do you have statsmodels installed? I think if you do, seaborn will use its FFT-based algorithm, which should be faster. Otherwise not really, although you could randomly subsample your own data.
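The subsampling suggestion above could be sketched roughly as follows. This is only an illustration: the helper name `subsample`, the `max_points` default of 10,000, and the synthetic data are assumptions made for the example, not anything provided by seaborn.

```python
import numpy as np


def subsample(values, max_points=10_000, seed=0):
    """Return at most `max_points` values drawn without replacement.

    `subsample` is a hypothetical helper, not part of seaborn.
    """
    values = np.asarray(values).ravel()
    if values.size <= max_points:
        # Small enough already: nothing to do.
        return values
    rng = np.random.default_rng(seed)
    return rng.choice(values, size=max_points, replace=False)


# Synthetic stand-in for the real array (assumption for the example).
data = np.random.default_rng(1).normal(size=200_000)
small = subsample(data)
print(small.size)
```

The reduced array can then be passed to the plotting call in place of the full one, for example `sns.distplot(small)`. The estimate gets noisier as `max_points` shrinks, so the cap is a speed versus accuracy trade-off.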
Thank you @mwaskom. I have statsmodels installed, and I am actually noticing that distplot takes a long time even with rug=False, kde=False, and norm_hist=False. No idea what's to blame.
(pip install --upgrade git+git://github.com/statsmodels/statsmodels@master)
f, ax = plt.subplots(figsize=(6,6))
sns.distplot(my_array, rug=False, kde=False, norm_hist=False)
More data (OS X Yosemite):
$ uname -a
Darwin macbook-pro.my.company.net 14.3.0 Darwin Kernel Version 14.3.0: Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64 x86_64
If you have a huge dataset, it may just be drawing a large number of bins. In that case the delay is probably just matplotlib drawing all the bars.
It's probably not that useful to draw more than ~50 bins anyway, and in 0.6.dev the automatic calculation is capped at that value to avoid this.
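On versions without that cap, a similar limit can be applied by hand. The sketch below assumes a Freedman-Diaconis style bin-width rule and a cap of 50; the helper name `capped_bin_count` is hypothetical, and this is not claimed to be seaborn's own implementation.

```python
import numpy as np


def capped_bin_count(values, max_bins=50):
    """Pick a histogram bin count, never exceeding `max_bins`.

    Hypothetical helper: uses a Freedman-Diaconis style bin width,
    then caps the resulting count.
    """
    values = np.asarray(values, dtype=float).ravel()
    if values.size < 2:
        return 1
    q75, q25 = np.percentile(values, [75, 25])
    width = 2 * (q75 - q25) / values.size ** (1 / 3)
    if width == 0:
        # Degenerate IQR: fall back to a square-root rule.
        n_bins = int(np.sqrt(values.size))
    else:
        n_bins = int(np.ceil((values.max() - values.min()) / width))
    return max(1, min(n_bins, max_bins))


# Heavy-tailed synthetic data (assumption): the uncapped rule would
# ask for a very large number of bins here.
data = np.random.default_rng(0).standard_cauchy(100_000)
n_bins = capped_bin_count(data)
print(n_bins)
```

The result can be passed explicitly, for example `sns.distplot(data, bins=n_bins, kde=False)`, so matplotlib draws at most 50 bars regardless of how large or spread out the array is.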
On Mon, Jun 8, 2015 at 10:03 AM, Amelio Vazquez-Reina
[email protected] wrote:
Thank you @mwaskom. I have statsmodels installed, and I am actually noticing that distplot takes a long time even with rug=False, kde=False, and norm_hist=False. No idea what's to blame.
- Here is pip freeze
- Here is the array
- And below is the code:
f, ax = plt.subplots(figsize=(6,6))
sns.distplot(change_df.dropna(subset=['diff'])['diff'].values, rug=False, kde=False, norm_hist=False)
Reply to this email directly or view it on GitHub:
https://github.com/mwaskom/seaborn/issues/587#issuecomment-110074176
Totally correct. Thank you @mwaskom. I should have thought about that!