I'm trying to plot about 4,000 points with swarmplot, and it takes about 20 minutes on a MacBook Pro. A violin plot of the same data takes no perceptible time. Why is swarmplot so slow?
Because swarmplot and violinplot are ... not doing the same thing?
@mwaskom Is there a reason swarmplot should be so slow?
The second line of the documentation says:
This function is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don't overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).
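To make the "computation needed to arrange them" part concrete, here is a minimal pure-Python sketch of a beeswarm-style placement. It is not seaborn's code: the function name `naive_swarm`, the `diameter` parameter, and the comparison counter are all made up for illustration, and the real implementation differs in its details. The idea is only that every new point has to be checked against the already placed points that are close to it in value, so dense data means many checks per point.

```python
import math


def naive_swarm(values, diameter):
    """Illustrative only: give each point an offset along the categorical
    axis so that circles of the given diameter do not overlap.

    Returns (offsets, comparisons), where comparisons counts how many
    point-to-point checks were made.
    """
    d2 = diameter ** 2
    tol = 1e-9 * d2  # small tolerance so "just touching" counts as OK
    order = sorted(range(len(values)), key=lambda i: values[i])
    placed = []  # (offset, value) pairs, in ascending order of value
    offsets = [0.0] * len(values)
    comparisons = 0

    for i in order:
        y = values[i]

        # Collect already placed points close enough in value to collide.
        neighbors = []
        for px, py in reversed(placed):
            comparisons += 1
            if y - py >= diameter:
                break
            neighbors.append((px, py))

        # Candidate offsets: the center, or just touching a neighbor
        # on either side. Try the ones nearest the center first.
        candidates = [0.0]
        for px, py in neighbors:
            dx = math.sqrt(max(d2 - (y - py) ** 2, 0.0))
            candidates.extend([px - dx, px + dx])
        candidates.sort(key=abs)

        # Take the first candidate that overlaps no neighbor.
        for x in candidates:
            overlaps = False
            for px, py in neighbors:
                comparisons += 1
                if (x - px) ** 2 + (y - py) ** 2 < d2 - tol:
                    overlaps = True
                    break
            if not overlaps:
                break

        offsets[i] = x
        placed.append((x, y))

    return offsets, comparisons
```

Calling `naive_swarm(values, 0.05)` on more and more values drawn from the same range gives each point more neighbors and more candidates to test, so the comparison count should grow faster than the number of points. A violin plot does not have to arrange individual points at all, which is why the two are not comparable in cost.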
I know this is an old post, but @mwaskom, you have to consider a few things in support of @zfrenchee's question.
I swarmplot ~4,000 rows: it takes 30 seconds -> | 1/4 [00:14<00:44, 14.86s/it]
I swarmplot ~10,000 rows: it takes 22 minutes.
@alegolas79 If you really have to plot thousands of points with swarmplot, you can take a look at _SwarmPlotter. You could try speeding it up, for example with numba (there are some plain Python loops with lists and append in _SwarmPlotter, which is something numba should be good at optimizing). Once you are successful, you could create a separate repo or gist demonstrating the speed difference. Even if this is not accepted into seaborn (it would require a soft dependency on numba, which would increase the maintenance load), other visitors to this issue would benefit.