Hi,
I just updated plotly.js from 1.30 to the latest 1.33, and my plot is broken.
I’m rendering about 75000 points with a scattergl plot, and after the update, it looks like most of the points are not rendered at all. What is odd though, is that when I hover over the plot, I can still see popups showing when my mouse is over places where points are supposed to be.
After trying different things, I noticed everything is rendering fine up to 10000 data points. If I go just above, say 10001, my plot is broken and most of the points are not rendered!
Everything worked fine with plotly.js 1.30.
I made two versions of my code, one with 10000 points, and the other with 10001. See for yourself: https://0.x2a.yt/other/private/plotly-test/10000.html
https://0.x2a.yt/other/private/plotly-test/10001.html
(This bug report is a copy/paste from the following forum topic: https://community.plot.ly/t/regression-missing-data-in-scattergl-plot-when-i-go-juste-above-10000-points/8152)
cc @dfcreative
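For readers who cannot open the linked pages, a reproduction along these lines might look like the sketch below. The helper name `buildTrace`, the container id `plot`, and the synthetic evenly spaced data are all assumptions for illustration; the linked pages use the reporter's own data set, and whether this synthetic data shows the same gap has not been verified.

```javascript
// Hypothetical helper (not from the original report): builds a scattergl
// trace with n evenly spaced points.
function buildTrace(n) {
  const x = new Array(n);
  const y = new Array(n);
  for (let i = 0; i < n; i++) {
    x[i] = i;
    y[i] = i % 100;
  }
  return { type: 'scattergl', mode: 'markers', x, y };
}

// One trace at the reported threshold, one just above it.
const below = buildTrace(10000);
const above = buildTrace(10001);

// Only runs in a page that has loaded plotly.js and contains <div id="plot">.
if (typeof Plotly !== 'undefined') {
  Plotly.newPlot('plot', [above]);
}
```

Swapping `above` for `below` in the `Plotly.newPlot` call gives the comparison case.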
That is a very good edge case for snap-points-2d, thank you @mlaily. In 1.30.0 we used _gl-vis fancy-scattergl_, which has no point snapping enabled, so points are rendered to the plot directly, although interactions are slow.
Now at 1e5 points the quadtree algorithm for points clustering gets triggered, which is good for cases of randomish distributions, but not good for linearly aligned points.
We can consider increasing the TOO_MANY_POINTS constant to defer the optimized rendering mode, but essentially that won't fix the problem.
Alternatively, that can be addressed in the upcoming _point-cluster_ component, although we would have to think about how to detect the proper clustering mode.
Alternatively, we can provide a snap flag or number for the scattergl trace, disabling or thresholding point snapping.
Looking at the script, we can disable snapping for datetime data by default, since it tends to be regular.
Haha, you are welcome! :D
If I understand correctly, fixing the general case seems hard, but I think I would be happy with an option to disable clustering altogether.
That said, maybe I'm not using the optimal representation for my data. Would you have any advice by any chance?
The data set is my last.fm listening history, and I'm trying to reveal interesting trends over time.
I first thought using a heat map would be the best representation, but I found it not detailed enough, or lacking the webgl performance if I use too many points...
Using a scatter plot also allows me to use colors to differentiate between different artists or albums.
I think some kind of timeline plot with a sliding window would be better, but if I recall correctly, I could not find how to do that with plotly with acceptable performance.
Do you think there might be a more appropriate plot type?
Would you have any advice to improve performance on the current scattergl plot? The full data set (75000 points) is getting a bit slow. The text property (which I truncated in my examples) contains a lot of repetitive data, but I don't know how to avoid it...
Sorry if this is out of place for this issue. Feel free to say so and I will try to find a better place.
@mlaily can you please show a codepen with the example where it gets slow?
Unfortunately I cannot give qualified advice on picking the right plot type for your data; I'd recommend reading Edward Tufte's books for that. Or just play around with different plot types, which is a win for both us and you :)
Here is a version with all the data, using v1.30.1 of plotly: https://0.x2a.yt/other/private/plotly-test/all.html
(I can't use the up to date version since it "cheats" with clustering)
The initial delay is quite long but I guess that's to be expected. After that, the performance is good with the latest firefox, but very bad with the latest chrome.
(my dataset is too large for a codepen)
EDIT: you know what? forget about it, I'm a dumbass. I disabled hardware acceleration a while ago in chrome and forgot to put it back on -_-.
> Alternately, that can be addressed in upcoming point-cluster component, although we would have to think how to detect proper clustering mode.
This would be ideal.
I don't know when you will be able to work on this, but in the meantime, a workaround is to look for snap: 1e4 in plotly-gl2d.js or directly in plotly.js, and increase the value to something more appropriate.
(I'm not entirely sure what you meant with the TOO_MANY_POINTS constant. It does not seem related)
@dy does the new point-cluster version (merged in https://github.com/plotly/plotly.js/pull/2499) help with this issue in any way?
That seems to be fixed with point-cluster.
snapping enabled:

snapping disabled:

This issue seems to still be occurring:
https://codepen.io/anon/pen/zmQzOO
This should plot a point every millisecond. If you zoom in there are a few gaps (although you can hover over the ghost points):

The new cutoff (for data of this shape anyway, in Chrome on my Mac) seems to be 75564: at or above that we get some gaps, below it there are none. That's a number I haven't seen before 🤔 That cutoff holds for date or numeric data, and for any plot size, but interestingly if I change y to bilevel the cutoff drops to 68379 https://codepen.io/alexcjohnson/pen/MPdLjV
So the issue isn't quite the same, but symptoms are similar enough that I'll reopen
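To probe a cutoff like this without the codepen, one could generate the data programmatically. The sketch below is an assumption about the data shape, not a copy of either codepen: `millisecondSeries` is a hypothetical helper, x is plain numeric milliseconds, and "bilevel" is read here as y alternating between two values.

```javascript
// Hypothetical generator: n points spaced 1 ms apart. When bilevel is true,
// y alternates between 0 and 1; otherwise y is constant.
function millisecondSeries(n, bilevel = false, startMs = 0) {
  const x = [];
  const y = [];
  for (let i = 0; i < n; i++) {
    x.push(startMs + i);
    y.push(bilevel ? i % 2 : 0);
  }
  return { type: 'scattergl', mode: 'markers', x, y };
}

// Point counts just below and at the cutoff reported above.
const noGaps = millisecondSeries(75563);
const withGaps = millisecondSeries(75564);

// Only runs in a page that has loaded plotly.js and contains <div id="plot">.
if (typeof Plotly !== 'undefined') {
  Plotly.newPlot('plot', [withGaps]);
}
```

The reported cutoffs were observed in one browser on one machine, so other setups may need different counts.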
Another example from the reports in https://github.com/plotly/plotly.js/issues/3413:
The problem is most likely in https://github.com/dy/point-cluster
I'd suggest changing maxDepth to see if that affects the issue.
Thanks for the hint @dy !!
Using https://codepen.io/alexcjohnson/pen/MPdLjV from https://github.com/plotly/plotly.js/issues/2334#issuecomment-434391207:
with maxDepth as it is right now (=255) gives:

with maxDepth=10, we get:

as expected, but with probably far worse panning perf when getting closer to 1e6 pts
@etpinard not necessarily - maxDepth handles edge cases with multiple points at the same coordinate. Making that number ‘127’ should be sufficient too. In fact I rarely saw more than 20 levels for real data, for 1e6 random points we had around 13 levels.
With maxDepth=10 it is possible that the artifact is at the beginning of the dataset. Anyways that’s def a bug.
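The role of a depth cap can be illustrated with a toy quadtree. This is a minimal sketch of the general idea only, not point-cluster's actual code: the function name `quadtreeDepth`, the unit-square domain, and the one-point-per-cell stopping rule are all assumptions made for the example.

```javascript
// Illustrative only, NOT point-cluster's implementation. Recursively splits a
// square cell into four quadrants until a cell holds at most one point or
// maxDepth is reached, and returns the deepest level visited.
function quadtreeDepth(points, maxDepth, x0 = 0, y0 = 0, size = 1, depth = 0) {
  if (points.length <= 1 || depth >= maxDepth) return depth;
  const half = size / 2;
  const quads = [[], [], [], []];
  for (const [x, y] of points) {
    const ix = x >= x0 + half ? 1 : 0;
    const iy = y >= y0 + half ? 1 : 0;
    quads[iy * 2 + ix].push([x, y]);
  }
  let deepest = depth;
  quads.forEach((q, i) => {
    const d = quadtreeDepth(
      q, maxDepth,
      x0 + (i % 2) * half, y0 + (i >> 1) * half,
      half, depth + 1
    );
    if (d > deepest) deepest = d;
  });
  return deepest;
}

// Two distinct points in opposite corners separate after a single split.
const distinctDepth = quadtreeDepth([[0.1, 0.1], [0.9, 0.9]], 255);
// Two identical points never separate, so only the cap stops the recursion.
const duplicateDepth = quadtreeDepth([[0.5, 0.5], [0.5, 0.5]], 20);
```

In this toy version, coincident points are the only case that drives the recursion all the way to `maxDepth`, which matches the edge case described above.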
WIP branch with maxDepth: 15 (the largest number that makes https://codepen.io/alexcjohnson/pen/MPdLjV render ok):
https://github.com/plotly/plotly.js/compare/scattergl-lower-maxdepth
some image tests are failing:

more investigation will be needed.
PR https://github.com/plotly/plotly.js/pull/3578 (set to be released in 1.45.0) fixes the problems reported in:
That solution probably isn't the end of this story. I suspect some graphs with more than 1e5 pts may have "missing" pts due to incorrect clustering, so I'll leave this issue open.
FWIW, I can confirm that this update has resolved the issues for me reported in #2334, using dash==0.39.0, which uses plotly 1.45.0
I can confirm that there are still issues around the 100k-point threshold.
See my issue #3405 for details.
If you create a trace with more than 100k points and then use Plotly.react to change it to have fewer than 100k points, many of the points that should be rendered in the second trace will not be rendered.
The threshold used to be 10k, but now appears to be 100k.
Minimal codepen to illustrate the bug.
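Since the codepen link did not survive here, the sequence described above could be sketched roughly as follows. The helper `makeTrace`, the container id `plot`, and the synthetic data are assumptions for illustration, not the contents of the original codepen.

```javascript
// Hypothetical helper: a scattergl trace with n points on a sawtooth.
function makeTrace(n) {
  const x = [];
  const y = [];
  for (let i = 0; i < n; i++) {
    x.push(i);
    y.push(i % 50);
  }
  return { type: 'scattergl', mode: 'markers', x, y };
}

const big = [makeTrace(100001)];   // just above the 100k threshold
const small = [makeTrace(1000)];   // well below it

// Only runs in a page that has loaded plotly.js and contains <div id="plot">.
// Draw the large trace first, then update the same plot with the small one.
if (typeof Plotly !== 'undefined') {
  Plotly.newPlot('plot', big).then(() => Plotly.react('plot', small));
}
```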
For what it's worth, here's the code I've been using as my mitigation:
// When rendering/updating
if (this._plotlyBug(data)) {
    Plotly.newPlot(this.node, data, layout, options)
} else {
    Plotly.react(this.node, data, layout, options)
}

// Utility function to detect when the bug would occur: a trace that
// previously had more than 100k points now has 100k or fewer.
// (`_` is lodash.)
_plotlyBug = function(newData) {
    var oldData = this.node.data
    var oldSizes = _.map(oldData, trace => trace.x.length)
    var newSizes = _.map(newData, trace => trace.x.length)
    var plotBug = false
    for (var i = 0; i < oldSizes.length; i++) {
        if ((oldSizes[i] > 100000) && (newSizes[i] <= 100000)) {
            plotBug = true
        }
    }
    return plotBug
}
Basically detects the situation and then calls newPlot instead of react when appropriate. The issue disappears entirely if you always call newPlot but I wanted to take advantage of the extra performance of react for most cases.
> This issue seems to still be occurring:
> https://codepen.io/anon/pen/zmQzOO
> This should plot a point every millisecond. If you zoom in there are a few gaps (although you can hover over the ghost points):
Possible fix illustrated in codepen
> Minimal codepen to illustrate the bug.
Candidate fix demo