Plotly.js: Subpar scattergl performance with date axis

Created on 11 Apr 2016 · 9 Comments · Source: plotly/plotly.js

It seems as though scattergl with a date axis chokes at around 250K points: http://codepen.io/cpsievert/pen/WwMpJW

I found that a bit surprising, since ~1M points work great with non-date axes.



Not surprising on my end.

scattergl uses one of two gl-vis modules depending on the user data:

  • gl-scatter2d is designed for very large datasets (> 1e6 pts) but of limited dimension. For example, marker.color and marker.size arrays aren't allowed.
  • gl-scatter2d-fancy doesn't perform as well for very large datasets, but mimics all the available plotly.js scatter options.

The split between the two gl-vis modules happens here. You'll notice non-linear axis types are considered _fancy_.
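As a rough illustration of that routing rule, here is a minimal TypeScript sketch of the kind of predicate involved. The names `needsFancyRenderer`, `TraceSpec`, `MarkerSpec` and `AxisType` are hypothetical, and the check is simplified from the description above (non-linear axes, or per-point `marker.color`/`marker.size` arrays); it is not the actual plotly.js code.

```typescript
// Hypothetical, simplified types: not the real plotly.js trace/axis objects.
type AxisType = "linear" | "log" | "date" | "category";

interface MarkerSpec {
  color?: string | string[];
  size?: number | number[];
}

interface TraceSpec {
  marker?: MarkerSpec;
}

// Returns true when a trace would be routed to the slower "fancy" module.
function needsFancyRenderer(trace: TraceSpec, xType: AxisType, yType: AxisType): boolean {
  // Any non-linear axis (date included) counts as fancy.
  if (xType !== "linear" || yType !== "linear") return true;
  const m: MarkerSpec = trace.marker || {};
  // Per-point style arrays are only supported by the fancy module.
  return Array.isArray(m.color) || Array.isArray(m.size);
}
```

Under this rule a plain date trace such as `needsFancyRenderer({}, "date", "linear")` is already fancy, even though it has no per-point styling, which is the slowdown reported in this issue.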

Moreover, the date-to-coordinate conversion routine could be optimized; I suspect we could shave off a few milliseconds in this step.
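One possible direction, sketched under assumptions: parse strings like "2016-04-11 13:45:07.250" with a regex and `Date.UTC`, and cache the midnight value per calendar day, since dense sensor-style series share the same day for many consecutive points. The `dateStringToMs` helper and `dayCache` are hypothetical, all strings are treated as UTC (an assumption), and whether this actually beats the existing routine would need measuring. Note that `new Date(string)` is only specified for the ISO format, so a space-separated date-time is parsed in an implementation-dependent way.

```typescript
// Hypothetical fast path for "YYYY-MM-DD[ HH:MM[:SS[.fff]]]" strings, treated as UTC.
const DATE_RE = /^(\d{4})-(\d{2})-(\d{2})(?:[ T](\d{2}):(\d{2})(?::(\d{2})(\.\d+)?)?)?$/;

// Cache of "YYYY-MM-DD" -> ms at UTC midnight; consecutive points often share a day.
const dayCache = new Map<string, number>();

function dateStringToMs(s: string): number {
  const m = DATE_RE.exec(s);
  if (m === null) return NaN; // unrecognized format: caller can fall back to a slow path
  const dayKey = s.slice(0, 10);
  let dayMs = dayCache.get(dayKey);
  if (dayMs === undefined) {
    dayMs = Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
    dayCache.set(dayKey, dayMs);
  }
  const hh = m[4] ? Number(m[4]) : 0;
  const mi = m[5] ? Number(m[5]) : 0;
  const ss = m[6] ? Number(m[6]) : 0;
  const frac = m[7] ? Number(m[7]) : 0; // ".250" -> 0.25 seconds
  // Time of day is plain integer arithmetic; no Date object per point.
  return dayMs + ((hh * 60 + mi) * 60 + ss) * 1000 + Math.round(frac * 1000);
}
```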

I'll start on it early next week. My initial hunch is that Date objects are too expensive in JS, so for performance it's best to convert to some numeric representation (e.g. date.valueOf()) and convert back to a real Date object only at the final stage, for axis tick determination (possibly caching the result). I'll know more next week.
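A minimal sketch of that idea, assuming timestamps are already stored as epoch milliseconds in a `Float64Array`: the range is found with plain numeric comparisons, and Date objects are created only for the handful of tick labels. The helper names and the fixed list of "nice" steps are hypothetical simplifications, not plotly's actual tick algorithm.

```typescript
// Candidate tick spacings in ms (hypothetical, deliberately short list).
const NICE_STEPS_MS = [
  1, 10, 100, 1e3, 1e4, 6e4, 6e5, 3.6e6, 2.16e7, 8.64e7, 6.048e8,
];

interface Tick { value: number; label: string; }

// Min/max over plain numbers: no Date objects involved.
function numericRange(ms: Float64Array): [number, number] {
  let lo = Infinity;
  let hi = -Infinity;
  for (let i = 0; i < ms.length; i++) {
    if (ms[i] < lo) lo = ms[i];
    if (ms[i] > hi) hi = ms[i];
  }
  return [lo, hi];
}

function dateTicks(lo: number, hi: number, maxTicks: number): Tick[] {
  const span = hi - lo;
  // Smallest listed step that keeps the tick count at or below maxTicks.
  const found = NICE_STEPS_MS.find((s) => span / s <= maxTicks);
  const step = found !== undefined ? found : NICE_STEPS_MS[NICE_STEPS_MS.length - 1];
  const ticks: Tick[] = [];
  for (let t = Math.ceil(lo / step) * step; t <= hi; t += step) {
    // The only place a Date is built: once per tick, not once per point.
    ticks.push({ value: t, label: new Date(t).toISOString() });
  }
  return ticks;
}
```

A `Float64Array` is used on the CPU side because epoch milliseconds need more precision than 32-bit floats provide.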

Yes, there's a big speed difference, and as @etpinard mentioned, it comes down to the fact that the _much_ slower gl-scatter2d-fancy renderer is currently used whenever the axis is of type date. Although date axes are conceptually linear, in plotly they are not of type linear, which here means a linear _numeric_ scale.
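To make the "conceptually linear" point concrete: once dates are numbers (epoch ms), the data-to-pixel mapping is the same affine function a numeric linear axis uses; only tick placement and labelling differ. `linearScale` is a hypothetical helper for illustration, not a plotly.js function.

```typescript
// Affine data-to-pixel map shared by numeric and (numeric-ized) date axes.
function linearScale(d0: number, d1: number, p0: number, p1: number): (x: number) => number {
  // Multiply before dividing to keep integer-valued inputs exact where possible.
  return (x: number) => p0 + ((x - d0) * (p1 - p0)) / (d1 - d0);
}

// Numeric axis: 0..10 onto 800 px.
const numericToPixel = linearScale(0, 10, 0, 800);
numericToPixel(5); // → 400

// Date axis: one UTC day onto 800 px, same function, inputs in ms.
const dateToPixel = linearScale(Date.UTC(2016, 3, 11), Date.UTC(2016, 3, 12), 0, 800);
dateToPixel(Date.UTC(2016, 3, 11, 12)); // noon → 400
```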

There are several options:

  1. Acknowledge that it's slower (listed for completeness' sake; given current demand, I don't think it's realistic).
  2. Make the (non-fancy) gl-vis/gl-scattergl handle dates, as, at the end of the day, it's also a linear scale except ticks are sampled and rendered differently; the plot itself remains the same. Making this plot work with dates probably needs work in gl-vis. Probably this is the fastest option, but it won't help if users bump into the same limitation for other reasons (e.g. log scale or arrays for marker sizes/colors).
  3. Rewrite gl-scatter2d-fancy so that it's fast. Currently, it renders into a bona fide geometry, mainly to be able to draw different point marker shapes. However, it's possible to turn the (non-fancy) gl-vis/gl-scattergl into something that can render point marker shapes. The drawback is that the fancy version handles other things as well: arrays for marker styles (doable with more WebGL attribute arrays) and log scales (am I missing anything else?). So if we do this, it makes sense to cover those too, so that the fancy version can be dropped. Benefit: one fewer renderer.
  4. A combination of the previous two options: rewrite both renderers, e.g. in regl, retaining the features of both and the speed of the non-fancy version.
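One detail worth keeping in mind for options 2 and 4 (this is an addition, not something raised in the thread): WebGL vertex attributes are typically 32-bit floats, and at 2016-era epoch milliseconds (between 2^40 and 2^41) a float32 can only represent every 2^17 ms, roughly 131 s. A common workaround, sketched here under that assumption, is to subtract a float64 reference time on the CPU and upload only the small differences; the reference is then folded back into the axis transform. `packRelative` is a hypothetical helper.

```typescript
// Hypothetical helper: pack epoch-ms timestamps into a Float32Array (typical
// WebGL attribute type) relative to a float64 reference time. The reference
// would be added back via the axis/view transform, not in the buffer.
function packRelative(ms: Float64Array, reference: number): Float32Array {
  const out = new Float32Array(ms.length);
  for (let i = 0; i < ms.length; i++) {
    // Subtraction happens in float64; only the small difference is rounded to float32.
    out[i] = ms[i] - reference;
  }
  return out;
}

const t = Date.UTC(2016, 3, 11, 13, 45, 7, 250);
// Absolute timestamps: two samples 1 ms apart collapse to the same float32 value.
Math.fround(t) === Math.fround(t + 1); // true
// Relative to the first sample, the 1 ms spacing survives.
const packed = packRelative(Float64Array.from([t, t + 1]), t); // Float32Array [0, 1]
```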

I'm leaning towards waiting for the regl rewrite to fix this issue.

@jackparmer thoughts?

@etpinard regarding the speed-up options, we talked about rendering the different marker shapes in shaders. Something like this would make the fancy version obsolete. I made a regl example here: http://codepen.io/monfera/full/GjOBkJ/
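For readers unfamiliar with the approach, here is a minimal sketch of shader-side marker shapes (not the code from the codepen above): each point is drawn as a square `gl.POINTS` sprite, and the fragment shader keeps or discards pixels depending on a per-point shape id passed down as a varying. The shape encoding (0 circle, 1 square, 2 diamond) and the names are hypothetical; the CPU function mirrors the same inside-test so the shape logic can be unit-tested without a GL context.

```typescript
// GLSL ES 1.00 fragment shader source, as it might be handed to regl or WebGL.
const markerFragmentShader = `
precision mediump float;
varying float vShape;  // hypothetical encoding: 0 = circle, 1 = square, 2 = diamond
varying vec4 vColor;
void main() {
  vec2 p = gl_PointCoord * 2.0 - 1.0;  // sprite coordinates mapped to [-1, 1]
  bool inside;
  if (vShape < 0.5) inside = dot(p, p) <= 1.0;
  else if (vShape < 1.5) inside = true;
  else inside = abs(p.x) + abs(p.y) <= 1.0;
  if (!inside) discard;
  gl_FragColor = vColor;
}
`;

// CPU mirror of the shader's inside-test; u, v are sprite coordinates in [0, 1].
function insideMarker(shape: number, u: number, v: number): boolean {
  const x = u * 2 - 1;
  const y = v * 2 - 1;
  if (shape < 0.5) return x * x + y * y <= 1;
  if (shape < 1.5) return true;
  return Math.abs(x) + Math.abs(y) <= 1;
}
```

Because the shape is just another per-point attribute, per-point sizes and colors fit the same pattern, which is what would let a single fast renderer cover the fancy features.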

> I'm leaning towards waiting for the regl rewrite to fix this issue.

I'm not excited about waiting 3-6 months to make datetimes work in WebGL. A pared-down trace type for time series, similar to pointcloud, could make a lot of sense. The use case is loading and looking at ridiculously _huge_ time series data, then zooming into parts that look interesting or odd for investigation. Imagine time series data coming off sensors in cars or spacecraft at subsecond intervals: _huge_ amounts of timestamped data. Making this use case a reality would take some high-quality decimation work in JavaScript (a decimated view of the time series is rendered at 0% zoom, recalculated on each zoom, etc.).
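As one concrete example of that kind of decimation (a common technique, not something proposed in the thread): min/max bucketing splits the visible time window into one bucket per pixel column and keeps only each bucket's lowest and highest sample, so spikes survive while the point count stays bounded by about twice the plot width. The sketch assumes time-sorted input; `decimateMinMax` is a hypothetical name, and it would be re-run for the visible range on every zoom.

```typescript
// Min/max decimation over a time-sorted series for the window [t0, t1].
function decimateMinMax(
  t: Float64Array, y: Float64Array, t0: number, t1: number, buckets: number,
): { t: number[]; y: number[] } {
  const outT: number[] = [];
  const outY: number[] = [];
  const width = (t1 - t0) / buckets;
  let i = 0;
  while (i < t.length && t[i] < t0) i++; // skip samples left of the window
  for (let b = 0; b < buckets; b++) {
    const last = b === buckets - 1;
    const end = last ? t1 : t0 + (b + 1) * width;
    let minI = -1;
    let maxI = -1;
    // The last bucket is closed on the right so a sample at exactly t1 is kept.
    while (i < t.length && (last ? t[i] <= t1 : t[i] < end)) {
      if (minI < 0 || y[i] < y[minI]) minI = i;
      if (maxI < 0 || y[i] > y[maxI]) maxI = i;
      i++;
    }
    if (minI < 0) continue; // empty bucket
    // Emit the two extremes in time order so the line path stays ordered in t.
    const first = Math.min(minI, maxI);
    const second = Math.max(minI, maxI);
    outT.push(t[first]);
    outY.push(y[first]);
    if (second !== first) {
      outT.push(t[second]);
      outY.push(y[second]);
    }
  }
  return { t: outT, y: outY };
}
```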

> Make the (non-fancy) gl-vis/gl-scattergl handle dates, as, at the end of the day, it's also a linear scale except ticks are sampled and rendered differently; the plot itself remains the same. Making this plot work with dates probably needs work in gl-vis. Probably this is the fastest option, but it won't help if users bump into the same limitation for other reasons (e.g. log scale or arrays for marker sizes/colors)

is the winner.

https://github.com/plotly/plotly.js/pull/1021 purports to fix it; I'm trying to think of a test case for this.
