Hi,
In my case I'm dealing with one big data table (about 50MB in CSV form).
I'm generating multiple plots from that data: Points, Curves, etc.
Each time, a subset of the data gets included in the plot (which I need so that my hovers work).
So my notebook grows to 300MB, and when I save it as HTML it becomes 90MB (I don't really understand why...).
Is there any way to load the data only once, inject it into the notebook, and then reuse it by pointing to it from each interactive plot, instead of duplicating it again and again?
Thank you!
Sounds like you want the various Bokeh plots in a notebook to share the same data source in JavaScript? I don't know if that's possible...
Yes, that's what I'm looking for. I thought it should be possible, at least on the JS side. And HoloViews has multiple data classes (like Table), so __init__ might have a way to "register" the data for future "aliasing" by others (like Points, Curve, etc.)...
This isn't really possible in the notebook as described, for a variety of reasons. The notebook imposes a number of constraints we have no control over, and it requires jumping through a lot of hoops to get anything working at all.
I will try to outline the reasons and suggest other possible avenues but will warn up front that none of the ideas are either ideal or trivial.
In order to make output_notebook and show "just work" very simply in the common cases, it's necessary for every output cell to render its own, new Bokeh document. This is the source of duplication (in the HTML version, there is just a single document). So there are two possible approaches to improvement:
Don't use output_notebook and show. Instead, build up a Document manually and explicitly, and use HTML and bokeh.embed.components to display the output. You'll have to create all the plots first before showing them at the end, and of course this will only work in the "classic" notebook. It will never work in JupyterLab because they prohibit execution of arbitrary JavaScript by design.
Collect all the plots you want to display in a single Bokeh layout, and use show for the single layout. This is constrained by any limitations of Bokeh layouts and, like the other idea, requires putting all the plots "at the end" instead of inline.
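A minimal sketch of the first approach, assuming a reasonably recent Bokeh (the column names and toy data are made up for illustration): every plot uses the same ColumnDataSource, and all plots go through a single components() call, so they end up in one document and the table is serialized once. The notebook display step is only a comment, since it applies to the classic notebook and needs BokehJS to be loaded already.

```python
from bokeh.embed import components
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# One source holding the table; every plot below references this same object.
source = ColumnDataSource(data=dict(x=[1, 2, 3, 4], y=[4, 1, 3, 2]))
TOOLTIPS = [("x", "@x"), ("y", "@y")]  # hovers read columns from the shared source

points = figure(title="Points", tooltips=TOOLTIPS)
points.scatter("x", "y", source=source)

curve = figure(title="Curve", tooltips=TOOLTIPS)
curve.line("x", "y", source=source)

# Passing a tuple renders all models into a single document:
# one <script> (carrying the data once) plus one <div> per plot.
script, (points_div, curve_div) = components((points, curve))

# Classic notebook only, with BokehJS already loaded, something like:
#   from IPython.display import HTML, display
#   display(HTML(script + points_div + curve_div))
```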
Both of these are fairly equivalent. They get you a single Bokeh doc, and hence minimal duplication.
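Since the end goal in this thread is a static HTML report, the single-layout route can also go straight to a standalone file. A sketch, assuming a recent Bokeh and toy data (MARKER is just a distinctive value for checking how often the data appears in the output, and report.html is an arbitrary filename): both plots share one ColumnDataSource inside one layout, so file_html writes one document containing a single copy of the table.

```python
from bokeh.embed import file_html
from bokeh.layouts import column
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.resources import CDN

MARKER = 987654.321  # distinctive value used to count copies of the data in the HTML
source = ColumnDataSource(data=dict(x=[1.0, 2.0, MARKER], y=[3.0, 1.0, 2.0]))

p1 = figure(title="Points")
p1.scatter("x", "y", source=source)

p2 = figure(title="Curve")
p2.line("x", "y", source=source)

# One layout -> one Bokeh document -> the shared source is serialized once.
html = file_html(column(p1, p2), CDN, "report")

with open("report.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Counting str(MARKER) in html is a quick way to check that the shared table was written only once.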
Another route that is not quite available, but will be soon, might be to make a Bokeh Server app "up front", and show subcomponents inline wherever you want. This is future work though, and perhaps will require some "non-notebooky" code organization.
Since it sounds like your data is somewhat large, it might be advisable to consider something like Datashader. HoloViews also makes it simple to have interactivity over datashaded plots, and if you want a much smaller set of real glyphs for outliers, hovers, etc., combining them is straightforward. One of Datashader's implicit features is bandwidth compression for larger data sets.
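To see why aggregation acts as bandwidth compression, here is a rough numpy stand-in for Datashader's aggregation step (with Datashader itself this would be something like datashader.Canvas(...).points(df, "x", "y")). It uses np.histogram2d instead of Datashader, and the grid size and random data are assumptions: the result is a fixed-size grid whose size depends on the pixel count, not on the number of points.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 1_000_000
x = rng.normal(size=n_points)
y = rng.normal(size=n_points)

# Count points per cell of a fixed 300x300 grid (a crude "canvas").
width, height = 300, 300
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(width, height))

raw_bytes = x.nbytes + y.nbytes  # what embedding every point would cost
agg_bytes = counts.nbytes        # what shipping the aggregated grid costs
```

histogram2d puts x along the first axis, so the grid would usually be transposed before being shown as an image.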
Going further afield, there are much more involved ideas such as:
Create a Bokeh extension, and a new notebook hook to use it, that implements some kind of "single JS data store" idea. I will warn, though, that given the notebook's capability for arbitrary, out-of-order execution, this can quickly become unreasonable in general (in the literal sense of "can't be effectively reasoned about what should happen"). That's part of why Bokeh does not try to do this. But if you have a narrow, circumscribed use scenario, perhaps it could work for you.
Probably more robust, but probably also much more work: create a Jupyter extension for displaying Bokeh plots in an entirely different way that does not require a new document for every cell. Bokeh might consider looking into this in the future, but not as long as we have to support both the classic notebook and JupyterLab. So, several years at a minimum (definitely Bokeh 2.0 or later territory). But again, someone with a specific use case might be able to tailor something to their needs now.
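To make the "single JS data store" idea concrete, here is a toy sketch in plain Python that only generates HTML strings. Everything in it is hypothetical: SharedDataStore, plot_snippet, window.__sharedData and the JS function renderPlot are not Bokeh APIs. It shows the core trade: the table is serialized once per page and each plot carries only a key, whereas per-plot embedding repeats the whole table. It does nothing about the out-of-order execution problem described above.

```python
import json


class SharedDataStore:
    """Hypothetical page-level registry: each table is serialized once."""

    def __init__(self):
        self._tables = {}

    def register(self, name, table):
        # table is a dict of column name -> list of values
        self._tables[name] = table
        return name

    def to_script(self):
        # One <script> defining every registered table on a single global object.
        # Assumes the data contains no "</script>" sequences.
        return f"<script>window.__sharedData = {json.dumps(self._tables)};</script>"


def plot_snippet(plot_id, table_key):
    # Each plot carries only a reference (the key), not a copy of the data.
    # renderPlot is a hypothetical JS function, not part of Bokeh.
    return (
        f"<script>renderPlot({json.dumps(plot_id)}, "
        f"window.__sharedData[{json.dumps(table_key)}]);</script>"
    )


table = {"x": list(range(10_000)), "y": [v * 0.5 for v in range(10_000)]}
plot_ids = ("points", "curve", "hist")

store = SharedDataStore()
key = store.register("measurements", table)
shared_page = store.to_script() + "".join(plot_snippet(p, key) for p in plot_ids)

# Roughly what per-cell documents amount to: a full copy of the data per plot.
duplicated_page = "".join(
    f"<script>renderPlot({json.dumps(p)}, {json.dumps(table)});</script>"
    for p in plot_ids
)
```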
Sorry I don't have better news or suggestions.
Thank you for this answer. Exactly what I was looking for!
The point is to create an HTML report, so I can't use Datashader here (I actually used it for this report before, but was asked to produce a non-Python static report instead).
JS Data Store - sounds like a good solution and pretty aligned with my hopes - sad that it's not here yet...
Hoping to see Bokeh 2.0 sooner rather than later. A redesign is always a good thing to do! Unfortunately it's also very costly...
For now, I'll experiment with the single layout / single document solution... The best I can do for now...
Please consider this issue as closed.
Thanks again!
All the Best Wishes!
Thanks for your thoughtful and thorough response @bryevdv. That was also my perspective on this. The only thing I can add is that once we rewrite the HoloViews comms and widget manager, as we have planned, there may be scope to add a global data source store at that level without necessarily burdening Bokeh with additional complexity. Again, that will have to be carefully considered, and I certainly couldn't promise it on any specific timescale.
Here is a method to share data between multiple cells:
http://nbviewer.jupyter.org/gist/ruoyu0088/74856f0564d5e5c3c4062e672a0c9f24
Thanks @ruoyu0088. We could consider building something based on that approach, so I'd be happy to reopen and make it a wishlist item.