One of the benefits that led me to Altair was the ability to generate Vega-Lite JSON that can be used to generate and save charts both locally and in the browser. I've been able to save charts both via the browser and locally.
Viewing and saving in the browser is nice for single charts. Saving locally is nice for saving a lot of charts programmatically (~50 charts). When saving locally, Altair launches a Selenium instance and a session to render and save each chart. This can be a little slow.
To reuse the Selenium resources as much as possible, is there a way I can save multiple charts at once? Something like this would satisfy my desire for a small speed increase:
alt.save(*chart_list)
That is not supported right now, but it would be a nice feature.
I think the best API would be a context manager that creates a persistent Selenium instance to be used by the code in headless.py, though I'm open to suggestions.
Edit: thinking on this more, I think alt.save(*charts) as @SyntaxRules suggested above is probably the right choice.
When this is addressed, we should update the documentation generation scripts to use it as well.
More thoughts: saving a chart requires more arguments than just the chart itself. While alt.save(*charts) looks pretty, it may have to be something more like alt.save(*charts, *file_paths, **kwargs), which is far less pretty. This also assumes that **kwargs is the same for every chart.
A more verbose version of this might be something like:
chart_save_opt = (chart, file_path, kwargs)  # kwargs: a dict of save options
chart_save_opt2 = (chart2, file_path2, kwargs)
charts_to_save = [chart_save_opt, chart_save_opt2, ...]
alt.save(charts_to_save)
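Once the options are a plain dict rather than `**kwargs` (which Python does not allow inside a tuple), this verbose form is easy to consume. A minimal sketch of a hypothetical `save_many` helper, assuming only that each chart has Altair's usual `save(file_path, **options)` method. It does not reuse a driver by itself; it only shows that per-chart paths and options fit this shape:

```python
from typing import Any, Iterable, List, Mapping, Tuple

# Hypothetical job shape: (chart, file_path, options), where options is a dict
# of keyword arguments forwarded unchanged to chart.save().
SaveJob = Tuple[Any, str, Mapping[str, Any]]


def save_many(jobs: Iterable[SaveJob]) -> List[str]:
    """Save each chart with its own path and options; return the paths in order."""
    saved = []
    for chart, file_path, options in jobs:
        chart.save(file_path, **options)  # options may differ from chart to chart
        saved.append(file_path)
    return saved
```

Because every job carries its own dict, this avoids the assumption that **kwargs is shared across all charts.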
I'm not sure how I feel about having to construct and pass in save options separately from the chart. These options could be attached to the chart object with a save_options(...) method.
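Attaching the options to the chart is what would let the pretty alt.save(*charts) signature survive. A rough sketch, where `SaveOptionsMixin`, `save_options` and `save_all` are hypothetical names (not part of Altair) and each chart is assumed to provide a `save(file_path, **kwargs)` method:

```python
from typing import Any, Dict, List, Optional, Tuple


class SaveOptionsMixin:
    """Hypothetical mixin: remember where and how a chart should be saved."""

    _save_target: Optional[Tuple[str, Dict[str, Any]]] = None

    def save_options(self, file_path: str, **kwargs: Any) -> "SaveOptionsMixin":
        self._save_target = (file_path, kwargs)
        return self  # return self so the call can be chained


def save_all(*charts: SaveOptionsMixin) -> List[str]:
    """Variadic save: each chart supplies its own path and options."""
    paths = []
    for chart in charts:
        if chart._save_target is None:
            raise ValueError("call save_options() on every chart before save_all()")
        file_path, kwargs = chart._save_target
        chart.save(file_path, **kwargs)
        paths.append(file_path)
    return paths
```

The trade-off is that save options become chart state, so a chart saved twice to different places needs save_options() called again in between.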
I'm thinking out loud and not particularly attached to any of these ideas. Maybe diving into the code will help me understand which paradigm Altair leans towards.
Good point. Maybe a context manager would be better after all; e.g.
with alt.save_charts(webdriver='chrome'): # webdriver arg optional
    chart1.save('chart1.png')
    chart2.save('chart2.svg')
    # etc.
What this would require is for Altair to have a global Selenium instance available to the save() function. I think the implementation could be pretty clean as well.
It's not an answer to reusing the same instance, but another way to improve speed is multiprocessing. I added about 5 lines of code using Pool and map from Python's multiprocessing module, based on this article on multiprocessing.
Generating graphs is well suited to parallel processing. (As an aside, saving as JSON is still 5-10x faster for me and produces smaller downloads than PNGs.) When I generate a thousand images I often get webdriver-related crashes, so JSON is also much more reliable.
Still, it's nice to use all the processors on my computer.
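A sketch of the parallel approach with `multiprocessing.Pool.map`, assuming each job is a (Vega-Lite spec dict, output path) pair and writing JSON, which the comment above reports as faster and more reliable than PNG. `save_spec` and `save_all_parallel` are illustrative names, and the `pool_cls` parameter exists only so the same code can be exercised with a thread pool:

```python
import json
from multiprocessing import Pool
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

Job = Tuple[Dict[str, Any], str]  # (Vega-Lite spec as a dict, output path)


def save_spec(job: Job) -> str:
    """Worker: write one spec to disk as JSON and return its path.

    Must be a top-level function so Pool can pickle it for the worker processes.
    """
    spec, path = job
    Path(path).write_text(json.dumps(spec))
    return path


def save_all_parallel(jobs: List[Job], processes: Optional[int] = None,
                      pool_cls=Pool) -> List[str]:
    """Spread the jobs over a pool of workers; results keep the input order."""
    with pool_cls(processes) as pool:
        return pool.map(save_spec, jobs)
```

On platforms that start workers with spawn (Windows, and macOS by default), call save_all_parallel from under an `if __name__ == "__main__":` guard. Rendering PNGs inside the workers instead would presumably mean one browser per process, trading memory for wall-clock time.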
I have a first version of this working over in my project vegasave. A simple use case is:
import vegasave
with vegasave.chart_driver() as chart_context:
    vegasave.save(spec_json1, 'save_location1.png', driver=chart_context)
    vegasave.save(spec_json2, 'save_location2.png', driver=chart_context)
To apply this to Altair, two things have to happen:
1. The API for saving charts in Altair would not change (except for the dependencies needed to save).
2. A new API for saving multiple charts would be introduced, something like:
import altair as alt
chart1 = alt.Chart() ...
chart2 = alt.Chart() ...
with alt.chart_driver() as chart_context:
    chart1.save('save_location1.png', driver=chart_context)
    chart2.save('save_location2.png', driver=chart_context)
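The proposed alt.chart_driver() is essentially a context manager that owns one driver and guarantees it is shut down, even if a save fails. A sketch of that shape, with `make_driver` and `render` as hypothetical stand-ins for a Selenium driver constructor and for the actual rendering step; `chart_driver` and the `driver=` argument come from the proposal above and are not assumed to exist in Altair:

```python
import contextlib
from typing import Any, Callable, Iterator, Optional


@contextlib.contextmanager
def chart_driver(make_driver: Callable[[], Any]) -> Iterator[Any]:
    """Create one driver for the whole block and always quit it, even on error."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()


def save_chart(chart: Any, path: str, render: Callable[[Any, Any, str], Any],
               make_driver: Callable[[], Any], driver: Optional[Any] = None) -> Any:
    """Use the caller's driver when given; otherwise open and close a private one."""
    if driver is not None:
        return render(driver, chart, path)
    with chart_driver(make_driver) as own_driver:
        return render(own_driver, chart, path)
```

Unlike a global instance, the driver is passed explicitly, so it is always visible at the call site which saves share a browser.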
Does anyone have any improvements before I get to implementing this?
@jploudre I created a separate issue to address saving plots in parallel.
Apologies for waking up this thread, but since no solution was implemented in altair yet (as far as I can tell), I would like to point out that there's a good opportunity to get two scoops in one cone here.
I propose allowing Chart.save() to receive an initialized selenium webdriver instance instead of a string. This way one can reuse the same webdriver over multiple saves, avoiding the overhead of save() initializing a new one each time:
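from selenium import webdriver
driver = webdriver.Chrome()
# Generate altair charts...
for i, chart in enumerate(chart_list):
    chart.save(f"chart_{i}.png", webdriver=driver)  # proposed: pass a driver instance, not a name
driver.quit()  # the caller created the driver, so the caller shuts it down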
I think this might be a cleaner solution (API-wise) from altair's point of view, as it avoids complicated user inputs, such as specifying lists/tuples of charts, file-names, and keyword-arguments.
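The key design point in accepting a driver instance is ownership: a driver that save() creates from a name should be quit by save(), while a driver the caller passed in must be left running for the next chart. A sketch of that dispatch, where `factories` (mapping names such as "chrome" to constructors like `selenium.webdriver.Chrome`) and `render` are hypothetical parameters, not Altair's actual signature:

```python
from typing import Any, Callable, Dict, Optional, Tuple, Union


def resolve_driver(webdriver: Union[str, Any],
                   factories: Dict[str, Callable[[], Any]]) -> Tuple[Any, bool]:
    """Return (driver, owned): names are instantiated, instances are reused as-is."""
    if isinstance(webdriver, str):
        return factories[webdriver](), True
    return webdriver, False


def save(chart: Any, path: str, render: Callable[[Any, Any, str], Any],
         webdriver: Union[str, Any] = "chrome",
         factories: Optional[Dict[str, Callable[[], Any]]] = None) -> Any:
    """Render one chart, quitting the driver only if this call created it."""
    driver, owned = resolve_driver(webdriver, factories or {})
    try:
        return render(driver, chart, path)
    finally:
        if owned:
            driver.quit()
```

Keeping the string form means existing calls such as webdriver='chrome' keep working unchanged.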
As for the two-for-one: the other benefit of this solution is that it also solves https://github.com/altair-viz/altair/issues/1619 and gives users more freedom in how they set up their work environment. For example, on remote machines where one may not have permission to modify PATH or /bin to point to webdriver executables, users can specify an arbitrary location when initializing the driver: webdriver.Chrome('path/to/webdriver.exe').
This is now implemented automatically in http://github.com/altair-viz/altair_saver. I plan to remove all Selenium-based code from Altair itself and require installation of altair_saver for programmatic saving of PNG, SVG, and PDF.