This issue perhaps belongs in the related plotly.py repository, but the motivation comes from using Dash. Please feel free to transfer it to plotly.py if that is more useful.
Running
time python -c "import dash"
real 0m2.172s
user 0m1.183s
sys 0m0.566s
shows that the real (wall-clock) time is typically above 2 seconds on my system, even when the whole Python distribution is installed locally on the same computer. With a Python distribution on a shared network disk, it is of course even slower. 🕙
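To compare numbers like these reproducibly, it can help to time the import in a fresh interpreter several times and look at the minimum, since the first run is often slowed down by a cold file-system cache. A minimal sketch using only the standard library; `measure_import_seconds` is a hypothetical helper name, and, like `time python -c`, the timings include interpreter start-up:

```python
import subprocess
import sys
import time


def measure_import_seconds(statement, repeats=5):
    """Return wall-clock timings (seconds) of running `statement` in fresh interpreters.

    A new interpreter is started for every run, so nothing is served from an
    already-populated sys.modules. Start-up cost is included in each timing.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", statement], check=True)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, `min(measure_import_seconds("import dash"))` gives a warm-cache figure comparable to the `real` line above; the median is a reasonable second number to report.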
Running (with Python 3.7 or higher)
python -X importtime -c "import dash"
shows that a large portion of the time is spent on unused plotly.graph_objs imports. The import originates from this line in Dash:
https://github.com/plotly/dash/blob/8ee358826cd4318a3edc52fbf71e0bdda2369984/dash/dash.py#L23
where dash/dash.py imports PlotlyJSONEncoder from plotly.utils.
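The `-X importtime` output lists every imported module with its self and cumulative time in microseconds, which gets long for a package like plotly. A small sketch that ranks the entries by cumulative time; `slowest_imports` is a hypothetical helper, and the parsing assumes the `import time: self [us] | cumulative | imported package` line format that CPython 3.7+ writes to stderr:

```python
import subprocess
import sys


def slowest_imports(statement, top=10):
    """Run `statement` under `python -X importtime` and rank modules by cumulative time.

    Returns a list of (cumulative_microseconds, module_name), slowest first.
    """
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        stderr=subprocess.PIPE,  # -X importtime reports on stderr
        universal_newlines=True,
        check=True,
    )
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        # Skip the header row, whose cumulative column is the word "cumulative".
        if len(fields) != 3 or not fields[1].strip().isdigit():
            continue
        rows.append((int(fields[1]), fields[2].strip()))  # strip nesting indentation
    rows.sort(reverse=True)
    return rows[:top]
```

Running `slowest_imports("import dash")` should then show whether plotly.graph_objs dominates, as described above.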
As a quick hack, changing that line to
from _plotly_utils.utils import PlotlyJSONEncoder
reduces the wall-clock time of import dash to 0.6-0.7 seconds, i.e. a reduction of around 70%. 🏎 Starting a Dash app suddenly felt almost instant (which is nice during development).
I guess the plotly Python package is free to change the "private package" _plotly_utils without a major release, so the "hack" above is not a good permanent solution for Dash. (Dash does not pin the plotly version today either, so a new major plotly release might already break previous Dash releases, but that is a separate issue 🙂.)
I guess the best solution might be to change plotly/__init__.py to not import all subpackages, so that consumers of the plotly package can choose what to import. It is common for packages to require explicit subpackage imports, e.g.
python -c "import matplotlib; print(matplotlib.pyplot.__file__)"
Traceback (most recent call last):
File "<string>", line 1, in <module>
AttributeError: module 'matplotlib' has no attribute 'pyplot'
would not work, while
python -c "import matplotlib.pyplot; print(matplotlib.pyplot.__file__)"
[...]/python3.7/site-packages/matplotlib/pyplot.py
does. The plotly.py documentation also typically contains lines like
import plotly.graph_objects as go
which is an example of explicit subpackage import, and would work even if plotly/__init__.py does not import graph_objects into its namespace.
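The behaviour above can be reproduced without matplotlib or plotly. The sketch below builds a throw-away package (the names `demo_pkg` and `demo_pkg.sub` are made up for illustration) whose __init__.py imports nothing, and checks that the submodule only appears as an attribute of the package after it has been imported explicitly:

```python
import importlib
import os
import shutil
import sys
import tempfile


def demo_explicit_subpackage_import():
    """Return (attribute present before, attribute present after) the explicit import."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    # An empty __init__.py: importing the package does not import `sub`.
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
        f.write("VALUE = 42\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module("demo_pkg")  # like `import matplotlib`
        before = hasattr(pkg, "sub")               # like `matplotlib.pyplot`: missing
        importlib.import_module("demo_pkg.sub")    # like `import matplotlib.pyplot`
        after = hasattr(pkg, "sub")                # now bound on the parent package
        return before, after
    finally:
        # Undo global side effects so the demo can be re-run.
        sys.path.remove(root)
        sys.modules.pop("demo_pkg.sub", None)
        sys.modules.pop("demo_pkg", None)
        shutil.rmtree(root, ignore_errors=True)
```

The function returns `(False, True)`: the import system binds `sub` onto the parent package only when `demo_pkg.sub` is imported, which is why `import plotly.graph_objects as go` would keep working without any eager import in plotly/__init__.py.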
Related issue: https://github.com/plotly/plotly.py/issues/740
Yeah, doing a direct import would be an easy win.
We don't want to import from an underscore-prefixed (private) module; the official import would be
from plotly.utils import PlotlyJSONEncoder
However, it appears that this import is still slow.
I'll transfer this over to plotly.py and we'll see what we can do. This would certainly be much easier than https://github.com/plotly/plotly.py/issues/740.
@nicolaskruchten @emmanuelle @jonmmease - If we could speed up from plotly.utils import PlotlyJSONEncoder by preventing plotly.graph_objs from being imported as a side effect, that would be a big win for Dash's hot-reload development speed.
Note - I'm just speculating that plotly.graph_objs is being imported as a side effect as doing from plotly.utils import PlotlyJSONEncoder in my terminal took about 4 seconds.
Thanks for your reply @chriddyp 👍
Regarding your note that you were "just speculating that plotly.graph_objs is being imported as a side effect": I can confirm that it is, due to a combination of how the Python import system works (every __init__.py file is executed while "going down the package tree") together with these lines in plotly/__init__.py.
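A minimal reproduction of this side effect, using a hypothetical throw-away package `side_pkg` in which `heavy` stands in for plotly.graph_objs and `utils` for plotly.utils. Importing only `side_pkg.utils` still executes `side_pkg/__init__.py` first, so whatever that file imports gets loaded too:

```python
import importlib
import os
import shutil
import sys
import tempfile


def heavy_loaded_after_importing_utils(init_source):
    """Import only `side_pkg.utils` and report whether `side_pkg.heavy` got loaded too.

    `init_source` is written to side_pkg/__init__.py. All names are hypothetical.
    """
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "side_pkg")
    os.mkdir(pkg_dir)
    files = {
        "__init__.py": init_source,
        "heavy.py": "TABLE = list(range(1000))\n",
        "utils.py": "class Encoder:\n    pass\n",
    }
    for filename, source in files.items():
        with open(os.path.join(pkg_dir, filename), "w") as f:
            f.write(source)

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # Equivalent to `from side_pkg.utils import Encoder`: Python first runs
        # side_pkg/__init__.py, then side_pkg/utils.py.
        importlib.import_module("side_pkg.utils")
        return "side_pkg.heavy" in sys.modules
    finally:
        sys.path.remove(root)
        for name in ("side_pkg.utils", "side_pkg.heavy", "side_pkg"):
            sys.modules.pop(name, None)
        shutil.rmtree(root, ignore_errors=True)
```

With an eager `"from . import heavy\n"` as the package's __init__.py the result is `True`; with an empty __init__.py it is `False`.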
An easy, non-breaking change for plotly.py that would still give the performance increase (especially for Dash hot reload) on Python 3.7+ could be to use the newly added module-level __getattr__ (see PEP 562 for details) to do lazy imports.
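I.e. a sketch for plotly/__init__.py:
import importlib
import sys
# Subpackages to load on demand (the remaining ones are elided here: ...)
_lazy_submodules = {"graph_objs", "tools"}
if sys.version_info >= (3, 7):
    def __getattr__(name):
        # Perform lazy loading of `name` when asked for. See PEP 562 and "Rationale" for details.
        if name in _lazy_submodules:
            return importlib.import_module("." + name, __name__)
        raise AttributeError("module {!r} has no attribute {!r}".format(__name__, name))
else:
    # PEP 562 and module-level `__getattr__` were added in Python 3.7,
    # so continue with direct imports as today for users on Python 3.6 (or older).
    from plotly import graph_objs, tools  # , ...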
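To check that the PEP 562 approach really defers the work, the sketch below writes a hypothetical package `lazy_pkg` whose __init__.py defines a module-level `__getattr__`, and verifies that its `heavy` submodule is absent from sys.modules until the attribute is first accessed. It requires Python 3.7+:

```python
import importlib
import os
import shutil
import sys
import tempfile

# Source for a hypothetical package __init__.py using a PEP 562 module __getattr__.
LAZY_INIT = '''\
import importlib

_LAZY_SUBMODULES = {"heavy"}


def __getattr__(name):
    # Only called when `name` is not found as a regular module attribute.
    if name in _LAZY_SUBMODULES:
        return importlib.import_module("." + name, __name__)
    raise AttributeError("module {!r} has no attribute {!r}".format(__name__, name))
'''


def lazy_loading_demo():
    """Return (loaded after `import lazy_pkg`, loaded after attribute access, value)."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "lazy_pkg")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(LAZY_INIT)
    with open(os.path.join(pkg_dir, "heavy.py"), "w") as f:
        f.write("ANSWER = 42\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module("lazy_pkg")
        before = "lazy_pkg.heavy" in sys.modules  # nothing imported yet
        value = pkg.heavy.ANSWER                  # triggers lazy_pkg.__getattr__("heavy")
        after = "lazy_pkg.heavy" in sys.modules   # imported on first access
        return before, after, value
    finally:
        sys.path.remove(root)
        for name in ("lazy_pkg.heavy", "lazy_pkg"):
            sys.modules.pop(name, None)
        shutil.rmtree(root, ignore_errors=True)
```

The demo returns `(False, True, 42)`. Because the import system binds the loaded submodule onto the package, later accesses to `pkg.heavy` no longer go through `__getattr__` at all.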
We can definitely carve out a fast path for Dash that just imports the encoder, but these gains will all be lost when people use PX or graph_objects directly. I think what we need is some sort of global setting to disable this auto-loading behaviour.
Or perhaps Dash forks the encoder and uses its own version.
Right, but that will only give performance gains when folks use raw dicts and lists for making figures. We need a way to make Dash apps fast even when they use PX :)
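For reference, a forked encoder along the lines suggested above could start as small as the hypothetical `MinimalFigureEncoder` below. It is a sketch, not PlotlyJSONEncoder's actual code: it only relies on the `to_plotly_json()` method that plotly figures and Dash components expose, whereas the real encoder also handles NumPy arrays, pandas objects, dates and more.

```python
import json


class MinimalFigureEncoder(json.JSONEncoder):
    """Hypothetical trimmed-down stand-in for PlotlyJSONEncoder (no plotly import needed)."""

    def default(self, obj):
        # Duck-type on the to_plotly_json() hook instead of importing plotly classes.
        to_json = getattr(obj, "to_plotly_json", None)
        if callable(to_json):
            return to_json()
        return super().default(obj)  # raises TypeError for unsupported types
```

Usage would be `json.dumps(figure, cls=MinimalFigureEncoder)`. As noted in the reply above, this alone does not help apps that import PX or graph_objects themselves, since they pay the plotly import cost anyway.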
Startup time will be greatly improved by https://github.com/plotly/plotly.py/pull/2368.
Thanks to @anders-kiaer for pointing out the potential of PEP 562. Baking this into our code generation class hierarchy really helps.
🎉 I would consider this issue solved now after #2368 and lazy imports (if someone uses 🐍 Python < 3.7 and wants the speedup, they should just update their Python minor version... Python 3.7 was released back in 2018, and Python 3.6 reaches end of life, with its last security fix, next year).
Thanks for implementing PEP 562 @jonmmease! Looking forward to testing it out in Dash 🚀
Thanks again @anders-kiaer for bringing this up with us, and steering us in the right direction with lazy loading!