Created this issue to keep track of our efforts to reduce Galaxy's startup time. ping @ic4f @dannon @afgane.
I started by deleting the database directory and then executed run.sh locally on a Mac/i7/2.5GHz while profiling the __init__ function in lib/galaxy/app.py. Those results are shown as 'Initial' series in the plot (total time 119s). Then I restarted the app and labeled the results 'Restart' (total time 3.5s).
On initial start, ~60s is spend to setup the tool dependency manager build_dependency_manager as part of _configure_toolbox. The dependency manager is loaded at: https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/tools/deps/__init__.py#L36,
and associated with the optional tool_dependency_dir configuration setting, see: https://github.com/galaxyproject/galaxy/blob/dev/config/galaxy.yml.sample#L207.
When restarting most of the time is spend on parsing the tool xml-files in _configure_toolbox, however this only takes about ~1.6s. The loop in which tools are loaded is located at: https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/tools/toolbox/base.py#L129.

I've written a small howto on profiling code in Galaxy here which might be handy. Last time I checked most time during startup was being spent parsing tool xml files.
I've learned a couple of additional things along the way that are important too:
I think one fruitful avenue here is to persist the tool objects as something that is more lightweight to load, which might be pickles or json. This could even transiently be stored in the database, which should help a lot when tools are being loading from a network share and/or by many simultaneous processes.
Another thing that can take a lot of time is being the tool index, which we should be able to delay until after the startup, and there's probably also a way to avoid doing that for each process.
which we should be able to delay until after the startup
This is a low hanging fruit for bigger Galaxies I think.
I think one fruitful avenue here is to persist the tool objects as something that is more lightweight to load, which might be pickles or json. This could even transiently be stored in the database, which should help a lot when tools are being loading from a network share and/or by many simultaneous processes.
This is also what @erasche and me has one some todo list. Our idea is very similar. We wanted to have something like a .pyc file next to the tool xml file, which gets read if the xml was not modified.
Let me take the code tag as an example.
Current code file is evaluated at tool loading at Galaxy start time, and the resulting function exists in MEM in Galaxy entire life. For frequently-used tool, I see the benefit of this design, but most loaded tools on a Galaxy sever are rarely touched by people. I don't think this is an efficient resource usage. How about just loading the code file path instead of evaluation and evaluating it only when needed. I have some raw idea below.
1) make light wighted tool loading classes, only store some key words, like file path.
2) make fully functional working tool classes/objects, already exist I think.
3) make a fixed-size memory queue to dynamically manage tool working objects. Like cache, the queue only stores those recently used objects. Settings could be left to Galaxy admin to adjust the queue size on server-wise.
@bgruening thanks for the ping, I proposed a caching framework in https://github.com/galaxyproject/galaxy/issues/4571 specifically because of the toolbox. There is definitely some way we can find to build a frozen tool pickle (or similar) and stick in an admin configurable cache, be that on disk or networked for more distributed galaxies.
As a tangent to the discussion (and a few more data points), here's what happens in a container (galaxy minimal image). This does not include container overhead (approx. 2s to start). Running a container via Kubernetes adds more overhead (approx. 2s for the Helm chart; and then Postgres, which depends on the data ingestion approach - preload and have the data directory available vs. restore from dump vs. have db running).
Galaxy is installed via ansible (ansible-galaxy role) at image build time. The data below is from Galaxy's log output, after running uwsgi --yaml config/galaxy.yml in a container. The data is from running this on a laptop (so this is only good as a relative measure). This only includes the tools that come with the default installation.
time | cumulative | log
-- | -- | --
0.00 | 0.00 | starting uwsgi...
2.11 | 2.11 | UWSGIApplicationStack initialized; initializing main galaxy queue worker...
0.96 | 3.06 | galaxy db version checked
0.10 | 3.16 | retrieving datatypes...
0.36 | 3.53 | loading sniffers; build site file + load site file...
0.05 | 3.58 | loading tool data tables...
0.13 | 3.70 | job config (no job.config; done)
0.07 | 3.77 | parsing tool config
0.02 | 3.79 | loading tools...
0.28 | 4.07 | loading tools done
0.05 | 4.13 | toolbox index: started building
1.21 | 5.33 | toolbox index: finished
0.32 | 5.65 | loaded display applications
0.18 | 5.83 | loaded converters
0.01 | 5.84 | data fetch, other minor tools done
0.70 | 6.54 | visualization plugins done
0.06 | 6.60 | tours done
0.01 | 6.61 | init job handlers, job runners... and done
0.17 | 6.78 | enabling controllers, done
0.30 | 7.08 | middleware done
0.20 | 7.29 | postfork functions done
0.03 | 7.31 | serving.
Is that seconds or minutes ?
@mvdbeek seconds
phew 馃槅
Thanks for working on this @ic4f. This looks great. Can we see how/if times change when loading a small versus larger number of tools on startup?
To add some EU data :)
Handler:
Time | Progress
--- | ---
Aug 06 10:11:29 | Started Galaxy Handlers.
Aug 06 10:11:42 | Finished Retrieving datatype modules
Aug 06 10:11:43 | Finished Loading build sites
Aug 06 10:11:43 | Loading datatypes from /opt/galaxy/config/datatypes_conf.xml
Aug 06 10:11:44 | Last "Retrieved datatype module from the datatype registry" message
Aug 06 10:13:12 | a when tag has been defined ... I guess tools are loading now?
Aug 06 10:36:06 | Toolbox index finished (352281.551 ms)
Aug 06 10:36:07 | Finished "Loaded display application"
Aug 06 10:36:18 | First job processing
And zerglings:
Time | Progress
--- | ---
Aug 06 09:26:01 | Started Galaxy Zerglings.
Aug 06 09:26:07 | UWSGIApplicationStack initialized
Aug 06 09:26:07 | At database version 153
Aug 06 09:26:12 | End of galaxy.datatypes.registry
Aug 06 09:26:20 | End of galaxy.tools.data
Aug 06 09:26:20 | Load job config
Aug 06 09:26:21 | Parsing the tool configuration /opt/galaxy/config/tool_conf.xml
Aug 06 09:43:17 | Toolbox index finished (248492.267 ms)
Aug 06 09:43:26 | Serving first request
@erasche 352s for tool indexing is very very long, that is supposed to take 30s at most. Do you have tools on slower disk or sth?
@martenson nah local SSD D:
30s at most ... for how many tools?
@erasche I think Main takes like 20s. Let me check.
edit Toolbox index finished (3746.189 ms)
With the Helm chart, on AWS loading the same toolset as on Main via the CVMFS, it's almost 30 seconds: Toolbox index finished (28797.198 ms).
Here is the startup time broken down into identifiable sections, most is being spent on parsing the tool configuration (2.5mins), followed by loading the toolbox (~30s):
| Time | Event | Cummulative runtime (s) | Time taken (s) | |
|----------|--------------------------------------------------------------------------------|------------------------:|---------------:|---|
| 15:04:56 | Start | 0 | 4 | |
| 15:05:00 | UWSGIApplicationStack initialized | 4 | 1 | |
| 15:05:01 | Loading datatypes from /cvmfs/main.galaxyproject.org/config/datatypes_conf.xml | 5 | 2 | |
| 15:05:03 | Loaded build site 'ucsc' | 7 | 0 | |
| 15:05:03 | Loaded tool data table ... | 7 | 8 | |
| 15:05:11 | Loading job configuration from ./config/job_conf.xml | 15 | 0 | |
| 15:05:11 | Parsing the tool configuration | 15 | 149 | |
| 15:07:40 | Reading tools from config files finished (149076.461 ms) | 164 | 0 | |
| 15:07:40 | Installing conda, this may take several minutes. | 164 | 84 | |
| 15:09:04 | Starting to build toolbox index. | 248 | 28 | |
| 15:09:33 | Toolbox index finished (28797.198 ms) | 276 | 0 | |
| 15:09:35 | WSGI app 0 (mountpoint='/gvl/galaxy') ready in 279 seconds | 278 | - | |
(The full startup log is available at https://gist.github.com/afgane/6f66dc8f05ae9aeade74d290fa1a85fa)
@afgane is this after cvmfs already cached this or is it fetching over http?
Toolbox index finished (3746.189 ms)
This is standard for reloads, which use the cache, fresh load can be as high as 80s for Main. Sorry for the misinformation. I'll have a look at speeding it up.
@martenson The previous times were for using a clean CVMFS cache. Times for using pre-populated cache are as follows:
| Time | Event | Cummulative runtime (s) | Time taken (s) |
|----------|--------------------------------------------------------------------------------|------------------------:|---------------:|
| 15:52:14 | Start | 0 | 3 |
| 15:52:17 | UWSGIApplicationStack initialized | 3 | 0 |
| 15:52:17 | Loading datatypes from /cvmfs/main.galaxyproject.org/config/datatypes_conf.xml | 3 | 0 |
| 15:52:17 | Loaded build site 'ucsc' | 3 | 0 |
| 15:52:18 | Loaded tool data table ... | 4 | 1 |
| 15:52:19 | Loading job configuration from ./config/job_conf.xml | 5 | 0 |
| 15:52:19 | Parsing the tool configuration | 5 | 20 |
| 15:52:39 | Reading tools from config files finished (20602.689 ms) | 25 | 0 |
| | ~Installing conda, this may take several minutes.~ | 25 | 1 |
| 15:52:40 | Starting to build toolbox index. | 26 | 24 |
| 15:53:04 | Toolbox index finished (24670.840 ms) | 50 | 2 |
| 15:53:06 | WSGI app 0 (mountpoint='/gvl/galaxy') ready in 52 seconds | 52 | - |
There is a total of about 1,600 tool wrapper files on the CVMFS, and fetching them individually from the CVMFS is clearly slow. We're going to look at alternatives methods for loading the files for the chart, but still parsing the tools and indexing the toolbox take the vast majority of startup time (e.g., 44 out of 52 seconds).
@afgane We had a mini hackathon at PSU today, these should help a bunch:
https://github.com/galaxyproject/galaxy/pull/8422
https://github.com/galaxyproject/galaxy/pull/8421
Great to see efforts in this direction! A few thoughts on this:
With 20.05, we (with @osallou) noticed that our instance (galaxy.genouest.org) is very slow to start (~410s) and eats a lot of memory (~25Gb).
I think it was better with 20.01, but I don't have precise numbers, and I can't downgrade to compare... Anyway, with 20.05 I get out of memory errors, while I never had a single one before.
It runs using uWSGI + Mules, with 5 web workers and 3 job handler mules. Galaxy code and tools are on a mounted nfs disk (not particularly slow though).
After some profiling, we found that the galaxyp/openms tools take a long time to parse and use more memory than other tools. Probably because they all use a huge macro file: https://github.com/galaxyproteomics/tools-galaxyp/blob/master/tools/openms/macros_autotest.xml
After uninstalling all these tools, the startup time dropped to ~310s, and memory usage to ~12Gb.
That's better, no more out of memory errors (for now)
Maybe they're good test case to test optimisations?
I guess there is still space for optimization, parsing all tools xml (from disk or from cache) at startup takes a lot of time when you have many tools + loading them all in memory looks superfluous when a good proportion of them are not/rarely used...
I've tried the delay_tool_initialization but it didn't help as much as I hoped....
With 20.05, we (with @osallou) noticed that our instance (galaxy.genouest.org) is very slow to start (~410s) and eats a lot of memory (~25Gb).
We're seeing about 1.6 gig RSS per process with .org (are you maybe multiplying each forked thread ? that's not entirely fair, that's copy-on-write memory). 20.05 only expands xml sources at first startup, afterwards expanded xml documents are read from the tool cache. 20.09 optimizes this a bit so that no locks are required and it is a simple sqlite document now.
Are you removing the tool cache after each startup ? If not, do you have the dbm.gnu module available in your python interpreter ? (import dbm.gnu should not error out, if this is falling back to dbm.dumb this might get very slow).
There's something going wrong, loading the entire usegalaxy.org toolbox takes about 12 seconds on my laptop, and on .org this is about 40 seconds, where most time is spent parsing tool data tables at this point.
(and yes, expanding macros is expensive)
Ok, good to know that it's not normal :)
No dbm.gnu apparently:
>>> import dbm.gnu
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/blabla/.conda/envs/_galaxy_/lib/python3.6/dbm/gnu.py", line 3, in <module>
from _gdbm import *
ModuleNotFoundError: No module named '_gdbm'
How do you install that?
Is there a reason that @qiagu's proposal can't be implemented? Even with extensive caching, the current caching solution won't really solve the memory consumption issue that @abretaud has brought up. 1.6GB is still a lot? Decoupling the tool navigation from the tool parsing would appears to be the quickest path to fix this. For Galaxy to simply startup and render the tool panel, you only need a static, pre-generated list right?, consisting of tool name, description, version and maybe a few other visible/searchable fields. Once a tool is actually used, the full tool definition could be loaded.
This would mean that there should be a one off command to auto-generate the initial tool panel - something like "run.sh generate_tool_panel /path/to/navigation_panel.yml" which administrators can run when required, and Galaxy's loading could be pretty much instant, with little to no memory consumed. Thereafter, the actual wrapper parsing is done only when a tool is used, and redis or just an in-memory LRU cache can be built to lower memory usage but improve responsiveness for frequently used tools.
This would also help to side-step all the complexity of reordering tool panel items, merging integrated_tool_panel.xml at runtime with tools etc. etc. which requires a lot of context to understand - it's still very very hard just to control the order of items in the tool panel (section headings disappearing, items appearing in the wrong section etc. etc.) if you don't have extensive experience with Galaxy.
Is there a reason that @qiagu's proposal can't be implemented? Even with extensive caching, the current caching solution won't really solve the memory consumption issue that @abretaud has brought up. 1.6GB is still a lot? Decoupling the tool navigation from the tool parsing would appears to be the quickest path to fix this. For Galaxy to simply startup and render the tool panel, you only need a static, pre-generated list right?, consisting of tool name, description, version and maybe a few other visible/searchable fields. Once a tool is actually used, the full tool definition could be loaded.
We do that already, that's what late initialization does, this only loads the bare minimum, but since macros can contain every field they need to be expanded. 1.6GB is for the entire app, when it is in use. The tools themselves I'd estimate to about 400MB.
This would also help to side-step all the complexity of reordering tool panel items, merging integrated_tool_panel.xml at runtime with tools etc. etc. which requires a lot of context to understand - it's still very very hard just to control the order of items in the tool panel (section headings disappearing, items appearing in the wrong section etc. etc.) if you don't have extensive experience with Galaxy.
And yes, that's close to what I would think is the long term plan, serving expanded documents is something that a potential toolshed replacement can do.
How do you install that?
I don't think there's a good way to do that with conda for 3.6. Can you check if you can do this import dbm.ndbm ?
That isn't much slower, and we used that on 20.05 on usegalaxy.org using a python interpreter installed from conda-forge.
I get this with ndbm:
>>> import dbm.ndbm
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/genouest/admin/galaxy/.conda/envs/_galaxy_/lib/python3.6/dbm/ndbm.py", line 3, in <module>
from _dbm import *
ModuleNotFoundError: No module named '_dbm'
OK, this one can be installed, let me see how we did that
@natefoo can you provide instructions on how to get either gdbm or ndbm installed if using a python interpreter installed via Conda ?
Ok, I don't think we have delayed initialization enabled on the helm chart, so will try that out, since in any case, there's no shared-memory advantage in the k8s setup. It would be good to be able to take advantage of a nice caching abstraction like in Django, just to cache frequently used queries and shared tool panel definitions etc.
This is how it looks for us:

We do have OpenMS installed and the usage is much more than 20.05. My suspician is also that there is a leak somewhere. During installation of many tools (OpenMS updates installs 120 tools) a reload of the handlers or zerglings is increasing the memmory significantly every time. This makes tool installations without constant restarts impossible.
Is anyone seeing a similar behaviour? Or is it really OpenMS that makes this bug visible.
Can you open a new issue ? This doesn't seem related to the startup time discussion.
Most helpful comment
@afgane We had a mini hackathon at PSU today, these should help a bunch:
https://github.com/galaxyproject/galaxy/pull/8422
https://github.com/galaxyproject/galaxy/pull/8421