Opening this to start some discussion about performance when installing many tools (especially when many of them are already installed).
Currently, usegalaxy.eu's weekly Jenkins job, which ensures that everything in https://github.com/usegalaxy-eu/usegalaxy-eu-tools is installed on our server, takes around 3-5 hours on average (less if it crashes due to galaxyproject/ephemeris#133, more if it finishes completely).
What can we do to accelerate this?
Could we fetch from Galaxy a list of everything already installed, with revisions? That endpoint should be relatively quick, since it'd just be:
select tool_shed, name, owner, changeset_revision from tool_shed_repository;
And diff that locally before installing?
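A minimal sketch of that local diff in Python, assuming the installed list comes back as records with the same four fields as the `tool_shed_repository` columns in the query above, and that the tool yaml has already been flattened into the same shape. `RepoRevision` and `repos_to_install` are hypothetical names, not part of ephemeris.

```python
from typing import Iterable, List, NamedTuple


class RepoRevision(NamedTuple):
    """One (tool_shed, name, owner, changeset_revision) row, as in the SQL query."""
    tool_shed: str
    name: str
    owner: str
    changeset_revision: str


def repos_to_install(requested: Iterable[RepoRevision],
                     installed: Iterable[RepoRevision]) -> List[RepoRevision]:
    """Return the requested revisions that are not installed yet, in request order.

    Hypothetical helper: one set lookup per requested revision instead of
    one round-trip to Galaxy per repository.
    """
    already_installed = set(installed)
    return [repo for repo in requested if repo not in already_installed]


# Usage with made-up revisions: only the second entry needs installing.
installed = [RepoRevision("toolshed.g2.bx.psu.edu", "fastqc", "devteam", "rev_a")]
requested = [
    RepoRevision("toolshed.g2.bx.psu.edu", "fastqc", "devteam", "rev_a"),
    RepoRevision("toolshed.g2.bx.psu.edu", "fastqc", "devteam", "rev_b"),
]
print(repos_to_install(requested, installed))
```

Because whole tuples are compared, a new changeset revision of an already installed repository still shows up as something to install.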
cc @bgruening
I think there might be some unnecessary recursion when checking whether a repo is already installed; if no new tools are to be added, this should be fast.
So I looked at https://build.galaxyproject.eu/job/usegalaxy-eu/job/install-tools/114/consoleFull, and it seems the time spent per yaml file is more or less proportional to the number of repos installed at that step, which makes me think there really isn't much that can be done. When I said there might be some unnecessary recursion, I thought we'd be fetching the list of installed repos multiple times, but that only happens once per yaml file.
If we are not afraid of touching the TS code, the TS could provide a repository.yaml file that lists all available repos, plus revisions etc.
This would then be fetched and compared with our yaml files, and voila, no?
I guess I don't understand what the issue is. To me it looks like the performance issue is that installing things in Galaxy is slow; no amount of yaml-munging will make that go faster?
I think most of the time is spent triggering the installation of revisions that are already installed. We could skip this completely and only trigger the installations that are actually needed.
This will also speed up this step: https://build.galaxyproject.eu/job/usegalaxy-eu/job/update-trusted-tools/
But I guess @erasche's plan is more effective. We could use https://pypi.org/project/structurediff/ to compare the Galaxy JSON with our tool yaml. But we don't have a nice API to get that information easily, do we?
Exactly, it isn't slow to install, it's slow to say "this thing is already installed, let's move on".
Hence my suggestion of "pull list of already installed things" to speed up the process.
Exactly, it isn't slow to install, it's slow to say "this thing is already installed, let's move on".
Well, that's a bug. Do you have an example?
OK, I see that for fastqc the script repeatedly tries to install e7b2202befea. Can you see that revision on the Galaxy server?
Nope, e7b2202befea is not installed as far as I can see.
OK, so I think what's happening is that the Galaxy install API doesn't allow updating repos without a version bump (the warning symbol). This is silly IMO; we should allow that.
I don't understand entirely, because I refactored the shed-tools code some time ago to check for installed repos and skip them. https://github.com/galaxyproject/ephemeris/blob/2a57316fd31b3616f78eea22a3da54fe459bfe7b/src/ephemeris/shed_tools.py#L75
Checking all installed tools is only 1 API request. Shed-tools then only should try to install things which are not in the installed list.
@erasche Helena, it seems that your use case is installing a big tool yaml file on a server that already has a lot of tools installed. Is that the case? If so, I am wondering why shed-tools would not work correctly. The code to handle this use case is present.
big tool yaml file on a server that already has a lot of tools installed
Yes, we're syncing every week. 99.9% of the tools in the tool yaml are already installed, only 1-10 new tool revisions each time.
It's great to hear that this should already work! The line you linked looks like it's in 0.10.2, which should be what we have installed: https://github.com/usegalaxy-eu/usegalaxy-eu-tools/blob/master/requirements.txt#L2
That's really odd that it isn't working as expected.
The core of the issue is what I said above, repos that were updated without a version bump are not installable to the new revision. So ephemeris will attempt to do that each time the changeset revision differs, which is always. That's not an ephemeris issue, it's a Galaxy issue.
@mvdbeek Should we move this ticket there then?
The core of the issue is what I said above, repos that were updated without a version bump are not installable to the new revision.
I am confused now. So besides the changeset revision, Galaxy also checks some sort of internal version number? I was under the (apparently wrong) assumption that the changeset revision is the be-all and end-all of tool versions in Galaxy.
Can you point to the code? Maybe constructing a workaround in ephemeris is not that hard. A list comprehension and an extra check and EU galaxy should be chugging along happily. (If we manage to extract the magic versions).
@erasche that'd be good. @rhpvorderman yes, there are updates and installations. If the tool version isn't bumped, we just do an hg update.
The API is silly though, who cares about the install process if you want to install a given changeset revision?
I don't think it's ephemeris' job to figure out the installable vs. updatable revisions (you could do that by checking for gaps in the installable revisions from the toolshed, then checking whether any installed revision falls in the gap preceding the requested revision). But that would really fix the wrong end of the problem IMO.
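One possible reading of that gap check, as a Python sketch. It assumes you already have the repository's full changelog ordered oldest to newest and the subset of revisions the Tool Shed reports as installable; both inputs and the function name `installed_in_preceding_gap` are hypothetical, and this illustrates the idea rather than how Galaxy or ephemeris actually decide it.

```python
from typing import Iterable, Optional, Sequence


def installed_in_preceding_gap(changelog: Sequence[str],
                               installable: Iterable[str],
                               installed: Optional[str],
                               requested: str) -> bool:
    """True if `installed` lies strictly between the previous installable
    revision and `requested`, i.e. the two differ only by commits that
    did not bump the tool version.

    changelog:   every changeset revision of the repo, oldest first (assumed input).
    installable: revisions the Tool Shed lists as installable (assumed input).
    """
    position = {rev: i for i, rev in enumerate(changelog)}
    if installed not in position or requested not in position:
        return False
    requested_i = position[requested]
    # Index of the closest installable revision before the requested one (-1 if none).
    previous_i = max(
        (position[r] for r in installable if r in position and position[r] < requested_i),
        default=-1,
    )
    return previous_i < position[installed] < requested_i


# Usage with made-up revisions: r1 and r4 are installable, r2 and r3 are not.
changelog = ["r0", "r1", "r2", "r3", "r4"]
print(installed_in_preceding_gap(changelog, ["r1", "r4"], "r2", "r4"))  # True
print(installed_in_preceding_gap(changelog, ["r1", "r4"], "r1", "r4"))  # False
```

Under this reading, a `True` result marks the case where only an hg update separates the installed and requested revisions, which is what the install API currently refuses to do.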
@mvdbeek please change the title as you see fit.
Good to know it's another bug due to the fun that is tool updates.
I don't think it's ephemeris' job to figure out the installable vs. updatable revisions (you could do that by checking for gaps in the installable revisions from the toolshed, then checking whether any installed revision falls in the gap preceding the requested revision). But that would really fix the wrong end of the problem IMO.
But if the Galaxy API is fixed, it takes some time for that to land in a stable release. And this problem is annoying for the EU Galaxy team NOW.
So a hack in ephemeris is not that bad. Ephemeris can release a new version tomorrow, so to speak. We can add a # FIXME comment so we can then delete the hackish code later if it gets in the way.
I do agree that it is not the proper way to fix this in ephemeris, but these EU Galaxy people are quite essential to the Galaxy effort, so it is nice not to make them wait for another Galaxy release and to let them spend their time on more useful and fun stuff.
EDIT: TLDR; I am volunteering. Starting pycharm now...
Is this a duplicate of https://github.com/galaxyproject/galaxy/issues/6698 and https://github.com/galaxyproject/ephemeris/issues/111 ?
@nsoranzo yep it is!
EDIT: TLDR; I am volunteering. Starting pycharm now...
This problem is unworkaroundable, I am afraid. Working around it in ephemeris needs several API calls, and these are slow... so stuff remains slow.
Ok @mvdbeek any chance you have time/energy for this one? Thanks so much for trying @rhpvorderman
It might not be immediate but I'll definitely get to it.
Sure, sounds good. Let me know what I can do to help, seems like a lot of wasted CPU hours.
I'm not convinced this is the only problem behind the performance issue.
Here is a log of a small yml file:
Storing log file in: /tmp/ephemeris_W79sj_
(1/10) Installing repository graphclust_postprocessing_no_align from rnateam to section "GraphClust" at revision 0a48b2db75e7 (TRT: 0:00:13.894167)
Repository graphclust_postprocessing_no_align is already installed.
repository graphclust_postprocessing_no_align installed successfully (in 0:00:06.591924) at revision 0a48b2db75e7
(2/10) Installing repository graphclust_aggregate_alignments from rnateam to section "GraphClust" at revision 8778478a754f (TRT: 0:00:20.486699)
Repository graphclust_aggregate_alignments is already installed.
repository graphclust_aggregate_alignments installed successfully (in 0:00:06.859108) at revision 8778478a754f
(3/10) Installing repository graphclust_align_cluster from rnateam to section "GraphClust" at revision 953353eacec2 (TRT: 0:00:27.346054)
Repository graphclust_align_cluster is already installed.
repository graphclust_align_cluster installed successfully (in 0:00:07.507183) at revision 953353eacec2
(4/10) Installing repository graphclust_cmfinder from rnateam to section "GraphClust" at revision 7e7f43d58e13 (TRT: 0:00:34.853473)
Repository graphclust_cmfinder is already installed.
repository graphclust_cmfinder installed successfully (in 0:00:07.328518) at revision 7e7f43d58e13
(5/10) Installing repository graphclust_postprocessing from rnateam to section "GraphClust" at revision e080ebe95476 (TRT: 0:00:42.182253)
Repository graphclust_postprocessing is already installed.
repository graphclust_postprocessing installed successfully (in 0:00:07.629337) at revision e080ebe95476
(6/10) Installing repository graphclust_fasta_to_gspan from rnateam to section "GraphClust" at revision fcb1bb6dc0f9 (TRT: 0:00:49.811845)
Repository graphclust_fasta_to_gspan is already installed.
repository graphclust_fasta_to_gspan installed successfully (in 0:00:07.362743) at revision fcb1bb6dc0f9
(7/10) Installing repository graphclust_mlocarna from rnateam to section "GraphClust" at revision 98e9bc1fb249 (TRT: 0:00:57.174817)
Repository graphclust_mlocarna is already installed.
repository graphclust_mlocarna installed successfully (in 0:00:07.823141) at revision 98e9bc1fb249
(8/10) Installing repository graphclust_nspdk from rnateam to section "GraphClust" at revision 2aaf391798a3 (TRT: 0:01:04.998222)
Repository graphclust_nspdk is already installed.
repository graphclust_nspdk installed successfully (in 0:00:15.944729) at revision 2aaf391798a3
(9/10) Installing repository graphclust_prepocessing_for_mlocarna from rnateam to section "GraphClust" at revision 550ddcf5384c (TRT: 0:01:20.943271)
Repository graphclust_prepocessing_for_mlocarna is already installed.
repository graphclust_prepocessing_for_mlocarna installed successfully (in 0:00:08.177322) at revision 550ddcf5384c
(10/10) Installing repository graphclust_preprocessing from rnateam to section "GraphClust" at revision 16bcaef3dc1e (TRT: 0:01:29.121215)
Repository graphclust_preprocessing is already installed.
repository graphclust_preprocessing installed successfully (in 0:00:06.952430) at revision 16bcaef3dc1e
Installed repositories (10): [('graphclust_postprocessing_no_align', '0a48b2db75e7'), ('graphclust_aggregate_alignments', '8778478a754f'), ('graphclust_align_cluster', '953353eacec2'), ('graphclust_cmfinder', '7e7f43d58e13'), ('graphclust_postprocessing', 'e080ebe95476'), ('graphclust_fasta_to_gspan', 'fcb1bb6dc0f9'), ('graphclust_mlocarna', '98e9bc1fb249'), ('graphclust_nspdk', '2aaf391798a3'), ('graphclust_prepocessing_for_mlocarna', '550ddcf5384c'), ('graphclust_preprocessing', '16bcaef3dc1e')]
Skipped repositories (0): []
Errored repositories (0): []
All repositories have been installed.
Total run time: 0:01:36.074167
All of these tools are up-to-date. No installation needed, no update needed. Nevertheless, it took 1:36 min to process this file with 10 tools (let's ignore the confusing 'installed successfully' output for a moment). With 2000 tools, this alone, without any further bugs, adds up to more than 5h.
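The extrapolation, worked through in Python from the total run time in the log above: roughly 9.6 s per already-installed repository, which over 2000 repositories comes to about 5.3 hours. The 2000 figure is the rough repository count from the comment, not a measured number.

```python
from datetime import timedelta

# Total run time reported by ephemeris for the 10-repository file.
total = timedelta(minutes=1, seconds=36.074167)
per_repo = total / 10          # time per repository, even when nothing changes
projected = per_repo * 2000    # rough size of the full tool list

print(per_repo)                                     # about 9.6 s
print(round(projected.total_seconds() / 3600, 2))   # 5.34 (hours)
```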
As I mentioned above, I think the best way to get this running much faster is not to ask the toolshed about every single tool, but to get a "description" of the TS via one single API call.
I imagine something like a GET https://toolshed.g2.bx.psu.edu/api/get_tools/owner/ which would give you back all of that owner's repositories and their revisions.
The TS could also cache this file and dramatically reduce the number of API requests needed. Adding this API to the TS is hopefully not that complicated and would fix a huge performance problem on all usegalaxy.* instances.
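A rough sketch of how a client could use such an endpoint: one request per owner, cached, with every installability check answered locally. The `get_tools/owner` endpoint is only the proposal above, so the fetch function is injected and its response shape (a mapping from repository name to installable revisions) is an assumption; `OwnerListingCache` is a hypothetical name.

```python
from functools import lru_cache
from typing import Callable, Dict, List

# Assumed response shape of the proposed endpoint: {repository_name: [installable revisions]}.
OwnerListing = Dict[str, List[str]]


class OwnerListingCache:
    """Fetch each owner's listing at most once and answer lookups locally."""

    def __init__(self, fetch: Callable[[str], OwnerListing]) -> None:
        # `fetch` stands in for a call to the proposed (not yet existing) per-owner endpoint.
        self._fetch = lru_cache(maxsize=None)(fetch)

    def is_installable(self, owner: str, name: str, revision: str) -> bool:
        """True if `revision` is listed for repository `name` of `owner`."""
        return revision in self._fetch(owner).get(name, [])


# Usage with a stub fetch that records how often it is called.
calls: List[str] = []


def fake_fetch(owner: str) -> OwnerListing:
    calls.append(owner)
    return {"graphclust_cmfinder": ["7e7f43d58e13"], "graphclust_nspdk": ["2aaf391798a3"]}


shed = OwnerListingCache(fake_fetch)
shed.is_installable("rnateam", "graphclust_cmfinder", "7e7f43d58e13")  # True
shed.is_installable("rnateam", "graphclust_nspdk", "2aaf391798a3")     # True
print(calls)  # ['rnateam']: one request for the whole owner
```

With the log's ten rnateam repositories, this pattern would issue one listing request instead of ten per-repository round-trips.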
Probably fixed by #8815 for forthcoming Galaxy 19.09.
@erasche @bgruening Can you check if also the speed of usegalaxy.eu weekly tool updates is improved?
Yes, I will test this tomorrow.
Pretty sure this is done and fixed.