Hi, we are trying to upgrade our institute's Galaxy instance from v17.05 to v18.05.
I pulled the v18.05 code from GitHub and tested it on our _Galaxy-test_ instances (runs jobs _locally_) and that works fine.
However, after upgrading our _production Galaxy_ instance, it no longer submits jobs to our SGE DRM, which worked perfectly with Galaxy 17.05 until last week.
Any ideas on what we are missing, or on how to fix job_conf so that Galaxy submits jobs to SGE, would be very useful.
Our updated Galaxy-prod job_conf (have added drmaa_library_path):
<job_conf>
    <plugins workers="4">
        <!-- "workers" is the number of threads for the runner's work queue.
             The default from <plugins> is used if not defined for a <plugin>.
        -->
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner">
            <param id="invalidjobexception_state">ok</param>
            <param id="invalidjobexception_retries">0</param>
            <param id="internalexception_state">ok</param>
            <param id="internalexception_retries">0</param>
            <param id="drmaa_library_path">/usr/lib64/libdrmaa.so.1.0</param> <!-- Override the $DRMAA_LIBRARY_PATH environment variable -->
        </plugin>
    </plugins>
    <handlers default="handlers">
        <handler id="handler0" tags="handlers"/>
        <handler id="handler1" tags="handlers"/>
    </handlers>
    <destinations default="cluster">
        <destination id="local" runner="local">
            <param id="local_slots">6</param>
        </destination>
        <destination id="cluster" runner="drmaa">
            <param id="local_slots">6</param>
            <env file="/home/usern/galaxy/galaxy/setup_galaxy_venv.sh" /> <!-- will be sourced -->
        </destination>
    </destinations>
    <tools>
        <tool id="bwa" destination="cluster"/>
    </tools>
    <limits>
        <limit type="registered_user_concurrent_jobs">20</limit>
    </limits>
</job_conf>
Note: We also moved our configuration to the new galaxy.yml (replacing the old galaxy.ini) and noticed that the main.log and handler logs (handler0.log, handler1.log) that used to be created no longer exist. Instead, Galaxy creates galaxy.log on startup and _doesn't create_ the 2 handlers I defined in job_conf.
You need to choose one of the handler patterns at https://docs.galaxyproject.org/en/master/admin/scaling.html?highlight=mule#deployment-options
The typical and recommended scenario is https://docs.galaxyproject.org/en/master/admin/scaling.html?highlight=mule#uwsgi-for-web-serving-with-mules-as-job-handlers
So you need to follow the instructions at https://docs.galaxyproject.org/en/master/admin/scaling.html?highlight=mule#uwsgi-mule-job-handling
In your job_conf.xml you have to remove
<handlers default="handlers">
<handler id="handler0" tags="handlers"/>
<handler id="handler1" tags="handlers"/>
</handlers>
and add the corresponding farm to your galaxy.yml.
(Note that you can also leave the galaxy.ini in place, and then you don't have to do anything at all; it'll just use the old config.)
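As a rough sketch of the change described above, assuming the rest of the posted job_conf.xml stays as it is, the file would look something like this once the `<handlers>` block is removed. The `uwsgi` snippet shown in the comment follows the pattern in the linked scaling docs; the mule count (2) and the farm name `job-handlers` are assumptions to check against the docs for your release, not values confirmed in this thread.

```xml
<job_conf>
    <plugins workers="4">
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner">
            <param id="invalidjobexception_state">ok</param>
            <param id="invalidjobexception_retries">0</param>
            <param id="internalexception_state">ok</param>
            <param id="internalexception_retries">0</param>
            <param id="drmaa_library_path">/usr/lib64/libdrmaa.so.1.0</param>
        </plugin>
    </plugins>
    <!--
      The <handlers> section has been removed on purpose. Under the
      uWSGI mule pattern the job handlers are declared in galaxy.yml
      instead. Assumed counterpart, one "mule" line per handler:

      uwsgi:
        mule: lib/galaxy/main.py
        mule: lib/galaxy/main.py
        farm: job-handlers:1,2
    -->
    <destinations default="cluster">
        <destination id="local" runner="local">
            <param id="local_slots">6</param>
        </destination>
        <destination id="cluster" runner="drmaa">
            <param id="local_slots">6</param>
            <env file="/home/usern/galaxy/galaxy/setup_galaxy_venv.sh" />
        </destination>
    </destinations>
    <tools>
        <tool id="bwa" destination="cluster"/>
    </tools>
    <limits>
        <limit type="registered_user_concurrent_jobs">20</limit>
    </limits>
</job_conf>
```

The idea is that, with no `<handlers>` section, jobs are picked up by the mule processes in the farm, so handler0 and handler1 no longer need to be named in job_conf.xml.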
Thanks, if I leave galaxy.ini as is, I guess we don't have to set up this then: https://docs.galaxyproject.org/en/master/admin/scaling.html?highlight=mule#uwsgi-mule-job-handling
Also, then I can leave the
Yes, but you'll miss out on some great and very reliable job handling. If you don't have an issue, though, it's not worth it, and we'll keep on supporting this for quite some time.
Thanks @mvdbeek , what features besides job_handling are affected by using the older galaxy.ini?
I'd be happy to use the new code, if I can manage to get it to work with our SGE. I can't see what I'm missing in our job_conf.
We are planning to move to Slurm in a few months' time, so maybe we can try the new config then.
> Thanks @mvdbeek , what features besides job_handling are affected by using the older galaxy.ini?
So far nothing
> if I can manage to get it to work with our SGE. I can't see what I'm missing in our job_conf.
Like I said, drop the handlers section in your job_conf.xml file (or follow the instructions if you have to assign handlers to specific destinations) and add the farms as described to your galaxy.yml.
Thanks, I'll give that a go now.
FYI, for earlier test jobs, galaxy.log showed:
galaxy.tools DEBUG 2018-08-22 15:18:33,870 [p:9278,w:1,m:0] [uWSGIWorker1Core1] Validated and populated state for tool request (51.331 ms)
galaxy.tools.actions INFO 2018-08-22 15:18:34,042 [p:9278,w:1,m:0] [uWSGIWorker1Core1] Handled output named out_file1 for tool Grouping1 (135.123 ms)
galaxy.tools.actions INFO 2018-08-22 15:18:34,057 [p:9278,w:1,m:0] [uWSGIWorker1Core1] Added output datasets to history (13.704 ms)
galaxy.tools.actions INFO 2018-08-22 15:18:34,077 [p:9278,w:1,m:0] [uWSGIWorker1Core1] Verified access to datasets for Job[unflushed,tool_id=Grouping1] (5.543 ms)
galaxy.tools.actions INFO 2018-08-22 15:18:34,080 [p:9278,w:1,m:0] [uWSGIWorker1Core1] Setup for job Job[unflushed,tool_id=Grouping1] complete, ready to flush (21.771 ms)
galaxy.tools.actions INFO 2018-08-22 15:18:34,181 [p:9278,w:1,m:0] [uWSGIWorker1Core1] Flushed transaction for job Job[id=86655,tool_id=Grouping1] (99.826 ms)
galaxy.tools.execute DEBUG 2018-08-22 15:18:34,183 [p:9278,w:1,m:0] [uWSGIWorker1Core1] Tool [Grouping1] created job [86655] (290.321 ms)
galaxy.tools.execute DEBUG 2018-08-22 15:18:34,192 [p:9278,w:1,m:0] [uWSGIWorker1Core1] Executed 1 job(s) for tool Grouping1 request: (320.768 ms)
but no job folder was created in jobs_directory/, which I'm guessing could be because the handlers failed?
Yes, that is the expected output when the handlers are not active.
That worked, thanks a lot :)
cool!
It would be good to document this in galaxy.yml and in job_conf.xml. Cheers.
We're also getting a pesky 404 (Not Found) error for favicon.ico, despite not making any changes or additions to the <head> code for the UI.
That is interesting. If you want to follow up, please open a new issue.
Thanks, will do