Hi,
Since the upgrade to v16.04, our Galaxy instance won't submit its jobs anymore. I've tried to debug it, but to no avail. The biggest issue is that I'm getting no errors whatsoever. I'm running Galaxy with uWSGI in a dedicated conda environment. Our cluster is managed by Torque, and jobs are submitted through DRMAA.
When checking the cluster status with qstat, there are no jobs queued, nor were there any files generated by Galaxy for the job.
Some info:
job_conf.xml
<?xml version="1.0"?>
<job_conf>
    <plugins workers="2">
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner">
            <param id="drmaa_library_path">/usr/lib/pbs-drmaa/lib/libdrmaa.so</param>
        </plugin>
    </plugins>
    <handlers default="handlers">
        <handler id="handler0" tags="handlers"/>
        <handler id="handler1" tags="handlers"/>
    </handlers>
    <destinations default="cluster">
        <destination id="cluster" runner="pbs">
            <param id="docker_enabled" from_environ="GALAXY_DOCKER_ENABLED">False</param>
            <param id="docker_sudo" from_environ="GALAXY_DOCKER_SUDO">False</param>
            <!-- The empty volumes from shouldn't affect Galaxy, set GALAXY_DOCKER_VOLUMES_FROM to use. -->
            <param id="docker_volumes_from" from_environ="GALAXY_DOCKER_VOLUMES_FROM">galaxy</param>
            <param id="docker_volumes" from_environ="GALAXY_DOCKER_VOLUMES">$defaults</param>
            <param id="nativeSpecification">-l walltime=48:00:00,nodes=1:ppn=1</param>
            <env id="PATH" raw="true">/home/galaxy/.conda/envs/galaxy/bin:/Shared/Software/sbin:/Shared/Software/bin:/usr/local/bin:/usr/bin:/bin</env>
        </destination>
        <destination id="long_4core_clusterjobs" runner="pbs">
            <param id="docker_enabled" from_environ="GALAXY_DOCKER_ENABLED">False</param>
            <param id="nativeSpecification">-l walltime=72:00:00,nodes=1:ppn=4</param>
            <env id="PATH" raw="true">/home/galaxy/.conda/envs/galaxy/bin:/Shared/Software/sbin:/Shared/Software/bin:/usr/local/bin:/usr/bin:/bin</env>
        </destination>
        <destination id="long_3core_clusterjobs" runner="pbs">
            <param id="docker_enabled" from_environ="GALAXY_DOCKER_ENABLED">False</param>
            <param id="nativeSpecification">-l walltime=72:00:00,nodes=1:ppn=3</param>
            <env id="PATH" raw="true">/home/galaxy/.conda/envs/galaxy/bin:/Shared/Software/sbin:/Shared/Software/bin:/usr/local/bin:/usr/bin:/bin</env>
        </destination>
        <destination id="long_2core_clusterjobs" runner="pbs">
            <param id="docker_enabled" from_environ="GALAXY_DOCKER_ENABLED">False</param>
            <param id="nativeSpecification">-l walltime=72:00:00,nodes=1:ppn=2</param>
            <env id="PATH" raw="true">/home/galaxy/.conda/envs/galaxy/bin:/Shared/Software/sbin:/Shared/Software/bin:/usr/local/bin:/usr/bin:/bin</env>
        </destination>
    </destinations>
    <limits>
    </limits>
    <tools>
        <tool id="bowtie2" destination="long_3core_clusterjobs" />
        <tool id="bowtie" destination="long_3core_clusterjobs" />
        <tool id="tophat2" destination="long_4core_clusterjobs" />
        <tool id="gatk" destination="long_4core_clusterjobs" />
        <tool id="freebayes" destination="long_2core_clusterjobs" />
        <tool id="bwa" destination="long_2core_clusterjobs" />
        <tool id="bwa_mem" destination="long_2core_clusterjobs" />
    </tools>
</job_conf>
conda info output
platform : linux-64
conda version : 4.0.6
conda-build version : 1.20.2
python version : 2.7.11.final.0
requests version : 2.10.0
root environment : /Shared/Software (read only)
default environment : /home/galaxy/.conda/envs/galaxy
envs directories : /home/galaxy/.conda/envs
                   /home/galaxy/envs
                   /Shared/Software/envs
package cache : /home/galaxy/.conda/envs/.pkgs
                /home/galaxy/envs/.pkgs
                /Shared/Software/pkgs
channel URLs : https://conda.anaconda.org/r/linux-64/
               https://conda.anaconda.org/r/noarch/
               https://conda.anaconda.org/bioconda/linux-64/
               https://conda.anaconda.org/bioconda/noarch/
               https://repo.continuum.io/pkgs/free/linux-64/
               https://repo.continuum.io/pkgs/free/noarch/
               https://repo.continuum.io/pkgs/pro/linux-64/
               https://repo.continuum.io/pkgs/pro/noarch/
config file : /home/galaxy/.condarc
is foreign system : False
installed packages
# packages in environment at /home/galaxy/.conda/envs/galaxy:
#
amqp 1.4.8 <pip>
anyjson 0.3.3 <pip>
argh 0.26.1 py27_0 bioconda
babel 2.3.3 py27_0 defaults
beaker 1.7.0 <pip>
bioblend 0.7.0 py27_0 bioconda
boto 2.40.0 py27_0 defaults
bx-python 0.7.3 py27_0 bioconda
cheetah 2.4.4 py27_0 defaults
contextlib2 0.5.3 <pip>
decorator 4.0.9 py27_0 defaults
dictobj 0.3.1 <pip>
docutils 0.12 py27_0 defaults
drmaa 0.7.6 py27_0 defaults
ecdsa 0.13 py27_0 defaults
fabric 1.11.1 py27_0 defaults
fluent-logger 0.4.1 <pip>
importlib 1.0.3 <pip>
java-jdk 8.0.45 0 bioconda
kombu 3.0.30 <pip>
libgcc 5.2.0 0 defaults
mako 1.0.4 py27_0 defaults
markupsafe 0.23 py27_0 defaults
mercurial 3.8.2 py27_0 defaults
mkl 11.3.3 0 defaults
msgpack-python 0.4.7 <pip>
mysql-python 1.2.5 py27_0 defaults
nodejs 4.4.1 0 defaults
nose 1.3.7 py27_0 defaults
numpy 1.11.0 py27_1 defaults
openldap 2.4.36 1 defaults
openssl 1.0.2h 0 defaults
ordereddict 1.1 <pip>
paramiko 1.16.0 py27_0 defaults
parsley 1.3 <pip>
paste 1.7.5.1 py27_0 defaults
pastedeploy 1.5.2 py27_1 defaults
pastescript 2.0.2 <pip>
pathtools 0.1.2 py27_0 bioconda
pbr 1.8.0 <pip>
pbs-python 4.4.2.1 <pip>
pip 8.1.1 py27_1 defaults
psutil 4.1.0 <pip>
psycopg2 2.6.1 <pip>
pulsar-galaxy-lib 0.7.0.dev4 <pip>
pycrypto 2.6.1 py27_0 defaults
pygments 2.0.2 <pip>
pyparsing 2.1.1 py27_0 defaults
pysam 0.8.4+gx1 <pip>
python 2.7.11 0 defaults
python-ldap 2.4.13 py27_0 defaults
python-openid 2.2.5 <pip>
pytz 2016.4 py27_0 defaults
pyyaml 3.11 py27_1 defaults
raven 5.16.0 <pip>
readline 6.2 2 defaults
repoze.lru 0.6 py27_0 defaults
requests 2.10.0 py27_0 defaults
requests-toolbelt 0.5.0 py27_0 bioconda
routes 2.2 py27_0 defaults
setuptools 20.7.0 py27_0 defaults
six 1.10.0 py27_0 defaults
sqlalchemy 1.0.13 py27_0 defaults
sqlalchemy-migrate 0.10.0 <pip>
sqlite 3.9.2 0 defaults
sqlparse 0.1.19 py27_0 defaults
svgfig 1.1.6 <pip>
svgwrite 1.1.6 py27_0 bioconda
tempita 0.5.3.dev0 <pip>
tk 8.5.18 0 defaults
uwsgi 2.0.2 py27_0 travis
watchdog 0.8.3 py27_0 bioconda
wchartype 0.1 <pip>
weberror 0.10.3 <pip>
webhelpers 1.3 <pip>
webob 1.6.0 py27_0 defaults
wheel 0.29.0 py27_0 defaults
whoosh 2.4.1+gx1 <pip>
yaml 0.1.6 0 defaults
zlib 1.2.8 0 defaults
Galaxy DEBUG log when submitting a job:
galaxy.tools DEBUG 2016-05-19 10:27:48,049 Validated and populated state for tool request (302.692 ms)
galaxy.objectstore DEBUG 2016-05-19 10:27:48,322 Selected backend 'files1' for creation of Dataset 1836
galaxy.tools.actions INFO 2016-05-19 10:27:48,397 Handled output named out_file1 for tool toolshed.g2.bx.psu.edu/repos/devteam/vcfvcfintersect/vcfvcfintersect/0.0.3 (186.174 ms)
galaxy.tools.actions INFO 2016-05-19 10:27:48,419 Added output datasets to history (20.399 ms)
galaxy.tools.actions INFO 2016-05-19 10:27:48,521 Verified access to datasets for Job[unflushed,tool_id=toolshed.g2.bx.psu.edu/repos/devteam/vcfvcfintersect/vcfvcfintersect/0.0.3] (60.093 ms)
galaxy.tools.actions INFO 2016-05-19 10:27:48,533 Setup for job Job[unflushed,tool_id=toolshed.g2.bx.psu.edu/repos/devteam/vcfvcfintersect/vcfvcfintersect/0.0.3] complete, ready to flush (113.284 ms)
galaxy.tools.actions INFO 2016-05-19 10:27:48,669 Flushed transaction for job Job[id=1509,tool_id=toolshed.g2.bx.psu.edu/repos/devteam/vcfvcfintersect/vcfvcfintersect/0.0.3] (124.145 ms)
galaxy.tools.execute DEBUG 2016-05-19 10:27:48,670 Tool [toolshed.g2.bx.psu.edu/repos/devteam/vcfvcfintersect/vcfvcfintersect/0.0.3] created job [1509] (509.816 ms)
galaxy.tools.execute DEBUG 2016-05-19 10:27:48,711 Executed 1 job(s) for tool toolshed.g2.bx.psu.edu/repos/devteam/vcfvcfintersect/vcfvcfintersect/0.0.3 request: (659.279 ms)
157.193.67.5 - - [19/May/2016:10:27:47 +0200] "POST /api/tools HTTP/1.0" 200 - "https://<redacted>/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17"
Any help?
Thanks!
M
When reverting the code back to branch release_16.01, Galaxy tells me it has failed to prepare the job.
Galaxy log reports the following:
galaxy.tools INFO 2016-05-19 11:04:01,331 Parameter validation failed.
Traceback (most recent call last):
File "lib/galaxy/tools/__init__.py", line 1732, in check_and_update_param_values_helper
input.value_from_basic( input.value_to_basic( value, trans.app ), trans.app, ignore_errors=False )
File "lib/galaxy/tools/parameters/basic.py", line 190, in value_from_basic
return self.to_python( value, app )
File "lib/galaxy/tools/parameters/basic.py", line 2095, in to_python
return [v for v in map( single_to_python, values ) if v not in none_values]
File "lib/galaxy/tools/parameters/basic.py", line 2091, in single_to_python
return app.model.context.query( app.model.HistoryDatasetAssociation ).get( int( value ) )
ValueError: invalid literal for int() with base 10: "{'values': [{'src': 'hda'"
galaxy.tools INFO 2016-05-19 11:04:01,343 Parameter validation failed.
Traceback (most recent call last):
File "lib/galaxy/tools/__init__.py", line 1732, in check_and_update_param_values_helper
input.value_from_basic( input.value_to_basic( value, trans.app ), trans.app, ignore_errors=False )
File "lib/galaxy/tools/parameters/basic.py", line 190, in value_from_basic
return self.to_python( value, app )
File "lib/galaxy/tools/parameters/basic.py", line 2095, in to_python
return [v for v in map( single_to_python, values ) if v not in none_values]
File "lib/galaxy/tools/parameters/basic.py", line 2091, in single_to_python
return app.model.context.query( app.model.HistoryDatasetAssociation ).get( int( value ) )
ValueError: invalid literal for int() with base 10: "{'values': [{'src': 'hda'"
Resubmitting the same job on v16.01 results in a running job.
Galaxy log reports the following:
157.193.67.5 - - [19/May/2016:11:06:24 +0200] "GET /api/tools/toolshed.g2.bx.psu.edu/repos/devteam/vcfvcfintersect/vcfvcfintersect/0.0.3/citations HTTP/1.0" 200 - "https://galaxy.cmgg.be/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17"
[pid: 26176|app: 0|req: 8/12] 10.0.7.3 () {50 vars in 1078 bytes} [Thu May 19 11:06:24 2016] GET /api/tools/toolshed.g2.bx.psu.edu/repos/devteam/vcfvcfintersect/vcfvcfintersect/0.0.3/citations => generated 229 bytes in 147 msecs (HTTP/1.0 200) 3 headers in 124 bytes (1 switches on core 0)
[galaxy.tools DEBUG 2016-05-19 11:06:31,277 Validated and populated state for tool request (170.828 ms)
galaxy.objectstore DEBUG 2016-05-19 11:06:31,469 Selected backend 'files2' for creation of Dataset 1837
galaxy.tools.actions INFO 2016-05-19 11:06:31,511 Handled output (142.898 ms)
galaxy.tools.actions INFO 2016-05-19 11:06:31,610 Verified access to datasets (16.404 ms)
galaxy.tools.execute DEBUG 2016-05-19 11:06:31,701 Tool [toolshed.g2.bx.psu.edu/repos/devteam/vcfvcfintersect/vcfvcfintersect/0.0.3] created job [1510] (367.937 ms)
galaxy.tools.execute DEBUG 2016-05-19 11:06:31,727 Executed all jobs for tool request: (448.567 ms)
157.193.67.5 - - [19/May/2016:11:06:30 +0200] "POST /api/tools HTTP/1.0" 200 - "https://<redacted>" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17"
"View tool error logs" in the Galaxy admin interface reports the following:
2016-05-19 13:52:36.576037 Tool XML parsing ./tools/validation/fix_errors.xml [Errno 2] No such file or directory: './tools/validation/fix_errors.xml
The validation directory under galaxy-dist/tools seems to have been deleted in v16.04. Can anyone tell me why, and how to resolve this issue? It seems like I'm running some kind of older code somewhere that still looks for deprecated files...
M
@matthdsm You just need to remove the line:
<tool file="validation/fix_errors.xml" />
from config/tool_conf.xml
Hi,
Yeah, I've just done that. It resolves the error message, but it doesn't fix the job submission error.
Thanks for the tip anyway ;)
M
The issue seems to be caused by a missing dependency. Our job handlers keep restarting with the following message:
INFO:galaxy.config:Logging at '10' level to '/home/galaxy/galaxy-dist/log/handler0.log'
No handlers could be found for logger "__main__"
Traceback (most recent call last):
File "./scripts/galaxy-main", line 249, in <module>
main()
File "./scripts/galaxy-main", line 245, in main
app_loop(args, log)
File "./scripts/galaxy-main", line 117, in app_loop
log=log,
File "./scripts/galaxy-main", line 105, in load_galaxy_app
**kwds
File "/home/galaxy/galaxy-dist/lib/galaxy/app.py", line 174, in __init__
self.job_manager = manager.JobManager( self )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/manager.py", line 23, in __init__
self.job_handler = handler.JobHandler( app )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/handler.py", line 32, in __init__
self.dispatcher = DefaultJobDispatcher( app )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/handler.py", line 723, in __init__
self.job_runners = self.app.job_config.get_job_runner_plugins( self.app.config.server_name )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 632, in get_job_runner_plugins
module = __import__( module_name )
File "/home/galaxy/galaxy-dist/lib/galaxy/jobs/runners/drmaa.py", line 18, in <module>
from pulsar.managers.util.drmaa import DrmaaSessionFactory
ImportError: No module named managers.util.drmaa
Galaxy handlers load modules based on the Python path mentioned in your logs.
In this case the /home/galaxy/galaxy-dist/lib directory had priority over the other lib directories, so the handler loaded the pulsar module from there, and that copy was only a minimal version containing nothing but pulsar.client. Once the module is removed from ./galaxy-dist/lib, the handler will pick up the full working version instead, that is, the Python pulsar module installed when the environment was created (with venv or conda).
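For anyone who wants to check this on their own instance, here is a minimal sketch of the shadowing described above. It lists every directory on the search path that contains a package of a given name, in lookup order, so the first hit is the one that gets imported. The function name find_package_candidates is made up for this example, and the check is a simplification: it only looks for a plain package directory or a single-file module, and ignores eggs, zip files and .pth files.

```python
import os
import sys


def find_package_candidates(name, search_path=None):
    """Return the search-path entries that contain `name`, in lookup order.

    The first entry returned is the one an import would normally use;
    the remaining entries are shadowed by it.
    """
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        base = entry or os.getcwd()  # an empty entry means the current directory
        package_dir = os.path.join(base, name)
        # A leftover __init__.pyc is enough to make a directory importable,
        # so stale bytecode counts as a candidate too.
        is_package = any(
            os.path.isfile(os.path.join(package_dir, marker))
            for marker in ("__init__.py", "__init__.pyc")
        )
        is_module = os.path.isfile(os.path.join(base, name + ".py"))
        if is_package or is_module:
            hits.append(base)
    return hits


if __name__ == "__main__":
    for position, location in enumerate(find_package_candidates("pulsar")):
        status = "used" if position == 0 else "shadowed"
        print("%s: %s" % (status, location))
```

Run it with the same interpreter and environment as the job handler. If the Galaxy lib directory shows up before the site-packages of the venv or conda environment, the copy in lib is the one being loaded.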
I would like to reopen this ticket because of an issue on my production instance after switching to v16.04. Our instance uses HTCondor to dispatch the jobs. Judging from the log, the job preparation step is not completing properly. I am seeing the following log messages:
galaxy.tools DEBUG 2016-06-25 18:59:02,772 Validated and populated state for tool request (23.589 ms)
galaxy.tools.actions.upload DEBUG 2016-06-25 18:59:02,836 Persisted uploads (48.425 ms)
galaxy.tools.actions.upload DEBUG 2016-06-25 18:59:02,927 Checked and cleaned uploads (90.914 ms)
galaxy.tools.actions.upload_common INFO 2016-06-25 18:59:02,973 tool upload1 created job id 77959
galaxy.tools.actions.upload DEBUG 2016-06-25 18:59:03,012 Created upload job (84.935 ms)
galaxy.tools.execute DEBUG 2016-06-25 18:59:03,013 Tool [upload1] created job 77959 (225.126 ms)
galaxy.tools.execute DEBUG 2016-06-25 18:59:03,019 Executed 1 job(s) for tool upload1 request: (247.330 ms)
@matthdsm @tomiles, can you give any suggestions to resolve this? I looked at the comment above, but I am not sure I understood it properly.
Thanks in advance, Vipin
When upgrading to 16.04, old Pulsar .pyc files may still be sitting around, so you may need to run rm -rf $GALAXY_ROOT/lib/pulsar, where $GALAXY_ROOT is replaced by your actual Galaxy root.
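Before deleting anything, it may help to see which .pyc files are actually left over. The sketch below only reports compiled files that no longer have a .py source next to them. It assumes the Python 2 layout used in this thread, where the .pyc sits beside its source, and it skips __pycache__ directories. The function name find_orphaned_pyc is made up for this example.

```python
import os


def find_orphaned_pyc(root):
    """Return sorted paths of .pyc files under root with no .py file beside them."""
    orphans = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Python 3 keeps bytecode in __pycache__ with a different naming
        # scheme, so those directories are pruned from the walk.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        sources = set(f for f in filenames if f.endswith(".py"))
        for filename in filenames:
            # "module.pyc" minus the trailing "c" is "module.py".
            if filename.endswith(".pyc") and filename[:-1] not in sources:
                orphans.append(os.path.join(dirpath, filename))
    return sorted(orphans)
```

Calling find_orphaned_pyc on the lib directory of the Galaxy root gives a list to review. A non-empty result under lib/pulsar would match the situation described above.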
Hi @jmchilton, I am not seeing the pulsar directory under my Galaxy root. Here are the contents:
vm-infg2-[0]:~/galaxy_ratschlab/lib# ll
total 28
drwxr-xr-x 28 galaxy galaxy 4096 Jun 14 17:54 galaxy
drwxr-xr-x 3 galaxy galaxy 39 Jun 14 17:24 galaxy_ext
drwxr-xr-x 3 galaxy galaxy 39 Jun 14 17:24 galaxy_utils
-rw-r--r-- 1 galaxy galaxy 988 Jun 14 17:24 log_tempfile.py
-rwxr-xr-x 1 galaxy galaxy 9019 Jun 14 17:24 mimeparse.py
-rw-r--r-- 1 galaxy galaxy 56 Jun 14 17:24 psyco_full.py
drwxr-xr-x 12 galaxy galaxy 4096 Jun 14 17:54 tool_shed
I'm not sure what is going wrong here. I will start with a fresh clone of the codebase and try again.
Hi there,
Sorry about the long wait, but we won't be able to help since we don't use Pulsar. Our issue was caused by an old reference in our $PYTHONPATH variable. As @jmchilton suggested, perhaps you should check your Galaxy environment for old files and/or references that interfere with the new ones.
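A rough way to look for such an old reference is to go through $PYTHONPATH entry by entry. The sketch below reports, for each entry, whether the directory still exists and whether it contains a directory with the package name you are chasing. The function name describe_pythonpath is made up for this example, and the package name is whatever module fails to import in your own log.

```python
import os


def describe_pythonpath(value, package):
    """Return (entry, exists, has_package) for each non-empty PYTHONPATH entry."""
    report = []
    for entry in value.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        exists = os.path.isdir(entry)
        has_package = os.path.isdir(os.path.join(entry, package))
        report.append((entry, exists, has_package))
    return report


if __name__ == "__main__":
    for row in describe_pythonpath(os.environ.get("PYTHONPATH", ""), "pulsar"):
        print("%s exists=%s has_package=%s" % row)
```

An entry that no longer exists, or one that points at an older checkout and still contains the package, is a candidate for removal.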
M
@vipints
I just ran into the same problem as you.
Can you give me any suggestions?
@Brentbin, I haven't fixed this issue yet. I will be working on my instance tomorrow morning (7/7) to resolve it.
@vipints
I changed
/$galaxy_dir/lib/galaxy/jobs/runners/pbs.py
from
import pbs
to
import sh as pbs
With that change, Galaxy gets as far as submitting a job.
This is the debug output while submitting a job:
galaxy.jobs DEBUG 2016-07-06 11:12:37,570 (68) Persisting job destination (destination id: qsub)
galaxy.jobs.runners DEBUG 2016-07-06 11:12:42,198 Job [68] queued (4772.901 ms)
galaxy.jobs.handler INFO 2016-07-06 11:12:42,259 (68) Job dispatched
galaxy.jobs.command_factory INFO 2016-07-06 11:12:47,211 Built script [/mnt/ceph/galaxy/database/jobs_directory/000/68/tool_script.sh] for tool command[python /mnt/ceph/galaxy/tools/data_source/upload.py /mnt/ceph/galaxy /mnt/ceph/galaxy/database/tmp/tmp02Z1_D /mnt/ceph/galaxy/database/tmp/tmp_X2L3t 62:/mnt/ceph/galaxy/database/jobs_directory/000/68/dataset_62_files:/mnt/ceph/galaxy/database/files/000/dataset_62.dat]
galaxy.jobs.runners ERROR 2016-07-06 11:12:52,484 (68) Unhandled exception calling queue_job
Traceback (most recent call last):
File "/mnt/ceph/galaxy/lib/galaxy/jobs/runners/__init__.py", line 104, in run_next
method(arg)
File "/mnt/ceph/galaxy/lib/galaxy/jobs/runners/pbs.py", line 229, in queue_job
c = pbs.pbs_connect( util.smart_str( pbs_server_name ) )
File "/opt/galaxy/.venv/lib/python2.7/site-packages/sh.py", line 2301, in __getattr__
return self.__env[name]
File "/opt/galaxy/.venv/lib/python2.7/site-packages/sh.py", line 2232, in __getitem__
return Command._create(k, **self.baked_args)
File "/opt/galaxy/.venv/lib/python2.7/site-packages/sh.py", line 776, in _create
raise CommandNotFound(program)
CommandNotFound: pbs_connect
Any help with fixing the pbs_connect error would be appreciated.
@Brentbin Sorry for the delayed response from my side. I haven't managed to look at the problem yet. I went through the change you made to lib/galaxy/jobs/runners/pbs.py, but I am not sure I understood it properly: how can you import the pbs module from the sh library? I am curious whether you solved the issue with pbs_connect.
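On the pbs_connect question: going by the traceback above, sh turns an attribute access such as pbs.pbs_connect into a lookup for a command-line program with that name and raises CommandNotFound when none exists. pbs_connect is a function of the PBS Python bindings, not a program, so aliasing sh as pbs only moves the failure from import time to submission time. The sketch below illustrates that kind of PATH lookup. It is not sh's actual implementation, and find_on_path is a name made up for this example.

```python
import os


def find_on_path(program, path=None):
    """Return the first executable file named `program` on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # ignore empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Prints None unless a program called pbs_connect is on PATH.
    print(find_on_path("pbs_connect"))
```

If this prints None for pbs_connect on the Galaxy host, the sh alias cannot work there, and the real fix is to get the original import pbs working in the Python environment that runs the handlers.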