When running Trim Galore! on a collection of paired FASTQ files, I ran into two issues, possibly related:
Conda dependency seemingly installed but failed to build job environment. See the full error message below.
Traceback (most recent call last):
File "/galaxy/galaxy-app/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
job_wrapper.prepare()
File "/galaxy/galaxy-app/lib/galaxy/jobs/__init__.py", line 901, in prepare
self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
File "/galaxy/galaxy-app/lib/galaxy/tools/__init__.py", line 1291, in build_dependency_shell_commands
metadata=metadata,
File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 99, in dependency_shell_commands
**kwds )
File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 118, in find_dep
dependency = resolver.resolve( name, version, type, **kwds )
File "/galaxy/galaxy-app/lib/galaxy/tools/deps/resolvers/conda.py", line 203, in resolve
raise Exception("Conda dependency seemingly installed but failed to build job environment.")
Exception: Conda dependency seemingly installed but failed to build job environment.
Oddly, I was able to work around this issue by running Trim Galore! using two separate lists, one for forward and one for reverse, (i.e. "Paired-end" instead of "Paired Collection"). Not sure if this is an issue with the tool or Galaxy's handling of lists of pairs...
I have the same issue with the Autodock Vina tools [1]. I did an initial analysis of this error but haven't yet had time to finish it. I hope to identify and fix the bug this week (probably tomorrow, 26/10).
[1] [http://dev.list.galaxyproject.org/Problem-with-conda-on-galaxy-16-07-td4669995.html](http://dev.list.galaxyproject.org/Problem-with-conda-on-galaxy-16-07-td4669995.html)
As mentioned in #3126, we should report conda's stderr to get a better idea of why it's failing.
@mvdbeek On my 16.07 install I also added this patch to log the Conda commands that are executed, might fit in the same PR:
diff --git a/lib/galaxy/tools/deps/conda_util.py b/lib/galaxy/tools/deps/conda_util.py
index 7240754..061e2c0 100644
--- a/lib/galaxy/tools/deps/conda_util.py
+++ b/lib/galaxy/tools/deps/conda_util.py
@@ -179,6 +179,7 @@ class CondaContext(object):
condarc_override = self.condarc_override
if condarc_override:
env["CONDARC"] = condarc_override
+ log.debug("Executing command: %s", command)
return self.shell_exec(command, env=env)
def exec_create(self, args):
I'm seeing this pretty regularly with Trim Galore!, regardless of settings. One thing to note is that this tool generates quite a few `galaxy.workflow.run DEBUG 2016-11-04 16:58:18,268 Workflow step 21592 of invocation 7110 delayed (1.789 ms)` messages, and in fact it takes a long time (with nothing else going on) for all of the steps to even be submitted to the cluster. @jmchilton I know you said you had never seen this, but I suggest running this tool on collections of 20+ pairs of datasets; it seems to happen fairly often for me.
Also, the collection type doesn't seem to matter, errors have occurred with both lists of paired and pairs of lists.
This error does seem to occur on items that had been delayed for a long time. Also, the log still fills up with messages of delayed invocation, even though there doesn't appear to be anything waiting... Which is why I turned off debug logging in the first place (completely useless with GBs of these messages). Argh.
Well, this is interesting... @bgruening Have you seen this sort of thing?
``````
Error: LOCKERROR: It looks like conda is already doing something.
The lock ['/galaxy/galaxy-app/tool-dependencies/_conda/pkgs/.conda_lock-119903'] was found. Wait for it to finish before continuing.
If you are sure that conda is not running, remove it and try again.
You can also use: $ conda clean --lock
galaxy.jobs.runners ERROR 2016-11-04 15:40:48,763 (121093) Failure preparing job
Traceback (most recent call last):
File "/galaxy/galaxy-app/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
job_wrapper.prepare()
File "/galaxy/galaxy-app/lib/galaxy/jobs/__init__.py", line 901, in prepare
self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
File "/galaxy/galaxy-app/lib/galaxy/tools/__init__.py", line 1291, in build_dependency_shell_commands
metadata=metadata,
File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 99, in dependency_shell_commands
**kwds )
File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 118, in find_dep
dependency = resolver.resolve( name, version, type, **kwds )
File "/galaxy/galaxy-app/lib/galaxy/tools/deps/resolvers/conda.py", line 203, in resolve
raise Exception("Conda dependency seemingly installed but failed to build job environment.")
Exception: Conda dependency seemingly installed but failed to build job environment.
galaxy.jobs.runners ERROR 2016-11-04 15:40:48,763 (121092) Failure preparing job
Traceback (most recent call last):
File "/galaxy/galaxy-app/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
job_wrapper.prepare()
File "/galaxy/galaxy-app/lib/galaxy/jobs/__init__.py", line 901, in prepare
self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
File "/galaxy/galaxy-app/lib/galaxy/tools/__init__.py", line 1291, in build_dependency_shell_commands
metadata=metadata,
File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 99, in dependency_shell_commands
**kwds )
File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 118, in find_dep
dependency = resolver.resolve( name, version, type, **kwds )
File "/galaxy/galaxy-app/lib/galaxy/tools/deps/resolvers/conda.py", line 203, in resolve
raise Exception("Conda dependency seemingly installed but failed to build job environment.")
Exception: Conda dependency seemingly installed but failed to build job environment.
``````
@lparsons I have. If you're feeling adventurous you could try #3106. The background is that we're building a new conda environment in each job's job_working_directory, and under certain circumstances (e.g. the job_working_directory is not on the same filesystem as the conda dependencies, or is on a distributed filesystem) the environment creation takes a long time, during which conda holds a lock.
With #3106 we cache those environments when installing the tool. To use it, set use_cached_dependency_manager = True and tool_dependency_cache_dir = <some path> in galaxy.ini and reinstall the tool that should make use of the cache (if you don't reinstall, it uses the job_working_directory as before). When creating a new job we then just activate the cached environment instead of building a new environment per job.
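The caching idea described above can be illustrated with a minimal sketch. This is not the code from #3106: the function names `cached_env_path` and `get_or_build_env`, the hashing scheme, and the `build` callback are all hypothetical, chosen only to show "build once at install time, reuse per job".

```python
import hashlib
import os


def cached_env_path(cache_dir, requirements):
    """Return a stable directory for a set of (name, version) requirements."""
    # Sort so the same requirements always map to the same directory,
    # regardless of the order in which they are listed.
    key = ";".join("%s=%s" % (name, version) for name, version in sorted(requirements))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return os.path.join(cache_dir, digest)


def get_or_build_env(cache_dir, requirements, build):
    """Build the environment once, then reuse it for every later job."""
    path = cached_env_path(cache_dir, requirements)
    if not os.path.isdir(path):
        # Hypothetical callback: a real implementation would create the
        # conda environment at `path` here.
        build(path)
    return path
```

With something like this, only the first call for a given set of requirements pays the environment-creation cost (and takes the conda lock); later jobs with the same requirements just get the existing path back.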
@mvdbeek Thanks! Glad to get to the bottom of this. I'm not sure if I'm that adventurous yet, though I guess I stepped in it already by adopting conda deps so soon... ;-) Perhaps after the admin training I'll have time. Until then, folks will just have to keep rerunning jobs that fail I guess. Do you think this might be related to the excessive (and seemingly never ending) "invocation delayed" debug statements?
Yeah, it all depends. In my old lab things are working pretty smoothly with conda, but in my new lab, with a bigger, more outdated cluster, things are more difficult. The "invocation delayed" statements sound like a separate issue to me, but I may be wrong.
I'm wondering if catching this error and waiting before trying again (instead of failing immediately) would be a more robust fix? Thoughts @mvdbeek @bgruening ?
I will have a look if we can intercept the error and sleep. But I'm afraid the really annoying part is that the environment is slow to build, so you're limited in concurrency to building one conda job at a time. If you have lots of small jobs that's a real dealbreaker. I wish they would at least acknowledge this as a real problem, instead of saying we don't support stacked environments because creating new ones is so cheap.
https://github.com/conda/conda/issues/3580 would be a nice solution, but alas ...
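For reference, the "catch the error, wait, and try again" idea raised above might look roughly like the following. It is only a sketch under assumptions: `CondaLockError` is a hypothetical exception standing in for however the lock failure would be detected, and `retry_on_lock` is not an existing Galaxy function.

```python
import time


class CondaLockError(Exception):
    """Hypothetical exception standing in for conda's LOCKERROR failure."""


def retry_on_lock(build, attempts=5, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call build(); if it raises CondaLockError, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return build()
        except CondaLockError:
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt
```

Note that this only makes jobs wait instead of failing; it does not remove the concurrency limit described above, since environments would still be built one at a time.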
@lparsons Are you still seeing this? It should be resolved in 17.01.
@nsoranzo I've enabled dependency caching, and I believe that resolved things (at least when I've been able to get the caches built successfully). Just updated to 17.01 yesterday, so I'm not sure if things have changed. Should I remove the dependency caching? I'm also wondering if updating to miniconda3 would help. Doing that will take a bit more effort though, as all dependencies will need to be reinstalled. Working on a script to use the API to do that, but I'm not sure if it will work yet.
> I've enabled dependency caching, and I believe that resolved things (at least when I've been able to get the caches built successfully). Just updated to 17.01 yesterday, so I'm not sure if things have changed. Should I remove the dependency caching?
Ping @mvdbeek on this one
> I'm also wondering if updating to miniconda3 would help. Doing that will take a bit more effort though, as all dependencies will need to be reinstalled. Working on a script to use the API to do that, but I'm not sure if it will work yet.
Not sure miniconda3 would make a difference. @jmchilton mentioned it may be possible to use 2 Conda resolvers at the same time (miniconda2 and miniconda3) to smooth the migration, but no one has tried it AFAIK.
> @nsoranzo I've enabled dependency caching, and I believe that resolved things (at least when I've been able to get the caches built successfully). Just updated to 17.01 yesterday, so I'm not sure if things have changed. Should I remove the dependency caching?
Don't remove it yet, if it's working. For tools that have all of their dependencies satisfiable by the versioned conda resolver we're using a very similar approach to caching (the all-in-one resolver). But that doesn't work for tools that have one requirement for a TS package and one for a conda package. We also have resolver specific mappings now, so that we can map an unresolvable dependency to one that can be resolved, effectively making it possible to use the all-in-one resolution in all cases, but this is a trial-and-error process right now.
You can actually see the mapping and which environment will be used with https://github.com/galaxyproject/galaxy/pull/3479, so if things are working for you now you can reconsider disabling the cached environments in 17.05 (more easily ... that is).
Hi, this bug seems to be related to package versions on the conda-forge channel. I was getting the same error, and when I switched some packages to versions from other channels instead of conda-forge, the issue was solved. It was not necessary to set a dependency cache dir. This closed issue describes the same problem with conda and other packages.
I'm going to close this: the original problem was a conda lock error, which has been resolved with the all-in-one resolution. The second issue is that almost all conda errors are raised as the same exception with no details. That is being tracked in #3634. If you have another issue, feel free to open a new issue with detailed information on how to reproduce the problem.
I get something like this occasionally on 17.01, again with a sometimes slow filesystem:
Traceback (most recent call last):
File "/galaxy-central/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
job_wrapper.prepare()
File "/galaxy-central/lib/galaxy/jobs/__init__.py", line 971, in prepare
self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
File "/galaxy-central/lib/galaxy/tools/__init__.py", line 1436, in build_dependency_shell_commands
tool_instance=self
File "/galaxy-central/lib/galaxy/tools/deps/__init__.py", line 112, in dependency_shell_commands
return [dependency.shell_commands(requirement) for requirement, dependency in requirement_to_dependency.items()]
File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 392, in shell_commands
self.build_environment()
File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 387, in build_environment
raise DependencyException("Conda dependency seemingly installed but failed to build job environment.")
I'm trying to go through the code to figure out where to throw in a log.warning() line, but since this is such a sporadic thing to begin with...
You should have this block:
    def exec_command(self, operation, args):
        command = self.command(operation, args)
        env = {'HOME': self.conda_prefix}  # We don't want to pollute ~/.conda, which may not even be writable
        condarc_override = self.condarc_override
        if condarc_override:
            env["CONDARC"] = condarc_override
        log.debug("Executing command: %s", command)
        try:
            return self.shell_exec(command, env=env)
        except commands.CommandLineException as e:
            log.warning(e)
            return e.returncode
in lib/galaxy/tools/deps/conda_util.py, through which all conda commands pass. Is there anything before this traceback?
The line in the log before the most recent instance of this is galaxy.tools.deps.conda_util WARNING 2017-05-12 11:31:01,392 Executing command: /data/galaxy/tool_deps/_conda/bin/conda clean --tarballs -y. I put in a log.warning rather than a log.debug so I wouldn't need to change the log level. There was another instance of that same command 4 minutes earlier.
Can you try this commit?
https://github.com/mvdbeek/galaxy/commit/3560f2c174d7cfe3720119cb2bbc9fc32895465d
It is based on 17.01, so it should apply cleanly.
That should at least give us the stdout/stderr of the command that is failing.
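As a rough illustration of what "giving us the stdout/stderr of the failing command" means, here is a minimal standalone sketch using only the standard library. It is not the content of the linked commit; `run_and_report` is a hypothetical helper name, and the command is assumed to be given as an argument list.

```python
import logging
import subprocess
import sys

log = logging.getLogger(__name__)


def run_and_report(command):
    """Run a command (argument list), logging its stdout/stderr if it fails."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    if proc.returncode != 0:
        # Include both streams so the log shows why the command failed,
        # not just that it failed.
        log.warning(
            "Command %s failed with exit code %d\nstdout: %s\nstderr: %s",
            command,
            proc.returncode,
            stdout.decode("utf-8", "replace"),
            stderr.decode("utf-8", "replace"),
        )
    return proc.returncode, stdout, stderr
```

For example, `run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])` returns an exit code of 3 and writes a warning containing whatever the child process printed.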
Sure, we'll see how long it takes for the code to get triggered :)