Galaxy: BUG: Conda dependency seemingly installed but failed to build job environment. (Conda Lock Error)

Created on 21 Oct 2016 · 25 comments · Source: galaxyproject/galaxy

When running Trim Galore! on a collection of paired FASTQ files, I ran into two issues, possibly related:

1. Some of the jobs fail (randomly) with the error "Conda dependency seemingly installed but failed to build job environment." See the full error message below.
2. Only some of the jobs are dispatched immediately, even though there are no restrictions on the number of jobs submitted to SLURM. I'm not sure this is related at all, but it's behavior I haven't seen before, and at the moment it correlates with the exception.

Traceback (most recent call last):
  File "/galaxy/galaxy-app/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
    job_wrapper.prepare()
  File "/galaxy/galaxy-app/lib/galaxy/jobs/__init__.py", line 901, in prepare
    self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
  File "/galaxy/galaxy-app/lib/galaxy/tools/__init__.py", line 1291, in build_dependency_shell_commands
    metadata=metadata,
  File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 99, in dependency_shell_commands
    **kwds )
  File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 118, in find_dep
    dependency = resolver.resolve( name, version, type, **kwds )
  File "/galaxy/galaxy-app/lib/galaxy/tools/deps/resolvers/conda.py", line 203, in resolve
    raise Exception("Conda dependency seemingly installed but failed to build job environment.")
Exception: Conda dependency seemingly installed but failed to build job environment.

lparsons on 21 Oct 2016



Oddly, I was able to work around this issue by running Trim Galore! on two separate lists, one for forward and one for reverse reads (i.e., "Paired-end" instead of "Paired Collection"). Not sure whether this is an issue with the tool or with Galaxy's handling of lists of pairs...

lparsons on 24 Oct 2016

I have the same issue [1] with the AutoDock Vina tools. I did an initial analysis of this error but haven't yet had time to finish it. I hope to identify and fix the bug this week (probably tomorrow, 26/10).

[1] http://dev.list.galaxyproject.org/Problem-with-conda-on-galaxy-16-07-td4669995.html

leobiscassi on 26 Oct 2016

As mentioned in #3126, we should report conda's stderr to get a better idea of why it's failing.

mvdbeek on 4 Nov 2016
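
A minimal sketch of what "report conda's stderr" could look like, using only the standard library's `subprocess` module. `run_and_report` is a hypothetical helper, not Galaxy's `CondaContext.exec_command`; it assumes the command is passed as a list of arguments and simply logs both stdout and stderr when the exit code is non-zero, since it is not clear which stream conda uses for a given error.

```python
import logging
import subprocess
import sys

log = logging.getLogger(__name__)


def run_and_report(command, env=None):
    """Run `command` (a list of arguments) and log its output if it fails.

    Returns a (returncode, stderr) tuple so a caller can include stderr
    in whatever exception it raises, instead of a generic message.
    """
    log.debug("Executing command: %s", " ".join(command))
    proc = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
    )
    if proc.returncode != 0:
        log.warning(
            "Command %r failed with exit code %d\nstdout:\n%s\nstderr:\n%s",
            command, proc.returncode, proc.stdout, proc.stderr,
        )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Stand-in for a failing conda call: a child Python that writes to stderr and exits 3.
    code, err = run_and_report(
        [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    )
    print(code, err)
```

Returning stderr, rather than only the exit code, is the design point: the caller can then raise an exception that carries conda's actual complaint (such as a LOCKERROR) instead of the generic "seemingly installed" message.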

@mvdbeek On my 16.07 install I also added this patch to log the Conda commands that are executed, might fit in the same PR:

diff --git a/lib/galaxy/tools/deps/conda_util.py b/lib/galaxy/tools/deps/conda_util.py
index 7240754..061e2c0 100644
--- a/lib/galaxy/tools/deps/conda_util.py
+++ b/lib/galaxy/tools/deps/conda_util.py
@@ -179,6 +179,7 @@ class CondaContext(object):
         condarc_override = self.condarc_override
         if condarc_override:
             env["CONDARC"] = condarc_override
+        log.debug("Executing command: %s", command)
         return self.shell_exec(command, env=env)

     def exec_create(self, args):

nsoranzo on 4 Nov 2016

I'm seeing this pretty regularly with Trim Galore!, regardless of settings. One thing to note is that this tool generates quite a few "galaxy.workflow.run DEBUG 2016-11-04 16:58:18,268 Workflow step 21592 of invocation 7110 delayed (1.789 ms)" messages, and in fact it takes a long time (with nothing else going on) for all of the steps to even be submitted to the cluster. @jmchilton I know you said you had never seen this, but I suggest running this tool on collections of 20+ pairs of datasets; it seems to happen fairly often for me.

lparsons on 4 Nov 2016

Also, the collection type doesn't seem to matter: errors have occurred with both lists of pairs and pairs of lists.

lparsons on 4 Nov 2016

This error does seem to occur on items that had been delayed for a long time. Also, the log still fills up with delayed-invocation messages even though nothing appears to be waiting, which is why I turned off debug logging in the first place (it's completely useless with GBs of these messages). Argh.

lparsons on 4 Nov 2016

Well, this is interesting... @bgruening Have you seen this sort of thing?

```
Error: LOCKERROR: It looks like conda is already doing something.
The lock ['/galaxy/galaxy-app/tool-dependencies/_conda/pkgs/.conda_lock-119903'] was found. Wait for it to finish before continuing.
If you are sure that conda is not running, remove it and try again.
You can also use: $ conda clean --lock

galaxy.jobs.runners ERROR 2016-11-04 15:40:48,763 (121093) Failure preparing job
Traceback (most recent call last):
  File "/galaxy/galaxy-app/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
    job_wrapper.prepare()
  File "/galaxy/galaxy-app/lib/galaxy/jobs/__init__.py", line 901, in prepare
    self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
  File "/galaxy/galaxy-app/lib/galaxy/tools/__init__.py", line 1291, in build_dependency_shell_commands
    metadata=metadata,
  File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 99, in dependency_shell_commands
    **kwds )
  File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 118, in find_dep
    dependency = resolver.resolve( name, version, type, **kwds )
  File "/galaxy/galaxy-app/lib/galaxy/tools/deps/resolvers/conda.py", line 203, in resolve
    raise Exception("Conda dependency seemingly installed but failed to build job environment.")
Exception: Conda dependency seemingly installed but failed to build job environment.
galaxy.jobs.runners ERROR 2016-11-04 15:40:48,763 (121092) Failure preparing job
Traceback (most recent call last):
  File "/galaxy/galaxy-app/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
    job_wrapper.prepare()
  File "/galaxy/galaxy-app/lib/galaxy/jobs/__init__.py", line 901, in prepare
    self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
  File "/galaxy/galaxy-app/lib/galaxy/tools/__init__.py", line 1291, in build_dependency_shell_commands
    metadata=metadata,
  File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 99, in dependency_shell_commands
    **kwds )
  File "/galaxy/galaxy-app/lib/galaxy/tools/deps/__init__.py", line 118, in find_dep
    dependency = resolver.resolve( name, version, type, **kwds )
  File "/galaxy/galaxy-app/lib/galaxy/tools/deps/resolvers/conda.py", line 203, in resolve
    raise Exception("Conda dependency seemingly installed but failed to build job environment.")
Exception: Conda dependency seemingly installed but failed to build job environment.
```

lparsons on 4 Nov 2016

@lparsons I have. If you're feeling adventurous you could try #3106. The background is that we're building a new conda environment in each job's job_working_directory, and under certain circumstances (e.g. the job_working_directory is not on the same filesystem as the conda dependencies, or is on a distributed filesystem) the environment creation takes a long time, during which conda holds a lock.
With #3106 we cache those environments when installing the tool. To use it, you would need to set use_cached_dependency_manager = True and tool_dependency_cache_dir = <some path> in galaxy.ini and reinstall the tool that should make use of the cache (if you don't reinstall it, the job_working_directory is used as before). When creating a new job, we just activate the cached environment instead of building a new environment per job.

mvdbeek on 4 Nov 2016
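
The caching idea described above can be sketched as "build once per distinct set of requirements, then reuse". This is a hypothetical illustration, not the code from #3106: it assumes requirements are (name, version) pairs, derives a cache directory name by hashing them, and takes a `build_env` callable that stands in for the expensive `conda create` step.

```python
import hashlib
import os


def cache_key(requirements):
    """Derive a stable directory name from (name, version) requirement pairs.

    Sorting makes the key independent of the order in which requirements are listed.
    """
    text = ";".join("%s=%s" % (name, version) for name, version in sorted(requirements))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def environment_for_job(requirements, cache_dir, build_env):
    """Return the path of an environment that satisfies `requirements`.

    The environment is built with `build_env(path, requirements)` only when the
    cached directory is missing; every later job just reuses (activates) it.
    """
    path = os.path.join(cache_dir, cache_key(requirements))
    if not os.path.isdir(path):
        build_env(path, requirements)
    return path
```

Building the cache when the tool is installed, as described above, rather than lazily at job time, also avoids two jobs racing to build the same cache entry at once.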

@mvdbeek Thanks! Glad to get to the bottom of this. I'm not sure I'm that adventurous yet, though I guess I stepped in it already by adopting conda deps so soon... ;-) Perhaps I'll have time after the admin training. Until then, folks will just have to keep rerunning jobs that fail, I guess. Do you think this might be related to the excessive (and seemingly never-ending) "invocation delayed" debug statements?

lparsons on 4 Nov 2016

Yeah, it all depends. In my old lab things work pretty smoothly with conda, but in my new lab, with a bigger, more outdated cluster, things are more difficult. The "invocation delayed" statements sound like a separate issue to me, but I may be wrong.

mvdbeek on 4 Nov 2016

I'm wondering if catching this error and waiting before trying again (instead of failing immediately) would be a more robust fix? Thoughts @mvdbeek @bgruening ?

lparsons on 29 Nov 2016
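
One way to "catch the error and wait before trying again" is retry with exponential backoff. This is a sketch under assumptions, not Galaxy code: `CondaLockError` is a hypothetical exception that a caller would raise when conda's output contains "LOCKERROR" (the string seen in the log above), and `retry_on_lock` wraps whatever action builds the environment.

```python
import time


class CondaLockError(Exception):
    """Hypothetical exception for conda's "LOCKERROR: It looks like conda is already doing something"."""


def is_lock_error(output):
    """Return True if conda's output reports that another process holds its lock."""
    return "LOCKERROR" in output


def retry_on_lock(action, attempts=5, initial_delay=1.0, sleep=time.sleep):
    """Call `action()`, retrying with exponential backoff while it raises CondaLockError.

    The last CondaLockError is re-raised once `attempts` calls have failed.
    `sleep` is a parameter so tests can avoid real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except CondaLockError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2  # 1s, 2s, 4s, ... between attempts
```

Backoff only hides the symptom: if each environment build is slow, jobs still queue behind the lock, which is the concern raised in the reply below.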

I'll look into whether we can intercept the error and sleep. But I'm afraid the really annoying part is that the environment is slow to build, so concurrency is limited to building one conda job environment at a time. If you have lots of small jobs, that's a real dealbreaker. I wish they would at least acknowledge this as a real problem, instead of saying "we don't support stacked environments because creating new ones is so cheap."

mvdbeek on 29 Nov 2016
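
The concurrency limit can be illustrated with a blocking file lock. This is a swapped-in technique, not what conda or Galaxy does: conda's own lock makes a second invocation fail with LOCKERROR, whereas `fcntl.flock` with `LOCK_EX` makes it wait. Either way only one environment is built at a time, which is why many small jobs suffer. `exclusive_lock` and `build_serialized` are hypothetical names; the sketch is POSIX-only, and advisory locks may not be reliable on some network filesystems.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on `lock_path` for the duration of the block.

    fcntl.flock with LOCK_EX blocks until the lock is free, so a second caller
    waits instead of failing.
    """
    with open(lock_path, "a") as handle:  # "a" creates the file without truncating it
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def build_serialized(lock_path, build):
    """Run `build()` while holding the lock; concurrent callers queue up behind it."""
    with exclusive_lock(lock_path):
        return build()
```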

https://github.com/conda/conda/issues/3580 would be a nice solution, but alas ...

mvdbeek on 29 Nov 2016

@lparsons Are you still seeing this? It should be resolved in 17.01.

nsoranzo on 22 Feb 2017

@nsoranzo I've enabled dependency caching, and I believe that resolved things (at least when I've been able to get the caches built successfully). Just updated to 17.01 yesterday, so I'm not sure if things have changed. Should I remove the dependency caching? I'm also wondering if updating to miniconda3 would help. Doing that will take a bit more effort though, as all dependencies will need to be reinstalled. Working on a script to use the API to do that, but I'm not sure if it will work yet.

lparsons on 22 Feb 2017

> I've enabled dependency caching, and I believe that resolved things (at least when I've been able to get the caches built successfully). Just updated to 17.01 yesterday, so I'm not sure if things have changed. Should I remove the dependency caching?

Ping @mvdbeek on this one

> I'm also wondering if updating to miniconda3 would help. Doing that will take a bit more effort though, as all dependencies will need to be reinstalled. Working on a script to use the API to do that, but I'm not sure if it will work yet.

Not sure miniconda3 would make a difference. @jmchilton mentioned it may be possible to use two Conda resolvers at the same time (miniconda2 and miniconda3) to smooth the migration, but no one has tried it, AFAIK.

nsoranzo on 22 Feb 2017

> @nsoranzo I've enabled dependency caching, and I believe that resolved things (at least when I've been able to get the caches built successfully). Just updated to 17.01 yesterday, so I'm not sure if things have changed. Should I remove the dependency caching?

Don't remove it yet if it's working. For tools that have all of their dependencies satisfiable by the versioned conda resolver, we're using a very similar approach to caching (the all-in-one resolver). But that doesn't work for tools that have one requirement for a Tool Shed (TS) package and one for a conda package. We also have resolver-specific mappings now, so we can map an unresolvable dependency to one that can be resolved, effectively making all-in-one resolution possible in all cases, but this is a trial-and-error process right now.
You can see the mapping and which environment will be used with https://github.com/galaxyproject/galaxy/pull/3479, so if things are working for you now, you can reconsider disabling the cached environments in 17.05, when that will be easier to do.

mvdbeek on 22 Feb 2017
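
The combination of resolver-specific mappings and all-in-one resolution can be sketched in a few lines. The data shapes here are assumptions: requirements are plain (name, version) tuples, the mapping is an in-memory dict, and `conda_can_resolve` stands in for a real lookup. This does not reflect Galaxy's actual mapping file format or resolver API.

```python
def apply_mappings(requirements, mappings):
    """Rewrite (name, version) requirements using a resolver-specific mapping table.

    `mappings` maps an unresolvable (name, version) pair to a resolvable one;
    requirements without an entry pass through unchanged.
    """
    return [mappings.get(req, req) for req in requirements]


def can_use_all_in_one(requirements, conda_can_resolve):
    """True if conda can resolve every requirement, so one shared environment can serve the tool."""
    return all(conda_can_resolve(name, version) for name, version in requirements)
```

The point of mapping first is that a single unresolvable requirement (for example, one that only exists as a Tool Shed package) would otherwise rule out the shared environment for the whole tool.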

Hi, this bug seems to be related to package versions on the conda-forge channel. I was getting the same error, and when I switched some packages to versions from other channels instead of conda-forge, the issue was solved. It was not necessary to set the dependency cache dir. This closed issue faced the same problem with conda and other packages.

adefelicibus on 25 Mar 2017

I'm going to close this: the original problem was a conda lock error, and that has been resolved by the all-in-one resolution. The second issue is that almost all conda errors are reported with the same exception and no details; that is being tracked in #3634. If you have another issue, feel free to open a new one with detailed information on how to reproduce the problem.

mvdbeek on 25 Mar 2017

I get something like this occasionally on 17.01, again with a sometimes slow filesystem:

Traceback (most recent call last):
  File "/galaxy-central/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
    job_wrapper.prepare()
  File "/galaxy-central/lib/galaxy/jobs/__init__.py", line 971, in prepare
    self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
  File "/galaxy-central/lib/galaxy/tools/__init__.py", line 1436, in build_dependency_shell_commands
    tool_instance=self
  File "/galaxy-central/lib/galaxy/tools/deps/__init__.py", line 112, in dependency_shell_commands
    return [dependency.shell_commands(requirement) for requirement, dependency in requirement_to_dependency.items()]
  File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 392, in shell_commands
    self.build_environment()
  File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 387, in build_environment
    raise DependencyException("Conda dependency seemingly installed but failed to build job environment.")

I'm trying to go through the code to figure out where to throw in a log.warning() line, but since this is such a sporadic thing to begin with...

dpryan79 on 12 May 2017

You should have this block:

    def exec_command(self, operation, args):
        command = self.command(operation, args)
        env = {'HOME': self.conda_prefix}  # We don't want to pollute ~/.conda, which may not even be writable
        condarc_override = self.condarc_override
        if condarc_override:
            env["CONDARC"] = condarc_override
        log.debug("Executing command: %s", command)
        try:
            return self.shell_exec(command, env=env)
        except commands.CommandLineException as e:
            log.warning(e)
            return e.returncode

in lib/galaxy/tools/deps/conda_util.py, through which all conda commands pass. Is there anything before this traceback?

mvdbeek on 12 May 2017

The line in the log before the most recent instance of this is galaxy.tools.deps.conda_util WARNING 2017-05-12 11:31:01,392 Executing command: /data/galaxy/tool_deps/_conda/bin/conda clean --tarballs -y. I put in a log.warning rather than a log.debug so I wouldn't need to change the log level. There was another instance of that same command 4 minutes earlier.

dpryan79 on 12 May 2017

Can you try this commit?
https://github.com/mvdbeek/galaxy/commit/3560f2c174d7cfe3720119cb2bbc9fc32895465d
It is based on 17.01, so it should apply cleanly.

That should at least give us the stdout/stderr of the command that is failing.

mvdbeek on 12 May 2017

Sure, we'll see how long it takes for the code to get triggered :)

dpryan79 on 12 May 2017