Nextflow: Problems with bash configuration in command.run

Created on 8 Mar 2017 · 23 Comments · Source: nextflow-io/nextflow

Related to https://github.com/nextflow-io/nextflow/issues/286 - I keep hitting the same problem, with nextflow throwing various 'unbound variable' errors. Our cluster setup has a bunch of unbound variables spread across multiple scripts on many machines; I started to try to fix them up with our sysadmin but in the end just changed process.shell as suggested:

$ cat ~/.nextflow/config
process.shell = ['/bin/bash','-e']

This works, with my .command.sh scripts now starting like this:

$ head -n1 work/5a/c18c14383839ee788020fec00b40c9/.command.sh
#!/bin/bash -e

But now I'm having problems with directives. For example, I'm trying to run khmer, which requires a python virtualenv:

    beforeScript "source /biol/programs/khmer/khmerEnv/bin/activate"

This fails:

Command wrapper:
  /etc/bashrc: line 81: PS1: unbound variable
  /biol/programs/khmer/khmerEnv/bin/activate: line 57: PS1: unbound variable

The top of the .command.run script looks like this:

$ cat work/5a/c18c14383839ee788020fec00b40c9/.command.run
#!/bin/bash
# NEXTFLOW TASK: qualityFilterPE (11_AGCCTT)
set -e
set -u

The /etc/bashrc error is what I was seeing before changing the process.shell configuration. It doesn't happen if I remove the beforeScript directive, but I'm not sure why it's coming up again - does nextflow create a subshell to run source?

But the real problem is the use of PS1 in activate, which is generated automatically by python virtualenv. So fixing this would require changing how virtualenv works. Arguably this is what should happen, but...

Is there something else I can configure to get around this - preventing nextflow using set -u in .command.run for example? (I'd prefer not to have to set PS1 to empty for non-interactive use if I can avoid it.)

johnomics
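
To make the failure mode concrete, here is a minimal sketch (not taken from the thread) of what happens when a script that reads PS1 is sourced while set -u is active and PS1 is unset. The one-line stand-in script and the function name source_with_nounset are hypothetical; the stand-in only imitates the kind of unguarded PS1 read reported for /etc/bashrc and activate above.

```shell
#!/bin/bash
# source_with_nounset MODE
#   MODE is "-u" (nounset on) or "+u" (nounset off).
#   Sources a stand-in script that reads $PS1 while PS1 is unset and prints
#   "ok" if sourcing succeeded or "failed" if the shell aborted.
source_with_nounset() {
    local mode=$1 script
    script=$(mktemp)
    # Hypothetical stand-in for a startup script that reads PS1 unguarded.
    printf '%s\n' '_OLD_PS1="$PS1"' > "$script"
    # Run in a subshell so the abort caused by nounset stays contained.
    if ( unset PS1; set "$mode"; . "$script" ) 2>/dev/null; then
        echo "ok"
    else
        echo "failed"
    fi
    rm -f "$script"
}

source_with_nounset -u   # prints "failed"
source_with_nounset +u   # prints "ok"
```

With nounset on, the expansion of the unset variable is a fatal error for a non-interactive shell, which matches the "PS1: unbound variable" messages in the report.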

Most helpful comment

Just for reference for the next person who finds this, here's the complete code snippet I used in my Nextflow process to fix the original issue (based on the discussion here and in the linked thread):

script:
"""
export PS=\${PS:-''} 
export PS1=\${PS1:-''}
source venv/bin/activate

"""

stevekm on 21 Feb 2018
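
As a rough illustration of why this snippet helps: ${PS1:-''} expands to the current value of PS1 if it is set and non-empty, and to an empty string otherwise, so the variable is defined before anything reads it under set -u. The backslash in \$ is there because Nextflow interpolates $ inside a triple-double-quoted script block; in plain bash it is not needed. The sketch below is plain bash, and the function names and the stand-in script are hypothetical.

```shell
#!/bin/bash
# guarded_source: give PS1 an empty default, then source a stand-in script
# that reads $PS1, with nounset enabled and PS1 initially unset.
guarded_source() {
    local script
    script=$(mktemp)
    # Hypothetical stand-in for a script that reads PS1 unguarded.
    printf '%s\n' '_OLD_PS1="$PS1"' 'echo "sourced"' > "$script"
    (
        unset PS1
        set -u
        export PS1=${PS1:-''}   # unset or empty -> '', otherwise unchanged
        . "$script"
    )
    rm -f "$script"
}

# keep_existing: the same expansion leaves an already-set PS1 untouched.
keep_existing() {
    (
        PS1='custom>'
        export PS1=${PS1:-''}
        printf '%s\n' "$PS1"
    )
}

guarded_source   # prints "sourced"
keep_existing    # prints "custom>"
```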

All 23 comments

Yes, NF creates a subshell to run the task, but this is happening because the beforeScript is sourced in the wrapper script, i.e. .command.run.

You can try putting the source /biol/programs/khmer/khmerEnv/bin/activate at the top of your command script instead of using beforeScript.

Let me know if this solves the issue.

pditommaso on 8 Mar 2017
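
The difference between the two placements can be sketched in plain bash. This is a simulation under stated assumptions, not Nextflow's actual wrapper: the outer subshell stands in for .command.run (which enables set -u, as shown in the question), the child bash -e process stands in for .command.sh run with the configured process.shell, and setup.sh plays the role of the activate script. The function name, file names and DEMO_UNSET_VAR are all hypothetical. Shell options such as nounset are not passed on to a separately started bash process, which is why sourcing inside the task script succeeds.

```shell
#!/bin/bash
# simulate_task WHERE
#   WHERE is "wrapper" (setup sourced in the nounset wrapper, like
#   beforeScript) or "task" (setup sourced inside the task script itself).
simulate_task() {
    local where=$1 dir status=0
    dir=$(mktemp -d)
    # Stand-in for a setup script that reads an unset variable.
    printf '%s\n' ': "$DEMO_UNSET_VAR"' > "$dir/setup.sh"
    if [ "$where" = "task" ]; then
        printf '%s\n' ". '$dir/setup.sh'" 'echo "task ran"' > "$dir/task.sh"
    else
        printf '%s\n' 'echo "task ran"' > "$dir/task.sh"
    fi
    (
        unset DEMO_UNSET_VAR
        set -u                       # the wrapper runs with nounset
        if [ "$where" = "wrapper" ]; then
            . "$dir/setup.sh"        # aborts here: unbound variable
        fi
        bash -e "$dir/task.sh"       # child shell starts without nounset
    ) 2>/dev/null || status=$?
    rm -rf "$dir"
    [ "$status" -eq 0 ] || echo "wrapper aborted"
}

simulate_task wrapper   # prints "wrapper aborted"
simulate_task task      # prints "task ran"
```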

Thanks - OK, sourcing activate in the script works. However, I just noticed that I'm also getting the /etc/bashrc error at the start of every .command.log:

$ head -n1 */*/.command.log
==> 29/e680093a539c9c8c8d1c5e77d47979/.command.log <==
/etc/bashrc: line 81: PS1: unbound variable

==> 2a/3f952765e2d14d1a2b9be37f41e109/.command.log <==
/etc/bashrc: line 81: PS1: unbound variable

==> 80/7bab72003988b1081950e8b7c0739f/.command.log <==
/etc/bashrc: line 81: PS1: unbound variable

==> af/1170db1f75c32572832e3eea5cb02f/.command.log <==
/etc/bashrc: line 81: PS1: unbound variable

==> bf/434ee832334b80cb2e64164b64568d/.command.log <==
/etc/bashrc: line 81: PS1: unbound variable

==> dc/42900b7ff92c01b985f2c9d74b5a9d/.command.log <==
/etc/bashrc: line 81: PS1: unbound variable

==> df/5e540274fe100da998976c059c6d27/.command.log <==
/etc/bashrc: line 81: PS1: unbound variable

==> e2/2435f69b5a6ed6c098606ab6079a73/.command.log <==
/etc/bashrc: line 81: PS1: unbound variable

All the other output files appear to be fine and the tasks complete successfully, but I'd prefer to fix this if possible. Is there a way to get around these messages?

johnomics on 8 Mar 2017

I'm afraid you need to define it with an empty value. Using unbound variables is very bad practice.

pditommaso on 8 Mar 2017

OK, fair enough - but just out of interest, what creates the .command.log file? What is starting the subshell, separate from the definition of process.shell?

johnomics on 8 Mar 2017

The .command.log is created by the top-level NF process that launches .command.run, which in turn launches .command.sh (when using the local executor).

If you are using a batch scheduler, it is created by the scheduler instead.

pditommaso on 8 Mar 2017

I'm closing this issue because there's no more feedback. Feel free to comment/reopen if needed.

pditommaso on 14 Mar 2017

So I've been getting similar errors. Specifically, commands that fail always have this line in their .command.log file:

"/n/sw/fasrcsw/apps/lmod/lmod/init/bash: line 87: PS1: unbound variable"

I've tried putting export PS1=""; before the command in NF's script section, but that doesn't seem to have done the trick either. Any ideas?

tantrev on 12 Apr 2017

Can you change into the work dir of the failing task and run the following command:

bash -x .command.run

Then post the printed output here.

pditommaso on 12 Apr 2017

Of course. When I run that command, it initially kicks this out:

+ LMOD_PKG=/n/sw/fasrcsw/apps/lmod/lmod
+ LMOD_DIR=/n/sw/fasrcsw/apps/lmod/lmod/libexec/
+ LMOD_CMD=/n/sw/fasrcsw/apps/lmod/lmod/libexec/lmod
+ export LMOD_PKG
+ export LMOD_CMD
+ export LMOD_DIR
+ '[' : '!=' : ']'
+ '[' '' ']'
+ '[' 4 -ge 3 ']'
+ '[' -r /n/sw/fasrcsw/apps/lmod/lmod/init/lmod_bash_completions ']'
+ . /n/sw/fasrcsw/apps/lmod/lmod/init/lmod_bash_completions
++ complete -F _module module
++ complete -F _ml ml
+ set -e
+ set -u
+ NXF_DEBUG=0
+ [[ 0 > 1 ]]
+ trap on_exit EXIT
+ trap on_term TERM INT USR1 USR2
+ NXF_SCRATCH=
+ [[ 0 > 0 ]]
+ touch /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.begin
+ '[' -f /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.env ']'
+ [[ -n '' ]]
+ rm -f 262598492008449892556870034539879592913.3.stderr
+ rm -f 262598492008449892556870034539879592913.3.orcaout
+ ln -s /n/home04/tantrev/pure/calculations/orca/output/ground_state/262598492008449892556870034539879592913.3/262598492008449892556870034539879592913.3.stderr 262598492008449892556870034539879592913.3.stderr
+ ln -s /n/home04/tantrev/pure/calculations/orca/output/ground_state/262598492008449892556870034539879592913.3/262598492008449892556870034539879592913.3.orcaout 262598492008449892556870034539879592913.3.orcaout
+ set +e
+ COUT=/n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.po
+ mkfifo /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.po
+ CERR=/n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.pe
+ mkfifo /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.pe
+ tee1=5617
+ tee2=5618
+ pid=5619
+ wait 5619
+ tee .command.err
+ tee .command.out
+ /bin/bash /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.run.1
/n/sw/fasrcsw/apps/lmod/lmod/init/bash: line 87: PS1: unbound variable

And then when I CTRL+C, it kicks this out.

tantrev on 12 Apr 2017

What is invoking /n/sw/fasrcsw/apps/lmod/lmod/init/bash? Is it in your ~/.bashrc or ~/.profile file?

pditommaso on 12 Apr 2017
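
One way to answer a "what is sourcing this file?" question is to make the bash -x trace show the file and line of every traced command, by putting BASH_SOURCE and LINENO into PS4. The sketch below is a generic bash technique rather than anything used in the thread; the function name trace_sources and the two demo files are hypothetical. It sources the target script inside a traced shell, so it only shows files pulled in from the script itself onwards.

```shell
#!/bin/bash
# trace_sources SCRIPT
#   Source SCRIPT in a fresh bash with xtrace on, prefixing every traced
#   command with the file and line it came from.
trace_sources() {
    # PS4 is set inside the child shell; ${BASH_SOURCE} and ${LINENO} are
    # expanded each time a trace line is printed.
    bash -c 'PS4="+ \${BASH_SOURCE}:\${LINENO}: "; set -x; . "$1"' _ "$1" 2>&1
}

# Demo with two throwaway files: outer.sh sources inner.sh.
demo_dir=$(mktemp -d)
printf '%s\n' 'DEMO_INNER=1' > "$demo_dir/inner.sh"
printf '%s\n' ". '$demo_dir/inner.sh'" > "$demo_dir/outer.sh"
trace_output=$(trace_sources "$demo_dir/outer.sh")
printf '%s\n' "$trace_output"
rm -rf "$demo_dir"
```

In the demo output, the line that assigns DEMO_INNER is tagged with the path of inner.sh, and the line that sources it is tagged with the path of outer.sh, which is the information needed to walk back up a chain of sourced files.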

I have no idea. When I try running which bash in my regular user account, I get back /bin/bash. I can't find anything in my .bashrc or .bash_profile that would obviously be the culprit.

My .bashrc file is as follows:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# Shell settings
export HISTCONTROL=ignoredups

# Editor settings
export EDITOR=vi
export VISUAL=vi

# Limits
ulimit -c 0
ulimit -s unlimited

source new-modules.sh
module load gcc/6.1.0-fasrc01 openmpi/2.0.2.40dc0399-fasrc01
export QCSCRATCH=${HOME}/scratch
export PATH=/n/home04/tantrev/pure/orca_4_0_0_2_linux_x86-64:$PATH
export LD_LIBRARY_PATH=/n/sw/terachem-1.9/TeraChem/lib:$LD_LIBRARY_PATH
module load cuda/6.5-fasrc02
export QCPLATFORM=LINUX_Ix86_64
export QCSCRATCH=/scratch/tantrev
export QCMPI=mpich
export QCRSH=ssh
export QCFILEPREF=$QCLOCALSCR
export PATH="/usr/bin:$PATH"

And my .bash_profile is:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH

tantrev on 12 Apr 2017

Can you try editing the file .command.run and replacing the line

/bin/bash /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.run.1

with

/bin/bash --norc /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.run.1

and then run bash -x .command.run as before?

pditommaso on 12 Apr 2017

Sure thing.

Here's the immediate output after the modification:

+ LMOD_PKG=/n/sw/fasrcsw/apps/lmod/lmod
+ LMOD_DIR=/n/sw/fasrcsw/apps/lmod/lmod/libexec/
+ LMOD_CMD=/n/sw/fasrcsw/apps/lmod/lmod/libexec/lmod
+ export LMOD_PKG
+ export LMOD_CMD
+ export LMOD_DIR
+ '[' : '!=' : ']'
+ '[' '' ']'
+ '[' 4 -ge 3 ']'
+ '[' -r /n/sw/fasrcsw/apps/lmod/lmod/init/lmod_bash_completions ']'
+ . /n/sw/fasrcsw/apps/lmod/lmod/init/lmod_bash_completions
++ complete -F _module module
++ complete -F _ml ml
+ set -e
+ set -u
+ NXF_DEBUG=0
+ [[ 0 > 1 ]]
+ trap on_exit EXIT
+ trap on_term TERM INT USR1 USR2
+ NXF_SCRATCH=
+ [[ 0 > 0 ]]
+ touch /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.begin
+ '[' -f /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.env ']'
+ [[ -n '' ]]
+ rm -f 262598492008449892556870034539879592913.3.stderr
+ rm -f 262598492008449892556870034539879592913.3.orcaout
+ ln -s /n/home04/tantrev/pure/calculations/orca/output/ground_state/262598492008449892556870034539879592913.3/262598492008449892556870034539879592913.3.stderr 262598492008449892556870034539879592913.3.stderr
+ ln -s /n/home04/tantrev/pure/calculations/orca/output/ground_state/262598492008449892556870034539879592913.3/262598492008449892556870034539879592913.3.orcaout 262598492008449892556870034539879592913.3.orcaout
+ set +e
+ COUT=/n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.po
+ mkfifo /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.po
+ CERR=/n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.pe
+ mkfifo /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.pe
+ tee1=27231
+ tee2=27232
+ pid=27233
+ wait 27233
+ tee .command.out
+ tee .command.err
+ /bin/bash --norc /n/home04/tantrev/pure/work/32/49f154e88485c954a21e0c52e767bc/.command.run.1
/n/sw/fasrcsw/apps/lmod/lmod/init/bash: line 87: PS1: unbound variable

And here's the output after CTRL+C.

tantrev on 12 Apr 2017
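
The thread never establishes what sources the lmod init file. One possibility, offered here purely as an assumption about this cluster, is the BASH_ENV variable: bash reads the file it names whenever it starts a non-interactive shell, and --norc does not change that, since --norc only concerns ~/.bashrc in interactive shells. That would be consistent with the LMOD lines appearing in both traces before the wrapper's own set -e. The sketch below demonstrates only the general bash behaviour; the function name and the two throwaway files are hypothetical.

```shell
#!/bin/bash
# norc_demo: start a non-interactive bash with --norc while BASH_ENV points
# at a hypothetical startup file, and show what the child shell prints.
norc_demo() {
    local dir
    dir=$(mktemp -d)
    printf '%s\n' 'echo "startup file read"' > "$dir/startup.sh"
    printf '%s\n' 'echo "script ran"' > "$dir/script.sh"
    # --norc is given, yet the file named by BASH_ENV is still executed
    # before the script.
    BASH_ENV="$dir/startup.sh" bash --norc "$dir/script.sh"
    rm -rf "$dir"
}

norc_demo
```

On an affected node, echo "${BASH_ENV:-unset}" would show whether this explanation applies there.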

OK, no difference. However, I think it can be solved by adding the following entry to the nextflow.config file:

env.PS1=''

pditommaso on 12 Apr 2017

Thank you! I'll try that and get back to you soon. Sometimes it takes a little bit for the error to manifest itself.

tantrev on 12 Apr 2017

So I'm afraid I'm still getting errors. Here's what the .command.log file is still saying, after the NF config modification:

/n/sw/fasrcsw/apps/lmod/lmod/init/bash: line 87: PS1: unbound variable

The odd thing is that it only happens with some jobs, not all of them...

tantrev on 12 Apr 2017

Are you using a cluster or local execution?

pditommaso on 13 Apr 2017

Just a SLURM cluster through "fasrc".

tantrev on 13 Apr 2017

The fact that you are getting this problem only for some jobs suggests to me that something is odd (the missing PS1 variable) on only certain nodes in the cluster.

I would suggest asking your sysadmins for help and posting any progress on this issue here.

pditommaso on 13 Apr 2017

Turns out I just had a faulty script that was hanging - the PS1 error wasn't actually affecting anything. Sorry for the confusion, thanks for all your help!

tantrev on 13 Apr 2017

So, unfortunately, it turns out there's still a problem. For some reason, even when jobs execute just fine, NF is reporting them as having "FAILED". But the expected output is produced and the ".exitcode" is zero. The PS1 error is the only error in the logfiles, however. Is it possible this PS1 error is the root cause of such behavior?

tantrev on 14 Apr 2017

It looks like a different problem. Please open a new issue including the NF stdout, the .nextflow.log and, if possible, the code causing the problem.

pditommaso on 17 Apr 2017
