Nextflow: Host defined env variables are not accessible from the container

Created on 29 Jun 2018 · 14 Comments · Source: nextflow-io/nextflow

Bug report

As discussed with @pditommaso and @fmorency on Gitter, we have found an issue when using a Singularity container with Nextflow: the CUDA_VISIBLE_DEVICES environment variable is not passed to the container.

Steps to reproduce the problem

Here are the files to reproduce the behavior.

Here are the commands, run from the main node.

  • First command: nextflow main.nf run -process.executor slurm -with-singularity /singularity/gpu_image.img -profile gpu
  • Second command: nextflow main.nf run -process.executor slurm -profile gpu

Expected behavior and actual behavior

For both commands we expect the output to be something like this:

0
1
0
1
0

The first command crashes with:

line 2: CUDA_VISIBLE_DEVICES: unbound variable

or, if we add env.CUDA_VISIBLE_DEVICES='\$CUDA_VISIBLE_DEVICES' to nextflow.config, it just echoes an empty string.

The second command echoes the expected output.
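
The "unbound variable" message is what bash prints when a script references an unset variable while the nounset option (-u) is active. The sketch below reproduces that failure outside Nextflow and Singularity, assuming the task script runs under bash with -u enabled. It only illustrates the error itself and is not the code Nextflow generates.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the "unbound variable" error without Nextflow or Singularity.
# Assumption: the task script runs under bash with the nounset option (-u).

# Child bash with -u, and CUDA_VISIBLE_DEVICES removed from its environment.
if strict_out=$(env -u CUDA_VISIBLE_DEVICES bash -u -c 'echo "$CUDA_VISIBLE_DEVICES"' 2>&1); then
  strict_status=0
else
  strict_status=$?
fi

# Same child, but the script supplies an empty default, so -u does not trigger.
if safe_out=$(env -u CUDA_VISIBLE_DEVICES bash -u -c 'echo "devices=${CUDA_VISIBLE_DEVICES:-}"' 2>&1); then
  safe_status=0
else
  safe_status=$?
fi

echo "strict: status=$strict_status output=$strict_out"
echo "safe:   status=$safe_status output=$safe_out"
```

The first child exits with an error because the variable never reached its environment. The second survives but prints an empty value, which mirrors the empty string seen once the variable was defined through the config file.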

Environment

  • Nextflow version: 0.30.2.4867
  • Java version: 1.8.0_171
  • Operating system: Linux

All 14 comments

Quick question: is this variable assigned by SLURM?

Yes, CUDA_VISIBLE_DEVICES is assigned by SLURM.

I've uploaded a new snapshot that should solve the problem. You can try it using this command:

NXF_VER=0.31.0-SNAPSHOT nextflow run .. etc

Note: the variable must be defined in the config file without escaping the dollar sign, i.e.:

env.CUDA_VISIBLE_DEVICES='$CUDA_VISIBLE_DEVICES'

It works perfectly, and the definition in the nextflow.config of env.CUDA_VISIBLE_DEVICES='$CUDA_VISIBLE_DEVICES' is not needed.
Thanks a lot for the fast fix.

I will keep this open because I'm not sure this is going to be the final solution. Ideally this should be transparent for the user.

Sorry, I was wrong. I forgot to add the -with-singularity option when I ran my tests.
It still does not work. The output is still empty with env.CUDA_VISIBLE_DEVICES='$CUDA_VISIBLE_DEVICES', and it still crashes with the same error message when I remove it.

Commands run:

  • FAIL: NXF_VER=0.31.0-SNAPSHOT nextflow main.nf run -process.executor slurm -profile gpu -with-singularity /containers/gpu.img
  • OK: NXF_VER=0.31.0-SNAPSHOT nextflow main.nf run -process.executor slurm -profile gpu

I've made another snapshot. Update your version with this command:

CAPSULE_RESET=1 NXF_VER=0.31.0-SNAPSHOT nextflow info

then run as

NXF_VER=0.31.0-SNAPSHOT nextflow run .. etc

NOTE: the variable definition in the config file is no longer required, so remove this line:

env.CUDA_VISIBLE_DEVICES='$CUDA_VISIBLE_DEVICES'

Thanks, I cannot test it right now, I'll try as soon as I can.

Ok, better. I also need to make other changes, therefore please wait to test it until further notice.

I tested it this morning, and CUDA_VISIBLE_DEVICES is still empty when I use a Singularity container, even after running:

CAPSULE_RESET=1 NXF_VER=0.31.0-SNAPSHOT nextflow info

I'm closing this in favour of #803.

This is solved by #803, by adding the following setting to the config file:

singularity {
  envWhitelist = 'CUDA_VISIBLE_DEVICES'
}

It requires 0.31.0-SNAPSHOT build 4894. You can update to it using this command:

CAPSULE_RESET=1 NXF_VER=0.31.0-SNAPSHOT nextflow info

Then use NF as usual:

NXF_VER=0.31.0-SNAPSHOT nextflow run .. etc
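
For readers wondering what whitelisting a variable amounts to, here is a rough shell sketch of one way a host variable can be handed to a Singularity container. It relies on Singularity's documented convention that a host variable named SINGULARITYENV_<NAME> appears as <NAME> inside the container. The helper forward_whitelisted is hypothetical and written only for this illustration; whether Nextflow's envWhitelist is implemented this way is an assumption, not something stated in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: forward whitelisted host variables to a Singularity container.
# Assumption: Singularity exposes SINGULARITYENV_<NAME> as <NAME> in the container.
# forward_whitelisted is a hypothetical helper, not part of Nextflow or Singularity.

forward_whitelisted() {
  for name in "$@"; do
    # printenv exits non-zero when the variable is absent from the environment,
    # so variables that are not set on the host are skipped.
    if value=$(printenv "$name"); then
      export "SINGULARITYENV_${name}=${value}"
    fi
  done
}

# Simulate what SLURM would export for a GPU job.
export CUDA_VISIBLE_DEVICES=0,1
unset NOT_SET_ON_HOST

forward_whitelisted CUDA_VISIBLE_DEVICES NOT_SET_ON_HOST

echo "SINGULARITYENV_CUDA_VISIBLE_DEVICES=${SINGULARITYENV_CUDA_VISIBLE_DEVICES:-<unset>}"
echo "SINGULARITYENV_NOT_SET_ON_HOST=${SINGULARITYENV_NOT_SET_ON_HOST:-<unset>}"
```

After the helper runs, a subsequent singularity exec started from the same shell would see CUDA_VISIBLE_DEVICES with the value SLURM assigned, while names that are not set on the host are left alone.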

Thanks, I'll try.

It's working like a charm, thanks a lot.
