Since version 0.28.0, Nextflow uses /dev/shm to create temporary files. On some of our machines (CentOS 6.6), /dev/shm was not mode 777, causing Nextflow to crash with something like:
mktemp: failed to create directory via template `/dev/shm/nxf.XXXXXXXXXX': Permission denied
mkfifo: cannot create fifo `/.command.out': Permission denied
mkfifo: cannot create fifo `/.command.err': Permission denied
This error is visible in the .command.log file in the corresponding work directory.
In our case the default permissions were:
$ ls -ld /dev/shm
drwxr-xr-x 2 root root 140 Mar 20 16:08 /dev/shm
Running chmod 777 /dev/shm fixes the problem (but requires root access).
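To check whether a given node is affected without waiting for a pipeline to fail, something like the following could be used. This is only a minimal sketch: the function name can_mktemp_in is made up for illustration, and it simply tries the same kind of mktemp call that appears in the error message above, then cleans up after itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nextflow): succeeds if the current user
# can create a temporary directory under the given path (default: /dev/shm).
can_mktemp_in() {
    local dir="${1:-/dev/shm}"
    local tmp
    # Same template shape as in the mktemp error message.
    if tmp="$(mktemp -d "$dir/nxf.XXXXXXXXXX" 2>/dev/null)"; then
        rmdir "$tmp"    # leave nothing behind
        return 0
    fi
    return 1
}

# Report the result for /dev/shm on the machine where this runs.
if can_mktemp_in /dev/shm; then
    echo "/dev/shm: temporary directories can be created"
else
    echo "/dev/shm: cannot create temporary directories"
fi
```

The check only describes the machine it runs on, so on a cluster it would have to run on the compute nodes as well as the login node. Note also that /dev/shm is normally mode 1777 (drwxrwxrwt, with the sticky bit), so an admin may prefer chmod 1777 over plain 777.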
This is an issue on our cluster too (CentOS 7.4 with Slurm), even though the permissions on /dev/shm look correct:
$ echo $TMPDIR
/tmp
$ ls -ld /dev/shm
drwxrwxrwt 2 root root 40 Mar 23 18:00 /dev/shm
$ export NXF_EXECUTOR=slurm
$ nextflow run hello
This fails with:
N E X T F L O W ~ version 0.28.0
Launching `nextflow-io/hello` [sick_waddington] - revision: d4c9ea84de [master]
[warm up] executor > slurm
[57/017ae0] Submitted process > sayHello (3)
[a4/6bde95] Submitted process > sayHello (2)
[3f/273265] Submitted process > sayHello (1)
[47/4cbb0e] Submitted process > sayHello (4)
ERROR ~ Error executing process > 'sayHello (3)'
Caused by:
Process `sayHello (3)` terminated with an error exit status (1)
Command executed:
echo 'Hello world!'
Command exit status:
1
Command output:
(empty)
Command wrapper:
mktemp: failed to create directory via template ‘/dev/shm/nxf.XXXXXXXXXX’: Permission denied
mkfifo: cannot create fifo ‘/.command.out’: Permission denied
mkfifo: cannot create fifo ‘/.command.err’: Permission denied
/var/spool/slurmd/job15126/slurm_script: line 73: /.command.out: Permission denied
/var/spool/slurmd/job15126/slurm_script: line 69: /.command.err: No such file or directory
/var/spool/slurmd/job15126/slurm_script: line 67: /.command.out: No such file or directory
Work dir:
/home/xxx/tmp/work/57/017ae0ba17e1b4c164076ac366ecfd
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (3)
But the local executor works fine:
$ export NXF_EXECUTOR=local
$ nextflow run hello
N E X T F L O W ~ version 0.28.0
Launching `nextflow-io/hello` [evil_linnaeus] - revision: d4c9ea84de [master]
[warm up] executor > local
[e6/d4f0df] Submitted process > sayHello (4)
[f1/eb3f37] Submitted process > sayHello (1)
[d2/bc5366] Submitted process > sayHello (2)
[31/50f6af] Submitted process > sayHello (3)
Bonjour world!
Ciao world!
Hola world!
Hello world!
@drejom I guess the problem is the permissions on /dev/shm on the cluster compute nodes, whereas I think you checked them on the login node. Can you confirm that?
I've uploaded a patch that should solve the permissions problem. Could you please give it a try using the following command:
NXF_VER=0.28.1-SNAPSHOT nextflow run ...etc
Thanks @pditommaso, that patch did the trick!
$ NXF_VER=0.28.1-SNAPSHOT NXF_EXECUTOR=slurm nextflow run hello
N E X T F L O W ~ version 0.28.1-SNAPSHOT
Launching `nextflow-io/hello` [fervent_swartz] - revision: d4c9ea84de [master]
[warm up] executor > slurm
[48/4e031b] Submitted process > sayHello (4)
[57/a24cdd] Submitted process > sayHello (2)
[21/d8c089] Submitted process > sayHello (3)
[97/9823ab] Submitted process > sayHello (1)
Hola world!
Ciao world!
Hello world!
Bonjour world!
BTW, the perms on /dev/shm are the same on the cluster nodes (drwxrwxrwt) as on the head node. It seems there's also a Red Hat bug fix relating to /dev/shm. I'll follow up with our cluster admin.
Nice. The patch falls back to local storage if it fails to create that file on /dev/shm, but yes, there's something odd in your cluster config.
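For readers wondering what such a fallback looks like in shell terms, here is a rough sketch. It is not the actual code from the patch: the function name make_scratch_dir is hypothetical, and the fallback location (the current directory by default) is an assumption, since the thread does not say which local directory the patch uses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "try /dev/shm, else fall back" strategy.
# NOT the real Nextflow wrapper code; names and defaults are assumptions.
make_scratch_dir() {
    local preferred="${1:-/dev/shm}"
    local fallback="${2:-$PWD}"
    # Try the preferred location first; if mktemp fails there (for example
    # with "Permission denied"), create the directory in the fallback.
    mktemp -d "$preferred/nxf.XXXXXXXXXX" 2>/dev/null \
        || mktemp -d "$fallback/nxf.XXXXXXXXXX"
}

# Usage: prints the path of the directory that was created, then removes it.
scratch="$(make_scratch_dir /dev/shm "$PWD")"
echo "scratch dir: $scratch"
rmdir "$scratch"
```

The point of the pattern is that a misconfigured /dev/shm degrades to slower local storage instead of failing the task outright.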
Version 0.28.1