Nextflow: Kubernetes - Error syncing pod

Created on 8 Dec 2017 · 8 comments · Source: nextflow-io/nextflow

Hi Paolo,

Many thanks to @theobarberbany: we were able to set up a Kubernetes cluster on our OpenStack cloud. However, it is quite hacky at the moment; we used Terraform, Ansible and kubespray. Our systems team is planning to set up OpenShift quite soon, but for now I wanted to test Nextflow with our current setup.

So, I cloned the rnaseq-nf pipeline, added process.executor = 'k8s' to nextflow.config and ran nextflow run nextflow-io/rnaseq-nf -with-docker. The pods appeared to start and do some work, but they all failed, as shown in this screenshot:

[Screenshot 2017-12-08 at 3:44:26 pm]

It shows "Error syncing pod". When we looked at the Kubernetes logs, we found this message:

3m          3m           1         nxf-699465c1bc64c96c8fe97e545640889b.14fe5c1dced8899d   Pod       
spec.containers{nxf-699465c1bc64c96c8fe97e545640889b}   Warning   Failed                  kubelet, k8s-k8s-node-nf-4   
Error: failed to start container "nxf-699465c1bc64c96c8fe97e545640889b": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "chdir to cwd (\"/Users/tb15/Documents/Local_Projects/k8s/rnaseq-nf/work/69/9465c1bc64c96c8fe97e545640889b\") set in config.json failed: no such file or directory"

So it looks like the local work directory path was somehow passed into config.json. All the pods failed because of this problem, but Nextflow kept waiting for them until I killed it manually with Ctrl+C:

[Screenshot 2017-12-08 at 3:44:47 pm]

Have you or anyone else seen this problem before? Or could it be related to our hacky k8s installation?


wikiselev

All 8 comments

This is the problem:

oci runtime error: container_linux.go:247: starting container process caused "chdir to cwd (\"/Users/tb15/Documents/Local_Projects/k8s/rnaseq-nf/work/69/9465c1bc64c96c8fe97e545640889b\") set in config.json failed: no such file or directory"

The current (experimental) implementation requires a shared file system (NFS or similar) mounted on all nodes where the K8s pods are executed. That doesn't seem to be the case in your setup.
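The error above is the symptom: the container's working directory is set to the work directory path on the launching machine, so that exact path has to exist on every node. A minimal sketch of a hypothetical pre-flight check (not part of Nextflow) that tests whether a work directory lies under a mount point shared by all nodes; the helper name on_shared_mount and the mount path /mnt/shared are assumptions.

```python
from pathlib import PurePosixPath


def on_shared_mount(work_dir: str, shared_mounts: list[str]) -> bool:
    """Return True if work_dir lies under one of the shared mount points.

    Hypothetical helper: shared_mounts should list paths that are mounted
    identically on every node (e.g. an NFS or GlusterFS export).
    """
    work = PurePosixPath(work_dir)
    for mount in shared_mounts:
        root = PurePosixPath(mount)
        # Covered if the work dir is the mount root itself or nested under it.
        if work == root or root in work.parents:
            return True
    return False


if __name__ == "__main__":
    mounts = ["/mnt/shared"]  # assumed shared mount point
    laptop_work = "/Users/tb15/Documents/Local_Projects/k8s/rnaseq-nf/work"
    shared_work = "/mnt/shared/rnaseq-nf/work"
    print(on_shared_mount(laptop_work, mounts))  # False: pods cannot chdir here
    print(on_shared_mount(shared_work, mounts))  # True
```

The check is purely lexical: it only says whether the path sits under a mount you declare as shared, not whether that mount really exists on each node.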

The goal of #468 and #446 is to allow deploying a NF workload without an external shared file system, relying instead on the storage provided by Kubernetes itself.

pditommaso on 8 Dec 2017

@pditommaso Would something like GlusterFS / CephFS work?

theobarberbany on 9 Dec 2017

Yes. In principle, any of the storage types that support the ReadWriteMany access mode will work. See the table here.
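In Kubernetes terms, ReadWriteMany is requested through the accessModes field of a PersistentVolumeClaim. A minimal sketch that builds such a claim as a JSON manifest (kubectl apply accepts JSON as well as YAML); the helper name rwx_claim, the claim name nf-work, the storage class glusterfs and the 100Gi size are hypothetical placeholders, and the storage class has to be backed by a volume plugin that actually supports ReadWriteMany.

```python
import json


def rwx_claim(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest requesting ReadWriteMany access.

    name, storage_class and size are placeholders; the storage class must map
    to a volume plugin that supports ReadWriteMany.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany: the volume can be mounted read-write by many nodes
            # at once, so pods on different nodes see the same work directory.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical values; the output can be piped to: kubectl apply -f -
    print(json.dumps(rwx_claim("nf-work", "glusterfs", "100Gi"), indent=2))
```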

pditommaso on 11 Dec 2017


I'm closing this because this behaviour is expected. If you need further help, feel free to comment below.

pditommaso on 12 Dec 2017

Hey @pditommaso, I've now got a cluster running Kubernetes with GlusterFS mounted as a PersistentVolume. Is there any way to run a Nextflow job that uses that PersistentVolume directly, rather than launching it from a node at a shared mount point?

theobarberbany on 13 Dec 2017


Not yet. This is the goal of #446.

pditommaso on 13 Dec 2017


Out of curiosity: I'm running various jobs on a 3-node k8s cluster (2 minion nodes, 1 master), and I've noticed that all the submitted pods run only on the master node. Is this normal?

theobarberbany on 15 Dec 2017

Job scheduling is managed by Kubernetes; in principle, the pods should be distributed across the cluster. However, NF is not aware of that.
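To see where the pods actually landed, one option is to group them by node using the output of kubectl get pods -o json, where each pod's spec.nodeName records the node it was scheduled on. A minimal sketch, assuming Nextflow's pods can be picked out by the nxf- name prefix seen in the log above; the helper name pods_per_node and the node names in the sample are made up.

```python
import json
from collections import Counter


def pods_per_node(pod_list: dict, prefix: str = "nxf-") -> Counter:
    """Count pods whose name starts with prefix, grouped by scheduled node.

    pod_list is the parsed output of `kubectl get pods -o json`. Pods not yet
    scheduled have no spec.nodeName and are counted as "<unscheduled>".
    """
    counts = Counter()
    for pod in pod_list.get("items", []):
        if not pod["metadata"]["name"].startswith(prefix):
            continue  # skip pods that were not launched by Nextflow
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Hand-made sample in the shape of `kubectl get pods -o json` output.
    sample = json.loads("""{"items": [
        {"metadata": {"name": "nxf-1a"}, "spec": {"nodeName": "k8s-master"}},
        {"metadata": {"name": "nxf-2b"}, "spec": {"nodeName": "k8s-master"}},
        {"metadata": {"name": "nxf-3c"}, "spec": {"nodeName": "k8s-node-1"}}
    ]}""")
    for node, n in pods_per_node(sample).most_common():
        print(f"{node}\t{n}")
```

With real data, replace the sample with json.load(sys.stdin) (importing sys) and pipe the kubectl output into the script.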

pditommaso on 15 Dec 2017

