Currently NF interacts with Kubernetes through the kubectl command. This prevents deploying NF itself as a Pod.
This can be solved by allowing NF to access the Kubernetes API server via an HTTPS connection. Interesting facts:
The kubernetes DNS name is automatically defined in each Pod, and the files /var/run/secrets/kubernetes.io/serviceaccount/token, /var/run/secrets/kubernetes.io/serviceaccount/ca.crt and /var/run/secrets/kubernetes.io/serviceaccount/namespace are available in each container. Also, the following variables are defined in the container environment:
KUBERNETES_PORT=tcp://10.0.0.1:443
KUBERNETES_PORT_443_TCP=tcp://10.0.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=10.0.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_HOST=10.0.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
More details here.
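To make the idea above concrete, here is a minimal sketch of how a process running inside a Pod could assemble an HTTPS request to the API server from those environment variables and service-account files. The function names (`in_cluster_request`, `pods_path`) are made up for illustration and this is not NF code; nothing is sent over the network, it only builds the URL, the bearer-token header and the CA file path.

```python
import os
from pathlib import Path

# Service-account mount point listed above.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def in_cluster_request(path, env=None, sa_dir=SA_DIR):
    """Return (url, headers, ca_file) for an HTTPS call to the API server.

    Hypothetical helper: nothing is sent, the caller hands these values
    to whatever HTTP client it prefers.
    """
    env = os.environ if env is None else env
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env.get("KUBERNETES_SERVICE_PORT", "443")
    # The token file holds the bearer token of the Pod's service account.
    token = (Path(sa_dir) / "token").read_text().strip()
    url = f"https://{host}:{port}{path}"
    headers = {"Authorization": f"Bearer {token}"}
    # ca.crt is what the client should use to verify the server certificate.
    return url, headers, str(Path(sa_dir) / "ca.crt")


def pods_path(sa_dir=SA_DIR):
    """API path of the pods collection in the Pod's own namespace."""
    namespace = (Path(sa_dir) / "namespace").read_text().strip()
    return f"/api/v1/namespaces/{namespace}/pods"
```

With the values listed above, `in_cluster_request(pods_path())` would point at `https://10.0.0.1:443/api/v1/namespaces/<namespace>/pods`. Whether the call is actually allowed depends on the permissions granted to the Pod's service account.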
Just had a detailed discussion with @skptic on this. My suggestion is that the following are highly desirable:
Happy to help with planning and testing for this.
Hi Tim,
Hi Paolo, I've implemented code in Squonk to launch Pods/Jobs from within the OpenShift environment using the fabric8.kubernetes.client. I can share my experiences if that would be of interest.
I've also done a little stress-testing by launching hundreds of concurrent competing Pods where I control the memory they allocate and how _busy_ they are by burning up CPU cycles using a small Python-based Docker image.
- OK. This uses the REST API?
Yes, I will commit on separate branch to work on that
- Extends the K8S API. In our case we're just scheduling a Pod so it should be an identical process between OS and K8S.
Exactly
- OK. I created #531
OK
- Yes, but it's the K8S scheduler that knows how best to schedule, not NF
I see. It could be tricky. We need to see in practice how to manage this.
@alanbchristie it could be useful, do you have some free cycles to contribute/test the NF-K8S integration ?
I've just committed a basic k8s client. See the KubeClient. Now the problem is to use it in place of the kubectl command in the KubernetesExecutor.
@tdudgeon Let's chat tomorrow morning about NF-K8S testing/integration and see what needs to be done, when and what environment's needed.
Hi guys, I will be very happy to test everything related to k8s integration for Nextflow. Please let me know what's needed.
I have also just created this issue #549, do you have any suggestions on it?
Many thanks!
Vlad
Hi Vlad, any contribution on the Kubernetes support is more than welcome, but maybe Friday night is not the best time to discuss that :) We can catch up early next week if you agree.
Sure Paolo, thanks for your reply anyway! ;-) I will be in touch next week.
@wikiselev the goal of this issue is to enhance the kubernetes executor so that it uses the kubernetes API instead of relying on the kubectl command. This will allow a much greater flexibility on kubernetes deployment with NF.
In practical terms I've already implemented a client skeleton. What is missing is to replace the use of kubectl with this client and run the proper tests.
It required some head scratches but finally there's a shiny new built-in support for Kubernetes that greatly improves the previous implementation and provides a much more flexible integration with Nextflow.
run command, it only requires an extra cmd option.
1) Declare in your nextflow.config file one or more persistent volume claims by adding a snippet similar to the following:
k8s {
    volumeClaims {
        'vol-pvc' { mountPath = '/nextflow' }
    }
}
Replace vol-pvc with the name of your persistent volume. See the K8s documentation for details on how to configure persistent volume claims in your cluster.
2) Launch a workflow execution (using the latest snapshot) with the -with-k8s command line option, e.g.:
NXF_VER=0.28.0-SNAPSHOT nextflow run rnaseq-nf -with-k8s
Nextflow creates and submits a pod in the K8s cluster that acts as the workflow driver application. This pod then creates and schedules a worker pod for each task executed by your pipeline.
3) Once the workflow execution is submitted, the foreground nextflow instance waits for the application to start and then prints the workflow output log, making it easier for the user to follow the execution.
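To illustrate the driver/worker idea described in the steps above, here is a rough sketch of what a per-task worker pod manifest with the volume claim mounted could look like, built as a plain Python dict. This is only an assumption about the general shape (standard Pod v1 fields), not the manifest Nextflow actually generates. The function name `worker_pod`, the volume name `vol-1`, the `busybox` image and the task name are all made up for the example.

```python
import json


def worker_pod(task_name, image, command, claim_name, mount_path):
    """Build a Pod manifest (plain dict) that runs one task with the claim mounted.

    Hypothetical helper for illustration only.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": task_name},
        "spec": {
            # A task runs once; the pod should not be restarted on exit.
            "restartPolicy": "Never",
            "containers": [{
                "name": task_name,
                "image": image,
                "command": command,
                # Mount the shared volume at the configured path.
                "volumeMounts": [{"name": "vol-1", "mountPath": mount_path}],
            }],
            # Bind the volume to the persistent volume claim by name.
            "volumes": [{
                "name": "vol-1",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    pod = worker_pod("nf-task-1", "busybox", ["sh", "-c", "echo hello"],
                     "vol-pvc", "/nextflow")
    print(json.dumps(pod, indent=2))
```

The point is that the driver pod and every worker pod mount the same claim at the same path, which is why the claim has to be declared up front in the config.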
There are still some details to polish, such as:
However, before continuing the implementation I need feedback from the people interested in this feature: comments, problems and proposed improvements.
Hi Paolo, many thanks for the update! This week and maybe next week are a bit crazy for me, but I will get to this asap and will provide you with the feedback. Thanks again for all your cool work. Vlad.
Hi Paolo, do you have an example of a working persistent volume claim? I tried the one from the official kubernetes page:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: slow
  selector:
    matchLabels:
      release: "stable"
    matchExpressions:
      - {key: environment, operator: In, values: [dev]}
However, it looks like storageClassName: slow is not recognised by my system... At the moment I'm testing on a local kubernetes cluster (the one from Docker you mentioned before, thanks btw!), but I will also test on OpenStack pretty soon.
Hi Vlad, for local testing I've used a pv/c definition like the one below:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vol-local
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  storageClassName: shared
  hostPath:
    path: /Users/pditommaso/Sites
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: vol-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: shared
  resources:
    requests:
      storage: 1Gi
Thanks Paolo, the claim worked for me!
What I've done:
rnaseq-nf pipeline
nextflow.config:
k8s {
    volumeClaims {
        'vol-pvc' { mountPath = '/Users/vk6/k8s-pers-vol' }
    }
}
cded to the rnaseq-nf folder and ran this:
NXF_VER=0.28.0-SNAPSHOT nextflow run main.nf -with-k8s
Nextflow complains:
Not a valid project name: main.nf
The .nextflow.log:
Feb-19 11:33:33.966 [main] DEBUG nextflow.cli.Launcher - $> /Users/vk6/bin/nextflow run main.nf -with-k8s
Feb-19 11:33:34.139 [main] DEBUG nextflow.cli.Launcher - Operation aborted
nextflow.exception.AbortOperationException: Not a valid project name: main.nf
at nextflow.scm.AssetManager.resolveName(AssetManager.groovy:246)
at nextflow.scm.AssetManager.build(AssetManager.groovy:128)
at nextflow.scm.AssetManager.<init>(AssetManager.groovy:112)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:105)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:255)
at nextflow.k8s.K8sDriverLauncher.makeConfig(K8sDriverLauncher.groovy:177)
at nextflow.k8s.K8sDriverLauncher.run(K8sDriverLauncher.groovy:116)
at nextflow.cli.CmdRun.run(CmdRun.groovy:201)
at nextflow.cli.Launcher.run(Launcher.groovy:427)
at nextflow.cli.Launcher.main(Launcher.groovy:581)
Is there anything I am missing?
Don't forget that when using K8s the pipeline runs inside the Kubernetes cluster, therefore it cannot run a local script. Currently it needs to pull a pipeline project from GitHub.
If you use this command it should work:
NXF_VER=0.28.0-SNAPSHOT nextflow run rnaseq-nf -with-k8s
Thanks Paolo, ok, I've forked the rnaseq-nf repo, added the k8s configuration and pushed it to my group's cellgeni GitHub. Then I ran:
NXF_VER=0.28.0-SNAPSHOT nextflow run cellgeni/rnaseq-nf -with-k8s
and everything worked like a charm! Really cool stuff. Thanks a lot for working on this!
Is there anything else I can test locally while we are still working on k8s implementation on OpenStack?
Well, the main point is to understand how it works in a real production scenario and to smooth the deployment based on your feedback.
Ok, will be back soon.
Nice. I'm going to upload an updated snapshot tomorrow and some K8s documentation during the week.
Just committed some improvements here. Now there's a specific command to launch the execution in a k8s cluster. For example:
nextflow kuberun <pipeline-name> -v vol-claim:/mount/path
The <pipeline-name> argument can be a project hosted in a Git repository or the absolute path, in the K8s cluster, of an already deployed project.
The -v command line option specifies the volume claim and the mount path.
BONUS: specifying login as the pipeline name launches an interactive Bash session in the preconfigured K8s pod.
More details here.
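As a small illustration of the -v claim:/mount/path form, the sketch below splits such a value into its two parts and renders them in the k8s.volumeClaims config form used earlier in the thread. It is only a guess at the mapping between the two notations, not the actual kuberun implementation; `parse_volume_option` and `as_config_snippet` are hypothetical names.

```python
def parse_volume_option(value):
    """Split 'claim-name:/mount/path' into (claim_name, mount_path)."""
    claim, sep, path = value.partition(":")
    # Require both parts and an absolute mount path.
    if not sep or not claim or not path.startswith("/"):
        raise ValueError(f"expected <claim-name>:</absolute/path>, got {value!r}")
    return claim, path


def as_config_snippet(claim, path):
    """Render the pair in the k8s.volumeClaims form shown earlier in the thread."""
    return (
        "k8s {\n"
        "    volumeClaims {\n"
        f"        '{claim}' {{ mountPath = '{path}' }}\n"
        "    }\n"
        "}"
    )


if __name__ == "__main__":
    claim, path = parse_volume_option("vol-claim:/mount/path")
    print(as_config_snippet(claim, path))
```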