Is this a BUG REPORT or FEATURE REQUEST? (choose one):
FEATURE REQUEST
**Description**:
It would be really great to run GPU workloads on minikube.
I successfully ran a GPU workload on GKE using the instructions from https://docs.google.com/document/d/1hYOqaOVSu68ZaUsmCKwyP6kf6UtlTMiE_hxoJ2uUqvs/edit# and was looking to replicate this in minikube.
Example pod that successfully runs a GPU workload on GKE:
~~~
apiVersion: v1
kind: Pod
metadata:
  name: gpu-container
spec:
  volumes:
    - name: nvidia-libraries
      hostPath:
        path: /home/kubernetes/bin/nvidia/lib
  containers:
    - name: gpu-container
      image: mxnet/python:gpu
      args:
        - python
        - -c
        - "import mxnet as mx; a = mx.nd.ones((2, 3), mx.gpu()); b = a * 2 + 1; print b.asnumpy()"
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
      volumeMounts:
        - name: nvidia-libraries
          mountPath: /usr/local/nvidia/lib64
~~~
Expected output: [[ 3. 3. 3.] [ 3. 3. 3.]]
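For context, exercising the pod above looked roughly like this (the manifest filename gpu-pod.yaml is just an assumption for illustration):
~~~
# Illustrative usage, assuming the manifest above was saved as gpu-pod.yaml
kubectl create -f gpu-pod.yaml
kubectl logs gpu-container   # should print [[ 3. 3. 3.] [ 3. 3. 3.]]
~~~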
I was looking to replicate this workflow within minikube. I have a working local GPU setup that runs this image under nvidia-docker.
I installed and started a local minikube with:
~~~~
wget https://storage.googleapis.com/minikube-builds/2050/minikube-linux-amd64 && mv minikube-linux-amd64 /usr/bin/minikube && chmod +x /usr/bin/minikube
curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl && chmod +x kubectl
sudo gsutil cp gs://minikube/k8sReleases/v1.8.0/localkube-linux-amd64 /usr/local/bin/localkube && chmod +x localkube
export MINIKUBE_WANTUPDATENOTIFICATION=false
export MINIKUBE_WANTREPORTERRORPROMPT=false
export MINIKUBE_HOME=$HOME
export CHANGE_MINIKUBE_NONE_USER=true
mkdir $HOME/.kube || true
touch $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config
sudo -E minikube start --vm-driver=none
~~~~
I copied all required CUDA and NVIDIA libraries into the local host directory /home/kubernetes/bin/nvidia/lib.
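Roughly, that copy step looked like the sketch below (the source library paths are illustrative and depend on the host driver installation):
~~~
# Sketch only: gather the host's NVIDIA/CUDA user-space libraries into the
# hostPath directory referenced by the pod spec above (source paths are assumptions)
sudo mkdir -p /home/kubernetes/bin/nvidia/lib
sudo cp -P /usr/lib/x86_64-linux-gnu/libcuda.so* /home/kubernetes/bin/nvidia/lib/
sudo cp -P /usr/lib/x86_64-linux-gnu/libnvidia-*.so* /home/kubernetes/bin/nvidia/lib/
~~~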
I added GPU node capacity:
~~~
kubectl proxy
curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/alpha.kubernetes.io~1nvidia-gpu", "value": "1"}]' \
http://127.0.0.1:8001/api/v1/nodes/kozikowpc/status
~~~
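A quick way to confirm the patch took (an illustrative check, not part of the original steps):
~~~
# Verify that the patched capacity is visible on the node from the request above
kubectl describe node kozikowpc | grep -i nvidia-gpu
~~~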
Yet when I start the same pod as on GKE, I get pod status "CreateContainerConfigError" and the event "kubelet, kozikowpc Error: GPUs are not supported". I've seen some code for GPU support in minikube: https://github.com/kubernetes/minikube/blob/master/vendor/k8s.io/kubernetes/pkg/kubelet/gpu/nvidia/nvidia_gpu_manager.go . Is there anything I am doing wrong?
@kozikow Have you enabled the feature gate Accelerators=true? Not sure if that's still required, but a Google search turned that up.
After adding "--feature-gates=Accelerators=true" to minikube the container starts, but I get CUDA library errors: https://gist.github.com/kozikow/be44083d4812c554d84271edf01853aa . The same workflow succeeds on GKE and in nvidia-docker.
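Concretely, the start command then looked roughly like this (my reconstruction of where the flag goes):
~~~
# Same "none"-driver start as before, with the Accelerators feature gate added
sudo -E minikube start --vm-driver=none --feature-gates=Accelerators=true
~~~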
For reference, another GPU workload succeeds with a similar pod configuration:
Image: gcr.io/tensorflow/tensorflow:latest-gpu
Code:
~~~~
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
~~~~
I suspected a CUDA library mismatch between my host machine and the container. However, the same image starts successfully in nvidia-docker. Is there any magic that nvidia-docker is doing that I am missing?
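For comparison, the equivalent direct run that works on the same host looks roughly like this (the exact invocation is illustrative):
~~~
# The same MXNet check run straight through nvidia-docker, which injects the driver
# libraries and device nodes itself instead of relying on a hostPath mount
nvidia-docker run --rm mxnet/python:gpu \
  python -c "import mxnet as mx; a = mx.nd.ones((2, 3), mx.gpu()); print((a * 2 + 1).asnumpy())"
~~~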
We do not support this use case yet because it wasn't clear whether minikube would be used to spin up k8s clusters directly on Linux hosts. On hosts where minikube spins up a VM, it is harder to consume GPUs, since that requires isolating extra GPUs on the host and attaching them to the minikube VM.
@dlorenc is vm-driver=none supported officially by minikube? Would it make sense to use kubeadm instead?
@vishh We already have an option to use kubeadm. The "none" driver is officially supported for localkube and possibly soon the kubeadm bootstrapper.
The "none" driver runs the cluster directly on the host without a VM.
Is there a recommended way to test GPU workloads locally? minikube with --vm-driver=none --feature-gates=Accelerators=true gets pretty close to achieving this: some GPU containers are running successfully.
I suspect the only missing link is some CUDA library trickery that GKE or nvidia-docker are doing. I have been reading the code of nvidia-docker ( https://github.com/NVIDIA/nvidia-docker ) and the GKE GPU installer ( https://github.com/ContainerEngine/accelerators/tree/master/cos-nvidia-gpu-installer ), but I haven't found anything yet.
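One way to poke at what nvidia-docker actually injects is to inspect a container started through it (a debugging sketch; the probe container name is made up):
~~~
# Start a throwaway GPU container via nvidia-docker and dump what it was given
nvidia-docker run -d --name gpu-probe mxnet/python:gpu sleep 300
docker inspect --format '{{json .Mounts}}' gpu-probe
docker inspect --format '{{json .HostConfig.Devices}}' gpu-probe
docker rm -f gpu-probe
~~~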
Same problem here. Would be great if you could advise how to solve this problem.
FWIW I described our current setup for developing GPU containers on Kubernetes in https://tensorflight.blog/2018/02/23/dev-environment-for-gke/ . Please let me know if minikube gets GPU support or if there is any other way.
I too would like to see this happen :grinning:
FYI: This feature is being tackled via the ML Working Group.
Here's my setup in case it's valuable for the continuation of this RFE
minikube version: v0.26.1
Kubernetes version being created: 1.10
Starting minikube:
~~~
minikube start --feature-gates=DevicePlugins=true --vm-driver none --feature-gates=Accelerators=true
~~~
Device plugin being installed:
~~~
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
~~~
Querying the node to see if it sees the GPU:
~~~
kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME       GPUs
minikube
~~~
If I understand things correctly, --vm-driver none leverages the existing Docker runtime on the host, which I have set to nvidia-docker.
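"Set to nvidia-docker" here means roughly the following Docker daemon config (illustrative for nvidia-docker2; treat the exact file contents as an assumption on my part):
~~~
# Illustrative /etc/docker/daemon.json making the nvidia runtime the default,
# so containers started by the kubelet get GPU access without extra flags
sudo tee /etc/docker/daemon.json <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker
~~~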
However, no matter what I do, I can't seem to get the node to recognize the GPU as an available resource. I know this isn't officially supported yet :) but I thought I'd contribute my env to help with the progression.
Edit: figured it out. I was using the 1.9 NVIDIA device plugin rather than the 1.10 one. Once I swapped it out, the node recognized the GPU.
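In other words, swapping the manifest to the matching release fixed it. Roughly (the v1.10 URL below is my reconstruction; double-check it against the k8s-device-plugin repo):
~~~
# Replace the 1.9 device plugin with the one matching Kubernetes 1.10, then re-query
kubectl delete -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.10/nvidia-device-plugin.yml
kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
~~~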
/assign
@Nick-Harvey This didn't work for me on minikube v0.28.0.
Hello,
I have a PR that adds GPU support to minikube: #2936. It would be really helpful if people on this thread could try it out. The instructions are in the PR.
Thank you!