Is this a BUG REPORT or FEATURE REQUEST? (choose one):
FEATURE REQUEST
**Description**:
It would be really great to run GPU workloads on minikube.
I successfully ran a GPU workload on GKE using the instructions from https://docs.google.com/document/d/1hYOqaOVSu68ZaUsmCKwyP6kf6UtlTMiE_hxoJ2uUqvs/edit# and was looking to replicate this in minikube.
Example pod that successfully runs a GPU workload on GKE:
~~~
apiVersion: v1
kind: Pod
metadata:
  name: gpu-container
spec:
  volumes:
    - name: nvidia-libraries
      hostPath:
        path: /home/kubernetes/bin/nvidia/lib
  containers:
    - name: gpu-container
      image: mxnet/python:gpu
      args:
        - python
        - -c
        - "import mxnet as mx; a = mx.nd.ones((2, 3), mx.gpu()); b = a * 2 + 1; print b.asnumpy()"
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
      volumeMounts:
        - name: nvidia-libraries
          mountPath: /usr/local/nvidia/lib64
~~~
Expected output: [[ 3. 3. 3.] [ 3. 3. 3.]]
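For context, exercising the pod above looked roughly like this (the manifest filename gpu-pod.yaml is just an assumption for illustration):
~~~
# Illustrative usage, assuming the manifest above was saved as gpu-pod.yaml
kubectl create -f gpu-pod.yaml
kubectl logs gpu-container   # should print [[ 3. 3. 3.] [ 3. 3. 3.]]
~~~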
I was looking to replicate this workflow within minikube. I have a working local GPU setup that runs this image under nvidia-docker.
I installed and started a local minikube with:
~~~~
wget https://storage.googleapis.com/minikube-builds/2050/minikube-linux-amd64 && mv minikube-linux-amd64 /usr/bin/minikube && chmod +x /usr/bin/minikube
curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl && chmod +x kubectl
sudo gsutil cp gs://minikube/k8sReleases/v1.8.0/localkube-linux-amd64 /usr/local/bin/localkube && chmod +x localkube
export MINIKUBE_WANTUPDATENOTIFICATION=false
export MINIKUBE_WANTREPORTERRORPROMPT=false
export MINIKUBE_HOME=$HOME
export CHANGE_MINIKUBE_NONE_USER=true
mkdir $HOME/.kube || true
touch $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config
sudo -E minikube start --vm-driver=none
~~~~
I copied all required CUDA and NVIDIA libraries into the local host directory /home/kubernetes/bin/nvidia/lib.
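Roughly, that copy step looked like the sketch below (the source library paths are illustrative and depend on the host driver installation):
~~~
# Sketch only: gather the host's NVIDIA/CUDA user-space libraries into the
# hostPath directory referenced by the pod spec above (source paths are assumptions)
sudo mkdir -p /home/kubernetes/bin/nvidia/lib
sudo cp -P /usr/lib/x86_64-linux-gnu/libcuda.so* /home/kubernetes/bin/nvidia/lib/
sudo cp -P /usr/lib/x86_64-linux-gnu/libnvidia-*.so* /home/kubernetes/bin/nvidia/lib/
~~~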
I added GPU node capacity:
~~~
kubectl proxy
curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/alpha.kubernetes.io~1nvidia-gpu", "value": "1"}]' \
http://127.0.0.1:8001/api/v1/nodes/kozikowpc/status
~~~
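A quick way to confirm the patch took (an illustrative check, not part of the original steps):
~~~
# Verify that the patched capacity is visible on the node from the request above
kubectl describe node kozikowpc | grep -i nvidia-gpu
~~~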
Yet when I start the same pod as on GKE, I get pod status "CreateContainerConfigError" and the event "kubelet, kozikowpc Error: GPUs are not supported". I've seen some code for GPU support in minikube: https://github.com/kubernetes/minikube/blob/master/vendor/k8s.io/kubernetes/pkg/kubelet/gpu/nvidia/nvidia_gpu_manager.go . Is there anything I am doing wrong?
@kozikow Have you enabled the feature gate Accelerators=true? Not sure if that's still required, but a Google search turned that up.
After adding "--feature-gates=Accelerators=true" to minikube the container starts, but I get CUDA library errors: https://gist.github.com/kozikow/be44083d4812c554d84271edf01853aa . The same workflow succeeds on GKE and in nvidia-docker.
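Concretely, the start command then looked roughly like this (my reconstruction of where the flag goes):
~~~
# Same "none"-driver start as before, with the Accelerators feature gate added
sudo -E minikube start --vm-driver=none --feature-gates=Accelerators=true
~~~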
For reference, another GPU workload succeeds with a similar pod configuration:
Image: gcr.io/tensorflow/tensorflow:latest-gpu
Code:
~~~~
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
~~~~
I suspected a CUDA library mismatch between my host machine and the container. However, the same image starts successfully in nvidia-docker. Is there any magic that nvidia-docker is doing that I am missing?
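For comparison, the equivalent direct run that works on the same host looks roughly like this (the exact invocation is illustrative):
~~~
# The same MXNet check run straight through nvidia-docker, which injects the driver
# libraries and device nodes itself instead of relying on a hostPath mount
nvidia-docker run --rm mxnet/python:gpu \
  python -c "import mxnet as mx; a = mx.nd.ones((2, 3), mx.gpu()); print((a * 2 + 1).asnumpy())"
~~~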
We do not support this use case yet because it wasn't clear whether minikube would be used to spin up k8s clusters directly on Linux hosts. On hosts where minikube spins up a VM, it is harder to consume GPUs, since that requires isolating extra GPUs on the host and attaching them to the minikube VM.
@dlorenc is vm-driver=none supported officially by minikube? Would it make sense to use kubeadm instead?
@vishh We already have an option to use kubeadm. The "none" driver is officially supported for localkube and possibly soon the kubeadm bootstrapper.
The "none" driver runs the cluster directly on the host without a VM.
Is there a recommended way to test GPU workloads locally? minikube with --vm-driver=none --feature-gates=Accelerators=true gets pretty close to achieving this: some GPU containers are running successfully.
I suspect the only missing link is some CUDA library trickery that GKE or nvidia-docker are doing. I have been reading the code of nvidia-docker ( https://github.com/NVIDIA/nvidia-docker ) and the GKE GPU installer ( https://github.com/ContainerEngine/accelerators/tree/master/cos-nvidia-gpu-installer ), but I haven't found anything yet.
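One way to poke at what nvidia-docker actually injects is to inspect a container started through it (a debugging sketch; the probe container name is made up):
~~~
# Start a throwaway GPU container via nvidia-docker and dump what it was given
nvidia-docker run -d --name gpu-probe mxnet/python:gpu sleep 300
docker inspect --format '{{json .Mounts}}' gpu-probe
docker inspect --format '{{json .HostConfig.Devices}}' gpu-probe
docker rm -f gpu-probe
~~~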
Same problem here. Would be great if you could advise how to solve this problem.
FWIW I described our current setup for developing GPU containers on Kubernetes in https://tensorflight.blog/2018/02/23/dev-environment-for-gke/ . Please let me know if minikube gets GPU support or if there is any other way.
I too would like to see this happen :grinning:
FYI: This feature is being tackled via the ML Working Group.
Here's my setup in case it's valuable for the continuation of this RFE
minikube version: v0.26.1
Kubernetes version being created: 1.10
Starting minikube:
~~~
minikube start --feature-gates=DevicePlugins=true --vm-driver none --feature-gates=Accelerators=true
~~~
Device plugin being installed:
~~~
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
~~~
Querying the node to see if it sees the GPU:
~~~
kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME       GPUs
minikube
~~~
If I understand things correctly, --vm-driver none leverages the existing Docker runtime on the host, which I have set to nvidia-docker.
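"Set to nvidia-docker" here means roughly the following Docker daemon config (illustrative for nvidia-docker2; treat the exact file contents as an assumption on my part):
~~~
# Illustrative /etc/docker/daemon.json making the nvidia runtime the default,
# so containers started by the kubelet get GPU access without extra flags
sudo tee /etc/docker/daemon.json <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker
~~~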
However, no matter what I do, I can't seem to get the node to recognize the GPU as an available resource. I know this isn't officially supported yet :) but I thought I'd contribute my env to help with the progression.
Edit: figured it out. I was using the 1.9 NVIDIA device plugin rather than the 1.10 one. Once I swapped it out, the node recognized the GPU.
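In other words, swapping the manifest to the matching release fixed it. Roughly (the v1.10 URL below is my reconstruction; double-check it against the k8s-device-plugin repo):
~~~
# Replace the 1.9 device plugin with the one matching Kubernetes 1.10, then re-query
kubectl delete -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.10/nvidia-device-plugin.yml
kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
~~~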
/assign
@Nick-Harvey This didn't work for me on minikube v0.28.0.
Hello,
I have a PR that adds GPU support to minikube: #2936. It would be really helpful if people on this thread could try it out. The instructions are in the PR.
Thank you!