Kops: GPU bootstrap method not setting capacity

Created on 4 May 2017 · 34 Comments · Source: kubernetes/kops

I've followed the docs here

I'm using kops to deploy on AWS.

After updating the cluster, I see a few more lib files under /usr/lib so it seems like the bootstrap container did run.

However, the p2.xlarge instance still doesn't have the capacity set:

        {
            "name": "ip-1-2-3-4.us-west-2.compute.internal",
            "selfLink": "/api/v1/nodesip-1-2-3-4.us-west-2.compute.internal",
            "uid": "xxx",
            "resourceVersion": "104430",
            "creationTimestamp": "2017-05-04T00:19:15Z",
            "labels": {
                "beta.kubernetes.io/arch": "amd64",
                "beta.kubernetes.io/instance-type": "p2.xlarge",
                "beta.kubernetes.io/os": "linux",
                "failure-domain.beta.kubernetes.io/region": "us-west-2",
                "failure-domain.beta.kubernetes.io/zone": "us-west-2c",
                "kubernetes.io/hostname": "ip-1-2-3-4.us-west-2.compute.internal",
                "kubernetes.io/role": "node",
                "node-role.kubernetes.io/node": ""
            },
            "annotations": {
                "node.alpha.kubernetes.io/ttl": "0",
                "volumes.kubernetes.io/controller-managed-attach-detach": "true"
            },
            "Status": {
                "Capacity": {
                    "alpha.kubernetes.io/nvidia-gpu": "0",
                    "cpu": "4",
                    "memory": "62884272Ki",
                    "pods": "110"
                },
                "Allocatable": {
                    "alpha.kubernetes.io/nvidia-gpu": "0",
                    "cpu": "4",
                    "memory": "62781872Ki",
                    "pods": "110"
                },
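For reference, a quick way to check this across all nodes (a sketch using kubectl's jsonpath output; keys containing dots need backslash escapes):

    # Print each node name with its reported nvidia-gpu capacity
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.alpha\.kubernetes\.io/nvidia-gpu}{"\n"}{end}'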

In case this only gets applied at startup ... I've tried terminating all the VMs in my cluster ... no dice.

I've also tried doing kops edit ig ... for the gpu node to add the label alpha.kubernetes.io/nvidia-gpu-name="Tesla K80" and then cycling the gpu node (terminating it and allowing it to restart) ... again, no dice.

While I did run kops update cluster ... and kops rolling-update cluster ..., I'm not sure the Accelerators=true setting is actually taking effect.

If I look at my k8s api server pod ... I see the startup command is ...

      /usr/local/bin/kube-apiserver --address=127.0.0.1 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,ResourceQuota --allow-privileged=true --anonymous-auth=false --apiserver-count=1 --authorization-mode=AlwaysAllow --basic-auth-file=/srv/kubernetes/basic_auth.csv --client-ca-file=/srv/kubernetes/ca.crt --cloud-provider=aws --etcd-servers-overrides=/events#http://127.0.0.1:4002 --etcd-servers=http://127.0.0.1:4001 --insecure-port=8080 --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP,LegacyHostIP --secure-port=443 --service-cluster-ip-range=100.64.0.0/13 --storage-backend=etcd2 --tls-cert-file=/srv/kubernetes/server.cert --tls-private-key-file=/srv/kubernetes/server.key --token-auth-file=/srv/kubernetes/known_tokens.csv --v=2 1>>/var/log/kube-apiserver.log 2>&1

which doesn't have the feature gates flag. So perhaps it's not actually getting set?

I'm running client/server:

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

and kops:

$ kops version
Version 1.6.0-beta.1 (git-77f222d)

All 34 comments

@justinsb - it seems like you're the most familiar w this method ... can you offer any insight? Or something to try and debug? Thanks!

SSH'ing into the p2.xlarge instance ... it seems like nvidia drivers are installed ...

$ nvidia-smi
Thu May  4 20:49:03 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:00:1E.0     Off |                    0 |
| N/A   52C    P8    28W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

but the kubelet isn't detecting that the node has a GPU resource. Again, I think that points to the Accelerators=true feature gate not being enabled ... so the kubelet doesn't auto-detect the GPU capacity.
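A quick way to confirm whether the gate actually made it onto the kubelet's command line on that node (just a sketch):

# Run on the p2 node; prints the feature-gates argument if it's there
pgrep -af kubelet | tr ' ' '\n' | grep -- --feature-gates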

However, I don't see a shared lib folder like: /usr/lib/nvidia-375 (or /usr/lib/nvidia) ... so perhaps the driver install didn't complete successfully?

$ sudo find / -name nvidia
/usr/src/linux-headers-4.4.41-k8s/drivers/video/fbdev/nvidia
/usr/src/linux-headers-4.4.41-k8s/drivers/net/ethernet/nvidia
/usr/src/nvidia-375.39/nvidia
/usr/share/nvidia
/var/lib/nvidia
/sys/bus/pci/drivers/nvidia
$ sudo !!
sudo ls -alh /var/lib/nvidia
total 44K
drwx------  2 root root 4.0K May  4 19:25 .
drwxr-xr-x 29 root root 4.0K May  4 19:25 ..
-rw-r--r--  1 root root  891 May  4 19:25 dirs
-rw-------  1 root root  29K May  4 19:25 log
$ sudo ls /var/lib/nvidia
dirs  log
$ ls /usr/share/nvidia
nvidia-application-profiles-375.39-key-documentation  nvidia-application-profiles-375.39-rc
$ ls /sys/bus/pci/drivers/nvidia/
0000:00:1e.0  bind  module  new_id  remove_id  uevent  unbind

While I'm here ... I'm pretty unclear on how/where to look for the bootstrap container's run ... is that a container run as part of the k8s api server pod? It would be helpful to know, so that I can take a look at its output for debugging as well.
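In the meantime, one way to look for it (a sketch, assuming it runs as a plain docker container on the node rather than inside a pod):

# On the GPU node: find the bootstrap container and dump its output
sudo docker ps -a | grep -i nvidia-bootstrap
sudo docker logs $(sudo docker ps -aq --filter ancestor=kopeio/nvidia-bootstrap:1.6 | head -n 1)
# The same output also lands in syslog/journal
sudo journalctl --no-pager | grep -i nvidia | tail -n 50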

FYI the contents of /usr/lib on the gpu VM are:

$ ls -1 /usr/lib
apt
binfmt.d
cloud-init
compat-ld
coreutils
docker
dpkg
gcc
git-core
gnupg
gold-ld
grub
grub-legacy
klibc
ldscripts
libau.so
libau.so.2
libau.so.2.7
libbfd-2.25-system.so
libGL.so.1
libopcodes-2.25-system.so
libsupp.a
locale
lognorm
man-db
mime
modules-load.d
openssh
os-release
perl5
pyshared
python2.6
python2.7
python3
python3.4
rsyslog
sasl2
sftp-server
ssl
sudo
sysctl.d
sysstat
systemd
tar
tasksel
tc
tmpfiles.d
valgrind
x86_64-linux-gnu
xorg

It's unclear to me if those are the 'right' shared libs ...
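For what it's worth, a quick way to see which NVIDIA userspace libraries actually landed on the host (sketch):

# Anything the installer registered with the dynamic linker:
ldconfig -p | grep -i nvidia
# Plus a direct look for the usual suspects:
sudo find /usr -name 'libnvidia*' -o -name 'libcuda*' 2>/dev/null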

SSH'ing into the gpu node ... I'm able to see the kubelet run command via ps:

root      2758     1  0 19:23 ?        00:01:48 /usr/local/bin/kubelet --allow-privileged=true --babysit-daemons=true --cgroup-root=/ --cloud-provider=aws --cluster-dns=100.64.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --feature-gates=Accelerators=true --hostname-override=ip-1-2-3-4.us-west-2.compute.internal --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin-mtu=9001 --network-plugin=kubenet --node-labels=kubernetes.io/role=node,node-role.kubernetes.io/node= --non-masquerade-cidr=100.64.0.0/10 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --require-kubeconfig=true --v=2 --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/ --network-plugin-dir=/opt/cni/bin/

And the feature-gates flag is set. But again, it seems like the kubelet doesn't detect the gpu capacity:

$ kc get node/ip-x-x-x-x.us-west-2.compute.internal -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: 2017-05-04T19:23:31Z
  labels:
    alpha.kubernetes.io/nvidia-gpu-name: Tesla-K80
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: p2.xlarge
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: us-west-2
    failure-domain.beta.kubernetes.io/zone: us-west-2c
    kubernetes.io/hostname: ip-x-x-x-x.us-west-2.compute.internal
    kubernetes.io/role: node
    node-role.kubernetes.io/node: ""
  name: ip-x-x-x-x.us-west-2.compute.internal
  resourceVersion: "122420"
  selfLink: /api/v1/nodesip-x-x-x-x.us-west-2.compute.internal
status:
  addresses:
  - address: x.x.x.x
    type: InternalIP
  - address: x.x.x.x
    type: LegacyHostIP
  - address: x.x.x.x
    type: ExternalIP
  - address: ip-x-x-x-x.us-west-2.compute.internal
    type: InternalDNS
  - address: ec2-x-x-x-x.us-west-2.compute.amazonaws.com
    type: ExternalDNS
  - address: ip-x-x-x-x.us-west-2.compute.internal
    type: Hostname
  allocatable:
    alpha.kubernetes.io/nvidia-gpu: "0"
    cpu: "4"
    memory: 62781872Ki
    pods: "110"
  capacity:
    alpha.kubernetes.io/nvidia-gpu: "0"
    cpu: "4"
    memory: 62884272Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: 2017-05-04T22:48:14Z
    lastTransitionTime: 2017-05-04T19:23:31Z
    message: kubelet has sufficient disk space available
    reason: KubeletHasSufficientDisk
    status: "False"
    type: OutOfDisk
  - lastHeartbeatTime: 2017-05-04T22:48:14Z
    lastTransitionTime: 2017-05-04T19:23:31Z
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: 2017-05-04T22:48:14Z
    lastTransitionTime: 2017-05-04T19:23:31Z
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: 2017-05-04T22:48:14Z
    lastTransitionTime: 2017-05-04T19:23:51Z
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  - lastHeartbeatTime: 2017-05-04T19:23:37Z
    lastTransitionTime: 2017-05-04T19:23:37Z
    message: RouteController created a route
    reason: RouteCreated
    status: "False"
    type: NetworkUnavailable
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - protokube:1.6.0-beta.1
    sizeBytes: 377410190
  - names:
    - gcr.io/google_containers/kube-proxy@sha256:b40aba0591eae05019ef99aa16ae3621ce15fc3d467cb42c0d01c845446fe3ee
    - gcr.io/google_containers/kube-proxy:v1.6.2
    sizeBytes: 109158443
  - names:
    - kopeio/nvidia-bootstrap@sha256:4f64141baa82837eb95d699a0f9f42759ceec8038b31ed6ac92e30fc99f012d3
    - kopeio/nvidia-bootstrap:1.6
    sizeBytes: 5086458
  - names:
    - gcr.io/google_containers/pause-amd64@sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516
    - gcr.io/google_containers/pause-amd64:3.0
    sizeBytes: 746888
  nodeInfo:
    architecture: amd64
    bootID: 796f6d02-1fd8-405e-a12b-683789050b03
    containerRuntimeVersion: docker://1.12.6
    kernelVersion: 4.4.41-k8s
    kubeProxyVersion: v1.6.2
    kubeletVersion: v1.6.2
    machineID: 34213acfe0d74798a23d34b94a5c76c8
    operatingSystem: linux
    osImage: Debian GNU/Linux 8 (jessie)
    systemUUID: EC2DA947-F621-AE3F-3569-63DE153B6D75

(I added that gpu name label manually)

Which makes me suspect the drivers aren't installed correctly?
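A couple of low-level checks that should rule the driver in or out (sketch):

# Is the kernel module loaded, and did it bind to the K80?
lsmod | grep -i nvidia
dmesg | grep -iE 'nvidia|nvrm' | tail -n 20
# As far as I can tell, the 1.6 kubelet's GPU discovery keys off these device nodes:
ls -l /dev/nvidia*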

A few more notes ...

My cluster is different from the cluster in the example in that the default nodes instance group is just r4.xlarge instances ... I added an instance group gpunodes of size 1 of type p2.xlarge

However, SSH'ing into the gpu node and looking at syslog ... it appears that the hook is running the kopeio/nvidia-bootstrap image. I also tried updating that image to the latest version (1.6.0 not 1.6) and no dice.

At this point, I'm really not sure what to try.

I guess I could try installing the nvidia drivers manually via SSH ... but it seems unlikely that k8s will 'detect' the change in resources for that node (honestly, I have no idea if that detection code runs only at startup or periodically). I suppose I could also turn my manual install into a script and an image and provide that instead of the bootstrap image ... but it feels a bit like shooting in the dark, especially since I don't have a great way to get the logs from that container's run.

I did notice that when I upped the bootstrap to version 1.6.0 and restarted ... there was a brief window where the nvidia drivers didn't seem to be installed on the system (logs below). If k8s checks what resources are allocatable on the node at kubelet startup ... perhaps there is a race? I don't know.

admin@ip-1-2-3-4:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Make sure that the latest NVIDIA driver is installed and running.

and a few seconds later it works ...

admin@ip-1-2-3-4:~$ nvidia-smi
Fri May  5 00:41:05 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   56C    P0    74W / 149W |      0MiB / 11439MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I'm open to suggestions for things to try next to troubleshoot ...

Ok ... I think I have conclusive proof that this is a race.

Reading the k8s code base, I've found where it checks for gpu support.

kubelet.go calls the gpumanager.Start() here

(and gpuManager.Start() is the only thing that calls the discoverGPUs() function ... which is the only thing that sets the state of ngm.allGPUs, which is what is returned from a gpuManager.Capacity() request)

a few lines later in kubelet.go it initializes the resource manager

that's relevant only because

a) it happens after the gpuManager.Start() call
b) it prints info lines to the log, so I can get a timestamp of when this command ran in syslog

Which is:

May  5 21:14:18 ip-172-20-36-93 kubelet[2923]: I0505 21:14:18.361135    2923 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer

The important thing there is the timestamp --- 21:14:18
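(To line the two up, I just grepped syslog for both; roughly:)

# Pull the kubelet's resource-manager start line and the driver-install lines
# out of syslog so the ordering is obvious:
grep -E 'fs_resource_analyzer|NVIDIA|nvidia' /var/log/syslog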

Now looking through the syslog I can also see the nvidia-bootstrap container get run (pid is 2406) ...

...
May  5 21:14:28 ip-172-20-36-93 docker[2406]: NVIDIA-Linux-x86_64-375.39.run: OK
May  5 21:14:28 ip-172-20-36-93 docker[2406]: Verifying archive integrity... OK
May  5 21:14:29 ip-172-20-36-93 kubelet[2923]: I0505 21:14:29.286157    2923 kube_docker_client.go:329] Stop pulling image "gcr.io/google_containers/kube-proxy:v1.6.2": "Status: Downloaded newer image for gcr.io/google_containers/kube-proxy:v1.6.2"
May  5 21:14:29 ip-172-20-36-93 kubelet[2923]: I0505 21:14:29.467564    2923 kubelet.go:1842] SyncLoop (PLEG): "kube-proxy-ip-172-20-36-93.us-west-2.compute.internal_kube-system(3bda8e56a1aba69aa2fb00c321157d2f)", event: &pleg.PodLifecycleEvent{ID:"3bda8e56a1aba69aa2fb00c321157d2f", Type:"ContainerStarted", Data:"40e711a8b6c1814cc098b06c6764b37c4d6d6ea83f7c1e0916668b9ba4e4c733"}
May  5 21:14:35 ip-172-20-36-93 docker[2406]: Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.39.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
May  5 21:14:37 ip-172-20-36-93 systemd[1]: Starting Run docker-healthcheck once...
May  5 21:14:37 ip-172-20-36-93 docker-healthcheck[6707]: docker healthy
May  5 21:14:37 ip-172-20-36-93 systemd[1]: Started Run docker-healthcheck once.
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Welcome to the NVIDIA Software Installer for Unix/Linux
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Detected 4 CPUs online; setting concurrency level to 4.
May  5 21:14:38 ip-172-20-36-93 docker[2406]: License accepted by command line option.
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Installing NVIDIA driver version 375.39.
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Performing CC sanity check with CC="/usr/bin/cc".
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Kernel source path: '/lib/modules/4.4.41-k8s/build'
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Kernel output path: '/lib/modules/4.4.41-k8s/build'
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Performing rivafb check.
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Performing nvidiafb check.
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Performing Xen check.
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Performing PREEMPT_RT check.
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Cleaning kernel module build directory.
May  5 21:14:38 ip-172-20-36-93 docker[2406]: Building kernel modules
...
ton more stuff in the middle here
...
May  5 21:17:30 ip-172-20-36-93 docker[2406]: Enabled persistence mode for GPU 0000:00:1E.0.
May  5 21:17:30 ip-172-20-36-93 docker[2406]: All done.
May  5 21:17:30 ip-172-20-36-93 docker[2406]: Applications clocks commands have been set to UNRESTRICTED for GPU 0000:00:1E.0
May  5 21:17:30 ip-172-20-36-93 docker[2406]: All done.
May  5 21:17:30 ip-172-20-36-93 kernel: [  274.128859] NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.
May  5 21:17:30 ip-172-20-36-93 docker[2406]: All done.
May  5 21:17:30 ip-172-20-36-93 docker[2406]: All done.
May  5 21:17:30 ip-172-20-36-93 docker[2406]: Applications clocks set to "(MEM 2505, SM 875)" for GPU 0000:00:1E.0
May  5 21:17:30 ip-172-20-36-93 docker[2406]: All done.

So as far as I can tell it completed successfully. But note the timestamp -- 21:17:30 ... well after the kubelet's discoverGPUs() method has been called. (In fact, the timestamps at the beginning of the log suggest that the driver install doesn't even start until after the kubelet's discoverGPUs() call has completed.)

FYI - as far as I can tell the drivers are working ...

admin@ip-1-2-3-4:~$ ls /dev | grep nvidia
nvidia0
nvidiactl
nvidia-uvm

But since the detection code ran well before the gpu device nodes existed, the kubelet doesn't detect them.

The 'symptom' I'm seeing is that pods won't schedule to any nodes because no nodes have the gpu resource. Similarly, if I inspect the node, it reports to have no GPU resources.

I've tried restarting the VM. I'll admit I don't have a good understanding of linux drivers, but I was hoping that once the devices were mounted, a restart would retain the gpu devices (hopefully some service on startup would mount them?) and the kubelet would then detect them. No dice.

At this point I think I'm going to have to pursue another avenue of using GPUs with kubernetes.

Even if I were to install the drivers manually on the node (I've tried without much luck so far) and automate that into an image I could provide to the kops hook ... it seems like there's no guarantee that the bootstrap container finishes execution before the kubelet looks for the gpu. It seems like this design will result in a race.

That said ... maybe the Accelerators=true feature gate enables a code path that I've missed and should detect the GPUs after startup.

At this point ... I think I'll have to rely on node selectors to schedule the pods. Which will suffice in the short term, but is not the type of resource scheduling I'll need going forward.

That said ... I really really really want kops to support this feature. We have a lot of demand for our platform for GPU-based scheduling ... and I would love to use kops to manage that deployment (because that'll help us manage this across cloud providers as well).

Let me know if I can help.

Maybe a workaround is to just bake the latest nvidia driver into the AMI image?
I've hit this same issue with the on-the-fly driver install, and currently my best guess at how to reliably make it work is to just bake a new AMI with the drivers.

I have more info on how I worked around this at https://github.com/kubernetes/kops/issues/1726#issuecomment-299748490

Obviously, when you bake your own AMI, you wouldn't need to use kopeio/nvidia-bootstrap anymore.

Yes, we recommend baking the NVIDIA driver inside an AMI, and running nvidia-modprobe at boot time to circumvent the unreliable GPU detection in Kubernetes.

Is that allowed by the nvidia driver distribution rules @flx42 ?

Thanks @diwu1989 - your details in the other ticket allowed me to get this up and running.

That said ... I am looking forward to when kops can support this seamlessly.

I was able to get an image working that:

a) installs the nvidia drivers
b) updates the /etc/rc.local file, and
c) restarts the VM

Which I think basically solves this problem. It doesn't matter that there's a race ... we just restart the VM and the devices are mounted on startup so kubelet detects the drivers.

The only issue is this one, which I've run into a fair bit while testing. Still ... it's pretty easy to work around.

Since I have customers waiting on this feature, I've already run with this solution myself for our platform's needs. You can see the image here. But all things considered, I'd rather this be maintained by kops/k8s.

If you think this is a valid solution:

  • I'll submit it as a PR to update the bootstrap image
  • I think it mitigates this issue ... though you may still want the detection to poll, I'm not sure

Let me know

@sjezewski what exactly is the workaround?

  1. did you make a new ami image with the nvidia drivers?
  2. what did you update under rc.local ?
  3. I need this to be autoscalable, meaning I don't want to reboot the node every time I add one to the cluster

@justinsb I think @sjezewski is right about the race condition.
I've restarted the kubelet service (not the node) and it was OK; the pod you supplied in the sample works.

Is there a way to signal kubelet to restart once the nvidia bootstrap is completed?
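One rough way to do that by hand (a sketch, not something kops provides out of the box): a small one-shot script on the node that waits for the device nodes to appear and then bounces the kubelet once.

#!/bin/bash
# Sketch: run once at boot (e.g. from rc.local or a oneshot systemd unit).
# Waits for the bootstrap to create the NVIDIA device nodes, then restarts
# the kubelet so GPU discovery runs again with the devices present.
until [ -e /dev/nvidia0 ] && [ -e /dev/nvidiactl ]; do
  sleep 5
done
systemctl restart kubelet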

Just to help people out, as I ran into the same problem: the best solution for now is to create a custom AMI for GPU instance groups until it is handled properly in kops (currently GPU detection is perhaps only valid for p2 instances, and there is the race).

nvidia-smi -pm 1 || true
nvidia-smi -acp 0 || true
nvidia-smi --auto-boost-default=0 || true
nvidia-smi --auto-boost-permission=0 || true
nvidia-modprobe -u -c=0 -m || true

and the kubelet detects capacity properly when new igs are created
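Presumably those commands also belong in /etc/rc.local on the custom AMI so they run on every boot (a sketch; the driver itself is assumed to already be baked into the image):

#!/bin/sh -e
# /etc/rc.local on the custom GPU AMI: same commands as above, run at boot
# so the kernel module is loaded and /dev/nvidia* exist.
nvidia-smi -pm 1 || true
nvidia-smi -acp 0 || true
nvidia-smi --auto-boost-default=0 || true
nvidia-smi --auto-boost-permission=0 || true
nvidia-modprobe -u -c=0 -m || true
exit 0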

thanks I'll give it a try now

awesome it worked

Glad you got it working @innovia. My workaround/fix was just to use the image I linked above in lieu of the kopeio/nvidia-bootstrap image.

That image will do the install + restart so it gets around the race.

I'm not sure which is easier for kops to maintain in the long term -- a set of AMIs that work on all GPU instances across cloud providers ... or a bootstrap image that installs the proper drivers and mounts the right devices. I would guess the latter but I'm not sure.

Again ... if you want me to submit the bootstrapper I used as a PR, happy to do that @justinsb

Please submit a PR and or docs

So what hook image can we use? Do we need to follow the pachyderm docs?

I've tried the pachyderm/nvidia_driver_install:latest hook, but it doesn't seem to work.

You have to understand that no matter which hook you use, it will run a docker container via the kubelet, and for the kubelet to detect the newly installed drivers it has to be restarted. That's something you cannot do from inside a container, just like you cannot restart your computer from inside a docker container. So build your own EC2 image based on the instructions from the bootstrap docker image, and set that image on the instance group using kops edit ig ...
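A rough outline of the bake, for anyone doing it by hand (a sketch; the driver version matches what the bootstrap image installs, but the download URL and header package are assumptions for your base image):

# On a p2 builder instance running the same base image/kernel as the instance group:
sudo apt-get update && sudo apt-get install -y gcc make linux-headers-$(uname -r)
curl -fsSLO https://us.download.nvidia.com/XFree86/Linux-x86_64/375.39/NVIDIA-Linux-x86_64-375.39.run
sudo sh NVIDIA-Linux-x86_64-375.39.run --accept-license --silent
sudo nvidia-modprobe -u -c=0 -m
# Snapshot this instance into an AMI, then point the instance group at it:
#   kops edit ig gpunodes   # set spec.image to the new AMI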

So this will also not work: http://docs.pachyderm.io/en/latest/cookbook/gpus.html#enable-gpus-at-the-k8s-level ?

@sjezewski How is this line supposed to work?

Yeah, so when I've used that image, it restarted the first time (after installing the drivers) and then not on subsequent boots ... and that script seems to work fine for that purpose. However, I tested on the p2.xlarge, and this was a while ago now. I'm not sure why it wouldn't be working.

Note that sometimes the node restarts but doesn't get registered properly by the k8s cluster (it's this issue, which I linked to above). So if you're seeing the VM in question restart but not appear under kubectl get nodes, you may have to kill the k8s apiserver pod under the kube-system namespace.

Sounds like it's time to test this and get it submitted as a PR so it can be maintained in the right place.

I think that you cannot shut down the host from the container without some bad hack.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

This is going to be closed by the bot. How will this relate to https://github.com/kubernetes/kubernetes/issues/54011?

See also section: v1.8 onwards

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Nice, finally solved :stuck_out_tongue_closed_eyes: Thank you, bot.

I was able to work around the issue without needing to create a new AMI image or modify the Dockerfile for the kopeio/nvidia-bootstrap container.

The workaround requires adding this custom kops hook to the gpunodes instancegroup definition.

spec:
  hooks:
  - name: nvidiagpu
    manifest: |-
      Type=oneshot
      ExecStartPre=/usr/bin/docker pull kopeio/nvidia-bootstrap:1.6
      ExecStart=-/usr/bin/docker run -v /:/rootfs/ -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd --net=host --privileged kopeio/nvidia-bootstrap:1.6
      ExecStop=/bin/systemctl restart kubelet
      Restart=no

The ExecStop line above instructs the kops hook's systemd unit to restart the kubelet after the bootstrap container has run.

Note that the minus (-) sign in front of ExecStart was required because subsequent runs of the bootstrap container after a system reboot would fail (mknod on a device file that already exists from the previous run). This caused the systemd unit to fail, which systemd would notice, restarting the kops hook unit a minute later and creating a download loop of a 200MB+ NVIDIA installation file.
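To sanity-check it on a node (the unit name is derived from the hook's name field, so it may differ):

# On the node: find the hook unit and confirm it ran cleanly
systemctl list-units --all | grep -i nvidia
sudo journalctl -u nvidiagpu.service --no-pager | tail -n 20
# From anywhere with kubectl access: the p2 nodes should now report capacity
kubectl describe nodes -l beta.kubernetes.io/instance-type=p2.xlarge | grep -i nvidia-gpu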
