Is it possible to use nvidia-docker inside Docker in Docker? I tried different configurations but none of them worked. I could share my hacky Dockerfile if you need it.
docker-daemon-in-docker? Not docker-client-in-docker, right?
I guess. My goal is to have multiple containers with GPU access running inside one "outer" container that also has GPU access.
It should be possible to make it work, but it's clearly going to be painful since you will need to mount the NVIDIA driver files inside the docker:dind container... which actually means you might need to launch this docker-in-docker container with nvidia-docker :)
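Something along these lines might work, though it's untested and `docker:dind` may still need extra driver mounts (this assumes nvidia-docker2 on the host, which provides the `--runtime=nvidia` flag and honors `NVIDIA_VISIBLE_DEVICES`):

```bash
# Untested sketch: launch the DinD daemon itself through the NVIDIA runtime so
# the driver files and device nodes get injected into it by nvidia-docker2.
docker run -d --privileged \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  --name gpu-dind \
  docker:dind
```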
You're basically on your own here.
@Baschdl Did you manage to do it? I'm trying to do the same....
No, I didn't manage to do it. But good luck with it. 😄
I managed to get _something_ up in kubernetes - within the DinD context. It's very experimental.
First, I modified the ubuntu-dind image (https://github.com/billyteves/ubuntu-dind) to install nvidia-docker (i.e. I added the installation instructions from the nvidia-docker site to the Dockerfile) and rebased it on nvidia/cuda:9.2-runtime-ubuntu16.04.
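A sketch of that modification might look like the following (the apt source URLs are NVIDIA's documented ones for nvidia-docker2 on Ubuntu 16.04; the exact steps kept from ubuntu-dind are elided):

```dockerfile
# Sketch of the change: rebase on the CUDA runtime image and add nvidia-docker2.
# Assumes the rest of the original ubuntu-dind Dockerfile, which installs
# Docker itself, is kept as-is.
FROM nvidia/cuda:9.2-runtime-ubuntu16.04

# ... original ubuntu-dind instructions here ...

# Add nvidia-docker2 from NVIDIA's apt repository (Ubuntu 16.04 path).
RUN apt-get update && apt-get install -y --no-install-recommends curl gnupg2 && \
    curl -fsSL https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add - && \
    curl -fsSL https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list \
      > /etc/apt/sources.list.d/nvidia-docker.list && \
    apt-get update && \
    apt-get install -y nvidia-docker2
```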
Then I created a pod with two containers: a frontend Ubuntu container and a privileged Docker daemon container as a sidecar. The sidecar's image is the modified one I mentioned above.
Here's my pod definition:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-dind
spec:
  containers:
    - name: docker
      # image: tensorflow/tensorflow:latest-gpu
      image: ubuntu:18.04
      imagePullPolicy: Always
      command: ['sleep', '600']
      resources:
        requests:
          cpu: 10m
          memory: 256Mi
      securityContext:
        privileged: false
      env:
        - name: DOCKER_HOST
          value: tcp://localhost:2375
    - name: dind-daemon
      image: perdasilva/ubuntu-dind:16.04
      imagePullPolicy: Always
      env:
        - name: PORT
          value: "2375"
      resources:
        requests:
          cpu: 20m
          memory: 512Mi
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
      securityContext:
        privileged: true
      volumeMounts:
        - name: docker-graph-storage
          mountPath: /var/lib/docker
  volumes:
    - name: docker-graph-storage
      emptyDir: {}
```
You might be able to reproduce this on Linux using just docker. I have no idea what the pitfalls or security issues might be here (aside from the privileged container). All I can attest to is that `docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi` worked for me once I got a terminal to the frontend container and installed docker in it. I also did a `docker run -ti --runtime=nvidia --rm tensorflow/tensorflow:latest-gpu /bin/bash` and executed a simple matrix multiplication on the GPU. It worked.
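For anyone who wants to try it outside of Kubernetes, a hypothetical plain-docker equivalent could look something like this (the container names and network are made up, and the `PORT` variable mirrors the pod above; I haven't verified this exact sequence):

```bash
# Untested sketch mirroring the pod: a privileged GPU daemon "sidecar" plus a
# client container pointing at it over TCP.
docker network create dind-net

# GPU-enabled inner Docker daemon (the sidecar from the pod definition).
docker run -d --privileged --runtime=nvidia \
  --network dind-net --name dind-daemon \
  -e PORT=2375 \
  perdasilva/ubuntu-dind:16.04

# The "frontend": talk to the inner daemon and run a GPU container on it.
docker run --rm --network dind-net \
  -e DOCKER_HOST=tcp://dind-daemon:2375 \
  docker:latest \
  docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
```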
It would be great to get some feedback from the NVIDIA experts here on what the potential issues with this approach might be.
I've created a repo for nvidia-docker inside dind. I use it to run bert-as-service for continuous integration.