nvidia-docker inside Docker in Docker

Created on 29 Apr 2017 · 7 comments · Source: NVIDIA/nvidia-docker

Is it possible to use nvidia-docker inside Docker in Docker? I tried different configurations but none of them worked. I could share my hacky Dockerfile if you need it.

unsupported


All 7 comments

docker-daemon-in-docker? Not docker-client-in-docker, right?

I guess. My goal is to have multiple containers with access to the GPUs inside one "outer" container with access.

It should be possible to make it work, but it's clearly going to be painful since you will need to mount the NVIDIA driver files inside the docker:dind container... which actually means you might need to launch this docker-in-docker container with nvidia-docker itself :)
You're basically on your own here.
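For the record, a rough sketch of that suggestion (untested; assumes the `nvidia` runtime is configured on the host, and that `--runtime=nvidia` injects the driver files into the outer container as it does for ordinary GPU containers):

```shell
# Launch the docker:dind container itself through the nvidia runtime, so the
# NVIDIA driver files are injected into it rather than mounted by hand.
docker run -d --privileged --runtime=nvidia --name gpu-dind docker:dind

# The driver binaries should now be visible inside the outer container.
# The inner daemon would still need nvidia-docker installed and configured
# before GPU containers can be started through it.
docker exec gpu-dind nvidia-smi
```

This only gets the driver files inside the outer container; making the inner daemon GPU-aware is the part the maintainer warns is painful.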

@Baschdl Did you manage to do it? I'm trying to do the same....

No, I didn't manage to do it. But good luck with it. 😄

I managed to get _something_ up in kubernetes - within the DinD context. It's very experimental.

First, I modified the ubuntu-dind image (https://github.com/billyteves/ubuntu-dind) to install nvidia-docker (i.e. added the installation instructions from the nvidia-docker site to the Dockerfile) and changed it to be based on nvidia/cuda:9.2-runtime-ubuntu16.04.
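For reference, the nvidia-docker2 installation steps of that era were roughly the following, added as `RUN` steps in the Dockerfile (repository URLs reproduced from memory, so treat them as assumptions and check the nvidia-docker README):

```shell
# Add NVIDIA's apt repository and key, then install nvidia-docker2
# (Ubuntu 16.04 paths shown).
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list \
  > /etc/apt/sources.list.d/nvidia-docker.list
apt-get update && apt-get install -y nvidia-docker2
```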

Then I created a pod with two containers: a frontend Ubuntu container, and a privileged Docker daemon container as a sidecar. The sidecar's image is the modified one I mentioned above.

Here's my pod definition:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-dind
spec:
  containers:
    - name: docker
      # image: tensorflow/tensorflow:latest-gpu
      image: ubuntu:18.04
      imagePullPolicy: Always
      command: ['sleep', '600']
      resources:
        requests:
          cpu: 10m
          memory: 256Mi
      securityContext:
        privileged: false
      env:
        - name: DOCKER_HOST
          value: tcp://localhost:2375
    - name: dind-daemon
      image: perdasilva/ubuntu-dind:16.04
      imagePullPolicy: Always
      env:
        - name: PORT
          value: "2375"
      resources:
        requests:
          cpu: 20m
          memory: 512Mi
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
      securityContext:
        privileged: true
      volumeMounts:
        - name: docker-graph-storage
          mountPath: /var/lib/docker
  volumes:
    - name: docker-graph-storage
      emptyDir: {}
```
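To try it, the pod definition can be saved (as e.g. `test-dind.yaml`, a name I'm making up here) and applied with standard kubectl commands; this assumes a cluster where the NVIDIA device plugin is installed so the `nvidia.com/gpu` request can be satisfied:

```shell
# Create the pod and wait for both containers to come up.
kubectl apply -f test-dind.yaml
kubectl wait --for=condition=Ready pod/test-dind

# Open a shell in the frontend container; DOCKER_HOST already points
# at the sidecar daemon on tcp://localhost:2375.
kubectl exec -it test-dind -c docker -- /bin/bash
```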

You might be able to reproduce this on Linux using just Docker. I have no idea what the pitfalls or security issues might be here (aside from the privileged container). All I can attest to is that `docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi` worked for me once I got a terminal to the frontend container and installed the Docker client in it. I also ran `docker run -ti --runtime=nvidia --rm tensorflow/tensorflow:latest-gpu /bin/bash` and executed a simple matrix multiplication on the GPU. It worked.
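A docker-only equivalent of the pod above might look like this (an untested sketch mirroring the pod spec; the container names, image, and port are taken directly from it):

```shell
# Privileged, GPU-enabled daemon, mirroring the dind-daemon sidecar.
docker run -d --privileged --runtime=nvidia --name dind-daemon \
  -e PORT=2375 perdasilva/ubuntu-dind:16.04

# Frontend container talking to the inner daemon over TCP, mirroring
# the "docker" container from the pod (the Docker client still needs
# to be installed inside it, as in the pod experiment).
docker run -it --rm --link dind-daemon \
  -e DOCKER_HOST=tcp://dind-daemon:2375 ubuntu:18.04 /bin/bash
```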

It would be great to get some feedback from the NVIDIA experts here on what the potential issues with this approach could be.

I've created a repo for nvidia-docker inside DinD. I use it to run bert-as-service for continuous integration.

https://github.com/Henderake/dind-nvidia-docker
