It's very easy to use nvidia-docker when running individual containers, but is there a way to run nvidia-docker instead of docker from other Docker tools like docker-compose, Tutum, Rancher, etc?
I am assuming one would just need to specify the nvidia-docker volume to be mounted in the container, but I couldn't find any documentation on the correct syntax.
If the tool supports overriding the docker command, then you should use that to plug in nvidia-docker. For example, we provide this option ourselves with environment variable NV_DOCKER.
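For instance, a sketch of how NV_DOCKER is typically used on the nvidia-docker side (the image and command here are just placeholders):
$ NV_DOCKER='sudo docker' nvidia-docker run --rm nvidia/cuda nvidia-smi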
If that's not possible, you can query the plugin for the Docker CLI arguments:
$ curl -s localhost:3476/docker/cli
--device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 --device=/dev/nvidia1 --volume-driver=nvidia-docker --volume=nvidia_driver_352.68:/usr/local/nvidia:ro
Of course you will need to transform this into YAML format for docker-compose (for example).
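As a rough sketch (using the device and volume names from the CLI output above; the image and command are placeholders), the equivalent compose entry could look like:
cuda:
  image: nvidia/cuda
  command: nvidia-smi
  devices:
    - /dev/nvidiactl
    - /dev/nvidia-uvm
    - /dev/nvidia0
    - /dev/nvidia1
  volume_driver: nvidia-docker
  volumes:
    - nvidia_driver_352.68:/usr/local/nvidia:ro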
We were wondering what to do in this case and we couldn't find a clean solution. Would it help if we added a REST endpoint that returns the CLI arguments above as YAML? Or as JSON?
Thank you!
If a sort-of multiline variable substitution were possible in compose files, I guess you could add a YAML format option for the REST endpoint. But I think the environment variables are queried after the YAML syntax is parsed, resulting in an "invalid mode" error from the improper structure. This still might be useful for just a 'copy and paste' approach for compose files though.
Is there a way we could use the volume-driver option with, say, a label format to convey the card IDs to export?
@ruffsl I like the idea of using the variable substitution.
Can't we leverage the one-line YAML syntax? If the REST endpoint outputs something like:
NV_DEVICES="['/dev/nvidia0', '/dev/nvidia1']"
We could easily do something like (the same way docker-machine does it):
eval "$(curl -s localhost:3476/docker/env)"
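Presumably such an endpoint would return something you could eval directly, along the lines of (hypothetical output, reusing the NV_DEVICES format above):
export NV_DEVICES="['/dev/nvidia0', '/dev/nvidia1']"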
Given the following docker-compose.yml, it would work, right?
devices: ${NV_DEVICES}
I'm just testing a foo bar example:
test:
  image: ubuntu
  volumes: ${FOO_BAR}
  command: ping 127.0.0.1
and am seeing this:
$ mkdir /tmp/foo
$ mkdir /tmp/bar
$ export FOO_BAR="['/tmp/bar:/bar', '/tmp/foo:/foo']"
$ docker-compose up
ERROR: Validation failed in file './docker-compose.yml', reason(s):
Service 'test' configuration key 'volumes' contains an invalid type, it should be an array
Is this the correct one-line YAML syntax? I've never seen it before.
Also, how could you keep other volumes or device lists defined alongside (stuff you'd like to keep in the compose file)?
Not sure if docker-compose supports it but it's part of the YAML spec:
http://yaml.org/spec/1.2/spec.html#id2759963
For other volumes/devices, it would work if docker-compose supports multiple volumes or devices keywords; otherwise, by omitting the brackets, I suppose.
It seems to work without variable substitution:
test:
  image: ubuntu
  devices: ['/dev/nvidiactl', '/dev/nvidia-uvm', '/dev/nvidia0']
  command: ls /dev/
Variable substitution might be designed to generate values, and not YAML.
But let's wait for an official response from the docker-compose developers.
@ruffsl: I didn't understand the following, could you explain it?
Is there a way we could use the volume-driver option with, say, a label format to convey the card IDs to export?
@flx42 This was an example of what I was thinking:
test:
  image: ubuntu
  volume_driver: nvidia-docker-driver
  labels:
    nvidia.gpu: "0,1"
  command: nvidia-smi
I'm not sure how feasible this would be, but I think it's attractive in its simplicity from a user perspective. You've developed a plugin; could a custom driver be extended to parse the label metadata? This could then be a clean way to define nvidia containers with compose.
You could also use variable substitution with this: export NV_GPU='0,1'
labels:
  nvidia.gpu: ${NV_GPU}
A volume plugin will not be able to mount devices, and actually a plugin can't even inspect the starting image AFAIK.
Good progress on https://github.com/docker/compose/issues/2750
There is a pending PR to solve this use-case (haven't tested it).
But it will require a bleeding edge version of docker-compose if it's accepted.
@flx42 is correct, currently the only supported plugin is VolumeDriver and it only deals with volume names (code).
I think a JSON endpoint could be useful here; however, it means you would need to write small wrappers around docker-compose and other tools to do the conversion (e.g. JSON -> YAML config).
Another vote here. I'm trying to get Kubernetes to talk to the plugin. Kubernetes doesn't build command lines, it uses a Docker client API (fsouza's, but there's a migration in progress to the official one). JSON might work. Or perhaps embedding parts of nvidia-docker and nvidia-docker-plugin as a library inside kubelet (the daemon that manages the node) or a helper process running on the same machine.
I'm not really familiar with Kubernetes but we are definitely interested in supporting it. Since it's written in Go, the nvidia package should do it. Alternatively we could use the nvidia-docker-plugin as a flex volume driver (maybe?).
Anyhow, feel free to create a separate issue and we'll address those requirements specifically.
Nut now uses nvidia-docker-plugin to mount GPUs in containers :)
I'm not using the nvidia-docker/nvidia module though, but rather targeting the REST API directly to retrieve the GPU paths and the volume name, and injecting those values into the Docker API using go-dockerclient.
So is there a method to use the nvidia plugin with docker-compose now, or can that be broken out into its own specific issue?
Well, it's not working out of the box, but with the addition of the /docker/cli/json endpoint you can generate docker-compose files easily. For example:
#! /usr/bin/env python
# Generate a docker-compose.yml from the nvidia-docker-plugin REST API (Python 2).
import urllib2
import json
import yaml
import sys

if len(sys.argv) == 1:
    print "usage: %s service [key=value]..." % sys.argv[0]
    sys.exit(0)

# Query the plugin for the Docker CLI arguments (devices, volumes, volume driver).
resp = urllib2.urlopen("http://localhost:3476/docker/cli/json").read()
args = json.loads(resp)

# Rename the keys to their docker-compose equivalents.
args["volumes"] = args.pop("Volumes")
args["devices"] = args.pop("Devices")
args["volume_driver"] = args.pop("VolumeDriver")

# Merge any extra key=value arguments (e.g. image=..., command=...).
for arg in sys.argv[2:]:
    k, v = arg.split("=", 1)
    args[k] = v

doc = {sys.argv[1]: args}
yaml.safe_dump(doc, open("docker-compose.yml", "w"), default_flow_style=False)
$ ./compose.py cuda image=nvidia/cuda command=nvidia-smi
$ docker-compose up
Whilst I appreciate this as a step in the right direction, this still isn't an ideal solution from my point of view. I'd like to see proper integration with docker-compose stay on the radar.
Agreed this is not a canonical solution by any means. This issue should be reopened :/
@MadcowD I don't think there is much more we can do right now for a better integration. But it's still on our radar since we have people in our team using docker-compose with nvidia-docker.
@3XX0 I am trying to use your docker compose example above with a version '2' docker-compose but I am running into difficulty.
Here is my docker-compose file:
version: '2'
volumes:
  nvidia_driver_352.63:
    driver: nvidia-docker
services:
  cuda:
    command: nvidia-smi
    devices:
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia0
    image: nvidia/cuda
    volumes:
      - nvidia_driver_352.63:/usr/local/nvidia:ro
I get the following error:
Creating volume "utility_nvidia_driver_352.63" with nvidia-docker driver
ERROR: create utility_nvidia_driver_352.63: unsupported volume: utility_nvidia_driver_352.63
Any thoughts?
Last time I tried, I had to create the volume beforehand with docker volume create and specify the volume as external in the compose file (see here). Not really ideal though...
Better than nothing. This worked for me.
Steps:
$ docker volume create --name=nvidia_driver_352.63 -d nvidia-docker   # create the docker volume
docker-compose:
version: '2'
volumes:
  nvidia_driver_352.63:
    external: true
services:
  cuda:
    command: nvidia-smi
    devices:
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia0
    image: nvidia/cuda
    volumes:
      - nvidia_driver_352.63:/usr/local/nvidia/:ro
You should be able to generate this yaml file (and generate the volume) by modifying compose.py above.
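A rough, untested sketch of that modification: replace the last two lines of compose.py with something like the following (and add the subprocess import at the top). It assumes the plugin reports a single driver volume of the form name:path:mode.
import subprocess

# The plugin reports the driver volume as "name:path:mode"; pull out the name.
volume_name = args["volumes"][0].split(":")[0]

# Create the external volume beforehand with the nvidia-docker driver.
subprocess.check_call(["docker", "volume", "create", "--name=" + volume_name, "-d", "nvidia-docker"])

# Emit the version '2' layout with the driver volume declared as external.
doc = {
    "version": "2",
    "services": {sys.argv[1]: args},
    "volumes": {volume_name: {"external": True}},
}
yaml.safe_dump(doc, open("docker-compose.yml", "w"), default_flow_style=False)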
Thank you.
FYI. I use a different solution that is a little easier to manage between machines.
In the compose file I declare an external volume as before; however, I give it a static, common name and set the name alias through variable interpolation, like so:
volumes:
  nvidia_driver:
    external:
      name: ${NVIDIA_DRIVER_VOLUME}
Then in my service, I use this common name (nvidia_driver):
volumes:
  - nvidia_driver:/usr/local/nvidia/:ro
All that remains is to set the environment variable NVIDIA_DRIVER_VOLUME to your local driver volume name. This can be obtained from docker volume ls, or from @3XX0's example code (just set the environment variable instead of writing a docker-compose file). I just inserted an export statement into my .bashrc.
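For example, something along these lines (a sketch; the grep pattern assumes the default nvidia_driver_<version> volume name):
$ export NVIDIA_DRIVER_VOLUME=$(docker volume ls -q | grep nvidia_driver)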
@jmerkow: @eywalker created a project called nvidia-docker-compose, we haven't tested it, but you might be interested to look at it.
nvidia-docker ... works fine in my environment, but docker run $(curl -s http://localhost:3476/docker/cli) ... does not work with the following message:
docker: Error response from daemon: create 0cdaed180e31650f260d3902833b65560bf5ba6d995c8450138990711d66be36: bad volume format: 0cdaed180e31650f260d3902833b65560bf5ba6d995c8450138990711d66be36.
See 'docker run --help'.
Another symptom is that tensorflow.Session() hangs in Python 3.5.2 inside containers, but only when the exact same containers are launched via my custom docker-py integration that interprets and adds the configuration arguments from http://localhost:3476/docker/cli. If launched with the nvidia-docker command, it works fine!
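For reference, a minimal sketch of that kind of interpretation (assuming the plugin listens on localhost:3476 and emits the flag format shown earlier in this thread); the resulting lists are then handed to whatever Docker client library you use:
import urllib2

# Raw flag string, e.g. "--device=/dev/nvidiactl ... --volume-driver=nvidia-docker --volume=nvidia_driver_352.68:/usr/local/nvidia:ro"
flags = urllib2.urlopen("http://localhost:3476/docker/cli").read().split()

devices = [f.split("=", 1)[1] for f in flags if f.startswith("--device=")]
binds = [f.split("=", 1)[1] for f in flags if f.startswith("--volume=")]
volume_driver = [f.split("=", 1)[1] for f in flags if f.startswith("--volume-driver=")][0]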
I'd like to know what exactly the nvidia-docker command does: not only the volume/binding arguments it adds, but also any internal differences from the plain docker command.
For example, I found that it sets two environment variables:
CUDA_DISABLE_UNIFIED_MEMORY=1
CUDA_CACHE_DISABLE=1
and that it loads the NVML C library while the nvidia-docker command is running.
What difference does this make? What are the potential causes of an indefinite hang in TensorFlow when it is not launched with nvidia-docker?
I've found that it's not actually hanging, but becomes very, very slow (e.g., 10 sec on CPU or with nvidia-docker vs. 92 sec on GPU with the docker-py invocation). Maybe related to #224?
@achimnol We explain what nvidia-docker does on our wiki.
The "bad volume format" error is a limitation of Docker, see #181.
Finally, as I explained in #224, we have heard multiple users claiming their code was slower inside Docker, but every single time it was because they compiled the project with different flags, or they had different settings during execution.