It's very easy to use nvidia-docker when running individual containers, but is there a way to run nvidia-docker instead of docker from other Docker tools like docker-compose, Tutum, Rancher, etc?
I am assuming one would just need to specify the nvidia-docker volume to be mounted in the container, but I couldn't find any documentation on the correct syntax.
If the tool supports overriding the docker command, then you should use that to plug in nvidia-docker. For example, we provide this option ourselves with environment variable NV_DOCKER.
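For instance, a sketch of how NV_DOCKER is typically used on the nvidia-docker side (the image and command here are just placeholders):
$ NV_DOCKER='sudo docker' nvidia-docker run --rm nvidia/cuda nvidia-smi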
If that's not possible, you can query the plugin for the Docker CLI arguments:
$ curl -s localhost:3476/docker/cli
--device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 --device=/dev/nvidia1 --volume-driver=nvidia-docker --volume=nvidia_driver_352.68:/usr/local/nvidia:ro
Of course you will need to transform this into YAML format for docker-compose (for example).
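As a rough sketch (using the device and volume names from the CLI output above; the image and command are placeholders), the equivalent compose entry could look like:
cuda:
  image: nvidia/cuda
  command: nvidia-smi
  devices:
    - /dev/nvidiactl
    - /dev/nvidia-uvm
    - /dev/nvidia0
    - /dev/nvidia1
  volume_driver: nvidia-docker
  volumes:
    - nvidia_driver_352.68:/usr/local/nvidia:ro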
We were wondering what to do in this case and we couldn't find a clean solution. Would it help if we added a REST endpoint that returns the CLI arguments above as YAML? Or as JSON?
Thank you!
If a sort-of multiline variable substitution were possible in compose files, I guess you could add a YAML format option for the REST endpoint. But I think the environment variables are queried after the YAML syntax is parsed, resulting in an "invalid mode" error from the improper structure. This still might be useful for just a 'copy and paste' approach for compose files though.
Is there a way we could use the volume-driver option with, say, a label format to convey the card IDs to export?
@ruffsl I like the idea of using the variable substitution.
Can't we leverage the one-line YAML syntax? If the REST endpoint outputs something like:
NV_DEVICES="['/dev/nvidia0', '/dev/nvidia1']"
We could easily do something like (the same way docker-machine does it):
eval "$(curl -s localhost:3476/docker/env)"
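Presumably such an endpoint would return something you could eval directly, along the lines of (hypothetical output, reusing the NV_DEVICES format above):
export NV_DEVICES="['/dev/nvidia0', '/dev/nvidia1']"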
Given the following docker-compose.yml, it would work, right?
devices: ${NV_DEVICES}
I'm just testing a foo bar example:
test:
  image: ubuntu
  volumes: ${FOO_BAR}
  command: ping 127.0.0.1
and am seeing this:
$ mkdir /tmp/foo
$ mkdir /tmp/bar
$ export FOO_BAR="['/tmp/bar:/bar', '/tmp/foo:/foo']"
$ docker-compose up
ERROR: Validation failed in file './docker-compose.yml', reason(s):
Service 'test' configuration key 'volumes' contains an invalid type, it should be an array
Is this the correct one-line YAML syntax? I've never seen it before.
Also, how could you keep other volumes or device lists defined alongside (stuff you'd like to keep in the compose file)?
Not sure if docker-compose supports it but it's part of the YAML spec:
http://yaml.org/spec/1.2/spec.html#id2759963
For other volumes/devices, it would work if docker-compose supports multiple volumes or devices keywords; otherwise, by omitting the brackets, I suppose.
It seems to work without variable substitution:
test:
  image: ubuntu
  devices: ['/dev/nvidiactl', '/dev/nvidia-uvm', '/dev/nvidia0']
  command: ls /dev/
Variable substitution might be designed to generate values, and not YAML.
But let's wait for an official response from the docker-compose developers.
@ruffsl: I didn't understand the following, could you explain it?
Is there a way we could use the volume-driver option with, say, a label format to convey the card IDs to export?
@flx42 This was an example of what I was thinking:
test:
  image: ubuntu
  volume_driver: nvidia-docker-driver
  labels:
    nvidia.gpu: "0,1"
  command: nvidia-smi
I'm not sure how feasible this would be, but I think it's attractive in its simplicity from a user perspective. You've developed a plugin; could a custom driver be extended to parse the label metadata? This could then be a clean way to define nvidia containers with compose.
You could also use variable substitution with this: export NV_GPU='0,1'
labels:
  nvidia.gpu: ${NV_GPU}
A volume plugin will not be able to mount devices, and actually a plugin can't even inspect the starting image AFAIK.
Good progress on https://github.com/docker/compose/issues/2750
There is a pending PR to solve this use-case (haven't tested it).
But it will require a bleeding edge version of docker-compose if it's accepted.
@flx42 is correct, currently the only supported plugin is VolumeDriver and it only deals with volume names (code).
I think a JSON endpoint could be useful here; however, it means you would need to write small wrappers around docker-compose and other tools to do the conversion (e.g. JSON -> YAML config).
Another vote here. I'm trying to get Kubernetes to talk to the plugin. Kubernetes doesn't build command lines, it uses a Docker client API (fsouza's, but there's a migration in progress to the official one). JSON might work. Or perhaps embedding parts of nvidia-docker and nvidia-docker-plugin as a library inside kubelet (the daemon that manages the node) or a helper process running on the same machine.
I'm not really familiar with Kubernetes but we are definitely interested in supporting it. Since it's written in Go, the nvidia package should do it. Alternatively we could use the nvidia-docker-plugin as a flex volume driver (maybe?).
Anyhow, feel free to create a separate issue and we'll address those requirements specifically.
Nut now uses nvidia-docker-plugin to mount GPUs in containers :)
I'm not using the nvidia-docker/nvidia module though, but rather targeting the REST API directly to retrieve the GPU paths and the volume name, and injecting those values into the Docker API using go-dockerclient.
So is there a method to use the nvidia plugin with docker-compose now, or can that be broken out into its own specific issue?
Well, it's not working out of the box, but with the addition of the /docker/cli/json endpoint you can generate docker-compose files easily. For example:
#! /usr/bin/env python
# Generate a docker-compose.yml from the nvidia-docker-plugin REST API (Python 2).
import urllib2
import json
import yaml
import sys

if len(sys.argv) == 1:
    print "usage: %s service [key=value]..." % sys.argv[0]
    sys.exit(0)

# Query the plugin for the Docker CLI arguments (devices, volumes, volume driver).
resp = urllib2.urlopen("http://localhost:3476/docker/cli/json").read()
args = json.loads(resp)

# Rename the keys to their docker-compose equivalents.
args["volumes"] = args.pop("Volumes")
args["devices"] = args.pop("Devices")
args["volume_driver"] = args.pop("VolumeDriver")

# Merge any extra key=value arguments (e.g. image=..., command=...).
for arg in sys.argv[2:]:
    k, v = arg.split("=", 1)
    args[k] = v

doc = {sys.argv[1]: args}
yaml.safe_dump(doc, open("docker-compose.yml", "w"), default_flow_style=False)
$ ./compose.py cuda image=nvidia/cuda command=nvidia-smi
$ docker-compose up
Whilst I appreciate this as a step in the right direction, this still isn't an ideal solution from my point of view. I'd like to see proper integration with docker-compose stay on the radar.
Agreed this is not a canonical solution by any means. This issue should be reopened :/
@MadcowD I don't think there is much more we can do right now for a better integration. But it's still on our radar since we have people in our team using docker-compose with nvidia-docker.
@3XX0 I am trying to use your docker compose example above with a version '2' docker-compose but I am running into difficulty.
Here is my docker-compose file:
version: '2'
volumes:
  nvidia_driver_352.63:
    driver: nvidia-docker
services:
  cuda:
    command: nvidia-smi
    devices:
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia0
    image: nvidia/cuda
    volumes:
      - nvidia_driver_352.63:/usr/local/nvidia:ro
I get the following error:
Creating volume "utility_nvidia_driver_352.63" with nvidia-docker driver
ERROR: create utility_nvidia_driver_352.63: unsupported volume: utility_nvidia_driver_352.63
Any thoughts?
Last time I tried, I had to create the volume beforehand with docker volume create and specify the volume as external in the compose file (see here). Not really ideal though...
Better than nothing. This worked for me.
Steps:
$ docker volume create --name=nvidia_driver_352.63 -d nvidia-docker   # create the docker volume
docker-compose:
version: '2'
volumes:
  nvidia_driver_352.63:
    external: true
services:
  cuda:
    command: nvidia-smi
    devices:
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia0
    image: nvidia/cuda
    volumes:
      - nvidia_driver_352.63:/usr/local/nvidia/:ro
You should be able to generate this yaml file (and generate the volume) by modifying compose.py above.
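A rough, untested sketch of that modification: replace the last two lines of compose.py with something like the following (and add the subprocess import at the top). It assumes the plugin reports a single driver volume of the form name:path:mode.
import subprocess

# The plugin reports the driver volume as "name:path:mode"; pull out the name.
volume_name = args["volumes"][0].split(":")[0]

# Create the external volume beforehand with the nvidia-docker driver.
subprocess.check_call(["docker", "volume", "create", "--name=" + volume_name, "-d", "nvidia-docker"])

# Emit the version '2' layout with the driver volume declared as external.
doc = {
    "version": "2",
    "services": {sys.argv[1]: args},
    "volumes": {volume_name: {"external": True}},
}
yaml.safe_dump(doc, open("docker-compose.yml", "w"), default_flow_style=False)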
Thank you.
FYI. I use a different solution that is a little easier to manage between machines.
In the compose file I declare an external volume as before; however, I give it a static, common name and set the name alias through variable interpolation, like so:
volumes:
  nvidia_driver:
    external:
      name: ${NVIDIA_DRIVER_VOLUME}
Then in my service, I use this common name (nvidia_driver):
volumes:
  - nvidia_driver:/usr/local/nvidia/:ro
All that remains is to set the environment variable NVIDIA_DRIVER_VOLUME to your local driver volume name. This can be obtained from docker volume ls, or from @3XX0's example code (just set the environment variable instead of writing a docker-compose file). I just inserted an export statement into my .bashrc.
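For example, something along these lines (a sketch; the grep pattern assumes the default nvidia_driver_<version> volume name):
$ export NVIDIA_DRIVER_VOLUME=$(docker volume ls -q | grep nvidia_driver)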
@jmerkow: @eywalker created a project called nvidia-docker-compose, we haven't tested it, but you might be interested to look at it.
nvidia-docker ... works fine in my environment, but docker run $(curl -s http://localhost:3476/docker/cli) ... does not work with the following message:
docker: Error response from daemon: create 0cdaed180e31650f260d3902833b65560bf5ba6d995c8450138990711d66be36: bad volume format: 0cdaed180e31650f260d3902833b65560bf5ba6d995c8450138990711d66be36.
See 'docker run --help'.
Another symptom is that tensorflow.Session() hangs in Python 3.5.2 inside containers, but only when the exact same containers are launched via my custom docker-py integration that interprets and adds the configuration arguments from http://localhost:3476/docker/cli. If launched with the nvidia-docker command, it works fine!
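For reference, a minimal sketch of that kind of interpretation (assuming the plugin listens on localhost:3476 and emits the flag format shown earlier in this thread); the resulting lists are then handed to whatever Docker client library you use:
import urllib2

# Raw flag string, e.g. "--device=/dev/nvidiactl ... --volume-driver=nvidia-docker --volume=nvidia_driver_352.68:/usr/local/nvidia:ro"
flags = urllib2.urlopen("http://localhost:3476/docker/cli").read().split()

devices = [f.split("=", 1)[1] for f in flags if f.startswith("--device=")]
binds = [f.split("=", 1)[1] for f in flags if f.startswith("--volume=")]
volume_driver = [f.split("=", 1)[1] for f in flags if f.startswith("--volume-driver=")][0]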
I'd like to know what exactly the nvidia-docker command does: not only the volume/binding arguments it adds, but also any internal differences from the plain docker command.
For example, I found that it sets two environment variables:
CUDA_DISABLE_UNIFIED_MEMORY=1
CUDA_CACHE_DISABLE=1
and that it loads the NVML C library while the nvidia-docker command is running.
What difference does this make? What are the potential causes of an indefinite hang in TensorFlow when it is not launched with nvidia-docker?
I've found that it's not actually hanging, but becomes very, very slow (e.g., 10 sec on CPU or with nvidia-docker vs. 92 sec on GPU with the docker-py invocation). Maybe related to #224?
@achimnol We explain what nvidia-docker does on our wiki.
The "bad volume format" error is a limitation of Docker, see #181.
Finally, as I explained in #224, we have heard multiple users claiming their code was slower inside Docker, but every single time it was because they compiled the project with different flags, or they had different settings during execution.