Nvidia-docker: docker: Error response from daemon: create nvidia_driver_361.93.02: VolumeDriver.Create: internal error, check logs for details.

Created on 11 Oct 2016 · 6 Comments · Source: NVIDIA/nvidia-docker

I am trying to install nvidia-docker on a machine with 2x M40 GPUs running CentOS 7.2, freshly updated with yum update. I completely removed every trace of Docker from the server before starting.

I have 2x 1TB NVMe devices formatted as btrfs and mounted on /var/lib/docker and /var/lib/nvidia-docker. The installation of docker and nvidia-docker looks fine, but when I try to run nvidia-smi in a container, it fails.

I can run a container with plain docker:
[root@patternlab ~]# docker run --rm nvidia/cuda cat /etc/debian_version
jessie/sid

but not with nvidia-docker:
[root@patternlab ~]# nvidia-docker run --rm nvidia/cuda cat /etc/debian_version
docker: Error response from daemon: create nvidia_driver_361.93.02: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

/var/log/messages shows:
Oct 10 18:46:05 patternlab dockerd: time="2016-10-10T18:46:05.194777571-07:00" level=error msg="Handler for GET /v1.24/volumes/nvidia_driver_361.93.02 returned error: get nvidia_driver_361.93.02: no such volume"

The only strange thing I see is that /var/lib/nvidia-docker/volumes/nvidia_driver is empty:

[root@patternlab ~]# ls -al /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver
-rwxr-xr-x. 1 root root 59000 Oct 6 18:35 /usr/bin/nvidia-cuda-mps-control

/var/lib/nvidia-docker/volumes/nvidia_driver:
total 0
drwxr-xr-x. 1 nvidia-docker nvidia-docker 0 Oct 10 18:50 .
drwxr-xr-x. 1 nvidia-docker nvidia-docker 26 Oct 9 11:46 ..

[root@patternlab ~]# nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_361.93.02: VolumeDriver.Create: internal error, check logs for details.
See 'docker run --help'.

[root@patternlab ~]# rpm -qa | grep docker
docker-engine-1.12.1-1.el7.centos.x86_64
docker-engine-selinux-1.12.1-1.el7.centos.noarch
nvidia-docker-1.0.0~rc.3-1.x86_64

[root@patternlab ~]# docker version
Client:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built:
OS/Arch: linux/amd64

Server:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built:
OS/Arch: linux/amd64

[root@patternlab ~]# docker info
Containers: 39
Running: 0
Paused: 0
Stopped: 39
Images: 129
Server Version: 1.12.1
Storage Driver: btrfs
Build Version: Btrfs v3.19.1
Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null bridge host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.36.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 40
Total Memory: 125.8 GiB
Name: patternlab...
ID: BRCR:FEZ7:E5MS:AQRE:B7MR:HUV7:KPTC:MH4Q:2KP2:AHP4:ZWLO:EOMO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
127.0.0.0/8

Here is /var/log/messages, showing the plugin starting up and then failing when a container is run:

Oct 10 18:45:09 patternlab systemd: Starting NVIDIA Docker plugin...
Oct 10 18:45:09 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:45:09 Loading NVIDIA unified memory
Oct 10 18:45:09 patternlab kernel: nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 243
Oct 10 18:45:09 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:45:09 Loading NVIDIA management library
Oct 10 18:45:09 patternlab systemd: Started NVIDIA Docker plugin.
Oct 10 18:45:17 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:45:17 Discovering GPU devices
Oct 10 18:45:18 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:45:18 Provisioning volumes at /var/lib/nvidia-docker/volumes
Oct 10 18:45:18 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:45:18 Serving plugin API at /var/lib/nvidia-docker
Oct 10 18:45:18 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:45:18 Serving remote API at localhost:3476
Oct 10 18:46:05 patternlab dockerd: time="2016-10-10T18:46:05.121675274-07:00" level=error msg="Handler for GET /v1.24/containers/nvidia/cuda/json returned error: No such container: nvidia/cuda"
Oct 10 18:46:05 patternlab dockerd: time="2016-10-10T18:46:05.170028283-07:00" level=error msg="Handler for GET /v1.24/containers/nvidia/cuda/json returned error: No such container: nvidia/cuda"
Oct 10 18:46:05 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:46:05 Received activate request
Oct 10 18:46:05 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:46:05 Plugins activated [VolumeDriver]
Oct 10 18:46:05 patternlab dockerd: time="2016-10-10T18:46:05.194777571-07:00" level=error msg="Handler for GET /v1.24/volumes/nvidia_driver_361.93.02 returned error: get nvidia_driver_361.93.02: no such volume"
Oct 10 18:46:05 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:46:05 Received create request for volume 'nvidia_driver_361.93.02'
Oct 10 18:46:05 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:46:05 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/361.93.02/bin/nvidia-cuda-mps-control: invalid cross-device link
Oct 10 18:46:06 patternlab dockerd: time="2016-10-10T18:46:06.103073247-07:00" level=error msg="Handler for POST /v1.24/containers/create returned error: create nvidia_driver_361.93.02: VolumeDriver.Create: internal error, check logs for details\n"
Oct 10 18:50:07 patternlab dockerd: time="2016-10-10T18:50:07.500706506-07:00" level=error msg="Handler for GET /v1.24/containers/nvidia/cuda/json returned error: No such container: nvidia/cuda"
Oct 10 18:50:07 patternlab dockerd: time="2016-10-10T18:50:07.546354707-07:00" level=error msg="Handler for GET /v1.24/containers/nvidia/cuda/json returned error: No such container: nvidia/cuda"
Oct 10 18:50:07 patternlab dockerd: time="2016-10-10T18:50:07.565630827-07:00" level=error msg="Handler for GET /v1.24/volumes/nvidia_driver_361.93.02 returned error: get nvidia_driver_361.93.02: no such volume"
Oct 10 18:50:07 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:50:07 Received create request for volume 'nvidia_driver_361.93.02'
Oct 10 18:50:07 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:50:07 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/361.93.02/bin/nvidia-cuda-mps-control: invalid cross-device link
Oct 10 18:50:08 patternlab dockerd: time="2016-10-10T18:50:08.248957378-07:00" level=error msg="Handler for POST /v1.24/containers/create returned error: create nvidia_driver_361.93.02: VolumeDriver.Create: internal error, check logs for details\n"
Oct 10 18:50:19 patternlab dockerd: time="2016-10-10T18:50:19.124472272-07:00" level=error msg="Handler for GET /v1.24/containers/nvidia/cuda/json returned error: No such container: nvidia/cuda"
Oct 10 18:50:19 patternlab dockerd: time="2016-10-10T18:50:19.171098610-07:00" level=error msg="Handler for GET /v1.24/containers/nvidia/cuda/json returned error: No such container: nvidia/cuda"
Oct 10 18:50:19 patternlab dockerd: time="2016-10-10T18:50:19.196644375-07:00" level=error msg="Handler for GET /v1.24/volumes/nvidia_driver_361.93.02 returned error: get nvidia_driver_361.93.02: no such volume"
Oct 10 18:50:19 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:50:19 Received create request for volume 'nvidia_driver_361.93.02'
Oct 10 18:50:19 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:50:19 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/361.93.02/bin/nvidia-cuda-mps-control: invalid cross-device link
Oct 10 18:50:19 patternlab dockerd: time="2016-10-10T18:50:19.875520810-07:00" level=error msg="Handler for POST /v1.24/containers/create returned error: create nvidia_driver_361.93.02: VolumeDriver.Create: internal error, check logs for details\n"

All 6 comments

Hello,

The issue is right here:

Oct 10 18:46:05 patternlab nvidia-docker-plugin: /usr/bin/nvidia-docker-plugin | 2016/10/10 18:46:05 Error: link /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver/361.93.02/bin/nvidia-cuda-mps-control: invalid cross-device link

It's a known limitation, sorry about that.
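The failing call in that log line is a hard link: the plugin logs "link <driver file> <volume path>", and hard links (the link(2) system call, or ln without -s) cannot cross filesystems, so the kernel returns EXDEV, "invalid cross-device link". The same operation can be reproduced by hand with a probe like the one below. This is only a sketch and not part of nvidia-docker; the function name can_hardlink and the probe file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether a file can be hard-linked into a directory,
# which is the operation nvidia-docker-plugin logs as "link SRC DEST".
# ln without -s calls link(2); across filesystems it fails with
# "Invalid cross-device link" (EXDEV).

can_hardlink() {
  local src="$1" dest_dir="$2"
  # Temporary link name; $$ (the shell's PID) keeps it from clashing with real files.
  local probe="$dest_dir/.hardlink-probe.$$"
  if ln -- "$src" "$probe"; then
    rm -f -- "$probe"   # clean up: only the probe link is removed, not the source
    return 0
  fi
  return 1              # ln has already printed the reason to stderr
}
```

Run it as root so that a permission error does not mask the real cause, for example: can_hardlink /usr/bin/nvidia-cuda-mps-control /var/lib/nvidia-docker/volumes/nvidia_driver. With /var/lib/nvidia-docker on its own NVMe device, as described above, this would be expected to fail with the same "Invalid cross-device link" message as the plugin log.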

With systemd, you can do something like this:

# systemctl edit nvidia-docker

[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-docker-plugin -s $SOCK_DIR -d /usr/local/nvidia-driver
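systemctl edit nvidia-docker saves what you type as a drop-in file, by default /etc/systemd/system/nvidia-docker.service.d/override.conf. The empty ExecStart= line clears the packaged command first; without it systemd would see two ExecStart= lines, which it rejects for an ordinary (non-oneshot) service. If you would rather script the change, a minimal sketch that writes the same drop-in could look like this. write_override is a hypothetical name, and the sketch assumes $SOCK_DIR is set by an Environment= line in the packaged unit, since the override above uses it without defining it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the same drop-in that "systemctl edit nvidia-docker"
# would produce for the override shown above.

write_override() {
  local dropin_dir="$1"   # e.g. /etc/systemd/system/nvidia-docker.service.d
  local volume_dir="$2"   # must be on the same filesystem as the driver files under /usr
  mkdir -p -- "$dropin_dir" || return 1
  # The empty ExecStart= clears the packaged command before the new one is set.
  # \$SOCK_DIR is escaped so the file contains it literally; systemd expands it
  # at start-up from the unit's environment (assumed to be defined there).
  cat > "$dropin_dir/override.conf" <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-docker-plugin -s \$SOCK_DIR -d $volume_dir
EOF
}
```

For example, as root: write_override /etc/systemd/system/nvidia-docker.service.d /usr/local/nvidia-driver, then systemctl daemon-reload and systemctl restart nvidia-docker. The volume directory may also need to exist and be writable by the plugin. The nvidia-docker:nvidia-docker ownership of /var/lib/nvidia-docker/volumes/nvidia_driver in the listing above suggests the plugin runs as that user, in which case something like mkdir -p /usr/local/nvidia-driver && chown nvidia-docker:nvidia-docker /usr/local/nvidia-driver would be needed; treat that as an assumption to check against your unit file.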

Should /usr/local/nvidia-driver also be on a btrfs NVMe device?

The path /usr/local/nvidia-driver is just an example. Some people have /usr and /var on different partitions; using -d puts the volume directory on the same partition as /usr, where the NVIDIA driver is installed by default.

So you can change either the install path of the driver or the path of the nvidia-docker volumes, but both must reside on the same partition.
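The same-partition rule can be checked without starting the plugin by comparing the device numbers that stat reports for the two locations. A sketch, assuming GNU coreutils stat (-c %d prints the device number; BSD stat uses different flags), with same_fs as a made-up name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (0) if both paths report the same device number,
# fail (1) if they differ, and return 2 if either path cannot be stat'ed.

same_fs() {
  local dev_a dev_b
  dev_a=$(stat -c %d -- "$1") || return 2
  dev_b=$(stat -c %d -- "$2") || return 2
  [ "$dev_a" = "$dev_b" ]
}
```

Once /usr/local/nvidia-driver exists, same_fs /usr/bin /usr/local/nvidia-driver && echo "same filesystem" should succeed when /usr/local sits on the same filesystem as /usr/bin, while same_fs /usr/bin /var/lib/nvidia-docker would be expected to fail on the layout in this report. Matching device numbers is necessary but not quite sufficient: Linux also refuses hard links between two mount points of the same filesystem (a bind mount, for instance), so actually attempting an ln into the target directory remains the decisive test.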

So no, it doesn't have to be a btrfs NVMe device. I think this path can be on a different partition from Docker itself and it should work fine.

It's happy now! Thanks.

No problem! Nice machine you have there :)
