Nvidia-docker: Debian 10 (Buster) error response from daemon: Unknown runtime specified nvidia. / OCI runtime create failed.

Created on 29 Aug 2019 · 25 comments · Source: NVIDIA/nvidia-docker

1. Issue or feature description

System information

  • OS: Debian GNU/Linux 10 (buster) x86_64
  • Kernel: 4.19.0-5-amd64
  • CPU: Intel i7-6700 (8) @ 3.400GHz
  • GPU 1: Intel HD Graphics 530
  • GPU 2: NVIDIA GeForce RTX 2070
  • Docker: 19.03.1, build 74b1e89

Problem description
Problem description
I followed the Quickstart documentation to install a Docker image with GPU support on Debian Buster. However, when I try to run the verification container, I only get the following error message:

svdhero@ml-box-pmt:~$ docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: Unknown runtime specified nvidia.

Alternatively, I also tried

svdhero@ml-box-pmt:~$ docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown.

without any luck, as one can see.

Previously, I installed my NVIDIA drivers successfully via

sudo apt install nvidia-driver

as one can see here:

svdhero@ml-box-pmt:~$ nvidia-smi 
Fri Aug 23 13:01:51 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.74       Driver Version: 418.74       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    On   | 00000000:01:00.0 Off |                  N/A |
|  0%   39C    P8     3W / 175W |      0MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I also installed docker successfully, as one can see here:

svdhero@ml-box-pmt:~$ docker --version
Docker version 19.03.1, build 74b1e89

svdhero@ml-box-pmt:~$ docker run --rm hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

And finally I have installed nvidia-container-toolkit via:

# Determine the distribution string, e.g. "debian10"
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# Add NVIDIA's signing key and apt repository
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the toolkit and restart the Docker daemon so it picks up the new hook
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

which seemed to have been successful:

svdhero@ml-box-pmt:~$ sudo apt search nvidia-container-toolkit
Sorting... Done
Full Text Search... Done
nvidia-container-toolkit/buster,now 1.0.3-1 amd64 [installed]
  NVIDIA container runtime hook
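
For reference, two quick ways to sanity-check the toolkit install at this point (the same commands appear in the checklist further down):

nvidia-container-cli -V                        # report the container library/CLI version
sudo nvidia-container-cli -k -d /dev/tty info  # query the driver; a failure here also breaks docker run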

However, as stated at the beginning, I get the Unknown runtime specified nvidia error.
This is a brand-new Debian install with no legacy packages installed.

2. Steps to reproduce the issue

  1. Install the latest Docker version on Debian 10 Buster.
  2. Install the latest nvidia-container-toolkit.
  3. Run the command docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi.
  4. Get an Unknown runtime specified nvidia error.

3. Information to attach (optional if deemed irrelevant)

See attached text file nvidia_system_information.txt containing:

  • [x] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
  • [x] Kernel version from uname -a
  • [x] Any relevant kernel output lines from dmesg
  • [x] Driver information from nvidia-smi -a
  • [x] Docker version from docker version
  • [x] NVIDIA packages version from dpkg -l '*nvidia*'
  • [x] NVIDIA container library version from nvidia-container-cli -V
  • [ ] NVIDIA container library logs (see troubleshooting)
  • [x] Docker command, image and tag used

Most helpful comment

  1. --runtime only works with the old nvidia-docker2 package
  2. --gpus works with both the old nvidia-docker2 package and the new nvidia-container-toolkit package
  3. Native integration means that the CLI supports isolating GPUs natively. That however doesn't mean that it isn't a plugin (i.e. you still need to install the vendor-specific plugin)
  4. driver error: failed to process request means that something is wrong with your driver install

If you had attached the requested logs (in this specific case, sudo nvidia-container-cli -k -d /dev/tty info), we would have been able to help you more effectively.
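
In concrete terms, the two invocation styles look like this (both using the nvidia/cuda image from this thread):

# old package (nvidia-docker2): registers an "nvidia" runtime with dockerd
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
# new package (nvidia-container-toolkit): relies on Docker 19.03's native --gpus flag
docker run --gpus all --rm nvidia/cuda nvidia-smi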

All 25 comments

Starting with 19.03, Docker natively supports GPUs. Try running:
docker run --gpus all nvidia/cuda nvidia-smi
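
For reference, the --gpus flag also accepts a GPU count or an explicit device list (syntax per the Docker 19.03 documentation; device indices depend on your system):

docker run --gpus all nvidia/cuda nvidia-smi            # expose all GPUs
docker run --gpus 1 nvidia/cuda nvidia-smi              # expose any single GPU
docker run --gpus '"device=0"' nvidia/cuda nvidia-smi   # expose GPU 0 specifically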

Starting with 19.03, Docker natively supports GPUs.

I know. I was just not sure what that actually meant. NVIDIA support does not come out of the box by merely installing Docker and nothing else, i.e. one still has to install the NVIDIA Container Toolkit via apt install nvidia-container-toolkit, even with Docker 19.03. Is that correct? At least that's how I interpreted the repo's README.

Try running: docker run --gpus all nvidia/cuda nvidia-smi

I did and got:

svdhero@ml-box:~$ docker run --gpus all nvidia/cuda nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods --debug=/var/log/nvidia-container-toolkit.log configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=10.1 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 --pid=1409 /var/lib/docker/overlay2/2e5623f4c6dc82d2230bcdeca765998e5a5acfce1a7cb66ace64494f67b73d35/merged]\\\\nnvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled 

I have tried the same procedure on Ubuntu 18.04 (same hardware system) and everything worked perfectly. It must be related to Debian 10 support. Any further suggestions?

  1. --runtime only works with the old nvidia-docker2 package
  2. --gpus works with both the old nvidia-docker2 package and the new nvidia-container-toolkit package
  3. Native integration means that the CLI supports isolating GPUs natively. That however doesn't mean that it isn't a plugin (i.e. you still need to install the vendor-specific plugin)
  4. driver error: failed to process request means that something is wrong with your driver install

If you had attached the requested logs (in this specific case, sudo nvidia-container-cli -k -d /dev/tty info), we would have been able to help you more effectively.

Thanks @RenaudWasTaken for your explanations. They helped me a lot to understand the wording "native support".

The reason why I did not include the NVIDIA container library logs is that I didn't see any logs in the console output apart from the error messages above. Should there be any log files anywhere?
I did follow the instructions in the troubleshooting guide, but it doesn't say where to find the logs.

Can you tell me where to find the logs? I am happy to attach them.

What is the output of:
$ sudo nvidia-container-cli -k -d /dev/tty info
$ docker run --gpus all nvidia/cuda nvidia-smi

The first one may have an impact on the second, so the order matters.

I had already attached the output of nvidia-container-cli; see the originally attached file above.
I've re-run the commands and got:

svdhero@ml-box:~$ sudo nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0902 13:16:07.314082 1675 nvc.c:281] initializing library context (version=1.0.4, build=67b384e9768f692581d08aecb8088f4dd09ea59d)
I0902 13:16:07.314126 1675 nvc.c:255] using root /
I0902 13:16:07.314133 1675 nvc.c:256] using ldcache /etc/ld.so.cache
I0902 13:16:07.314137 1675 nvc.c:257] using unprivileged user 65534:65534
I0902 13:16:07.314775 1676 nvc.c:191] loading kernel module nvidia
I0902 13:16:07.314984 1676 nvc.c:203] loading kernel module nvidia_uvm
I0902 13:16:07.365531 1676 nvc.c:211] loading kernel module nvidia_modeset
I0902 13:16:07.366147 1686 driver.c:133] starting driver service
E0902 13:16:07.366574 1686 driver.c:197] could not start driver service: load library failed: libcuda.so.1: cannot open shared object file: no such file or directory
I0902 13:16:07.366745 1675 driver.c:233] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request

svdhero@ml-box:~$ docker run --gpus all nvidia/cuda nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods --debug=/var/log/nvidia-container-toolkit.log configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=10.1 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 --pid=1732 /var/lib/docker/overlay2/42e97edeaced35365443a1d74e56f0708f2f94ccd3e194735a9ff802aa159aa8/merged]\\\\nnvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled 

What's wrong with libcuda.so.1? Shouldn't CUDA come with the NVIDIA Container Toolkit and/or with the container image? I only installed the driver:

svdhero@ml-box:~$ nvidia-smi
Mon Sep  2 15:27:08 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.74       Driver Version: 418.74       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    On   | 00000000:01:00.0 Off |                  N/A |
|  0%   43C    P8     3W / 175W |      0MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Hey, could you provide us with the output of ldconfig -p | grep libcuda, please?

The output is empty. There is no libcuda listed in /etc/ld.so.cache on my host system. Should it have come with the driver installation via sudo apt install nvidia-driver?
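
For future readers, one way to find out which Debian package ships libcuda.so.1 (assuming the apt-file utility is available; it is not installed by default):

sudo apt install apt-file && sudo apt-file update   # one-time package index setup
apt-file search libcuda.so.1                        # list packages containing the file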

I think you need to install nvidia-cuda-toolkit

So maybe I misunderstood, but isn't the whole point of nvidia-container-toolkit and the docker images that I do not have to install CUDA manually, but get everything out-of-the-box, i.e. out-of-the-container?

Also the docs here don't mention anything about installing CUDA. I certainly did not have to do that on Ubuntu. There I only installed the driver. But I will go and check if libcuda is listed on my Ubuntu system. Just to make sure.

Could you point me in the right direction on how to install nvidia-cuda-toolkit on Debian 10, please? Do I just go here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html ?
Unfortunately Debian is not among the officially supported distributions on said CUDA website. Why not? It does not seem to be a small or unimportant distribution. :-(

Any further ideas anyone? If not, I will have to abandon Debian and go with Ubuntu.

I met the same problem on Ubuntu 18.04, so maybe Debian isn't the reason.

xc@xc:~$ docker run --gpus all nvidia/cuda nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.1\\n\""": unknown.
ERRO[0000] error waiting for container: context canceled

@svdHero

So maybe I misunderstood, but isn't the whole point of nvidia-container-toolkit and the docker images that I do not have to install CUDA manually, but get everything out-of-the-box, i.e. out-of-the-container?

No, nvidia-container-toolkit injects the local CUDA libs into the container at runtime, because the driver and the libs must have the same version.

Also the docs here don't mention anything about installing CUDA. I certainly did not have to do that on Ubuntu. There I only installed the driver. But I will go and check if libcuda is listed on my Ubuntu system. Just to make sure.

By installing the driver, you should have libcuda.so on your system.
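
A quick way to verify that on the host (the grep mirrors the ldconfig check above; dpkg -S is standard Debian tooling):

ldconfig -p | grep libcuda   # is libcuda.so.1 in the dynamic linker cache?
dpkg -S libcuda.so.1         # which installed package owns the file?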

Hey,

I met the same problem on Ubuntu 18.04, so maybe Debian isn't the reason.

xc@xc:~$ docker run --gpus all nvidia/cuda nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.1\\n\""": unknown.
ERRO[0000] error waiting for container: context canceled

The error message here is requirement error: unsatisfied condition: cuda>=10.1. So to run this image, you need to have cuda>=10.1. You can update your driver or run another image (nvidia/cuda:9.0-base, for example). You can find all the tags here: https://hub.docker.com/r/nvidia/cuda/tags
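
For example (tag taken from the list linked above; pick whichever matches your driver):

docker run --gpus all --rm nvidia/cuda:9.0-base nvidia-smi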

No, nvidia-container-toolkit injects the local CUDA libs into the container at runtime, because the driver and the libs must have the same version.

Ah okay. Thanks for the clarification. Then I really misunderstood. Learn something new every day...

The problem must be down to sudo apt install nvidia-driver then. I wonder what's wrong with that. @Ethyling did you install the drivers the same way or did you download the drivers from NVIDIA directly? I am assuming you have a Debian system running, since it was you who added the Debian 10 support to the repo?

@Ethyling Thank you very much for your professional reply!

I was having the same issue. Updating my driver to version 430 fixed it, so thanks a lot!

For me the problem still exists. When I install the official NVIDIA driver from https://www.nvidia.de/Download/index.aspx, the CUDA libraries are installed correctly:

svdhero@ml-box:~$ nvidia-smi 
Tue Sep 24 14:30:24 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0 Off |                  N/A |
| 21%   39C    P0     1W / 175W |      0MiB /  7982MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

However, when I try to install nvidia-container-toolkit via apt, I get an ncurses error dialog telling me:

   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ Configuring nvidia-installer-cleanup β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚                                                                                                                                                                                      β”‚ 
   β”‚ The nvidia-installer program was found on this system. This is probably left over from an earlier installation of the non-free NVIDIA graphics driver, installed using the NVIDIA    β”‚ 
   β”‚ *.run file directly. This installation is incompatible with the Debian packages. To install the Debian packages safely, it is therefore necessary to undo the changes performed by   β”‚ 
   β”‚ nvidia-installer.                                                                                                                                                                    β”‚ 
   β”‚                                                                                                                                                                                      β”‚ 
   β”‚ Run "nvidia-installer --uninstall"?                                                                                                                                                  β”‚ 
   β”‚                                                                                                                                                                                      β”‚ 
   β”‚                                                        <Yes>                                                           <No>                                                          β”‚ 
   β”‚                                                                                                                                                                                      β”‚ 
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 


What shall I do? If I remove the proprietary NVIDIA driver, I am back to square one. If I do NOT remove it, apt aborts the installation.

I am having the same problems listed in this thread using Ubuntu 18.04. Is there any information I could provide that would help pinpoint the issue?

@svdHero please uninstall the driver first
@OtherLeadingBrand please open a new issue with the template filled
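
In command form, a possible sequence (a sketch based on the dialog above; nvidia-installer is the binary the dialog itself references):

sudo nvidia-installer --uninstall                  # undo the .run-file driver install, as the dialog suggests
sudo apt-get install -y nvidia-container-toolkit   # then retry the packaged install
sudo systemctl restart docker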

@RenaudWasTaken If I uninstall the driver, then the driver from the Debian repositories is pulled in and I am back to square one with the issue described in my previous posts, i.e. no CUDA installed. I deliberately did install the NVIDIA driver to solve said problems. So uninstalling it seems a bit strange, or am I missing something here?

Anyway, I have given up on Debian now and gone down the Ubuntu happy path. I just couldn't be bothered anymore. However, I wonder how other Debian users manage. At least over here in Germany, many universities run Debian almost exclusively, AFAIK.

@OtherLeadingBrand Read this forum post by marcmuc and follow the happy path. :smile:

I had a similar issue, which I could solve by following these notes. Looking around, I see that many are confused on this issue. Some instructions on the web state that "...Docker versions earlier than 19.03 require nvidia-docker2 and the --runtime=nvidia flag. On versions including and after 19.03, you will use the nvidia-container-toolkit package and the --gpus all flag". There is a widespread perception that with 19.03 you don't need to install drivers and CUDA packages, because Docker takes care of the NVIDIA GPU in a "native way". So one can erroneously think that the toolkit package handles and installs the drivers behind the scenes.

In reality, if we are not developing in CUDA, the toolkit approach lets us skip installing the CUDA package; however, we cannot avoid installing the NVIDIA drivers. Without those drivers, even with Docker version > 19.03, we get errors of the type described in this issue. The results may look erratic if we have fresh distributions without any driver, and also on distributions with some residue of earlier driver installations.
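
As a minimal pre-flight check before suspecting the toolkit, all three of these should succeed on the host (commands taken from earlier in this thread, plus the standard lsmod):

lsmod | grep nvidia          # is the kernel module loaded?
nvidia-smi                   # does the driver userspace respond?
ldconfig -p | grep libcuda   # is libcuda.so.1 visible to the linker?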

Hi @svdHero, I had the same problem and struggled with it for 2 days.

The solution is simple (as mentioned by @gurumaia): install a newer driver (430).

In case you have issues with that, here is a link that could help you: https://linuxconfig.org/how-to-install-nvidia-driver-on-debian-10-buster-linux

Also, between steps 7 and 8, run apt-get purge nvidia* to remove all NVIDIA references.
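
In shell form, that purge step plus the reinstall (a sketch; buster-backports as the source of the newer 430 driver is my assumption, so follow the linked guide for the exact steps):

sudo apt-get purge 'nvidia*'                              # remove all leftover NVIDIA packages (quoted to avoid shell globbing)
sudo apt-get install -t buster-backports nvidia-driver    # ASSUMPTION: newer driver via buster-backports in sources.list
sudo reboot                                               # load the new kernel module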

I struggled with this problem on the same setup with Debian 10 too. The minimal solution turned out to be to install the libcuda1 package.
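
For completeness, that minimal fix in command form (libcuda1 as named above; the docker restart mirrors the toolkit install steps earlier in the thread):

sudo apt install libcuda1                           # provides libcuda.so.1 on Debian
sudo systemctl restart docker
docker run --gpus all --rm nvidia/cuda nvidia-smi   # re-test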
