Nvidia-docker: Can not use nvidia-docker. docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: ...

Created on 20 Mar 2020  ·  41 Comments  ·  Source: NVIDIA/nvidia-docker


1. Issue or feature description

I followed the installation steps from the tutorial. After installing the toolkit with sudo apt-get install -y nvidia-container-toolkit, the test example always fails:

docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

Error:

_docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container\\n\\"\"": unknown.
ERRO[0018] error waiting for container: context canceled_

2. Steps to reproduce the issue

Just run the test example:
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

Error message:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container\\n\\"\"": unknown.
ERRO[0018] error waiting for container: context canceled

I also tried docker run --gpus 1 nvidia/cuda nvidia-smi
and got a similar error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.1, please update your driver to a newer version, or use an earlier cuda container\\n\\"\"": unknown.
ERRO[0124] error waiting for container: context canceled

3. Information to attach (optional if deemed irrelevant)

  • [ ] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
  • [ ] Kernel version from uname -a
  • [ ] Any relevant kernel output lines from dmesg
  • [ ] Driver information from nvidia-smi -a
  • [ ] Docker version from docker version
  • [ ] NVIDIA packages version from dpkg -l '*nvidia*' _or_ rpm -qa '*nvidia*'
  • [ ] NVIDIA container library version from nvidia-container-cli -V
  • [ ] NVIDIA container library logs (see troubleshooting)
  • [ ] Docker command, image and tag used


All 41 comments

hi @tytcc - what NVIDIA driver version are you running on your Linux system? You should have at least r410 to run CUDA 10.0 containers and r418 to run CUDA 10.1 containers.

Please provide the output of nvidia-smi
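As a side note, the driver-branch requirement above can be expressed as a small version check. This is only an illustrative sketch: the minimum-driver table follows NVIDIA's CUDA Toolkit release notes (verify against current documentation), and the function names are made up here.

```python
# Illustrative sketch: does an installed NVIDIA driver meet the minimum
# version required by a given CUDA container image?
# Minimums below follow NVIDIA's CUDA Toolkit release notes.
MIN_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}

def parse_driver(version):
    """'440.33.01' -> (440, 33, 1); compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))

def driver_satisfies(driver_version, cuda_version):
    """True if the driver branch can run containers built for this CUDA version."""
    return parse_driver(driver_version)[:2] >= MIN_DRIVER[cuda_version]

# A 396.x driver is too old for cuda>=10.0, which matches the error above.
print(driver_satisfies("440.82", "10.0"))   # True
print(driver_satisfies("396.54", "10.0"))   # False
```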

Thanks for your answer, @dualvtable.
Now I know my NVIDIA driver version is too old.

@tytcc I also faced the same problem on an Ubuntu 16.04 machine. I have the latest driver (440.64.00) installed, and when I run the example
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
I get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\\\n\\\"\"": unknown. ERRO[0001] error waiting for container: context canceled

Getting the same output after installing nvidia-container-toolkit.

***@pop-os:~$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled 

Tried the steps mentioned in #1114 but still no luck.

nvidia-smi output:

NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2
0  Quadro M2000M       Off 

OS details:

***@pop-os:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Pop!_OS 18.04 LTS
Release:    18.04
Codename:   bionic

I am seeing the same with driver version 440.82:

# docker run \
    --rm \
    --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    nvidia/cuda nvidia-smi
/run/torcx/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
# uname -a
Linux core1 4.19.106-coreos #1 SMP Wed Feb 26 21:43:18 -00 2020 x86_64 Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz GenuineIntel GNU/Linux
# dockerd --version
Docker version 18.06.3-ce, build d7080c1
# /opt/drivers/nvidia/bin/nvidia-smi
Fri Apr 17 12:54:34 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   45C    P8     9W / 160W |      0MiB /  5932MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Getting the same error:

keo7@home-desktop:~$ uname -a
Linux home-desktop 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1+deb10u1 (2020-04-27) x86_64 GNU/Linux
keo7@home-desktop:~$ dockerd --version
Docker version 19.03.8, build afacb8b7f0
keo7@home-desktop:~$ nvidia-smi
Fri May  8 22:39:56 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             On   | 00000000:26:00.0  On |                  N/A |
| 28%   43C    P2    38W / 250W |    678MiB / 12066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

@KeironO Personally, I wouldn't bother using the nvidia runtime: it's disruptive to the setup of your distribution's runc (or whatever OCI runtime you have), it clearly has some issues, and all it does is wrap runc with some helpers controlled by environment variables (at least from what I can tell).

If you can find out what your application needs you should be able to expose the devices and libraries from the host manually without having to have an extra binary to manage.

If there are other benefits I'd be interested to know as I have my GPU accelerated workloads running without needing to change my host's Docker setup.
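For reference, the manual approach described above might look roughly like this. Every path and version here is illustrative and depends on your driver, GPU count, and distribution, so treat it as a sketch, not a recipe:

```shell
# Sketch only: expose the NVIDIA device nodes and user-space driver
# libraries from the host by hand. Paths and versions are illustrative.
docker run --rm \
  --device /dev/nvidiactl \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia0 \
  -v /usr/lib/x86_64-linux-gnu/libcuda.so.440.82:/usr/lib/x86_64-linux-gnu/libcuda.so.1:ro \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.82:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:ro \
  -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro \
  ubuntu:18.04 nvidia-smi
```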

I am also getting the same error with the same setup as @KeironO.

Hi there!

nvidia-container-cli.real: initialization error: driver error: failed to process request\\n\\"\"": unknown.

@billwhiteley @KeironO
Most of the time this issue is linked to an incorrect driver installation or incorrect driver loading. We can usually figure out which one it is once the issue template is filled in :)

Unfortunately being able to run nvidia-smi doesn't mean that your driver is fully loaded and you'll see issues later down the line (such as when running CUDA code or tensorflow).

I wouldn't bother using the nvidia runtime in my opinion, it's disruptive to the setup of your distribution's runc (or whatever OCI runtime you have), clearly it has some issues and all it does is wrap runc with some helpers controlled by environment variables (at least from what I can tell).
If you can find out what your application needs you should be able to expose the devices and libraries from the host manually without having to have an extra binary to manage.

The NVIDIA runtime is only expected to be installed in a Kubernetes environment. For a Docker-only setup, the nvidia-container-toolkit is all that's required (see the README).

As for implementing what the NVIDIA Container Toolkit does yourself, you can certainly do that; however, it would probably have a high upfront cost for you to understand the details of the NVIDIA driver and userland architecture, and I'm not sure you want to be maintaining such a piece of software :) You would also miss out on new driver features as they come out, and if the CUDA or NVIDIA driver model changes you'd have to rewrite that software.
Without bringing up enterprise support or general support, if your use case is narrow enough and you don't mind paying that maintenance cost, that's definitely an option :)

For a Kubernetes environment the NVIDIA runtime provides even less benefit: all you need are the NVIDIA drivers/libraries on the host and this DaemonSet, and then GPUs can be requested in the normal Kubernetes way:

resources:
  limits:
    nvidia.com/gpu: 1

Relevant Kubernetes documentation is here.
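Spelled out as a full Pod spec, that resource request might look like this (the names and image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test          # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:10.2-base
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1     # scheduled onto a node advertising GPUs
```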

If your libraries aren't in the default location (/home/kubernetes/bin/nvidia for some reason) you can specify the location manually using the -host-path flag. You may need to add an NVIDIA entry to your container's /etc/ld.so.conf.d and run ldconfig so that the libraries can be found by your application.
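Concretely, that last step might look like this inside the container, assuming for illustration that the libraries are mounted under /usr/local/nvidia/lib64:

```shell
# Illustrative: register the mounted NVIDIA library directory with the
# dynamic linker, then rebuild the linker cache.
echo "/usr/local/nvidia/lib64" > /etc/ld.so.conf.d/nvidia.conf
ldconfig
```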

Here's the full usage:

Usage of /usr/bin/nvidia-gpu-device-plugin:
  -alsologtostderr
        log to standard error as well as files
  -container-path string
        Path on the container that mounts '-host-path' (default "/usr/local/nvidia")
  -container-vulkan-icd-path string
        Path on the container that mounts '-host-vulkan-icd-path' (default "/etc/vulkan/icd.d")
  -host-path string
        Path on the host that contains nvidia libraries. This will be mounted inside the container as '-container-path' (default "/home/kubernetes/bin/nvidia")
  -host-vulkan-icd-path string
        Path on the host that contains the Nvidia Vulkan installable client driver. This will be mounted inside the container as '-container-vulkan-icd-path' (default "/home/kubernetes/bin/nvidia/vulkan/icd.d")
  -log_backtrace_at value
        when logging hits line file:N, emit a stack trace
  -log_dir string
        If non-empty, write log files in this directory
  -logtostderr
        log to standard error instead of files
  -plugin-directory string
        The directory path to create plugin socket (default "/device-plugin")
  -stderrthreshold value
        logs at or above this threshold go to stderr
  -v value
        log level for V logs
  -vmodule value
        comma-separated list of pattern=N settings for file-filtered logging

I'm having the same issue.

I0511 15:53:14.054294 27377 nvc.c:281] initializing library context (version=1.0.7, build=b71f87c04b8eca8a16bf60995506c35c937347d9)
I0511 15:53:14.054490 27377 nvc.c:255] using root /
I0511 15:53:14.054525 27377 nvc.c:256] using ldcache /etc/ld.so.cache
I0511 15:53:14.054595 27377 nvc.c:257] using unprivileged user 1000:1000
W0511 15:53:14.056714 27378 nvc.c:186] failed to set inheritable capabilities
W0511 15:53:14.056939 27378 nvc.c:187] skipping kernel modules load due to failure
I0511 15:53:14.058134 27379 driver.c:133] starting driver service
I0511 15:53:14.107994 27377 nvc_info.c:438] requesting driver information with ''
I0511 15:53:14.109434 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.440.33.01
I0511 15:53:14.109515 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.440.33.01
I0511 15:53:14.109800 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.33.01 over /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.440.33.01
I0511 15:53:14.110348 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.33.01
I0511 15:53:14.111277 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.33.01
I0511 15:53:14.112608 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.33.01
I0511 15:53:14.114313 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.33.01
I0511 15:53:14.114387 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.33.01
I0511 15:53:14.115208 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.440.33.01
I0511 15:53:14.115956 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.33.01
I0511 15:53:14.116012 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.33.01
I0511 15:53:14.116075 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.33.01
I0511 15:53:14.116886 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.440.33.01
I0511 15:53:14.117984 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.33.01
I0511 15:53:14.118698 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.33.01
I0511 15:53:14.118783 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.33.01
I0511 15:53:14.119561 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.33.01
I0511 15:53:14.119626 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.33.01
I0511 15:53:14.120347 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.33.01
I0511 15:53:14.121159 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.33.01
I0511 15:53:14.121611 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01
I0511 15:53:14.121935 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.440.33.01
I0511 15:53:14.122775 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.440.33.01
I0511 15:53:14.123599 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.440.33.01
I0511 15:53:14.123773 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.440.33.01
I0511 15:53:14.125503 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.440.48.02
I0511 15:53:14.126273 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.440.48.02
I0511 15:53:14.127346 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-opencl.so.440.48.02
I0511 15:53:14.128794 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-ml.so.440.48.02
I0511 15:53:14.130323 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-fbc.so.440.48.02
I0511 15:53:14.132006 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-fatbinaryloader.so.440.48.02
I0511 15:53:14.133270 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-encode.so.440.48.02
I0511 15:53:14.135013 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-compiler.so.440.48.02
I0511 15:53:14.136295 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvcuvid.so.440.48.02
I0511 15:53:14.137890 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libcuda.so.440.48.02
W0511 15:53:14.138059 27377 nvc_info.c:303] missing library libvdpau_nvidia.so
W0511 15:53:14.138076 27377 nvc_info.c:307] missing compat32 library libnvidia-ml.so
W0511 15:53:14.138088 27377 nvc_info.c:307] missing compat32 library libnvidia-cfg.so
W0511 15:53:14.138098 27377 nvc_info.c:307] missing compat32 library libcuda.so
W0511 15:53:14.138108 27377 nvc_info.c:307] missing compat32 library libnvidia-opencl.so
W0511 15:53:14.138124 27377 nvc_info.c:307] missing compat32 library libnvidia-ptxjitcompiler.so
W0511 15:53:14.138148 27377 nvc_info.c:307] missing compat32 library libnvidia-fatbinaryloader.so
W0511 15:53:14.138166 27377 nvc_info.c:307] missing compat32 library libnvidia-compiler.so
W0511 15:53:14.138185 27377 nvc_info.c:307] missing compat32 library libvdpau_nvidia.so
W0511 15:53:14.138205 27377 nvc_info.c:307] missing compat32 library libnvidia-encode.so
W0511 15:53:14.138227 27377 nvc_info.c:307] missing compat32 library libnvidia-opticalflow.so
W0511 15:53:14.138250 27377 nvc_info.c:307] missing compat32 library libnvcuvid.so
W0511 15:53:14.138267 27377 nvc_info.c:307] missing compat32 library libnvidia-eglcore.so
W0511 15:53:14.138287 27377 nvc_info.c:307] missing compat32 library libnvidia-glcore.so
W0511 15:53:14.138308 27377 nvc_info.c:307] missing compat32 library libnvidia-tls.so
W0511 15:53:14.138328 27377 nvc_info.c:307] missing compat32 library libnvidia-glsi.so
W0511 15:53:14.138349 27377 nvc_info.c:307] missing compat32 library libnvidia-fbc.so
W0511 15:53:14.138367 27377 nvc_info.c:307] missing compat32 library libnvidia-ifr.so
W0511 15:53:14.138384 27377 nvc_info.c:307] missing compat32 library libnvidia-rtcore.so
W0511 15:53:14.138405 27377 nvc_info.c:307] missing compat32 library libnvoptix.so
W0511 15:53:14.138426 27377 nvc_info.c:307] missing compat32 library libGLX_nvidia.so
W0511 15:53:14.138444 27377 nvc_info.c:307] missing compat32 library libEGL_nvidia.so
W0511 15:53:14.138468 27377 nvc_info.c:307] missing compat32 library libGLESv2_nvidia.so
W0511 15:53:14.138491 27377 nvc_info.c:307] missing compat32 library libGLESv1_CM_nvidia.so
W0511 15:53:14.138511 27377 nvc_info.c:307] missing compat32 library libnvidia-glvkspirv.so
W0511 15:53:14.138531 27377 nvc_info.c:307] missing compat32 library libnvidia-cbl.so
I0511 15:53:14.140096 27377 nvc_info.c:233] selecting /usr/bin/nvidia-smi
I0511 15:53:14.140154 27377 nvc_info.c:233] selecting /usr/bin/nvidia-debugdump
I0511 15:53:14.140212 27377 nvc_info.c:233] selecting /usr/bin/nvidia-persistenced
I0511 15:53:14.140269 27377 nvc_info.c:233] selecting /usr/bin/nvidia-cuda-mps-control
I0511 15:53:14.140324 27377 nvc_info.c:233] selecting /usr/bin/nvidia-cuda-mps-server
I0511 15:53:14.140395 27377 nvc_info.c:370] listing device /dev/nvidiactl
I0511 15:53:14.140415 27377 nvc_info.c:370] listing device /dev/nvidia-uvm
I0511 15:53:14.140432 27377 nvc_info.c:370] listing device /dev/nvidia-uvm-tools
I0511 15:53:14.140449 27377 nvc_info.c:370] listing device /dev/nvidia-modeset
I0511 15:53:14.140520 27377 nvc_info.c:274] listing ipc /run/nvidia-persistenced/socket
W0511 15:53:14.140573 27377 nvc_info.c:278] missing ipc /tmp/nvidia-mps
I0511 15:53:14.140594 27377 nvc_info.c:494] requesting device information with ''
I0511 15:53:14.147767 27377 nvc_info.c:524] listing device /dev/nvidia0 (GPU-23fcb2ab-a6c2-b9e3-f455-6bf92a57b371 at 00000000:03:00.0)
NVRM version:   440.33.01
CUDA version:   10.2

Device Index:   0
Device Minor:   0
Model:          GeForce 920MX
Brand:          GeForce
GPU UUID:       GPU-23fcb2ab-a6c2-b9e3-f455-6bf92a57b371
Bus Location:   00000000:03:00.0
Architecture:   5.0
I0511 15:53:14.147861 27377 nvc.c:318] shutting down library context
I0511 15:53:14.148492 27379 driver.c:192] terminating driver service
I0511 15:53:14.234076 27377 driver.c:233] driver service terminated successfully

kernel version

Linux hema 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

nvidia-smi -a

==============NVSMI LOG==============

Timestamp                           : Mon May 11 17:55:40 2020
Driver Version                      : 440.33.01
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:03:00.0
    Product Name                    : GeForce 920MX
    Product Brand                   : GeForce
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-23fcb2ab-a6c2-b9e3-f455-6bf92a57b371
    Minor Number                    : 0
    VBIOS Version                   : 82.08.5A.00.0D
    MultiGPU Board                  : No
    Board ID                        : 0x300
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : N/A
        OEM Object                  : N/A
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x03
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x134F10DE
        Bus Id                      : 00000000:03:00.0
        Sub System Id               : 0x39F117AA
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 4x
                Current             : 4x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 493000 KB/s
        Rx Throughput               : 3000 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : N/A
            HW Power Brake Slowdown : N/A
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 2004 MiB
        Used                        : 870 MiB
        Free                        : 1134 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 3 MiB
        Free                        : 253 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : N/A
        Decoder                     : N/A
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 41 C
        GPU Shutdown Temp           : 99 C
        GPU Slowdown Temp           : 94 C
        GPU Max Operating Temp      : 98 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : N/A
        Power Draw                  : N/A
        Power Limit                 : N/A
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
    Clocks
        Graphics                    : 993 MHz
        SM                          : 993 MHz
        Memory                      : 900 MHz
        Video                       : 973 MHz
    Applications Clocks
        Graphics                    : 967 MHz
        Memory                      : 900 MHz
    Default Applications Clocks
        Graphics                    : 965 MHz
        Memory                      : 900 MHz
    Max Clocks
        Graphics                    : 993 MHz
        SM                          : 993 MHz
        Memory                      : 900 MHz
        Video                       : 973 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 1347
            Type                    : G
            Name                    : /usr/lib/xorg/Xorg
            Used GPU Memory         : 34 MiB
        Process ID                  : 1655
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 76 MiB
        Process ID                  : 2765
            Type                    : G
            Name                    : /usr/lib/xorg/Xorg
            Used GPU Memory         : 184 MiB
        Process ID                  : 2951
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 273 MiB
        Process ID                  : 3830
            Type                    : G
            Name                    : /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=17903442744480519122,5081937925041455948,131072 --gpu-preferences=MAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAAAABgAAAAAAAQAAAAAAAAAAAAAAAAAAAACAAAAAAAAAA= --shared-files
            Used GPU Memory         : 292 MiB


docker version

Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:25:46 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:24:19 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

nvidia packages

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                           Version                      Architecture                 Description
+++-==============================================-============================-============================-=================================================================================================
un  libgldispatch0-nvidia                          <none>                       <none>                       (no description available)
ii  libnvidia-cfg1-440:amd64                       440.33.01-0ubuntu1           amd64                        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                             <none>                       <none>                       (no description available)
un  libnvidia-common                               <none>                       <none>                       (no description available)
ii  libnvidia-common-440                           440.82-0ubuntu0~0.18.04.1    all                          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-435:amd64                    435.21-0ubuntu0.18.04.2      amd64                        NVIDIA libcompute package
ii  libnvidia-compute-440:amd64                    440.33.01-0ubuntu1           amd64                        NVIDIA libcompute package
ii  libnvidia-container-tools                      1.0.7-1                      amd64                        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                     1.0.7-1                      amd64                        NVIDIA container runtime library
un  libnvidia-decode                               <none>                       <none>                       (no description available)
ii  libnvidia-decode-440:amd64                     440.33.01-0ubuntu1           amd64                        NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                               <none>                       <none>                       (no description available)
ii  libnvidia-encode-440:amd64                     440.33.01-0ubuntu1           amd64                        NVENC Video Encoding runtime library
un  libnvidia-fbc1                                 <none>                       <none>                       (no description available)
ii  libnvidia-fbc1-440:amd64                       440.33.01-0ubuntu1           amd64                        NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                                   <none>                       <none>                       (no description available)
ii  libnvidia-gl-440:amd64                         440.33.01-0ubuntu1           amd64                        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-ifr1                                 <none>                       <none>                       (no description available)
ii  libnvidia-ifr1-440:amd64                       440.33.01-0ubuntu1           amd64                        NVIDIA OpenGL-based Inband Frame Readback runtime library
un  libnvidia-ml1                                  <none>                       <none>                       (no description available)
un  nvidia-304                                     <none>                       <none>                       (no description available)
un  nvidia-340                                     <none>                       <none>                       (no description available)
un  nvidia-384                                     <none>                       <none>                       (no description available)
un  nvidia-390                                     <none>                       <none>                       (no description available)
un  nvidia-common                                  <none>                       <none>                       (no description available)
rc  nvidia-compute-utils-435                       435.21-0ubuntu0.18.04.2      amd64                        NVIDIA compute utilities
ii  nvidia-compute-utils-440                       440.33.01-0ubuntu1           amd64                        NVIDIA compute utilities
ii  nvidia-container-runtime                       3.1.4-1                      amd64                        NVIDIA container runtime
un  nvidia-container-runtime-hook                  <none>                       <none>                       (no description available)
ii  nvidia-container-toolkit                       1.0.5-1                      amd64                        NVIDIA container runtime hook
rc  nvidia-dkms-435                                435.21-0ubuntu0.18.04.2      amd64                        NVIDIA DKMS package
ii  nvidia-dkms-440                                440.33.01-0ubuntu1           amd64                        NVIDIA DKMS package
un  nvidia-dkms-kernel                             <none>                       <none>                       (no description available)
un  nvidia-docker                                  <none>                       <none>                       (no description available)
rc  nvidia-docker2                                 2.2.2-1                      all                          nvidia-docker CLI wrapper
ii  nvidia-driver-440                              440.33.01-0ubuntu1           amd64                        NVIDIA driver metapackage
un  nvidia-driver-binary                           <none>                       <none>                       (no description available)
un  nvidia-kernel-common                           <none>                       <none>                       (no description available)
rc  nvidia-kernel-common-435                       435.21-0ubuntu0.18.04.2      amd64                        Shared files used with the kernel module
ii  nvidia-kernel-common-440                       440.33.01-0ubuntu1           amd64                        Shared files used with the kernel module
un  nvidia-kernel-source                           <none>                       <none>                       (no description available)
un  nvidia-kernel-source-435                       <none>                       <none>                       (no description available)
ii  nvidia-kernel-source-440                       440.33.01-0ubuntu1           amd64                        NVIDIA kernel source package
un  nvidia-legacy-304xx-vdpau-driver               <none>                       <none>                       (no description available)
un  nvidia-legacy-340xx-vdpau-driver               <none>                       <none>                       (no description available)
un  nvidia-libopencl1-dev                          <none>                       <none>                       (no description available)
ii  nvidia-modprobe                                440.33.01-0ubuntu1           amd64                        Load the NVIDIA kernel driver and create device files
un  nvidia-opencl-icd                              <none>                       <none>                       (no description available)
un  nvidia-persistenced                            <none>                       <none>                       (no description available)
ii  nvidia-prime                                   0.8.8.2                      all                          Tools to enable NVIDIA's Prime
ii  nvidia-settings                                440.64-0ubuntu0~0.18.04.1    amd64                        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                         <none>                       <none>                       (no description available)
un  nvidia-smi                                     <none>                       <none>                       (no description available)
un  nvidia-utils                                   <none>                       <none>                       (no description available)
ii  nvidia-utils-440                               440.33.01-0ubuntu1           amd64                        NVIDIA driver support binaries
un  nvidia-vdpau-driver                            <none>                       <none>                       (no description available)
ii  xserver-xorg-video-nvidia-440                  440.33.01-0ubuntu1           amd64                        NVIDIA binary Xorg driver

Having the same problem, any solutions?

@elliothe I uninstalled CUDA, the NVIDIA drivers, nvidia-docker, and Docker, then installed everything again from scratch. This solved the problem for me.

@HemaZ Thanks for the solution. I may do the same if I have no alternative ways.

Got the same problem; fixed it by running sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

OS/docker info:

$ dockerd --version
Docker version 19.03.8, build afacb8b7f0
$ uname -a
Linux x 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

@albin3 that didn't fix it for me.
I followed all the instructions in https://developer.nvidia.com/blog/announcing-cuda-on-windows-subsystem-for-linux-2 yet I'm still seeing:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mountpoint for devices not found\"": unknown.

Same issue here.
Trying to run this repository's demo, but I got the following error:
$ docker-compose up

ERROR: for vehicle_counting  Cannot start service vehicle_counting: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown

Tried to install nvidia-container-toolkit as suggested here, but it's still not working.

Here's my $ docker info output

Client:
 Debug Mode: false

Server:
 Containers: 3
  Running: 0
  Paused: 0
  Stopped: 3
 Images: 7
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-42-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 3.844GiB
 Name: geo-vbox
 ID: PLLH:2H5F:NGLW:52TT:2Q77:AUHV:S3PX:3THU:XIEA:NYMX:FEYD:E2AT
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Any idea how to solve it?
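One thing worth checking for docker-compose specifically: the `--gpus` flag isn't available there, so the service has to be pointed at the nvidia runtime in the compose file. A minimal sketch of what that can look like with nvidia-docker2 installed (the image name is a placeholder, and `runtime:` requires compose file format 2.3 or later):

```shell
# Write a minimal compose file that pins the service to the nvidia runtime.
# Assumes nvidia-docker2 is installed; the image name below is a placeholder.
cat > /tmp/docker-compose.gpu.yml <<'EOF'
version: "2.3"
services:
  vehicle_counting:
    image: your/vehicle-counting-image:latest   # placeholder image
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
EOF
echo "wrote /tmp/docker-compose.gpu.yml"
# then: docker-compose -f /tmp/docker-compose.gpu.yml up
```

This is only a sketch of the runtime wiring; whether it fixes the `driver error: failed to process request` above depends on the host driver being healthy.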

Same issue. I'm working with Docker version 19.03.12, build 48a66213fe inside WSL 2 emulation for Windows 10.

I also have the same problem, working with Docker version 19.03.12 inside WSL 2 emulation for Windows 10. Kernel version: 4.19.121-microsoft-standard.

Having same issue with AGX Xavier:
https://github.com/NVIDIA/nvidia-docker/issues/1203#issuecomment-670640220

Exact same issue here. Followed the NVIDIA guide.

Window 10 version 1909 build 18363.1049
Docker version 19.03.12
WSL2
Ubuntu 18.04 and 20.04
Kernel version: 4.19.121-microsoft-standard
Windows nvidia drivers 455.41
CUDA 11.1

The output of
nvidia-container-cli -k -d /dev/tty info

I0821 16:21:57.950311 5686 nvc.c:282] initializing library context (version=1.3.0, build=af0220ff5c503d9ac6a1b5a491918229edbb37a4)
I0821 16:21:57.950354 5686 nvc.c:256] using root /
I0821 16:21:57.950358 5686 nvc.c:257] using ldcache /etc/ld.so.cache
I0821 16:21:57.950376 5686 nvc.c:258] using unprivileged user 1000:1000
I0821 16:21:57.950389 5686 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0821 16:21:57.950454 5686 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0821 16:21:57.950514 5686 nvc.c:172] failed to detect NVIDIA devices
W0821 16:21:57.950641 5687 nvc.c:187] failed to set inheritable capabilities
W0821 16:21:57.950680 5687 nvc.c:188] skipping kernel modules load due to failure
I0821 16:21:57.950836 5688 driver.c:101] starting driver service
E0821 16:21:57.950966 5688 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0821 16:21:57.951083 5686 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
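The key line in that trace is `load library failed: libnvidia-ml.so.1`: the driver's NVML library isn't visible to the dynamic loader. A quick way to check what the loader can see (a sketch; `libc.so.6` is included only as a sanity check that is expected to be found):

```shell
# Report whether the dynamic loader cache knows about a given library.
LDCONFIG="$(command -v ldconfig || echo /sbin/ldconfig)"
check_lib() {
    if "$LDCONFIG" -p 2>/dev/null | grep -q "$1"; then
        echo "$1: found"
    else
        echo "$1: NOT found"
    fi
}
check_lib libnvidia-ml.so.1   # NOT found means the driver libraries are not exposed to this system
check_lib libc.so.6           # expected: found on any glibc system
```

On WSL 2, libnvidia-ml.so.1 is supplied by the Windows-side driver, so "NOT found" usually points at the Windows driver needing an update rather than anything inside the guest.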

Same here stuck

Same here stuck

The original issue described here, that has an error of:

nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.1, please update your driver to a newer version, or use an earlier cuda container

Is due to the fact that the original poster had an NVIDIA driver that was too old to run CUDA 10.1.

The poster acknowledged this and closed the issue on March 21st.
https://github.com/NVIDIA/nvidia-docker/issues/1225#issuecomment-601990042

Since that time, this issue has been reopened and commented on many times with unrelated error messages.

Since the original issue was resolved, I am going to close this issue again, and encourage you to open a new issue if you are still having problems with different errors.
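For anyone landing here with the same requirement error: NVIDIA's CUDA compatibility tables list the minimum Linux driver for each CUDA release (for example, 410.48 for CUDA 10.0 and 418.39 for CUDA 10.1). A minimal sketch of the comparison, with the driver string hard-coded where `nvidia-smi --query-gpu=driver_version --format=csv,noheader` would normally supply it:

```shell
DRIVER="396.54"     # hypothetical old driver, stands in for nvidia-smi output
REQUIRED="418.39"   # minimum Linux driver for CUDA 10.1 per NVIDIA's tables

# sort -V orders version strings numerically; if the smaller of the two is
# REQUIRED, then DRIVER >= REQUIRED and the cuda>=10.1 condition holds.
if [ "$(printf '%s\n%s\n' "$REQUIRED" "$DRIVER" | sort -V | head -n1)" = "$REQUIRED" ]; then
    echo "driver $DRIVER satisfies cuda>=10.1"
else
    echo "driver $DRIVER is too old for cuda>=10.1 -- update it, or use an earlier cuda image"
fi
```

If the driver is too old, either upgrade it or pull an image tag matching an older CUDA, e.g. `nvidia/cuda:9.0-base`.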

https://ngc.nvidia.com/catalog/containers/nvidia:l4t-base

Try using this base image. It solved all my Jetson Tegra arm64 architecture issues, and now I can seamlessly docker pull and use my Docker images across Jetson Tegra devices.

Anytime nvidia docker fails you will see an error that begins with:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: ...

This part of the error message is output by docker itself, and is out of our control.

It's the part after stderr: that is relevant to nvidia-docker.

In the original post, this error was:
```
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container
```

@gsss124 is this actually the same error response you were seeing? Given the description of your problem, it seems unlikely.

In any case, I would recommend performing your step 4 using docker's daemon.json file instead of editing the docker service directly:

$ cat /etc/docker/daemon.json
{
    "data-root": "/your/custom/location",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
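A malformed daemon.json will stop dockerd from starting at all, so it's worth sanity-checking the JSON before restarting the daemon. A small sketch, staging the file in /tmp purely for illustration (the real path is /etc/docker/daemon.json, and python3 is assumed to be available):

```shell
# Stage the example config and validate it before installing it.
cat > /tmp/daemon.json <<'EOF'
{
    "data-root": "/your/custom/location",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "daemon.json OK"
# then, on the real host:
#   sudo cp /tmp/daemon.json /etc/docker/daemon.json
#   sudo systemctl restart docker
#   docker info | grep -i runtime    # should list nvidia and show it as default
```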

Thanks for the reply. This was not the error; it was only related to OCI. docker info now shows the custom data-root location, but to my surprise Docker is still using the system drive: I see a reduction in space available on the system drive, while the space available on my custom data-root drive is unchanged. So, I will delete my reply above.

Thanks for this, but I tried this method and it did not work for me. I will give it another shot, adding a system-restart step. I even tried nvidia-container-runtime separately; that didn't work either. After editing docker.service, it showed data-root as my custom location but still used /var/lib/docker to store data! I don't understand what is happening.

To my horror, it created a new drive taking part of the system drive's space, gave it my custom data-root name, and renamed my old drive! It's not using /var/lib/docker, but a part of it renamed to my custom data-root name.

sudo service docker start

  • Starting Docker: docker [ OK ]

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\\n\\"\"": unknown.

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\\n\\"\"": unknown.

ldconfig -p | grep cuda
libicudata.so.66 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.66
libcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libcuda.so.1

ls -al /usr/lib/wsl/lib
total 70792
dr-xr-xr-x 1 root root 512 Sep 18 15:53 .
drwxr-xr-x 4 root root 4096 Sep 18 12:28 ..
-r--r--r-- 1 root root 124664 Aug 30 09:51 libcuda.so
-r--r--r-- 2 root root 832936 Sep 12 08:44 libd3d12.so
-r--r--r-- 2 root root 5073944 Sep 12 08:44 libd3d12core.so
-r--r--r-- 2 root root 25069816 Sep 12 08:44 libdirectml.so
-r--r--r-- 2 root root 878768 Sep 12 08:44 libdxcore.so
-r--r--r-- 1 root root 40496936 Aug 30 09:51 libnvwgf2umx.so

sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1
ln: failed to create symbolic link '/usr/lib/wsl/lib/libcuda.so.1': Read-only file system

It seems as if it's missing and the video driver is still required, unless there is something that can make it appear at that location.

ls: cannot access '/usr/lib/wsl/lib/libcuda.so.1'

any thoughts?

From my understanding, installing the video driver inside the Docker/Ubuntu guest is no longer required.

Directory of C:\Windows\System32\lxss\lib

09/18/2020  03:53 PM    <DIR>          .
08/30/2020 09:51 AM 124,664 libcuda.so
09/12/2020 08:44 AM 832,936 libd3d12.so
09/12/2020 08:44 AM 5,073,944 libd3d12core.so
09/12/2020 08:44 AM 25,069,816 libdirectml.so
09/12/2020 08:44 AM 878,768 libdxcore.so
08/30/2020 09:51 AM 40,496,936 libnvwgf2umx.so
6 File(s) 72,477,064 bytes
1 Dir(s) 643,723,309,056 bytes free

C:\Windows\System32\lxss\lib>mklink libcuda.so.1 libcuda.so
symbolic link created for libcuda.so.1 <<===>> libcuda.so

Still doesn't work, but seems closer.
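Since /usr/lib/wsl/lib is mounted read-only, the symlink has to live in a writable directory that is already on the loader path, such as /usr/lib/x86_64-linux-gnu (a workaround some people report; updating the Windows-side driver is the supported fix). The mechanics, demonstrated in a scratch directory so it can run without root:

```shell
# Demonstrate the symlink in a scratch dir; on a real WSL host the target is
# /usr/lib/wsl/lib/libcuda.so and the link goes in /usr/lib/x86_64-linux-gnu.
LIBDIR=$(mktemp -d)
touch "$LIBDIR/libcuda.so"                        # stand-in for the WSL-provided file
ln -s "$LIBDIR/libcuda.so" "$LIBDIR/libcuda.so.1"
ls -l "$LIBDIR/libcuda.so.1"
# on the real host:
#   sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/x86_64-linux-gnu/libcuda.so.1
#   sudo ldconfig
```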

More info (shell history leading up to docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark):
18 sudo find / -iname /usr/lib/wsl/lib/libcuda.so.1
19 sudo find / -iname libcuda.so.1
20 ldconfig -p | grep cuda
21 ls /usr/lib/wsl/lib/libcuda.so.1
22 #sudo ls -d libcuda.so.1
23 cd /
24 sudo ls -d libcuda.so.1
25 ls -al /usr/lib/wsl
26 ls -al /usr/lib/wsl/drivers
27 ls -al /usr/lib/wsl/drivers | grep -i libcuda*
28 ls -al /usr/lib/wsl/
29 ls -al /usr/lib/wsl/lib
30 sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1
31 sudo ln -s /usr/lib/wsl/lib/libcuda.so.1 /usr/lib/wsl/lib/libcuda.so
32 sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1
33 echo $LD_LIBRARY_PATH
34 sudo apt install nvidia-361-dev
35 nvidia-smi
36 sudo apt isntall nvidia-utils-435
37 sudo apt install nvidia-utils-435
38 cd %SYSTEMROOT%\System32lxsslib
39 cd %SYSTEMROOT%\
40 cd %SYSTEMROOT%
41 ls
42 ls /usr/lib/wsl/lib/
43 ls -al /usr/lib/wsl/lib/
44 docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
45 sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
46 sudo apt-remove nvidia-docker2
47 sudo apt-get remove nvidia-docker2
48 sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
49 docker run --rm --privileged nvidia/cuda nvidia-smi
50 nvidia-docker run --rm nvidia/cuda nvidia-smi
51 nvidia-docker run --rm --privileged nvidia/cuda nvidia-smi
52 docker run --rm --privileged nvidia/cuda nvidia-smi
53 nvidia-smi
54 sudo apt-get install nvidia-docker2
55 nvidia-docker run --rm --privileged nvidia/cuda nvidia-smi
56 docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
57 docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare
58 nvcc --version
59 sudo apt-get install nvidia-cuda-toolkit
60 docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare

I tried this again by editing /etc/docker/daemon.json and got the following stderr:
nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1

Full output:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1\\\\n\\\"\"": unknown

docker info now displays the required custom directory, and space is reduced in the right directory. Now it is an ldcache error. I checked here, but my seccomp output is YES:

$ cat /boot/config-$(uname -r) | grep -i seccomp
CONFIG_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y

_Please suggest what might be the problem._

Are you using a virtual machine? As stated by @klueska, the output after stderr: is what's of interest. Your error says:
stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory
Something related to the NVIDIA driver not being available where required.

@wanfuse123 please file a new issue if you need help debugging this. Your issue looks unrelated to the one here (especially since it seems you are running on Windows, and not linux).

@tytcc I also faced the same problem on an Ubuntu 16.04 machine. I have the latest driver (440.64.00) installed, and when I try to run the example
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
I get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\\\n\\\"\"": unknown. ERRO[0001] error waiting for container: context canceled

I also face the problem, did you solve it?

nvidia-smi does not work under WSL 2 as of right now. Use the following test
instead

"medium.com" + "how-to-use-nvidia-gpu-in-docker-to-run-tensorflow"

use their


Sorry, got cut off. Use their testing examples and that container. It costs 5 bucks for one year of access, but I thought it was worth it. (NOTE: I have nothing to do with their site; I just spent the five bucks for it.)

Anyway, use their testing examples.

You can't use "nvidia-smi"; it is not working right now in the containers. Apparently NVIDIA and Microsoft are working hard on the problem.


Update on the Medium link: look at the comments; I have posted an updated script that runs a simple test.

@elliothe I uninstalled CUDA, the NVIDIA drivers, nvidia-docker, and Docker, then installed everything again from scratch. This solved the problem for me.

I think that's right. It worked for me when I uninstalled the NVIDIA driver (version 460).
Thanks.

iser@iser:~$ sudo apt-get purge nvidia*
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'nvidia-kernel-common-418-server' for glob 'nvidia*'
Note, selecting 'nvidia-325-updates' for glob 'nvidia*'
Note, selecting 'nvidia-346-updates' for glob 'nvidia*'
Note, selecting 'nvidia-driver-binary' for glob 'nvidia*'
Note, selecting 'nvidia-331-dev' for glob 'nvidia*'
Note, selecting 'nvidia-304-updates-dev' for glob 'nvidia*'
Note, selecting 'nvidia-compute-utils-418-server' for glob 'nvidia*'
Note, selecting 'nvidia-384-dev' for glob 'nvidia*'
Note, selecting 'nvidia-docker2' for glob 'nvidia*'
Note, selecting 'nvidia-libopencl1-346-updates' for glob 'nvidia*'
Note, selecting 'nvidia-driver-440-server' for glob 'nvidia*'
Note, selecting 'nvidia-340-updates-uvm' for glob 'nvidia*'

------- The following is the output of a successful installation. -------

Adding group `iser' (GID 1000) ... Done.
Adding user `iser' ...
Adding new user `iser' (1000) with group `iser' ...
Creating home directory `/home/iser' ...
Copying files from `/etc/skel' ...
[ OK ] Congratulations! You have successfully finished setting up Apollo Dev Environment.
[ OK ] To login into the newly created apollo_dev_iser container, please run the following command:
[ OK ] bash docker/scripts/dev_into.sh
[ OK ] Enjoy!
