Whenever I try to build or run an NVIDIA container, Docker fails with the error message:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
$ docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi
nvidia-container-cli -k -d /dev/tty info
Device Index: 0
Device Minor: 0
Model: GeForce GTX 980 Ti
Brand: GeForce
GPU UUID: GPU-6518be5e-14ff-e277-21aa-73b482890bee
Bus Location: 00000000:07:00.0
Architecture: 5.2
I0107 20:43:11.947903 36435 nvc.c:337] shutting down library context
I0107 20:43:11.948696 36437 driver.c:156] terminating driver service
I0107 20:43:11.949026 36435 driver.c:196] driver service terminated successfully
- [x] Kernel version from `uname -a`
Linux lambda 5.8.0-3-amd64 #1 SMP Debian 5.8.14-1 (2020-10-10) x86_64 GNU/Linux
- [ ] Any relevant kernel output lines from `dmesg`
- [x] Driver information from `nvidia-smi -a`
```
Thu Jan 7 15:45:08 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 980 Ti On | 00000000:07:00.0 On | N/A |
| 0% 45C P5 29W / 250W | 403MiB / 6083MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3023 G /usr/lib/xorg/Xorg 177MiB |
| 0 N/A N/A 4833 G /usr/bin/gnome-shell 166MiB |
| 0 N/A N/A 7609 G ...AAAAAAAAA= --shared-files 54MiB |
+-----------------------------------------------------------------------------+
- [x] Docker version from `docker version`
- [x] NVIDIA packages version from `dpkg -l '*nvidia*'` _or_ `rpm -qa '*nvidia*'`
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================================-==============-============-=================================================================
un bumblebee-nvidia
ii glx-alternative-nvidia 1.2.0 amd64 allows the selection of NVIDIA as GLX provider
un libegl-nvidia-legacy-390xx0
un libegl-nvidia-tesla-418-0
un libegl-nvidia-tesla-440-0
un libegl-nvidia-tesla-450-0
ii libegl-nvidia0:amd64 450.80.02-2 amd64 NVIDIA binary EGL library
ii libegl-nvidia0:i386 450.80.02-2 i386 NVIDIA binary EGL library
un libegl1-glvnd-nvidia
un libegl1-nvidia
un libgl1-glvnd-nvidia-glx
ii libgl1-nvidia-glvnd-glx:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL/GLX library (GLVND variant)
ii libgl1-nvidia-glvnd-glx:i386 450.80.02-2 i386 NVIDIA binary OpenGL/GLX library (GLVND variant)
un libgl1-nvidia-glx
un libgl1-nvidia-glx-any
un libgl1-nvidia-glx-i386
un libgl1-nvidia-legacy-390xx-glx
un libgl1-nvidia-tesla-418-glx
un libgldispatch0-nvidia
ii libgles-nvidia1:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL|ES 1.x library
ii libgles-nvidia1:i386 450.80.02-2 i386 NVIDIA binary OpenGL|ES 1.x library
ii libgles-nvidia2:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL|ES 2.x library
ii libgles-nvidia2:i386 450.80.02-2 i386 NVIDIA binary OpenGL|ES 2.x library
un libgles1-glvnd-nvidia
un libgles2-glvnd-nvidia
un libglvnd0-nvidia
ii libglx-nvidia0:amd64 450.80.02-2 amd64 NVIDIA binary GLX library
ii libglx-nvidia0:i386 450.80.02-2 i386 NVIDIA binary GLX library
un libglx0-glvnd-nvidia
un libnvidia-cbl
un libnvidia-cfg.so.1
ii libnvidia-cfg1:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any
ii libnvidia-container-tools 1.3.1-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.3.1-1 amd64 NVIDIA container runtime library
ii libnvidia-eglcore:amd64 450.80.02-2 amd64 NVIDIA binary EGL core libraries
ii libnvidia-eglcore:i386 450.80.02-2 i386 NVIDIA binary EGL core libraries
un libnvidia-eglcore-450.80.02
ii libnvidia-encode1:amd64 450.80.02-2 amd64 NVENC Video Encoding runtime library
ii libnvidia-glcore:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL/GLX core libraries
ii libnvidia-glcore:i386 450.80.02-2 i386 NVIDIA binary OpenGL/GLX core libraries
un libnvidia-glcore-450.80.02
ii libnvidia-glvkspirv:amd64 450.80.02-2 amd64 NVIDIA binary Vulkan Spir-V compiler library
ii libnvidia-glvkspirv:i386 450.80.02-2 i386 NVIDIA binary Vulkan Spir-V compiler library
un libnvidia-glvkspirv-450.80.02
un libnvidia-legacy-340xx-cfg1
un libnvidia-legacy-390xx-cfg1
ii libnvidia-ml-dev:amd64 11.1.1-3 amd64 NVIDIA Management Library (NVML) development files
un libnvidia-ml.so.1
ii libnvidia-ml1:amd64 450.80.02-2 amd64 NVIDIA Management Library (NVML) runtime library
ii libnvidia-ptxjitcompiler1:amd64 450.80.02-2 amd64 NVIDIA PTX JIT Compiler
ii libnvidia-rtcore:amd64 450.80.02-2 amd64 NVIDIA binary Vulkan ray tracing (rtcore) library
un libnvidia-rtcore-450.80.02
un libnvidia-tesla-418-cfg1
un libnvidia-tesla-440-cfg1
un libnvidia-tesla-450-cfg1
un libnvidia-tesla-450-cuda1
un libnvidia-tesla-450-ml1
un libopengl0-glvnd-nvidia
ii nvidia-alternative 450.80.02-2 amd64 allows the selection of NVIDIA as GLX provider
un nvidia-alternative--kmod-alias
un nvidia-alternative-legacy-173xx
un nvidia-alternative-legacy-71xx
un nvidia-alternative-legacy-96xx
ii nvidia-container-runtime 3.4.0-1 amd64 NVIDIA container runtime
un nvidia-container-runtime-hook
ii nvidia-container-toolkit 1.4.0-1 amd64 NVIDIA container runtime hook
ii nvidia-cuda-dev:amd64 11.1.1-3 amd64 NVIDIA CUDA development files
un nvidia-cuda-doc
ii nvidia-cuda-gdb 11.1.1-3 amd64 NVIDIA CUDA Debugger (GDB)
un nvidia-cuda-mps
ii nvidia-cuda-toolkit 11.1.1-3 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.1.1-3 all NVIDIA CUDA and OpenCL documentation
un nvidia-current
un nvidia-current-updates
un nvidia-docker
ii nvidia-docker2 2.5.0-1 all nvidia-docker CLI wrapper
ii nvidia-driver 450.80.02-2 amd64 NVIDIA metapackage
un nvidia-driver-any
ii nvidia-driver-bin 450.80.02-2 amd64 NVIDIA driver support binaries
un nvidia-driver-bin-450.80.02
un nvidia-driver-binary
ii nvidia-driver-libs:amd64 450.80.02-2 amd64 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
ii nvidia-driver-libs:i386 450.80.02-2 i386 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
un nvidia-driver-libs-any
un nvidia-driver-libs-nonglvnd
ii nvidia-egl-common 450.80.02-2 amd64 NVIDIA binary EGL driver - common files
ii nvidia-egl-icd:amd64 450.80.02-2 amd64 NVIDIA EGL installable client driver (ICD)
ii nvidia-egl-icd:i386 450.80.02-2 i386 NVIDIA EGL installable client driver (ICD)
un nvidia-glx-any
ii nvidia-installer-cleanup 20151021+12 amd64 cleanup after driver installation with the nvidia-installer
un nvidia-kernel-450.80.02
ii nvidia-kernel-common 20151021+12 amd64 NVIDIA binary kernel module support files
ii nvidia-kernel-dkms 450.80.02-2 amd64 NVIDIA binary kernel module DKMS source
un nvidia-kernel-source
ii nvidia-kernel-support 450.80.02-2 amd64 NVIDIA binary kernel module support files
un nvidia-kernel-support--v1
un nvidia-kernel-support-any
un nvidia-legacy-304xx-alternative
un nvidia-legacy-304xx-driver
un nvidia-legacy-340xx-alternative
un nvidia-legacy-340xx-vdpau-driver
un nvidia-legacy-390xx-vdpau-driver
un nvidia-legacy-390xx-vulkan-icd
ii nvidia-legacy-check 450.80.02-2 amd64 check for NVIDIA GPUs requiring a legacy driver
un nvidia-libopencl1
un nvidia-libopencl1-dev
ii nvidia-modprobe 460.27.04-1 amd64 utility to load NVIDIA kernel modules and create device nodes
un nvidia-nonglvnd-vulkan-common
un nvidia-nonglvnd-vulkan-icd
un nvidia-opencl-dev
un nvidia-opencl-icd
un nvidia-openjdk-8-jre
ii nvidia-persistenced 450.57-1 amd64 daemon to maintain persistent software state in the NVIDIA driver
ii nvidia-profiler 11.1.1-3 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-settings 450.80.02-1+b1 amd64 tool for configuring the NVIDIA graphics driver
un nvidia-settings-gtk-450.80.02
ii nvidia-smi 450.80.02-2 amd64 NVIDIA System Management Interface
ii nvidia-support 20151021+12 amd64 NVIDIA binary graphics driver support files
un nvidia-tesla-418-vdpau-driver
un nvidia-tesla-418-vulkan-icd
un nvidia-tesla-440-vdpau-driver
un nvidia-tesla-440-vulkan-icd
un nvidia-tesla-450-driver
un nvidia-tesla-450-vulkan-icd
un nvidia-tesla-alternative
ii nvidia-vdpau-driver:amd64 450.80.02-2 amd64 Video Decode and Presentation API for Unix - NVIDIA driver
ii nvidia-visual-profiler 11.1.1-3 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
ii nvidia-vulkan-common 450.80.02-2 amd64 NVIDIA Vulkan driver - common files
ii nvidia-vulkan-icd:amd64 450.80.02-2 amd64 NVIDIA Vulkan installable client driver (ICD)
ii nvidia-vulkan-icd:i386 450.80.02-2 i386 NVIDIA Vulkan installable client driver (ICD)
un nvidia-vulkan-icd-any
ii xserver-xorg-video-nvidia 450.80.02-2 amd64 NVIDIA binary Xorg driver
un xserver-xorg-video-nvidia-any
un xserver-xorg-video-nvidia-legacy-304xx
- [x] NVIDIA container library version from `nvidia-container-cli -V`
version: 1.3.1
build date: 2020-12-14T14:18+00:00
build revision: ac02636a318fe7dcc71eaeb3cc55d0c8541c1072
build compiler: x86_64-linux-gnu-gcc-8 8.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
- [ ] NVIDIA container library logs (see [troubleshooting](https://github.com/NVIDIA/nvidia-docker/wiki/Troubleshooting))
- [x] Docker command, image and tag used
docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi
```
Hi,
I'm experiencing the same issue. For now I've worked around it:
In /etc/nvidia-container-runtime/config.toml I've set no-cgroups = true, and now the container starts, but the NVIDIA devices are not added to the container. Once the devices are passed in explicitly, the container works again.
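For reference, a minimal sketch of that setting; the layout here follows a stock config.toml, so surrounding keys may differ on other installs:
```
# /etc/nvidia-container-runtime/config.toml (relevant section only)
[nvidia-container-cli]
# Skip the hook's own device-cgroup setup; the device nodes must then be
# passed to the container explicitly, as in the docker-compose snippet below.
no-cgroups = true
```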
Here are the relevant lines from my docker-compose.yml:
devices:
- /dev/nvidia0:/dev/nvidia0
- /dev/nvidiactl:/dev/nvidiactl
- /dev/nvidia-modeset:/dev/nvidia-modeset
- /dev/nvidia-uvm:/dev/nvidia-uvm
- /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools
This is equivalent to docker run --device /dev/whatever ..., but I'm not sure of the exact syntax.
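For reference, a sketch of the equivalent `docker run` invocation (untested; `--device` accepts `host-path[:container-path]`, and the container path defaults to the host path):
```
docker run --rm --gpus all \
  --device /dev/nvidia0 \
  --device /dev/nvidiactl \
  --device /dev/nvidia-modeset \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools \
  nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi
```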
Hope this helps.
This seems to be related to the systemd upgrade to 247.2-2, which was uploaded to sid three weeks ago and has now made its way to testing. This commit highlights the change of cgroup hierarchy: https://salsa.debian.org/systemd-team/systemd/-/commit/170fb124a32884bd9975ee4ea9e1ffbbc2ee26b4
Indeed, the default setup no longer exposes /sys/fs/cgroup/devices, which libnvidia-container uses according to https://github.com/NVIDIA/libnvidia-container/blob/ac02636a318fe7dcc71eaeb3cc55d0c8541c1072/src/nvc_container.c#L379-L382
Using the documented systemd.unified_cgroup_hierarchy=false kernel command-line parameter brings back the /sys/fs/cgroup/devices entry, and libnvidia-container is happier.
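For anyone wanting to apply this, a sketch of persisting the parameter via GRUB on a Debian-style system (assumes GRUB is the bootloader):
```
# 1. Edit /etc/default/grub and append the flag, e.g.:
#      GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=false"
# 2. Regenerate the GRUB config and reboot:
sudo update-grub
sudo reboot
```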
@lissyx Thank you for pointing out the crux of the issue.
We are in the process of rearchitecting the NVIDIA container stack in such a way that issues like this should not exist in the future (because we will rely on runc, or whatever the configured container runtime is, to do all cgroup setup instead of doing it ourselves).
That said, this rearchitecting effort will take at least another 9 months to complete. I'm curious what the impact is, and how difficult it would be to add cgroup v2 support to libnvidia-container in the meantime, to prevent issues like this until the rearchitecting is complete.
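For context, delegating cgroup setup to the runtime means expressing device access in the OCI spec instead of writing to the cgroup filesystem directly. A hypothetical fragment of a container's config.json (the majors/minors are illustrative; /dev/nvidia0 is conventionally char 195:0 and /dev/nvidiactl 195:255):
```
{
  "linux": {
    "resources": {
      "devices": [
        { "allow": false, "access": "rwm" },
        { "allow": true, "type": "c", "major": 195, "minor": 0, "access": "rw" },
        { "allow": true, "type": "c", "major": 195, "minor": 255, "access": "rw" }
      ]
    }
  }
}
```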
Wanted to chime in to say that I'm also experiencing this on Fedora 33.
Could the title be updated to indicate that it is systemd cgroup layout related?
I was under the impression this issue was related to adding cgroup v2 support.
The systemd cgroup layout issue was resolved in:
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/49
And released today as part of libnvidia-container v1.3.2:
https://github.com/NVIDIA/libnvidia-container/releases/tag/v1.3.2
If these resolve this issue, please comment and close. Thanks.
Issue resolved by the latest release. Thank you everyone <3
Did you set the following parameter: systemd.unified_cgroup_hierarchy=false?
Or did you just upgrade all the packages?
For me it was solved by upgrading the package.
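For reference, on a Debian-style system the upgrade would look something like this (package names taken from the dpkg listings in this thread):
```
sudo apt-get update
sudo apt-get install --only-upgrade \
  libnvidia-container1 libnvidia-container-tools \
  nvidia-container-toolkit nvidia-container-runtime
nvidia-container-cli -V   # should now report 1.3.2 or later
```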
Thank you, @super-cooper, for the reply.
I am having exactly the same issue on Debian Testing even after an upgrade.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi
nvidia-container-cli -k -d /dev/tty info
I0130 05:23:50.494974 4486 nvc.c:282] initializing library context (version=1.3.2, build=fa9c778f687e9ac7be52b0299fa3b6ac2d9fbf93)
I0130 05:23:50.495160 4486 nvc.c:256] using root /
I0130 05:23:50.495178 4486 nvc.c:257] using ldcache /etc/ld.so.cache
I0130 05:23:50.495194 4486 nvc.c:258] using unprivileged user 1000:1000
I0130 05:23:50.495256 4486 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0130 05:23:50.495644 4486 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0130 05:23:50.499341 4487 nvc.c:187] failed to set inheritable capabilities
W0130 05:23:50.499369 4487 nvc.c:188] skipping kernel modules load due to failure
I0130 05:23:50.499601 4488 driver.c:101] starting driver service
I0130 05:23:50.504376 4486 nvc_info.c:680] requesting driver information with ''
I0130 05:23:50.506132 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.32.03
I0130 05:23:50.506191 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.460.32.03
I0130 05:23:50.506283 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.460.32.03
I0130 05:23:50.506375 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.460.32.03
I0130 05:23:50.506418 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.460.32.03
I0130 05:23:50.506467 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.32.03
I0130 05:23:50.506512 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.32.03
I0130 05:23:50.506557 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.32.03
I0130 05:23:50.506669 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.460.32.03
I0130 05:23:50.506714 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.460.32.03
I0130 05:23:50.507077 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.460.32.03
I0130 05:23:50.507376 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.460.32.03
I0130 05:23:50.507476 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv2_nvidia.so.460.32.03
I0130 05:23:50.507569 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.460.32.03
I0130 05:23:50.507669 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.460.32.03
W0130 05:23:50.507732 4486 nvc_info.c:350] missing library libnvidia-opencl.so
W0130 05:23:50.507741 4486 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W0130 05:23:50.507748 4486 nvc_info.c:350] missing library libnvidia-allocator.so
W0130 05:23:50.507754 4486 nvc_info.c:350] missing library libnvidia-compiler.so
W0130 05:23:50.507760 4486 nvc_info.c:350] missing library libnvidia-ngx.so
W0130 05:23:50.507766 4486 nvc_info.c:350] missing library libvdpau_nvidia.so
W0130 05:23:50.507772 4486 nvc_info.c:350] missing library libnvidia-encode.so
W0130 05:23:50.507781 4486 nvc_info.c:350] missing library libnvidia-opticalflow.so
W0130 05:23:50.507788 4486 nvc_info.c:350] missing library libnvcuvid.so
W0130 05:23:50.507796 4486 nvc_info.c:350] missing library libnvidia-fbc.so
W0130 05:23:50.507806 4486 nvc_info.c:350] missing library libnvidia-ifr.so
W0130 05:23:50.507815 4486 nvc_info.c:350] missing library libnvoptix.so
W0130 05:23:50.507823 4486 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W0130 05:23:50.507832 4486 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W0130 05:23:50.507848 4486 nvc_info.c:354] missing compat32 library libcuda.so
W0130 05:23:50.507859 4486 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W0130 05:23:50.507869 4486 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W0130 05:23:50.507880 4486 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W0130 05:23:50.507889 4486 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W0130 05:23:50.507897 4486 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W0130 05:23:50.507906 4486 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W0130 05:23:50.507915 4486 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W0130 05:23:50.507925 4486 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W0130 05:23:50.507933 4486 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W0130 05:23:50.507942 4486 nvc_info.c:354] missing compat32 library libnvcuvid.so
W0130 05:23:50.507950 4486 nvc_info.c:354] missing compat32 library libnvidia-eglcore.so
W0130 05:23:50.507960 4486 nvc_info.c:354] missing compat32 library libnvidia-glcore.so
W0130 05:23:50.507970 4486 nvc_info.c:354] missing compat32 library libnvidia-tls.so
W0130 05:23:50.507979 4486 nvc_info.c:354] missing compat32 library libnvidia-glsi.so
W0130 05:23:50.507988 4486 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W0130 05:23:50.507998 4486 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W0130 05:23:50.508007 4486 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W0130 05:23:50.508015 4486 nvc_info.c:354] missing compat32 library libnvoptix.so
W0130 05:23:50.508025 4486 nvc_info.c:354] missing compat32 library libGLX_nvidia.so
W0130 05:23:50.508031 4486 nvc_info.c:354] missing compat32 library libEGL_nvidia.so
W0130 05:23:50.508040 4486 nvc_info.c:354] missing compat32 library libGLESv2_nvidia.so
W0130 05:23:50.508050 4486 nvc_info.c:354] missing compat32 library libGLESv1_CM_nvidia.so
W0130 05:23:50.508060 4486 nvc_info.c:354] missing compat32 library libnvidia-glvkspirv.so
W0130 05:23:50.508068 4486 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I0130 05:23:50.508515 4486 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-smi
I0130 05:23:50.508580 4486 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-debugdump
I0130 05:23:50.508612 4486 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
W0130 05:23:50.509049 4486 nvc_info.c:376] missing binary nvidia-cuda-mps-control
W0130 05:23:50.509060 4486 nvc_info.c:376] missing binary nvidia-cuda-mps-server
I0130 05:23:50.509100 4486 nvc_info.c:438] listing device /dev/nvidiactl
I0130 05:23:50.509109 4486 nvc_info.c:438] listing device /dev/nvidia-uvm
I0130 05:23:50.509118 4486 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I0130 05:23:50.509127 4486 nvc_info.c:438] listing device /dev/nvidia-modeset
I0130 05:23:50.509168 4486 nvc_info.c:317] listing ipc /run/nvidia-persistenced/socket
W0130 05:23:50.509192 4486 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I0130 05:23:50.509200 4486 nvc_info.c:745] requesting device information with ''
I0130 05:23:50.516712 4486 nvc_info.c:628] listing device /dev/nvidia0 (GPU-6064a007-a943-7f11-1ad7-12ac87046652 at 00000000:01:00.0)
NVRM version: 460.32.03
CUDA version: 11.2
Device Index: 0
Device Minor: 0
Model: GeForce GTX 960M
Brand: GeForce
GPU UUID: GPU-6064a007-a943-7f11-1ad7-12ac87046652
Bus Location: 00000000:01:00.0
Architecture: 5.0
I0130 05:23:50.516775 4486 nvc.c:337] shutting down library context
I0130 05:23:50.517704 4488 driver.c:156] terminating driver service
I0130 05:23:50.518087 4486 driver.c:196] driver service terminated successfully
uname -a
Linux stas 5.10.0-2-amd64 #1 SMP Debian 5.10.9-1 (2021-01-20) x86_64 GNU/Linux
dmesg
[ 487.597570] docker0: port 1(vethb7a49e6) entered blocking state
[ 487.597573] docker0: port 1(vethb7a49e6) entered disabled state
[ 487.597786] device vethb7a49e6 entered promiscuous mode
[ 487.773120] docker0: port 1(vethb7a49e6) entered disabled state
[ 487.776548] device vethb7a49e6 left promiscuous mode
[ 487.776556] docker0: port 1(vethb7a49e6) entered disabled state
nvidia-smi -a
Timestamp : Sat Jan 30 08:26:51 2021
Driver Version : 460.32.03
CUDA Version : 11.2
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : GeForce GTX 960M
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-6064a007-a943-7f11-1ad7-12ac87046652
Minor Number : 0
VBIOS Version : 82.07.82.00.10
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x139B10DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x380217AA
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : N/A
HW Power Brake Slowdown : N/A
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 4046 MiB
Used : 4 MiB
Free : 4042 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 1 MiB
Free : 255 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 33 C
GPU Shutdown Temp : 101 C
GPU Slowdown Temp : 96 C
GPU Max Operating Temp : 92 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 405 MHz
Video : 405 MHz
Applications Clocks
Graphics : 1097 MHz
Memory : 2505 MHz
Default Applications Clocks
Graphics : 1097 MHz
Memory : 2505 MHz
Max Clocks
Graphics : 1202 MHz
SM : 1202 MHz
Memory : 2505 MHz
Video : 1081 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 1351
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 2 MiB
docker version
Client: Docker Engine - Community
Version: 20.10.2
API version: 1.41
Go version: go1.13.15
Git commit: 2291f61
Built: Mon Dec 28 16:17:34 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.2
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 8891c58
Built: Mon Dec 28 16:15:28 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
dpkg -l '*nvidia*' _or_ rpm -qa '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================================-==============================-============-=================================================================
un bumblebee-nvidia <none> <none> (no description available)
ii glx-alternative-nvidia 1.2.0 amd64 allows the selection of NVIDIA as GLX provider
un libegl-nvidia-legacy-390xx0 <none> <none> (no description available)
un libegl-nvidia-tesla-418-0 <none> <none> (no description available)
un libegl-nvidia-tesla-440-0 <none> <none> (no description available)
un libegl-nvidia-tesla-450-0 <none> <none> (no description available)
ii libegl-nvidia0:amd64 460.32.03-1 amd64 NVIDIA binary EGL library
un libegl1-glvnd-nvidia <none> <none> (no description available)
un libegl1-nvidia <none> <none> (no description available)
un libgl1-glvnd-nvidia-glx <none> <none> (no description available)
ii libgl1-nvidia-glvnd-glx:amd64 460.32.03-1 amd64 NVIDIA binary OpenGL/GLX library (GLVND variant)
un libgl1-nvidia-glx <none> <none> (no description available)
un libgl1-nvidia-glx-any <none> <none> (no description available)
un libgl1-nvidia-glx-i386 <none> <none> (no description available)
un libgl1-nvidia-legacy-390xx-glx <none> <none> (no description available)
un libgl1-nvidia-tesla-418-glx <none> <none> (no description available)
un libgldispatch0-nvidia <none> <none> (no description available)
ii libgles-nvidia1:amd64 460.32.03-1 amd64 NVIDIA binary OpenGL|ES 1.x library
ii libgles-nvidia2:amd64 460.32.03-1 amd64 NVIDIA binary OpenGL|ES 2.x library
un libgles1-glvnd-nvidia <none> <none> (no description available)
un libgles2-glvnd-nvidia <none> <none> (no description available)
un libglvnd0-nvidia <none> <none> (no description available)
ii libglx-nvidia0:amd64 460.32.03-1 amd64 NVIDIA binary GLX library
un libglx0-glvnd-nvidia <none> <none> (no description available)
ii libnvidia-cbl:amd64 460.32.03-1 amd64 NVIDIA binary Vulkan ray tracing (cbl) library
un libnvidia-cbl-460.32.03 <none> <none> (no description available)
un libnvidia-cfg.so.1 <none> <none> (no description available)
ii libnvidia-cfg1:amd64 460.32.03-1 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any <none> <none> (no description available)
ii libnvidia-container-tools 1.3.2-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.3.2-1 amd64 NVIDIA container runtime library
ii libnvidia-eglcore:amd64 460.32.03-1 amd64 NVIDIA binary EGL core libraries
un libnvidia-eglcore-460.32.03 <none> <none> (no description available)
ii libnvidia-glcore:amd64 460.32.03-1 amd64 NVIDIA binary OpenGL/GLX core libraries
un libnvidia-glcore-460.32.03 <none> <none> (no description available)
ii libnvidia-glvkspirv:amd64 460.32.03-1 amd64 NVIDIA binary Vulkan Spir-V compiler library
un libnvidia-glvkspirv-460.32.03 <none> <none> (no description available)
un libnvidia-legacy-340xx-cfg1 <none> <none> (no description available)
un libnvidia-legacy-390xx-cfg1 <none> <none> (no description available)
un libnvidia-ml.so.1 <none> <none> (no description available)
ii libnvidia-ml1:amd64 460.32.03-1 amd64 NVIDIA Management Library (NVML) runtime library
ii libnvidia-ptxjitcompiler1:amd64 460.32.03-1 amd64 NVIDIA PTX JIT Compiler
ii libnvidia-rtcore:amd64 460.32.03-1 amd64 NVIDIA binary Vulkan ray tracing (rtcore) library
un libnvidia-rtcore-460.32.03 <none> <none> (no description available)
un libnvidia-tesla-418-cfg1 <none> <none> (no description available)
un libnvidia-tesla-440-cfg1 <none> <none> (no description available)
un libnvidia-tesla-450-cfg1 <none> <none> (no description available)
un libopengl0-glvnd-nvidia <none> <none> (no description available)
ii nvidia-alternative 460.32.03-1 amd64 allows the selection of NVIDIA as GLX provider
un nvidia-alternative--kmod-alias <none> <none> (no description available)
un nvidia-alternative-legacy-173xx <none> <none> (no description available)
un nvidia-alternative-legacy-71xx <none> <none> (no description available)
un nvidia-alternative-legacy-96xx <none> <none> (no description available)
ii nvidia-container-runtime 3.4.1-1 amd64 NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.4.1-1 amd64 NVIDIA container runtime hook
un nvidia-cuda-mps <none> <none> (no description available)
un nvidia-current <none> <none> (no description available)
un nvidia-current-updates <none> <none> (no description available)
ii nvidia-detect 460.32.03-1 amd64 NVIDIA GPU detection utility
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.5.0-1 all nvidia-docker CLI wrapper
ii nvidia-driver 460.32.03-1 amd64 NVIDIA metapackage
un nvidia-driver-any <none> <none> (no description available)
ii nvidia-driver-bin 460.32.03-1 amd64 NVIDIA driver support binaries
un nvidia-driver-bin-460.32.03 <none> <none> (no description available)
un nvidia-driver-binary <none> <none> (no description available)
ii nvidia-driver-libs:amd64 460.32.03-1 amd64 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
un nvidia-driver-libs-any <none> <none> (no description available)
un nvidia-driver-libs-nonglvnd <none> <none> (no description available)
ii nvidia-egl-common 460.32.03-1 amd64 NVIDIA binary EGL driver - common files
ii nvidia-egl-icd:amd64 460.32.03-1 amd64 NVIDIA EGL installable client driver (ICD)
un nvidia-glx-any <none> <none> (no description available)
ii nvidia-installer-cleanup 20151021+13 amd64 cleanup after driver installation with the nvidia-installer
un nvidia-kernel-460.32.03 <none> <none> (no description available)
ii nvidia-kernel-common 20151021+13 amd64 NVIDIA binary kernel module support files
ii nvidia-kernel-dkms 460.32.03-1 amd64 NVIDIA binary kernel module DKMS source
un nvidia-kernel-source <none> <none> (no description available)
ii nvidia-kernel-support 460.32.03-1 amd64 NVIDIA binary kernel module support files
un nvidia-kernel-support--v1 <none> <none> (no description available)
un nvidia-kernel-support-any <none> <none> (no description available)
un nvidia-legacy-304xx-alternative <none> <none> (no description available)
un nvidia-legacy-304xx-driver <none> <none> (no description available)
un nvidia-legacy-340xx-alternative <none> <none> (no description available)
un nvidia-legacy-340xx-vdpau-driver <none> <none> (no description available)
un nvidia-legacy-390xx-vdpau-driver <none> <none> (no description available)
un nvidia-legacy-390xx-vulkan-icd <none> <none> (no description available)
ii nvidia-legacy-check 460.32.03-1 amd64 check for NVIDIA GPUs requiring a legacy driver
un nvidia-libopencl1-dev <none> <none> (no description available)
ii nvidia-modprobe 460.32.03-1 amd64 utility to load NVIDIA kernel modules and create device nodes
un nvidia-nonglvnd-vulkan-common <none> <none> (no description available)
un nvidia-nonglvnd-vulkan-icd <none> <none> (no description available)
un nvidia-opencl-icd <none> <none> (no description available)
ii nvidia-openjdk-8-jre 9.+8u272-b10-0+deb9u1~11.1.1-4 amd64 Obsolete OpenJDK Java runtime, for NVIDIA applications
ii nvidia-persistenced 460.32.03-1 amd64 daemon to maintain persistent software state in the NVIDIA driver
un nvidia-settings <none> <none> (no description available)
ii nvidia-smi 460.32.03-1 amd64 NVIDIA System Management Interface
ii nvidia-support 20151021+13 amd64 NVIDIA binary graphics driver support files
un nvidia-tesla-418-vdpau-driver <none> <none> (no description available)
un nvidia-tesla-418-vulkan-icd <none> <none> (no description available)
un nvidia-tesla-440-vdpau-driver <none> <none> (no description available)
un nvidia-tesla-440-vulkan-icd <none> <none> (no description available)
un nvidia-tesla-450-vulkan-icd <none> <none> (no description available)
un nvidia-tesla-alternative <none> <none> (no description available)
ii nvidia-vdpau-driver:amd64 460.32.03-1 amd64 Video Decode and Presentation API for Unix - NVIDIA driver
ii nvidia-vulkan-common 460.32.03-1 amd64 NVIDIA Vulkan driver - common files
ii nvidia-vulkan-icd:amd64 460.32.03-1 amd64 NVIDIA Vulkan installable client driver (ICD)
un nvidia-vulkan-icd-any <none> <none> (no description available)
ii xserver-xorg-video-nvidia 460.32.03-1 amd64 NVIDIA binary Xorg driver
un xserver-xorg-video-nvidia-any <none> <none> (no description available)
un xserver-xorg-video-nvidia-legacy-304xx <none> <none> (no description available)
nvidia-container-cli -V
version: 1.3.2
build date: 2021-01-25T11:07+00:00
build revision: fa9c778f687e9ac7be52b0299fa3b6ac2d9fbf93
build compiler: x86_64-linux-gnu-gcc-8 8.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
/var/log/nvidia-container-toolkit.log is not generated.
docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi
@klueska Could you please check the issue?
@regzon thanks for indicating that this is still an issue. Could you please check what your systemd cgroup configuration is? (See, for example, this other issue which shows similar behaviour: https://github.com/docker/cli/issues/2104#issuecomment-535560873)
@regzon your issue is likely related to the fact that libnvidia-container does not support cgroups v2.
You will need to follow the suggestion in the comments above for https://github.com/NVIDIA/nvidia-docker/issues/1447#issuecomment-760059332 to force systemd to use v1 cgroups.
In any case -- we do not officially support Debian Testing nor cgroups v2 (yet).
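A quick way to check which cgroup hierarchy a host is running (a sketch; `cgroup2fs` indicates the unified v2 hierarchy that lacks the devices controller libnvidia-container expects):
```
stat -fc %T /sys/fs/cgroup/
# "cgroup2fs" -> unified cgroup v2 hierarchy (no /sys/fs/cgroup/devices)
# "tmpfs"     -> legacy/hybrid v1 hierarchy (devices controller available)
ls -d /sys/fs/cgroup/devices 2>/dev/null || echo "devices cgroup not mounted"
```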
@elezar @klueska thank you for your help. When forcing systemd not to use the unified hierarchy, everything works fine. I thought that the latest libnvidia-container upgrade would resolve the issue (as it did for @super-cooper). But if the upgrade is not intended to fix the cgroups v2 issue, then everything is fine.
@klueska I'm having the same "issue", i.e. missing support for cgroups v2 (which I would very much like to have for other reasons).
Is there already an issue for this to track?
We are not planning on building support for cgroups v2 into the existing nvidia-docker stack.
Please see my comment above for more info:
https://github.com/NVIDIA/nvidia-docker/issues/1447#issuecomment-760189260
Let me rephrase it then: I want to use nvidia-docker on a system where cgroup v2 is enabled (systemd.unified_cgroup_hierarchy=true).
Right now this is not working and this bug is closed. So is there an issue that I can track to know when I can use nvidia-docker on hosts with cgroup v2 enabled?
We have it tracked in our internal JIRA, with a link to this issue as the location to report once the work is complete:
https://github.com/NVIDIA/libnvidia-container/issues/111