Nvidia-docker: cgroup issue with nvidia container runtime on Debian testing

Created on 7 Jan 2021 · 17 comments · Source: NVIDIA/nvidia-docker

1. Issue or feature description

Whenever I try to build or run an NVIDIA container, Docker fails with the error message:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

2. Steps to reproduce the issue

$ docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi

3. Information to attach (optional if deemed irrelevant)

 - [x] Some nvidia-container information from `nvidia-container-cli -k -d /dev/tty info`
```
    I0107 20:43:11.917241 36435 nvc.c:282] initializing library context (version=1.3.1, build=ac02636a318fe7dcc71eaeb3cc55d0c8541c1072)
    I0107 20:43:11.917283 36435 nvc.c:256] using root /
    I0107 20:43:11.917290 36435 nvc.c:257] using ldcache /etc/ld.so.cache
    I0107 20:43:11.917300 36435 nvc.c:258] using unprivileged user 1000:1000
    I0107 20:43:11.917316 36435 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
    I0107 20:43:11.917404 36435 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
    W0107 20:43:11.918351 36436 nvc.c:187] failed to set inheritable capabilities
    W0107 20:43:11.918381 36436 nvc.c:188] skipping kernel modules load due to failure
    I0107 20:43:11.918527 36437 driver.c:101] starting driver service
    I0107 20:43:11.921734 36435 nvc_info.c:680] requesting driver information with ''
    I0107 20:43:11.932012 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.450.80.02
    I0107 20:43:11.932402 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.450.80.02
    I0107 20:43:11.932976 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.450.80.02
    I0107 20:43:11.933027 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.450.80.02
    I0107 20:43:11.933435 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.450.80.02
    I0107 20:43:11.933470 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.450.80.02
    I0107 20:43:11.933501 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.80.02
    I0107 20:43:11.933991 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.450.80.02
    I0107 20:43:11.934024 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.450.80.02
    I0107 20:43:11.934094 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.450.80.02
    I0107 20:43:11.934545 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.450.80.02
    I0107 20:43:11.934976 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.450.80.02
    I0107 20:43:11.935258 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.450.80.02
    I0107 20:43:11.935783 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv2_nvidia.so.450.80.02
    I0107 20:43:11.936188 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.450.80.02
    I0107 20:43:11.936243 36435 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.450.80.02
    I0107 20:43:11.936622 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.450.80.02
    I0107 20:43:11.937013 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.450.80.02
    I0107 20:43:11.937296 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.450.80.02
    I0107 20:43:11.937573 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.450.80.02
    I0107 20:43:11.937881 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.450.80.02
    I0107 20:43:11.938438 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/nvidia/current/libGLX_nvidia.so.450.80.02
    I0107 20:43:11.938920 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/nvidia/current/libGLESv2_nvidia.so.450.80.02
    I0107 20:43:11.939282 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.450.80.02
    I0107 20:43:11.939730 36435 nvc_info.c:169] selecting /usr/lib/i386-linux-gnu/nvidia/current/libEGL_nvidia.so.450.80.02
    W0107 20:43:11.939751 36435 nvc_info.c:350] missing library libnvidia-opencl.so
    W0107 20:43:11.939756 36435 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
    W0107 20:43:11.939761 36435 nvc_info.c:350] missing library libnvidia-allocator.so
    W0107 20:43:11.939767 36435 nvc_info.c:350] missing library libnvidia-compiler.so
    W0107 20:43:11.939772 36435 nvc_info.c:350] missing library libnvidia-ngx.so
    W0107 20:43:11.939776 36435 nvc_info.c:350] missing library libvdpau_nvidia.so
    W0107 20:43:11.939780 36435 nvc_info.c:350] missing library libnvidia-opticalflow.so
    W0107 20:43:11.939785 36435 nvc_info.c:350] missing library libnvidia-fbc.so
    W0107 20:43:11.939790 36435 nvc_info.c:350] missing library libnvidia-ifr.so
    W0107 20:43:11.939795 36435 nvc_info.c:350] missing library libnvoptix.so
    W0107 20:43:11.939801 36435 nvc_info.c:350] missing library libnvidia-cbl.so
    W0107 20:43:11.939805 36435 nvc_info.c:354] missing compat32 library libnvidia-ml.so
    W0107 20:43:11.939810 36435 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
    W0107 20:43:11.939814 36435 nvc_info.c:354] missing compat32 library libcuda.so
    W0107 20:43:11.939818 36435 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
    W0107 20:43:11.939823 36435 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
    W0107 20:43:11.939828 36435 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
    W0107 20:43:11.939832 36435 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
    W0107 20:43:11.939837 36435 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
    W0107 20:43:11.939841 36435 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
    W0107 20:43:11.939846 36435 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
    W0107 20:43:11.939851 36435 nvc_info.c:354] missing compat32 library libnvidia-encode.so
    W0107 20:43:11.939856 36435 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
    W0107 20:43:11.939860 36435 nvc_info.c:354] missing compat32 library libnvcuvid.so
    W0107 20:43:11.939865 36435 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
    W0107 20:43:11.939870 36435 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
    W0107 20:43:11.939874 36435 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
    W0107 20:43:11.939879 36435 nvc_info.c:354] missing compat32 library libnvoptix.so
    W0107 20:43:11.939884 36435 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
    I0107 20:43:11.940108 36435 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-smi
    I0107 20:43:11.940153 36435 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-debugdump
    I0107 20:43:11.940169 36435 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
    W0107 20:43:11.941108 36435 nvc_info.c:376] missing binary nvidia-cuda-mps-control
    W0107 20:43:11.941117 36435 nvc_info.c:376] missing binary nvidia-cuda-mps-server
    I0107 20:43:11.941136 36435 nvc_info.c:438] listing device /dev/nvidiactl
    I0107 20:43:11.941142 36435 nvc_info.c:438] listing device /dev/nvidia-uvm
    I0107 20:43:11.941146 36435 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
    I0107 20:43:11.941151 36435 nvc_info.c:438] listing device /dev/nvidia-modeset
    I0107 20:43:11.941175 36435 nvc_info.c:317] listing ipc /run/nvidia-persistenced/socket
    W0107 20:43:11.941193 36435 nvc_info.c:321] missing ipc /tmp/nvidia-mps
    I0107 20:43:11.941198 36435 nvc_info.c:745] requesting device information with ''
    I0107 20:43:11.947879 36435 nvc_info.c:628] listing device /dev/nvidia0 (GPU-6518be5e-14ff-e277-21aa-73b482890bee at 00000000:07:00.0)
    NVRM version: 450.80.02
    CUDA version: 11.0

Device Index: 0
Device Minor: 0
Model: GeForce GTX 980 Ti
Brand: GeForce
GPU UUID: GPU-6518be5e-14ff-e277-21aa-73b482890bee
Bus Location: 00000000:07:00.0
Architecture: 5.2
I0107 20:43:11.947903 36435 nvc.c:337] shutting down library context
I0107 20:43:11.948696 36437 driver.c:156] terminating driver service
I0107 20:43:11.949026 36435 driver.c:196] driver service terminated successfully

```

 - [x] Kernel version from `uname -a`

```
Linux lambda 5.8.0-3-amd64 #1 SMP Debian 5.8.14-1 (2020-10-10) x86_64 GNU/Linux
```

 - [ ] Any relevant kernel output lines from `dmesg`
 - [x] Driver information from `nvidia-smi -a`

```
Thu Jan  7 15:45:08 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  On   | 00000000:07:00.0  On |                  N/A |
|  0%   45C    P5    29W / 250W |    403MiB /  6083MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3023      G   /usr/lib/xorg/Xorg                177MiB |
|    0   N/A  N/A      4833      G   /usr/bin/gnome-shell              166MiB |
|    0   N/A  N/A      7609      G   ...AAAAAAAAA= --shared-files       54MiB |
+-----------------------------------------------------------------------------+
```

 - [x] Docker version from `docker version`

```
Server: Docker Engine - Community
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8891c58
  Built:            Mon Dec 28 16:15:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 nvidia:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```

 - [x] NVIDIA packages version from `dpkg -l '*nvidia*'` _or_ `rpm -qa '*nvidia*'`

```
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================================-==============-============-=================================================================
un bumblebee-nvidia (no description available)
ii glx-alternative-nvidia 1.2.0 amd64 allows the selection of NVIDIA as GLX provider
un libegl-nvidia-legacy-390xx0 (no description available)
un libegl-nvidia-tesla-418-0 (no description available)
un libegl-nvidia-tesla-440-0 (no description available)
un libegl-nvidia-tesla-450-0 (no description available)
ii libegl-nvidia0:amd64 450.80.02-2 amd64 NVIDIA binary EGL library
ii libegl-nvidia0:i386 450.80.02-2 i386 NVIDIA binary EGL library
un libegl1-glvnd-nvidia (no description available)
un libegl1-nvidia (no description available)
un libgl1-glvnd-nvidia-glx (no description available)
ii libgl1-nvidia-glvnd-glx:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL/GLX library (GLVND variant)
ii libgl1-nvidia-glvnd-glx:i386 450.80.02-2 i386 NVIDIA binary OpenGL/GLX library (GLVND variant)
un libgl1-nvidia-glx (no description available)
un libgl1-nvidia-glx-any (no description available)
un libgl1-nvidia-glx-i386 (no description available)
un libgl1-nvidia-legacy-390xx-glx (no description available)
un libgl1-nvidia-tesla-418-glx (no description available)
un libgldispatch0-nvidia (no description available)
ii libgles-nvidia1:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL|ES 1.x library
ii libgles-nvidia1:i386 450.80.02-2 i386 NVIDIA binary OpenGL|ES 1.x library
ii libgles-nvidia2:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL|ES 2.x library
ii libgles-nvidia2:i386 450.80.02-2 i386 NVIDIA binary OpenGL|ES 2.x library
un libgles1-glvnd-nvidia (no description available)
un libgles2-glvnd-nvidia (no description available)
un libglvnd0-nvidia (no description available)
ii libglx-nvidia0:amd64 450.80.02-2 amd64 NVIDIA binary GLX library
ii libglx-nvidia0:i386 450.80.02-2 i386 NVIDIA binary GLX library
un libglx0-glvnd-nvidia (no description available)
un libnvidia-cbl (no description available)
un libnvidia-cfg.so.1 (no description available)
ii libnvidia-cfg1:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any (no description available)
ii libnvidia-container-tools 1.3.1-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.3.1-1 amd64 NVIDIA container runtime library
ii libnvidia-eglcore:amd64 450.80.02-2 amd64 NVIDIA binary EGL core libraries
ii libnvidia-eglcore:i386 450.80.02-2 i386 NVIDIA binary EGL core libraries
un libnvidia-eglcore-450.80.02 (no description available)
ii libnvidia-encode1:amd64 450.80.02-2 amd64 NVENC Video Encoding runtime library
ii libnvidia-glcore:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL/GLX core libraries
ii libnvidia-glcore:i386 450.80.02-2 i386 NVIDIA binary OpenGL/GLX core libraries
un libnvidia-glcore-450.80.02 (no description available)
ii libnvidia-glvkspirv:amd64 450.80.02-2 amd64 NVIDIA binary Vulkan Spir-V compiler library
ii libnvidia-glvkspirv:i386 450.80.02-2 i386 NVIDIA binary Vulkan Spir-V compiler library
un libnvidia-glvkspirv-450.80.02 (no description available)
un libnvidia-legacy-340xx-cfg1 (no description available)
un libnvidia-legacy-390xx-cfg1 (no description available)
ii libnvidia-ml-dev:amd64 11.1.1-3 amd64 NVIDIA Management Library (NVML) development files
un libnvidia-ml.so.1 (no description available)
ii libnvidia-ml1:amd64 450.80.02-2 amd64 NVIDIA Management Library (NVML) runtime library
ii libnvidia-ptxjitcompiler1:amd64 450.80.02-2 amd64 NVIDIA PTX JIT Compiler
ii libnvidia-rtcore:amd64 450.80.02-2 amd64 NVIDIA binary Vulkan ray tracing (rtcore) library
un libnvidia-rtcore-450.80.02 (no description available)
un libnvidia-tesla-418-cfg1 (no description available)
un libnvidia-tesla-440-cfg1 (no description available)
un libnvidia-tesla-450-cfg1 (no description available)
un libnvidia-tesla-450-cuda1 (no description available)
un libnvidia-tesla-450-ml1 (no description available)
un libopengl0-glvnd-nvidia (no description available)
ii nvidia-alternative 450.80.02-2 amd64 allows the selection of NVIDIA as GLX provider
un nvidia-alternative--kmod-alias (no description available)
un nvidia-alternative-legacy-173xx (no description available)
un nvidia-alternative-legacy-71xx (no description available)
un nvidia-alternative-legacy-96xx (no description available)
ii nvidia-container-runtime 3.4.0-1 amd64 NVIDIA container runtime
un nvidia-container-runtime-hook (no description available)
ii nvidia-container-toolkit 1.4.0-1 amd64 NVIDIA container runtime hook
ii nvidia-cuda-dev:amd64 11.1.1-3 amd64 NVIDIA CUDA development files
un nvidia-cuda-doc (no description available)
ii nvidia-cuda-gdb 11.1.1-3 amd64 NVIDIA CUDA Debugger (GDB)
un nvidia-cuda-mps (no description available)
ii nvidia-cuda-toolkit 11.1.1-3 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.1.1-3 all NVIDIA CUDA and OpenCL documentation
un nvidia-current (no description available)
un nvidia-current-updates (no description available)
un nvidia-docker (no description available)
ii nvidia-docker2 2.5.0-1 all nvidia-docker CLI wrapper
ii nvidia-driver 450.80.02-2 amd64 NVIDIA metapackage
un nvidia-driver-any (no description available)
ii nvidia-driver-bin 450.80.02-2 amd64 NVIDIA driver support binaries
un nvidia-driver-bin-450.80.02 (no description available)
un nvidia-driver-binary (no description available)
ii nvidia-driver-libs:amd64 450.80.02-2 amd64 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
ii nvidia-driver-libs:i386 450.80.02-2 i386 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
un nvidia-driver-libs-any (no description available)
un nvidia-driver-libs-nonglvnd (no description available)
ii nvidia-egl-common 450.80.02-2 amd64 NVIDIA binary EGL driver - common files
ii nvidia-egl-icd:amd64 450.80.02-2 amd64 NVIDIA EGL installable client driver (ICD)
ii nvidia-egl-icd:i386 450.80.02-2 i386 NVIDIA EGL installable client driver (ICD)
un nvidia-glx-any (no description available)
ii nvidia-installer-cleanup 20151021+12 amd64 cleanup after driver installation with the nvidia-installer
un nvidia-kernel-450.80.02 (no description available)
ii nvidia-kernel-common 20151021+12 amd64 NVIDIA binary kernel module support files
ii nvidia-kernel-dkms 450.80.02-2 amd64 NVIDIA binary kernel module DKMS source
un nvidia-kernel-source (no description available)
ii nvidia-kernel-support 450.80.02-2 amd64 NVIDIA binary kernel module support files
un nvidia-kernel-support--v1 (no description available)
un nvidia-kernel-support-any (no description available)
un nvidia-legacy-304xx-alternative (no description available)
un nvidia-legacy-304xx-driver (no description available)
un nvidia-legacy-340xx-alternative (no description available)
un nvidia-legacy-340xx-vdpau-driver (no description available)
un nvidia-legacy-390xx-vdpau-driver (no description available)
un nvidia-legacy-390xx-vulkan-icd (no description available)
ii nvidia-legacy-check 450.80.02-2 amd64 check for NVIDIA GPUs requiring a legacy driver
un nvidia-libopencl1 (no description available)
un nvidia-libopencl1-dev (no description available)
ii nvidia-modprobe 460.27.04-1 amd64 utility to load NVIDIA kernel modules and create device nodes
un nvidia-nonglvnd-vulkan-common (no description available)
un nvidia-nonglvnd-vulkan-icd (no description available)
un nvidia-opencl-dev (no description available)
un nvidia-opencl-icd (no description available)
un nvidia-openjdk-8-jre (no description available)
ii nvidia-persistenced 450.57-1 amd64 daemon to maintain persistent software state in the NVIDIA driver
ii nvidia-profiler 11.1.1-3 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-settings 450.80.02-1+b1 amd64 tool for configuring the NVIDIA graphics driver
un nvidia-settings-gtk-450.80.02 (no description available)
ii nvidia-smi 450.80.02-2 amd64 NVIDIA System Management Interface
ii nvidia-support 20151021+12 amd64 NVIDIA binary graphics driver support files
un nvidia-tesla-418-vdpau-driver (no description available)
un nvidia-tesla-418-vulkan-icd (no description available)
un nvidia-tesla-440-vdpau-driver (no description available)
un nvidia-tesla-440-vulkan-icd (no description available)
un nvidia-tesla-450-driver (no description available)
un nvidia-tesla-450-vulkan-icd (no description available)
un nvidia-tesla-alternative (no description available)
ii nvidia-vdpau-driver:amd64 450.80.02-2 amd64 Video Decode and Presentation API for Unix - NVIDIA driver
ii nvidia-visual-profiler 11.1.1-3 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
ii nvidia-vulkan-common 450.80.02-2 amd64 NVIDIA Vulkan driver - common files
ii nvidia-vulkan-icd:amd64 450.80.02-2 amd64 NVIDIA Vulkan installable client driver (ICD)
ii nvidia-vulkan-icd:i386 450.80.02-2 i386 NVIDIA Vulkan installable client driver (ICD)
un nvidia-vulkan-icd-any (no description available)
ii xserver-xorg-video-nvidia 450.80.02-2 amd64 NVIDIA binary Xorg driver
un xserver-xorg-video-nvidia-any (no description available)
un xserver-xorg-video-nvidia-legacy-304xx (no description available)

```

 - [x] NVIDIA container library version from `nvidia-container-cli -V`

```
version: 1.3.1
build date: 2020-12-14T14:18+00:00
build revision: ac02636a318fe7dcc71eaeb3cc55d0c8541c1072
build compiler: x86_64-linux-gnu-gcc-8 8.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

```

 - [ ] NVIDIA container library logs (see [troubleshooting](https://github.com/NVIDIA/nvidia-docker/wiki/Troubleshooting))
 - [x] Docker command, image and tag used

```
docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi
```


All 17 comments

Hi,

I'm experiencing the same issue. For now I've worked around it:

In `/etc/nvidia-container-runtime/config.toml` I've set `no-cgroups = true`. The container now starts, but the NVIDIA devices are not added to it; once I add the devices explicitly, the container works again.

Here are the relevant lines from my docker-compose.yml:

    devices:
      - /dev/nvidia0:/dev/nvidia0
      - /dev/nvidiactl:/dev/nvidiactl
      - /dev/nvidia-modeset:/dev/nvidia-modeset
      - /dev/nvidia-uvm:/dev/nvidia-uvm
      - /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools

This is equivalent to `docker run --device /dev/whatever ...`, but I'm not sure of the exact syntax.

Hope this helps.
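The compose `devices:` list in the comment above corresponds to repeated `--device HOST:CONTAINER` flags on `docker run`. Below is a minimal Python sketch that assembles such a command, assuming the same five device nodes and that the GPU is still requested with `--gpus all` as in the original reproduction command. `device_flags` and `run_command` are hypothetical helpers, not part of Docker or the NVIDIA tooling.

```python
import shlex
from typing import List, Sequence

# Device nodes copied from the docker-compose snippet above; adjust for your host.
NVIDIA_DEVICES = [
    "/dev/nvidia0",
    "/dev/nvidiactl",
    "/dev/nvidia-modeset",
    "/dev/nvidia-uvm",
    "/dev/nvidia-uvm-tools",
]


def device_flags(devices: Sequence[str]) -> List[str]:
    """Turn each host device path into a `--device HOST:CONTAINER` pair,
    exposing the node at the same path inside the container."""
    flags: List[str] = []
    for dev in devices:
        flags += ["--device", f"{dev}:{dev}"]
    return flags


def run_command(image: str, cmd: Sequence[str], devices: Sequence[str]) -> List[str]:
    """Assemble a `docker run` argv (hypothetical helper). `--gpus all` is kept
    from the reproduction command in this issue."""
    return ["docker", "run", "--rm", "--gpus", "all", *device_flags(devices), image, *cmd]


if __name__ == "__main__":
    argv = run_command("nvidia/cuda:11.0-base-ubuntu20.04", ["nvidia-smi"], NVIDIA_DEVICES)
    # Print a shell-quoted command for copy/paste; nothing is executed here.
    print(shlex.join(argv))
```

Running the script only prints the command; passing `argv` to `subprocess.run` would execute it instead.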

This seems to be related to the systemd upgrade to 247.2-2, which was uploaded to sid three weeks ago and has now made its way to testing. This commit highlights the change of cgroup hierarchy: https://salsa.debian.org/systemd-team/systemd/-/commit/170fb124a32884bd9975ee4ea9e1ffbbc2ee26b4

Indeed, the default setup no longer exposes /sys/fs/cgroup/devices, which libnvidia-container uses according to https://github.com/NVIDIA/libnvidia-container/blob/ac02636a318fe7dcc71eaeb3cc55d0c8541c1072/src/nvc_container.c#L379-L382

Setting the documented `systemd.unified_cgroup_hierarchy=false` kernel command-line parameter brings back the /sys/fs/cgroup/devices entry, and libnvidia-container is happier.
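To tell which cgroup layout a host is using, one can look for a cgroup v1 mount that carries the `devices` controller. The Python sketch below assumes the standard `/proc/self/mountinfo` format, where the fields after the `-` separator are the filesystem type, the mount source and the super options. It approximates the lookup described in the comment above and is not libnvidia-container's actual code. On a purely unified hierarchy only a `cgroup2` mount exists, so no `devices` mount is found, which matches the "cgroup subsystem devices not found" error. The layout names follow systemd's terms; the function names are hypothetical.

```python
import os
from typing import Iterator, List, Optional, Tuple


def parse_mounts(mountinfo: str) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (mount_point, fstype, super_options) for each mountinfo line."""
    for line in mountinfo.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # empty or malformed line
        sep = fields.index("-")  # marks the end of the variable-length optional fields
        if len(fields) < sep + 4:
            continue
        yield fields[4], fields[sep + 1], fields[sep + 3].split(",")


def devices_cgroup_mount(mountinfo: str) -> Optional[str]:
    """Return where the cgroup v1 'devices' controller is mounted, or None."""
    for mount_point, fstype, opts in parse_mounts(mountinfo):
        if fstype == "cgroup" and "devices" in opts:
            return mount_point
    return None


def cgroup_layout(mountinfo: str) -> str:
    """Classify the host as 'legacy', 'hybrid', 'unified' or 'none'."""
    fstypes = {fstype for _, fstype, _ in parse_mounts(mountinfo)}
    has_v1, has_v2 = "cgroup" in fstypes, "cgroup2" in fstypes
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "unified"
    if has_v1:
        return "legacy"
    return "none"


if __name__ == "__main__" and os.path.exists("/proc/self/mountinfo"):
    with open("/proc/self/mountinfo") as f:
        info = f.read()
    print("layout:", cgroup_layout(info))
    print("devices controller:", devices_cgroup_mount(info) or "not mounted")
```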

@lissyx Thank you for pointing out the crux of the issue.
We are in the process of rearchitecting the NVIDIA container stack so that issues such as this should not exist in the future: we will rely on runc (or whatever the configured container runtime is) to do all cgroup setup instead of doing it ourselves.

That said, this rearchitecting effort will take at least another 9 months to complete. I'm curious what the impact is, and how difficult it would be to add cgroup v2 support to libnvidia-container in the meantime to prevent issues like this until the rearchitecting is complete.
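Under cgroup v1, doing the cgroup setup yourself amounts to writing whitelist entries into the devices controller's `devices.allow` file, using the kernel's `TYPE MAJOR:MINOR ACCESS` format (for example `c 195:0 rw` for `/dev/nvidia0`, whose character-device major number is 195). cgroup v2 has no such file; device access there is governed by eBPF programs attached to the cgroup, which is why v1-only code breaks on a unified hierarchy. The sketch below only formats such a rule from a device node and never writes to a cgroup; `allow_rule` and `allow_rule_for_node` are hypothetical names, and the default `rw` access is an illustrative choice.

```python
import os
import stat

_ACCESS_FLAGS = set("rwm")  # read, write, mknod


def allow_rule(kind: str, major: int, minor: int, access: str = "rw") -> str:
    """Format one cgroup v1 devices.allow entry, e.g. 'c 195:0 rw'."""
    if kind not in ("c", "b"):
        raise ValueError("kind must be 'c' (char) or 'b' (block)")
    if not access or not set(access) <= _ACCESS_FLAGS:
        raise ValueError("access must be a combination of r, w and m")
    return f"{kind} {major}:{minor} {access}"


def allow_rule_for_node(path: str, access: str = "rw") -> str:
    """Build the rule for an existing device node from its type and numbers."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"
    else:
        raise ValueError(f"{path} is not a device node")
    return allow_rule(kind, os.major(st.st_rdev), os.minor(st.st_rdev), access)


if __name__ == "__main__":
    for node in ("/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"):
        if os.path.exists(node):
            print(node, "->", allow_rule_for_node(node))
```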

I wanted to chime in to say that I'm also experiencing this on Fedora 33.

Could the title be updated to indicate that it is systemd cgroup layout related?

I was under the impression this issue was related to adding cgroup v2 support.

The systemd cgroup layout issue was resolved in:
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/49

And released today as part of libnvidia-container v1.3.2:
https://github.com/NVIDIA/libnvidia-container/releases/tag/v1.3.2

If these resolve this issue, please comment and close. Thanks.
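To check whether a host already has the release mentioned above, the `version:` line printed by `nvidia-container-cli -V` can be compared against 1.3.2. This is a minimal Python sketch assuming the output format shown earlier in this issue (`version: 1.3.1`). A later report in this thread still hits the same error with 1.3.2 installed, so passing this check is only a first step. The function names are hypothetical.

```python
import re
import subprocess
from typing import Tuple

FIX_RELEASE = (1, 3, 2)  # libnvidia-container release named in the comment above


def parse_cli_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from `nvidia-container-cli -V` output."""
    match = re.search(r"^version:\s*(\d+)\.(\d+)\.(\d+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'version:' line found")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def has_fix_release(output: str) -> bool:
    """True if the reported version is at least FIX_RELEASE.
    Tuples compare numerically, so 1.10.0 correctly sorts after 1.3.2."""
    return parse_cli_version(output) >= FIX_RELEASE


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvidia-container-cli", "-V"],
                             capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        print("nvidia-container-cli not available")
    else:
        print(parse_cli_version(out), "ok" if has_fix_release(out) else "older than 1.3.2")
```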

Issue resolved by the latest release. Thank you everyone <3

Did you set the following parameter: systemd.unified_cgroup_hierarchy=false?

Or did you just upgrade all the packages?

For me it was solved by upgrading the package.

Thank you, @super-cooper, for the reply.
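To see whether the `systemd.unified_cgroup_hierarchy=false` parameter asked about above is actually on the running kernel's command line, one can read `/proc/cmdline`. This is a minimal Python sketch; it assumes simple whitespace-separated `key=value` tokens (no quoted values containing spaces) and treats the last occurrence of a key as the effective one, which is an assumption rather than documented behaviour. `cmdline_value` is a hypothetical helper.

```python
import os
from typing import Optional

PARAM = "systemd.unified_cgroup_hierarchy"


def cmdline_value(cmdline: str, key: str) -> Optional[str]:
    """Return the value of `key` on a kernel command line, '' for a bare flag,
    or None if absent. Later occurrences override earlier ones (assumption)."""
    value: Optional[str] = None
    for token in cmdline.split():
        name, has_eq, val = token.partition("=")  # split on the first '=' only
        if name == key:
            value = val if has_eq else ""
    return value


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        print(PARAM, "=", cmdline_value(f.read(), PARAM))
```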

I am having exactly the same issue on Debian Testing even after an upgrade.

1. Issue or feature description

docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

2. Steps to reproduce the issue

docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi

3. Information to attach (optional if deemed irrelevant)

 - [x] Some nvidia-container information from `nvidia-container-cli -k -d /dev/tty info`
I0130 05:23:50.494974 4486 nvc.c:282] initializing library context (version=1.3.2, build=fa9c778f687e9ac7be52b0299fa3b6ac2d9fbf93)
I0130 05:23:50.495160 4486 nvc.c:256] using root /
I0130 05:23:50.495178 4486 nvc.c:257] using ldcache /etc/ld.so.cache
I0130 05:23:50.495194 4486 nvc.c:258] using unprivileged user 1000:1000
I0130 05:23:50.495256 4486 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0130 05:23:50.495644 4486 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0130 05:23:50.499341 4487 nvc.c:187] failed to set inheritable capabilities
W0130 05:23:50.499369 4487 nvc.c:188] skipping kernel modules load due to failure
I0130 05:23:50.499601 4488 driver.c:101] starting driver service
I0130 05:23:50.504376 4486 nvc_info.c:680] requesting driver information with ''
I0130 05:23:50.506132 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.32.03
I0130 05:23:50.506191 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.460.32.03
I0130 05:23:50.506283 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.460.32.03
I0130 05:23:50.506375 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.460.32.03
I0130 05:23:50.506418 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.460.32.03
I0130 05:23:50.506467 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.32.03
I0130 05:23:50.506512 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.32.03
I0130 05:23:50.506557 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.32.03
I0130 05:23:50.506669 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.460.32.03
I0130 05:23:50.506714 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.460.32.03
I0130 05:23:50.507077 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.460.32.03
I0130 05:23:50.507376 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.460.32.03
I0130 05:23:50.507476 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv2_nvidia.so.460.32.03
I0130 05:23:50.507569 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.460.32.03
I0130 05:23:50.507669 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.460.32.03
W0130 05:23:50.507732 4486 nvc_info.c:350] missing library libnvidia-opencl.so
W0130 05:23:50.507741 4486 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W0130 05:23:50.507748 4486 nvc_info.c:350] missing library libnvidia-allocator.so
W0130 05:23:50.507754 4486 nvc_info.c:350] missing library libnvidia-compiler.so
W0130 05:23:50.507760 4486 nvc_info.c:350] missing library libnvidia-ngx.so
W0130 05:23:50.507766 4486 nvc_info.c:350] missing library libvdpau_nvidia.so
W0130 05:23:50.507772 4486 nvc_info.c:350] missing library libnvidia-encode.so
W0130 05:23:50.507781 4486 nvc_info.c:350] missing library libnvidia-opticalflow.so
W0130 05:23:50.507788 4486 nvc_info.c:350] missing library libnvcuvid.so
W0130 05:23:50.507796 4486 nvc_info.c:350] missing library libnvidia-fbc.so
W0130 05:23:50.507806 4486 nvc_info.c:350] missing library libnvidia-ifr.so
W0130 05:23:50.507815 4486 nvc_info.c:350] missing library libnvoptix.so
W0130 05:23:50.507823 4486 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W0130 05:23:50.507832 4486 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W0130 05:23:50.507848 4486 nvc_info.c:354] missing compat32 library libcuda.so
W0130 05:23:50.507859 4486 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W0130 05:23:50.507869 4486 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W0130 05:23:50.507880 4486 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W0130 05:23:50.507889 4486 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W0130 05:23:50.507897 4486 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W0130 05:23:50.507906 4486 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W0130 05:23:50.507915 4486 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W0130 05:23:50.507925 4486 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W0130 05:23:50.507933 4486 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W0130 05:23:50.507942 4486 nvc_info.c:354] missing compat32 library libnvcuvid.so
W0130 05:23:50.507950 4486 nvc_info.c:354] missing compat32 library libnvidia-eglcore.so
W0130 05:23:50.507960 4486 nvc_info.c:354] missing compat32 library libnvidia-glcore.so
W0130 05:23:50.507970 4486 nvc_info.c:354] missing compat32 library libnvidia-tls.so
W0130 05:23:50.507979 4486 nvc_info.c:354] missing compat32 library libnvidia-glsi.so
W0130 05:23:50.507988 4486 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W0130 05:23:50.507998 4486 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W0130 05:23:50.508007 4486 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W0130 05:23:50.508015 4486 nvc_info.c:354] missing compat32 library libnvoptix.so
W0130 05:23:50.508025 4486 nvc_info.c:354] missing compat32 library libGLX_nvidia.so
W0130 05:23:50.508031 4486 nvc_info.c:354] missing compat32 library libEGL_nvidia.so
W0130 05:23:50.508040 4486 nvc_info.c:354] missing compat32 library libGLESv2_nvidia.so
W0130 05:23:50.508050 4486 nvc_info.c:354] missing compat32 library libGLESv1_CM_nvidia.so
W0130 05:23:50.508060 4486 nvc_info.c:354] missing compat32 library libnvidia-glvkspirv.so
W0130 05:23:50.508068 4486 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I0130 05:23:50.508515 4486 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-smi
I0130 05:23:50.508580 4486 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-debugdump
I0130 05:23:50.508612 4486 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
W0130 05:23:50.509049 4486 nvc_info.c:376] missing binary nvidia-cuda-mps-control
W0130 05:23:50.509060 4486 nvc_info.c:376] missing binary nvidia-cuda-mps-server
I0130 05:23:50.509100 4486 nvc_info.c:438] listing device /dev/nvidiactl
I0130 05:23:50.509109 4486 nvc_info.c:438] listing device /dev/nvidia-uvm
I0130 05:23:50.509118 4486 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I0130 05:23:50.509127 4486 nvc_info.c:438] listing device /dev/nvidia-modeset
I0130 05:23:50.509168 4486 nvc_info.c:317] listing ipc /run/nvidia-persistenced/socket
W0130 05:23:50.509192 4486 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I0130 05:23:50.509200 4486 nvc_info.c:745] requesting device information with ''
I0130 05:23:50.516712 4486 nvc_info.c:628] listing device /dev/nvidia0 (GPU-6064a007-a943-7f11-1ad7-12ac87046652 at 00000000:01:00.0)
NVRM version:   460.32.03
CUDA version:   11.2

Device Index:   0
Device Minor:   0
Model:          GeForce GTX 960M
Brand:          GeForce
GPU UUID:       GPU-6064a007-a943-7f11-1ad7-12ac87046652
Bus Location:   00000000:01:00.0
Architecture:   5.0
I0130 05:23:50.516775 4486 nvc.c:337] shutting down library context
I0130 05:23:50.517704 4488 driver.c:156] terminating driver service
I0130 05:23:50.518087 4486 driver.c:196] driver service terminated successfully
  • [x] Kernel version from uname -a
Linux stas 5.10.0-2-amd64 #1 SMP Debian 5.10.9-1 (2021-01-20) x86_64 GNU/Linux
  • [x] Any relevant kernel output lines from dmesg
[  487.597570] docker0: port 1(vethb7a49e6) entered blocking state
[  487.597573] docker0: port 1(vethb7a49e6) entered disabled state
[  487.597786] device vethb7a49e6 entered promiscuous mode
[  487.773120] docker0: port 1(vethb7a49e6) entered disabled state
[  487.776548] device vethb7a49e6 left promiscuous mode
[  487.776556] docker0: port 1(vethb7a49e6) entered disabled state
  • [x] Driver information from nvidia-smi -a
Timestamp                                 : Sat Jan 30 08:26:51 2021
Driver Version                            : 460.32.03
CUDA Version                              : 11.2

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : GeForce GTX 960M
    Product Brand                         : GeForce
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : N/A
    GPU UUID                              : GPU-6064a007-a943-7f11-1ad7-12ac87046652
    Minor Number                          : 0
    VBIOS Version                         : 82.07.82.00.10
    MultiGPU Board                        : No
    Board ID                              : 0x100
    GPU Part Number                       : N/A
    Inforom Version
        Image Version                     : N/A
        OEM Object                        : N/A
        ECC Object                        : N/A
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x01
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x139B10DE
        Bus Id                            : 00000000:01:00.0
        Sub System Id                     : 0x380217AA
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 1
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : N/A
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : N/A
            HW Power Brake Slowdown       : N/A
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 4046 MiB
        Used                              : 4 MiB
        Free                              : 4042 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 1 MiB
        Free                              : 255 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
        Aggregate
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 33 C
        GPU Shutdown Temp                 : 101 C
        GPU Slowdown Temp                 : 96 C
        GPU Max Operating Temp            : 92 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : N/A
        Power Draw                        : N/A
        Power Limit                       : N/A
        Default Power Limit               : N/A
        Enforced Power Limit              : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Clocks
        Graphics                          : 135 MHz
        SM                                : 135 MHz
        Memory                            : 405 MHz
        Video                             : 405 MHz
    Applications Clocks
        Graphics                          : 1097 MHz
        Memory                            : 2505 MHz
    Default Applications Clocks
        Graphics                          : 1097 MHz
        Memory                            : 2505 MHz
    Max Clocks
        Graphics                          : 1202 MHz
        SM                                : 1202 MHz
        Memory                            : 2505 MHz
        Video                             : 1081 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 1351
            Type                          : G
            Name                          : /usr/lib/xorg/Xorg
            Used GPU Memory               : 2 MiB
  • [x] Docker version from docker version
Client: Docker Engine - Community
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        2291f61
 Built:             Mon Dec 28 16:17:34 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8891c58
  Built:            Mon Dec 28 16:15:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  • [x] NVIDIA packages version from dpkg -l '*nvidia*' _or_ rpm -qa '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                   Version                        Architecture Description
+++-======================================-==============================-============-=================================================================
un  bumblebee-nvidia                       <none>                         <none>       (no description available)
ii  glx-alternative-nvidia                 1.2.0                          amd64        allows the selection of NVIDIA as GLX provider
un  libegl-nvidia-legacy-390xx0            <none>                         <none>       (no description available)
un  libegl-nvidia-tesla-418-0              <none>                         <none>       (no description available)
un  libegl-nvidia-tesla-440-0              <none>                         <none>       (no description available)
un  libegl-nvidia-tesla-450-0              <none>                         <none>       (no description available)
ii  libegl-nvidia0:amd64                   460.32.03-1                    amd64        NVIDIA binary EGL library
un  libegl1-glvnd-nvidia                   <none>                         <none>       (no description available)
un  libegl1-nvidia                         <none>                         <none>       (no description available)
un  libgl1-glvnd-nvidia-glx                <none>                         <none>       (no description available)
ii  libgl1-nvidia-glvnd-glx:amd64          460.32.03-1                    amd64        NVIDIA binary OpenGL/GLX library (GLVND variant)
un  libgl1-nvidia-glx                      <none>                         <none>       (no description available)
un  libgl1-nvidia-glx-any                  <none>                         <none>       (no description available)
un  libgl1-nvidia-glx-i386                 <none>                         <none>       (no description available)
un  libgl1-nvidia-legacy-390xx-glx         <none>                         <none>       (no description available)
un  libgl1-nvidia-tesla-418-glx            <none>                         <none>       (no description available)
un  libgldispatch0-nvidia                  <none>                         <none>       (no description available)
ii  libgles-nvidia1:amd64                  460.32.03-1                    amd64        NVIDIA binary OpenGL|ES 1.x library
ii  libgles-nvidia2:amd64                  460.32.03-1                    amd64        NVIDIA binary OpenGL|ES 2.x library
un  libgles1-glvnd-nvidia                  <none>                         <none>       (no description available)
un  libgles2-glvnd-nvidia                  <none>                         <none>       (no description available)
un  libglvnd0-nvidia                       <none>                         <none>       (no description available)
ii  libglx-nvidia0:amd64                   460.32.03-1                    amd64        NVIDIA binary GLX library
un  libglx0-glvnd-nvidia                   <none>                         <none>       (no description available)
ii  libnvidia-cbl:amd64                    460.32.03-1                    amd64        NVIDIA binary Vulkan ray tracing (cbl) library
un  libnvidia-cbl-460.32.03                <none>                         <none>       (no description available)
un  libnvidia-cfg.so.1                     <none>                         <none>       (no description available)
ii  libnvidia-cfg1:amd64                   460.32.03-1                    amd64        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                     <none>                         <none>       (no description available)
ii  libnvidia-container-tools              1.3.2-1                        amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64             1.3.2-1                        amd64        NVIDIA container runtime library
ii  libnvidia-eglcore:amd64                460.32.03-1                    amd64        NVIDIA binary EGL core libraries
un  libnvidia-eglcore-460.32.03            <none>                         <none>       (no description available)
ii  libnvidia-glcore:amd64                 460.32.03-1                    amd64        NVIDIA binary OpenGL/GLX core libraries
un  libnvidia-glcore-460.32.03             <none>                         <none>       (no description available)
ii  libnvidia-glvkspirv:amd64              460.32.03-1                    amd64        NVIDIA binary Vulkan Spir-V compiler library
un  libnvidia-glvkspirv-460.32.03          <none>                         <none>       (no description available)
un  libnvidia-legacy-340xx-cfg1            <none>                         <none>       (no description available)
un  libnvidia-legacy-390xx-cfg1            <none>                         <none>       (no description available)
un  libnvidia-ml.so.1                      <none>                         <none>       (no description available)
ii  libnvidia-ml1:amd64                    460.32.03-1                    amd64        NVIDIA Management Library (NVML) runtime library
ii  libnvidia-ptxjitcompiler1:amd64        460.32.03-1                    amd64        NVIDIA PTX JIT Compiler
ii  libnvidia-rtcore:amd64                 460.32.03-1                    amd64        NVIDIA binary Vulkan ray tracing (rtcore) library
un  libnvidia-rtcore-460.32.03             <none>                         <none>       (no description available)
un  libnvidia-tesla-418-cfg1               <none>                         <none>       (no description available)
un  libnvidia-tesla-440-cfg1               <none>                         <none>       (no description available)
un  libnvidia-tesla-450-cfg1               <none>                         <none>       (no description available)
un  libopengl0-glvnd-nvidia                <none>                         <none>       (no description available)
ii  nvidia-alternative                     460.32.03-1                    amd64        allows the selection of NVIDIA as GLX provider
un  nvidia-alternative--kmod-alias         <none>                         <none>       (no description available)
un  nvidia-alternative-legacy-173xx        <none>                         <none>       (no description available)
un  nvidia-alternative-legacy-71xx         <none>                         <none>       (no description available)
un  nvidia-alternative-legacy-96xx         <none>                         <none>       (no description available)
ii  nvidia-container-runtime               3.4.1-1                        amd64        NVIDIA container runtime
un  nvidia-container-runtime-hook          <none>                         <none>       (no description available)
ii  nvidia-container-toolkit               1.4.1-1                        amd64        NVIDIA container runtime hook
un  nvidia-cuda-mps                        <none>                         <none>       (no description available)
un  nvidia-current                         <none>                         <none>       (no description available)
un  nvidia-current-updates                 <none>                         <none>       (no description available)
ii  nvidia-detect                          460.32.03-1                    amd64        NVIDIA GPU detection utility
un  nvidia-docker                          <none>                         <none>       (no description available)
ii  nvidia-docker2                         2.5.0-1                        all          nvidia-docker CLI wrapper
ii  nvidia-driver                          460.32.03-1                    amd64        NVIDIA metapackage
un  nvidia-driver-any                      <none>                         <none>       (no description available)
ii  nvidia-driver-bin                      460.32.03-1                    amd64        NVIDIA driver support binaries
un  nvidia-driver-bin-460.32.03            <none>                         <none>       (no description available)
un  nvidia-driver-binary                   <none>                         <none>       (no description available)
ii  nvidia-driver-libs:amd64               460.32.03-1                    amd64        NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
un  nvidia-driver-libs-any                 <none>                         <none>       (no description available)
un  nvidia-driver-libs-nonglvnd            <none>                         <none>       (no description available)
ii  nvidia-egl-common                      460.32.03-1                    amd64        NVIDIA binary EGL driver - common files
ii  nvidia-egl-icd:amd64                   460.32.03-1                    amd64        NVIDIA EGL installable client driver (ICD)
un  nvidia-glx-any                         <none>                         <none>       (no description available)
ii  nvidia-installer-cleanup               20151021+13                    amd64        cleanup after driver installation with the nvidia-installer
un  nvidia-kernel-460.32.03                <none>                         <none>       (no description available)
ii  nvidia-kernel-common                   20151021+13                    amd64        NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms                     460.32.03-1                    amd64        NVIDIA binary kernel module DKMS source
un  nvidia-kernel-source                   <none>                         <none>       (no description available)
ii  nvidia-kernel-support                  460.32.03-1                    amd64        NVIDIA binary kernel module support files
un  nvidia-kernel-support--v1              <none>                         <none>       (no description available)
un  nvidia-kernel-support-any              <none>                         <none>       (no description available)
un  nvidia-legacy-304xx-alternative        <none>                         <none>       (no description available)
un  nvidia-legacy-304xx-driver             <none>                         <none>       (no description available)
un  nvidia-legacy-340xx-alternative        <none>                         <none>       (no description available)
un  nvidia-legacy-340xx-vdpau-driver       <none>                         <none>       (no description available)
un  nvidia-legacy-390xx-vdpau-driver       <none>                         <none>       (no description available)
un  nvidia-legacy-390xx-vulkan-icd         <none>                         <none>       (no description available)
ii  nvidia-legacy-check                    460.32.03-1                    amd64        check for NVIDIA GPUs requiring a legacy driver
un  nvidia-libopencl1-dev                  <none>                         <none>       (no description available)
ii  nvidia-modprobe                        460.32.03-1                    amd64        utility to load NVIDIA kernel modules and create device nodes
un  nvidia-nonglvnd-vulkan-common          <none>                         <none>       (no description available)
un  nvidia-nonglvnd-vulkan-icd             <none>                         <none>       (no description available)
un  nvidia-opencl-icd                      <none>                         <none>       (no description available)
ii  nvidia-openjdk-8-jre                   9.+8u272-b10-0+deb9u1~11.1.1-4 amd64        Obsolete OpenJDK Java runtime, for NVIDIA applications
ii  nvidia-persistenced                    460.32.03-1                    amd64        daemon to maintain persistent software state in the NVIDIA driver
un  nvidia-settings                        <none>                         <none>       (no description available)
ii  nvidia-smi                             460.32.03-1                    amd64        NVIDIA System Management Interface
ii  nvidia-support                         20151021+13                    amd64        NVIDIA binary graphics driver support files
un  nvidia-tesla-418-vdpau-driver          <none>                         <none>       (no description available)
un  nvidia-tesla-418-vulkan-icd            <none>                         <none>       (no description available)
un  nvidia-tesla-440-vdpau-driver          <none>                         <none>       (no description available)
un  nvidia-tesla-440-vulkan-icd            <none>                         <none>       (no description available)
un  nvidia-tesla-450-vulkan-icd            <none>                         <none>       (no description available)
un  nvidia-tesla-alternative               <none>                         <none>       (no description available)
ii  nvidia-vdpau-driver:amd64              460.32.03-1                    amd64        Video Decode and Presentation API for Unix - NVIDIA driver
ii  nvidia-vulkan-common                   460.32.03-1                    amd64        NVIDIA Vulkan driver - common files
ii  nvidia-vulkan-icd:amd64                460.32.03-1                    amd64        NVIDIA Vulkan installable client driver (ICD)
un  nvidia-vulkan-icd-any                  <none>                         <none>       (no description available)
ii  xserver-xorg-video-nvidia              460.32.03-1                    amd64        NVIDIA binary Xorg driver
un  xserver-xorg-video-nvidia-any          <none>                         <none>       (no description available)
un  xserver-xorg-video-nvidia-legacy-304xx <none>                         <none>       (no description available)
  • [x] NVIDIA container library version from nvidia-container-cli -V
version: 1.3.2
build date: 2021-01-25T11:07+00:00
build revision: fa9c778f687e9ac7be52b0299fa3b6ac2d9fbf93
build compiler: x86_64-linux-gnu-gcc-8 8.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • [x] NVIDIA container library logs (see troubleshooting)
    /var/log/nvidia-container-toolkit.log is not generated.
  • [x] Docker command, image and tag used
docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi

@klueska Could you please check the issue?

@regzon thanks for indicating that this is still an issue. Could you please check what your systemd cgroup configuration is? (See, for example, this other issue, which shows similar behaviour: https://github.com/docker/cli/issues/2104#issuecomment-535560873)

@regzon your issue is likely related to the fact that libnvidia-container does not support cgroups v2.
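
The error text itself (`cgroup subsystem devices not found`) suggests a quick host-side check: is there a cgroup v1 hierarchy with the `devices` controller mounted at all? The sketch below is a hypothetical helper (`has_v1_devices_hierarchy` is not part of any NVIDIA tool). It assumes the standard `/proc/mounts` layout, where v1 hierarchies use filesystem type `cgroup` and list their controllers in the mount options, while the unified v2 tree uses type `cgroup2`.

```python
def has_v1_devices_hierarchy(mounts_text: str) -> bool:
    """Return True if a cgroup v1 hierarchy with the 'devices' controller is mounted.

    mounts_text is assumed to be the content of /proc/mounts:
    "<source> <mountpoint> <fstype> <options> <dump> <pass>" per line.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        fstype = fields[2]
        options = fields[3].split(",")
        # v1 hierarchies have fstype "cgroup"; the unified v2 tree is "cgroup2"
        # and never carries a per-controller "devices" option.
        if fstype == "cgroup" and "devices" in options:
            return True
    return False
```

On a host booted with the unified hierarchy only, `has_v1_devices_hierarchy(open("/proc/mounts").read())` would be expected to return `False`; in the hybrid or legacy layout it should return `True`.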

You will need to follow the suggestion in the comment above (https://github.com/NVIDIA/nvidia-docker/issues/1447#issuecomment-760059332) to force systemd to use v1 cgroups.
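
Forcing v1 cgroups is done with the kernel parameter `systemd.unified_cgroup_hierarchy=false`. On a Debian host that boots via GRUB (an assumption), that usually means adding it to `GRUB_CMDLINE_LINUX_DEFAULT` (or `GRUB_CMDLINE_LINUX`) in `/etc/default/grub`, running `update-grub`, and rebooting. The hypothetical helper below only rewrites one such line as a string, assuming the value is wrapped in double quotes; it replaces any existing setting of the same parameter instead of appending a duplicate.

```python
import re

PARAM = "systemd.unified_cgroup_hierarchy=false"


def add_kernel_param(grub_line: str, param: str = PARAM) -> str:
    """Return a GRUB_CMDLINE_LINUX[_DEFAULT]="..." line with `param` set.

    Assumes the double-quoted form used by Debian's /etc/default/grub.
    """
    match = re.match(r'^(GRUB_CMDLINE_LINUX(?:_DEFAULT)?)="(.*)"\s*$', grub_line)
    if match is None:
        raise ValueError("not a GRUB_CMDLINE_LINUX line: " + grub_line)
    key, value = match.groups()
    name = param.split("=", 1)[0]
    # Drop any earlier setting of the same parameter, then append the new one.
    args = [a for a in value.split() if a.split("=", 1)[0] != name]
    args.append(param)
    return f'{key}="{" ".join(args)}"'
```

Editing the file by hand is just as valid; the point of the sketch is that the parameter must end up on the kernel command line, not in a Docker or NVIDIA configuration file.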

In any case, we do not officially support Debian Testing or cgroups v2 (yet).

@elezar @klueska thank you for your help. When forcing systemd not to use the unified hierarchy, everything works fine. I thought that the latest libnvidia-container upgrade would resolve the issue (as it did for @super-cooper). But if the upgrade is not intended to fix the issue with cgroups, then everything is fine.

@klueska I'm having the same "issue", i.e. missing support for cgroups v2 (which I would very much like for other reasons).
Is there already an issue for this to track?

We are not planning on building support for cgroups v2 into the existing nvidia-docker stack.

Please see my comment above for more info:
https://github.com/NVIDIA/nvidia-docker/issues/1447#issuecomment-760189260

Let me rephrase it then: I want to use nvidia-docker on a system where cgroup v2 is enabled (systemd.unified_cgroup_hierarchy=true).
Right now this is not working and this bug is closed. So is there an issue that I can track to know when I can use nvidia-docker on hosts with cgroup v2 enabled?
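
To see which mode a host was actually asked to boot in, one can look for `systemd.unified_cgroup_hierarchy` in `/proc/cmdline`. The sketch below is hypothetical: the boolean spellings mirror, as far as we know, those accepted by systemd's boolean parser; treating a bare flag as `true` and letting the last occurrence win are assumptions of this sketch, not documented guarantees. `None` means the parameter is absent and systemd falls back to its own default.

```python
from typing import Optional

_TRUE = {"1", "yes", "y", "true", "t", "on"}
_FALSE = {"0", "no", "n", "false", "f", "off"}


def unified_hierarchy_requested(cmdline: str) -> Optional[bool]:
    """Parse systemd.unified_cgroup_hierarchy from a kernel command line string."""
    result: Optional[bool] = None
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key != "systemd.unified_cgroup_hierarchy":
            continue
        if not sep:
            result = True  # assumption: a bare flag is read as "true"
        elif value.lower() in _TRUE:
            result = True
        elif value.lower() in _FALSE:
            result = False
        # unrecognised values are ignored; the last valid occurrence wins (assumption)
    return result
```

For example, `unified_hierarchy_requested(open("/proc/cmdline").read())` returns `False` after the workaround above has been applied and the host rebooted.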

We have it tracked in our internal JIRA, with a link to this issue as the place to report back once the work is complete:
https://github.com/NVIDIA/libnvidia-container/issues/111
