Nvidia-docker: nvidia-container-cli: detection error: driver error: failed to process request

Created on 5 Nov 2019 · 2 Comments · Source: NVIDIA/nvidia-docker

1. Issue or feature description

Cannot run any container with --gpus. The command below fails with the following error:

docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

docker: Error response from daemon: OCI runtime create failed: 
container_linux.go:346: starting container process caused "process_linux.go:449: 
container init caused \"process_linux.go:432:
running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , 
stderr: nvidia-container-cli: detection error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled 
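
The useful part of that message is buried under several layers of escaped quotes. A minimal sketch for pulling it out, assuming the error text is piped in on stdin (the function name `extract_nvidia_error` is made up for illustration and is not part of any NVIDIA or Docker tool):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nvidia-container-cli message contained in a
# Docker "OCI runtime create failed" error read from stdin.
extract_nvidia_error() {
  # Join wrapped lines, then keep the text from "nvidia-container-cli:" up to
  # the first backslash or double quote (the start of the escaping debris).
  tr -d '\n' | grep -o 'nvidia-container-cli: [^\\"]*' | head -n 1
}
```

Usage would look like `docker run --gpus all nvidia/cuda:10.0-base nvidia-smi 2>&1 | extract_nvidia_error`, which for the error above leaves only the `detection error: driver error: failed to process request` line.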

2. Steps to reproduce the issue

  • Pop! OS 18.04 LTS
  • NVIDIA Driver 435.21
  • Docker version 19.03.4, build 9013bf583a
  • Run docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

3. Information to attach

  • [X] Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --

I1105 09:36:05.700922 25049 nvc.c:281] initializing library context (version=1.0.6, build=0000000000000000000000000000000000000000)
I1105 09:36:05.701033 25049 nvc.c:255] using root /
I1105 09:36:05.701054 25049 nvc.c:256] using ldcache /etc/ld.so.cache
I1105 09:36:05.701068 25049 nvc.c:257] using unprivileged user 65534:65534
I1105 09:36:05.704115 25050 nvc.c:191] loading kernel module nvidia
I1105 09:36:05.704692 25050 nvc.c:203] loading kernel module nvidia_uvm
I1105 09:36:05.705045 25050 nvc.c:211] loading kernel module nvidia_modeset
I1105 09:36:05.705877 25051 driver.c:133] starting driver service
I1105 09:36:05.763137 25049 nvc_info.c:437] requesting driver information with ''
nvidia-container-cli: detection error: driver error: failed to process request
I1105 09:36:05.763273 25049 nvc.c:318] shutting down library context
W1105 09:36:05.773442 25049 driver.c:220] terminating driver service (forced)
I1105 09:36:05.855677 25049 driver.c:233] driver service terminated with signal 9
  • [X] Kernel version from uname -a
    5.3.0-20-generic #21+system76~1572310493~18.04~b3805b2-Ubuntu

  • [X] NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                      Version                   Architecture              Description
+++-=========================================-=========================-=========================-========================================================================================
un  libgldispatch0-nvidia                     <none>                    <none>                    (no description available)
ii  libnvidia-cfg1-435:amd64                  435.21-1pop1~1571925584~1 amd64                     NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                        <none>                    <none>                    (no description available)
un  libnvidia-common                          <none>                    <none>                    (no description available)
ii  libnvidia-common-435                      435.21-1pop1~1571925584~1 all                       Shared files used by the NVIDIA libraries
rc  libnvidia-compute-418:amd64               430.34-1pop1~1563200531~1 amd64                     Transitional package for libnvidia-compute-430
rc  libnvidia-compute-430:amd64               435.21-1pop1~1567200870~1 amd64                     Transitional package for libnvidia-compute-435
ii  libnvidia-compute-435:amd64               435.21-1pop1~1571925584~1 amd64                     NVIDIA libcompute package
ii  libnvidia-compute-435:i386                435.21-1pop1~1571925584~1 i386                      NVIDIA libcompute package
ii  libnvidia-container-tools                 1.0.6-1pop1~1571281295~18 amd64                     NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                1.0.6-1pop1~1571281295~18 amd64                     NVIDIA container runtime library
un  libnvidia-decode                          <none>                    <none>                    (no description available)
ii  libnvidia-decode-435:amd64                435.21-1pop1~1571925584~1 amd64                     NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-435:i386                 435.21-1pop1~1571925584~1 i386                      NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                          <none>                    <none>                    (no description available)
ii  libnvidia-encode-435:amd64                435.21-1pop1~1571925584~1 amd64                     NVENC Video Encoding runtime library
ii  libnvidia-encode-435:i386                 435.21-1pop1~1571925584~1 i386                      NVENC Video Encoding runtime library
un  libnvidia-fbc1                            <none>                    <none>                    (no description available)
ii  libnvidia-fbc1-435:amd64                  435.21-1pop1~1571925584~1 amd64                     NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-435:i386                   435.21-1pop1~1571925584~1 i386                      NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                              <none>                    <none>                    (no description available)
ii  libnvidia-gl-435:amd64                    435.21-1pop1~1571925584~1 amd64                     NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-435:i386                     435.21-1pop1~1571925584~1 i386                      NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-ifr1                            <none>                    <none>                    (no description available)
ii  libnvidia-ifr1-435:amd64                  435.21-1pop1~1571925584~1 amd64                     NVIDIA OpenGL-based Inband Frame Readback runtime library
ii  libnvidia-ifr1-435:i386                   435.21-1pop1~1571925584~1 i386                      NVIDIA OpenGL-based Inband Frame Readback runtime library
un  libnvidia-ml1                             <none>                    <none>                    (no description available)
un  nvidia-304                                <none>                    <none>                    (no description available)
un  nvidia-340                                <none>                    <none>                    (no description available)
un  nvidia-384                                <none>                    <none>                    (no description available)
un  nvidia-390                                <none>                    <none>                    (no description available)
un  nvidia-common                             <none>                    <none>                    (no description available)
ii  nvidia-compute-utils-435                  435.21-1pop1~1571925584~1 amd64                     NVIDIA compute utilities
ii  nvidia-container-runtime                  3.1.4-0pop1~1569270714~18 amd64                     NVIDIA container runtime
un  nvidia-container-runtime-hook             <none>                    <none>                    (no description available)
ii  nvidia-container-toolkit                  1.0.5-0pop1~1569270707~18 amd64                     NVIDIA container runtime hook
ii  nvidia-dkms-435                           435.21-1pop1~1571925584~1 amd64                     NVIDIA DKMS package
un  nvidia-dkms-kernel                        <none>                    <none>                    (no description available)
ii  nvidia-driver-435                         435.21-1pop1~1571925584~1 amd64                     NVIDIA driver metapackage
un  nvidia-driver-binary                      <none>                    <none>                    (no description available)
un  nvidia-kernel-common                      <none>                    <none>                    (no description available)
ii  nvidia-kernel-common-435                  435.21-1pop1~1571925584~1 amd64                     Shared files used with the kernel module
un  nvidia-kernel-source                      <none>                    <none>                    (no description available)
ii  nvidia-kernel-source-435                  435.21-1pop1~1571925584~1 amd64                     NVIDIA kernel source package
un  nvidia-legacy-340xx-vdpau-driver          <none>                    <none>                    (no description available)
un  nvidia-opencl-icd                         <none>                    <none>                    (no description available)
un  nvidia-persistenced                       <none>                    <none>                    (no description available)
un  nvidia-prime                              <none>                    <none>                    (no description available)
ii  nvidia-settings                           390.77-0ubuntu0.18.04.1   amd64                     Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                    <none>                    <none>                    (no description available)
un  nvidia-smi                                <none>                    <none>                    (no description available)
un  nvidia-utils                              <none>                    <none>                    (no description available)
ii  nvidia-utils-435                          435.21-1pop1~1571925584~1 amd64                     NVIDIA driver support binaries
un  nvidia-vdpau-driver                       <none>                    <none>                    (no description available)
ii  system76-driver-nvidia                    19.04.20~1572357231~18.04 all                       Latest nvidia driver for System76 computers
ii  xserver-xorg-video-nvidia-435             435.21-1pop1~1571925584~1 amd64                     NVIDIA binary Xorg driver
  • [X] NVIDIA container library version from nvidia-container-cli -V
build date: 2019-10-17T03:23+00:00
build revision: 0000000000000000000000000000000000000000
build compiler: x86_64-linux-gnu-gcc-7 7.4.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -Wdate-time -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -I/usr/include/tirpc -g -O2 -fdebug-prefix-map=/build/libnvidia-container-Q2bRCn/libnvidia-container-1.0.6=. -fstack-protector-strong -Wformat -Werror=format-security -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections -Wl,-Bsymbolic-functions -Wl,-z,relro
  • [X] NVIDIA container library logs
-- WARNING, the following logs are for debugging purposes only --

I1105 09:17:34.548451 19203 nvc.c:281] initializing library context (version=1.0.6, build=0000000000000000000000000000000000000000)
I1105 09:17:34.548533 19203 nvc.c:255] using root /
I1105 09:17:34.548545 19203 nvc.c:256] using ldcache /etc/ld.so.cache
I1105 09:17:34.548556 19203 nvc.c:257] using unprivileged user 65534:65534
I1105 09:17:34.550684 19209 nvc.c:191] loading kernel module nvidia
I1105 09:17:34.551097 19209 nvc.c:203] loading kernel module nvidia_uvm
I1105 09:17:34.551337 19209 nvc.c:211] loading kernel module nvidia_modeset
I1105 09:17:34.551901 19210 driver.c:133] starting driver service
I1105 09:17:34.612315 19203 nvc_container.c:364] configuring container with 'compute utility supervised'
I1105 09:17:34.612713 19203 nvc_container.c:212] selecting /var/lib/docker/overlay2/a69837ca5f4f81a5d68b974b6e48b03ab82e17fa7d24c2875e9df8bb58040ddd/merged/usr/local/cuda-10.0/compat/libcuda.so.410.104
I1105 09:17:34.612832 19203 nvc_container.c:212] selecting /var/lib/docker/overlay2/a69837ca5f4f81a5d68b974b6e48b03ab82e17fa7d24c2875e9df8bb58040ddd/merged/usr/local/cuda-10.0/compat/libnvidia-fatbinaryloader.so.410.104
I1105 09:17:34.612903 19203 nvc_container.c:212] selecting /var/lib/docker/overlay2/a69837ca5f4f81a5d68b974b6e48b03ab82e17fa7d24c2875e9df8bb58040ddd/merged/usr/local/cuda-10.0/compat/libnvidia-ptxjitcompiler.so.410.104
I1105 09:17:34.613167 19203 nvc_container.c:384] setting pid to 19190
I1105 09:17:34.613190 19203 nvc_container.c:385] setting rootfs to /var/lib/docker/overlay2/a69837ca5f4f81a5d68b974b6e48b03ab82e17fa7d24c2875e9df8bb58040ddd/merged
I1105 09:17:34.613206 19203 nvc_container.c:386] setting owner to 0:0
I1105 09:17:34.613234 19203 nvc_container.c:387] setting bins directory to /usr/bin
I1105 09:17:34.613251 19203 nvc_container.c:388] setting libs directory to /usr/lib/x86_64-linux-gnu
I1105 09:17:34.613268 19203 nvc_container.c:389] setting libs32 directory to /usr/lib/i386-linux-gnu
I1105 09:17:34.613284 19203 nvc_container.c:390] setting cudart directory to /usr/local/cuda
I1105 09:17:34.613300 19203 nvc_container.c:391] setting ldconfig to @/sbin/ldconfig.real (host relative)
I1105 09:17:34.613316 19203 nvc_container.c:392] setting mount namespace to /proc/19190/ns/mnt
I1105 09:17:34.613332 19203 nvc_container.c:394] setting devices cgroup to /sys/fs/cgroup/devices/docker/b64f8ae6075dd67d1927dfe37b42112a266a7288372633b594ca14b7c56eda3a
I1105 09:17:34.613356 19203 nvc_info.c:437] requesting driver information with ''
I1105 09:17:34.613470 19203 nvc.c:318] shutting down library context
W1105 09:17:34.623628 19203 driver.c:220] terminating driver service (forced)
I1105 09:17:34.708486 19203 driver.c:233] driver service terminated with signal 9

  • [X] Docker command, image and tag used
    docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
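
The dpkg listing above mixes installed packages (status `ii`) with ones that are not installed (`un`) or removed with config files left behind (`rc`), which makes it hard to see what is actually on the system. A small sketch that keeps only the installed entries, assuming `dpkg -l` output on stdin (the function name `installed_only` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `dpkg -l` output on stdin, print "name version"
# for packages whose status column is exactly "ii" (installed).
installed_only() {
  awk '$1 == "ii" { print $2, $3 }'
}
```

Run as `dpkg -l '*nvidia*' | installed_only`. On the listing in this report it would show the 435.21 driver packages next to the Pop-built libnvidia-container 1.0.6 and nvidia-container-toolkit 1.0.5 packages.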

Most helpful comment

Fixed by downgrading the driver to 430.50 and installing the container packages from the official NVIDIA repo rather than the one in Pop! OS. On vanilla Ubuntu this is straightforward, but on Pop! OS you need to add and pin both the Graphics Drivers PPA and the nvidia-docker repo, because the 430 driver package in Pop is just an alias for 435.

Steps:

1. Add the repo and PPA

sudo add-apt-repository ppa:graphics-drivers/ppa

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update

2. Then pin the repo and PPA by creating:

/etc/apt/preferences.d/nvidia-docker-pin-1002

Package: *
Pin: origin nvidia.github.io 
Pin-Priority: 1002

and
/etc/apt/preferences.d/nvidia-ppa-pin-1002

Package: *
Pin: release o=LP-PPA-graphics-drivers
Pin-Priority: 1002

3. Install the driver

sudo apt install nvidia-driver-430

4. Restart your machine

5. Install nvidia-container-toolkit and restart docker

sudo apt install nvidia-container-toolkit

sudo systemctl restart docker
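
A note on step 1: the repo URL is built from `$ID$VERSION_ID` in /etc/os-release, so it is worth checking what that expands to before running the curl. Below is a rough sketch that computes the same value from any os-release-style file. The function name `distribution_from` is made up, and the assumption that Pop! OS reports `ID=pop` is mine, not something stated in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same computation as step 1's
#   distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
# but reading from the file given as the first argument.
distribution_from() {
  # Source the file in a subshell so ID and VERSION_ID do not leak into the caller.
  ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

`distribution_from /etc/os-release` prints the string that ends up in the nvidia-docker.list URL. If the downloaded list file contains an error page instead of `deb` lines, one thing to try is setting `distribution=ubuntu18.04` by hand, since Pop! OS 18.04 is based on Ubuntu 18.04.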

Once this is fixed in Pop! OS you can remove the repo and PPA (don't forget to remove the pinning).
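
To make the pinning from step 2 easy to apply and to undo later, the two files can be written by a pair of small functions. This is only a sketch: the names `write_pins`, `remove_pins` and the `PIN_DIR` variable are hypothetical, and the file contents are copied from step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. PIN_DIR defaults to the directory used in step 2 and
# can be overridden, for example to try the functions out in a scratch directory.
PIN_DIR="${PIN_DIR:-/etc/apt/preferences.d}"

# Create both pin files with the contents from step 2.
write_pins() {
  cat > "$PIN_DIR/nvidia-docker-pin-1002" <<'EOF'
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
EOF
  cat > "$PIN_DIR/nvidia-ppa-pin-1002" <<'EOF'
Package: *
Pin: release o=LP-PPA-graphics-drivers
Pin-Priority: 1002
EOF
}

# Delete both pin files again once the Pop! OS packages are fixed.
remove_pins() {
  rm -f "$PIN_DIR/nvidia-docker-pin-1002" "$PIN_DIR/nvidia-ppa-pin-1002"
}
```

Writing to /etc/apt/preferences.d needs root. The priority is above 1000 because apt only downgrades an installed package when the pinned version has a priority of at least 1000. After `sudo apt-get update`, `apt-cache policy nvidia-driver-430` should show which origin wins.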

Closing since this seems to be specific to Pop! OS.

Exact same issue on Pop! OS with NVIDIA driver 440.59, and previously with 440.44:

ii nvidia-driver-440 440.59-1pop1~1584480240~1 amd64
ii nvidia-container-toolkit 1.0.5-0pop1~1569270707~18 amd64

The solution seems to be to install nvidia-container-toolkit from the NVIDIA repo, not the Pop one. The issue is in Pop's packaged nvidia-container-toolkit, as suggested in:
https://github.com/pop-os/nvidia-graphics-drivers/issues/31

Follow the steps in:
https://github.com/pop-os/nvidia-container-toolkit/issues/1

but only reinstall the toolkit:
sudo apt install nvidia-container-toolkit
The driver from Pop works fine, so there is no need to reinstall it.
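
To check whether the reinstall really replaced the Pop build, the version string can be inspected. Here is a minimal sketch. It relies on the observation that the Pop package versions quoted in this thread contain "pop" (for example 1.0.5-0pop1~1569270707~18); the function name `is_pop_build` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if the given package version
# string looks like a Pop! OS build, i.e. contains "pop".
is_pop_build() {
  case "$1" in
    *pop*) return 0 ;;
    *)     return 1 ;;
  esac
}
```

For example: `is_pop_build "$(dpkg-query -W -f='${Version}' nvidia-container-toolkit)" && echo "still the Pop build"`. If that prints the message after the reinstall, the pinning probably did not take effect.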
