Nvidia-docker: Couldn't find libnvidia-ml.so library in your system

Created on 2 Nov 2018 · 37 comments · Source: NVIDIA/nvidia-docker


1. Issue or feature description

Missing libnvidia-ml.so and libcublas.so.9.0 libraries in the Docker container.

My system is Ubuntu 18.10, and I have tried NVIDIA drivers 390, 396, and 410.

2. Steps to reproduce the issue

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

This also holds for the TensorFlow Docker images. When I run the CUDA image in interactive mode and try to import TensorFlow from Python, it reports that libcublas.so.9.0 cannot be found, although I can see it in the /usr/local/cuda/lib64 directory.

Everything works fine on the host machine, though.
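For reference, the symptom can be narrowed down by inspecting the dynamic-linker cache inside the container. A minimal sketch (libnvidia-ml is shown for illustration; on a machine without the driver the first grep simply reports a miss):

```shell
# Inspect the loader cache that nvidia-smi and the TensorFlow import rely on.
# In the broken container this prints the fallback message even though the
# driver libraries were mounted in by the nvidia runtime.
ldconfig -p | grep libnvidia-ml || echo "libnvidia-ml not in ld cache"

# For comparison, a library that is cached on any healthy glibc system:
ldconfig -p | grep -m1 'libc.so'
```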

3. Information to attach (optional if deemed irrelevant)

  • [ ] Kernel version from uname -a
Linux box 4.18.0-10-generic #11-Ubuntu SMP Thu Oct 11 15:13:55 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • [ ] Any relevant kernel output lines from dmesg
  • [ ] Driver information from nvidia-smi -a
==============NVSMI LOG==============

Timestamp                           : Fri Nov  2 11:09:45 2018
Driver Version                      : 410.73
CUDA Version                        : 10.0

Attached GPUs                       : 1
GPU 00000000:65:00.0
    Product Name                    : GeForce GTX 1080 Ti
    Product Brand                   : GeForce
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-14bfddbd-9230-c05e-fa52-d468af601fc4
    Minor Number                    : 0
    VBIOS Version                   : 86.02.39.00.2E
    MultiGPU Board                  : No
    Board ID                        : 0x6500
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.01.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x65
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1B0610DE
        Bus Id                      : 00000000:65:00.0
        Sub System Id               : 0x147019DA
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 0
        Tx Throughput               : 3000 KB/s
        Rx Throughput               : 2000 KB/s
    Fan Speed                       : 0 %
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 11177 MiB
        Used                        : 751 MiB
        Free                        : 10426 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 6 MiB
        Free                        : 250 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 3 %
        Memory                      : 1 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : 35 C
        GPU Shutdown Temp           : 96 C
        GPU Slowdown Temp           : 93 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 60.67 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 125.00 W
        Max Power Limit             : 300.00 W
    Clocks
        Graphics                    : 1480 MHz
        SM                          : 1480 MHz
        Memory                      : 5508 MHz
        Video                       : 1265 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 1911 MHz
        SM                          : 1911 MHz
        Memory                      : 5505 MHz
        Video                       : 1620 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 1454
            Type                    : G
            Name                    : /usr/lib/xorg/Xorg
            Used GPU Memory         : 40 MiB
        Process ID                  : 1533
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 80 MiB
        Process ID                  : 2450
            Type                    : G
            Name                    : /usr/lib/xorg/Xorg
            Used GPU Memory         : 363 MiB
        Process ID                  : 2631
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 142 MiB
        Process ID                  : 3068
            Type                    : G
            Name                    : /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=15466691898050642703,2714747135580672923,131072 --enable-crash-reporter=b6227030-26a9-487c-b99f-efddda704fbf, --gpu-preferences=KAAAAAAAAACAAABAAQAAAAAAAAAAAGAAAAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAA --enable-crash-reporter=b6227030-26a9-487c-b99f-efddda704fbf, --service-request-channel-token=405587616121577545
            Used GPU Memory         : 121 MiB
  • [ ] Docker version from docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:51 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:15 2018
  OS/Arch:          linux/amd64
  Experimental:     false
  • [ ] NVIDIA packages version from dpkg -l '*nvidia*' _or_ rpm -qa '*nvidia*'
un  libgldispatch0-nvidia      <none>             <none>             (no description available)
ii  libnvidia-cfg1-410:amd64   410.73-0ubuntu0~gp amd64              NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any         <none>             <none>             (no description available)
un  libnvidia-common           <none>             <none>             (no description available)
ii  libnvidia-common-410       410.73-0ubuntu0~gp all                Shared files used by the NVIDIA libraries
rc  libnvidia-compute-390:amd6 390.87-0ubuntu1    amd64              NVIDIA libcompute package
rc  libnvidia-compute-390:i386 390.87-0ubuntu1    i386               NVIDIA libcompute package
rc  libnvidia-compute-396:amd6 396.54-0ubuntu0~gp amd64              NVIDIA libcompute package
rc  libnvidia-compute-396:i386 396.54-0ubuntu0~gp i386               NVIDIA libcompute package
ii  libnvidia-compute-410:amd6 410.73-0ubuntu0~gp amd64              NVIDIA libcompute package
ii  libnvidia-compute-410:i386 410.73-0ubuntu0~gp i386               NVIDIA libcompute package
ii  libnvidia-container-tools  1.0.0-1            amd64              NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64 1.0.0-1            amd64              NVIDIA container runtime library
un  libnvidia-decode           <none>             <none>             (no description available)
ii  libnvidia-decode-410:amd64 410.73-0ubuntu0~gp amd64              NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-410:i386  410.73-0ubuntu0~gp i386               NVIDIA Video Decoding runtime libraries
un  libnvidia-encode           <none>             <none>             (no description available)
ii  libnvidia-encode-410:amd64 410.73-0ubuntu0~gp amd64              NVENC Video Encoding runtime library
ii  libnvidia-encode-410:i386  410.73-0ubuntu0~gp i386               NVENC Video Encoding runtime library
un  libnvidia-fbc1             <none>             <none>             (no description available)
ii  libnvidia-fbc1-410:amd64   410.73-0ubuntu0~gp amd64              NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-410:i386    410.73-0ubuntu0~gp i386               NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl               <none>             <none>             (no description available)
ii  libnvidia-gl-410:amd64     410.73-0ubuntu0~gp amd64              NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-410:i386      410.73-0ubuntu0~gp i386               NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-ifr1             <none>             <none>             (no description available)
ii  libnvidia-ifr1-410:amd64   410.73-0ubuntu0~gp amd64              NVIDIA OpenGL-based Inband Frame Readback runtime library
ii  libnvidia-ifr1-410:i386    410.73-0ubuntu0~gp i386               NVIDIA OpenGL-based Inband Frame Readback runtime library
un  nvidia-304                 <none>             <none>             (no description available)
un  nvidia-340                 <none>             <none>             (no description available)
un  nvidia-384                 <none>             <none>             (no description available)
un  nvidia-390                 <none>             <none>             (no description available)
un  nvidia-common              <none>             <none>             (no description available)
rc  nvidia-compute-utils-390   390.87-0ubuntu1    amd64              NVIDIA compute utilities
rc  nvidia-compute-utils-396   396.54-0ubuntu0~gp amd64              NVIDIA compute utilities
ii  nvidia-compute-utils-410   410.73-0ubuntu0~gp amd64              NVIDIA compute utilities
ii  nvidia-container-runtime   2.0.0+docker18.06. amd64              NVIDIA container runtime
ii  nvidia-container-runtime-h 1.4.0-1            amd64              NVIDIA container runtime hook
ii  nvidia-cuda-dev            9.1.85-4ubuntu1    amd64              NVIDIA CUDA development files
ii  nvidia-cuda-doc            9.1.85-4ubuntu1    all                NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb            9.1.85-4ubuntu1    amd64              NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit        9.1.85-4ubuntu1    amd64              NVIDIA CUDA development toolkit
rc  nvidia-dkms-390            390.87-0ubuntu1    amd64              NVIDIA DKMS package
rc  nvidia-dkms-396            396.54-0ubuntu0~gp amd64              NVIDIA DKMS package
ii  nvidia-dkms-410            410.73-0ubuntu0~gp amd64              NVIDIA DKMS package
un  nvidia-dkms-kernel         <none>             <none>             (no description available)
un  nvidia-docker              <none>             <none>             (no description available)
ii  nvidia-docker2             2.0.3+docker18.06. all                nvidia-docker CLI wrapper
un  nvidia-driver              <none>             <none>             (no description available)
ii  nvidia-driver-410          410.73-0ubuntu0~gp amd64              NVIDIA driver metapackage
un  nvidia-driver-binary       <none>             <none>             (no description available)
un  nvidia-kernel-common       <none>             <none>             (no description available)
rc  nvidia-kernel-common-390   390.87-0ubuntu1    amd64              Shared files used with the kernel module
rc  nvidia-kernel-common-396   396.54-0ubuntu0~gp amd64              Shared files used with the kernel module
ii  nvidia-kernel-common-410   410.73-0ubuntu0~gp amd64              Shared files used with the kernel module
un  nvidia-kernel-source       <none>             <none>             (no description available)
un  nvidia-kernel-source-390   <none>             <none>             (no description available)
un  nvidia-kernel-source-396   <none>             <none>             (no description available)
ii  nvidia-kernel-source-410   410.73-0ubuntu0~gp amd64              NVIDIA kernel source package
un  nvidia-legacy-304xx-vdpau- <none>             <none>             (no description available)
un  nvidia-legacy-340xx-vdpau- <none>             <none>             (no description available)
un  nvidia-libopencl1          <none>             <none>             (no description available)
un  nvidia-libopencl1-dev      <none>             <none>             (no description available)
ii  nvidia-opencl-dev:amd64    9.1.85-4ubuntu1    amd64              NVIDIA OpenCL development files
un  nvidia-opencl-icd          <none>             <none>             (no description available)
ii  nvidia-openjdk-8-jre       9.1.85-4ubuntu1    amd64              NVIDIA provided OpenJDK Java runtime, using Hotspot JIT
un  nvidia-persistenced        <none>             <none>             (no description available)
ii  nvidia-prime               0.8.10             all                Tools to enable NVIDIA's Prime
ii  nvidia-profiler            9.1.85-4ubuntu1    amd64              NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings            410.73-0ubuntu0~gp amd64              Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary     <none>             <none>             (no description available)
un  nvidia-smi                 <none>             <none>             (no description available)
un  nvidia-utils               <none>             <none>             (no description available)
ii  nvidia-utils-410           410.73-0ubuntu0~gp amd64              NVIDIA driver support binaries
un  nvidia-vdpau-driver        <none>             <none>             (no description available)
ii  nvidia-visual-profiler     9.1.85-4ubuntu1    amd64              NVIDIA Visual Profiler for CUDA and OpenCL
ii  xserver-xorg-video-nvidia- 410.73-0ubuntu0~gp amd64              NVIDIA binary Xorg driver
dpkg-query: no packages found matching *nvidia*rpm
dpkg-query: no packages found matching -qa
  • [ ] NVIDIA container library version from nvidia-container-cli -V
version: 1.0.0
build date: 2018-09-20T20:19+00:00
build revision: 881c88e2e5bb682c9bb14e68bd165cfb64563bb1
build compiler: x86_64-linux-gnu-gcc-7 7.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • [ ] NVIDIA container library logs (see troubleshooting)
  • [ ] Docker command, image and tag used
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

All 37 comments

Do I have to install any other libraries apart from the NVIDIA drivers on the host machine?

Bumping up. I am having exactly the same problem, also on Ubuntu 18.10, with driver version 390.87.

I have similar symptoms but I can run nvidia-smi after executing ldconfig inside the container. I'm using driver version 410.73.

>docker run --runtime=nvidia --rm -it nvidia/cuda:9.0-base bash
root@9b2ab11c3ff9:/# nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
root@9b2ab11c3ff9:/# ldconfig
root@9b2ab11c3ff9:/# nvidia-smi
Fri Nov  2 15:35:25 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.73       Driver Version: 410.73       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   49C    P8    N/A /  N/A |    289MiB /  4040MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

@symmsaur Can you import TensorFlow inside the container after running ldconfig?

Mmm, given the symptoms, you are probably stumbling on the issue that was fixed by this commit:
https://github.com/NVIDIA/libnvidia-container/commit/deccb2801502675bd283c6936861814dbca99ecd
You would need to wait for the next release of the library.

> Mmm, given the symptoms, you are probably stumbling on the issue that was fixed by this commit:
> NVIDIA/libnvidia-container@deccb28
> You would need to wait for the next release of the library.

Thanks for the information. I will wait for the next release, or compile libnvidia-container myself if I can't wait that long. Running ldconfig manually helped, though!

Many thanks to @symmsaur...

Best

The same for me.

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base ldconfig && nvidia-smi

works; without ldconfig it fails with the same error.

Running ldconfig inside the container fixes every failure to resolve .so libraries (which are actually mounted in from the host system): TensorFlow (image nvcr.io/nvidia/tensorflow:18.09-py3) imports and runs fine after that.

I can confirm that the above-mentioned commit fixes the problem: I re-compiled the latest master branch and replaced the library in my system path, and now the NGC TensorFlow image works out of the box on Ubuntu 18.10 with NVIDIA driver 415.25.

The same for me.

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base ldconfig && nvidia-smi

@ddurnev you probably ran nvidia-smi on the host (compare the Processes list inside and out of the container)

@lccro Yes, you're right - that runs only the first part ("ldconfig") inside the container; the correct form is something like:

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base /bin/bash -c "ldconfig && nvidia-smi"

Still, nvidia-smi works without ldconfig only after the patch for libnvidia-container is applied.
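The quoting trap is easy to reproduce without a GPU at all. Here is a small sketch using sh -c as a stand-in for docker run IMAGE:

```shell
# Unquoted: the OUTER shell parses '&&' first, so only 'true' runs in the
# inner shell, and the echo runs in your own shell (just like nvidia-smi
# silently running on the host above).
sh -c true && echo "this ran in the OUTER shell"

# Quoted: the whole pipeline is handed to the inner shell.
sh -c 'true && echo "this ran in the INNER shell"'
```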

I have the same problem on Fedora 29, with nvidia 415 driver and nvidia-docker 2.0.3

$ sudo docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi
Unable to find image 'nvidia/cuda:10.0-base' locally
10.0-base: Pulling from nvidia/cuda
473ede7ed136: Pull complete 
c46b5fa4d940: Pull complete 
93ae3df89c92: Pull complete 
6b1eed27cade: Pull complete 
cb5511f09cc0: Pull complete 
4173c1e5c714: Pull complete 
Digest: sha256:7ba25f8ec32821f4225a73d6cd3df5ccf70ecc9622724f64c61b123f2bde5b90
Status: Downloaded newer image for nvidia/cuda:10.0-base
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

But on the host it works well:

$ nvidia-smi 
Thu Jan  3 14:11:28 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.25       Driver Version: 415.25       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:09:00.0  On |                  N/A |
| 28%   30C    P8     8W / 180W |    341MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1328      G   /usr/libexec/Xorg                             40MiB |
|    0      1510      G   /usr/bin/gnome-shell                          48MiB |
|    0      1806      G   /usr/libexec/Xorg                            126MiB |
|    0      1922      G   /usr/bin/gnome-shell                         122MiB |
+-----------------------------------------------------------------------------+

Additional information about the NVIDIA card:

$ whereis nvidia-smi
nvidia-smi: /usr/bin/nvidia-smi /usr/share/man/man1/nvidia-smi.1.gz
$ nvidia-installer -v |grep version
nvidia-installer:  version 415.25
$ uname -a
Linux localhost.localdomain 4.19.13-300.fc29.x86_64 #1 SMP Sat Dec 29 22:54:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ lspci |grep -E "VGA|3D"
09:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)

More information about nvidia-docker:

$ nvidia-docker version
NVIDIA Docker: 2.0.3

I followed this guide for the NVIDIA driver installation process.

@botalaszlo I have the same problem on Fedora 29 after dnf update today.

Running ldconfig in the container does make it work; the .so file is found in /usr/local/cuda-9.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so

Does this work for you?

docker run --runtime=nvidia --rm nvidia/cuda:10.0-base bash -c "ldconfig; nvidia-smi"

@andyneff Perfect! This works fine.

$ docker run --runtime=nvidia --rm nvidia/cuda:10.0-base bash -c "ldconfig; nvidia-smi"
Fri Jan  4 16:43:54 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.25       Driver Version: 415.25       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:09:00.0  On |                  N/A |
| 35%   49C    P8     8W / 180W |    254MiB /  8116MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Maybe the documentation should be updated with this note :)

@botalaszlo It's not a documentation bug: you shouldn't have to run ldconfig at all. This is a bug, and manually running ldconfig is just a workaround for the ldconfig cache not being right in the image.

I just found out today the hard way that this bug affects more than just nvidia stuff.

docker run --runtime=nvidia --rm nvidia/cuda:10.0-base ldconfig -p
0 libs found in cache `/etc/ld.so.cache'

This breaks anything in Python that uses find_library (which is a lot of code), if not everything that relies on the ld cache.
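To make the find_library breakage concrete, a quick sketch (assuming a glibc-based image with python3; in the broken container the lookup returns None unless a compiler toolchain is present for ctypes' fallback search):

```shell
# On glibc, ctypes.util.find_library shells out to 'ldconfig -p', so an
# empty /etc/ld.so.cache makes lookups fail inside the broken container.
python3 -c "from ctypes.util import find_library; print(find_library('c'))"
# On a healthy system this prints the C library soname, e.g. libc.so.6
```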

@flx42 Any idea when the next release will be?

This should be fixed with the latest version of the libnvidia-container packages.
Closing, feel free to reopen if the bug persists.

Tested on Fedora 29, updated

  • docker 18.09.1-ce
  • nvidia-docker 2.0.3-1.docker18.09.0 -> 2.0.3-1.docker18.09.0
  • nvidia-container-runtime 2.0.0-1.docker18.09.0 -> 2.0.0-1.docker18.09.1
  • libnvidia-container1 1.0.0-1 -> 1.0.1-1
  • kernel 4.19.13 -> 4.19.15

After the update, confirmed fixed! Thanks @RenaudWasTaken

I'm still experiencing this bug; running ldconfig makes nvidia-smi work.

  • OS: Ubuntu 18.04 LTS
  • Docker version 18.09.3
  • nvidia docker 2.0.3+docker18.09.3-1
  • nvidia-container-runtime 2.0.0+docker18.09.3-1
  • libnvidia-container 1.0.1-1
  • kernel 4.18.0-16

How can I make it work without running ldconfig first?
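Until a fixed libnvidia-container package lands, one interim approach (a sketch, assuming you build your own image; entrypoint.sh is a hypothetical name) is to refresh the cache in an entrypoint wrapper so every container start sees a valid ld.so.cache:

```shell
# entrypoint.sh: rebuild the loader cache over the runtime-mounted driver
# libraries, then hand off to the real command unchanged.
cat > entrypoint.sh <<'EOF'
#!/bin/sh
ldconfig 2>/dev/null || true   # harmless no-op if the cache is already right
exec "$@"
EOF
chmod +x entrypoint.sh
# In the Dockerfile: COPY entrypoint.sh /  and  ENTRYPOINT ["/entrypoint.sh"]
```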

Just supplying more (possibly useless) info. Still working on Fedora:

  • OS: Fedora 29
  • Docker version 18.09.3
  • nvidia-docker 2.0.3-1.docker18.09.3.ce
  • nvidia-container-runtime 2.0.0-1.docker18.09.3
  • libnvidia-container 1.0.1-1
  • kernel 4.20.14-200

Test

docker run --runtime=nvidia --rm nvidia/cuda@sha256:3cba5c5a8f37ba05b2710071907bd8da22ad1dc828025687b2435b1308a138ff nvidia-smi #that's today's digest id for tag 10.0-base

@edoardogiacomello What is your current version of ld?

> @edoardogiacomello What is your current version of ld?

On the host I get: GNU ld (GNU Binutils for Ubuntu) 2.30
Inside the Docker container: GNU ld (GNU Binutils for Ubuntu) 2.26.1

Yep, same issue here with the latest version:

  1. Without ldconfig:
docker run --gpus all nvidia/cuda:10.1-base nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

  2. And with ldconfig:
docker run --gpus all nvidia/cuda:10.1-base ldconfig && nvidia-smi
Fri Aug 30 14:11:27 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   48C    P8     5W /  N/A |    223MiB /  7973MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1942      G   /usr/lib/xorg/Xorg                            18MiB |
|    0      2057      G   /usr/bin/gnome-shell                          57MiB |
|    0      2936      G   /usr/lib/xorg/Xorg                            69MiB |
|    0      3073      G   /usr/bin/gnome-shell                          76MiB |
+-----------------------------------------------------------------------------+
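A caveat about the second command above: the unquoted `&&` is interpreted by the host shell, so only `ldconfig` actually runs inside the container and `nvidia-smi` then runs on the host (the host Xorg/gnome-shell processes in the output suggest exactly that). A minimal sketch of the difference, with the docker form shown as a comment (image tag assumed from the command above):

```shell
# The calling shell splits an unquoted `&&`, so the second command never
# reaches the inner shell (or container). Compare:
sh -c 'echo inner' && echo outer    # `echo outer` runs in the *calling* shell
sh -c 'echo inner && echo outer'    # both run inside the inner shell

# With docker, the single-container equivalent would be:
#   docker run --gpus all --rm nvidia/cuda:10.1-base bash -c 'ldconfig && nvidia-smi'
```

Quoting the whole pipeline keeps both commands inside the container, which is what the later `bash -c 'ldconfig && nvidia-smi'` invocations in this thread do.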

So yeah, still broken:

docker --version
Docker version 19.03.1, build 74b1e89

So yeah, still broken:

yes -- i'm also stumbling over this bug on debian testing:

~$ docker run --gpus all nvidia/cuda nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

vs.

~$ docker run --gpus all nvidia/cuda ldconfig && nvidia-smi
Wed Sep 18 17:13:19 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  On   | 00000000:01:00.0 Off |                  N/A |
| 33%   29C    P8     6W / 180W |      1MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
~$ docker version
Client: Docker Engine - Community
 Version:           19.03.2
 API version:       1.40
 Go version:        go1.12.8
 Git commit:        6a30dfc
 Built:             Thu Aug 29 05:29:29 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.8
  Git commit:       6a30dfc
  Built:            Thu Aug 29 05:28:05 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

libnvidia-container1:amd64/buster 1.0.5-1 uptodate

the ldconfig workaround doesn't look acceptable...
how can we finally fix this long-standing issue?

Hi "nvidia",
Can you provide an ETA on this?
It is really painful to run ldconfig on each command (and to enable passwordless root/sudo access in the container).
Many thanks and kind regards.

Works for me on Ubuntu 18.04 and Debian 10.

Here's a run from scratch on Debian 10 (looks the same for me on Ubuntu as well), without ever manually doing ldconfig. I'm using nvidia-container-toolkit and have removed the old nvidia-docker2:

~/ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Unable to find image 'nvidia/cuda:9.0-base' locally
9.0-base: Pulling from nvidia/cuda
f7277927d38a: Pull complete
8d3eac894db4: Pull complete
edf72af6d627: Pull complete
3e4f86211d23: Pull complete
d6e9603ff777: Pull complete
9454aa7cddfc: Pull complete
a296dc1cdef1: Pull complete
Digest: sha256:1883759ad42016faba1e063d6d86d5875cecf21c420a5c1c20c27c41e46dae44
Status: Downloaded newer image for nvidia/cuda:9.0-base
Thu Oct  3 17:44:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:65:00.0 N/A |                  N/A |
| 50%   39C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

I recommend uninstalling and re-installing the driver and packages. It's possible your host system is in a strange state and that's impacting something in your setup.

@glennie
Sorry, we don't see this issue on our end. Without a better understanding of what problem you're specifically facing, we can't offer an ETA on a fix.

@glennie @mash-graz @Brainiarc7 do any of you get the same result if you use nvjmayo's exact same sha?

docker run --runtime=nvidia --rm nvidia/cuda@sha256:1883759ad42016faba1e063d6d86d5875cecf21c420a5c1c20c27c41e46dae44 nvidia-smi

@andyneff

your cmd line produces this error message on my machine:

local@bonsai:~$  docker run --runtime=nvidia --rm nvidia/cuda@sha256:1883759ad42016faba1e063d6d86d5875cecf21c420a5c1c20c27c41e46dae44 nvidia-smi
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.

using the --gpus option instead produces:

local@bonsai:~$ docker run --gpus all --rm nvidia/cuda@sha256:1883759ad42016faba1e063d6d86d5875cecf21c420a5c1c20c27c41e46dae44 nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

and manually adding ldconfig finally works again:

local@bonsai:~$ docker run --gpus all --rm nvidia/cuda@sha256:1883759ad42016faba1e063d6d86d5875cecf21c420a5c1c20c27c41e46dae44 ldconfig && nvidia-smi
Mon Oct  7 19:10:11 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  On   | 00000000:01:00.0 Off |                  N/A |
| 33%   26C    P8     6W / 180W |      1MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

but i should perhaps mention that i do not use the nvidia drivers on this machine for the actual video output. i prefer to utilize the onboard intel chip for this purpose, because otherwise i'm not able to share the graphics card via PCIe passthrough with qemu-kvm instances, and i mostly need the nvidia card only for CUDA-based GPGPU work. the setup could therefore differ slightly from other installations.

Hello,

~/ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

Maybe I'm missing something here... why are you (@nvjmayo) using the --runtime option?

I used --gpus all (as I've got docker 19.03.2).

Using the sha256 specified by @andyneff with --gpus all I still have the same issue:

[glennie@hestia ~]$ docker run --gpus all --rm nvidia/cuda@sha256:1883759ad42016faba1e063d6d86d5875cecf21c420a5c1c20c27c41e46dae44 nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

But, it works when I use ldconfig before:

```
[glennie@hestia ~]$ docker run --gpus all --rm nvidia/cuda@sha256:1883759ad42016faba1e063d6d86d5875cecf21c420a5c1c20c27c41e46dae44 bash -c 'ldconfig && nvidia-smi'
Mon Oct  7 19:12:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce MX130       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P0    N/A /  N/A |      0MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

Kind regards,

Maybe I'm missing something here... why are you (@nvjmayo) using the --runtime option?

My mistake, I have multiple runtimes installed for a bunch of different environments (both for docker and podman). I should have pasted the canonical form. Sorry for the confusion.

But, it works when I use ldconfig before:

I'll ask the team to bump up the priority on fixing this. It's an issue of at what stage to run the container hooks. Automatically running ldconfig when needed is something we're looking into. When to do it, what mechanism to use to do it, and if we should stop a running container are all open questions for an implementation.

The best way to work around the issue right now is to run ldconfig on the container whenever you upgrade your host driver. Admittedly inconvenient.
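Until the hooks are fixed, one way to avoid typing ldconfig on every run is a small entrypoint wrapper that refreshes the linker cache before handing off to the real command. This is only a sketch of my own (the wrapper name and the `|| true` guard are not from nvidia's tooling; in a real image you would COPY it in and set it as the `ENTRYPOINT`):

```shell
# entrypoint.sh: refresh the ld cache, then exec the requested command.
cat > entrypoint.sh <<'EOF'
#!/bin/sh
ldconfig 2>/dev/null || true    # ignore failure when not running as root
exec "$@"
EOF
chmod +x entrypoint.sh
./entrypoint.sh echo ready      # prints: ready
```

With this baked into the image, `docker run --gpus all myimage nvidia-smi` would go through the wrapper and get a current cache without any manual step.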

Hello!

Can you give us some more information?

  • uname -a
  • ldconfig --version

Thanks!

Can you give us some more information?

uname -a

~$ uname -a
Linux bonsai 5.2.0-3-amd64 #1 SMP Debian 5.2.17-1 (2019-09-26) x86_64 GNU/Linux

ldconfig --version

~$ sudo ldconfig --version
ldconfig (Debian GLIBC 2.29-2) 2.29

i hope that helps!

btw.: i'm using debian testing as a rolling-release solution, which isn't uncommon for GPGPU/ML work, because the software in the stable debian branch is usually too outdated for the requirements and fast progress in this field.

Can you try replacing "@/sbin/ldconfig" with "/sbin/ldconfig" in /etc/nvidia-container-runtime/config.toml? Not sure why but this helped in my case.
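A hedged sketch of applying that edit, run here against a scratch copy so it is safe to try (on a real install the file is /etc/nvidia-container-runtime/config.toml; back it up before touching it):

```shell
# Drop the "@" prefix from the ldconfig entry. Done on a temp copy here;
# point `cfg` at /etc/nvidia-container-runtime/config.toml on a real system.
cfg=$(mktemp)
printf 'ldconfig = "@/sbin/ldconfig"\n' > "$cfg"
sed -i 's|"@/sbin/ldconfig"|"/sbin/ldconfig"|' "$cfg"
grep ldconfig "$cfg"    # ldconfig = "/sbin/ldconfig"
```

The sed expression and sample line are illustrative; check what your package actually wrote (some distributions use `@/sbin/ldconfig.real`) before substituting.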

is the reason for (and actual meaning of) this "@" syntax used in https://gitlab.com/nvidia/container-toolkit/toolkit/blob/master/config/config.toml.debian documented or explained anywhere?

Can you try replacing "@/sbin/ldconfig" with "/sbin/ldconfig" in /etc/nvidia-container-runtime/config.toml? Not sure why but this helped in my case.

Thank you, works for me. I am using Debian Testing (same as @mash-graz).
Is there gonna be an official update with a fix from Nvidia?

Thank you @lyon667. It worked for me as well after wasting many hours of my time. Why does this work @RenaudWasTaken?

is the reason for (and actual meaning of) this "@" syntax used in https://gitlab.com/nvidia/container-toolkit/toolkit/blob/master/config/config.toml.debian documented or explained anywhere?

I did not find any documentation, but it seems to be processed by nvc_ldcache_update in libnvidia-container.

Can you try replacing "@/sbin/ldconfig" with "/sbin/ldconfig" in /etc/nvidia-container-runtime/config.toml? Not sure why but this helped in my case.

this solved it for me.

this solved it for me.

yes! -- this manual removal of the @-sign works for me as well.

i also don't understand why this particular issue still isn't fixed in the released nvidia-docker packages and still affects debian installations.
