Nvidia-docker: nvidia-container-cli: initialization error: load library failed: libcuda.so.1

Created on 18 Nov 2017 · 14 Comments · Source: NVIDIA/nvidia-docker

Hi,

I'm getting a failure on trying to load libcuda.so.1, but my understanding is that CUDA doesn't have to be installed on the host machine, right?

I'm on Fedora 26, with bumblebee installed but optirun shouldn't be needed, right?

[sztamas@nomad ~]$ cat /proc/acpi/bbswitch 
0000:01:00.0 ON

[sztamas@nomad ~]$ nvidia-smi 
Sat Nov 18 22:59:48 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P0    N/A /  N/A |      0MiB /  4041MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+






[sztamas@nomad ~]$ docker run --runtime=nvidia --rm nvidia/cuda find / -name nvidia-smi
container_linux.go:265: starting container process caused "process_linux.go:368: container init caused \"process_linux.go:351: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=9.0 --pid=19311 /var/lib/docker/overlay2/248624b66696970549d54634da9cba9a6c6041b9d5d587b2d2cfa6c698a70a7e/merged]\\\\nnvidia-container-cli: initialization error: load library failed: libcuda.so.1: cannot open shared object file: no such file or directory\\\\n\\\"\""
docker: Error response from daemon: oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused \"process_linux.go:351: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=9.0 --pid=19311 /var/lib/docker/overlay2/248624b66696970549d54634da9cba9a6c6041b9d5d587b2d2cfa6c698a70a7e/merged]\\\\nnvidia-container-cli: initialization error: load library failed: libcuda.so.1: cannot open shared object file: no such file or directory\\\\n\\\"\"".






[sztamas@nomad ~]$ nvidia-container-cli --debug=/dev/stdout list --compute

-- WARNING, the following logs are for debugging purposes only --

I1118 21:05:30.464748 19428 nvc.c:250] initializing library context (version=1.0.0, build=ec15c7233bd2de821ad5127cb0de6b52d9d2083c)
I1118 21:05:30.464846 19428 nvc.c:225] using ldcache /etc/ld.so.cache
I1118 21:05:30.464857 19428 nvc.c:226] using unprivileged user 1000:1000
nvidia-container-cli: initialization error: load library failed: libcuda.so.1: cannot open shared object file: no such file or directory






[sztamas@nomad ~]$ ldconfig -p | grep cuda
    libicudata.so.57 (libc6,x86-64) => /lib64/libicudata.so.57
    libcuda.so.1 (libc6) => /lib/libcuda.so.1
    libcuda.so (libc6) => /lib/libcuda.so
[sztamas@nomad ~]$ uname -a
Linux nomad 4.13.12-200.fc26.x86_64 #1 SMP Wed Nov 8 16:47:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[sztamas@nomad ~]$ cat /etc/fedora-release 
Fedora release 26 (Twenty Six)
[sztamas@nomad ~]$ docker -v 
Docker version 17.09.0-ce, build afdb6d4
[sztamas@nomad ~]$ dnf list installed | grep nvidia-docker2
nvidia-docker2.noarch                      2.0.1-1.docker17.09.0.ce    @nvidia-docker

Any ideas what could be wrong?

Many Thanks.


sztamas

All 14 comments

I'm getting a failure on trying to load libcuda.so.1, but my understanding is that CUDA doesn't have to be installed on the host machine, right?

That's right, but libcuda.so.1 comes from the driver, not the CUDA toolkit (not to be confused with libcudart, which does ship with the toolkit).

Please share the output of:

ldconfig -p | grep cuda

flx42 on 18 Nov 2017

Thanks for the quick answer!

It was in the original post already, maybe I've included too much :)

[sztamas@nomad ~]$ ldconfig -p | grep cuda
    libicudata.so.57 (libc6,x86-64) => /lib64/libicudata.so.57
    libcuda.so.1 (libc6) => /lib/libcuda.so.1
    libcuda.so (libc6) => /lib/libcuda.so

sztamas on 18 Nov 2017

Eh, you have libcuda.so.1, but only the 32-bit version (it is tagged libc6 rather than libc6,x86-64). How did you install the NVIDIA driver?
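
A minimal sketch of that bitness check, assuming an x86-64 host where `ldconfig -p` tags 64-bit entries with `x86-64` inside the parentheses. The function name `has_64bit_libcuda` is made up for illustration and is not part of nvidia-docker:

```shell
# Hypothetical helper: reads `ldconfig -p` output on stdin and succeeds
# only if a libcuda.so.1 entry tagged x86-64 is present.
# Assumption: x86-64 host; other architectures use different tags.
has_64bit_libcuda() {
    grep -Eq '^[[:space:]]*libcuda\.so\.1 \([^)]*x86-64[^)]*\)'
}

# Check the current host (prints the "missing" line if ldconfig is unavailable).
if ldconfig -p 2>/dev/null | has_64bit_libcuda; then
    echo "64-bit libcuda.so.1 is in the linker cache"
else
    echo "no 64-bit libcuda.so.1 in the linker cache"
fi
```

Fed the listing from the original post, the function fails, because the only libcuda.so.1 entry there carries the plain `(libc6)` tag.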

flx42 on 18 Nov 2017

I followed the Fedora Bumblebee Wiki page https://fedoraproject.org/wiki/Bumblebee

# dnf -y --nogpgcheck install http://install.linux.ncsu.edu/pub/yum/itecs/public/bumblebee/fedora$(rpm -E %fedora)/noarch/bumblebee-release-1.2-1.noarch.rpm

Used the managed NVidia repo:

# dnf -y --nogpgcheck install http://install.linux.ncsu.edu/pub/yum/itecs/public/bumblebee-nonfree/fedora$(rpm -E %fedora)/noarch/bumblebee-nonfree-release-1.2-1.noarch.rpm

and

# dnf install bumblebee-nvidia bbswitch-dkms VirtualGL.x86_64 VirtualGL.i686 primus.x86_64 primus.i686 kernel-devel

sztamas on 18 Nov 2017

Do you have another libcuda.so.1 installed somewhere else on your system?
I would recommend using the official repository and install the cuda-drivers package:
http://developer.download.nvidia.com/compute/cuda/repos/fedora25/x86_64/
It's for Fedora 25 though, so the second best choice is to use our installer scripts:
http://www.nvidia.com/object/unix.html

Check also this (old) issue for bumblebee/bbswitch:
https://github.com/NVIDIA/nvidia-docker/issues/16

flx42 on 19 Nov 2017

For Fedora 25+, just use the negativo17 repository and save yourself a lot of hair-pulling getting NVIDIA and CUDA to work on Optimus systems.
https://negativo17.org/nvidia-driver-improvements-for-fedora-25/

As a bonus, it comes with a gcc compatibility package so you can use an older compiler to build CUDA code.

mjmg on 19 Nov 2017

Not an issue with nvidia-docker.

flx42 on 21 Nov 2017

This is due to your broken NVIDIA driver installation. For the Ubuntu 16.04 release, you can check the fix here: https://zhuanlan.zhihu.com/p/37519492

gemfield on 31 May 2018

@flx42: the current Ubuntu 16.04 package nvidia-384 (384.130-0ubuntu0.16.04.1) does not provide libcuda.so.1 (https://packages.ubuntu.com/xenial-updates/amd64/nvidia-384/filelist).

It is provided by libcuda1-384 (https://packages.ubuntu.com/xenial-updates/amd64/libcuda1-384/filelist), which is only a recommended dependency of nvidia-384.
This means that when installing with apt-get install -y --no-install-recommends nvidia-384, libcuda.so.1 is missing, resulting in the initial error.

I'm not sure whether libcuda.so.1 is the full CUDA library or just the driver part that you talked about in your first comment on this issue. I don't know how to improve the situation: is it an Ubuntu packaging bug? Could nvidia-docker somehow pull in libcuda1-384 as a dependency in this scenario (driver installed from the Ubuntu package)? I'm not very optimistic about that possibility.

In the meantime, this issue could be documented (either in the main README or in the wiki FAQ) with something like:

On Ubuntu 16.04: when installing the NVIDIA driver with the Ubuntu package nvidia-384, you also need to install its recommended dependency libcuda1-384; otherwise you will get this error:

$ docker run -it --rm --runtime=nvidia nvidia/cuda:8.0-runtime sh
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods --debug=/var/log/nvidia-container-runtime-hook.log configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=8.0 --pid=5801 /var/lib/docker/aufs/mnt/1b3b6a154308bda7863a5438b68f68d3c0827cb20f67cb2394a2563de9b791a6]\\\\nnvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
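
As a sketch of the workaround described above: the helper below derives the companion libcuda package name from the driver package name and prints the matching install command. The function name `libcuda_pkg_for` is hypothetical, and the `nvidia-NNN` to `libcuda1-NNN` naming is an assumption that this thread only confirms for 384 and 415 on Ubuntu 16.04-era packaging.

```shell
# Hypothetical helper: maps an Ubuntu driver package such as nvidia-384
# to the package that ships libcuda.so.1 (libcuda1-384).
# Assumption: the nvidia-NNN / libcuda1-NNN naming holds for the given version.
libcuda_pkg_for() {
    case "$1" in
        nvidia-[0-9]*) printf 'libcuda1-%s\n' "${1#nvidia-}" ;;
        *) echo "unrecognized driver package: $1" >&2; return 1 ;;
    esac
}

driver=nvidia-384
# Print (do not run) an install command that names both packages explicitly,
# so skipping recommended packages no longer drops libcuda.so.1.
echo "sudo apt-get install -y --no-install-recommends $driver $(libcuda_pkg_for "$driver")"
```

Naming both packages keeps the minimal install while still getting libcuda.so.1. It does not apply to Ubuntu 18.04, where the library comes from a differently named package.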

For Ubuntu 18.04 they changed the packaging and it seems to be OK:
https://packages.ubuntu.com/bionic/nvidia-driver-390 depends on https://packages.ubuntu.com/bionic/amd64/libnvidia-compute-390 which ships libcuda.so.1

thomas-riccardi on 21 Jun 2018

👍1

We assume that the CUDA driver is properly installed on the host machine before installing nvidia-docker.
You should test your system with a small CUDA sample outside of a container; if you don't have libcuda.so, it will fail. And if it fails outside of containers, it will probably fail inside containers too.

There are many ways to install our drivers, so we want to keep that to the official CUDA documentation.

flx42 on 21 Jun 2018

@flx42 I'm confused: the initial comment said:

my understanding is that CUDA doesn't have to be installed on the host machine, right?

... and you confirmed that in your first reply.

Now you say there is an implicit requirement that CUDA is properly installed on the host machine.

My confusion probably comes from CUDA vs CUDA toolkit, but in any case the nvidia-docker README doesn't talk about any host requirement regarding CUDA: it only requires installing the NVIDIA driver.

If NVIDIA considers the libcuda.so.1 to be part of the NVIDIA driver, then it's an ubuntu packaging error. It probably won't be fixed as they can't break such things on 16.04, and they seem to do the right thing on ubuntu 18.04; but this should probably be documented somewhere.
You suggest this should be part of the CUDA documentation, but:

AFAIK it only talks about NVIDIA drivers provided by the CUDA apt repository or by the .run script, not the Ubuntu package (which is understandable).
It implies that we need a fully functioning CUDA on the host to use CUDA from containers, which is not documented in nvidia-docker.

thomas-riccardi on 21 Jun 2018

👍1

Sorry, I meant that we assume the CUDA driver is properly installed on the host machine. I've edited my answer.

flx42 on 21 Jun 2018

But I confirm that libcuda.so.1 is part of the NVIDIA driver. And it's not a packaging error from Ubuntu: it makes sense to want to install a subset of the NVIDIA driver if you know you aren't going to use everything.

AFAIK it only talks about nvidia drivers provided by the cuda apt repository or by the .run script, not the ubuntu package (which is understandable)

The ubuntu package and the CUDA repo package are very similar, only the version might be different.

flx42 on 21 Jun 2018

I had installed the nvidia-415 driver.

I checked and libcuda.so.1 was missing:
ldconfig -p | grep libcuda

libcudart.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart.so.9.0
libcudart.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart.so

I fixed the problem by installing libcuda from apt-get
sudo apt-get install libcuda1-415
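
Because `grep libcuda` also matches libcudart, the listing above is easy to misread. A small sketch that labels each entry by where it comes from, using the distinction made earlier in the thread (libcuda from the driver, libcudart from the toolkit); the function name `classify_cuda_libs` is made up for illustration:

```shell
# Hypothetical helper: reads `ldconfig -p` output on stdin and labels
# CUDA-related entries by origin.
#   libcuda.so*   -> shipped with the NVIDIA driver packages
#   libcudart.so* -> shipped with the CUDA toolkit
classify_cuda_libs() {
    awk '
        $1 ~ /^libcuda\.so/   { print "driver", $1 }
        $1 ~ /^libcudart\.so/ { print "toolkit", $1 }
    '
}

# Label whatever the current host has (prints nothing if neither is present).
ldconfig -p 2>/dev/null | classify_cuda_libs
```

If the output contains only `toolkit` lines, as in the listing above, the driver's libcuda.so.1 is not in the linker cache and nvidia-container-cli cannot load it.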

maitek on 21 Nov 2018

👍4
