Nvidia-docker: Other Distributions

Created on 1 Dec 2017  ·  34 Comments  ·  Source: NVIDIA/nvidia-docker

The Readme states:

Other distributions and architectures

Look at the Installation section of the wiki.

but there are no instructions for other distributions on the Wiki. Package manager methods are nice and all, but why don't we have a clean

$ ./configure
$ make
$ sudo make install

method to open this software up to the wider Linux community?

enhancement repository

All 34 comments

but there are no instructions for other distributions on the Wiki.

Nice catch, will fix it.
It's this page:
https://nvidia.github.io/nvidia-docker/

We can't do the usual ./configure && make && sudo make install here, we have multiple repositories: libnvidia-container, nvidia-container-runtime and nvidia-docker. libnvidia-container has a tarball release, but nvidia-container-runtime doesn't, yet.

By the way, you can make && sudo make install for libnvidia-container.
For nvidia-container-runtime, you can go get github.com/NVIDIA/nvidia-container-runtime/nvidia-container-runtime-hook, but it won't give you the full runtime.

Any chance of this working with the Amazon Linux AMIs? I use Elastic Beanstalk multi-container environments. I tried to install nvidia-docker2 using the CentOS/RHEL instructions, but it didn't work.

I got complaints about the dependency on docker-ce. I got around that using rpm --nodeps and, not surprisingly, it didn't work.

If I could compile that would be great.

How is docker installed on Amazon Linux then?

Amazon AMIs have docker installed from Amazon's own repository. The package name is docker and the versions offered are all the official releases.

If it's just a dependency issue, I can fix that for the next package release. Can you get the list of all the available versions? e.g.:

yum search --showduplicates docker

Here are the dependencies

[ec2-user]# yum search --showduplicates docker
============================= N/S matched: docker ==============================
docker-storage-setup-0.5-1.7.gite193b3b.amzn1.noarch : A simple service to setup
                                                     : docker storage devices
docker-storage-setup-0.6.0-1.18.giteb688d4.amzn1.noarch : A simple service to
     ...: setup docker storage devices
docker-storage-setup-0.6.0-1.18.giteb688d4.amzn1.noarch : A simple service to
     ...: setup docker storage devices
docker-17.03.2ce-1.59.amzn1.x86_64 : Automates deployment of containerized
                                   : applications
docker-17.06.2ce-1.93.amzn1.x86_64 : Automates deployment of containerized
                                   : applications
docker-17.06.2ce-1.93.amzn1.x86_64 : Automates deployment of containerized
                                   : applications

If I install it with dependency checks disabled, I get an error when trying to run docker commands:

 /usr/lib64/libnvidia-container.so.1: undefined symbol: cap_get_bound

Amazon Linux does have libcap installed.

This is why I was hoping for a compilation solution. Maybe linking on the Amazon Linux OS would resolve this issue.

This is an issue for https://github.com/NVIDIA/libnvidia-container, which can be easily compiled from sources.

TL;DR: That repository has an issue that seems to say that this is simply not supported and will not be fixed.

I have the same issue here using the latest Amazon ECS AMI. I attempted to fix by downloading the source and using sed to alter the dependency version (the Amazon Linux version appends -1.111.amzn1 to it) but arrived at this same issue. I checked the libnvidia-container repo as suggested here and found https://github.com/NVIDIA/libnvidia-container/issues/12 closed with "configuration is not officially supported".

Since ECS is the primary Amazon offering for Docker orchestration I can't imagine I'm alone. I have a custom AMI based on Ubuntu currently but would really like to be able to pick up some of the ECS optimized features via the AMIs.

Hi, there are no instructions on how to run nvidia-docker2 on openSUSE Leap 42.3 at https://nvidia.github.io/nvidia-docker/. Or is it just me not understanding how Linux distros are named?
Thanks

I have the same problem with Fedora 27 as in the comment above from @PeterSchichtel. How do I get nvidia-docker installed? The installation instructions at https://nvidia.github.io/nvidia-docker/ do not say a word about other distros.

Those other distros are not supported, for now. You might be able to try using the CentOS 7 packages instead. With a few quirks for now:
https://github.com/NVIDIA/nvidia-docker/issues/657 for OpenSUSE
https://github.com/NVIDIA/nvidia-docker/issues/660 for distros with the latest glibc (including fedora 27, I guess)

Wow, that was quick 👍 I'll give it a try. So, what value should the distribution variable take for CentOS 7 in https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo ?

centos7
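
The substitution described above can be sketched as a small shell helper. The function name `nvidia_docker_repo_url` is hypothetical and only for illustration; the URL pattern is the one quoted in the question.

```shell
#!/bin/sh
# Hypothetical helper: build the repo file URL for a given distribution id.
# The URL pattern comes from the question above; the function name is illustrative.
nvidia_docker_repo_url() {
    distribution="$1"
    printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.repo\n' "$distribution"
}

# For CentOS 7 the distribution id is "centos7".
nvidia_docker_repo_url centos7
```

The printed URL is the repo file that would be saved under /etc/yum.repos.d/ on an RPM-based system.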

@flx42 Thanks! It seems to work on Fedora 27:

curl -s -L https://nvidia.github.io/nvidia-docker/centos7/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo dnf install nvidia-docker2
sudo pkill -SIGHUP dockerd
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Shows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0  On |                  N/A |
| 23%   29C    P8    12W / 250W |    695MiB / 11171MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Hi @OleRoel, did you edit the nvidia-docker.repo file?
I received this error in Fedora 27:

Failed to synchronize cache for repo 'libnvidia-container', disabling.
Failed to synchronize cache for repo 'nvidia-container-runtime', disabling.
Failed to synchronize cache for repo 'nvidia-docker', disabling.
Last metadata expiration check: 0:43:59 ago on Wed 18 Apr 2018 03:35:30 AM +03.
No match for argument: nvidia-docker2

I can't seem to get a good Fedora 27 build. If anyone gets it working, please post.
I got this as well:
sudo dnf install nvidia-docker2
Failed to synchronize cache for repo 'libnvidia-container', disabling.
Failed to synchronize cache for repo 'nvidia-container-runtime', disabling.
Failed to synchronize cache for repo 'nvidia-docker', disabling.
Last metadata expiration check: 0:41:43 ago on Wed 25 Apr 2018 09:29:59 AM PDT.

Take a look at this issue

@escorciav No, all I did is described above. Sorry for the late reply, but I just received an e-mail notification about your question.

Interesting, that didn't work for @RCFilm and me. Hopefully, I found a workaround.
Cheers

@escorciav

That sounds too stupid to be true. I have been using

sudo dnf install nvidia-docker2

while you did (according to your pastebin here)

dnf install nvidia-docker

It is not the missing 2 at the end, is it?

I was able to install on Fedora 28 using the same procedure as @OleRoel. I did have to edit some configuration to get it to work though, as it didn't like having runtimes added both in the command-line args (from the systemd unit file) and in daemon.json.
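
A rough way to spot the conflict described above is to check whether a runtime is added both on the dockerd command line and in the daemon configuration file. This is a minimal sketch: the function name `runtime_defined_twice` is hypothetical, and the example paths in the comments are assumptions that may differ on your system.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) when runtimes appear to be configured in both places.
# Function name and example paths are assumptions, not part of nvidia-docker.
runtime_defined_twice() {
    unit_file="$1"     # e.g. /usr/lib/systemd/system/docker.service (assumed path)
    daemon_json="$2"   # e.g. /etc/docker/daemon.json
    # -e lets grep accept a pattern that starts with dashes.
    grep -q -e '--add-runtime' "$unit_file" 2>/dev/null &&
        grep -q '"runtimes"' "$daemon_json" 2>/dev/null
}
```

If the check succeeds, removing the runtime from one of the two places is the kind of configuration edit mentioned above.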

With procedures described by @OleRoel I could also install it on Fedora 28.
But I'm running a notebook with an Optimus graphics card (MX150) and have Bumblebee installed.
When I run this command:

optirun docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

I get this error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=9.0 --pid=13948 /var/lib/docker/overlay2/aeb6fedd42f4df22777402dd30466bcb1e10a536666afdbeadbe9693647f1a72/merged]\\\\nnvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.

@jamie-arcc
Did you edit some configurations which could correlate with my error?

optirun docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

I am on 4.14.74-1-MANJARO and I get the same problem after installing bumblebee and cuda/cudnn.

~]$ optirun docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 --pid=6124 /var/lib/docker/overlay2/1fc494383f262237d92b816a0c1b8b9a19ecb9a0c67087e83eaf2a06bdc42736/merged]\\\\nnvidia-container-cli: requirement error: invalid expression\\\\n\\\"\"": unknown.

@usmcamp0811 you are facing a different problem. See https://github.com/NVIDIA/nvidia-docker/issues/835

Hi @flx42, is a package planned for Arch Linux?
@usmcamp0811, did you install from yaourt?

Cheers

@flx42 I am interested in installing nvidia-runtime on Gentoo (even if only to be able to configure Docker manually). Do I understand correctly that this is impossible from a tarball because nvidia-container-runtime doesn't have one, and that https://github.com/NVIDIA/libnvidia-container is not the FULL runtime?

Is there any other way to get this working, ideally a documented one (wiki page, etc.)? I cannot find anything that helps people build on distros that are not deb- or rpm-based...

Hi @vinayan3, I am trying to use nvidia-docker with AWS Elastic Beanstalk.
Did you manage to make it work? If yes, can you give me some advice?

@fabio-C Yes I have been able to. I use Packer to create an AMI which is based off of the ElasticBeanStalk ECS AMI. I install the Nvidia Drivers, Cuda, and Nvidia Docker.

There is one quirk: when the AMI starts running, you need to run the following EB extension.

reload_nvidia_modules.cfg

files:
    /home/ec2-user/restart_nvidia_modules.sh:
      mode: "000755"
      owner: root
      group: root
      content: |
        #!/usr/bin/env bash

        set +e

        curl "http://169.254.169.254/latest/meta-data/instance-type/" 2>/dev/null | grep g2 > /dev/null

        if [ $? -eq 0 ]; then
                rmmod nvidia_uvm
                rmmod nvidia_drm
                rmmod nvidia_modeset
                rmmod nvidia
                modprobe nvidia
                modprobe nvidia_modeset
                modprobe nvidia_drm
                modprobe nvidia_uvm
                echo "Finished restarting Nvidia modules"
        else
                echo "Nothing to do"
        fi
commands:
  01-do-on-boot:
    command: sudo /home/ec2-user/restart_nvidia_modules.sh
    test: test ! -f /home/ec2-user/nvidia_restart_semaphore
  02-startup-complete:
    command: touch /home/ec2-user/nvidia_restart_semaphore

Note: If you are using something different than a G2 you can edit the grep command above to include those instance types.
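
As a sketch of the edit suggested in the note, the instance-type check can be widened to several families. The function name `is_gpu_instance` is hypothetical, and the family list (g2, g3, p2, p3) is only an example to adjust. Unlike the plain `grep g2` above, this pattern is anchored to the start of the instance type.

```shell
#!/bin/sh
# Sketch: match several GPU instance families instead of only g2.
# The family list is an example; the function name is illustrative.
is_gpu_instance() {
    printf '%s\n' "$1" | grep -Eq '^(g2|g3|p2|p3)\.'
}

# In the script above, the value would come from the instance metadata endpoint:
# instance_type=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)
if is_gpu_instance "g2.2xlarge"; then
    echo "GPU instance"
else
    echo "Nothing to do"
fi
```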

Was hoping there'd be a clear way of doing this for slackware :/

A package for v2 has suddenly appeared for Gentoo: http://gpo.zugaina.org/sci-libs/nvidia-docker-bin

Hello!

I'll try to track other distributions in individual issues from now on. If you think a specific distro should be supported feel free to open an issue :)
However note that our general guidelines (though they can change) are to stick to the same platform as the CUDA support matrix.

I was able to install on Fedora 30 using the same procedure as @OleRoel.

But I first had to uninstall the Fedora distribution's docker package and install the community edition (docker-ce), as per https://docs.docker.com/engine/install/fedora/
