The Readme states:
Other distributions and architectures
Look at the Installation section of the wiki.
but there are no instructions for other distributions on the Wiki. Package manager methods are nice and all, but why don't we have a clean
$ ./configure
$ make
$ sudo make install
method to open this software up to the wider Linux community?
but there are no instructions for other distributions on the Wiki.
Nice catch, will fix it.
It's this page:
https://nvidia.github.io/nvidia-docker/
We can't do the usual ./configure && make && sudo make install here, we have multiple repositories: libnvidia-container, nvidia-container-runtime and nvidia-docker. libnvidia-container has a tarball release, but nvidia-container-runtime doesn't, yet.
By the way, you can make && sudo make install for libnvidia-container.
For nvidia-container-runtime, you can go get github.com/NVIDIA/nvidia-container-runtime/nvidia-container-runtime-hook, but it won't give you the full runtime.
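For anyone who wants to try the source route for libnvidia-container, here is a rough sketch of the steps mentioned above. The `run` and `build_libnvidia_container` helpers, the `DRY_RUN` switch, and the checkout directory are made up for illustration; the build may also need extra dependencies that are not covered here.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the DRY_RUN switch are hypothetical.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Clone libnvidia-container and do the make / sudo make install steps.
build_libnvidia_container() {
    local src_dir="${1:-libnvidia-container}"   # assumed checkout location
    run git clone https://github.com/NVIDIA/libnvidia-container.git "$src_dir" || return 1
    run make -C "$src_dir" || return 1
    run sudo make -C "$src_dir" install || return 1
}
```

Running `DRY_RUN=1 build_libnvidia_container` prints the three commands without executing them, which is a cheap way to check what would happen first. This covers only the library, not the full runtime.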
Any chance of this working with the Amazon Linux AMIs? I use Elastic Beanstalk multi-container environments. I tried to install nvidia-docker2 using the CentOS/RHEL instructions, but it didn't work.
I got complaints about the dependency on docker-ce. I got around that using rpm --nodeps and, not surprisingly, it didn't work.
If I could compile it, that would be great.
How is docker installed on Amazon Linux then?
Amazon AMIs have docker installed from Amazon's own repository. The package name is docker and the versions offered are all the official releases.
If it's just a dependency issue, I can fix that for the next package release. Can you get the list of all the available versions? e.g.:
yum search --showduplicates docker
Here are the dependencies
[ec2-user]# yum search --showduplicates docker
============================= N/S matched: docker ==============================
docker-storage-setup-0.5-1.7.gite193b3b.amzn1.noarch : A simple service to setup
: docker storage devices
docker-storage-setup-0.6.0-1.18.giteb688d4.amzn1.noarch : A simple service to
...: setup docker storage devices
docker-storage-setup-0.6.0-1.18.giteb688d4.amzn1.noarch : A simple service to
...: setup docker storage devices
docker-17.03.2ce-1.59.amzn1.x86_64 : Automates deployment of containerized
: applications
docker-17.06.2ce-1.93.amzn1.x86_64 : Automates deployment of containerized
: applications
docker-17.06.2ce-1.93.amzn1.x86_64 : Automates deployment of containerized
: applications
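In case it helps, a listing like the one above can be reduced to just the docker package versions with a small filter. This is a sketch; `list_docker_versions` is a made-up helper name, and it assumes the package name is the first field on each line, as in this output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `yum search --showduplicates docker` output on
# stdin and prints the unique docker-<version> package names, skipping
# related packages such as docker-storage-setup.
list_docker_versions() {
    awk '/^docker-[0-9]/ { print $1 }' | sort -u
}
```

Usage would be something like `yum search --showduplicates docker | list_docker_versions`.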
If I install it with the dependency checks disabled, I get an error when trying to run docker commands:
/usr/lib64/libnvidia-container.so.1: undefined symbol: cap_get_bound
Amazon Linux does have libcap installed.
This is why I was hoping for a compilation solution. Maybe linking on the Amazon Linux OS would resolve this issue.
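One way to confirm whether the installed libcap actually exports the missing symbol is to inspect its dynamic symbol table with `nm -D`. The sketch below is only an illustration: `exports_symbol` is a hypothetical helper, and the library path in the usage note is an assumption that may differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nm -D --defined-only <library>` output on stdin
# and succeeds if the named symbol is listed. Any @VERSION suffix that nm
# prints after a symbol name is stripped before comparing.
exports_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name); if (name == sym) found = 1 }
        END { exit !found }
    '
}
```

For example: `nm -D --defined-only /lib64/libcap.so.2 | exports_symbol cap_get_bound && echo "symbol present"`. If the symbol is absent, the installed libcap is probably older than the one libnvidia-container was linked against.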
This is an issue for https://github.com/NVIDIA/libnvidia-container, which can be easily compiled from sources.
TL;DR: That repository has an issue that seems to say that this is simply not supported and will not be fixed.
I have the same issue here using the latest Amazon ECS AMI. I attempted to fix it by downloading the source and using sed to alter the dependency version (the Amazon Linux version appends -1.111.amzn1 to it) but arrived at this same issue. I checked the libnvidia-container repo as suggested here and found https://github.com/NVIDIA/libnvidia-container/issues/12 closed with "configuration is not officially supported".
Since ECS is the primary Amazon offering for Docker orchestration I can't imagine I'm alone. I have a custom AMI based on Ubuntu currently but would really like to be able to pick up some of the ECS optimized features via the AMIs.
Hi, there are no instructions on how to run nvidia-docker2 on openSUSE Leap 42.3 at https://nvidia.github.io/nvidia-docker/. Or is it just me not understanding the Linux distro naming?
Thanks
The same problem here with Fedora 27 as in the comment above from @PeterSchichtel. How do I get nvidia-docker installed? The installation instructions at https://nvidia.github.io/nvidia-docker/ don't say a word about other distros.
Those other distros are not supported for now. You can try using the CentOS 7 packages instead, with a few quirks:
https://github.com/NVIDIA/nvidia-docker/issues/657 for OpenSUSE
https://github.com/NVIDIA/nvidia-docker/issues/660 for distros with the latest glibc (including fedora 27, I guess)
Wow, that was quick 👍 I'll give it a try. So, what is the value of CentOS 7 for the distribution variable in https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo ?
centos7
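To make the substitution explicit, here is a small sketch of how the `$distribution` value could be chosen on an unsupported distro. Both function names are hypothetical, and falling back to `centos7` for Fedora and openSUSE only reflects the suggestion in this thread, not an official mapping.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of nvidia-docker.

# Pick the repo "distribution" string from the os-release ID and VERSION_ID.
# Fedora and openSUSE fall back to centos7, as suggested in this thread.
pick_distribution() {
    local id="$1" version_id="$2"
    case "$id" in
        fedora|opensuse*) echo "centos7" ;;
        *) echo "${id}${version_id}" ;;
    esac
}

# Build the .repo URL for a given distribution string.
repo_url() {
    echo "https://nvidia.github.io/nvidia-docker/$1/nvidia-docker.repo"
}
```

On a real machine the inputs would come from `/etc/os-release`, for example `. /etc/os-release; repo_url "$(pick_distribution "$ID" "$VERSION_ID")"`.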
@flx42 Thanks! It seems to work on Fedora 27:
curl -s -L https://nvidia.github.io/nvidia-docker/centos7/nvidia-docker.repo | \
sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo dnf install nvidia-docker2
sudo pkill -SIGHUP dockerd
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Shows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48 Driver Version: 390.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:02:00.0 On | N/A |
| 23% 29C P8 12W / 250W | 695MiB / 11171MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
hi @OleRoel, did you edit the nvidia-docker.repo file?
I received this error in Fedora 27:
Failed to synchronize cache for repo 'libnvidia-container', disabling.
Failed to synchronize cache for repo 'nvidia-container-runtime', disabling.
Failed to synchronize cache for repo 'nvidia-docker', disabling.
Last metadata expiration check: 0:43:59 ago on Wed 18 Apr 2018 03:35:30 AM +03.
No match for argument: nvidia-docker2
Can't seem to get a good Fedora 27 build. If anyone gets it working, please post.
I got this as well
sudo dnf install nvidia-docker2
Failed to synchronize cache for repo 'libnvidia-container', disabling.
Failed to synchronize cache for repo 'nvidia-container-runtime', disabling.
Failed to synchronize cache for repo 'nvidia-docker', disabling.
Last metadata expiration check: 0:41:43 ago on Wed 25 Apr 2018 09:29:59 AM PDT.
Take a look at this issue
@escorciav No, all I did is described above. Sorry for the late reply, but I just received an e-mail notification about your question.
Interesting, that didn't work for @RCFilm and me. Fortunately, I found a workaround.
Cheers
@escorciav
That sounds too stupid to be true. I have been using
sudo dnf install nvidia-docker2
while you did (according to your pastebin here)
dnf install nvidia-docker
It is not the missing 2 at the end, is it?
I was able to install on Fedora 28 using the same procedure as @OleRoel. I did have to edit some configuration to get it to work though, as it didn't like having runtimes added both in the command-line args (from the systemd unit file) and in daemon.json.
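A quick way to check for this kind of clash is to see whether the docker systemd unit already passes a runtime on the command line. This is a sketch under assumptions: `unit_adds_runtime` is a hypothetical helper, and it assumes the unit uses dockerd's `--add-runtime` flag, which may not match every distro's packaging.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a systemd unit (or any text) on stdin and
# succeeds if it contains dockerd's --add-runtime flag.
unit_adds_runtime() {
    grep -q -- '--add-runtime'
}
```

For example: `systemctl cat docker | unit_adds_runtime && echo "runtime also set in the unit file"`. If that prints, runtimes are probably being declared in two places, and keeping them in only one should avoid the conflict.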
With procedures described by @OleRoel I could also install it on Fedora 28.
But I'm running a notebook with an Optimus graphics card (MX150) and have bumblebee installed.
When I run this command:
optirun docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
I get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=9.0 --pid=13948 /var/lib/docker/overlay2/aeb6fedd42f4df22777402dd30466bcb1e10a536666afdbeadbe9693647f1a72/merged]\\\\nnvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
@jamie-arcc
Could any of the configuration changes you made be related to my error?
optirun docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
I am on 4.14.74-1-MANJARO and I get the same problem after installing bumblebee and cuda/cudnn.
~]$ optirun docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 --pid=6124 /var/lib/docker/overlay2/1fc494383f262237d92b816a0c1b8b9a19ecb9a0c67087e83eaf2a06bdc42736/merged]\\\\nnvidia-container-cli: requirement error: invalid expression\\\\n\\\"\"": unknown.
@usmcamp0811 you are facing a different problem. See https://github.com/NVIDIA/nvidia-docker/issues/835
Hi @flx42, is a package planned for Arch Linux?
@usmcamp0811 did you install from yaourt?
Cheers
@flx42 I am interested in installing nvidia-runtime on Gentoo (even if I have to configure docker manually). Do I understand correctly that it is impossible to do from a tarball, since nvidia-container-runtime doesn't have one? And that https://github.com/NVIDIA/libnvidia-container is not the full runtime?
Is there any other way to get this working, ideally a documented one (wiki page, etc.)? I cannot find anything that helps people build on distros that are not deb/rpm based...
Hi @vinayan3 I am trying to use nvidia-docker with AWS Elastic Beanstalk.
Did you manage to make it work? If yes, can you give me some advice?
@fabio-C Yes, I have been able to. I use Packer to create an AMI based on the Elastic Beanstalk ECS AMI. I install the Nvidia drivers, CUDA, and Nvidia Docker.
One quirk: when the AMI starts running, you need to run the following EB extension.
reload_nvidia_modules.cfg
files:
  /home/ec2-user/restart_nvidia_modules.sh:
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      set +e
      curl "http://169.254.169.254/latest/meta-data/instance-type/" 2>/dev/null | grep g2 > /dev/null
      if [ $? -eq 0 ]; then
        rmmod nvidia_uvm
        rmmod nvidia_drm
        rmmod nvidia_modeset
        rmmod nvidia
        modprobe nvidia
        modprobe nvidia_modeset
        modprobe nvidia_drm
        modprobe nvidia_uvm
        echo "Finished restarting Nvidia modules"
      else
        echo "Nothing to do"
      fi
commands:
  01-do-on-boot:
    command: sudo /home/ec2-user/restart_nvidia_modules.sh
    test: test ! -f /home/ec2-user/nvidia_restart_semaphore
  02-startup-complete:
    command: touch /home/ec2-user/nvidia_restart_semaphore
Note: If you are using something other than a G2 instance, you can edit the grep command above to include those instance types.
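As a sketch of what that edit could look like, the instance-type check can be pulled into a small function that matches several families at once. `is_gpu_instance` is a hypothetical name, and the list of families (g2, g3, p2, p3) is only an example to adjust for whatever you actually run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given EC2 instance type belongs to
# one of the listed GPU families. The family list is illustrative only.
is_gpu_instance() {
    case "$1" in
        g2.*|g3.*|p2.*|p3.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside the script, the `grep g2` test could then become something like `is_gpu_instance "$(curl -s http://169.254.169.254/latest/meta-data/instance-type/)"`, with the module reload running only when it succeeds.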
Was hoping there'd be a clear way of doing this for slackware :/
A package for v2 has since appeared for Gentoo: http://gpo.zugaina.org/sci-libs/nvidia-docker-bin
Hello!
I'll try to track other distributions in individual issues from now on. If you think a specific distro should be supported feel free to open an issue :)
However note that our general guidelines (though they can change) are to stick to the same platform as the CUDA support matrix.
I was able to install on Fedora 30 using the same procedure as @OleRoel.
But I first had to uninstall the Fedora-packaged docker and install the community edition (docker-ce) as per https://docs.docker.com/engine/install/fedora/