Unable to use the nvidia-container-runtime repository. Yum is failing with the following message:
https://nvidia.github.io/libnvidia-container/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for libnvidia-container
This is on a system running docker-ee 2.0:
Client: Docker Enterprise Edition (EE) 2.0
Version: 17.06.2-ee-16
API version: 1.30
Go version: go1.8.7
Git commit: 9ef4f0a
Built: Thu Jul 26 16:40:49 2018
OS/Arch: linux/amd64
Server: Docker Enterprise Edition (EE) 2.0
Engine:
Version: 17.06.2-ee-16
API version: 1.30 (minimum version 1.12)
Go version: go1.8.7
Git commit: 9ef4f0a
Built: Thu Jul 26 16:42:11 2018
OS/Arch: linux/amd64
Experimental: false
Note: Did a yum clean all before starting.
Created the repository file:
[root@titan yum.repos.d]# curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
> tee /etc/yum.repos.d/nvidia-container-runtime.repo
Tried to install the package:
[root@titan yum.repos.d]# yum install nvidia-container-runtime-hook
Loaded plugins: fastestmirror, langpacks
Determining fastest mirrors
epel/x86_64/metalink | 13 kB 00:00:00
* base: mirror.ash.fastserv.com
* epel: mirror.umd.edu
* extras: mirror.ash.fastserv.com
* updates: mirror.ash.fastserv.com
base | 3.6 kB 00:00:00
bintray--sbt-rpm | 1.3 kB 00:00:00
docker-ee-stable-17.06 | 2.9 kB 00:00:00
epel | 3.2 kB 00:00:00
extras | 3.4 kB 00:00:00
libnvidia-container/x86_64/signature | 455 B 00:00:00
Retrieving key from https://nvidia.github.io/libnvidia-container/gpgkey
libnvidia-container/x86_64/signature | 2.0 kB 00:00:00 !!!
https://nvidia.github.io/libnvidia-container/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for libnvidia-container
Trying other mirror.
One of the configured repositories failed (libnvidia-container),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
yum --disablerepo=libnvidia-container ...
4. Disable the repository permanently, so yum won't use it by default. Yum
will then just ignore the repository until you permanently enable it
again or use --enablerepo for temporary usage:
yum-config-manager --disable libnvidia-container
or
subscription-manager repos --disable=libnvidia-container
5. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=libnvidia-container.skip_if_unavailable=true
failure: repodata/repomd.xml from libnvidia-container: [Errno 256] No more mirrors to try.
https://nvidia.github.io/libnvidia-container/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for libnvidia-container
REPO FILE:
[root@titan yum.repos.d]# cat nvidia-container-runtime.repo
[libnvidia-container]
name=libnvidia-container
baseurl=https://nvidia.github.io/libnvidia-container/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[nvidia-container-runtime]
name=nvidia-container-runtime
baseurl=https://nvidia.github.io/nvidia-container-runtime/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
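The repo file above was fetched with a `$distribution` variable that is never set in this transcript. A minimal sketch of how it could be derived, assuming the os-release convention of `ID` followed by `VERSION_ID` (which gives `centos7` on this system, matching the baseurl lines); the helper name `distribution_from` is hypothetical:

```shell
# Hypothetical helper: print ID followed by VERSION_ID from an os-release style file.
# The file is sourced in a subshell so the caller's variables are not touched.
distribution_from() {
    (
        . "$1"
        printf '%s%s\n' "$ID" "$VERSION_ID"
    )
}

# Typical use (assumes /etc/os-release exists on the host):
# distribution=$(distribution_from /etc/os-release)
# curl -s -L "https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo" | \
#     sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
```

If the variable is empty, the URL collapses to a path without a distribution segment, so it is worth echoing `$distribution` before running curl.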
[root@titan yum.repos.d]# uname -a
Linux titan 3.10.0-862.6.3.el7.x86_64 #1 SMP Tue Jun 26 16:32:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@titan yum.repos.d]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[root@titan yum.repos.d]# nvidia-smi
Wed Oct 3 19:12:50 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37 Driver Version: 396.37 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
| N/A 39C P8 5W / N/A | 106MiB / 8117MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1681 G /usr/bin/X 89MiB |
| 0 2128 G /usr/bin/gnome-shell 14MiB |
+-----------------------------------------------------------------------------+
[root@titan yum.repos.d]# nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Wed Oct 3 19:26:35 2018
Driver Version : 396.37
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : GeForce GTX 1080
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-5ed61906-1aa5-fe97-e638-2cccf05ee8b2
Minor Number : 0
VBIOS Version : 86.04.88.00.26
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1BE010DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x12181462
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
FB Memory Usage
Total : 8117 MiB
Used : 106 MiB
Free : 8011 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 8 MiB
Free : 248 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 7 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 39 C
GPU Shutdown Temp : 99 C
GPU Slowdown Temp : 94 C
GPU Max Operating Temp : 91 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : 7.27 W
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 5005 MHz
Video : 1708 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 1681
Type : G
Name : /usr/bin/X
Used GPU Memory : 89 MiB
Process ID : 2128
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 14 MiB
[root@titan yum.repos.d]# rpm -qa '*nvidia*'
nvidia-container-runtime-2.0.0-1.docker18.06.0.x86_64
libnvidia-container1-1.0.0-0.1.rc.2.x86_64
libnvidia-container-tools-1.0.0-0.1.rc.2.x86_64
nvidia-container-runtime-hook-1.4.0-1.x86_64
[root@titan yum.repos.d]# nvidia-container-cli -V
version: 1.0.0
build date: 2018-06-12T00:20+0000
build revision: e3a2035da5a44b8a83d9568b91a8a0b542ee15d5
build compiler: gcc 4.8.5 20150623 (Red Hat 4.8.5-28)
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
Can you try refreshing the GPG key:
rpm --import <(curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey)
Still failed - executed the steps below.
Thanks.
[root@titan ~]# rpm --import <(curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey)
error: /dev/fd/63: import read failed(0).
[root@titan ~]# curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey > /tmp/nvidia-container-runtime.gpg
[root@titan ~]# rpm --import /tmp/nvidia-container-runtime.gpg
[root@titan ~]# yum clean all
[root@titan ~]# yum update yum
Loaded plugins: fastestmirror, langpacks
Determining fastest mirrors
epel/x86_64/metalink | 14 kB 00:00:00
* base: mirrors.advancedhosters.com
* epel: mirror.vcu.edu
* extras: mirrors.advancedhosters.com
* updates: mirrors.advancedhosters.com
base | 3.6 kB 00:00:00
bintray--sbt-rpm | 1.3 kB 00:00:00
docker-ee-stable-17.06 | 2.9 kB 00:00:00
epel | 3.2 kB 00:00:00
extras | 3.4 kB 00:00:00
libnvidia-container/x86_64/signature | 455 B 00:00:00
Retrieving key from https://nvidia.github.io/libnvidia-container/gpgkey
libnvidia-container/x86_64/signature | 2.0 kB 00:00:00 !!!
https://nvidia.github.io/libnvidia-container/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for libnvidia-container
Trying other mirror.
...
Weird, it looks fine on my end.
Can you try removing the GPG key first and then importing it again?
Same result:
[root@titan rpm-gpg]# rpm -qa gpg-pubkey \* --qf "%{version}-%{release} %{summary}\n" | grep f796ecb0-59cd5831
f796ecb0-59cd5831 gpg(NVIDIA CORPORATION (Open Source Projects) <[email protected]>)
[root@titan rpm-gpg]# rpm -e --allmatches gpg-pubkey-f796ecb0-59cd5831
[root@titan rpm-gpg]# yum update yum
...
https://nvidia.github.io/libnvidia-container/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for libnvidia-container
Thanks.
I also went back, re-imported the gpg-pubkey, and tested again, with the same result.
Workaround (bad one) that works:
Modify /etc/yum.repos.d/nvidia-container-*.repo lines:
repo_gpgcheck=1 -> repo_gpgcheck=0
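The edit described here can be scripted. A minimal sketch, assuming the repo files use plain `repo_gpgcheck=<value>` lines as in the listing earlier in this thread; the helper name `set_repo_gpgcheck` is hypothetical:

```shell
# Hypothetical helper: rewrite every repo_gpgcheck=... line in a .repo file.
# $1 = path to the .repo file, $2 = new value (0 or 1).
# The pattern is anchored, so plain gpgcheck=... lines are left alone.
set_repo_gpgcheck() {
    _tmp=$(mktemp) || return 1
    sed "s/^repo_gpgcheck=.*/repo_gpgcheck=$2/" "$1" > "$_tmp" && cat "$_tmp" > "$1"
    _status=$?
    rm -f "$_tmp"
    return "$_status"
}

# Typical use (needs root):
# for f in /etc/yum.repos.d/nvidia-container-*.repo; do set_repo_gpgcheck "$f" 0; done
```

This turns off verification of the repository metadata signature, which is why it is called a bad workaround above. Calling the helper again with `1` restores the check.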
AFAIK you might also need to remove the old GPG key for the repository.
For libnvidia-container:
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/libnvidia-container/gpgdir --delete-key F796ECB0
sudo yum clean all
Thanks @3XX0 . Verified that to work. In many environments the key is very likely there three times:
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/libnvidia-container/gpgdir --delete-key F796ECB0
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/nvidia-container-runtime/gpgdir --delete-key F796ECB0
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/nvidia-docker/gpgdir --delete-key F796ECB0
sudo yum update
Thanks for confirming, I will add it to our documentation.
With @mippos's steps and the gpg checks left on, it is now working for me as well. Thanks @3XX0 and @mippos.
@billgercken it works if you use:
sudo yum install cuda --nogpgcheck
@3XX0, I think the glob based instruction in the documentation is failing because the gpg tool prompts for confirmation of removal, with the default as "no".
sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/*/gpgdir --delete-key f796ecb0
The manpage says it has a --yes flag, but it is only available "on most questions," so I do not know if deletion is included, and I cannot easily check now that I have removed the old key. (Reading further, --yes is explicitly mentioned under --delete-key, but for batch mode. Is batch mode implied for multiple home directories, or is --batch explicitly required?)
sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/*/gpgdir --batch --yes --delete-key f796ecb0
Alternatively, a loop should get the job done.
for GPGDIR in /var/lib/yum/repos/$(uname -m)/$DIST/*/gpgdir; do sudo gpg --homedir "$GPGDIR" --delete-key f796ecb0; done
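A minimal sketch of the same loop as a reusable function, with `--batch --yes` added. It assumes those flags suppress the confirmation prompt, as the manpage passage mentioned above suggests; nobody in this thread verified that. The function name `delete_key_from_gpgdirs` and the `GPG` override variable are hypothetical; the override exists only so the loop can be dry-run with `echo`.

```shell
# Hypothetical helper: run one gpg deletion per gpgdir under a repo cache root.
# $1 = root such as /var/lib/yum/repos/x86_64/7, $2 = key ID.
# Set GPG=echo to print the commands instead of running them.
delete_key_from_gpgdirs() {
    for gpgdir in "$1"/*/gpgdir; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$gpgdir" ] || continue
        ${GPG:-gpg} --homedir "$gpgdir" --batch --yes --delete-key "$2"
    done
}

# Typical use (as root):
# delete_key_from_gpgdirs "/var/lib/yum/repos/$(uname -m)/$DIST" f796ecb0
```

Each `gpg` call receives exactly one `--homedir` path, which is the point of looping rather than passing the glob directly.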
I think something is amiss with the EL7 ppc64le repository or with my Talos 2's configuration. I get the same error even after purging the above keys from my environment's equivalent three directories (e.g. /var/lib/yum/repos/ppc64le/7Server/libnvidia-container/gpgdir), verifying with gpg that the key is indeed gone, and running yum clean all and yum makecache.
There is no gpgdir on my system. How can I solve the problem?
(base) [xxx@centos7 libnvidia-container]$ ls
(base) [xxx@centos7 libnvidia-container]$ pwd
/var/lib/yum/repos/x86_64/7/libnvidia-container
@3XX0
In Fedora (30) that can be fixed similarly by:
rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n' | grep -i nvidia
rpm -e gpg-pubkey-{ID}-{RELEASE ID} (using the values from the previous step)
for x in /var/cache/dnf/*/pubring; do sudo gpg --homedir "${x}" --delete-key f796ecb0; done
dnf clean all
dnf makecache
@autodataming to rehash, in CentOS the fix looks more like:
rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n' | grep -i nvidia
rpm -e gpg-pubkey-{ID}-{RELEASE ID} (using the values from the previous step)
for x in /var/lib/yum/repos/x86_64/7/*/gpgdir; do sudo gpg --homedir "${x}" --delete-key f796ecb0; done
yum clean all
yum makecache
When you run makecache be sure to say yes to accept the new gpg keys.
You can always optionally import the key manually, at the end, rpm --import https://nvidia.github.io/nvidia-docker/gpgkey, but that's not necessary, makecache will do that for you.
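The `{ID}-{RELEASE ID}` placeholder in the steps above can be filled in mechanically. A minimal sketch, assuming the tab-separated output produced by the `--qf` format string shown there; the helper name `nvidia_pubkeys` is hypothetical:

```shell
# Hypothetical helper: read "NAME-VERSION-RELEASE<TAB>SUMMARY" lines on stdin and
# print the package name of every key whose summary mentions NVIDIA (case-insensitive).
nvidia_pubkeys() {
    awk -F '\t' 'tolower($2) ~ /nvidia/ { print $1 }'
}

# Typical use (as root):
# rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n' | nvidia_pubkeys |
#     while read -r pkg; do rpm -e --allmatches "$pkg"; done
```

Printing the matches first and reviewing them before piping into `rpm -e` is the safer way to run it.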
This is what I've been doing on my Fedora 30 and CentOS 7 servers, and it seems to work. Now as to "WHY are nvidia gpg keys so messed up in an unacceptable way?" I have no idea. Someone did something they shouldn't have, I suspect. All repos use the gpg-pubkey, but none of my dozens of other 3rd party repos that do use gpg signing on any of my workstations or servers use this gpgdir/pubkey dir. Even AFTER this fix, the pubring is still being used by the new gpg key, so I can only imagine this problem will happen again down the road, if (when?) the gpg keys are ever changed again.
Before
gpg --homedir /var/cache/dnf/libnvidia-container-abceabc6675ce29d/pubring --list-keys
gpg: WARNING: unsafe permissions on homedir '/var/cache/dnf/libnvidia-container-abceabc6675ce29d/pubring'
/var/cache/dnf/libnvidia-container-abceabc6675ce29d/pubring/pubring.kbx
-----------------------------------------------------------------------
pub rsa4096 2017-09-28 [SCE]
C95B321B61E88C1809C4F759DDCAE044F796ECB0
uid [ unknown] NVIDIA CORPORATION (Open Source Projects) <[email protected]>
After
/var/cache/dnf/libnvidia-container-abceabc6675ce29d/pubring/pubring.kbx
-----------------------------------------------------------------------
pub rsa4096 2017-09-28 [SCE]
C95B321B61E88C1809C4F759DDCAE044F796ECB0
uid [ unknown] NVIDIA CORPORATION (Open Source Projects) <[email protected]>
sub rsa2048 2019-09-18 [S] [expires: 2020-09-17]
So nothing is better, just working "right now"
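Since the new subkey in the listing carries an expiry date, it may help to check expirations ahead of time. A minimal sketch, assuming GnuPG's `--with-colons` record layout (key ID in field 5, expiration in field 7, usually as seconds since the epoch); the helper name `key_expirations` is hypothetical:

```shell
# Hypothetical helper: read `gpg --list-keys --with-colons` output on stdin and print
# "<key id> <expiry>" for each pub/sub record that has an expiration set (field 7).
key_expirations() {
    awk -F: '($1 == "pub" || $1 == "sub") && $7 != "" { print $5, $7 }'
}

# Typical use (the pubring path is the one from the listing above):
# gpg --homedir /var/cache/dnf/libnvidia-container-abceabc6675ce29d/pubring \
#     --list-keys --with-colons | key_expirations
```

Records without an expiration are skipped, so an empty result means no listed key or subkey is set to expire.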
I tried following the instructions here:
$ DIST=$(sed -n 's/releasever=//p' /etc/yum.conf)
$ DIST=${DIST:-$(. /etc/os-release; echo $VERSION_ID)}
$ sudo rpm -e gpg-pubkey-f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/$DIST/*/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-docker/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/nvidia-container-runtime/gpgdir --delete-key f796ecb0
$ sudo gpg --homedir /var/lib/yum/repos/$(uname -m)/latest/libnvidia-container/gpgdir --delete-key f796ecb0
but I get key not found.
But when I tried what was suggested by @mippos :
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/libnvidia-container/gpgdir --delete-key F796ECB0
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/nvidia-container-runtime/gpgdir --delete-key F796ECB0
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/nvidia-docker/gpgdir --delete-key F796ECB0
it worked perfectly.
Any explanation as to why this worked?
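One possible explanation, not confirmed in this thread: when `$DIST` resolves to `7`, the `*/gpgdir` glob matches several directories, so the shell hands `gpg` more than one path after a single `--homedir`. That option takes one argument, so the remaining paths become extra arguments. The per-directory commands pass exactly one path each. A small sketch that prints the words the shell generates for the glob form (the helper name `show_expansion` is hypothetical):

```shell
# Print, one per line, the words a glob-based gpg invocation expands to.
# $1 = repo cache root (e.g. /var/lib/yum/repos/x86_64/7), $2 = key ID.
show_expansion() {
    # The glob is left unquoted on purpose: this is the expansion being illustrated.
    set -- gpg --homedir "$1"/*/gpgdir --delete-key "$2"
    printf '%s\n' "$@"
}

# Typical use:
# show_expansion "/var/lib/yum/repos/$(uname -m)/$DIST" f796ecb0
```

If more than one path appears between `--homedir` and `--delete-key`, the glob form is not equivalent to the three separate commands.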
Thanks @3XX0 . Verified that to work. In many environments the key is very likely there three times:
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/libnvidia-container/gpgdir --delete-key F796ECB0
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/nvidia-container-runtime/gpgdir --delete-key F796ECB0
sudo gpg --homedir /var/lib/yum/repos/x86_64/7/nvidia-docker/gpgdir --delete-key F796ECB0
sudo yum update
Addendum to @mippos' solution: if you don't want to update everything from all your yum repos, run this instead of sudo yum update:
yum install -y nvidia-container-toolkit