Attempting to apt-get update in the latest nvidia/cuda image for CUDA 10.1 / cudnn7 / ubuntu16.04 produces the following failure:
Reading package lists... Error!
E: Encountered a section with no Package: header
E: Problem with MergeList /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos_ubuntu1604_x86%5f64_Packages.lz4
E: The package lists or status file could not be parsed or opened.
This is possibly due to corruption of the nvidia apt index itself (for example its /Packages file), since the index's timestamp shows it was updated recently (2020-10-19 19:03).
# Latest image for me is `sha256:59179dcc823e4dda86b3d780c165ad0eed5559bbcc08950c7973b68775a32ed2`
$ docker pull nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04
$ docker run -it nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 apt-get update
Full output:
acarrillo ~/code/farmwise_main/docs $ docker run -it nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 apt-get update
Ign:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 InRelease
Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 InRelease
Get:3 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:4 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
Get:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Release [697 B]
Get:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 Release [564 B]
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Release.gpg [836 B]
Get:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 Release.gpg [833 B]
Ign:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Packages
Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Packages [382 kB]
Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 Packages [97.2 kB]
Get:11 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [1835 kB]
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:13 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Get:14 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:15 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [15.9 kB]
Get:16 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [951 kB]
Get:17 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:18 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:19 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [9249 B]
Get:20 http://archive.ubuntu.com/ubuntu xenial/multiverse amd64 Packages [176 kB]
Get:21 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [2353 kB]
Get:22 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [16.4 kB]
Get:23 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [1497 kB]
Get:24 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [26.7 kB]
Get:25 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [10.9 kB]
Get:26 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [12.6 kB]
Fetched 19.4 MB in 5s (3713 kB/s)
Reading package lists... Error!
E: Encountered a section with no Package: header
E: Problem with MergeList /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos_ubuntu1604_x86%5f64_Packages.lz4
E: The package lists or status file could not be parsed or opened.
I'm also seeing this issue, and am able to reproduce locally using @acarrillo's steps. Here's my output:
matt@matt-thinkpad:~/farmwise_main$ docker run -it nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 apt-get update
Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
Ign:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 InRelease
Ign:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 InRelease
Get:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Release [697 B]
Get:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 Release [564 B]
Get:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Release.gpg [836 B]
Get:7 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 Release.gpg [833 B]
Ign:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Packages
Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Packages [382 kB]
Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 Packages [97.2 kB]
Get:11 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [1835 kB]
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:13 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [15.9 kB]
Get:14 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [951 kB]
Get:15 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [9249 B]
Get:16 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Get:17 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:18 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:19 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:20 http://archive.ubuntu.com/ubuntu xenial/multiverse amd64 Packages [176 kB]
Get:21 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [2353 kB]
Get:22 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [16.4 kB]
Get:23 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [1497 kB]
Get:24 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [26.7 kB]
Get:25 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [10.9 kB]
Get:26 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [12.6 kB]
Fetched 19.4 MB in 3s (5100 kB/s)
Reading package lists... Error!
E: Encountered a section with no Package: header
E: Problem with MergeList /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos_ubuntu1604_x86%5f64_Packages.lz4
E: The package lists or status file could not be parsed or opened.
If I download the /Packages file and search for \n\n[^P], it looks like the last package definitions have been split by an extra newline after Installed-Size:
Package: datacenter-gpu-manager
Priority: optional
Provides: datacenter-gpu-manager
Replaces: datacenter-gpu-manager, datacenter-gpu-manager-fabricmanager (<<2.0), datacenter-gpu-manager-dcp-nda-only, datacenter-gpu-manager-collectd, datacenter-gpu-manager-wsgi, datacenter-gpu-manager-fabricmanager-internal-api-header
Section: devel
Version: 1:2.0.13
Installed-Size: 370887

Filename: ./datacenter-gpu-manager_2.0.13_amd64.deb
Size: 184134550
MD5sum: 74c67c4a8477bcf808508fbb13fafea3
SHA1: 0590a85fbe357c434eb903e59cce0b3db2903620
SHA256: 74add19a8b6e3bd612c04690c366eff7a0eb3271e021a663939f3ff683aa2705
SHA512: d54529d72223544eba8bf2d05b18aac39ae741d66d8dfbb26093a61e0485229519577b1de37b55d0c7759612aef19acabd7622956bb0284531924695d6491677
Description: NVIDIA® Datacenter GPU Management Tools
The Datacenter GPU Manager package contains tools for managing NVIDIA® GPUs in
high performance and cluster computing environments.
.
This package also contains the DCGM GPU Diagnostic. DCGM GPU Diagnostic is the system
administrator and cluster manager's tool for detecting and troubleshooting
common problems affecting NVIDIA® Tesla GPUs.
Package: datacenter-gpu-manager
Priority: optional
Provides: datacenter-gpu-manager
Replaces: datacenter-gpu-manager, datacenter-gpu-manager-fabricmanager (<<2.0), datacenter-gpu-manager-dcp-nda-only, datacenter-gpu-manager-collectd, datacenter-gpu-manager-wsgi, datacenter-gpu-manager-fabricmanager-internal-api-header
Section: devel
Version: 1:2.0.13
Installed-Size: 370887

Filename: ./datacenter-gpu-manager_2.0.13_amd64.deb
Size: 184134550
MD5sum: 74c67c4a8477bcf808508fbb13fafea3
SHA1: 0590a85fbe357c434eb903e59cce0b3db2903620
SHA256: 74add19a8b6e3bd612c04690c366eff7a0eb3271e021a663939f3ff683aa2705
SHA512: d54529d72223544eba8bf2d05b18aac39ae741d66d8dfbb26093a61e0485229519577b1de37b55d0c7759612aef19acabd7622956bb0284531924695d6491677
Description: NVIDIA® Datacenter GPU Management Tools
The Datacenter GPU Manager package contains tools for managing NVIDIA® GPUs in
high performance and cluster computing environments.
.
This package also contains the DCGM GPU Diagnostic. DCGM GPU Diagnostic is the system
administrator and cluster manager's tool for detecting and troubleshooting
common problems affecting NVIDIA® Tesla GPUs.
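For anyone who wants to check a downloaded Packages file for this kind of split, here is a rough sketch. It assumes stanzas are separated by blank lines and that every stanza should contain a line starting with Package:, which is what the apt error above complains about. The function name find_headerless_stanzas is made up for this example and is not part of apt.

```shell
# Print "<stanza number>: <first line>" for every blank-line-separated stanza
# that has no line starting with "Package:".
# find_headerless_stanzas is a hypothetical helper name, not an apt tool.
find_headerless_stanzas() {
    awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per stanza
    {
        has_pkg = 0
        for (i = 1; i <= NF; i++) if ($i ~ /^Package:/) has_pkg = 1
        if (!has_pkg) print NR ": " $1
    }' "$1"
}
```

Usage would be something like find_headerless_stanzas ./Packages after fetching the index from the repository URL shown in the apt-get output. No output means no stanza was missing its header; any line printed points at the orphaned half of a split entry.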
@cliffwoolley can someone from the Nvidia side take a look? This is blocking Horovod CI.
This is not an issue for nvidia-docker itself.
All issues related to images running on top of nvidia-docker should be directed here:
https://forums.developer.nvidia.com/c/accelerated-computing/nvidia-gpu-cloud-ngc-users/25
Ah, I think the community did not know that nvidia-docker and the nvidia apt maintainers do not talk to each other :grimacing:
@klueska, is there an issue tracker for the correct team? A forum does not seem like the correct place to track breakages of this sort.
Completely +1 to @tgaddair -- does nvidia have an issue tracking process for their package releases?
Let me double check if there is something better than the forum I linked nowadays.
Last time I checked, this was the recommended place to file issues for these images.
We're working on the package repository and the issue is being addressed.
@acarrillo, @tgaddair
I got word that this is actually the better place for issues of this type (and it is well monitored):
https://forums.developer.nvidia.com/c/accelerated-computing/cuda/cuda-setup-and-installation/8
In fact, you can see this exact issue being discussed here:
https://forums.developer.nvidia.com/t/apt-update-failing-on-ubuntu-cuda-repo/140815/5
@acarrillo @tgaddair
The repository metadata has been fixed. However, if you were affected by the corrupt metadata, you need to manually purge it from your system. This applies to bare-metal installs; it is not necessary in containers, as the metadata would not be stored there.
$ sudo rm -v /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos*
$ sudo apt-get update
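If you need to script the purge (for example in CI), a minimal sketch follows. It deletes the same cached index files as the rm command above and prints each one it removes. The purge_cuda_lists name and the optional directory argument are assumptions added for this example; the default path is the one from the command above.

```shell
# Remove cached apt index files for the NVIDIA CUDA repo and print each path.
# purge_cuda_lists is a hypothetical helper; the optional first argument
# (lists directory) is an assumption that makes it easy to try on a copy.
purge_cuda_lists() {
    lists_dir="${1:-/var/lib/apt/lists}"
    find "$lists_dir" -maxdepth 1 -type f \
        -name 'developer.download.nvidia.com_compute_cuda_repos*' \
        -print -exec rm -f {} +
}
```

On a real system this has to run from a root shell, followed by apt-get update so the index is downloaded again.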
Noted, thank you for all of the information! I will route future concerns to that forum when they strictly pertain to nvidia/cuda core libs :)
Thanks @dualvtable, seems to be working now.
Hello, I've just run into exactly the same problem. Would you please share your solution? Thanks in advance!