Version:
v1.17.0+k3s.1
Describe the bug
ARM64 and ARM airgap image tarballs contain some AMD64 images instead of the correct arch.
To Reproduce
Examine the released files. I unpacked k3s-airgap-images-arm.tar and examined its contents with:
for d in *.json; do [ "$d" != manifest.json ] && jq .architecture < "$d"; done
Expected behavior
Should only see "arm" for all images.
Actual behavior
"amd64"
"arm"
"arm"
"arm"
"amd64"
"amd64"
"arm"
Additional context
This appears to have been broken for a while; it may never have worked correctly on arm/arm64.
Thanks for the info @gbritton! We definitely need to dig deeper into this. Strangely, it appears that even though the metadata says amd64, the extracted binaries look to be the appropriate architecture. Since we are using a simple docker command to download and save those images, it appears to be an upstream docker issue (https://github.com/rancher/k3s/blob/master/scripts/package-airgap).
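For context, the packaging step boils down to pulling each image for the host arch and saving them all in one tarball. A simplified sketch of what scripts/package-airgap does, not the verbatim script (image-list.txt and ARCH are stand-ins here):
# pull each listed image (the daemon defaults to its own arch), then save
# all of them, manifests included, into a single tarball
images=$(cat image-list.txt)
for i in ${images}; do
    docker pull "${i}"
done
docker save ${images} -o k3s-airgap-images-${ARCH}.tar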
These may be related: #1278 and #1094 (I've seen a few other complaints online, but these at least are here). They appear to be people hitting containerd errors when using the airgapped images on arm64. In my limited toying with "ctr i import airgap-images.tar", it fails on the mixed images. Curiously, the import appears to succeed back on v0.9.1, with the few missing images getting pulled from the internet. On newer versions I end up with containerd complaining about missing objects, I'm guessing because the import of improper data left things in a bad state. I wish I had time to debug this further, but this is at least the impression I've gotten from what little digging I've done.
@erikwilson looking at package-airgap I see it uses docker pull <image> to grab things... should this be adding the --platform <arch> argument?
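For reference, newer Docker releases accept a platform override at pull time, which would look something like:
# explicitly request the arm64 variant rather than the daemon's default
docker pull --platform linux/arm64 rancher/pause:3.1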

This may be related... it looks like the metrics-server image, at least, is misbuilt; pulling it shows an amd64 rather than arm64 binary in the container.
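One way to confirm, sketched here with an assumed image name and binary path: copy the binary out of a created (never started) container and check it with file.
# create (but don't run) a container, copy the binary out, inspect its arch
docker pull rancher/metrics-server:v0.3.6
id=$(docker create rancher/metrics-server:v0.3.6)
docker cp "${id}:/metrics-server" ./metrics-server
docker rm "${id}"
file ./metrics-server   # "x86-64" here would confirm the misbuild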
I am also having the same problem. The three images that have amd64 as their architecture are coredns, pause, and metrics-server. This causes container creation to hang because the platform type can't be resolved.
I think I understand what the issue is (and have worked around it in my own setup, which gathers airgap images independently). The multiarch manifests refer to, say, amd64, arm, and arm64 images. Doing just "docker save" simply dumps the existing manifest as-is into the archive, along with only the single architecture's layers. When this is loaded into containerd, the manifest still refers to the additional blobs that aren't needed, and containerd complains that they aren't found. My workaround is to use skopeo in my build system to pull images down into local storage, then export it all as an OCI bundle with self-consistent manifests. Unfortunately, it's not cleanly separable and isn't a drop-in fix for what the k3s release build produces, but hopefully this helps others fix it:
# Copy each image from the list (comments and blank lines stripped) into a
# local OCI layout, forcing skopeo to select the target architecture
for ref in $(sed -e 's/#.*//;/^[[:space:]]*$/d' ${AIRGAP_FILES}); do
    skopeo --policy ${WORKDIR}/policy.json --override-arch ${HOST_GOARCH} copy \
        docker://$ref oci:${WORKDIR}/oci-images:$ref
done

# Pack the OCI layout into a reproducible tarball (fixed owner and mtime)
tar --numeric-owner --owner=0 --group=0 \
    --mtime @${REPRODUCIBLE_TIMESTAMP_ROOTFS} -C ${WORKDIR}/oci-images \
    -cf ${WORKDIR}/${AIRGAP_TAR} oci-layout blobs index.json
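If it helps, you can sanity-check the resulting bundle before shipping it; a sketch reusing the same variables as above:
# every image in the OCI layout should now report the override arch
for ref in $(sed -e 's/#.*//;/^[[:space:]]*$/d' ${AIRGAP_FILES}); do
    printf '%s: ' "$ref"
    skopeo inspect oci:${WORKDIR}/oci-images:$ref | jq -r .Architecture
done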
Thanks for the quick turnaround!
Since I'm new at this, my options were limited, so I brute-forced it: pulled the images from Docker Hub with the --platform=arm64 option, pushed them to my private repository to get digests matching Docker Hub's, ran docker save to produce a tar, extracted the files to edit the JSON configs that still listed amd64 as the architecture, and tarred them up again. It worked. Now with your explanation, I finally understand why the manifests were incorrect. Is there a docker save --platform option?
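For anyone else needing the same stopgap, the manual fix-up looks roughly like this. The image name is just an example, and note that editing the config changes its digest, so stricter loaders may reject the repacked tarball:
# pull the right-arch variant; its config metadata may still say amd64
docker pull --platform linux/arm64 rancher/metrics-server:v0.3.6
docker save rancher/metrics-server:v0.3.6 -o metrics-server.tar

# unpack, patch the architecture field in the image config, repack
mkdir img && tar -xf metrics-server.tar -C img
cfg=$(jq -r '.[0].Config' img/manifest.json)
jq '.architecture = "arm64"' "img/${cfg}" > cfg.tmp && mv cfg.tmp "img/${cfg}"
tar -cf metrics-server-fixed.tar -C img .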
@shariperryman Thanks a lot, this also worked for me.
Nevertheless, is there any news on this topic?
I just faceplanted into this issue. The airgapped install seems to be broken, and all kube-system pods are stuck in ContainerCreating with "failed to resolve rootfs" errors:
1008 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "coredns-8655855d6-lxb65_kube-system(a9bf61ea-c193-4fa9-944d-cd838424f0c2)" failed: rpc error: code = NotFound desc = failed to create containerd container: error unpacking image: failed to resolve rootfs: content digest sha256:e11a8cbeda8688fdc3a68dc246d6c945aa272b29f8dd94d0ea370d30c2674042: not found
Any updates or a recommended workaround?
We were targeting a fix for v1.18.7 today, but due to complications we are going to postpone this. This is because the images that we would need to repush to fix this are also used by RKE. To safely ensure we don't break production, we need a more methodical and thorough approach that encompasses testing BOTH RKE and K3S. HOWEVER, we may be able to fix this out of band, and it is on our radar to try to get to this soon - just not in time for the new patch releases today.
We believe we need to repush the affected images with corrected manifests and validate the result against both RKE and K3S.
Our apologies for the delay on this, but we believe it's in everybody's best interests (we don't want to break people in production).
I have also taken note of this issue in our August patch releases issue here: https://github.com/rancher/k3s/issues/2113
@rancher-max wait for https://github.com/rancher/k3s/issues/1908 to get in before testing. These are intertwined, so it's best to just wait.
I will note that several of the upstream images are still misconfigured and claim to be the wrong arch, but are in fact correct. We are working around this by telling containerd to not skip loading layers for images that are incorrectly configured.
Version: v1.19.1-rc2+k3s1
OS used (uname -a): Linux ip-172-31-10-146 5.4.0-1022-aws #22-Ubuntu SMP Wed Aug 12 13:52:46 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
The images reporting the wrong architecture were rancher/metrics-server:v0.3.6, rancher/coredns-coredns:1.6.3, and rancher/pause:3.1. Also, the image for busybox was not present. As a result, a few pods would not come up successfully and were stuck in ContainerCreating:
kube-system helm-install-traefik-qg4hh 0/1 ContainerCreating 0 4m8s
kube-system coredns-8655855d6-dktnp 0/1 ContainerCreating 0 4m8s
kube-system local-path-provisioner-6d59f47c7-cd9d5 0/1 ContainerCreating 0 4m8s
kube-system metrics-server-7566d596c8-sznr2 0/1 ContainerCreating 0 4m8s
$ k3s kubectl get nodes,pods -A
NAME STATUS ROLES AGE VERSION
node/ip-172-31-10-146 Ready master 46m v1.19.1-rc2+k3s1
node/ip-172-31-12-195 Ready <none> 45m v1.19.1-rc2+k3s1
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/helm-install-traefik-hdfnt 0/1 Completed 0 46m
kube-system pod/local-path-provisioner-7ff9579c6-chxmr 1/1 Running 0 46m
kube-system pod/coredns-66c464876b-fdh28 1/1 Running 0 46m
kube-system pod/metrics-server-7b4f8b595-tn6w4 1/1 Running 0 46m
kube-system pod/svclb-traefik-qr82d 2/2 Running 0 45m
kube-system pod/svclb-traefik-hgzbz 2/2 Running 0 45m
kube-system pod/traefik-5dd496474-qrtz4 1/1 Running 0 45m
$ sudo k3s crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/rancher/coredns-coredns 1.6.9 af51a588dff59 41MB
docker.io/rancher/klipper-helm v0.3.0 4001cb2c385ce 140MB
docker.io/rancher/klipper-lb v0.1.2 9be4f056f04b7 6.21MB
docker.io/rancher/library-busybox 1.31.1 19d689bc58fd6 1.6MB
docker.io/rancher/library-traefik 1.7.19 1cdb7e2bd5e25 83.6MB
docker.io/rancher/local-path-provisioner v0.0.14 2b703ea309660 40.2MB
docker.io/rancher/metrics-server v0.3.6 f9499facb1e8c 39.6MB
docker.io/rancher/pause 3.1 6cf7c80fe4444 529kB
Verified the architectures by running cat manifest.json | grep -i config and then cat <configvalue> | grep architecture on all 8 results, ensuring each shows arm64:
rancher/library-busybox:1.31.1: arm64
rancher/library-traefik:1.7.19: arm64
rancher/local-path-provisioner:v0.0.14: arm64
rancher/metrics-server:v0.3.6: amd64
rancher/pause:3.1: amd64
rancher/coredns-coredns:1.6.9: arm64
rancher/klipper-helm:v0.3.0: arm64
rancher/klipper-lb:v0.1.2: arm64
That's correct @rancher-max, and is what I was getting at in https://github.com/rancher/k3s/issues/1285#issuecomment-693174813. Upstream managed to build them such that they have the correct arch in the manifest list, and the correct arch for the binaries, but the wrong arch in the image config JSON. We are fine just ignoring this for now. Upstream has fixed this in newer releases of the images, but we don't want to bump those at the moment.
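As an aside, the per-config check above can be scripted from the unpacked tarball; a sketch assuming the docker-save layout, where manifest.json maps each image's RepoTags to its config file:
# print each image tag next to the arch recorded in its config JSON
jq -r '.[] | "\(.RepoTags[0]) \(.Config)"' manifest.json |
while read -r tag cfg; do
    printf '%s: ' "$tag"
    jq -r .architecture < "$cfg"
done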