Description
Originally reported at https://github.com/rancher/k3s/issues/2455
Pulling an image with many layers fails with a hard-to-trace error regarding label size:
level=error msg="PullImage \"docker.io/cjp2k20/test:latest\" failed" error="rpc error: code = InvalidArgument desc = failed to pull and unpack image \"docker.io/cjp2k20/test:latest\": failed to prepare extraction snapshot \"extract-184956284-yQG1 sha256:42add5a4c91d6ec9dce019f7675367ea9736d9a4156cae704bb068865e1f45ac\": info.Labels: label key and value greater than maximum size (4096 bytes), key: containerd: invalid argument"
The root cause is at https://github.com/containerd/containerd/blob/master/pkg/cri/server/image_pull.go#L493 where the layer digest list is placed into a label. With a 39-character key and 72 characters per digest (including the trailing comma), the combined key and value exceed the 4096-byte limit once an image has more than 56 layers.
This is really hard to track down though, since for some reason the validation code truncates the key to 10 characters when logging the error at https://github.com/containerd/containerd/blob/master/labels/validate.go#L32 - I had to do a custom build with additional logging just to figure out what the actual key was, since I couldn't find any excessively long keys in the image itself.
Steps to reproduce the issue:
Describe the results you received:
Error:
FATA[2020-11-02T11:45:16.275682391-08:00] pulling image: rpc error: code = InvalidArgument desc = failed to pull and unpack image "docker.io/cjp2k20/test:latest": failed to prepare extraction snapshot "extract-273268420-iGNj sha256:42add5a4c91d6ec9dce019f7675367ea9736d9a4156cae704bb068865e1f45ac": info.Labels: label key and value greater than maximum size (4096 bytes), key: containerd: invalid argument
Describe the results you expected:
Image is successfully pulled, or we receive an error message indicating that containerd has an upper bound on the number of layers, and this bound has been exceeded.
Output of containerd --version:
containerd github.com/rancher/containerd v1.4.0-k3s1
Any other relevant information:
This was reported on the containerd embedded in k3s 1.19.3+k3s2, but appears to also reproduce on stock containerd.
Fixes are already in master from two PRs--one CRI, one default config: 1) https://github.com/containerd/cri/pull/1572 and 2) https://github.com/containerd/containerd/pull/4665
There is not a release yet with either of these fixes, but we will be putting out a 1.4.x point release which will contain them. For now, you can manually turn off annotations in your containerd config; the feature is unnecessary unless you are using a remote snapshotter that relies on it.
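As a rough illustration of the workaround, the fragment below shows what turning the annotations off might look like. It assumes the setting is `disable_snapshot_annotations` under the CRI plugin's `containerd` section, as laid out in a containerd 1.4-style config file; check the section name and option against your own config before relying on it.

```toml
# Sketch only: assumes a containerd 1.4-style config (version = 2).
version = 2

[plugins."io.containerd.grpc.v1.cri".containerd]
  # Stop passing image layer information to the snapshotter as labels.
  # Only remote snapshotters that use this feature need it left enabled.
  disable_snapshot_annotations = true
```

containerd needs to be restarted for a config change to take effect.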
Ah, thanks! I didn't find those by searching for the error, and wasn't aware that this was an experimental feature since it was on by default.
Yeah, sadly this is not as searchable as it should be: the original "issue" was in a subproject (https://github.com/containerd/stargz-snapshotter/issues/144), the fix was in the CRI sub-project (right prior to the merge of the CRI into the main containerd repo), and the PR for turning off the config feature wasn't associated with a bug in our main repo because of that meandering history!
This was a good reminder to get the config change back into release/1.4; I've opened #4685 to do that. I think we can close this now, and hopefully this will give anyone else searching for the same issue an easy place to find the history and fix(es).