Eksctl: kube-proxy upgrade fails due to missing AWS image

Created on 3 Aug 2019  ·  17 Comments  ·  Source: weaveworks/eksctl

What happened?
Recently, eksctl utils update-kube-proxy has been failing when trying to upgrade to EKS v1.12 and v1.13. It appears the upstream images are missing from the AWS Docker registry. I'm not sure whether this is a temporary issue, or whether these images have been deprecated and the internal code is still referencing something out of date.

What you expected to happen?
eksctl utils update-kube-proxy should not throw any ErrImagePull errors

How to reproduce it?
On a cluster booted with v1.11, after upgrading the cluster and attaching a new nodegroup, run the kube-proxy upgrade command and see whether it fails.

Anything else we need to know?

Here is the relevant describe output when trying to upgrade kube-proxy to v1.12:

Events:
  Type     Reason     Age               From                                                  Message
  ----     ------     ----              ----                                                  -------
  Normal   Scheduled  21s               default-scheduler                                     Successfully assigned kube-system/kube-proxy-9jkl2 to ip-10-146-70-156.us-east-2.compute.internal
  Normal   BackOff    20s               kubelet, ip-10-146-70-156.us-east-2.compute.internal  Back-off pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.12.10"
  Warning  Failed     20s               kubelet, ip-10-146-70-156.us-east-2.compute.internal  Error: ImagePullBackOff
  Normal   Pulling    9s (x2 over 20s)  kubelet, ip-10-146-70-156.us-east-2.compute.internal  pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.12.10"
  Warning  Failed     9s (x2 over 20s)  kubelet, ip-10-146-70-156.us-east-2.compute.internal  Failed to pull image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.12.10": rpc error: code = Unknown desc = Error response from daemon: manifest for 602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.12.10 not found
  Warning  Failed     9s (x2 over 20s)  kubelet, ip-10-146-70-156.us-east-2.compute.internal  Error: ErrImagePull

Here is the output for v1.13:

Events:
  Type     Reason     Age                From                                                  Message
  ----     ------     ----               ----                                                  -------
  Normal   Scheduled  47s                default-scheduler                                     Successfully assigned kube-system/kube-proxy-qz8sn to ip-10-146-70-156.us-east-2.compute.internal
  Normal   BackOff    21s (x2 over 47s)  kubelet, ip-10-146-70-156.us-east-2.compute.internal  Back-off pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.13.8"
  Warning  Failed     21s (x2 over 47s)  kubelet, ip-10-146-70-156.us-east-2.compute.internal  Error: ImagePullBackOff
  Normal   Pulling    8s (x3 over 47s)   kubelet, ip-10-146-70-156.us-east-2.compute.internal  pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.13.8"
  Warning  Failed     8s (x3 over 47s)   kubelet, ip-10-146-70-156.us-east-2.compute.internal  Failed to pull image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.13.8": rpc error: code = Unknown desc = Error response from daemon: manifest for 602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.13.8 not found
  Warning  Failed     8s (x3 over 47s)   kubelet, ip-10-146-70-156.us-east-2.compute.internal  Error: ErrImagePull
$ k get po -n kube-system
NAME                       READY   STATUS             RESTARTS   AGE
aws-node-dbkfh             1/1     Running            0          3h41m
aws-node-l9kp4             1/1     Running            0          3h41m
coredns-5c466f5779-lqmq8   1/1     Running            0          3h13m
coredns-5c466f5779-xw92g   1/1     Running            0          3h13m
kube-proxy-4mrmc           1/1     Running            0          3h41m
kube-proxy-kqjsl           0/1     ImagePullBackOff   0          8m29s



$ eksctl version
[ℹ]  version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.2.1"}

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.8-eks-a977ba", GitCommit:"a977bab148535ec195f12edc8720913c7b943f9c", GitTreeState:"clean", BuildDate:"2019-07-29T20:47:04Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Logs

eksctl get clusters -v 4 output: https://gist.github.com/res0nat0r/ab1dd174441d2c3684921614c0243993

area/add-ons  kind/bug  priority/important-soon

All 17 comments

It looks like only:

  • eks/kube-proxy:v1.12.6
  • eks/kube-proxy:v1.13.7

are available as per the table on this EKS documentation page:

$ $(aws ecr get-login --no-include-email --registry-ids 602401143452)
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Login Succeeded

$ for minor in $(seq 10) ; do docker pull 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.${minor} ; done
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.1 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.2 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.3 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.4 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.5 not found: manifest unknown: Requested image not found
v1.12.6: Pulling from eks/kube-proxy
7ba0c30ce37c: Pull complete 
99d8672f79df: Pull complete 
d30c36e7c40b: Pull complete 
Digest: sha256:c391670199576ebb770d3851f1e1e7b27bb9655940414b5b7cc4b33db156e066
Status: Downloaded newer image for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.6
602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.6
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.7 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.8 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.9 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.10 not found: manifest unknown: Requested image not found

$ for minor in $(seq 8) ; do docker pull 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.${minor} ; done
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.1 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.2 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.3 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.4 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.5 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.6 not found: manifest unknown: Requested image not found
v1.13.7: Pulling from eks/kube-proxy
6cf6a0b0da0d: Pull complete 
8e1ce322a1d9: Pull complete 
dcc78b3296ee: Pull complete 
Digest: sha256:7bd8569a3c32472019ef819854b3a4605d91310fcac9d36aff85063bbe699f8e
Status: Downloaded newer image for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.7
602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.7
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.8 not found: manifest unknown: Requested image not found

@mhausenblas, @M00nF1sh (given I noticed your comment here https://github.com/weaveworks/eksctl/pull/1001#issuecomment-516617408), would you know (or would be able to point to someone who would know):

  1. why AWS only keeps one patch version of kube-proxy available per minor version, as opposed to one for each Kubernetes patch release?
  2. whether there is a way to programmatically get the version mapping table available on this EKS documentation page?

There is no particular reason that we only keep one version of kube-proxy.
I'll discuss with team to either:

  • provide kube-proxy per patch version
  • grant public 'list-images' permission to allow programmatic access
  • provide a public SSM parameter to denote the latest version for each k8s minor version
  • provide a latest tag for each k8s minor version, like kube-proxy:1.11

Thanks a lot @M00nF1sh, this would indeed be extremely helpful! ✨

I haven't tested an EKS upgrade in a week or so to see whether this is still an ongoing problem, but I assume that until this issue is resolved, cluster upgrades are broken for all eksctl users. Is that right?

AFAIK only update-kube-proxy is broken right now (I updated a cluster yesterday, and other eksctl update stuff worked fine)

The best way to fix this for now is to set the image manually:

kubectl set image daemonset.apps/kube-proxy \
-n kube-system \
kube-proxy=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.13.7

Reference: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
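
Since the reports in this thread span several regions (us-east-2, ap-northeast-1, us-west-2), the workaround above can be parameterised. The following is a minimal sketch, not an official helper: the function names are made up for illustration, and the registry account ID is simply the one quoted in this thread, which may not apply to every region. It only prints the kubectl command so that it can be reviewed before being run.

```shell
# Sketch only: helper names are hypothetical.
# The account ID and repository path are the ones quoted in this thread;
# check the EKS documentation for the registry used in your region.

# kube_proxy_image REGION TAG
# Prints the full image reference for the given region and tag.
kube_proxy_image() {
  printf '602401143452.dkr.ecr.%s.amazonaws.com/eks/kube-proxy:%s\n' "$1" "$2"
}

# kube_proxy_set_image_cmd REGION TAG
# Prints (does not run) the kubectl command used as the workaround above.
kube_proxy_set_image_cmd() {
  printf 'kubectl set image daemonset.apps/kube-proxy -n kube-system kube-proxy=%s\n' \
    "$(kube_proxy_image "$1" "$2")"
}
```

For example, kube_proxy_set_image_cmd us-west-2 v1.13.7 prints the same command as the workaround, on a single line.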

Hello, just wanted to add that I ran into this issue today while upgrading EKS to use Kubernetes 1.16. Everything mentioned above matched my issue exactly, just with different version numbers.
The kube-proxy image mismatch was:
expected 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.16.13

and I followed @austinbv's suggestion (after manually reading through the docs link) to set the tag to v1.16.12 👍

What was really unfortunate is that I hit this problem during a production upgrade, having not encountered it earlier when I tested in a lower environment, because at _that_ time eksctl upgraded the proxy to 1.16.8, which was available. Additionally, running both eksctl upgrade cluster and update-kube-proxy in that lower environment today shows "up-to-date", so there is no obvious version drift or way to reproduce this in my other environment.

All in all, this is not the most difficult problem to _fix_, but it certainly seems very unintuitive that upgrading with this tool can end up looking for an image that doesn't exist.

We still update these services without eksctl because the synchronisation between eksctl's hard-coded versions and AWS is not guaranteed. So we prefer to verify that the new versions are actually available at upgrade time and then update.
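
One way to do the "verify first, then update" step is to probe candidate tags and take the first one that resolves. The sketch below is only an illustration under stated assumptions: first_available_tag and ecr_probe are hypothetical names, and the example probe assumes a Docker client that is already logged in to the registry and supports docker manifest inspect (older clients needed experimental CLI features enabled for that subcommand).

```shell
# Sketch only: function names are hypothetical.

# first_available_tag PROBE TAG...
# Prints the first TAG for which `PROBE TAG` exits 0; returns 1 if none does.
first_available_tag() {
  probe="$1"
  shift
  for tag in "$@"; do
    if "$probe" "$tag" >/dev/null 2>&1; then
      printf '%s\n' "$tag"
      return 0
    fi
  done
  return 1
}

# Example probe (assumption: logged-in Docker client, registry as quoted
# in this thread, region taken from AWS_REGION with us-west-2 as fallback).
ecr_probe() {
  docker manifest inspect \
    "602401143452.dkr.ecr.${AWS_REGION:-us-west-2}.amazonaws.com/eks/kube-proxy:$1"
}
```

A call such as first_available_tag ecr_probe v1.13.8 v1.13.7 v1.13.6 would then print the newest candidate that can actually be pulled, and the result can be passed to kubectl set image. Keeping the probe as a separate function makes it easy to swap in a different check.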

@M00nF1sh's earlier suggestions sound like they could fix this. And there are also various Containers Roadmap requests for API support to discover correct service versions.

https://github.com/aws/containers-roadmap/issues/933

https://github.com/aws/containers-roadmap/issues/982#issuecomment-659008694

https://github.com/aws/containers-roadmap/issues/744

It would also be nice if there were upgrade advice for Calico. Right now there are install instructions but no upgrade advice. There are manifests to match the latest CNI patch version, but it is not clear whether it is safe to apply those manifests over the top of an existing install. It can also be hard to compare or uninstall previous manifests because older patch versions are removed; as with the problem above, only one patch version of each minor version is retained.

Hi @marccarre ,

It looks like this is a very old issue. I do see that we are mapping the kube-proxy version to the control plane version.
(https://github.com/weaveworks/eksctl/blob/master/pkg/ctl/utils/update_kube_proxy.go)

This is something of a blocker for many customers. Could we pin the kube-proxy image version as a constant for each EKS minor version and update accordingly, until this data is available via an API from the AWS side? Then, over the course of releases, we would just update the pinned kube-proxy image value.

The second approach would be to provide an additional flag that lets the user explicitly specify the image version to upgrade to, and to note in the help text that the user should check the AWS documentation before specifying the image version.

I would like to work on the PR. Kindly let us know which approach would be suitable.
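
The first approach (a pinned tag per minor version) could look roughly like the lookup below. This is a sketch in shell rather than eksctl's Go code, the function name is hypothetical, and the values are only a snapshot of tags that commenters in this thread reported as pullable at various times; they will go stale and should not be treated as current.

```shell
# Sketch only: pinned_kube_proxy_tag is a hypothetical helper.
# Values are a snapshot of tags reported as available in this thread.

# pinned_kube_proxy_tag MINOR
# Prints the pinned kube-proxy tag for a Kubernetes minor version, e.g. 1.13.
pinned_kube_proxy_tag() {
  case "$1" in
    1.12) echo "v1.12.6" ;;
    1.13) echo "v1.13.7" ;;
    1.16) echo "v1.16.13-eksbuild.1" ;;
    1.17) echo "v1.17.7" ;;
    *)
      echo "no pinned kube-proxy tag for Kubernetes $1" >&2
      return 1
      ;;
  esac
}
```

The trade-off is the one raised elsewhere in this thread: a pinned table avoids requesting images that do not exist, but it has to be updated by hand whenever AWS publishes a new patch version.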

@smrutiranjantripathy et al., sorry, I will defer to @martina-if, @cPu1 and @michaelbeaumont, as I'm no longer working on eksctl.

This issue is still present with eksctl 0.25.0 and k8s version v1.17.9-eks-4c6976. AWS only provides a kube-proxy image in version v1.17.7. However, eksctl tries to deploy kube-proxy v1.17.9.
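
The mismatch is easy to see when the tag is derived mechanically from the control plane version. The sketch below assumes, based only on the version pairs reported in this thread (for example server v1.13.8-eks-a977ba and image tag v1.13.8), that the requested tag is the server version with the -eks-… suffix dropped; that is an assumption about eksctl's behaviour at the time, not a statement of its implementation. Both function names are hypothetical.

```shell
# Sketch only: helper names are hypothetical.

# strip_eks_suffix v1.17.9-eks-4c6976  ->  v1.17.9
# Drops everything from the first "-" onwards.
strip_eks_suffix() {
  printf '%s\n' "${1%%-*}"
}

# minor_of v1.17.9  ->  1.17
# Extracts MAJOR.MINOR from a version string with an optional leading "v".
minor_of() {
  v="${1#v}"         # drop the leading "v", if any
  major="${v%%.*}"   # text before the first dot
  rest="${v#*.}"     # text after the first dot
  printf '%s.%s\n' "$major" "${rest%%.*}"
}
```

With the server version from this comment, strip_eks_suffix yields v1.17.9 (the tag that was requested), while minor_of yields 1.17, which is the key one would use to look up the tag that AWS actually publishes (v1.17.7 at the time of this comment).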

Please update to 0.26-rc.1; 0.26 will be released tomorrow/Thursday.

@michaelbeaumont , which commit/PR in 0.26-rc.1 addresses this? I couldn't find it.

Providing a "latest" for each minor version as @M00nF1sh suggested above would be very beneficial for us too - at the moment we have to hardcode it and it is not the best way to keep our clusters' components up to date

The image tag format changed as of 1.16.13 (see https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html). The image would therefore be:

602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.16.13-eksbuild.1

@res0nat0r I had exactly the same problem. What solved the problem for me:

  • Upgrading the eksctl version and running eksctl utils update-kube-proxy --name MyClusterName --approve
  • [not sure if that's applicable for you]: As per the AWS documentation, remove the --resource-container="" flag from your kube-proxy DaemonSet if your cluster was originally deployed with Kubernetes 1.11 or earlier, or use a kube-proxy configuration file (recommended). To determine whether your current version of kube-proxy has the flag, enter the following command.