What happened?
Recently, eksctl utils update-kube-proxy has started failing when trying to upgrade to EKS v1.12 and v1.13. It appears the upstream images are missing from the AWS ECR repository. I'm not sure whether this is a temporary issue, or whether these images have been deprecated and the internal code is still referencing an out-of-date tag.
What you expected to happen?
eksctl utils update-kube-proxy should update kube-proxy without any ErrImagePull errors.
How to reproduce it?
On a cluster originally created with v1.11, upgrade the cluster, attach a new nodegroup, then run the kube-proxy upgrade command and watch it fail.
Anything else we need to know?
Here is the relevant describe output when trying to upgrade kube-proxy to v1.12:
Events:
Type      Reason      Age                From                                                   Message
----      ------      ----               ----                                                   -------
Normal    Scheduled   21s                default-scheduler                                      Successfully assigned kube-system/kube-proxy-9jkl2 to ip-10-146-70-156.us-east-2.compute.internal
Normal    BackOff     20s                kubelet, ip-10-146-70-156.us-east-2.compute.internal   Back-off pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.12.10"
Warning   Failed      20s                kubelet, ip-10-146-70-156.us-east-2.compute.internal   Error: ImagePullBackOff
Normal    Pulling     9s (x2 over 20s)   kubelet, ip-10-146-70-156.us-east-2.compute.internal   pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.12.10"
Warning   Failed      9s (x2 over 20s)   kubelet, ip-10-146-70-156.us-east-2.compute.internal   Failed to pull image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.12.10": rpc error: code = Unknown desc = Error response from daemon: manifest for 602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.12.10 not found
Warning   Failed      9s (x2 over 20s)   kubelet, ip-10-146-70-156.us-east-2.compute.internal   Error: ErrImagePull
Here is the output for v1.13:
Events:
Type      Reason      Age                 From                                                   Message
----      ------      ----                ----                                                   -------
Normal    Scheduled   47s                 default-scheduler                                      Successfully assigned kube-system/kube-proxy-qz8sn to ip-10-146-70-156.us-east-2.compute.internal
Normal    BackOff     21s (x2 over 47s)   kubelet, ip-10-146-70-156.us-east-2.compute.internal   Back-off pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.13.8"
Warning   Failed      21s (x2 over 47s)   kubelet, ip-10-146-70-156.us-east-2.compute.internal   Error: ImagePullBackOff
Normal    Pulling     8s (x3 over 47s)    kubelet, ip-10-146-70-156.us-east-2.compute.internal   pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.13.8"
Warning   Failed      8s (x3 over 47s)    kubelet, ip-10-146-70-156.us-east-2.compute.internal   Failed to pull image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.13.8": rpc error: code = Unknown desc = Error response from daemon: manifest for 602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/kube-proxy:v1.13.8 not found
Warning   Failed      8s (x3 over 47s)    kubelet, ip-10-146-70-156.us-east-2.compute.internal   Error: ErrImagePull
$ k get po -n kube-system
NAME                       READY   STATUS             RESTARTS   AGE
aws-node-dbkfh             1/1     Running            0          3h41m
aws-node-l9kp4             1/1     Running            0          3h41m
coredns-5c466f5779-lqmq8   1/1     Running            0          3h13m
coredns-5c466f5779-xw92g   1/1     Running            0          3h13m
kube-proxy-4mrmc           1/1     Running            0          3h41m
kube-proxy-kqjsl           0/1     ImagePullBackOff   0          8m29s
$ eksctl version
[ℹ] version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.2.1"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.8-eks-a977ba", GitCommit:"a977bab148535ec195f12edc8720913c7b943f9c", GitTreeState:"clean", BuildDate:"2019-07-29T20:47:04Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Logs
eksctl get clusters -v 4 output: https://gist.github.com/res0nat0r/ab1dd174441d2c3684921614c0243993
It looks like only:
eks/kube-proxy:v1.12.6
eks/kube-proxy:v1.13.7
are available, as per the table on this EKS documentation page:
$ $(aws ecr get-login --no-include-email --registry-ids 602401143452)
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Login Succeeded
$ for minor in $(seq 10) ; do docker pull 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.${minor} ; done
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.1 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.2 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.3 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.4 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.5 not found: manifest unknown: Requested image not found
v1.12.6: Pulling from eks/kube-proxy
7ba0c30ce37c: Pull complete
99d8672f79df: Pull complete
d30c36e7c40b: Pull complete
Digest: sha256:c391670199576ebb770d3851f1e1e7b27bb9655940414b5b7cc4b33db156e066
Status: Downloaded newer image for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.6
602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.6
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.7 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.8 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.9 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.12.10 not found: manifest unknown: Requested image not found
$ for minor in $(seq 8) ; do docker pull 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.${minor} ; done
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.1 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.2 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.3 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.4 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.5 not found: manifest unknown: Requested image not found
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.6 not found: manifest unknown: Requested image not found
v1.13.7: Pulling from eks/kube-proxy
6cf6a0b0da0d: Pull complete
8e1ce322a1d9: Pull complete
dcc78b3296ee: Pull complete
Digest: sha256:7bd8569a3c32472019ef819854b3a4605d91310fcac9d36aff85063bbe699f8e
Status: Downloaded newer image for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.7
602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.7
Error response from daemon: manifest for 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.13.8 not found: manifest unknown: Requested image not found
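The probe above can be wrapped in a small POSIX shell helper that prints only the tags that actually resolve, instead of one error per missing tag. This is a sketch: the function names are hypothetical, and it assumes you are already logged in to the registry (as above) and that your Docker CLI supports docker manifest inspect.

```shell
#!/bin/sh
# Account hosting the EKS images, as seen in the pull errors above.
EKS_ECR_ACCOUNT=602401143452

# kube_proxy_image REGION TAG -> full image reference (hypothetical helper).
kube_proxy_image() {
  printf '%s.dkr.ecr.%s.amazonaws.com/eks/kube-proxy:%s\n' \
    "$EKS_ECR_ACCOUNT" "$1" "$2"
}

# probe_tags REGION TAG... -> prints only the tags whose manifest exists.
# Assumes a prior registry login and a Docker CLI with "docker manifest inspect".
probe_tags() {
  pt_region="$1"
  shift
  for pt_tag in "$@"; do
    if docker manifest inspect "$(kube_proxy_image "$pt_region" "$pt_tag")" >/dev/null 2>&1; then
      printf '%s\n' "$pt_tag"
    fi
  done
}
```

For example, probe_tags ap-northeast-1 v1.12.6 v1.12.10 v1.13.7 v1.13.8 should print only v1.12.6 and v1.13.7 if the registry still looks the way it did in the transcript above.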
@mhausenblas, @M00nF1sh (given I noticed your comment here https://github.com/weaveworks/eksctl/pull/1001#issuecomment-516617408), would you know (or be able to point to someone who would know):
kube-proxy as opposed to one for each Kubernetes version?
There is no particular reason that we only keep one version of kube-proxy.
I'll discuss with the team to either:
kube-proxy: 1.11
Thanks a lot @M00nF1sh, this would indeed be extremely helpful! ✨
I haven't tested an EKS upgrade in a week or so to see whether this is still an ongoing problem, but I assume that until this issue is resolved, cluster upgrades are broken for all eksctl users, right?
AFAIK only update-kube-proxy is broken right now (I updated a cluster yesterday, and the other eksctl update commands worked fine).
The best way to work around this for now:
kubectl set image daemonset.apps/kube-proxy \
-n kube-system \
kube-proxy=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.13.7
Reference: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
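Before and after applying this workaround, it helps to confirm which image the kube-proxy DaemonSet actually points at. A minimal sketch; the helper names are hypothetical, and it assumes kube-proxy is the first container in the DaemonSet's pod template.

```shell
#!/bin/sh
# Print the image configured on the kube-proxy DaemonSet
# (assumes kube-proxy is the first container in the pod template).
current_kube_proxy_image() {
  kubectl get daemonset kube-proxy -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# image_tag IMAGE -> everything after the last ':'. Fine for ECR
# references like the ones in this thread, which carry no registry port.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Usage against a live cluster:
#   image_tag "$(current_kube_proxy_image)"
```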
Hello, I just wanted to add that I ran into this issue today while upgrading EKS to Kubernetes 1.16. Everything described above matched my issue exactly, just with different version numbers.
The kube-proxy image mismatch was:
expected 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.16.13
and I followed @austinbv's suggestion (after manually reading through the docs link) to set the tag to v1.16.12 👍
Unfortunately, I hit this problem during a production upgrade. I hadn't seen it when I tested in a lower environment, because at _that_ time eksctl upgraded the proxy to 1.16.8, which was available. Additionally, running both eksctl upgrade cluster and update-kube-proxy in that lower environment today reports "up-to-date", so there is no obvious version drift and no way to reproduce the problem there.
All in all, it's not the most difficult problem to _fix_, but it seems very unintuitive that upgrading with this tool can end up looking for an image that doesn't exist.
We still update these services without eksctl because the synchronisation between eksctl's hard-coded versions and AWS is not guaranteed. So we prefer to verify that the new versions are actually available at upgrade time and then update.
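That verify-then-update approach can be scripted so the DaemonSet is only touched when the target image exists. This is a sketch assuming a registry login, a Docker CLI with docker manifest inspect, and kubectl access; the function name and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: point the kube-proxy DaemonSet at a new image only if
# that image's manifest exists. Set DRY_RUN=1 to print the kubectl command
# instead of running it.
update_kube_proxy_if_available() {
  uk_image="602401143452.dkr.ecr.$1.amazonaws.com/eks/kube-proxy:$2"
  if ! docker manifest inspect "$uk_image" >/dev/null 2>&1; then
    echo "image not found, leaving kube-proxy unchanged: $uk_image" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo kubectl set image daemonset.apps/kube-proxy -n kube-system "kube-proxy=$uk_image"
  else
    kubectl set image daemonset.apps/kube-proxy -n kube-system "kube-proxy=$uk_image"
  fi
}
```

For example, DRY_RUN=1 update_kube_proxy_if_available us-east-2 v1.13.7 prints the command it would run, while a tag like v1.13.8 from this issue would be rejected before kubectl is ever called.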
@M00nF1sh's earlier suggestions sound like they could fix this. And there are also various Containers Roadmap requests for API support to discover correct service versions.
https://github.com/aws/containers-roadmap/issues/933
https://github.com/aws/containers-roadmap/issues/982#issuecomment-659008694
https://github.com/aws/containers-roadmap/issues/744
It would also be nice if there were upgrade advice for Calico: there are install instructions but no upgrade advice. There are manifests matching the latest CNI patch version, but it is not clear whether it is safe to apply those manifests over the top of an existing install. It can also be hard to compare or uninstall previous manifests because older patch versions are removed; similar to the problem above, only one patch version of each minor version is retained.
Hi @marccarre,
This seems to be a very old issue. I see that we map the kube-proxy image version to the control plane version
(https://github.com/weaveworks/eksctl/blob/master/pkg/ctl/utils/update_kube_proxy.go).
This is a blocker for many customers. Could we define the kube-proxy image version as a constant for each EKS minor version and update accordingly until this data is available via an API from AWS? Then, with each release, we would just update the kube-proxy image value.
The second approach would be to provide an additional flag that lets the user explicitly specify the image version to upgrade to, with the help text telling users to check the AWS documentation before specifying the image version.
I would like to work on the PR. Please let us know which approach would be suitable.
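The first approach (a constant per version) amounts to a lookup table. A hypothetical shell version, using only tags mentioned in this thread; the values will go stale as AWS publishes new images, which is exactly the maintenance cost this approach trades for predictability.

```shell
#!/bin/sh
# Hypothetical pinned table: one known-good kube-proxy tag per Kubernetes
# minor version. Values are tags mentioned in this thread, not an official list.
pinned_kube_proxy_tag() {
  case "$1" in
    1.12) echo v1.12.6 ;;
    1.13) echo v1.13.7 ;;
    1.16) echo v1.16.13-eksbuild.1 ;;
    *)
      echo "no pinned kube-proxy tag for Kubernetes $1" >&2
      return 1
      ;;
  esac
}
```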
@smrutiranjantripathy et al., sorry, I will defer to @martina-if, @cPu1 and @michaelbeaumont, as I'm no longer working on eksctl.
This issue is still present with eksctl 0.25.0 and Kubernetes version v1.17.9-eks-4c6976. AWS only provides a kube-proxy image for v1.17.7, but eksctl tries to deploy kube-proxy v1.17.9.
Please update to 0.26-rc.1; 0.26 will be released tomorrow/Thursday.
@michaelbeaumont , which commit/PR in 0.26-rc.1 addresses this? I couldn't find it.
https://github.com/weaveworks/eksctl/commit/6a2279490933ac455ed7f0a2cbb71db25aca7e58 (it's the new -eksbuild.1 suffix)
Providing a "latest" for each minor version, as @M00nF1sh suggested above, would be very beneficial for us too; at the moment we have to hardcode it, which is not the best way to keep our clusters' components up to date.
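Until such a "latest" pointer exists, one workaround is to pick the highest patch for a minor version from a list of tags you have already confirmed exist (for example with a probe like the docker pull loop earlier in this thread). A hypothetical POSIX shell sketch; it compares only the numeric patch component and does not rank -eksbuild.N suffixes against each other.

```shell
#!/bin/sh
# latest_patch MINOR TAG... -> the tag with the highest patch number for MINOR.
# Example: latest_patch 1.12 v1.12.6 v1.12.10 v1.13.7 selects v1.12.10.
latest_patch() {
  lp_minor="$1"
  shift
  lp_best=""
  lp_best_patch=-1
  for lp_tag in "$@"; do
    # Keep only tags of the form v<MINOR>.<patch>[...].
    case "$lp_tag" in
      "v$lp_minor".*) ;;
      *) continue ;;
    esac
    lp_rest="${lp_tag#"v$lp_minor".}"   # e.g. "10" or "13-eksbuild.1"
    lp_patch="${lp_rest%%[!0-9]*}"      # leading digits only
    [ -n "$lp_patch" ] || continue
    if [ "$lp_patch" -gt "$lp_best_patch" ]; then
      lp_best_patch="$lp_patch"
      lp_best="$lp_tag"
    fi
  done
  [ -n "$lp_best" ] && printf '%s\n' "$lp_best"
}
```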
The image name changed starting with 1.16.13 (see https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html), so it would be:
602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.16.13-eksbuild.1
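The mismatch in this issue comes from deriving the kube-proxy tag directly from the control plane version: the server reports v1.13.8-eks-a977ba, so the requested tag is v1.13.8, which was never published. A hypothetical helper showing that derivation, with the optional -eksbuild.N suffix mentioned above; it is an illustration, not eksctl's actual code, and the derived tag still needs to be checked against the registry.

```shell
#!/bin/sh
# candidate_tag SERVER_GITVERSION [BUILD] (hypothetical helper)
#   v1.13.8-eks-a977ba     -> v1.13.8
#   v1.17.9-eks-4c6976 1   -> v1.17.9-eksbuild.1
candidate_tag() {
  ct_version="${1%%-eks-*}"   # drop the "-eks-<commit>" part of the server version
  if [ -n "${2:-}" ]; then
    printf '%s-eksbuild.%s\n' "$ct_version" "$2"
  else
    printf '%s\n' "$ct_version"
  fi
}
```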
@res0nat0r I had exactly the same problem. What solved the problem for me:
eksctl utils update-kube-proxy --name MyClusterName --approve
Remove the --resource-container="" flag from your kube-proxy DaemonSet if your cluster was originally deployed with Kubernetes 1.11 or earlier, or use a kube-proxy configuration file (recommended). To determine whether your current version of kube-proxy has the flag, enter the following command.
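One way to check for the flag, assuming kubectl access; this is a sketch, not necessarily the command the original instructions gave, and the helper name is hypothetical.

```shell
#!/bin/sh
# Succeeds if the DaemonSet manifest read from stdin still passes --resource-container.
has_resource_container_flag() {
  grep -q -- '--resource-container'
}

# Usage against a live cluster:
#   kubectl get daemonset kube-proxy -n kube-system -o yaml \
#     | has_resource_container_flag && echo "flag present"
```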