Kops: Dockerhub rate limit caused Node NotReady

Created on 17 Nov 2020 · 12 comments · Source: kubernetes/kops

1. Describe IN DETAIL the feature/behavior/change you would like to see.
Since 11/16, 12:50pm Pacific, kOps nodes have been unable to initialize because of the Docker Hub pull rate limit:

Warning  Failed       34m (x4 over 35m)      kubelet            Failed to pull image "calico/cni:v3.15.3": rpc error: code = Unknown desc = Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

Some required system containers use the docker.io registry, but the registry for these images cannot be configured.

Here are examples of such containers:

  • CNI: calico/node:v3.15.3, calico/cni:v3.15.3
  • ~kube-apiserver-healthcheck: kope/kube-apiserver-healthcheck (kube-apiserver sidecar)~ Moved to gcr.io in kOps 1.19
  • ~kops-controller: kope/kops-controller~ Moved to gcr.io in kOps 1.19
  • etcd-manager: kopeio/etcd-manager

2. Feel free to provide a design supporting your feature request.
Allow configuring the registry for system-critical components,
or
move these images to another registry such as gcr.io, ghcr.io, etc.

Related to https://github.com/kubernetes/kops/pull/10204

All 12 comments

You can configure a different registry through spec.assets.containerRegistry and copy images there through the assets phase.
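For reference, a minimal cluster spec sketch of that field; the registry URL is a placeholder for your own mirror:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: my-cluster.example.com
spec:
  assets:
    containerRegistry: registry.example.com

Images copied there during the assets phase are then pulled from that registry instead of docker.io, gcr.io, etc.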

This type of thing (and the linked PR https://github.com/kubernetes/kops/pull/10204) would be helpful to us as well. We use the Weave network and the image is hardcoded there also.

We looked at spec.assets.containerRegistry and spec.assets.containerProxy; however, they seem to map all images to that one value, regardless of the original repository. My org caches each upstream repository (Docker Hub, gcr.io, etc.) via Artifactory at different URLs, so it wasn't clear how these fields would be useful for us.

Are there examples for using them correctly? I checked out aws-china.md but it describes itself as "a naive, uncompleted attempt" and says "it hasn't been tested" with the Assets API.

I haven't figured out the use case for containerProxy.

There is no support for sharding the containerRegistry like you ask and that does not strike me as being a widespread (or particularly useful) requirement.

Documentation for the assets phase is indeed lacking.

Well, I can't comment on whether or not the request is widespread but it would certainly be useful to my organization.

It is true, from an operational perspective, that we want to reduce the number of points of management as much as possible.
As of kOps 1.19 some components have moved to gcr.io, but etcd-manager and the CNI images still come from Docker Hub.

In that case I still have to manage all of the images that kOps needs for bootstrapping.

As mentioned above, the operator only wants to manage a few images, in order to reduce the number of management points.

Currently, problems occur whenever dozens of nodes or more boot up in a short time, due to auto scaling or maintenance.

By the way, for now I am thinking of injecting Docker Hub credentials everywhere.

I am unfortunately unable to deduce the utility of such sharding of the images. It strikes me as being an entirely cosmetic concern. Perhaps you could expound on the disadvantages of putting them in the same Artifactory repository?

The most immediate disadvantage is that Artifactory is managed by a separate team at my company and it's how they decided to do it. They are reluctant to put a bunch of remote repositories behind a single virtual repository due to the potential for naming collisions between the underlying remote repos.

When the assets phase copies the images into the containerRegistry it adjusts the image names to avoid naming collisions.

I'm thinking we may be talking about different things. My org blocks access to most of the Internet and requires us to go through Artifactory. We have a one-to-one mapping for Docker repositories defined in Artifactory: we access Docker Hub via dockerhub.my-internal-repo.local, k8s.gcr.io via k8s-gcr-io.my-internal-repo.local, etc. The description of spec.assets.containerProxy says that "if, for example, the containerProxy is set to proxy.example.com, the image k8s.gcr.io/kube-apiserver will be pulled from proxy.example.com/kube-apiserver instead", which is exactly what we need. However, since this key only accepts a single string, all repos (Docker Hub, gcr.io, k8s.gcr.io, etc.) get mapped to the same containerProxy repo. Ideally, we'd like to be able to specify something like this:

containerProxy:
  docker.io: dockerhub.my-internal-repo.local
  k8s.gcr.io: k8s-gcr-io.my-internal-repo.local
  ...

with containers being pulled from our internal Docker repositories instead. If we could arbitrarily configure images like this ticket suggested, that would help us accomplish the same thing.
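By contrast, the field as it exists today only takes a single endpoint, roughly like this sketch (the hostname is a placeholder):

spec:
  assets:
    containerProxy: proxy.example.com

so every upstream registry gets rewritten to the same proxy host.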

It sounds like the Assets API is a different process, where all the images are first uploaded to containerRegistry and then manifests are remapped to pull from there. I'll look for some more information on working with the assets API to see if it will work for us and check out the slack channels for help if necessary. I don't want to derail this ticket further, but thanks for taking the time to talk through this.

I ran into this too. Fairly frustrating, but thinking about what @DingGGu said about injecting credentials, I was able to work around it because I could ssh into each node. I was having issues with the calico images getting pulled.

  1. `kubectl get pods -n kube-system -o wide` to see which pod is failing and on which node
  2. `kubectl describe pod calico-aaaaa -n kube-system` to find which image pull is failing
  3. `ssh NODE`
  4. `sudo docker login` and enter credentials
  5. `sudo docker pull calico/node:XXX` or whichever image is failing to pull
  6. Repeat.

Definitely not a long term solution.

~Using a kubelet Docker credential via kOps fileAssets would be helpful for automation.~

Follow this document: https://github.com/kubernetes/kops/blob/master/docs/cli/kops_create_secret_dockerconfig.md

kOps can provide a dockerconfig via a kops secret.

You can configure a Docker Hub login on all nodes with kOps:

kops create secret dockerconfig -f config.json

config.json (the auth value is the base64 encoding of username:password):

{
        "auths": {
                "https://index.docker.io/v1/": {
                        "auth": "******"
                }
        }
}

You can force a rolling update, or wait until new nodes are added with the new config:
kops rolling-update cluster --instance-group nodes --yes --force

We also configured an internal mirror registry that points to our Artifactory, following the Docker recommendation:
https://docs.docker.com/registry/recipes/mirror/
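For what it's worth, kOps can also point the Docker runtime on every node at such a mirror via the cluster spec; a minimal sketch, assuming spec.docker.registryMirrors is available in your kOps version and substituting your own mirror URL:

spec:
  docker:
    registryMirrors:
    - https://registry-mirror.example.com

Note that Docker's registry-mirror mechanism only applies to images pulled from Docker Hub, so it does not affect gcr.io or k8s.gcr.io images.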
