Kops: Dockerhub rate limit caused Node NotReady

Created on 17 Nov 2020 · 12 comments · Source: kubernetes/kops

1. Describe IN DETAIL the feature/behavior/change you would like to see.
Since 11/16, 12:50pm Pacific, kOps nodes have been unable to initialize because of the Docker Hub pull rate limit:

Warning  Failed       34m (x4 over 35m)      kubelet            Failed to pull image "calico/cni:v3.15.3": rpc error: code = Unknown desc = Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

Some required system containers use the docker.io registry, but the registry for these images cannot be configured.

Here are examples of such containers:

  • CNI: calico/node:v3.15.3, calico/cni:v3.15.3
  • ~kube-apiserver-healthcheck: kope/kube-apiserver-healthcheck (kube-apiserver sidecar)~ Moved to gcr.io in kOps 1.19
  • ~kops-controller: kope/kops-controller~ Moved to gcr.io in kOps 1.19
  • etcd-manager: kopeio/etcd-manager

2. Feel free to provide a design supporting your feature request.
Allow configuring the registry for system-critical components,
or
move these images to another registry such as gcr.io, ghcr.io, etc.

Related to https://github.com/kubernetes/kops/pull/10204

All 12 comments

You can configure a different registry through spec.assets.containerRegistry and copy images there through the assets phase.
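For reference, a minimal cluster spec sketch of that field; the registry URL is a placeholder for your own mirror:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: my-cluster.example.com
spec:
  assets:
    containerRegistry: registry.example.com

Images copied there during the assets phase are then pulled from that registry instead of docker.io, gcr.io, etc.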

This type of thing (and the linked PR https://github.com/kubernetes/kops/pull/10204) would be helpful to us as well. We use the Weave network and the image is hardcoded there also.

We looked at spec.assets.containerRegistry and spec.assets.containerProxy; however, they seem to map all images to that one value, regardless of the original repository. My org caches each upstream repository (Docker Hub, gcr.io, etc.) via Artifactory at different URLs, so it wasn't clear how these fields would be useful for us.

Are there examples for using them correctly? I checked out aws-china.md but it describes itself as "a naive, uncompleted attempt" and says "it hasn't been tested" with the Assets API.

I haven't figured out the use case for containerProxy.

There is no support for sharding the containerRegistry like you ask and that does not strike me as being a widespread (or particularly useful) requirement.

Documentation for the assets phase is indeed lacking.

Well, I can't comment on whether or not the request is widespread but it would certainly be useful to my organization.

It is true, from an operational perspective, that we want to reduce the number of points of management as much as possible.
As of kOps 1.19 some components have moved to gcr.io, but etcd-manager and the CNI images still come from Docker Hub.

In that case I still have to manage all of the images that kOps needs for bootstrapping.

As mentioned above, the operator only wants to manage a few images, in order to reduce the number of management points.

Currently, problems occur whenever dozens of nodes or more boot up in a short time, due to auto scaling or maintenance.

By the way, for now I am thinking of injecting Docker Hub credentials everywhere.

I am unfortunately unable to deduce the utility of such sharding of the images. It strikes me as being an entirely cosmetic concern. Perhaps you could expound on the disadvantages of putting them in the same Artifactory repository?

The most immediate disadvantage is that Artifactory is managed by a separate team at my company and it's how they decided to do it. They are reluctant to put a bunch of remote repositories behind a single virtual repository due to the potential for naming collisions between the underlying remote repos.

When the assets phase copies the images into the containerRegistry it adjusts the image names to avoid naming collisions.

I'm thinking we may be talking about different things. My org blocks access to most of the Internet and requires us to go through Artifactory. We have a one-to-one mapping for Docker repositories defined in Artifactory: we access Docker Hub via dockerhub.my-internal-repo.local, k8s.gcr.io via k8s-gcr-io.my-internal-repo.local, etc. The description of spec.assets.containerProxy says that "if, for example, the containerProxy is set to proxy.example.com, the image k8s.gcr.io/kube-apiserver will be pulled from proxy.example.com/kube-apiserver instead", which is exactly what we need. However, since this key only accepts a single string, all repos (Docker Hub, gcr.io, k8s.gcr.io, etc.) get mapped to the same containerProxy repo. Ideally, we'd like to be able to specify something like this:

containerProxy:
  docker.io: dockerhub.my-internal-repo.local
  k8s.gcr.io: k8s-gcr-io.my-internal-repo.local
  ...

with containers being pulled from our internal Docker repositories instead. If we could arbitrarily configure images like this ticket suggested, that would help us accomplish the same thing.
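By contrast, the field as it exists today only takes a single endpoint, roughly like this sketch (the hostname is a placeholder):

spec:
  assets:
    containerProxy: proxy.example.com

so every upstream registry gets rewritten to the same proxy host.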

It sounds like the Assets API is a different process, where all the images are first uploaded to containerRegistry and then manifests are remapped to pull from there. I'll look for some more information on working with the assets API to see if it will work for us and check out the slack channels for help if necessary. I don't want to derail this ticket further, but thanks for taking the time to talk through this.

I ran into this too. Fairly frustrating, but thinking about what @DingGGu said about injecting credentials, I was able to work around it because I could ssh into each node. I was having issues with the calico images getting pulled.

  1. `kubectl get pods -n kube-system -o wide` to see which pod is failing and on which node
  2. `kubectl describe pod calico-aaaaa -n kube-system` to find which image pull is failing
  3. `ssh NODE`
  4. `sudo docker login` and enter credentials
  5. `sudo docker pull calico/node:XXX` or whichever image is failing to pull
  6. Repeat.

Definitely not a long term solution.

~Using a kubelet Docker credential via kOps fileAssets would be helpful for automation.~

Follow this document: https://github.com/kubernetes/kops/blob/master/docs/cli/kops_create_secret_dockerconfig.md

kOps can provide a dockerconfig via a kops secret.

You can configure a Docker Hub login on all nodes with kOps:

kops create secret dockerconfig -f config.json

config.json (the auth value is the base64 encoding of username:password):

{
        "auths": {
                "https://index.docker.io/v1/": {
                        "auth": "******"
                }
        }
}

You can force a rolling update, or wait until new nodes are added with the new config:
kops rolling-update cluster --instance-group nodes --yes --force

We also configured an internal mirror registry that points to our Artifactory, following the Docker recommendation:
https://docs.docker.com/registry/recipes/mirror/
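For what it's worth, kOps can also point the Docker runtime on every node at such a mirror via the cluster spec; a minimal sketch, assuming spec.docker.registryMirrors is available in your kOps version and substituting your own mirror URL:

spec:
  docker:
    registryMirrors:
    - https://registry-mirror.example.com

Note that Docker's registry-mirror mechanism only applies to images pulled from Docker Hub, so it does not affect gcr.io or k8s.gcr.io images.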
