Kubespray: Error pulling image from private docker registry in kubernetes cluster

Created on 17 Apr 2019 · 12 comments · Source: kubernetes-sigs/kubespray


Using the Terraform contribution in the Kubespray project.

I enabled the private Docker registry Kubernetes addon in my configuration file.
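For reference, in Kubespray this is toggled through the addon group vars. A minimal sketch, assuming the sample inventory layout (the file path and the commented defaults follow the sample inventory and may differ by version):

    # inventory/mycluster/group_vars/k8s-cluster/addons.yml
    registry_enabled: true
    # registry_namespace: kube-system   # assumed defaults; uncomment to override
    # registry_storage_class: ""
    # registry_disk_size: "10Gi"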

I was able to set up a port-forward:
kubectl port-forward --namespace kube-system $POD 5000:5000 &

Then I pushed my local image to the registry as localhost:5000/my-image.
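Roughly, the full forward-tag-push sequence looked like this (a sketch; the k8s-app=registry label used to find the pod is an assumption about the addon's manifests):

    # Hypothetical pod lookup; the label selector is assumed, not taken from the addon.
    POD=$(kubectl get pods -n kube-system -l k8s-app=registry -o jsonpath='{.items[0].metadata.name}')
    kubectl port-forward --namespace kube-system $POD 5000:5000 &

    # Tag and push; the nginx-my image name is taken from the events below.
    docker tag nginx:latest localhost:5000/nginx-my:latest
    docker push localhost:5000/nginx-my:latest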

But when I reference the image in my pod spec as localhost:5000/my-image, Kubernetes fails to pull it from the private Docker registry in my cluster because the connection is refused:

Warning Failed 12s kubelet, kubernetes-k8s-worker0 Failed to pull image "localhost:5000/nginx-my:latest": rpc error: code = Unknown desc = Error response from daemon: Get http://localhost:5000/v2/: dial tcp 127.0.0.1:5000: connect: connection refused
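For reference, a minimal pod manifest consistent with the describe output below (only the name, label, image, and port visible in that output are grounded; the rest is a sketch):

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample
      labels:
        app: sample
    spec:
      containers:
      - name: sample
        image: localhost:5000/nginx-my:latest   # expected to be served via the registry-proxy hostPort on each node
        ports:
        - containerPort: 80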

I also SSH'd into a worker node, but I still can't reach localhost:5000 there either; e.g. curl -k localhost:5000/v2/_catalog fails.
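On a healthy install the registry proxy should be listening on hostPort 5000 on every node, so one way to narrow this down is a sketch like the following (the daemonset name and label are assumptions about the addon's manifests):

    # From a machine with kubectl access; resource names are assumptions.
    kubectl -n kube-system get ds registry-proxy -o wide
    kubectl -n kube-system get pods -l k8s-app=registry-proxy -o wide

    # Then, on the worker node itself:
    curl -s localhost:5000/v2/_catalog   # expect {"repositories":[...]} once the proxy is healthy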

Any ideas?

Environment:

  • Cloud provider or hardware configuration:
    AWS (using terraform contribution in kubespray repository)
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
CentOS on all VMs

  • Version of Ansible (ansible --version):

Kubespray version (commit) (git rev-parse --short HEAD):
master

Network plugin used:
calico

Copy of your inventory file:

Command used to invoke ansible:
ansible-playbook -i ./inventory/mycluster/hosts.ini ./cluster.yml -e ansible_user=centos -e bootstrap_os=centos -e kube_network_plugin=calico -b --become-user=root --flush-cache -e ansible_ssh_private_key_file=k8s_ssh_key.pem

Output of ansible run:

Anything else we need to know:
Name:               sample
Namespace:          default
Priority:           0
PriorityClassName:
Node:               kubernetes-k8s-worker0/10.250.193.128
Start Time:         Thu, 18 Apr 2019 02:24:12 +0500
Labels:             app=sample
Annotations:        kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"sample"},"name":"sample","namespace":"default"},"spec":{"con...
Status:             Pending
IP:                 10.233.85.18
Containers:
  sample:
    Container ID:
    Image:          localhost:5000/nginx-my:latest
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vsv52 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-vsv52:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-vsv52
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age  From                             Message
  ----     ------     ---  ----                             -------
  Normal   Scheduled  13s  default-scheduler                Successfully assigned default/sample to kubernetes-k8s-worker0
  Normal   Pulling    12s  kubelet, kubernetes-k8s-worker0  pulling image "localhost:5000/nginx-my:latest"
  Warning  Failed     12s  kubelet, kubernetes-k8s-worker0  Failed to pull image "localhost:5000/nginx-my:latest": rpc error: code = Unknown desc = Error response from daemon: Get http://localhost:5000/v2/: dial tcp 127.0.0.1:5000: connect: connection refused
  Warning  Failed     12s  kubelet, kubernetes-k8s-worker0  Error: ErrImagePull
  Normal   BackOff    11s  kubelet, kubernetes-k8s-worker0  Back-off pulling image "localhost:5000/nginx-my:latest"
  Warning  Failed     11s  kubelet, kubernetes-k8s-worker0  Error: ImagePullBackOff

kind/bug lifecycle/rotten


All 12 comments

Same problem here

The proxy can't seem to reach the registry service. Here are the logs from the proxy on one of the worker nodes:

waiting for registry (registry.kube-system.svc.cluster.local:5000) to come online ...

I'm able to ping registry.kube-system.svc.cluster.local from the registry-proxy pod, and it resolves to the correct service IP, but curl to registry.kube-system.svc.cluster.local:5000 fails.
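One way to narrow that down is to check whether the Service actually has endpoints behind its cluster IP (the service name and namespace here are taken from the DNS name above):

    kubectl -n kube-system get svc registry
    kubectl -n kube-system get endpoints registry   # an empty ENDPOINTS column would explain the refused connections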

I solved this issue by adding this to the registry Service:

clusterIP: None

I also created PR #4786.

I think it is some conflict between the registry Service and the hostPort from registry-proxy, but I don't know exactly what the problem is.
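A minimal sketch of the resulting headless Service (the name and namespace come from the DNS name above; the selector label and port mapping are assumptions about the addon's defaults):

    apiVersion: v1
    kind: Service
    metadata:
      name: registry
      namespace: kube-system
    spec:
      clusterIP: None        # headless: DNS resolves straight to the pod IPs instead of a virtual IP
      selector:
        k8s-app: registry    # assumed label; must match the registry pods
      ports:
      - port: 5000
        targetPort: 5000

With clusterIP: None, registry.kube-system.svc.cluster.local resolves directly to the registry pod IPs rather than to a kube-proxy-managed virtual IP, which sidesteps whatever was interfering with connections from the proxy.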

@mirandawork Thanks a million!
Not sure if that's the best way, but after debugging the problem for three hours, your solution worked for me.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

I'm facing the same problem here. Would anyone care to explain why this is happening and what the cleanest way to fix it is? (Is making the service headless really the best solution?)

/remove-lifecycle rotten

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

