Kubernetes: Liveness/Readiness probes are failing with getsockopt: connection refused

Created on 15 Apr 2018 · 38 Comments · Source: kubernetes/kubernetes

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: Liveness/Readiness probes are failing frequently and the failures are inconsistent. Some of the pods in the same deployment are getting stuck in CrashLoopBackOff:

Back-off restarting failed container
Error syncing pod, skipping: failed to "StartContainer" for "web" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=web pod=web-controller-3987916996-qtfg7_micro-services(a913f25b-400a-11e8-8a2a-0252b8c4655e)"
Liveness probe failed: Get http://100.96.11.194:5000/: dial tcp 100.96.11.194:5000: getsockopt: connection refused
Failed to start container with id 0d47047e3e7a640ad6b4a6a8664bdb01a3601c95518c9e60bdb4966533fe7e6d with error: rpc error: code = 2 desc = failed to create symbolic link "/var/log/pods/a913f25b-400a-11e8-8a2a-0252b8c4655e/web_23.log" to the container log file "/var/lib/docker/containers/0d47047e3e7a640ad6b4a6a8664bdb01a3601c95518c9e60bdb4966533fe7e6d/0d47047e3e7a640ad6b4a6a8664bdb01a3601c95518c9e60bdb4966533fe7e6d-json.log" for container "0d47047e3e7a640ad6b4a6a8664bdb01a3601c95518c9e60bdb4966533fe7e6d": symlink /var/lib/docker/containers/0d47047e3e7a640ad6b4a6a8664bdb01a3601c95518c9e60bdb4966533fe7e6d/0d47047e3e7a640ad6b4a6a8664bdb01a3601c95518c9e60bdb4966533fe7e6d-json.log /var/log/pods/a913f25b-400a-11e8-8a2a-0252b8c4655e/web_23.log: file exists

And a few more errors from other pods:

Error syncing pod, skipping: failed to "CreatePodSandbox" for "work-1132229878-zk00f_micro-services(897dd216-41f4-11e8-8a2a-0252b8c4655e)" 
with CreatePodSandboxError: "CreatePodSandbox for pod \"work-1132229878-zk00f_micro-services(897dd216-41f4-11e8-8a2a-0252b8c4655e)\" 
failed: rpc error: code = 2 desc = NetworkPlugin kubenet failed to set up pod \"work-1132229878-zk00f_micro-services\" 
network: Error adding container to network: failed to connect \"vethdaa54c24\" to bridge cbr0: exchange full"

Here is my livenessProbe config:

        "livenessProbe": {
          "httpGet": {
            "path": "/",
            "port": 5000,
            "scheme": "HTTP"
          },
          "initialDelaySeconds": 60,
          "timeoutSeconds": 10,
          "periodSeconds": 10,
          "successThreshold": 1,
          "failureThreshold": 3
        }

What you expected to happen:
Health checks to pass if the app is running.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
It's a Node.js app listening on port 5000, and the port is also exposed in the Dockerfile.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-09T07:26:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.7", GitCommit:"095136c3078ccf887b9034b7ce598a0a1faff769", GitTreeState:"clean", BuildDate:"2017-07-05T16:40:42Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Ubuntu 16.04 LTS
  • Kernel (e.g. uname -a):
Linux ip-172-20-54-255 4.4.78-k8s #1 SMP Fri Jul 28 01:28:39 UTC 2017 x86_64 GNU/Linux
  • Install tools: kops
  • Others:
kind/bug sig/network triage/unresolved

All 38 comments

/sig network
/sig aws
/kind bug

This is happening right now in my clusters with kube-dns/nginx, causing the rest of the pods to be terminated.

What is this issue related to?

Yeah, same issue on my side when I activate the readiness probe... I get "Readiness probe failed: Get getsockopt: connection refused", then when the service is available everything is green.

My probe's configuration:

    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 120
      timeoutSeconds: 2
      periodSeconds: 30
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      timeoutSeconds: 1
      periodSeconds: 5

I am also having this issue; it's very intermittent.

This is my config.

    readinessProbe:
      httpGet:
        path: /health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 120
      timeoutSeconds: 30
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 120
      timeoutSeconds: 30

Same here!

Same, with a basic config:

          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            timeoutSeconds: 10

+1 for similar simple config

I'm also seeing this. New pods get stuck in ContainerCreating as well.

My case was solved by changing the binding from 127.0.0.1 to 0.0.0.0.

I'm seeing this as well.

Found the same on kube 1.8 with the above basic config

We have the same issue on v1.7.5: intermittent probe failures caused by getsockopt: connection refused on a specific worker node. Pods on other nodes work perfectly, and running curl http://localhost:8080/health inside the container returns OK.

A snippet of kubelet log:

Jul 17 20:31:39 192-168-0-1-B28 kubelet: I0717 20:31:39.569363   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:31:49 192-168-0-1-B28 kubelet: I0717 20:31:49.569312   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:31:59 192-168-0-1-B28 kubelet: I0717 20:31:59.569324   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:32:09 192-168-0-1-B28 kubelet: I0717 20:32:09.569293   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:32:20 192-168-0-1-B28 kubelet: I0717 20:32:20.808546   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:32:39 192-168-0-1-B28 kubelet: I0717 20:32:39.638384   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:32:49 192-168-0-1-B28 kubelet: I0717 20:32:49.569171   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:32:59 192-168-0-1-B28 kubelet: I0717 20:32:59.577038   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:33:09 192-168-0-1-B28 kubelet: I0717 20:33:09.569288   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:33:19 192-168-0-1-B28 kubelet: I0717 20:33:19.575965   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:38:49 192-168-0-1-B28 kubelet: I0717 20:38:49.569077   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:38:59 192-168-0-1-B28 kubelet: I0717 20:38:59.569407   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:39:09 192-168-0-1-B28 kubelet: I0717 20:39:09.569431   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:39:20 192-168-0-1-B28 kubelet: I0717 20:39:20.728373   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:39:39 192-168-0-1-B28 kubelet: I0717 20:39:39.577240   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:39:49 192-168-0-1-B28 kubelet: I0717 20:39:49.577932   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:39:59 192-168-0-1-B28 kubelet: I0717 20:39:59.578145   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:40:09 192-168-0-1-B28 kubelet: I0717 20:40:09.569164   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:40:19 192-168-0-1-B28 kubelet: I0717 20:40:19.569277   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:40:39 192-168-0-1-B28 kubelet: I0717 20:40:39.569681   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:40:49 192-168-0-1-B28 kubelet: I0717 20:40:49.576956   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:40:59 192-168-0-1-B28 kubelet: I0717 20:40:59.569375   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:41:09 192-168-0-1-B28 kubelet: I0717 20:41:09.569403   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:41:19 192-168-0-1-B28 kubelet: I0717 20:41:19.575196   23774 prober.go:113] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" succeeded
Jul 17 20:41:39 192-168-0-1-B28 kubelet: I0717 20:41:39.569339   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:41:49 192-168-0-1-B28 kubelet: I0717 20:41:49.569497   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:41:59 192-168-0-1-B28 kubelet: I0717 20:41:59.569198   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:42:09 192-168-0-1-B28 kubelet: I0717 20:42:09.569606   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:42:19 192-168-0-1-B28 kubelet: I0717 20:42:19.569110   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:42:39 192-168-0-1-B28 kubelet: I0717 20:42:39.576585   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:42:49 192-168-0-1-B28 kubelet: I0717 20:42:49.569490   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:42:59 192-168-0-1-B28 kubelet: I0717 20:42:59.569501   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused
Jul 17 20:43:09 192-168-0-1-B28 kubelet: I0717 20:43:09.569399   23774 prober.go:106] Readiness probe for "zp-crm-rule-1362938709-4900s_default(92d60c48-89b2-11e8-8d0c-7e635641b742):zp-crm-rule" failed (failure): Get http://11.11.75.4:8080/health: dial tcp 11.11.75.4:8080: getsockopt: connection refused

Environment:

  • kubernetes version:
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T09:14:02Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T08:56:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • worker node OS and kernel:
# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
# uname -a
Linux 192-168-0-1-B28 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • docker version:
# docker version
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.2
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:12:25 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:23:03 2018
  OS/Arch:      linux/amd64
  Experimental: false
  • flannel version:
# flanneld --version
v0.8.0

@Phanindra48 Hi there, do you have any updates about this issue?

@Shawyeok Actually I don't have a concrete solution yet, as there are several reasons for it to fail and it's quite inconsistent. Maybe they need to provide more information, or something more useful than what we currently get in the logs.

+1

+1

With the same configuration, the same pod started successfully once. Then I stopped it and ran minikube delete; now I am getting this readiness probe failure.
Running with vm-driver=none.

We are seeing the same issue. When deploying the same pod to multiple nodes, pods on some nodes pass liveness probing, while pods on other nodes fail with the "connection refused" error.

The pod was eventually killed and re-scheduled on the same node. However, this time the probing started working.

Hello all. We were experiencing this issue as well.

Using:

kubectl describe pod <pod-name>

We also found that the exit code was 137 and the reason was 'Error'.
This is the exit code for both liveness failures and memory issues, but we were fairly certain it was not a memory issue: we had more than enough memory allocated, and when a memory issue does kill the pod we get the correct reason, 'OOMKilled'.

Anyway, we found that the issue occurred when we attempted to statically apply 250m of CPU per pod in lower environments in order to be resource efficient.

We run Spring Boot applications, which have a really heavy boot period. Because of this, the 0.25 cores we applied could not boot the app in time to start the server and pass the health checks, and we ultimately hit our failure deadline before the service was ready.

I suggest that anyone seeing this issue may be able to solve it by doing one of two things:

1) Meet the race condition
Allocate more CPU to your Deployments so that the boot process is faster and the pod is up in time for the liveness and readiness checks (see the resources sketch after the probe example below).

2) Change the race condition
Set a longer initial wait time on the liveness and readiness probes, and extend the failure deadline and the test interval. This should give your service plenty of time to boot up and become ready.

e.g.

    readinessProbe:
      httpGet:
        scheme: HTTP
        path: /health
        port: 8080
      initialDelaySeconds: 120
      timeoutSeconds: 3
      periodSeconds: 30
      successThreshold: 1
      failureThreshold: 5


    livenessProbe:
      httpGet:
        scheme: HTTP
        path: /health
        port: 8080
      initialDelaySeconds: 120
      timeoutSeconds: 3
      periodSeconds: 30
      successThreshold: 1
      failureThreshold: 5
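
For option 1, here is a minimal sketch of what raising the CPU allocation could look like; the 1-core and memory figures are illustrative only (the original setup used 250m):

    resources:
      requests:
        cpu: "1"          # more CPU than the original 250m, so the app boots faster
        memory: 512Mi     # illustrative value
      limits:
        cpu: "1"
        memory: 512Mi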

Obviously there could be other reasons for the probe failing and ending up with this type of error, but this is what solved it for us.

Hope this can help.

I am having the same issue on GKE.

Not sure if there is an underlying issue. I am using a Helm chart (Concourse).
I don't think it is a timeout issue; it seems to me that the container is looking for some response on port 8080 and that port is blocked on the container.

    Warning  Unhealthy  40m (x3 over 40m)   kubelet, gke-cluster-default-pool-bec82955-0rtc  Liveness probe failed: Get http://10.28.0.19:8080/: dial tcp 10.28.0.19:8080: getsockopt: connection refused
    Normal   Killing    40m                 kubelet, gke-cluster-default-pool-bec82955-0rtc  Killing container with id docker://concourse-web:Container failed liveness probe.. Container will be killed and recreated.
    Normal   Pulled     40m                 kubelet, gke-cluster-default-pool-bec82955-0rtc  Container image "concourse/concourse:4.2.1" already present on machine
    Warning  BackOff    8m (x45 over 26m)   kubelet, gke-cluster-default-pool-bec82955-0rtc  Back-off restarting failed container
    Warning  Unhealthy  3m (x150 over 42m)  kubelet, gke-cluster-default-pool-bec82955-0rtc  Readiness probe failed: Get http://10.28.0.19:8080/: dial tcp 10.28.0.19:8080: getsockopt: connection refused

After some troubleshooting, I found out that I was missing the targetPort directive, as it was different from the LB inbound port.

apiVersion: v1
kind: Service
metadata:
  name: app-server
  annotations:
    service.beta.kubernetes.io/azure-dns-label-name: <hidden>
spec:
  type: LoadBalancer
  ports:
    - port: 80
      name: http
      targetPort: 3000
      protocol: TCP
  selector:
    app: app-server

Maybe you have a server.context-path set in your application.properties file.
Try adjusting the path: of the liveness/readiness probe's httpGet to match it. The spring.application.name is overridden by the server context-path if one is specified in application.properties; I was assuming spring.application.name only applies to the service-discovery mechanism.
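
A minimal sketch of what that might look like, assuming application.properties sets server.context-path=/myapp (the context path, port, and endpoint are placeholders):

    livenessProbe:
      httpGet:
        path: /myapp/health   # includes the assumed context path so the probe hits the real endpoint
        port: 8080            # placeholder port
      initialDelaySeconds: 60
      periodSeconds: 10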

Same issue here. I can keep the service running continuously by disabling the readiness probe. This indicates that the probe failure and the subsequent shutdown are what cause the connection refusal, rather than the other way around.

Things I have tried that don't work:

  • Increasing probe timeout
  • Fiddling with ports, containerPort, targetPort

@njgibbon, Your suggestion to modify the initialDelaySeconds, periodSeconds and timeoutSeconds to bigger values (30, 30, 10 respectively) worked for me. Our problem pod contains an authentication proxy and a Jupyter-Lab deployment; after applying liveness probes we saw intermittent connection refused errors. Ideally we'd like to play with these values to see what can be reduced, but in any case, thanks for the good and helpful comment. 👍
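
A minimal sketch of those values applied to an httpGet probe (the path and port are placeholders, not taken from the comment above):

    livenessProbe:
      httpGet:
        path: /health           # placeholder endpoint
        port: 8080              # placeholder port
      initialDelaySeconds: 30   # longer initial delay
      periodSeconds: 30         # longer probe interval
      timeoutSeconds: 10        # longer per-probe timeout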

For whoever encounters this issue while using Spring services:
don't forget, like I did, to set the appropriate args.

```
Args:
  - --server.port=8080
    ...
```

I encountered this issue too, and finally I found that my server takes too much time to start up: longer than the initialDelaySeconds, so the deployment falls into an infinite loop.

Encountered the same issue on a single node. New pods were stuck in the Waiting: ContainerCreating status; sudo reboot on the node helped.

I faced the same issue. Increasing the initialDelaySeconds for the liveness and readiness probes resolved it; the Spring Boot app took too long to start.

For a Spring Boot deployment,
increasing initialDelaySeconds from 10 to 45 fixed the problem for me (a sketch follows below).
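
A minimal sketch of that change (the probe path and port are placeholders):

    livenessProbe:
      httpGet:
        path: /actuator/health   # placeholder endpoint
        port: 8080               # placeholder port
      initialDelaySeconds: 45    # was 10; gives the slow-booting app time to come up
      periodSeconds: 10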

Can you update your k8s cluster to a supported version?

As mentioned in https://github.com/kubernetes/kubernetes/issues/62594#issuecomment-420685737 we found this issue was caused by:

  1. Insufficient CPU/RAM resource allocation for quick container startup (took a long time to start)
  2. Insufficient initialDelaySeconds for the probe.

After increasing the resource allocation, the container started more quickly, which allowed reducing the initialDelaySeconds value. After that, the probes succeeded (see the sketch below).
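
A minimal sketch that combines the two fixes described above; every specific value here is illustrative rather than taken from the linked comment:

    resources:
      requests:
        cpu: 500m               # more CPU so the container starts quickly
        memory: 512Mi
    livenessProbe:
      httpGet:
        path: /health           # placeholder endpoint
        port: 8080              # placeholder port
      initialDelaySeconds: 30   # can be lowered once startup is faster
      periodSeconds: 10
      failureThreshold: 3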

I am facing the same issue, but it happens after a time interval, for example after 2 days. Everything works properly at the start and looks stable, but after a while the pods of the same service go into an infinite loop of restarting one by one.

> My case was solved by changing the binding from 127.0.0.1 to 0.0.0.0.

This worked for me. Why does this work?

I was receiving the same error while spinning up pods for a Jenkins container.

I removed the limits on the container and it started working.


    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: jenkins-dep
      namespace: jenkins-ns
    spec:
      template:
        metadata:
          name: jenkins-tmplt
          labels:
            app: jenkins
        spec:
          containers:
            - name: jenkins
              image: jenkins:latest
              imagePullPolicy: IfNotPresent
              livenessProbe:
                failureThreshold: 3
                successThreshold: 1
                timeoutSeconds: 1
                periodSeconds: 10
                initialDelaySeconds: 10
                tcpSocket:
                  port: 8081
              readinessProbe:
                failureThreshold: 3
                successThreshold: 1
                timeoutSeconds: 1
                periodSeconds: 10
                initialDelaySeconds: 30
                httpGet:
                  port: 8081
                  path: /login
              # resources:
              #   limits:
              #     cpu: 100m
              #     memory: 512Mi
              ports:
                - containerPort: 8081
      replicas: 2
      strategy:
        rollingUpdate:
          maxUnavailable: 2
          maxSurge: 2
      selector:
        matchLabels:
          app: jenkins

> @njgibbon, Your suggestion to modify the initialDelaySeconds, periodSeconds and timeoutSeconds to bigger values (30, 30, 10 respectively) worked for me. Our problem pod contains an authentication proxy and a Jupyter-Lab deployment; after applying liveness probes we saw intermittent connection refused errors. Ideally we'd like to play with these values to see what can be reduced, but in any case, thanks for the good and helpful comment. 👍

It works for me. So I think we should increase the times a bit more, even when the start time is less than the value defined (my initialDelaySeconds was set to 20s but the start time took 19s). Here is my config:
    readinessProbe:
      httpGet:
        path: actuator/health/readiness
        port: {{ .Values.service.targetPort }}
      initialDelaySeconds: 15
      timeoutSeconds: 10
      periodSeconds: 30
      successThreshold: 2
      failureThreshold: 5
    livenessProbe:
      httpGet:
        path: actuator/health/liveness
        port: {{ .Values.service.targetPort }}
      initialDelaySeconds: 30
      timeoutSeconds: 10
      periodSeconds: 30
      failureThreshold: 5

Increasing the timeout doesn't work for us. We had to change the bind IP to 0.0.0.0. It looks like this thread contains two different issues:

  • Misconfiguration of readiness timeouts in general.
  • An issue in the readiness probe feature itself.

For us, it looks like the readiness feature has an issue. Our server starts immediately and we can curl the health endpoints without any issue.

That's the question in https://github.com/kubernetes/kubernetes/issues/62594#issuecomment-605714141 that we should try to answer.

When I try to reach the container via the pod IP, I get the exact same error.

curl http://10.244.0.89:4000
curl: (7) Failed to connect to 10.244.0.89 port 4000: Connection refused

Is this the same address the readinessProbe uses as its host? That would explain why binding to 0.0.0.0 works.

Yes, according to the docs the readinessProbe uses the pod IP. 127.0.0.1 can't be accessed from outside the pod's network namespace, so either you bind to 0.0.0.0 or you specify an AAAA record.
