Ingress-nginx: Readiness and Liveness probe failed: HTTP probe failed with statuscode: 500

Created on 5 Mar 2018  ·  21 Comments  ·  Source: kubernetes/ingress-nginx

NGINX Ingress controller version: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0 installed with helm using stable chart.
Kubernetes version (use kubectl version): 1.8.4
Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): kops version 1.8.1 (Debian I think)

What happened: nginx-ingress-controller pod Readiness and Liveness probe failed: HTTP probe failed with statuscode: 500. The pod is terminated and restarted. This happens 2-5 times until it starts successfully.

What you expected to happen: Pod to start successfully without failing Readiness and Liveness probe.

How to reproduce it (as minimally and precisely as possible): We are running the nginx-ingress-controller as a daemonset so whenever a new node is created we see this problem.

Anything else we need to know: This issue has been opened before:

Here are the events from the nginx-ingress-controller pod:

Events:
  Type     Reason                 Age                From                                                  Message
  ----     ------                 ----               ----                                                  -------
  Normal   SuccessfulMountVolume  2m                 kubelet, ip-10-0-19-85.eu-central-1.compute.internal  MountVolume.SetUp succeeded for volume "ingress1-nginx-ingress-token-jm48x"
  Warning  FailedSync             1m (x3 over 2m)    kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Error syncing pod
  Normal   Pulling                1m                 kubelet, ip-10-0-19-85.eu-central-1.compute.internal  pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0"
  Normal   Pulled                 48s                kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0"
  Warning  Unhealthy              13s (x3 over 33s)  kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy              4s (x4 over 34s)   kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Created                0s (x2 over 48s)   kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Created container
  Normal   Started                0s (x2 over 48s)   kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Started container
  Normal   Killing                0s                 kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled                 0s                 kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0" already present on machine

Here is the default probe config:

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
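
For context, a more forgiving version of those probes would give the controller extra time to answer /healthz before the kubelet restarts it. The numbers below are illustrative assumptions, not the chart defaults we actually run:

        # Illustrative sketch only: the same probes with relaxed timings.
        # These values are assumptions, not what the stable chart ships.
        livenessProbe:
          failureThreshold: 5            # tolerate a couple more failed checks
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 30        # wait longer before the first check
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5              # allow a slow /healthz response
        readinessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5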

Here is the helm chart values we use: https://gist.github.com/max-rocket-internet/ba6b368502f58bc7061d3062939b5dca

I have logs from the pod with the --v=10 argument set but there is a lot of output and some of it is sensitive. Here is an excerpt but let me know if you need more:

I0305 10:57:12.548693       7 main.go:47] annotation kubernetes.io/ingress.class is not present in ingress default/env1-app1-part1
I0305 10:57:15.587793       7 round_trippers.go:417] curl -k -v -XGET  -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "User-Agent: nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer xxxxxxxx" https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx
I0305 10:57:15.590327       7 round_trippers.go:436] GET https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx 200 OK in 2 milliseconds
I0305 10:57:15.590344       7 round_trippers.go:442] Response Headers:
I0305 10:57:15.590350       7 round_trippers.go:445]     Content-Type: application/vnd.kubernetes.protobuf
I0305 10:57:15.590355       7 round_trippers.go:445]     Content-Length: 437
I0305 10:57:15.590362       7 round_trippers.go:445]     Date: Mon, 05 Mar 2018 10:57:15 GMT
I0305 10:57:15.590397       7 request.go:871] Response Body:
00000000  6b 38 73 00 0a 0f 0a 02  76 31 12 09 43 6f 6e 66  |k8s.....v1..Conf|
...
I0305 10:57:15.590459       7 leaderelection.go:243] lock is held by ingress1-nginx-ingress-controller-9jsqp and has not yet expired
I0305 10:57:15.590467       7 leaderelection.go:180] failed to acquire lease default/ingress-controller-leader-nginx
I0305 10:57:22.549142       7 main.go:47] annotation kubernetes.io/ingress.class is not present in ingress default/env1-app2-admin
I0305 10:57:26.091336       7 main.go:152] Received SIGTERM, shutting down
I0305 10:57:26.091359       7 nginx.go:359] shutting down controller queues
I0305 10:57:26.091376       7 nginx.go:367] stopping NGINX process...
2018/03/05 10:57:26 [notice] 29#29: signal process started
I0305 10:57:29.097347       7 nginx.go:380] NGINX process has stopped
I0305 10:57:29.097372       7 main.go:160] Handled quit, awaiting pod deletion
I0305 10:57:30.992643       7 round_trippers.go:417] curl -k -v -XGET  -H "User-Agent: nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer xxxxxx" -H "Accept: application/vnd.kubernetes.protobuf, */*" https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx
I0305 10:57:30.994766       7 round_trippers.go:436] GET https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx 200 OK in 2 milliseconds
I0305 10:57:30.994786       7 round_trippers.go:442] Response Headers:
I0305 10:57:30.994792       7 round_trippers.go:445]     Content-Length: 437
I0305 10:57:30.994818       7 round_trippers.go:445]     Date: Mon, 05 Mar 2018 10:57:30 GMT
I0305 10:57:30.994832       7 round_trippers.go:445]     Content-Type: application/vnd.kubernetes.protobuf
I0305 10:57:30.994891       7 request.go:871] Response Body:
00000000  6b 38 73 00 0a 0f 0a 02  76 31 12 09 43 6f 6e 66  |k8s.....v1..Conf|
....
000001b0  00 1a 00 22 00                                    |...".|
I0305 10:57:30.995001       7 leaderelection.go:243] lock is held by ingress1-nginx-ingress-controller-9jsqp and has not yet expired
I0305 10:57:30.995029       7 leaderelection.go:180] failed to acquire lease default/ingress-controller-leader-nginx
I0305 10:57:39.097529       7 main.go:163] Exiting with 0

Most helpful comment

I'm having the same issue with version 0.25

All 21 comments

Seeing the same problem as above.

However I also see this message in the log:
Error: exit status 1
2018/03/15 16:08:15 [emerg] 180#180: "client_max_body_size" directive invalid value in /tmp/nginx-cfg653645632:777
nginx: [emerg] "client_max_body_size" directive invalid value in /tmp/nginx-cfg653645632:777
nginx: configuration file /tmp/nginx-cfg653645632 test failed

Tested with 0.10.2 and 0.11.0
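
For what it's worth, that [emerg] usually points at a proxy-body-size value nginx cannot parse, either in the controller ConfigMap or in an Ingress annotation, since that setting is rendered into client_max_body_size. A parseable value looks roughly like this (resource names here are hypothetical and "8m" is just an example):

# Hypothetical example of a parseable proxy-body-size, which ingress-nginx
# renders into client_max_body_size; an unparseable value here is one way
# to end up with the [emerg] above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration        # name assumed; use whatever your install created
  namespace: ingress-nginx
data:
  proxy-body-size: "8m"            # must be a size nginx accepts, e.g. "8m", or "0" to disable the check
---
# Or per Ingress via the annotation:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example-app                # hypothetical Ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "8m"
spec:
  backend:
    serviceName: example-service   # hypothetical default backend
    servicePort: 80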

I'm seeing the same issue, here are the logs with v=10

I0319 18:21:58.035389       7 round_trippers.go:442] Response Headers:
I0319 18:21:58.035393       7 round_trippers.go:445]     Audit-Id: 977bee30-c94f-470a-8aa0-f36703b552d0
I0319 18:21:58.035397       7 round_trippers.go:445]     Content-Type: application/vnd.kubernetes.protobuf;stream=watch
I0319 18:21:58.035400       7 round_trippers.go:445]     Date: Mon, 19 Mar 2018 18:21:58 GMT

I0319 18:22:32.514283       7 main.go:150] Received SIGTERM, shutting down
I0319 18:22:32.514349       7 nginx.go:321] shutting down controller queues
I0319 18:22:32.514371       7 nginx.go:329] stopping NGINX process...
2018/03/19 18:22:32 [notice] 48#48: signal process started
2018/03/19 18:22:32 [error] 48#48: open() "/run/nginx.pid" failed (2: No such file or directory)
nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
I0319 18:22:32.587615       7 main.go:154] Error during shutdown exit status 1
I0319 18:22:32.587670       7 main.go:158] Handled quit, awaiting pod deletion
I0319 18:22:42.587856       7 main.go:161] Exiting with 1

Release: 0.10.2

I am seeing the same issue with 0.14.0 as well.

Having the same issue with 0.15.0

Same issue with 0.14.0, 0.15.0, but not 0.9.0.

Having the same issue with 0.9.0, 0.10.0, 0.15.0. Using Kubernetes version 1.8.11

Having same issue with 0.14.0, K8s version 1.8.4

Same issue with 0.15.0
Log output attached:

v10.log

@keslerm can you update your image to current master?

@aledbf I built the image from master and that did the trick, looks good now.

Anything I can provide that might help?

Closing. Please update to 0.16.0

Hi! I am having the same issues with 0.24.0

$ kubectl describe pod nginx-ingress-controller-7846888d77-xlvwk
Events:
  Type     Reason                 Age                From                                               Message
  ----     ------                 ----               ----                                               -------
  Normal   Scheduled              1m                 default-scheduler                                  Successfully assigned nginx-ingress-controller-7846888d77-xlvwk to gke-qaas-test-default-pool-4b3a3303-h9xk
  Normal   SuccessfulMountVolume  1m                 kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  MountVolume.SetUp succeeded for volume "nginx-ingress-token-lrptw"
  Normal   Pulled                 24s (x2 over 58s)  kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.17.1" already present on machine
  Normal   Created                24s (x2 over 58s)  kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Created container
  Normal   Started                24s (x2 over 58s)  kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Started container
  Normal   Killing                24s                kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy              5s (x4 over 45s)   kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Liveness probe failed: Get http://10.12.1.12:10254/healthz: dial tcp 10.12.1.12:10254: getsockopt: connection refused
  Warning  Unhealthy              2s (x4 over 42s)   kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Readiness probe failed: Get http://10.12.1.12:10254/healthz: dial tcp 10.12.1.12:10254: getsockopt: connection refused
$ kubectl logs nginx-ingress-controller-7846888d77-xlvwk
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.17.1
  Build:      git-12f7966
  Repository: https://github.com/kubernetes/ingress-nginx.git
-------------------------------------------------------------------------------

I0815 22:21:46.579086       5 flags.go:180] Watching for Ingress class: nginx

@michaelkunzmann-sap if the log ends there, it means the pod cannot reach the apiserver.
You can get more details about this by increasing the log level in the ingress controller deployment, adding the flag --v=10
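
In a Deployment manifest that flag goes on the controller container's args, roughly like this; only --v=10 is the point, the image tag and other args are placeholders for whatever the deployment already passes:

# Sketch: raising the controller's log verbosity for debugging.
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.17.1
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration   # existing args kept as-is
            - --v=10                                             # verbose logging to see why /healthz fails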

I had the same problem and just solved it.

In my case, I deleted the Ingress resources that reference the nginx ingress, then deleted the nginx-ingress-controller and reinstalled it.

It finally succeeded and no longer reports unhealthy.

I'm having the same issue with version 0.25

I had the same problem and just solved it.

In my case, I deleted the Ingress resources that reference the nginx ingress, then deleted the nginx-ingress-controller and reinstalled it.

It finally succeeded and no longer reports unhealthy.

I have a similar issue with ingress-nginx. Do you mind sharing your working configuration?

I'm having the same issues with my minikube, with nginx-ingress-controller version 0.25; as the subject states, it's a 500 status code shown by the "describe pod" command:

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  17m                   default-scheduler  Successfully assigned ingress-nginx/nginx-ingress-controller-79f6884cf6-qj65t to minikube
  Normal   Started    17m (x2 over 17m)     kubelet, minikube  Started container nginx-ingress-controller
  Warning  Unhealthy  16m (x6 over 17m)     kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    16m (x2 over 17m)     kubelet, minikube  Container nginx-ingress-controller failed liveness probe, will be restarted
  Normal   Pulled     16m (x3 over 17m)     kubelet, minikube  Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1" already present on machine
  Normal   Created    16m (x3 over 17m)     kubelet, minikube  Created container nginx-ingress-controller
  Warning  Unhealthy  7m40s (x35 over 17m)  kubelet, minikube  Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff    2m43s (x44 over 12m)  kubelet, minikube  Back-off restarting failed container

The nginx-ingress-controller pod also went into status CrashLoopBackOff (I guess due to too many failures):

NAME                                        READY   STATUS             RESTARTS   AGE
nginx-ingress-controller-79f6884cf6-qj65t   0/1     CrashLoopBackOff   11         28m

Any progress here? We have the same problem with 0.26.1. The nginx config looks good (nginx: configuration file /etc/nginx/nginx.conf test is successful). Any clues?

Possibly related to #3993. Eventually we fixed this by upgrading the nodes to 1.14.7-gke.10. After that, running for i in $(seq 1 200); do curl localhost:10254/healthz; done inside the ingress-nginx container finished in a few seconds, whereas before it took minutes. It could well be that the upgrade triggered a reset of the root cause, which is still unknown to me. Or maybe nginx-ingress-controller:0.26.1 simply works better with the newer Kubernetes version.

I am also getting this issue:

Events:
  Type     Reason     Age                     From                                      Message
  ----     ------     ----                    ----                                      -------
  Normal   Scheduled  13m                     default-scheduler                         Successfully assigned jenkins/nginx-ingress-controller-6d9c6d875b-8h98z to ip-192-168-150-176.ec2.internal
  Normal   Started    12m (x2 over 13m)       kubelet, ip-192-168-150-176.ec2.internal  Started container nginx-ingress-controller
  Warning  Unhealthy  11m (x6 over 12m)       kubelet, ip-192-168-150-176.ec2.internal  Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    11m (x2 over 12m)       kubelet, ip-192-168-150-176.ec2.internal  Container nginx-ingress-controller failed liveness probe, will be restarted
  Warning  Unhealthy  11m (x9 over 13m)       kubelet, ip-192-168-150-176.ec2.internal  Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Pulled     11m (x3 over 13m)       kubelet, ip-192-168-150-176.ec2.internal  Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0" already present on machine
  Normal   Created    11m (x3 over 13m)       kubelet, ip-192-168-150-176.ec2.internal  Created container nginx-ingress-controller
  Warning  BackOff    2m53s (x24 over 8m57s)  kubelet, ip-192-168-150-176.ec2.internal  Back-off restarting failed container

I am using the quay.io/kubernetes-ingress-controller/nginx-ingress-controller image.
Could you please help?

Delete the resources that reference the ingress, then delete the pod and reinstall it; that is enough.
