NGINX Ingress controller version: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0 installed with helm using stable chart.
Kubernetes version (use kubectl version): 1.8.4
Environment:
What happened: The nginx-ingress-controller pod's readiness and liveness probes fail: HTTP probe failed with statuscode: 500. The pod is terminated and restarted; this happens 2-5 times before it starts successfully.
What you expected to happen: The pod to start successfully without failing the readiness and liveness probes.
How to reproduce it (as minimally and precisely as possible): We are running the nginx-ingress-controller as a DaemonSet, so we see this problem whenever a new node is created.
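For reference, the install looks roughly like the sketch below; the controller.kind and controller.image.tag values are assumptions about the stable chart and may differ between chart versions:
helm install stable/nginx-ingress --name ingress1 \
  --set controller.kind=DaemonSet \
  --set controller.image.tag=0.11.0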
Anything else we need to know: This issue has been opened before:
Here are the events from the nginx-ingress-controller pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 2m kubelet, ip-10-0-19-85.eu-central-1.compute.internal MountVolume.SetUp succeeded for volume "ingress1-nginx-ingress-token-jm48x"
Warning FailedSync 1m (x3 over 2m) kubelet, ip-10-0-19-85.eu-central-1.compute.internal Error syncing pod
Normal Pulling 1m kubelet, ip-10-0-19-85.eu-central-1.compute.internal pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0"
Normal Pulled 48s kubelet, ip-10-0-19-85.eu-central-1.compute.internal Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0"
Warning Unhealthy 13s (x3 over 33s) kubelet, ip-10-0-19-85.eu-central-1.compute.internal Liveness probe failed: HTTP probe failed with statuscode: 500
Warning Unhealthy 4s (x4 over 34s) kubelet, ip-10-0-19-85.eu-central-1.compute.internal Readiness probe failed: HTTP probe failed with statuscode: 500
Normal Created 0s (x2 over 48s) kubelet, ip-10-0-19-85.eu-central-1.compute.internal Created container
Normal Started 0s (x2 over 48s) kubelet, ip-10-0-19-85.eu-central-1.compute.internal Started container
Normal Killing 0s kubelet, ip-10-0-19-85.eu-central-1.compute.internal Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 0s kubelet, ip-10-0-19-85.eu-central-1.compute.internal Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0" already present on machine
Here is the default probe config:
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
Here is the helm chart values we use: https://gist.github.com/max-rocket-internet/ba6b368502f58bc7061d3062939b5dca
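As a possible workaround while debugging, the probe timings can be loosened through the chart. This is only a sketch and assumes your chart version exposes controller.livenessProbe / controller.readinessProbe overrides (check its values.yaml):
helm upgrade ingress1 stable/nginx-ingress \
  --set controller.livenessProbe.initialDelaySeconds=30 \
  --set controller.livenessProbe.timeoutSeconds=5 \
  --set controller.readinessProbe.initialDelaySeconds=30 \
  --set controller.readinessProbe.timeoutSeconds=5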
I have logs from the pod with the --v=10 argument set, but there is a lot of output and some of it is sensitive. Here is an excerpt; let me know if you need more:
I0305 10:57:12.548693 7 main.go:47] annotation kubernetes.io/ingress.class is not present in ingress default/env1-app1-part1
I0305 10:57:15.587793 7 round_trippers.go:417] curl -k -v -XGET -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "User-Agent: nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer xxxxxxxx" https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx
I0305 10:57:15.590327 7 round_trippers.go:436] GET https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx 200 OK in 2 milliseconds
I0305 10:57:15.590344 7 round_trippers.go:442] Response Headers:
I0305 10:57:15.590350 7 round_trippers.go:445] Content-Type: application/vnd.kubernetes.protobuf
I0305 10:57:15.590355 7 round_trippers.go:445] Content-Length: 437
I0305 10:57:15.590362 7 round_trippers.go:445] Date: Mon, 05 Mar 2018 10:57:15 GMT
I0305 10:57:15.590397 7 request.go:871] Response Body:
00000000 6b 38 73 00 0a 0f 0a 02 76 31 12 09 43 6f 6e 66 |k8s.....v1..Conf|
...
I0305 10:57:15.590459 7 leaderelection.go:243] lock is held by ingress1-nginx-ingress-controller-9jsqp and has not yet expired
I0305 10:57:15.590467 7 leaderelection.go:180] failed to acquire lease default/ingress-controller-leader-nginx
I0305 10:57:22.549142 7 main.go:47] annotation kubernetes.io/ingress.class is not present in ingress default/env1-app2-admin
I0305 10:57:26.091336 7 main.go:152] Received SIGTERM, shutting down
I0305 10:57:26.091359 7 nginx.go:359] shutting down controller queues
I0305 10:57:26.091376 7 nginx.go:367] stopping NGINX process...
2018/03/05 10:57:26 [notice] 29#29: signal process started
I0305 10:57:29.097347 7 nginx.go:380] NGINX process has stopped
I0305 10:57:29.097372 7 main.go:160] Handled quit, awaiting pod deletion
I0305 10:57:30.992643 7 round_trippers.go:417] curl -k -v -XGET -H "User-Agent: nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer xxxxxx" -H "Accept: application/vnd.kubernetes.protobuf, */*" https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx
I0305 10:57:30.994766 7 round_trippers.go:436] GET https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx 200 OK in 2 milliseconds
I0305 10:57:30.994786 7 round_trippers.go:442] Response Headers:
I0305 10:57:30.994792 7 round_trippers.go:445] Content-Length: 437
I0305 10:57:30.994818 7 round_trippers.go:445] Date: Mon, 05 Mar 2018 10:57:30 GMT
I0305 10:57:30.994832 7 round_trippers.go:445] Content-Type: application/vnd.kubernetes.protobuf
I0305 10:57:30.994891 7 request.go:871] Response Body:
00000000 6b 38 73 00 0a 0f 0a 02 76 31 12 09 43 6f 6e 66 |k8s.....v1..Conf|
....
000001b0 00 1a 00 22 00 |...".|
I0305 10:57:30.995001 7 leaderelection.go:243] lock is held by ingress1-nginx-ingress-controller-9jsqp and has not yet expired
I0305 10:57:30.995029 7 leaderelection.go:180] failed to acquire lease default/ingress-controller-leader-nginx
I0305 10:57:39.097529 7 main.go:163] Exiting with 0
Seeing the same problem as above.
However I also see this message in the log:
Error: exit status 1
2018/03/15 16:08:15 [emerg] 180#180: "client_max_body_size" directive invalid value in /tmp/nginx-cfg653645632:777
nginx: [emerg] "client_max_body_size" directive invalid value in /tmp/nginx-cfg653645632:777
nginx: configuration file /tmp/nginx-cfg653645632 test failed
Tested with 0.10.2 and 0.11.0
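In ingress-nginx the client_max_body_size directive is normally rendered from the proxy-body-size ConfigMap key or the per-Ingress proxy-body-size annotation, and nginx only accepts plain sizes such as 8m or 100m. A rough way to find the offending value (a sketch, not specific to any cluster):
# Values like "8mb" or an empty string end up as the invalid client_max_body_size
# directive that fails the config test above.
kubectl get ingress --all-namespaces -o yaml | grep -n "proxy-body-size"
kubectl get configmap --all-namespaces -o yaml | grep -n "proxy-body-size"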
I'm seeing the same issue; here are the logs with --v=10:
I0319 18:21:58.035389 7 round_trippers.go:442] Response Headers:
I0319 18:21:58.035393 7 round_trippers.go:445] Audit-Id: 977bee30-c94f-470a-8aa0-f36703b552d0
I0319 18:21:58.035397 7 round_trippers.go:445] Content-Type: application/vnd.kubernetes.protobuf;stream=watch
I0319 18:21:58.035400 7 round_trippers.go:445] Date: Mon, 19 Mar 2018 18:21:58 GMT
I0319 18:22:32.514283 7 main.go:150] Received SIGTERM, shutting down
I0319 18:22:32.514349 7 nginx.go:321] shutting down controller queues
I0319 18:22:32.514371 7 nginx.go:329] stopping NGINX process...
2018/03/19 18:22:32 [notice] 48#48: signal process started
2018/03/19 18:22:32 [error] 48#48: open() "/run/nginx.pid" failed (2: No such file or directory)
nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
I0319 18:22:32.587615 7 main.go:154] Error during shutdown exit status 1
I0319 18:22:32.587670 7 main.go:158] Handled quit, awaiting pod deletion
I0319 18:22:42.587856 7 main.go:161] Exiting with 1
Release: 0.10.2
I am seeing the same issue with 0.14.0 as well.
Having the same issue with 0.15.0
Same issue with 0.14.0, 0.15.0, but not 0.9.0.
Having the same issue with 0.9.0, 0.10.0, and 0.15.0, using Kubernetes version 1.8.11.
Having the same issue with 0.14.0, Kubernetes version 1.8.4.
Same issue with 0.15.0
Its log output is attached.
@keslerm can you update your image to current master?
@aledbf I built the image from master and that did the trick; it looks good now.
Anything I can provide that might help?
Closing. Please update to 0.16.0
Hi! I am having the same issues with 0.24.0
$ kubectl describe pod nginx-ingress-controller-7846888d77-xlvwk
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned nginx-ingress-controller-7846888d77-xlvwk to gke-qaas-test-default-pool-4b3a3303-h9xk
Normal SuccessfulMountVolume 1m kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk MountVolume.SetUp succeeded for volume "nginx-ingress-token-lrptw"
Normal Pulled 24s (x2 over 58s) kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.17.1" already present on machine
Normal Created 24s (x2 over 58s) kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk Created container
Normal Started 24s (x2 over 58s) kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk Started container
Normal Killing 24s kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 5s (x4 over 45s) kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk Liveness probe failed: Get http://10.12.1.12:10254/healthz: dial tcp 10.12.1.12:10254: getsockopt: connection refused
Warning Unhealthy 2s (x4 over 42s) kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk Readiness probe failed: Get http://10.12.1.12:10254/healthz: dial tcp 10.12.1.12:10254: getsockopt: connection refused
$ kubectl logs nginx-ingress-controller-7846888d77-xlvwk
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: 0.17.1
Build: git-12f7966
Repository: https://github.com/kubernetes/ingress-nginx.git
-------------------------------------------------------------------------------
I0815 22:21:46.579086 5 flags.go:180] Watching for Ingress class: nginx
@michaelkunzmann-sap If the log ends there, it means the pod cannot reach the apiserver.
You can get more details about this by increasing the log level in the ingress controller deployment, adding the flag --v=10.
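If it is not obvious where to add it, one way is to patch the args of the controller deployment directly (the deployment name and namespace here are examples; adjust to your install):
kubectl -n ingress-nginx patch deployment nginx-ingress-controller --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--v=10"}]'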
I had the same problem and just solved it.
In my case, I deleted the Ingress that references nginx-ingress, then deleted the nginx-ingress-controller and reinstalled it.
It finally succeeded, and it no longer reports unhealthy.
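A sketch of that workaround with Helm 2 (all names below are examples; substitute your own Ingress, namespace, and release):
kubectl delete ingress my-app --namespace default
helm delete --purge nginx-ingress
helm install stable/nginx-ingress --name nginx-ingress --namespace default
# then re-apply the Ingress manifest that was deleted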
I'm having the same issue with the 0.25 version.
I had the same problem and just solved it.
In my case, I deleted the Ingress that references nginx-ingress, then deleted the nginx-ingress-controller and reinstalled it.
It finally succeeded, and it no longer reports unhealthy.
I have a similar issue with ingress-nginx. Do you mind sharing your working configuration?
I'm having the same issue on my minikube with the nginx-ingress-controller 0.25 version; as the subject states, it's a 500 status code in the output of the "describe pod" command:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-79f6884cf6-qj65t to minikube
Normal Started 17m (x2 over 17m) kubelet, minikube Started container nginx-ingress-controller
Warning Unhealthy 16m (x6 over 17m) kubelet, minikube Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Killing 16m (x2 over 17m) kubelet, minikube Container nginx-ingress-controller failed liveness probe, will be restarted
Normal Pulled 16m (x3 over 17m) kubelet, minikube Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1" already present on machine
Normal Created 16m (x3 over 17m) kubelet, minikube Created container nginx-ingress-controller
Warning Unhealthy 7m40s (x35 over 17m) kubelet, minikube Readiness probe failed: HTTP probe failed with statuscode: 500
Warning BackOff 2m43s (x44 over 12m) kubelet, minikube Back-off restarting failed container
The nginx-ingress-controller pod also went into status CrashLoopBackOff (I guess from too many failures):
NAME                                        READY   STATUS             RESTARTS   AGE
nginx-ingress-controller-79f6884cf6-qj65t   0/1     CrashLoopBackOff   11         28m
Any progress here? We have the same problem with 0.26.1. The nginx config looks good: "nginx: configuration file /etc/nginx/nginx.conf test is successful". Any clues?
Possibly related to #3993. Eventually we fixed this by upgrading the nodes to 1.14.7-gke.10. After that, running "for i in $(seq 1 200); do curl localhost:10254/healthz; done" inside the ingress-nginx container finished in a few seconds, whereas before it took minutes. It could well be that the upgrade triggered a reset of the root cause, which is still unknown to me. Or maybe nginx-ingress-controller:0.26.1 simply works better with the newer Kubernetes version.
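For anyone who wants to repeat that check, a rough version runnable from outside the container (the pod name and namespace are examples):
kubectl exec -n ingress-nginx nginx-ingress-controller-xxxxx -- \
  sh -c 'for i in $(seq 1 200); do curl -s -o /dev/null -w "%{http_code}\n" localhost:10254/healthz; done'
On a healthy controller this prints 200 two hundred times within a few seconds.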
I am also getting this issue:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13m default-scheduler Successfully assigned jenkins/nginx-ingress-controller-6d9c6d875b-8h98z to ip-192-168-150-176.ec2.internal
Normal Started 12m (x2 over 13m) kubelet, ip-192-168-150-176.ec2.internal Started container nginx-ingress-controller
Warning Unhealthy 11m (x6 over 12m) kubelet, ip-192-168-150-176.ec2.internal Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Killing 11m (x2 over 12m) kubelet, ip-192-168-150-176.ec2.internal Container nginx-ingress-controller failed liveness probe, will be restarted
Warning Unhealthy 11m (x9 over 13m) kubelet, ip-192-168-150-176.ec2.internal Readiness probe failed: HTTP probe failed with statuscode: 500
Normal Pulled 11m (x3 over 13m) kubelet, ip-192-168-150-176.ec2.internal Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0" already present on machine
Normal Created 11m (x3 over 13m) kubelet, ip-192-168-150-176.ec2.internal Created container nginx-ingress-controller
Warning BackOff 2m53s (x24 over 8m57s) kubelet, ip-192-168-150-176.ec2.internal Back-off restarting failed container
I am using the quay.io/kubernetes-ingress-controller/nginx-ingress-controller image.
Could you please help?
Delete the Ingress that references it, then delete the pod and reinstall; that fixes it.