Aws-load-balancer-controller: healthz endpoint never comes up despite service functioning

Created on 6 Dec 2017 · 7 Comments · Source: kubernetes-sigs/aws-load-balancer-controller

Using the helm chart, the readiness and liveness checks never pass because the <>:8080/healthz endpoint never comes up. <>:8080/metrics is live, and the controller functions properly once the liveness check is removed. We are able to create multiple services using ALBs and everything is groovy.

Based on the logs, it looks like one of the loops started from controller.Configure is stuck. The last log message from ALB-INGRESS (with DEBUG turned on) is:
log.go:48] [ALB-INGRESS] [controller] [INFO]: Ingress class set to alb

Setting up a port-forward to 8080 and curling http://localhost:8080/healthz (or /state) returns a 404, which makes me think the controller never got to the step where those handlers are created.

kind/bug

kaseyalusi

👍5

All 7 comments

Chart we are using alb-ingress-controller-helm-0.0.9

alock on 6 Dec 2017

👍3

@kaseyalusi I am looking into this now. I think there need to be more debug logs to figure out where it's getting stuck, and then I'll add some error handling to surface these issues in future.

willejs on 8 Dec 2017

@kaseyalusi @alock I ended up running the latest tag, 1.0-alpha.7, which includes better logging, to debug my issues. Give that a go and it should work. Once 1.0 is released I'm sure they will bump the version in the helm chart.

willejs on 8 Dec 2017

Hey @willejs, thanks for looking into this. I deployed the 1.0-alpha.7 tag, but with that image the controller gets a 403 when trying to use the AWS APIs. We are using kube2iam for authentication, and with the 0.8 tag everything works just fine.

I1208 18:26:16.307510       1 session.go:31] [ALB-INGRESS] [session] [INFO]: Request: elasticloadbalancing/&{DescribeLoadBalancers POST / %!s(*request.Paginator=&{[Marker] [NextMarker]  }) %!s(func(*request.Request) error=<nil>)}, Payload: {
I1208 18:26:16.307524       1 session.go:31] [ALB-INGRESS] [session] [INFO]:
I1208 18:26:16.307527       1 session.go:31] [ALB-INGRESS] [session] [INFO]: }
I1208 18:26:17.479102       1 session.go:31] [ALB-INGRESS] [session] [INFO]: Request: ec2/&{DescribeTags POST / %!s(*request.Paginator=&{[NextToken] [NextToken] MaxResults }) %!s(func(*request.Request) error=<nil>)}, Payload: {
I1208 18:26:17.479122       1 session.go:31] [ALB-INGRESS] [session] [INFO]:   Filters: [{
I1208 18:26:17.479126       1 session.go:31] [ALB-INGRESS] [session] [INFO]:       Name: "resource-id",
I1208 18:26:17.479129       1 session.go:31] [ALB-INGRESS] [session] [INFO]:       Values: ["sg-XXX"]
I1208 18:26:17.479132       1 session.go:31] [ALB-INGRESS] [session] [INFO]:     }]
I1208 18:26:17.479135       1 session.go:31] [ALB-INGRESS] [session] [INFO]: }
I1208 18:26:17.479271       1 session.go:31] [ALB-INGRESS] [session] [INFO]: Request: ec2/&{DescribeTags POST / %!s(*request.Paginator=&{[NextToken] [NextToken] MaxResults }) %!s(func(*request.Request) error=<nil>)}, Payload: {
I1208 18:26:17.479287       1 session.go:31] [ALB-INGRESS] [session] [INFO]:   Filters: [{
I1208 18:26:17.479294       1 session.go:31] [ALB-INGRESS] [session] [INFO]:       Name: "resource-id",
I1208 18:26:17.479303       1 session.go:31] [ALB-INGRESS] [session] [INFO]:       Values: ["sg-XXX"]
I1208 18:26:17.479323       1 session.go:31] [ALB-INGRESS] [session] [INFO]:     }]
I1208 18:26:17.479333       1 session.go:31] [ALB-INGRESS] [session] [INFO]: }

kaseyalusi on 8 Dec 2017

Hi @kaseyalusi are you still having problems? I ask because I'm using 1.0-alpha.7 with kiam and everything looks good for us: we did have to add a bunch of IAM permissions to our kiam configuration when we upgraded from 0.X to 1.0-alpha.Y though (mainly around WAF and some extra ec2) - so maybe try the newer build again but with more open IAM permissions?

_Side note_
We did have some problems running an old version of the helm chart though: the liveness probe was failing as we suspect the AWS API calls in /healthz were being rate limited.
I've just submitted a PR to prompt a discussion on this (https://github.com/kubernetes-sigs/aws-alb-ingress-controller/pull/406)

tyrannasaurusbanks on 22 Jun 2018

I've run into a similar issue - the ALB is allocated correctly and routes set up, but the /healthz endpoint never comes up and so the pod gets endlessly restarted. I'll hack around it for now by just adding a stupidly long timeout.

Looking at the AWS debug logs, it's trying to call waf-regional/GetWebACLForResource - the endpoint doesn't exist in the region I'm running in (eu-west-2) - might be the root cause on my end