Aws-load-balancer-controller: healthz endpoint never comes up despite service functioning

Created on 6 Dec 2017 · 7 Comments · Source: kubernetes-sigs/aws-load-balancer-controller

Using the helm chart, the readiness and liveness checks never pass because the <>:8080/healthz endpoint never comes up. <>:8080/metrics is live, and the controller functions properly once the liveness check is removed. We are able to create multiple services using ALBs and everything is groovy.

Based on the logs, it looks like one of the loops off of controller.Configure is stuck. The last log message from ALB-INGRESS (with DEBUG turned on) is:
log.go:48] [ALB-INGRESS] [controller] [INFO]: Ingress class set to alb

Setting up a port-forward to 8080 and curling http://localhost:8080/healthz (or /state) returns a 404, which makes me think the controller never got to the step where those handlers are created.

kind/bug

All 7 comments

Chart we are using alb-ingress-controller-helm-0.0.9

@kaseyalusi I am looking into this now. I think there need to be more debug logs to figure out where it's getting stuck, and then I'll add some error handling to surface these issues in future.

@kaseyalusi @alock I ended up running the latest tag, 1.0-alpha.7, which includes better logging, to debug my issues. Give that a go and it will work. Once 1.0 is released, I'm sure they will bump the version in the helm chart.

Hey @willejs, thanks for looking into this. I deployed the 1.0-alpha.7 tag, but with that image the controller is getting a 403 when trying to use the AWS APIs. We are using kube2iam for authentication, and with the 0.8 tag everything is working just fine.

I1208 18:26:16.307510       1 session.go:31] [ALB-INGRESS] [session] [INFO]: Request: elasticloadbalancing/&{DescribeLoadBalancers POST / %!s(*request.Paginator=&{[Marker] [NextMarker]  }) %!s(func(*request.Request) error=<nil>)}, Payload: {
I1208 18:26:16.307524       1 session.go:31] [ALB-INGRESS] [session] [INFO]:
I1208 18:26:16.307527       1 session.go:31] [ALB-INGRESS] [session] [INFO]: }
I1208 18:26:17.479102       1 session.go:31] [ALB-INGRESS] [session] [INFO]: Request: ec2/&{DescribeTags POST / %!s(*request.Paginator=&{[NextToken] [NextToken] MaxResults }) %!s(func(*request.Request) error=<nil>)}, Payload: {
I1208 18:26:17.479122       1 session.go:31] [ALB-INGRESS] [session] [INFO]:   Filters: [{
I1208 18:26:17.479126       1 session.go:31] [ALB-INGRESS] [session] [INFO]:       Name: "resource-id",
I1208 18:26:17.479129       1 session.go:31] [ALB-INGRESS] [session] [INFO]:       Values: ["sg-XXX"]
I1208 18:26:17.479132       1 session.go:31] [ALB-INGRESS] [session] [INFO]:     }]
I1208 18:26:17.479135       1 session.go:31] [ALB-INGRESS] [session] [INFO]: }
I1208 18:26:17.479271       1 session.go:31] [ALB-INGRESS] [session] [INFO]: Request: ec2/&{DescribeTags POST / %!s(*request.Paginator=&{[NextToken] [NextToken] MaxResults }) %!s(func(*request.Request) error=<nil>)}, Payload: {
I1208 18:26:17.479287       1 session.go:31] [ALB-INGRESS] [session] [INFO]:   Filters: [{
I1208 18:26:17.479294       1 session.go:31] [ALB-INGRESS] [session] [INFO]:       Name: "resource-id",
I1208 18:26:17.479303       1 session.go:31] [ALB-INGRESS] [session] [INFO]:       Values: ["sg-XXX"]
I1208 18:26:17.479323       1 session.go:31] [ALB-INGRESS] [session] [INFO]:     }]
I1208 18:26:17.479333       1 session.go:31] [ALB-INGRESS] [session] [INFO]: }

Hi @kaseyalusi, are you still having problems? I ask because I'm using 1.0-alpha.7 with kiam and everything looks good for us. We did have to add a bunch of IAM permissions to our kiam configuration when we upgraded from 0.X to 1.0-alpha.Y, though (mainly around WAF and some extra EC2 permissions), so maybe try the newer build again with more open IAM permissions?

_Side note_
We did have some problems running an old version of the helm chart, though: the liveness probe was failing, and we suspect the AWS API calls in /healthz were being rate limited.
I've just submitted a PR to prompt a discussion on this (https://github.com/kubernetes-sigs/aws-alb-ingress-controller/pull/406)

I've run into a similar issue - the ALB is allocated correctly and routes set up, but the /healthz endpoint never comes up and so the pod gets endlessly restarted. I'll hack around it for now by just adding a stupidly long timeout.

Looking at the AWS debug logs, it's trying to call waf-regional/GetWebACLForResource - the endpoint doesn't exist in the region I'm running in (eu-west-2) - might be the root cause on my end

There is discussion in #439 about disabling services that are not supported in some regions. In the meantime, I think WAF can be taken out of the health check.

I've modified how the healthz endpoint works in #439 to run the AWS tests on an interval outside of the /healthz endpoint.
