Hello,
Thanks for developing this controller. I have been using it to expose a number of services from an EKS cluster. I'm running into an issue and am hopeful that someone can point me in the right direction or suggest a debugging approach.
I have a NodePort service, foo-service. I have installed the ALB ingress controller and set it up to use my AWS keys (rather than the IAM RBAC configuration option).
Here's my service:
apiVersion: v1
kind: Service
metadata:
  name: foo-service
spec:
  type: NodePort
  ports:
    - name: service
      port: 80
      protocol: TCP
      targetPort: 3000
  selector:
    app: foo
Here is my ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/subnets: "subnet-foo,subnet-bar"
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    alb.ingress.kubernetes.io/tags: Component=foo,Creator=Robert Quinlivan
    alb.ingress.kubernetes.io/scheme: internet-facing
  name: foo-ingress
spec:
  rules:
    - http:
        paths:
          - backend:
              serviceName: foo-service
              servicePort: 80
            path: /*
      host: foo.mycompany.com
After I apply this, an ELB hostname shows up in the ingress. Once I apply the Route53 config to point foo.mycompany.com at that hostname, it works. However, I get a large number of 502 "Bad Gateway" responses, seemingly at random, that render the service unusable.
Requests to foo.mycompany.com succeed about half the time; the other half return a 502. I am reasonably sure the 502 isn't bubbling up from the service itself, because if I port-forward to it (e.g. kubectl port-forward service/foo-service 9000:80) it works fine. In addition, the 502 response has the following headers, which suggest it is indeed the ELB that is generating the 502:
< HTTP/1.1 502 Bad Gateway
HTTP/1.1 502 Bad Gateway
< Server: awselb/2.0
Server: awselb/2.0
< Date: Thu, 08 Aug 2019 15:48:54 GMT
Date: Thu, 08 Aug 2019 15:48:54 GMT
< Content-Type: text/html
Content-Type: text/html
< Content-Length: 138
Content-Length: 138
< Connection: keep-alive
Connection: keep-alive
It would appear that the ALB controller did not configure the ELB correctly, or there is some configuration issue between the ELB and Kubernetes that needs to be resolved. I don't see anything very useful in the CloudWatch metrics, just confirmation that the load balancer is indeed sending a lot of 502s.
Any idea where to go from here?
Thanks
A 502 half the time might indicate that one of the nodes (assuming a 2-node cluster) is unhealthy for some reason. I'm not sure how this works in EKS, but with kops on EC2 you can get this behaviour: a request that hits the node the Pod actually runs on succeeds, whereas a request that hits a node where the Pod isn't hosted fails. In my case the cause was kube-proxy being unable to forward traffic to the Pod on the other node, which in turn was due to some missing security group rules. I also ran into an edge case where I needed to disable the source/destination check on the ENIs, but I don't think that's the case here.
@allanyung Can you post the missing rules here?
I ran across the same issue running on EKS. Loading a site will split requests between 200 and 502.
Solved it by adding the annotation alb.ingress.kubernetes.io/target-type: ip to the Ingress resource.
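For anyone landing here later, this is a minimal sketch of where that annotation would go, reusing the Ingress from the original post (subnets, listen-ports and tags omitted for brevity). It assumes the Pods have VPC-routable IPs, as they do with the default AWS VPC CNI on EKS; with other network plugins ip mode may not work. Without the annotation the controller defaults to target-type: instance, which sends traffic to the NodePort on each node.

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: foo-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    # Register Pod IPs directly as ALB targets instead of node ports.
    # Assumes Pod IPs are routable inside the VPC (e.g. AWS VPC CNI).
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
    - host: foo.mycompany.com
      http:
        paths:
          - path: /*
            backend:
              serviceName: foo-service
              servicePort: 80
```

In ip mode the load balancer talks to the Pods directly, so requests no longer depend on kube-proxy forwarding between nodes. That would be consistent with the half-and-half 200/502 split described above going away.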
In summary, to achieve zero downtime deployment, you need
😄
closing this issue.