Aws-load-balancer-controller: Getting 502 when running a deployment rolling update

Created on 9 Jan 2020 · 11 Comments · Source: kubernetes-sigs/aws-load-balancer-controller

When running a rolling update of a deployment, the ALB returns a lot of 502s.
It seems the ALB is not in sync with the actual pod state in k8s.

I can see that when a pod is being replaced, the ALB controller registers the new pod in the target group and removes the old one. The problem is that, in the ALB, the new pod's state is initial while the old one's is draining, so the service is unavailable and returns 502.

With a single-pod service, the service is completely unavailable. With bigger services running multiple pods, I see a spike in 502s that resolves itself after a while (once all new pods reach the healthy state in the ALB).

Ideally, the old pod should not terminate before the new one gets to a healthy state in the ALB. Of course, k8s is not aware of that.

Is this a known issue? Are there any acceptable workarounds?

idanya

👀1

All 11 comments

You need the readiness gate if you are on flat networking and Pods are direct targets for the ALB. There is a PR sitting around which needs to be merged for this:

https://github.com/kubernetes-sigs/aws-alb-ingress-controller/pull/955

Without that the deployment controller will move forward with its rolling updates regardless of whether target groups are up to date with regard to the new pods.
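
For reference, the rollout pacing itself is set on the Deployment's rolling update strategy. The fragment below is a minimal sketch with illustrative values, not a fix on its own: it only tells Kubernetes to wait for a new pod to be Ready before removing an old one, and without a readiness gate Ready says nothing about the pod's health in the ALB target group.

```yaml
# Fragment of a Deployment spec (not a complete manifest).
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Start one extra new pod before taking any old pod away.
      maxSurge: 1
      # Never drop below the desired replica count during the rollout.
      maxUnavailable: 0
```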

gaganapplatix on 13 Jan 2020

👍1

Don't update the Service.

wadefelix on 22 Mar 2020

😕2

@wadefelix I believe this is fixed with the pod readiness gate.
The caveat is that, for now, you need to add the pod readiness gate manually, along with a 40 sec preStop sleep.

If it doesn't work, please send me a repro case.

Sample Deployment spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-dp
spec:
  replicas: 200
  selector:
    matchLabels:
      app: my-dp
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-dp
    spec:
      readinessGates:
      # Format: target-health.alb.ingress.k8s.aws/<ingress name>_<service name>_<service port>
      - conditionType: target-health.alb.ingress.k8s.aws/my-ingress_my-dp_80
      containers:
      - name: server
        image: xx
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        env:
        - name: MESSAGE
          value: The Doctor
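        lifecycle:
          preStop:
            exec:
              # Keep serving for 40s so the ALB can finish deregistering this target.
              command: ["/bin/sh", "-c", "sleep 40"]
      # Must exceed the preStop sleep; the hook runs inside this grace period.
      terminationGracePeriodSeconds: 70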

M00nF1sh on 24 Mar 2020

@M00nF1sh What's the purpose of the lifecycle hook if we have the pod readiness gate?

billyshambrook on 25 Mar 2020

@billyshambrook The pod readiness gate only guarantees that you always have pods available for new connections during a deployment (e.g. when the controller is experiencing timeouts from AWS APIs, the deployment is paused).
But it doesn't protect existing connections, or new connections, to old pods. The controller's deregisterTarget call takes time, and the ALB takes time to propagate target changes to its nodes.
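
To make the timing concrete, here is the termination-related part of the sample spec again with the budget spelled out in comments. The numbers are the ones from the sample; how long deregistration and propagation actually take in a given setup is an assumption to verify, not something measured here.

```yaml
# Fragment of a pod template spec (not a complete manifest).
spec:
  # Total time from the start of termination until the kubelet force-kills
  # the containers. The preStop hook runs inside this budget.
  terminationGracePeriodSeconds: 70
  containers:
  - name: server
    image: xx
    lifecycle:
      preStop:
        exec:
          # Runs before SIGTERM is sent to the container, so the pod keeps
          # serving for 40s while the controller deregisters the target and
          # the ALB propagates the change (assumed to fit within 40s).
          command: ["/bin/sh", "-c", "sleep 40"]
    # After the hook returns, the container receives SIGTERM and has roughly
    # 70 - 40 = 30s left to finish in-flight requests.
```

If the sleep is lengthened, terminationGracePeriodSeconds should grow with it, otherwise the container may be killed before it has a chance to shut down cleanly.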

M00nF1sh on 25 Mar 2020

👍1

Thanks @M00nF1sh for the fast response. Does Kubernetes provide a hook like the pod readiness gate, but one that waits until an external process (the ALB in this case) has finished before the pod starts to terminate, without having to rely on a sleep? Though you would then need a way of knowing when the ALB has propagated the drain request...

Just wondering what the next steps would be to harden this even more.

billyshambrook on 25 Mar 2020

@billyshambrook
Currently the only thing that can delay pod container deletion is a lifecycle hook (a finalizer won't work).
Ideally the lifecycle hook could query an in-cluster service hosted by this controller for the removal status (more complicated than a 40 sec sleep TBH :D)
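
A rough sketch of what such a polling hook might look like is below. It is purely illustrative: the `removal-status.example.svc` endpoint and its "2xx once the target is gone" behavior are hypothetical (nothing in this thread says the controller exposes such a service), and it assumes the image ships `/bin/sh` and `wget`.

```yaml
# Fragment of a container spec (not a complete manifest).
env:
- name: POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP   # Downward API: this pod's IP
lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - |
        # Poll a HYPOTHETICAL status endpoint once per second, for at most
        # 40 attempts, then give up and let termination proceed.
        i=0
        while [ "$i" -lt 40 ]; do
          # Assumed contract: a 2xx response means the target is deregistered.
          if wget -q -O /dev/null "http://removal-status.example.svc/targets/${POD_IP}"; then
            exit 0
          fi
          i=$((i + 1))
          sleep 1
        done
```

The attempt cap keeps the worst case close to the fixed 40 sec sleep, so the same terminationGracePeriodSeconds budget would still apply.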

M00nF1sh on 25 Mar 2020

👍1

If the Service object is updated or recreated, EKS changes the nodePort, so the ALB cannot route requests to the old pods.

wadefelix on 2 Apr 2020
