Flux: Helm operator on AKS randomly deletes releases

Created on 15 Oct 2018 · 3 Comments · Source: fluxcd/flux

When Tiller runs in AKS (Azure's hosted Kubernetes service), it sometimes gets into an inconsistent state. This is caused by a combination of a known networking issue in AKS and client-go not handling intermittent network failures gracefully.

This causes the helm operator to occasionally reinstall a release (instead of upgrading it as it should), which fails because the release already exists. The helm operator then purges the "failed" release.

Until the AKS team can apply a fix globally, a workaround is to set the environment variables mentioned in Azure/AKS#676 on the helm operator pod.

FAQ helm

brantb

Most helpful comment

Resolved in #1530

stefanprodan on 19 Nov 2018

All 3 comments

#1446 adds an `extraEnvs` value to the helm chart, so you can apply the workaround like so:

helmOperator:
  extraEnvs:
  - name: KUBERNETES_PORT_443_TCP_ADDR
    value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
  - name: KUBERNETES_PORT
    value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
  - name: KUBERNETES_PORT_443_TCP
    value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
  - name: KUBERNETES_SERVICE_HOST
    value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
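
If the helm operator is not installed through the chart, the same four variables could probably be set directly on its Deployment instead. The fragment below is only a sketch of a strategic merge patch (for use with `kubectl patch`). The container name `flux-helm-operator` is an assumption and must match the container name in your own Deployment; the placeholders are the same as above.

```yaml
# Sketch of a strategic merge patch for the helm operator Deployment.
# Assumption: the operator container is named "flux-helm-operator";
# check your Deployment and adjust the name if it differs.
spec:
  template:
    spec:
      containers:
      - name: flux-helm-operator
        env:
        # Same variables as in the chart values above; they point the
        # in-cluster client at the AKS API server FQDN.
        - name: KUBERNETES_PORT_443_TCP_ADDR
          value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
        - name: KUBERNETES_PORT
          value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
        - name: KUBERNETES_PORT_443_TCP
          value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
        - name: KUBERNETES_SERVICE_HOST
          value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
```

A strategic merge patch matches containers and env entries by `name`, so other settings on the container should be left untouched.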

brantb on 15 Oct 2018

Are there any fixes we can apply within helm-op itself?