When Tiller runs in AKS (Azure's hosted Kubernetes service), it sometimes gets into an inconsistent state. The cause is a combination of a known networking issue in AKS and client-go not handling intermittent network failures gracefully.
This causes the helm operator to occasionally reinstall a release (instead of upgrading it as it should), which fails because the release already exists. The helm operator then purges the "failed" release.
A workaround for this (until the AKS team can apply it globally) is to apply the environment variables mentioned in Azure/AKS#676 to the helm operator pod.
The helm chart has an extraEnvs value for the helm operator, so you can apply the workaround like so:

    helmOperator:
      extraEnvs:
        - name: KUBERNETES_PORT_443_TCP_ADDR
          value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
        - name: KUBERNETES_PORT
          value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
        - name: KUBERNETES_PORT_443_TCP
          value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
        - name: KUBERNETES_SERVICE_HOST
          value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
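All four values in the workaround derive from the cluster's API server FQDN, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the chart or the helm operator: the helper names `aks_env_overrides` and `render_values` are hypothetical, and the FQDN in the usage line is a placeholder you would replace with your own.

```python
API_PORT = 443  # fixed: the variable name KUBERNETES_PORT_443_TCP encodes this port


def aks_env_overrides(api_fqdn: str) -> list:
    """Build the four KUBERNETES_* overrides for a bare API server hostname.

    Hypothetical helper for illustration only.
    """
    # Reject URLs or host:port strings; the workaround expects a bare FQDN.
    if not api_fqdn or "://" in api_fqdn or ":" in api_fqdn:
        raise ValueError("expected a bare hostname such as <prefix>.hcp.<region>.azmk8s.io")
    tcp_url = f"tcp://{api_fqdn}:{API_PORT}"
    return [
        {"name": "KUBERNETES_PORT_443_TCP_ADDR", "value": api_fqdn},
        {"name": "KUBERNETES_PORT", "value": tcp_url},
        {"name": "KUBERNETES_PORT_443_TCP", "value": tcp_url},
        {"name": "KUBERNETES_SERVICE_HOST", "value": api_fqdn},
    ]


def render_values(api_fqdn: str) -> str:
    """Render the overrides as a helmOperator.extraEnvs values fragment."""
    lines = ["helmOperator:", "  extraEnvs:"]
    for env in aks_env_overrides(api_fqdn):
        lines.append(f"    - name: {env['name']}")
        lines.append(f"      value: {env['value']}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder FQDN; substitute your cluster's API server hostname.
    print(render_values("myprefix.hcp.westeurope.azmk8s.io"))
```

The printed fragment can be saved to a values file and passed to the chart alongside your other values. Generating it from a single input keeps the four entries from drifting apart.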
Are there any fixes we can apply within helm-op itself?
Resolved in #1530