I am deploying HelmRelease resources through Flux. Often Flux applies the manifest successfully, but the Helm Operator then fails because of bad configuration. How can I trace a bad deployment?
For example, say I update a chart version from 1.0 to 1.1.
Scenario 1:
Chart 1.1 does not exist. Flux applies the manifest successfully, but the Helm Operator reports an error saying the chart does not exist.
Scenario 2:
Chart 1.1 gets deployed, but it has a configuration issue. Chart 1.0 has now been removed, and chart 1.1 cannot be deployed because of the configuration issue, which increases downtime.
Is there a way to track these two scenarios without having to look through logs manually, while still ensuring 100 percent uptime?
Could there be an alert or some other mechanism so that I know exactly when a Helm release fails and can either roll back or fix it to avoid downtime?
.status.conditions in the HelmRelease will usually suggest what is going on with a particular release, especially if it's a problem fetching the chart.
Does that answer your question @usamaahmadkhan? If not, what would you like to see (i.e., what should be the next step here)?
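As a rough illustration of checking .status.conditions without reading the operator logs, here is a minimal Python sketch. It assumes the HelmRelease has been exported as JSON (for example with `kubectl get helmrelease <name> -o json`) and that conditions follow the usual Kubernetes shape of type, status, reason and message. The sample object, its names and its reason and message strings are hypothetical, and the condition types you actually see depend on the operator version.

```python
import json

# Hypothetical HelmRelease object, shaped like `kubectl get helmrelease <name> -o json`
# output. All names, reasons and messages below are made up for illustration.
SAMPLE = json.loads("""
{
  "metadata": {"name": "my-app", "namespace": "default"},
  "status": {
    "conditions": [
      {
        "type": "ChartFetched",
        "status": "False",
        "reason": "ExampleFetchFailure",
        "message": "chart version 1.1 not found"
      }
    ]
  }
}
""")


def failing_conditions(release):
    """Return the conditions whose status is the string "False"."""
    conditions = release.get("status", {}).get("conditions", [])
    return [c for c in conditions if c.get("status") == "False"]


def summarize(release):
    """Build one human-readable line per failing condition."""
    name = release.get("metadata", {}).get("name", "<unknown>")
    return [
        "{}: {} is False ({}): {}".format(
            name,
            c.get("type", "<no type>"),
            c.get("reason", "no reason given"),
            c.get("message", ""),
        )
        for c in failing_conditions(release)
    ]


if __name__ == "__main__":
    for line in summarize(SAMPLE):
        print(line)
```

A script like this could run on a schedule or in CI and feed whatever alerting is already in place. Note that it only treats a status of "False" as a failure, so a condition that signals trouble when it is "True" (a rollback marker, for instance) would need separate handling.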
@squaremo Even if it rolls back, there is no way for a developer to know that their latest change was not published, or why it failed, other than looking at the logs of the operator pod. I believe there should be a UI for this. I have created an issue for this
Flux v2, based on the GitOps Toolkit, has support for health assessment of deployments https://toolkit.fluxcd.io/components/kustomize/kustomization/#health-assessment