I deleted one of our releases with `helmfile -e cloud20 delete --args foo` and for some reason it also tried to delete the k8s-autoscaler. I cancelled out, but the k8s-autoscaler release got stuck in an uninstalling state.
When I run the sync I get `Error: UPGRADE FAILED: "k8s-autoscaler" has no deployed releases`.
I tried deleting the k8s-autoscaler release with `helmfile delete` and `helmfile destroy`. Now I'm in an odd situation where the release has been deleted from helm but not from helmfile :cry:
```
caleb@caleb-H110M-A ~/Documents/fifteen5-helmfiles $ helm list -A
NAME            NAMESPACE    REVISION  UPDATED                                  STATUS    CHART                    APP VERSION
fluent-bit      logging      1         2020-11-23 23:43:42.654687134 +0000 UTC  deployed  fluent-bit-X.X.X         X.X.X
metrics-server  kube-system  2         2021-03-30 20:53:06.578141385 -0700 PDT  deployed  metrics-server-X.X.X     X.X.X
namespaces      default      1         2020-11-23 23:43:37.427180437 +0000 UTC  deployed  namespace-manager-X.X.X  X.X.X
caleb@caleb-H110M-A ~/Documents/fifteen5-helmfiles $ helmfile -e cloud20 list
NAME            NAMESPACE    ENABLED  LABELS
k8s-autoscaler  kube-system  true     type:autoscaler
```
@caleb15 Hey! I think you can safely assume that the release has been successfully uninstalled. `helmfile list` shows whether the release is enabled, not whether it is installed or uninstalled.
I would double-check that there is nothing in that release's helm history. Sometimes I've seen a release with a bad status show up in history but not in `list -A`:
```
helm history -n kube-system k8s-autoscaler
```
It'll look something like this:
```
REVISION  UPDATED                   STATUS      CHART                 APP VERSION  DESCRIPTION
1         Sat Feb  6 08:47:13 2021  superseded  k8s-autoscaler-x.x.x  x.x.x        Install complete
2         Wed Feb 17 02:12:04 2021  superseded  k8s-autoscaler-x.x.x  x.x.x        Upgrade complete
3         Wed Apr  7 16:24:42 2021  superseded  k8s-autoscaler-x.x.x  x.x.x        Upgrade complete
4         Wed Apr  7 16:27:33 2021  superseded  k8s-autoscaler-x.x.x  x.x.x        Upgrade complete
5         Wed Apr  7 16:29:59 2021  superseded  k8s-autoscaler-x.x.x  x.x.x        Upgrade complete
6         Wed Apr  7 16:39:49 2021  superseded  k8s-autoscaler-x.x.x  x.x.x        Upgrade complete
7         Thu Apr  8 03:27:26 2021  deployed    k8s-autoscaler-x.x.x  x.x.x        Upgrade complete
```
If the latest revision (for example, the 7th) has status FAILED, you can roll back to the previous one (the 6th) if it was a good one:
```
helm rollback -n kube-system k8s-autoscaler 6
```
Then continue helmfile'ing from there.
@mumoshu the autoscaler release was not successfully uninstalled. The pods and various resources were still present.
> helmfile list shows that the release is enabled or not, not installed/uninstalled.
Ah, it took me a bit but that makes sense now. I was confused as to what the "state file" was, but now I realize it is `helmfile.yaml` and associated files (in our case, `releases/k8s-autoscaler.yaml` with `installed: true`), so that explains why it lists the autoscaler as enabled. So `helmfile list` shows me the state of the files, not necessarily the state of the cluster.
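For illustration, the state-file entry might look something like this. This is only a sketch of a typical helmfile release entry — the exact layout, chart version, and values in the real `releases/k8s-autoscaler.yaml` are not shown in this thread; the name, namespace, chart, and label come from the `helmfile list` and destroy output above:

```yaml
# releases/k8s-autoscaler.yaml -- hypothetical layout, not the actual file
releases:
  - name: k8s-autoscaler
    namespace: kube-system
    chart: autoscaler/cluster-autoscaler  # repo/chart, as shown in destroy's output
    installed: true                       # what `helmfile list` reports as ENABLED
    labels:
      type: autoscaler
```

With `installed: true` in the file, `helmfile list` reports the release as enabled regardless of whether it actually exists in the cluster.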
Furthermore I just found out that `helm list -A`, despite what the `-A` would imply, does not show everything. `-A` shows all namespaces, but only releases that are deployed or failed. To see releases in all statuses you need to use lowercase `-a`, like so:
```
caleb@caleb-H110M-A Documents/fifteen5-helmfiles (master *%) » helm -n kube-system list -a
NAME            NAMESPACE    REVISION  UPDATED                                  STATUS        CHART                     APP VERSION
k8s-autoscaler  kube-system  1         2021-02-06 00:14:42.527422803 +0000 UTC  uninstalling  cluster-autoscaler-9.4.0  1.18.1
metrics-server  kube-system  2         2021-03-30 20:53:06.578141385 -0700 PDT  deployed      metrics-server-5.3.3      0.4.1
```
So `helmfile list` and `helm list` are both operating as expected. That leaves the destroy issue and the `helmfile sync` issue. Let's start with `helmfile sync`:
`helmfile sync` runs `helm upgrade --install`, and the documentation for `--install` states "if a release by this name doesn't already exist, run an install". The release does exist, so it skips the install. However, the release is not deployed, so we get the `"k8s-autoscaler" has no deployed releases` error. The release is in this limbo uninstalling state where it's "there" enough for the install to be skipped and "not there" enough for the upgrade to fail too!
I feel like there might be something helm/helmfile could do better in this scenario, but I'm not sure what, or whether it would be worth doing :thinking:
Anyways, we need to destroy the release so `helmfile sync` works. I tried `helmfile destroy` but it didn't work - I think there might be a bug with it:
```
caleb@caleb-H110M-A Documents/fifteen5-helmfiles (master *%) » helmfile -e cloud20 --interactive destroy
Listing releases matching ^k8s-autoscaler$
Affected releases are:
  k8s-autoscaler (autoscaler/cluster-autoscaler)

Do you really want to delete?
  Helmfile will delete all your releases, as shown above.

[y/n]: y
caleb@caleb-H110M-A Documents/fifteen5-helmfiles (master *%) » helm -n kube-system list -a
NAME            NAMESPACE    REVISION  UPDATED                                  STATUS        CHART                     APP VERSION
k8s-autoscaler  kube-system  1         2021-02-06 00:14:42.527422803 +0000 UTC  uninstalling  cluster-autoscaler-9.4.0  1.18.1
metrics-server  kube-system  2         2021-03-30 20:53:06.578141385 -0700 PDT  deployed      metrics-server-5.3.3      0.4.1
```
As you can see, destroy did nothing. I wonder if it's due to the purge argument? The README says it specifies `--purge`, but that is not a valid flag:
```
caleb@caleb-H110M-A Documents/fifteen5-helmfiles (master *%) » helm -n kube-system uninstall --purge k8s-autoscaler
Error: unknown flag: --purge
```
I was able to successfully destroy it with `helm` directly:
```
caleb@caleb-H110M-A Documents/fifteen5-helmfiles (master *%) » helm -n kube-system uninstall k8s-autoscaler
release "k8s-autoscaler" uninstalled
caleb@caleb-H110M-A Documents/fifteen5-helmfiles (master *%) » helm -n kube-system list -a
NAME            NAMESPACE    REVISION  UPDATED                                  STATUS    CHART                 APP VERSION
metrics-server  kube-system  2         2021-03-30 20:53:06.578141385 -0700 PDT  deployed  metrics-server-5.3.3  0.4.1
```
Now the sync works! :tada:
@caleb15 Thanks for confirming!
> I tried helmfile destroy but it didn't work - I think there might be a bug with it:
This turns out to be because helmfile only deletes releases in the deployed, pending, and failed statuses - not uninstalling:
```
» helm -n kube-system list -a
NAME            NAMESPACE    REVISION  UPDATED                                  STATUS        CHART                     APP VERSION
k8s-autoscaler  kube-system  1         2021-02-06 00:14:42.527422803 +0000 UTC  uninstalling  cluster-autoscaler-9.4.0  1.18.1
```
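As a sketch, the pre-fix filter behaved like this. This is assumed from the description above, not taken from helmfile's actual Go source:

```shell
#!/bin/sh
# Sketch of the status filter helmfile's destroy applied before the fix
# (assumed simplification, not the actual implementation).
should_destroy() {
  case "$1" in
    deployed|pending|failed) echo "yes" ;;
    *) echo "no" ;;   # 'uninstalling' fell through here, so destroy skipped it
  esac
}

should_destroy "deployed"      # -> yes
should_destroy "uninstalling"  # -> no
```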
It was good to know that there is an uninstalling status. I'll try to enhance helmfile to also delete releases in that status :)
@caleb15 #1786 should fix the non-working destroy issue.
Thanks!