I'm tracking some Helm issues / PRs relating to this repo here. I've summarized some troubleshooting steps and information about Helm releases below.
Helm hooks used in this chart are k8s resources version-controlled by Helm but, in our case, simply created before upgrades. These are supposed to be deleted automatically, but thanks to bugs in Helm this isn't the case. To clean up this mess:
```shell
# The script will delete resources that were meant to be temporary.
# The bug that caused this is fixed in version 0.7b1 of the Helm chart.
NAMESPACE=<YOUR-NAMESPACE>
resource_types="daemonset,serviceaccount,clusterrole,clusterrolebinding,job"
for bad_resource in $(kubectl get $resource_types --namespace $NAMESPACE | grep '/pre-pull' | awk '{print $1}'); do
    kubectl delete $bad_resource --namespace $NAMESPACE --now
done
kubectl delete $resource_types --selector hub.jupyter.org/deletable=true --namespace $NAMESPACE --now
```
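To preview what the loop above would match before deleting anything, the same grep/awk filter can be tried against sample output. The resource names below are invented for illustration, not real cluster output:

```shell
# Made-up sample of what `kubectl get <types>` might print in such a cluster.
sample_output='NAME                         DESIRED   AGE
daemonset.apps/pre-puller    3         2d
job.batch/pre-pull-complete  1         2d
serviceaccount/hub           1         2d'

# Same filter as the cleanup loop: keep lines mentioning /pre-pull,
# then print the first column (the type/name that kubectl delete expects).
echo "$sample_output" | grep '/pre-pull' | awk '{print $1}'
```

Running this prints only the two pre-pull resources, which is exactly the set the loop would hand to `kubectl delete`.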
Helm has some bugs that can cause multiple revisions to be considered as deployed, and this can cause errors during upgrades such as "... not found ...". Helm keeps track of revision state in configmaps, so we can manually clean some of these up.
```shell
# get an overview of the releases
helm list
RELEASE_NAME=<YOUR-RELEASE>
# get an overview of the revisions
helm history $RELEASE_NAME
# check if you have multiple revisions in a DEPLOYED status (a bug)
kubectl get cm -n kube-system --selector "NAME=$RELEASE_NAME,STATUS in (DEPLOYED)"
kubectl delete cm -n kube-system <list all but the most recent DEPLOYED revision configmaps separated with spaces>
# optional cleanup of other revisions of this release
kubectl delete cm -n kube-system --selector "NAME=$RELEASE_NAME,STATUS in (FAILED,SUPERSEDED,DELETED)"
```
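To figure out "all but the most recent DEPLOYED revision", note that Helm 2 names its revision configmaps `<release>.v<revision>`, so you can sort on the revision number and drop the newest. A sketch with made-up configmap names:

```shell
# Made-up example: suppose three revision configmaps of a release named
# "jhub" are stuck in DEPLOYED status in kube-system.
deployed_cms='jhub.v3
jhub.v7
jhub.v5'

# Sort numerically on the revision number (the field after the "v") and
# drop the newest; what remains is the list to pass to `kubectl delete cm`.
echo "$deployed_cms" | sort -t v -k 2 -n | sed '$d'
```

Here that keeps `jhub.v3` and `jhub.v5` for deletion while preserving the most recent revision, `jhub.v7`.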
--force flag in helm upgrade

When a change to a k8s resource is made that kubectl is not allowed to patch, Helm must delete the resource and then add it anew. That is when we need the --force flag. When upgrading between 0.6 and 0.7 of the Helm chart, we will need the --force flag.
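As a sketch, such an upgrade command could look like the following. The release name, namespace, and values file are placeholders, not taken from this thread:

```shell
# Assemble an upgrade command with --force so Tiller may delete and
# recreate resources it cannot patch in place. All names are placeholders.
RELEASE=jhub
NAMESPACE=jhub
cmd="helm upgrade $RELEASE jupyterhub/jupyterhub --version 0.7.0 --namespace $NAMESPACE --values config.yaml --force"
echo "$cmd"
```

Note that --force is destructive for resources that get recreated, so it should only be used when the upgrade actually requires it.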
The annotation helm.sh/hook-delete-policy=before-hook-creation specifies that Tiller should delete the previous hook before the new hook is launched. It avoids "... already exist ..." errors and is something we should recommend for stability.
helm#4384: support for k8s 1.11 not established. (Resolved in 2.11.)
The hook-image-puller daemonset isn't deleted on an upgrade failure. With #758 this will be circumvented by using the before-hook-creation hook delete policy, so this is no longer important.
helm#3811 fixes a bug which allows us to relax the yamllint config.
helm#3837 resolves a yaml detail. With it, we no longer need to pipe through | trimSuffix '\n' when using the toYaml helper in order to please the yaml linter. _Update_: the PR was reverted; it can hopefully be introduced again in Helm 3.0.0.
helm#3811 resolves another yaml-linting detail for (#625)
The | trimSuffix '\n' introduced in #625 can perhaps be removed with Helm 3.0.0 and forward (#3837, #3888).

I am running Helm/Tiller v2.8.1. I've got a number of pods that I cannot get rid of. I delete them and they return. I am not too clear on image pre-puller states. Will I need to delete the current deployment and redeploy for these pods to be removed?

@jgerardsimcock Helm 2.8.2 introduced a bugfix that finally makes the automatic deletion of resources created by Helm hooks (temporary resources created before a helm upgrade) work as intended. To solve the issue long term, we need Helm 2.8.2!
The pods will reappear because they are controlled by a DaemonSet, whose purpose is to make sure there is a pod on every node. Delete the DaemonSet itself and it will remove its pods.
So in summary, I recommend the following:
kubectl delete ds,sa,clusterrole,clusterrolebinding,job --selector hub.jupyter.org/deletable=true

/cc @tracek
I misremembered: the reason for wanting Helm 2.8.2 that I was thinking of was actually kubernetes/helm#3539. It will save us from "... already exist ..." errors that can come from having had some failed helm upgrades in the past.
The issue that I was thinking should have been resolved in 2.8.2, which caused objects not to be removed, is still unmerged in kubernetes/helm#3540. So for now we will need to manually clean things up with step 2 above when a helm upgrade fails during the hook phase; it won't affect us in general though.
@yuvipanda, with the before-hook-creation delete policy, we could have one single image puller daemonset that would be able to do the work of both the continuous-image-puller as well as the hook-image-puller at the same time, I think... I would need to consider whether it works even when switching the hook/continuous enabled flags in all possible manners. It might end up lowering and increasing the complexity at the same time; I'm not confident which option would end up the most robust.
hook|continuous|Single DS setup idea...
-|-|-
enabled|enabled|Use the before-hook-creation option for hook-delete-policy
enabled|disabled|Use the current hook-image-puller DS annotations
disabled|enabled|Use the current continuous-image-puller DS annotations (none)
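As a sketch of the first row's setup, the delete policy is an annotation in the hook resource's metadata. The manifest below is illustrative (names and hook values are made up, not copied from the chart's actual templates):

```shell
# Write an illustrative hook manifest; all names here are invented examples.
cat > /tmp/image-puller-hook.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-puller
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
EOF

# Confirm the delete policy annotation that lets a new hook replace the old one.
grep 'hook-delete-policy' /tmp/image-puller-hook.yaml
```

With before-hook-creation, Tiller removes the previous hook resource just before creating the new one, which is what would let a single DS survive flag flips between the hook and continuous modes.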
I am having the same problem, with hook pods hanging around that I cannot delete. Solution 2 returned a "No resources found" for me. Is there anything new on this front regarding the removal of hook-image-puller pods that will not delete? I am on Helm 2.8.2.
@jgerardsimcock, did you ever figure out a solution to this problem?
@tjcrone there are so many bugs associated with this, as far as I recall, that I've lost track, but the summary to avoid trouble is:
helm delete <asdf> --purge and start fresh, but some resources created by helm hooks won't be cleaned up. You may be able to find and delete them all with kubectl delete ds,sa,clusterrole,clusterrolebinding,job --selector hub.jupyter.org/deletable=true (also add your namespace as a flag: --namespace asdf).

Thanks @consideRatio! I added the namespace to the delete command and it worked. (I should have known to add this, duh.) I have also disabled the hook prepuller in my config file. Problem solved!
@consideRatio haha - I'm now maintaining the cluster that @jgerardsimcock put together for our research team and found this thread while trying to upgrade a different deployment! Can confirm that this was super helpful :) Thank you all so much for the incredible work!
Closing this as outdated since we now recommend using Helm 3, but I'll mention that a general troubleshooting section could be nice; it's on a todo list I'm building up.