Describe the bug
After removing the prometheus-operator Helm chart from a Kubernetes cluster, the kubelet scraper Service objects still exist. This leads to duplicate/triplicate scraped values on subsequent operator installations and causes a lot of confusion.
Version of Helm and Kubernetes:
$ helm version
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
$ helm ls prometheus-operator
NAME                 REVISION  UPDATED                   STATUS    CHART                       APP VERSION  NAMESPACE
prometheus-operator  2         Fri Oct 18 09:24:07 2019  DEPLOYED  prometheus-operator-6.18.0  0.32.0       monitoring
$ kubectl get svc -n kube-system -l k8s-app=kubelet
NAME                                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
halting-stoat-prometheus-o-kubelet   ClusterIP   None         <none>        10250/TCP   48d
monitoring-prometheus-oper-kubelet   ClusterIP   None         <none>        10250/TCP   48d
prometheus-operator-kubelet          ClusterIP   None         <none>        10250/TCP   48d
$ helm install stable/prometheus-operator --name test-install
$ helm delete test-install --purge
$ kubectl get service -n kube-system -l k8s-app=kubelet
NAME                                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
test-install-prometheus-op-kubelet   ClusterIP   None         <none>        10250/TCP   8m15s
Anything else we need to know:
Already reported before: https://github.com/helm/charts/issues/14595
and mentioned here: https://github.com/helm/charts/issues/13669
This service is created dynamically by coreos/prometheus-operator and not managed by the helm chart. Unfortunately there is nothing that can be done to clean this up from the helm perspective.
OK, I see: this service is not managed by the Helm chart like the others are, but there should be some way to address it.
Can the creation procedure be improved to check if there's already an existing service?
Can there be Helm 3 improvements that would help to deal with this better?
If there turns out to be no way to handle this, can you please adjust the 'Removal' procedure in the README and add instructions for removing it manually?
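For reference, manual removal would look something like this (a sketch; "halting-stoat" is an example leftover release name taken from the listing above, substitute your own):

```shell
# Sketch: delete the kubelet Service left behind by a removed release.
# RELEASE is an example leftover release name; substitute your own.
RELEASE=halting-stoat
cmd="kubectl -n kube-system delete service ${RELEASE}-prometheus-o-kubelet"
echo "$cmd"   # printed as a dry run; run the command itself to actually delete
```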
Will see what can be done!
I've experimented with a cleanup process, but the more I look at this, the more it appears that Helm 2 does not allow a post-delete cleanup of the kind necessary to do this properly. I'm afraid documentation is the only option in this case.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
I have solved it.
Example:
kubectl get svc -n kube-system -l k8s-app=kubelet
NAME                                 TYPE        CLUSTER-IP
halting-stoat-prometheus-o-kubelet   ClusterIP   None
monitoring-prometheus-oper-kubelet   ClusterIP   None
prometheus-operator-kubelet          ClusterIP   None
Keep the service matching your most recent installation and delete the others.
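That "keep one, delete the rest" step can be sketched like this (the service names come from the listing above; that "monitoring-prometheus-oper-kubelet" belongs to the current release is an assumption, adjust for your cluster):

```shell
# Sketch: keep the kubelet Service of the current release and print delete
# commands for the leftovers. CURRENT is an assumption; set it to the service
# created by your most recent installation.
CURRENT=monitoring-prometheus-oper-kubelet
leftovers=""
for svc in halting-stoat-prometheus-o-kubelet monitoring-prometheus-oper-kubelet prometheus-operator-kubelet; do
  [ "$svc" = "$CURRENT" ] || leftovers="$leftovers $svc"
done
for svc in $leftovers; do
  echo "kubectl -n kube-system delete svc $svc"   # dry run; drop the echo to delete
done
```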
@vsliouniaev Documentation will be really helpful on this. We burnt several hours trying to figure out a double-counting issue caused by a kubelet service left over from a previously enabled Prometheus deployment.
@sameerbhadouria I tried deleting ALL the kubelet services and endpoints (hoping that prometheus-operator would recreate them), but that doesn't happen, and now I don't have them in the cluster at all.
The operator's input parameters are:
Args:
  --kubelet-service=kube-system/kubelet
  --logtostderr=true
  --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
  --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0
  --log-level=all
Do you have any tips about that?
@pie-r What version of the chart are you using?
I didn't face any issues recreating the services. The creation of the kubelet service is enabled by default; make sure you have it enabled. Also, make sure you are looking in the right namespace (kube-system) for the service. Some dashboards (e.g. GKE) have a filter to hide system objects (Is system object: False), and you won't see kube-system services while that filter is enabled.
Next I would try to upgrade the release version with helm upgrade. Worst case, if you are still testing this setup, maybe just delete the Helm release. Make sure to also delete the CRDs:
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
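The six deletions above can also be generated in a loop (sketch; printed as a dry run so nothing is deleted by accident):

```shell
# The CRD deletions above, generated in a loop. Each command is printed as a
# dry run; pipe the output to sh to actually delete the CRDs.
crd_cmds=$(for kind in prometheuses prometheusrules servicemonitors podmonitors alertmanagers thanosrulers; do
  echo "kubectl delete crd ${kind}.monitoring.coreos.com"
done)
echo "$crd_cmds"
```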
Hope this helps!
Thank you for your help and fast reply!
I'm not using Helm, but I came here because my issue is very similar. I tried a lot of things:
kubectl -n kube-system edit endpoints kubelet
kubectl -n kube-system delete svc kubelet
kubectl -n kube-system delete endpoints kubelet
I also tried both forms of the --kubelet-service argument in the Args:
  --kubelet-service=kube-system::kubelet
  --kubelet-service=kube-system/kubelet
Nothing triggers the recreation of the service and endpoints.
I also read the source code, and it seems quite simple: every 3 minutes it tries to createOrUpdateEndpoints and also the Service. But this only happens if the --kubelet-service flag is set in the Args. It broke after upgrading from GKE 1.15 to 1.16.
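One thing worth double-checking, since both kube-system::kubelet and kube-system/kubelet appear above: the operator expects the --kubelet-service value in <namespace>/<name> form. A quick sanity check (a sketch; check_kubelet_service is a hypothetical helper, not part of prometheus-operator):

```shell
# Sketch: verify a --kubelet-service value uses the "<namespace>/<name>"
# form before passing it to the operator. check_kubelet_service is a
# hypothetical helper, not part of prometheus-operator.
check_kubelet_service() {
  case "$1" in
    */*) return 0 ;;                                          # namespace/name: OK
    *)   echo "expected <namespace>/<name>, got: $1" >&2; return 1 ;;
  esac
}
check_kubelet_service "kube-system/kubelet" && echo "kube-system/kubelet: ok"
check_kubelet_service "kube-system::kubelet" 2>/dev/null || echo "kube-system::kubelet: rejected"
```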
/reopen
Same problem for me; could you please /reopen?