Velero: Should prometheus operator CRDs be part of Velero helm chart?

Created on 26 Jan 2020 · 17Comments · Source: vmware-tanzu/velero

Velero apparently publishes some metrics which can be scraped by Prometheus. Where can I find documentation about how to configure Velero or/and Prometheus to be able to view these metrics in Prometheus and to have them displayed in a Grafana dashboard like this one for exampe ?
I did not find any explanation about this (and saw that the problem has already been reported in #1500 but no clear answer has been given)
Thank you

AreDocumentation Metrics

Source

yogeek

Most helpful comment

I would like to re-open since this is a common practice to optionally declare a ServiceMonitor, PodMonitor or PrometheusRule within a Helm Chart, and it does NOT interfere with anything if well written.

The idea of having a Helm Chart is to manage everything needed to run a Velero, one having to manually add other manifests defeats the whole purpose. Although of course it can't support all use cases, Prometheus-operator is widely used in Helm Charts and seems to be the de-factor way to monitor (I think more than 90% of the Charts out there allow to declare it).

So the question is: is there a problem to keep ServiceMonitor/PodMonitor/PrometheusRule in the Chart if this is optional, even if not done from the CLI?

desaintmartin on 5 Aug 2020

👍4

All 17 comments

@yogeek Here are some troubleshooting steps for you to follow as a mitigation to this lack of documentation.

Confirm, from your helm cart values, that Velero's prometheus metrics publishing is enabled. https://github.com/vmware-tanzu/helm-charts/blob/master/charts/velero/values.yaml#L44
And the Velero server pods have the necessary annotations https://www.weave.works/docs/cloud/latest/tasks/monitor/configuration-k8s/#per-pod-prometheus-annotations
Confirm, from prometheus UI, that the velero pod is one of the targets that's being scraped.

ashish-amarnath on 27 Jan 2020

👍2

/assign
/lifecycle active

ashish-amarnath on 27 Jan 2020

@ashish-amarnath Sorry about the delay, but I am not using the Helm chart to install Velero but the CLI : how can I achieve to create the ServiceMonitor with the CLI please ?
I am using Prometheus Operator and apparently, serviceMonitor is disabled by default : how configure this with velero install command ?

Maybe the troubleshooting documentation should be updated to cover the Prometheus Operator case ? (Confirm that your velero deployment has metrics publishing enabled is not sufficient in this case, the ServiceMonitor must also be created right ?)

yogeek on 27 Apr 2020

👍1

@yogeek I don't see us setting up the prometheus operator service monitor in our CLI install and I am surprised that we do that in the helm chart. If you are using prometheus operator and would like to create a service monitor object, you can use the template from https://github.com/vmware-tanzu/helm-charts/blob/4bd96bcce551ea4690d9fd9924814e54fbbc6146/charts/velero/templates/servicemonitor.yaml to create one for yourself. We will look into how we can support this scenario in the CLI install.

Summary:
There is a mismatch between what users get when they install velero using the CLI and the helm chart. Specifically, the helm charts create ServiceMonitor Prometheus operator CRDs that the CLI install doesn't.

Re-opening this issue to discuss how to bridge this gap.
Should prometheus operator CRDs, specifically ServiceMonitor, be part of our helm chart?
Or
Should the velero CLI install also create these CRDs.
cc @carlisia @nrb @skriss

ashish-amarnath on 27 Apr 2020

👍1

@ashish-amarnath it doesn't seem to me like we'd want to include those in our chart; ideally we'd just point to the relevant Prometheus chart.

skriss on 4 May 2020

I agree we would just point to the prometheus chart. The reason I asked was because it's already included it in our helm chart. ServiceMonitor is a prometheus operator CRD.

ashish-amarnath on 4 May 2020

Yeah, no idea when it got in there. I think it would make sense to remove it and just have a documentation section pointing to the relevant Prometheus references. Similarly for the CLI - we shouldn't install it, but we can have a docs section pointing to the Prometheus refs.

skriss on 4 May 2020

I can take it up to remove it.
cc @carlisia

ashish-amarnath on 4 May 2020

👀1

@ashish-amarnath should we close this issue out and open one in the helm-charts repo to track this work?

skriss on 19 May 2020

Yes, we should do that
Closing this one.

ashish-amarnath on 19 May 2020

So the question is: is there a problem to keep ServiceMonitor/PodMonitor/PrometheusRule in the Chart if this is optional, even if not done from the CLI?

desaintmartin on 5 Aug 2020

👍4

@desaintmartin
Thank you for asking, but the ServiceMonitor/PodMonitor/PrometheusRule is not part of this helm chart because:

Keeping ServiceMonitor/PodMonitor/PrometheusRule in this helm chart will require this repo keeping up with the Prometheus monitoring stack when newer versions of monitoring are out.
Maintainers of this helm chart are not fully familiar with the Prometheus monitoring stack and will be unable support this feature. We don't want to provide services that we can't support.
This increases the support responsibility for the maintainers of this helm chart.
The ServiceMonitor/PodMonitor/PrometheusRule is not a part of Velero functionality.
For our users, not using Prometheus, we may end up creating CRDs that should not exist in their clusters.

Ideally, the Prometheus helm chart should have templating that supports adding multiple services that it can monitor.

ashish-amarnath on 5 Aug 2020

Thanks, I understand the reasons, however 1/2. those resources have not changed for years now and are very stable and 5. since this is an optional feature, it does not really apply

Note that the CRDs themselves were not created in previous versions of the chart, only objects of those CRDs, which is really different. At worst, an error stating that the resource type does not exists was thrown in case user enabled the ServiceMonitor without a proper Prometheus Operator installation.

At the very least, a major version of the chart should have been increased to mark the deletion of the feature, being a breaking change (i've upgraded to latest version, and it silently broke my alerting).

desaintmartin on 5 Aug 2020

👍2

@desaintmartin I agree with the versioning point you make. I am sorry about breaking your alerting.

ashish-amarnath on 5 Aug 2020

@desaintmartin I think you raise some good points here. At the very least, I think we missed on messaging here, which silently broke monitoring, and for that, I apologize. We should have incremented the major version and broadcast the change.

To your point about having the Helm Chart be an all-in-one destination for Velero management, I can see it. I'm going to give this some more thought.

nrb on 5 Aug 2020

@nrb Hi. Your last statement was in August and I also belonging to the guys whos monitoring was broken since I updated to this version of velero. It is not good at all to just put this there without any documentation or release notes when it is a breaking change. I agree with @desaintmartin that these are only objects of CRD and not the CRD itself which will be created by Prometheus. And if there are really changes about the CRD in Prometheus (which I cannot really believe) then the community will let you know that something isn't working anymore and will also try to find and help.
Many maintainers of helm charts have metrics included and the creation of CRD objects because this makes life much easier than configure it for any helm chart in prometheus.