Prometheus-operator: rework helm-charts

Created on 29 Aug 2017 · 44Comments · Source: prometheus-operator/prometheus-operator

I just had a chat with @mgoodness regarding the state and future of the kube-prometheus helm charts, and we wanted to share what our vision is regarding them and we invite anyone who wants to help on this to participate.

We imagine the general structure of the charts to be like this:

The kube-prometheus chart (this is mostly a "meta" chart that includes multiple other charts and adds the Prometheus alerting/recording rules, grafana dashboards as well as service-monitors.

It's dependencies are:

Prometheus chart (with opinionated configuration tailored for cluster monitoring, and to work with the alertmanager instance shipped)
Alertmanager chart
Prometheus Operator chart (because it's a dependency of the prometheus and alertmanager chart)
grafana chart (without dashboards, but preconfigured with the k8s prometheus as datasource) (this one is already actively maintained by @weiwei04)
kube-state-metrics chart
node-exporter chart

Something that we had previously done in the kube-prometheus charts is also consider the choice of how the web UIs of the above components are exposed, we'd like to move away from this and leave this completely up to the user.

If anyone is willing to work on any of these charts, contributions are highly appreciated and very welcome. The individual charts that are a dependency of the kube-prometheus chart should be relatively low maintenance as the upstream projects are not released that often.

helm stale

Source

brancz

👍10

Most helpful comment

+1 for have _some_ sort of documentation on an upgrade path/migration steps.

I'm in the middle of migrating right now, and other than this GH issue, there's not much documentation on what the current state is. So much has changed (alert rules, Grafana dashboards, removal of kube-prometheus, etc.), that the new chart location is essentially a new Helm chart entirely, and I've had to essentially drop and recreate our entire monitoring stack :(

Don't get me wrong, I think the new chart is structured better than the old one, but migration isn't a simple helm upgrade, or changing a few Helm values and then upgrading.

chrisob on 22 Nov 2018

👍8

All 44 comments

This will be a really good way forward.
The place I am working at for the moment is a heavy kubernetes, prometheus and helm user. I am willing to help with development of the HELM chart.
I think some of my colleagues are also willing to help with pushing the helm-charts forward and helping out the community.

Kind regards,
Mattias

MattiasGees on 8 Sep 2017

❤8

That sounds fantastic! Let me know if I help you in any way to get started :slightly_smiling_face: .

brancz on 8 Sep 2017

And @MattiasGees, don't be shy to say we are working for Skyscrapers. 😄

ringods on 8 Sep 2017

👍1

@brancz At my $DAYJOB, we're currently using the kube-prometheus manifests to deploy our prometheuses, alerts and dashboards. It's been great but we're looking into rewriting the manifests to helm charts just like you've described. We'd like to contribute to this too if you could point us to a place to get started.

Our background:

using Kubernetes 1.6 in prod for a couple of months
using prometheus, alert manager and grafana in multiple environments

Let me know :)

Thanks!

kevinjqiu on 8 Sep 2017

❤2

similar use case to @kevinjqiu . would definitely contribute back once our infra is set up.

ptagr on 13 Sep 2017

@punitag We have extended prometheus-operator with a new CRD AlertRule which is a simple object wrapper a prometheus rule. The operator syncs a ConfigMap for the AlertRule which gets picked up by the prometheus instance.

see https://github.com/coreos/prometheus-operator/issues/616

I'm not sure if this fits the vision of the project, but I think it could be a useful addition.

kevinjqiu on 13 Sep 2017

Hi @brancz : I can also collaborate with the helm charts for all the components, trying to make them as much flexible and configurable as possible (via values.yaml overwriting).

I think one of the most powerful features of helm is to create different manifests based on the input parameters, for example adding / removing containers (sidecars) based on the configuration, setting up resources (memory, specs, services, etc) based on configuration. I will share some examples I have created soon (i.e kafka with jmx_exporter and filebeat sidecars as options, supporting 1.6 and 1.7 statefulsets possibilities, etc).

I would say the objective should be to be able to install kube-prometheus via a helm charts with the default configuration, but also being able to prepare your own env-setup.yaml with specific configuration to modify the behavior / outcome of the final manifests, and all the possibilities we provide should be documented.

For example, we could have one setting in the yaml like .Values.grafana.service.enableLoadBalancer . If that setting is set to true, helm would create during install or upgrade a service with type LoadBalancer for external access. If it's set to false helm won't create the service at all.
That's just an example, to see if we are aligned.

I would say that just getting exactly the same from helm install or kube-prometheys hack/scripts should be only the first step, because we should aim to support as much configurable settings as we could.

eedugon on 14 Sep 2017

another example / idea.... :)
Prometheus persistent storage should be a simple option in the yaml for helm, so any user could enable persistence by just enabling something and setting the size of the persistent volume claim to create.
That's very easy to achieve too and I have it already written.

eedugon on 14 Sep 2017

I think any mean of accessing/exposing web UIs for example should be completely left to the user consuming the chart, so any Service, Ingress or whatever is not something that belongs in kube-prometheus or even less in the prometheus chart. Otherwise all means of accessing will have to be re-implemented in the charts, that feels like it defeats the purpose.

As I mentioned before I think we first need to make sure the "leaf" packages are robust. For example the kube-state-metrics helm chart needs options to disable certain collectors, which in turn reduces the required RBAC roles, and include the addon-resizer. As a kube-state-metrics helm chart already exist in the upstream stable charts, this would even make sense to contribute back there - the less charts have to be maintained here the better! Essentially the kube-state-metrics chart should only have the Deployment, RBAC roles and a ClusterIP type Service.

Alternatively and a bit more simple would be to create a chart for the node-exporter that really _only_ contains the node-exporter and a ClusterIP type Service. Same as the kube-state-metrics chart, this is probably even a candidate for the stable upstream charts.

Just to make sure I want to reiterate, that everyone is very welcome to contribute anything discussed here :slightly_smiling_face: . Thanks everyone for wanting to contribute, I'm hopeful we'll get all the charts into a good shape!

brancz on 14 Sep 2017

Related to this issue and https://github.com/coreos/prometheus-operator/issues/520, I wanted to mention that being able to add dashboard with helm install and upgrade would be very useful. Not being able to do that is what led me to looking at loading dashboards automatically through the watcher or operator. But really since they don't change except when a chart is installed or upgraded there is no reason for it, except for the fact that there is currently no way for installing a chart to update the grafana dashboards of an existing grafana instance.

This is likely restricted by the abilities of helm charts at this time, but something I wanted to mention anyway :)

tsloughter on 4 Oct 2017

👍1

I'm going to work on the reorganisation of the helm charts, the main task are:

[x] Migrate prometheus to v2.0
> the less charts have to be maintained here the better!
[x] Point dependencies like kube-state-metrics to kubernetes/charts repo, also figure out if there's more dependencies that can be pointed to the k8s chart repo.
[x] Keep kube-prometheus/{rules, grafana, alert} in sync with kube-prometheus chart
[x] Move kube-state-metrics to stable
[x] e2e test to the helm charts
[x] Write documentation for helm/kube-prometheus
[x] Create helm chart to metrics-server and add as dependency to kube-prometheus

Did I miss something?

gianrubio on 22 Nov 2017

🎉3

The upstream kube-state-metrics chart needs some more changes, but there is already a stable one. I outlined some things that should be done to it here: https://github.com/kubernetes/charts/pull/2124

brancz on 23 Nov 2017

Charts finally got synced with s3, everyone can enjoy prometheus 2.0 using helm :)

$ helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/ …
$ helm install coreos/prometheus-operator --name prometheus-operator
$ helm install coreos/kube-prometheus --name kube-prometheus --set rbacEnable=true

gianrubio on 3 Jan 2018

🎉6

Hi,
don't know if currently there is any on going work on this, but I think it is a great idea. I have started to work in the development of the standalone alertmanager chart, and I have come to a point where I am not sure how to go on. Here are my questions/decisions to the moment:

Offer two installing modes, configurable through 'deploymentMode' parameter on values.yaml:
- Install as a statefulSet: For the case when you don't want to use PrometheusOperator
- Install through PrometheusOperator: As the current helm chart in this repo does. This opens a few questions for me:
  - Should this option only be used by umbrella charts (like kube-prometheus) or used on it's own?
  - If the latter, how should we manage the 'inherited' parameters like prometheus label on ServiceMonitor or Prometheus alertmanager rules configmap?

I already have a initial version and I intend to do a PR to incubator repo of official charts repo after cleaning up a bit. Will reference this issue there to maintain tracking.

JMLizano on 19 Mar 2018

1) If there is a case that you can't handle with the prometheus operator then please open an issue so we can discuss how to integrate your use case in a meaningful way into the operator
2) The helm charts here should only concern those deployments that use the prometheus operator. The helm charts are as is a giant maintenance burden (well handled thanks to @gianrubio ), we need to scope the effort a bit.

More generally the charts developed in this repo may move out of this repo eventually, but as long as they are as fast moving as today (we pretty much daily bump versions) I think they would drown in the upstream charts repo.

brancz on 19 Mar 2018

@gianrubio

Could you add one more task in your list?

Use stable/grafana more than own grafana chart? So we can easily get updates with new features, deal with secrets, etc. Because stable/grafana is mature, and get better supports.

ozbillwang on 18 Jul 2018

👍2

A couple of questions related to this activity:

Now that kube-prometheus/{rules,alerts,dashboards} have been migrated to mixins, what is the plan for keeping the corresponding parts of the emerging kube-prometheus Helm chart in sync? Maintain the master as mixins and generate what's needed for Helm?
Would it be correct to assume that jsonnet-based deployment of kube-prometheus is preferred over Helm-based? Or is the plan to support both on equal footing? Any expected differences in out-of-the box capabilities worth calling out?
If Grafana with ConfigMap -collecting sidecar (as in stable/grafana) is needed, would customizing jsonnet-based Grafana deployment in kube-prometheus with sidecar capabilities make sense? Or would it be better to use stable/grafana directly?

Thanks!

tallaxes on 17 Aug 2018

❤1

1) I think this would generally be possible, but a large an complicated undertaking, and my opinion is that this would just attempt to alleviate the shortcomings of helm and ignore that jsonnet is just a more sustainable approach (in helm every single point of customization needs to be explicitly specified, this is simply incompatible with Kubernetes where there are dozens of ways of even managing a single Service object, not to mention an entire complex monitoring stack :slightly_smiling_face: ).

2) Not commenting on the quality of the helm charts as we don't use them or maintain them, but fact is the helm charts are a downstream of the jsonnet based kube-prometheus and therefore easily get out of date. We use the jsonnet based kube-prometheus on hundreds of clusters and it goes through QA, so I'm rather confident in that code base. Note that we do pin against specific versions and merge additional things into the "upstream" kube-prometheus jsonnet objects to make it suit OpenShift (see here). Once the helm charts are extracted from this repository we do plan on extracting kube-prometheus out of this repository and into a separate repository and start doing versioned releases so people can better rely on proper versioning, rather than pinning against individual commits.

3) The reload sidecar was actually an artifact from Grafana 4.x, when Grafana did not support provisioning from files, but we collaborated with them and they brought the provisioning feature to Grafana 5.0, so the sidecar is not needed anymore.

Sorry for the wall of text, I hope that answers your questions :slightly_smiling_face:.

tl;dr yes I recommend the jsonnet based kube-prometheus

brancz on 17 Aug 2018

Thanks, that makes sense!

For 3, what I had in mind is the ability to add new dashboards by deploying new ConfigMap objects (similar to what's described in https://github.com/coreos/prometheus-operator/issues/520). Did grafana-watcher support that? The sidecar that stable/grafana Helm chart is using (kiwigrid/k8s-sidecar) appears to handle this (have not tried though ...)

tallaxes on 17 Aug 2018

It's probably best to take this into another issue or just have a completely separate discussion, but the kube-prometheus jsonnet documentation has documentation on how to add additional dashboards. Grafana does automatically load and reload new and changed dashboards as they appear on disk. If you have further questions, let's talk on slack or open a new issue, just to keep this issue on topic :slightly_smiling_face: .

brancz on 17 Aug 2018

my USD0.02, non-technical whining and humble imo_s :)
As I have stated in different issue I've just started playing around prometheus-operator and building my monitoring stack. First of all in labs I'm usually trying to replicate technology manually without ready packages. To get better knowledge about internal moving parts - so in case of this project manual means jsonnet.

I was trying to install compiled kube-prometheus for two days. Nothing worked. At the same time helm install worked just fine from the first try. It was Friday. I've decided to wait for Monday, repeat all the steps, and fill in issues. Well on Monday after dozen commits to the repo - jsonnet setup worked just fine.

Now i've to decide which one to use: jsonnet or helm.
At this point of time helm wins in my case. Why, because it looks little bit more stable for me.
but most important point for me is that jsonnet is too cryptic for me, and it doesn't look like end-userfriendly packaging technique for kubernetes. Working with dozens of technologies on daily basis, at some point I just don't want to learn another one, especially when it seems least important and just want something that works out of the box with less efforts. I'm not working at professional monitoring or kubernetes service provider. I'm usual consumer.

I absolutely agree with saying that helm confronts Kubernetes dynamic nature with time-to-time too much explicit hardcoding/declarations. But still, with couple additional input values helm helped me to have your great prometheus-operator (kube-prometheus) almost instantly.

Also documentation is not 100% good, it's like 95% good :) , especially readme in kube-prometheus. I would like to see more ready examples on how to compile jsonnet with custom configs, for example include "ingress" and stuff like that.

Of course I will try to contribute by PRs or by opening issues :)

Thanks for all your efforts and great product.

Sorry for the bigger wall of text :)

den-is on 17 Aug 2018

Our team also used the /helm directory to install kube-prometheus.
We saw the warnings about it being moved, but were not aware that the features were not up to date.
We use helm for all of our cluster config, so moving to jsonnet for managing the monitoring pipeline would be a divergence.

We're now running into how to manage dashboards.
This comment documents how to use the old grafana-watcher mechanism: https://github.com/coreos/prometheus-operator/issues/1251#issuecomment-414197176

However, this is now deprecated.

The current static kube-prometheus manifests don't use grafana-watcher anymore:
https://github.com/coreos/prometheus-operator/blob/05e6bb5/contrib/kube-prometheus/manifests/grafana-deployment.yaml

The dashboards are also no longer wrapped in grafana-watcher since Grafana 5 now supports file-based provisioning:
http://docs.grafana.org/administration/provisioning/

ref: https://github.com/coreos/prometheus-operator/blob/05e6bb5/contrib/kube-prometheus/manifests/grafana-dashboardDefinitions.yaml

@brancz @weiwei04 @vglafirov
It looks like we need to patch the helm charts to have a new strategy for loading dashboards. (https://github.com/helm/charts/pull/6765#issuecomment-409603179)
I can help out.

One thing is that it should still be possible to manage additional dashboards via a separate chart.
An umbrella chart could do this, but since every dashboard now has it's own ConfigMap, it makes it tricky to configure the Grafana Deployment's VolumeMounts and keep them in sync with all of the new dashboards.

This could also be done with a Job hook that mounts the configs and talks to the Grafana API, or a reconfiguration side-car / operator that manages the volume mounts based on a ConfigMap label selector... it's a little complex though.

stealthybox on 20 Aug 2018

(in helm every single point of customization needs to be explicitly specified, this is simply incompatible with Kubernetes where there are dozens of ways of even managing a single Service object

Helm 3's Lua extensions should help address this.

I will say though, that as a user, I hope that charts expose the common options for Service/Ingress config / labels / annotation overrides.
When charts don't have this stuff, I fork and contribute the patches -- this is kind of just the current state of Helm 2, but the userbase on the upstream charts/ repo keeps things pretty high quality.

stealthybox on 20 Aug 2018

@den-is we would love to hear what kind of improvements you would like to see for the jsonnet documentation! There is also already some stuff in the works for this, see #1732. And there is documentation for ingress here: https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/docs/exposing-prometheus-alertmanager-grafana-ingress.md. That said I do agree we need more examples of full setups. Any pull requests to add more documentation would be highly appreciated :slightly_smiling_face: .

@stealthybox The primary maintainer of the charts is @gianrubio, he is also working on upstreaming the charts, it's probably best to talk to him about improvements for the Grafana chart and how we could enable adding additional dashboards.

brancz on 20 Aug 2018

👍2

https://github.com/helm/charts/pull/6765#issuecomment-414266399

stealthybox on 20 Aug 2018

This is merged now and using the grafana sidecars

vsliouniaev on 24 Oct 2018

🎉2

I can see that prometheus-operator has been merged. I cannot find equivalent of kube-prometheus is that expected ? 🤔

mvisonneau on 2 Nov 2018

👍2

The prometheus-operator chart is actually more than just the operator and there is some discussion going on whether to rename the chart to kube-prometheus right away. I might still be missing parts though, I wasn't involved in the process.

metalmatze on 2 Nov 2018

👍1

Indeed after looking into it I figured out! thanks

mvisonneau on 3 Nov 2018

@metalmatze how can I upgrade coreos/prometheus-operator (0.0.29) to stable/prometheus-operator (0.1.15)?

helm ls

NAME                REVISION    UPDATED                     STATUS      CHART                       APP VERSION NAMESPACE
kube-prometheus     1           Thu Oct 18 09:07:19 2018    DEPLOYED    kube-prometheus-0.0.105                 monitoring
prometheus-operator 1           Fri Sep 21 22:19:24 2018    DEPLOYED    prometheus-operator-0.0.29  0.20.0      monitoring

shuraa on 7 Nov 2018

@shuraa there is not a direct upgrade path - the chart in stable/prometheus-operator has a different structure you may be able to do some upgrades depending on what your circumstances are:

The stable/prometheus-operator chart includes both coreos/kube-prometheus and coreos/prometheus-operator charts and all their dependencies too.
The cleanest way is if you can start from scratch - Delete both releases and install stable/prometheus-operator
If you want to keep some components then:
- Don't delete any of the CRDs and set prometheusOperator.createCustomResource: false in the new chart, prometheus-operator should handle upgrades there itself when the new pod comes up
- If you are using persistent storage you can specify the existing storage and attach it to the new Prometheus instances.
- If you are creating custom servicemonitors as part of the charts you can use those there also under prometheus.prometheusSpec.additionalServiceMonitors

vsliouniaev on 7 Nov 2018

👍2 😕1

@vsliouniaev, Vasily, thank you! I'll try to test your instructions in a vagrant environment.

shuraa on 8 Nov 2018

@vsliouniaev What about putting your answer into the readme for the stable/prometheus-operator.

There is still some confusion between prometheus-operator and kube-prometheus like what should you install on the cluster, why kube-prometheus was not ported to stable/charts. We faces these issues when we started a few weeks ago and it wasn't easy to figure out the answers. So these could be really helpful.

lcostea on 22 Nov 2018

Additionally, do you think it would be possible to have each of the components separated on the upstream charts, and actually call the collection "kube-prometheus"? It's causing even more blurred lines between what makes kube-prometheus and what is just the prometheus-operator.

brancz on 22 Nov 2018

@vsliouniaev What about putting your answer into the readme for the stable/prometheus-operator.

There is still some confusion between prometheus-operator and kube-prometheus like what should you install on the cluster, why kube-prometheus was not ported to stable/charts. We faces these issues when we started a few weeks ago and it wasn't easy to figure out the answers. So these could be really helpful.

As far as I understand the prometheus-operator chart available on helm/stable repo contains the prometheus-operator content as well as the kube-prometheus content from the coreos repo.

I have raised this question here yesterday, because we had lots of confusion with the current duplication as well: https://github.com/coreos/prometheus-operator/issues/2153

larsduelfer on 22 Nov 2018

+1 for have _some_ sort of documentation on an upgrade path/migration steps.

Don't get me wrong, I think the new chart is structured better than the old one, but migration isn't a simple helm upgrade, or changing a few Helm values and then upgrading.

chrisob on 22 Nov 2018

👍8

In my opinion kube-prometheus should probably never exists upstream because it is basically a meta Chart to get all other dependencies with just a few a minor resources included for monitoring. So it could exists in a repo like this as a "recommended" setup. As @metalmatze commented it's confusing because now upstream prometheus-operator Chart is sort of what kube-prometheus was, that in my opinion is an error.

Formally this is one of the few reasons why Helm Charts are just a pain to maintain and too opinionated in what to include and what to expose in values.yaml, so basically right now people are stuck with rebuilding their whole Prometheus stack from scratch because of this.

richerve on 4 Dec 2018

Agreed on all points @richerve (if you do end up "building the entire stack from scratch", do have a look at the jsonnet based alternative :slightly_smiling_face: ).

brancz on 4 Dec 2018

I don't understand why migration process renamed half of the variables and make changes so incompatible.
Doing a migration is pain in the ass.
I failed twice up to this moment. Every time I'm finding new blocker.

maver1ck on 18 Dec 2018

👍2

Are there plans to once again provide separate charts for the main subcomponents of prometheus-operator such as prometheus and alertmanager? It used to be useful to be able to declare dependencies on these charts for easily incorporating a Prometheus or AlertManager instance into another chart. This of course required prometheus-operator to already be installed in the cluster so that the CRDs and such were already available. What's the recommended way to rope in these templates now? Declaring a dependency on prometheus-operator seems too heavy since in the use case I'm trying to address, it is already deployed separately in the cluster. That means that most of the values exposed in the new chart would have to be toggled off. It seems to me that it'd be much cleaner to just declare a dependency on separate prometheus and alertmanager charts. Does that make sense?