A list for every user of prometheus-operator who finds something missing from the documentation that should be documented :wink: (If a finding/question is missing, let me know and I'll add it).
I opened #114. It should cover:
Could you have a look at it @galexrt and tell me whether it makes sense? Thanks! :slightly_smiling_face:
@brancz If you have some time, I'd like to hear your feedback on whether the new points on the list are genuinely undocumented or should be handled in their own separate issues (most likely as feature requests).
Sure thing!
Persistence for prometheus collected data (using Kubernetes PVs)
Exposing prometheus instances to the outside/"public"
These are definitely on my list; however, for the second one I'm unsure how you meant it to differ from this one:
How to persist Prometheus data
And I'm also unsure what you meant by this one:
node_exporters volume mounts for cronjob pushing
Lastly
Add Grafana to the Prometheus "stack"
Is already on our roadmap :slightly_smiling_face: and often requested, but we haven't found any time to work on this yet. But stay tuned, we will get there! You can find a general sneak peek of what that may look like in kube-prometheus (manifests/grafana, but in a more dynamic way).
Appreciate the time you are taking for this!
I opened #115, so I'll tick off that last point about Grafana for the sake of this documentation issue staying on track.
How to persist Prometheus data
I removed it. As it is a duplicate of:
Persistence for prometheus collected data (using Kubernetes PVs)
node_exporters volume mounts for cronjob pushing
changed/renamed it to:
Pushing (cronjob) metrics to Pushgateway
I think even if it is not a good/preferred way, there will be people interested in using it.
For example when having a cronjob in Kubernetes, that dumps a database and pushes the time/duration to Prometheus as a metric.
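A rough sketch of what that could look like, assuming a Pushgateway is deployed in-cluster and reachable as `pushgateway:9091` (the Service name, image, schedule, and metric name are all hypothetical):

```yaml
apiVersion: batch/v1   # batch/v1beta1 or v2alpha1 on older clusters
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: curlimages/curl   # any image that ships curl works
            command:
            - /bin/sh
            - -c
            - |
              start=$(date +%s)
              # ... run the actual database dump here ...
              duration=$(( $(date +%s) - start ))
              # Push the duration to the Pushgateway, from where
              # Prometheus can scrape it later
              echo "db_backup_duration_seconds $duration" | \
                curl --data-binary @- http://pushgateway:9091/metrics/job/db-backup
```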
Sounds good! I'll make sure we add docs for those :slightly_smiling_face:
Exposing prometheus instances to the outside/"public"
And there is one more thing I added to the list which is somewhat related to the one solved with #122, which is regarding how to secure the access to the now exposed web UIs.
Awesome, thank you!
This is perfect as I'm just getting ready to setup a new Kubernetes cluster :smile:
I have added "Getting started with prometheus-operator" because the blog post is out of date and parts of it don't work anymore.
With that said it would be nice to have a new "Getting started" guide that helps new users use prometheus-operator for their Kubernetes clusters and applications.
@brancz I just read through the documentation/definitions of Prometheus and I think we can tick "Persistence for Prometheus collected data (using Kubernetes PVs)" off the list, as it is documented in the API definition?
Also, after the merge of #153, can the point "RBAC requirements" be ticked off too?
A new point (maybe this is what "Meta Monitoring" is): how can I target the Prometheus instance itself? The Prometheus instance Service doesn't have any labels to use for selecting.
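For reference, the storage configuration per the API definition looks roughly like this (a sketch only; exact field names depend on your operator version, and the size is hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
spec:
  replicas: 2
  storage:
    volumeClaimTemplate:
      spec:
        # storageClassName omitted -> the cluster's default StorageClass
        resources:
          requests:
            storage: 40Gi
```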
1) It is indeed documented, but people seem to have trouble using it, so we want to write a guide on how to use PVs with the Operator, ideally with a concrete example on AWS.
2) Yes after #153 is merged RBAC can be ticked off :tada:
3) That is exactly what meta monitoring is :slightly_smiling_face:. You are absolutely right though, the prometheus-operated named Service will need a label to be able to use it with a ServiceMonitor. Feel free to open an issue for this, so we can discuss what labeling would be appropriate. For the time being you can create your own Service that doesn't have a colliding name and label it accordingly. Generally though, what we meant with the Meta Monitoring documentation/user-guide is to lay out a reasonable approach to meta monitoring, best practices, etc.
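A sketch of that workaround: a Service of your own plus a ServiceMonitor selecting it. All names and labels here are illustrative, and the pod selector must match whatever labels your Prometheus pods actually carry:

```yaml
# A non-colliding Service pointing at the Prometheus pods,
# labelled so a ServiceMonitor can select it.
apiVersion: v1
kind: Service
metadata:
  name: prometheus-self
  labels:
    app: prometheus-self
spec:
  selector:
    app: prometheus   # must match the labels on the Prometheus pods
  ports:
  - name: web
    port: 9090
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-self
spec:
  selector:
    matchLabels:
      app: prometheus-self
  endpoints:
  - port: web
```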
Btw thanks for all your feedback, this is very valuable to us!
Wouldn't a ServiceMonitor selecting on app=prometheus do the job just fine?
Given that they need to specify the ServiceMonitor themselves, it seems more explicit if they do it on the label they picked for their per-Prometheus services.
Doesn't speak against labeling the governing service in general of course. I'd just be worried that people accidentally select somewhere on said label as they are generally not even aware that the governing service exists.
That would work with a PodMonitor, but with a ServiceMonitor we'll need a Service that is properly labelled and currently all the Services created have no labels at all. Let's move the discussion of how to label into #157, to keep this issue on topic for documentation.
Added "Network policies" which will be addressed by #156.
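Until #156 lands, a NetworkPolicy restricting access to the Prometheus pods can be written by hand. A hedged sketch (labels and port are assumptions, and it only takes effect if your network plugin enforces policies):

```yaml
apiVersion: networking.k8s.io/v1   # extensions/v1beta1 on older clusters
kind: NetworkPolicy
metadata:
  name: prometheus-allow-scrapers
spec:
  podSelector:
    matchLabels:
      app: prometheus   # must match the Prometheus pod labels
  ingress:
  - from:
    # Only pods carrying this (illustrative) label may connect
    - podSelector:
        matchLabels:
          prometheus-access: "true"
    ports:
    - protocol: TCP
      port: 9090
```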
It'll be good to know the right way to upgrade kube-prometheus and all the related components.
Added
- [ ] Upgrading cluster monitoring components
User Guides
- [ ] Cluster Monitoring
- [ ] Application Monitoring
Where the first refers to what @gianrubio mentioned, and the two user-guides refer to the migration of kube-prometheus into this repository.
Prometheus lacks TLS and authentication and doesn't intend to add them. In a couple of places people suggest using an Ingress controller to add TLS and authentication for external access; however, that eggshell approach still leaves Prometheus access and traffic wide open within the cluster.
I suggest that prometheus-operator could handle that by adding a sidecar proxy container providing TLS and some form(s) of authentication. Ideally an existing, simple container image (e.g. haproxy:alpine) that can be configured with a ConfigMap.
This would address the 'How to secure/authenticate against exposed Prometheus/Alertmanager' item.
@whereisaaron Intra-cluster communication via TLS is a general problem, so I would like to gather existing solutions and see if we can come up with something that is not too tightly coupled or hard to change in the future, and still gives you the freedom to configure it to your needs. This may end up with always having an nginx sidecar that you can provide a ConfigMap for, but IIRC there are efforts to solve this general class of problem.
The initial thought behind "How to secure/authenticate against exposed Prometheus/Alertmanager" was more about authentication, for example using bitly/oauth2_proxy, but TLS is definitely also something we need to address. This space may become clearer with the TLS bootstrapping work for the Kubernetes components themselves.
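To make the sidecar idea concrete, here is a sketch of a pod spec fragment with bitly/oauth2_proxy in front of Prometheus. This is not something the operator generates today; the flags shown are among oauth2_proxy's documented options (a real deployment also needs client id/secret and a cookie secret), and binding Prometheus to 127.0.0.1 so that only the sidecar can reach it is one possible approach:

```yaml
# Fragment of a pod spec; containers in a pod share a network
# namespace, so the proxy can reach Prometheus on localhost.
containers:
- name: prometheus
  image: quay.io/prometheus/prometheus
  args:
  - --web.listen-address=127.0.0.1:9090   # unreachable from outside the pod
- name: oauth2-proxy
  image: bitly/oauth2_proxy   # image/tag illustrative
  args:
  - -http-address=0.0.0.0:4180
  - -upstream=http://127.0.0.1:9090
  - -provider=github
  # plus -client-id, -client-secret, -cookie-secret, -email-domain, ...
  ports:
  - containerPort: 4180
```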
Persistence for prometheus collected data (using Kubernetes PVs)
Getting started with Prometheus-Operator
Expanding on some questions that I would like to see this answer before users have them:
kubectl.example.com will be used to route to the interface.
@robszumski I have added some of your points to the list (I changed the wording to generalize them).
Some of the topics you mentioned are covered by the Kubernetes docs. prometheus-operator should provide examples for them too, so users can adapt them to their situation.
With #188 merged we can tick off "cluster monitoring", "getting started", and a new section I added "alerting".
Network policies. Thanks again @gianrubio!
I ticked off the point "Use Kubernetes Ingress to expose Prometheus" as example Ingresses have been added to the docs (see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/exposing-prometheus-and-alertmanager.md#ingress).
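For quick reference, a minimal Ingress along the lines of those docs could look like this (host and TLS/auth handling are placeholders to adapt; `prometheus-operated` is the Service the operator creates):

```yaml
apiVersion: networking.k8s.io/v1   # extensions/v1beta1 on older clusters
kind: Ingress
metadata:
  name: prometheus
spec:
  rules:
  - host: prometheus.example.com   # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-operated
            port:
              number: 9090
```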
@brancz I think that "Application Monitoring" can also be ticked off. What do you think?
Agreed. Ticked off "Application Monitoring".
@brancz Could you please tell me how I should export Grafana dashboards after modifying them?
Sure thing. I'm assuming you're using kube-prometheus. You export the dashboard through the Grafana UI and save it to disk. Then, assuming you are in the contrib/kube-prometheus directory, you move the dashboard JSON to assets/grafana/ (or replace the existing JSON if you just want to update an existing dashboard). Then you run hack/scripts/generate-manifests.sh. It might be worth having this in the kube-prometheus docs. Do you want to create a PR with an example?
I am not using kube-prometheus, but the manifests and everything else directly. I wanted to do it the hard way for a bit, to learn more deeply. I will send a PR regarding it.
My 2 cents as a newbie.
Once you realise the potential of this operator, it blows your mind. But until then it leaves you wondering, "well, how do I make it work?". For a regular Joe (like me) who is just starting out with Prometheus, the Getting started guide is a bit vague: it goes into details about the operator, not how to do monitoring with it.
Then there is an example app, which imho should also represent a more real-world scenario by showing all the steps from getting metrics to displaying them in Grafana. Because as a newbie I tried that approach for nginx, and was left with another Prometheus instance that I was not sure what to do with (hook it up as a new datasource in Grafana? launch a new Grafana instance?)
So as I understand it, my ideal getting started guide would consist of these steps and should be purely centered around kube-prometheus:
plugin storage
Interacting with Grafana
update dashboards (generate-dashboards-configmap.sh)
change authentication (grafana-credentials.yaml)
Using ServiceMonitors
So this guide would be a real start or wizard/tutorial for a newbie who just wants to "monitor their Kubernetes with Prometheus" without caring about the details of the operator.
This is not a rant, but a suggestion to contribute. If the owners think something like this would be useful, I could create something at the start of June.
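For the "Using ServiceMonitors" step above, a minimal application-monitoring sketch could look like this (all names and labels are illustrative; the ServiceMonitor's labels must match the serviceMonitorSelector of your Prometheus resource):

```yaml
# A Service exposing the app's metrics port...
apiVersion: v1
kind: Service
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: metrics
    port: 8080
---
# ...and a ServiceMonitor telling Prometheus to scrape it.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend   # must match spec.serviceMonitorSelector on the Prometheus
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: metrics
```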
@gytisgreitai For now we want to keep all Grafana related docs in the /contrib folder. The documentation in the /Documentation folder should be focused on the Prometheus Operator.
I would suggest adding these suggestions to the /contrib/kube-prometheus README.md and /docs folder. Improving these docs would help us a lot, so PRs are very welcome!
I'll close this issue at this point as it used to be a "catch all" documentation issue, but most of the documentation is there and for anything else that comes up, feel free to open a new issue :slightly_smiling_face: .
#122 is merged, so ticking off:
And there is one more thing I added to the list which is somewhat related to the one solved with #122, which is regarding how to secure the access to the now exposed web UIs.