A list for every user of prometheus-operator who finds something missing from the documentation that should be documented :wink: (If a finding/question is missing, let me know and I'll add it).
I opened #114. It should cover:
Could you have a look at it @galexrt and tell me whether it makes sense? Thanks! :slightly_smiling_face:
@brancz If you have some time, I'd like to hear your feedback on whether the new points on the list are genuinely undocumented or should be handled in their own separate issues (most likely as feature requests).
Sure thing!
Persistence for prometheus collected data (using Kubernetes PVs)
Exposing prometheus instances to the outside/"public"
These are definitely on my list; however, for the second one I'm unsure how you meant it to differ from this one:
How to persist Prometheus data
And I'm also unsure what you meant by this one:
node_exporters volume mounts for cronjob pushing
Lastly
Add Grafana to the Prometheus "stack"
Is already on our roadmap :slightly_smiling_face: and often requested, but we haven't found any time to work on this yet. But stay tuned, we will get there! You can find a general sneak peek of what that may look like in kube-prometheus (manifests/grafana, but in a more dynamic way).
Appreciate the time you are taking for this!
I opened #115, so I'll tick off that last point about Grafana for the sake of this documentation issue staying on track.
How to persist Prometheus data
I removed it. As it is a duplicate of:
Persistence for prometheus collected data (using Kubernetes PVs)
node_exporters volume mounts for cronjob pushing
changed/renamed it to:
Pushing (cronjob) metrics to Pushgateway
I think even if it is not a good/preferred way, there will be people interested in using it.
For example when having a cronjob in Kubernetes, that dumps a database and pushes the time/duration to Prometheus as a metric.
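A rough sketch of what that could look like, assuming a Pushgateway is deployed in-cluster and reachable as `pushgateway:9091` (the Service name, image, schedule, and metric name are all hypothetical):

```yaml
apiVersion: batch/v1   # batch/v1beta1 or v2alpha1 on older clusters
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: curlimages/curl   # any image that ships curl works
            command:
            - /bin/sh
            - -c
            - |
              start=$(date +%s)
              # ... run the actual database dump here ...
              duration=$(( $(date +%s) - start ))
              # Push the duration to the Pushgateway, from where
              # Prometheus can scrape it later
              echo "db_backup_duration_seconds $duration" | \
                curl --data-binary @- http://pushgateway:9091/metrics/job/db-backup
```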
Sounds good! I'll make sure we add docs for those :slightly_smiling_face:
Exposing prometheus instances to the outside/"public"
And there is one more thing I added to the list which is somewhat related to the one solved with #122, which is regarding how to secure the access to the now exposed web UIs.
Awesome, thank you!
This is perfect as I'm just getting ready to setup a new Kubernetes cluster :smile:
I have added "Getting started with prometheus-operator" because the blog post is out of date and parts of it don't work anymore.
With that said it would be nice to have a new "Getting started" guide that helps new users use prometheus-operator for their Kubernetes clusters and applications.
@brancz I just read through the documentation/definitions of Prometheus and I think we can tick "Persistence for Prometheus collected data (using Kubernetes PVs)" off the list, as it is documented in the API definition?
Also, after the merge of #153, can the point "RBAC requirements" be ticked off too?
A new point (maybe this is what "Meta Monitoring" is): how can I target the Prometheus instance itself? The Prometheus instance Service doesn't have any labels to use for selecting.
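For reference, the storage configuration per the API definition looks roughly like this (a sketch only; exact field names depend on your operator version, and the size is hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
spec:
  replicas: 2
  storage:
    volumeClaimTemplate:
      spec:
        # storageClassName omitted -> the cluster's default StorageClass
        resources:
          requests:
            storage: 40Gi
```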
1) It is indeed documented, but people seem to have trouble using it, so we want to write a guide on how to use PVs with the Operator, ideally with a concrete example on AWS.
2) Yes after #153 is merged RBAC can be ticked off :tada:
3) That is exactly what meta monitoring is :slightly_smiling_face:. You are absolutely right though, the prometheus-operated named Service will need a label to be able to use it with a ServiceMonitor. Feel free to open an issue for this, so we can discuss what labeling would be appropriate. For the time being you can create your own Service that doesn't have a colliding name and label it accordingly. Generally though, what we meant with the Meta Monitoring documentation/user-guide is to lay out a reasonable approach to meta monitoring, best practices, etc.
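A sketch of that workaround: a Service of your own plus a ServiceMonitor selecting it. All names and labels here are illustrative, and the pod selector must match whatever labels your Prometheus pods actually carry:

```yaml
# A non-colliding Service pointing at the Prometheus pods,
# labelled so a ServiceMonitor can select it.
apiVersion: v1
kind: Service
metadata:
  name: prometheus-self
  labels:
    app: prometheus-self
spec:
  selector:
    app: prometheus   # must match the labels on the Prometheus pods
  ports:
  - name: web
    port: 9090
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-self
spec:
  selector:
    matchLabels:
      app: prometheus-self
  endpoints:
  - port: web
```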
Btw thanks for all your feedback, this is very valuable to us!
Wouldn't a ServiceMonitor selecting on app=prometheus do the job just fine?
Given that they need to specify the ServiceMonitor themselves, it seems more explicit if they do it on the label they picked for their per-Prometheus services.
Doesn't speak against labeling the governing service in general of course. I'd just be worried that people accidentally select somewhere on said label as they are generally not even aware that the governing service exists.
That would work with a PodMonitor, but with a ServiceMonitor we'll need a Service that is properly labelled and currently all the Services created have no labels at all. Let's move the discussion of how to label into #157, to keep this issue on topic for documentation.
Added "Network policies" which will be addressed by #156.
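Until #156 lands, a NetworkPolicy restricting access to the Prometheus pods can be written by hand. A hedged sketch (labels and port are assumptions, and it only takes effect if your network plugin enforces policies):

```yaml
apiVersion: networking.k8s.io/v1   # extensions/v1beta1 on older clusters
kind: NetworkPolicy
metadata:
  name: prometheus-allow-scrapers
spec:
  podSelector:
    matchLabels:
      app: prometheus   # must match the Prometheus pod labels
  ingress:
  - from:
    # Only pods carrying this (illustrative) label may connect
    - podSelector:
        matchLabels:
          prometheus-access: "true"
    ports:
    - protocol: TCP
      port: 9090
```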
It'll be good to know the right way to upgrade kube-prometheus and all the related components.
Added
- [ ] Upgrading cluster monitoring components
User Guides
- [ ] Cluster Monitoring
- [ ] Application Monitoring
Where the first refers to what @gianrubio mentioned, and the two user-guides refer to the migration of kube-prometheus into this repository.
Prometheus lacks TLS and authentication and doesn't intend to add them. In a couple of places people suggest using an Ingress controller to add TLS and authentication for external access; however, that eggshell approach still leaves Prometheus access and traffic wide open within the cluster.
I suggest that prometheus-operator could handle that by adding a sidecar proxy container providing TLS and some form(s) of authentication. Ideally an existing, simple container image (e.g. haproxy:alpine) that can be configured with a ConfigMap.
This would address the 'How to secure/authenticate against exposed Prometheus/Alertmanager' item.
@whereisaaron Intra-cluster communication via TLS is a general problem, so I would like to gather existing solutions and see if we can come up with something that is not too tightly coupled or hard to change in the future, and still gives you the freedom to configure it to your needs. This may end up with always having an nginx sidecar that you can provide a ConfigMap for, but IIRC there are efforts to solve this general class of problem.
The initial thought behind "How to secure/authenticate against exposed Prometheus/Alertmanager" was more about authentication, for example using bitly/oauth2_proxy, but TLS is definitely also something we need to address. This space may become clearer with the TLS bootstrapping work for the Kubernetes components themselves.
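To make the sidecar idea concrete, here is a sketch of a pod spec fragment with bitly/oauth2_proxy in front of Prometheus. This is not something the operator generates today; the flags shown are among oauth2_proxy's documented options (a real deployment also needs client id/secret and a cookie secret), and binding Prometheus to 127.0.0.1 so that only the sidecar can reach it is one possible approach:

```yaml
# Fragment of a pod spec; containers in a pod share a network
# namespace, so the proxy can reach Prometheus on localhost.
containers:
- name: prometheus
  image: quay.io/prometheus/prometheus
  args:
  - --web.listen-address=127.0.0.1:9090   # unreachable from outside the pod
- name: oauth2-proxy
  image: bitly/oauth2_proxy   # image/tag illustrative
  args:
  - -http-address=0.0.0.0:4180
  - -upstream=http://127.0.0.1:9090
  - -provider=github
  # plus -client-id, -client-secret, -cookie-secret, -email-domain, ...
  ports:
  - containerPort: 4180
```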
Persistence for prometheus collected data (using Kubernetes PVs)
Getting started with Prometheus-Operator
Expanding on some questions that I would like to see this answer before users have them:
kubectl.example.com will be used to route to the interface.
@robszumski I have added some of your points to the list (I changed the wording to generalize them).
Some of the topics you mentioned are covered by the Kubernetes docs. prometheus-operator should provide examples for them too, so users can adapt them to their situation.
With #188 merged we can tick off "cluster monitoring", "getting started", and a new section I added "alerting".
Network policies. Thanks again @gianrubio!
I ticked off the point "Use Kubernetes Ingress to expose Prometheus" as example Ingresses have been added to the docs (see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/exposing-prometheus-and-alertmanager.md#ingress).
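For quick reference, a minimal Ingress along the lines of those docs could look like this (host and TLS/auth handling are placeholders to adapt; `prometheus-operated` is the Service the operator creates):

```yaml
apiVersion: networking.k8s.io/v1   # extensions/v1beta1 on older clusters
kind: Ingress
metadata:
  name: prometheus
spec:
  rules:
  - host: prometheus.example.com   # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-operated
            port:
              number: 9090
```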
@brancz I think that "Application Monitoring" can also be ticked off. What do you think?
Agreed. Ticked off "Application Monitoring".
@brancz Could you please tell me how I should export Grafana dashboards after modifying them?
Sure thing. I'm assuming you're using kube-prometheus. You export the dashboard through the Grafana UI and save it to disk. Then, assuming you are in the contrib/kube-prometheus directory, you move the dashboard JSON to assets/grafana/ (or replace the existing JSON if you just want to update an existing dashboard). Then you run hack/scripts/generate-manifests.sh. It might be worth having this in the kube-prometheus docs. Do you want to create a PR with an example?
I am not using kube-prometheus, but the manifests and everything else directly. I wanted to do it the hard way for a bit, to learn more deeply. I will send a PR regarding it.
My 2 cents as a newbie.
Once you realise the potential of this operator, it blows your mind. But until then it leaves you wondering, "well, how do I make it work?". For a regular Joe (like me) who is just starting out with Prometheus, the Getting started guide is a bit vague: it goes into details about the operator, not how to do monitoring with it.
Then there is an example app, which imho should also represent a more real-world scenario by showing all the steps from getting metrics to displaying them in Grafana. Because as a newbie I tried that approach for nginx, and was left with another Prometheus instance that I was not sure what to do with (hook it up as a new datasource in Grafana? launch a new Grafana instance?)
So as I understand it, my ideal getting started guide would consist of these steps and should be purely centered around kube-prometheus:
plugin storage
Interacting with Grafana
update dashboards (generate-dashboards-configmap.sh)
change authentication (grafana-credentials.yaml)
Using ServiceMonitors
So this guide would be a real start or wizard/tutorial for a newbie who just wants to "monitor their Kubernetes with Prometheus" without caring about the details of the operator.
This is not a rant, but a suggestion to contribute. If the owners think something like this would be useful, I could create something at the start of June.
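For the "Using ServiceMonitors" step above, a minimal application-monitoring sketch could look like this (all names and labels are illustrative; the ServiceMonitor's labels must match the serviceMonitorSelector of your Prometheus resource):

```yaml
# A Service exposing the app's metrics port...
apiVersion: v1
kind: Service
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: metrics
    port: 8080
---
# ...and a ServiceMonitor telling Prometheus to scrape it.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend   # must match spec.serviceMonitorSelector on the Prometheus
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: metrics
```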
@gytisgreitai For now we want to keep all Grafana related docs in the /contrib folder. The documentation in the /Documentation folder should be focused on the Prometheus Operator.
I would suggest adding these suggestions to the /contrib/kube-prometheus README.md and /docs folder. Improving these docs would help us a lot, so PRs are very welcome!
I'll close this issue at this point as it used to be a "catch all" documentation issue, but most of the documentation is there and for anything else that comes up, feel free to open a new issue :slightly_smiling_face: .
#122 is merged, so ticking off:
And there is one more thing I added to the list which is somewhat related to the one solved with #122, which is regarding how to secure the access to the now exposed web UIs.