What did you do?
Ran the getting started guide and read all the docs.
What did you expect to see?
Some mention of arbitrary configuration of prometheus
Basically I want to set up remote storage for prometheus:
https://prometheus.io/docs/operating/configuration/#
Maybe I'm missing something and this is already supported however.
You can specify an arbitrary configuration for your Prometheus instances by creating a Prometheus object without setting the serviceMonitorSelector. Then you can manage the Secret named prometheus-<prometheus-object-name> yourself. The Prometheus configuration must be under the key prometheus.yaml.
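For illustration, a minimal sketch of such a fully custom configuration (all names and the remote write endpoint are illustrative assumptions; the Prometheus object is assumed here to be named `prometheus`, so the Secret becomes `prometheus-prometheus`):

```yaml
# Hypothetical self-managed Secret: the operator expects the name
# prometheus-<prometheus-object-name> with the config under prometheus.yaml.
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-prometheus  # assumes a Prometheus object named "prometheus"
  namespace: monitoring        # illustrative namespace
type: Opaque
stringData:
  prometheus.yaml: |
    global:
      scrape_interval: 30s
    remote_write:
    - url: "http://my-storage-adapter:9201/write"  # illustrative endpoint
```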
We plan on supporting remote storage configuration one day, but currently it is not supported, and would therefore have to be done with a fully custom configuration for now.
As we've had a couple of questions around this we should probably start a separate user-guide around fully custom Prometheus configuration.
Great stuff. I'll close this then as this helps me well enough.
Reopening as we want to allow remote storage at some point. Probably a bit out there as the upstream API/configuration isn't stable yet though.
We might want to make custom config an explicit parameter at some point @brancz
Omitting serviceMonitorSelector is pretty leaky by now, as it also implies skipping the Alertmanager config, etc.
Agreed.
Looking to do the same thing, as well. Would appreciate a user-guide around adding to the prometheus.yaml config, specifically adding a remote storage endpoint.
Started looking into this as well. It appears that enabling remote_read and remote_write natively via deployment configuration is the simplest path forward, since with a custom configuration (omitting serviceMonitorSelector) we might have to add the ServiceMonitor config by hand into the Prometheus configuration.
Even though this is an experimental feature, it would be nice to have it available in the operator as well.
natively via deployment configuration
@gdmello would you mind explaining what you mean by this? I'm looking for a nice way of doing this.
Previously I simply statically defined it in the prometheus-operator container - not a solution I'd like to continue using.
@wleese - I meant it would be easier if remote read/write just worked by modifying the prometheus configuration in /contrib/kube-prometheus/manifests/prometheus/prometheus-k8s.yaml (to add remote-read/ remote-write configuration) and deploying the operator with /contrib/kube-prometheus/hack/cluster-monitoring/deploy. Yes, you would have to checkout this repo first.
I forked this repo, and added this feature in, and tested on minikube only (will test in a k8s env next week). You can try it out - gavind/prometheus-operator-dev:787 <- use this image in /prometheus-operator/contrib/kube-prometheus/manifests/prometheus-operator/prometheus-operator.yaml and deploy using the above command.
My source is here.
@gdmello great stuff, many thanks.
@brancz - Is this a good start for a PR even though the feature is experimental? LTS is a production concern for organizations adopting prometheus-operator.
I'm ok with adding it (I haven't looked at the code yet, but rather mean the general idea). However, we need to explicitly declare in the documentation that everything in this space is highly experimental, just like it is in upstream Prometheus; it can change from any release to the next, and we can only support it on a best-effort basis.
Out of curiosity, what is your long term storage? I am a contributor to upstream Prometheus and follow the work around long term storage closely, but haven't observed any solutions that are actually production ready, hence my initial concerns.
Note though that there are some large changes coming up as we're deprecating the v1alpha1 objects and moving them to v1 (#555).
Out of curiosity, what is your long term storage?
I'd gladly add my answer, even though this wasn't addressed to me ;)
Prometheus -> Remote storage adapter -> InfluxCloud
but haven't observed any solutions that are actually production ready
Would you care to elaborate?
Beyond the remote read/write implementation itself being highly experimental, the influx adapter is actually just an example implementation, or at least started as one. Influx is working on integrating real support for remote storage purposes into influx itself; this will be a real solution, although the remote read/write interface still remains highly experimental. And this support is still months away from being available in the latest releases of Influx.
Again out of curiosity, why do you require long term storage?
@brancz
why do you require long term storage?
Because we value metrics for various purposes (14mil unique series, 40k/s, 90 day retention by default and more than 1000 Grafana dashboards - not autogenerated btw). We're currently building up infra in the cloud, and Graphite & Nagios simply don't fit our desire to have a low maintenance, cloud friendly (monitoring &) metrics platform (OpenTSDB didn't have the community, KairosDB was too niche, etc, etc).
As for the long retention specifically: we have hundreds of services, some updated daily and some a few times a year. Having metrics that reach far back helps teams that might have inherited an old service to understand what they're working on. Furthermore, our traffic is very dependent on season. Some of our metrics have a >365d retention time to compare to last year.
@brancz - I'm considering Prometheus -> Adapter -> InfluxDB as well.
Again out of curiosity, why do you require long term storage?
Three reasons-
- Prometheus design philosophy suggests not storing data in the long term. (Based on reading some PRs, I've understood that to mean no more than a month or so.)
- We are looking at Prometheus to replace possibly one of 2 commercial monitoring/ alerting solutions, and reduce/consolidate our spending.
- We store business metrics as well (req/s etc), though this might be frowned upon, and so need to look at historical data over the last year or year over year comparisons.
the influx adapter is actually just an example implementation
Why not make the remote read/write interface and examples mainstream? It would not depart from the design philosophy of staying out of distributed storage.
Influx is working on integrating real support into influx itself for remote storage purposes, this will be a
real solution although the remote read/write interface still remains highly experimental.
This is great to hear! Are there any reservations with using InfluxDB at this point (apart from the read/write interface being experimental)?
All of the above sounds great! Thanks for sharing!
This is great to hear! Are there any reservations with using InfluxDB at this point (apart from the read/write interface being experimental)?
This is really up to you - whether you are willing to buy into the ecosystem and pay for clustering. If you have single nodes, then Prometheus on its own will outperform Influx, simply because it can optimize the time-series database for the concrete use case.
I would strongly suggest looking at whether your requirements could be satisfied using Prometheus federation. I know that multiple companies simply federate out the metrics that are important for longer retention times, and this works very well and is much easier than maintaining a centralized distributed time-series database (like Influx).
We store business metrics as well (req/s etc), though this might be frowned upon, and so need to look at historical data over the last year or year over year comparisons.
This is a totally legit use case and can very easily be achieved with Prometheus federation, by federating only those metrics that you care about for a year to year comparison. The "what if" argument usually comes into play at this point and I invite you to let go of the collector's urge :slightly_smiling_face: .
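As a sketch of what such selective federation might look like (the job name, target, and match[] selectors are illustrative assumptions, not taken from this thread):

```yaml
# Hypothetical scrape config on a long-retention Prometheus that federates
# only pre-aggregated series from the ingesting instance.
scrape_configs:
- job_name: federate
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]':
    - '{__name__=~"job:.*"}'  # e.g. only recording-rule results
  static_configs:
  - targets:
    - prometheus-ingest:9090  # illustrative ingesting Prometheus address
```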
Prometheus design philosophy suggests not storing data in the long term. (Based on reading some PRs, I've understood that to mean no more than a month or so.)
The way I see it, the primary use case is that the ingesting Prometheus servers are viewed as a high performance real time monitoring, dashboarding and alerting solution with relatively short retention (whatever retention shows you whether new deployments are successful, for example). The most value people get out of monitoring is alerting, and detecting patterns longer than 1 month is extremely rare in my experience.
Influx (or any distributed tsdb) seems like an easy solution to all the problems but comes at the cost of having to run a very complicated distributed system (plus likely cost of the actual software). Prometheus federation can help in almost any scenario and makes you realize and think about the metrics you actually care about. Nonetheless we see the wish which is why remote read/write is a thing :slightly_smiling_face: .
As I said, this is ultimately your decision and there are other also totally legit use cases for remote read/write so we're not against introducing it, just want to make sure people explore not as complex alternatives.
Thanks @brancz! will look into Prometheus Federation.
@brancz - so had a look at federation, and I have 2 concerns with it-
For LTS, the only viable option seems InfluxDB atm, waiting on their pricing for the Enterprise version.
Is it possible to add an additionalConfiguration field? It seems like this type of request and need comes up often. Then the prometheus operator would be able to just append this block to the end of the prometheus configuration.
ex:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  labels:
    prometheus: k8s
spec:
  replicas: 2
  version: v1.7.0
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector:
    matchExpressions:
    - {key: k8s-app, operator: Exists}
  additionalConfiguration: |-
    remote_write:
    - url: "http://localhost:9201/write"
```
I've mentioned this a couple of times: if we allow passing arbitrary arguments, then there are no guarantees for upgrades, which is a large reason why the Prometheus Operator exists. We will properly integrate remote read as one of the next features, though. It might be experimental in the beginning.
Ok cool, I hadn't seen the upgrade concern in the issues I have read, and I see that as a very valid concern. Thank you, I look forward to the newer version that supports remote read/write.
Yeah for some users (certainly myself included), this is a really important feature. We have a pretty sizable influx setup (load tested to handle peak traffic of 500,000 metrics per second) and want to use influx for long term storage of metrics from prometheus. This lets us do longer term analysis on app metrics we pull via prometheus.
hi, guys
I am currently using Prometheus 1.8.2 managed by prometheus-operator, and want to upgrade to Prometheus 2.0. Because their data formats are not compatible, I followed the migration guide to deploy a new Prometheus 2.0 instance and configure it to read old Prometheus data, but found it's impossible to configure remote_read in PrometheusSpec.
How about adding a Prometheus2 resource type or version v2 of Prometheus?
Certain versions of 1.x also support remote read, we will support remote read/write support with the Prometheus Operator, we just haven't gotten around to it. If anyone wants to take a stab at it, PRs are highly appreciated! (if you need some guidance I'm happy to help, even if anyone is new to Go :slightly_smiling_face:)
Per these release notes, it would seem that remote_storage is now supported in the latest release. Is this ticket still open pending documentation of feature usage?
RemoteReadSpec, RemoteWriteSpec, and an example.
Here is an example of how to configure influxdb for this using the newer influx 1.4.x
Yes, the feature now exists, but it's true that we should add some real docs. I'll leave this open until we have those.
Any news about proper documentation on this? Or could you please provide an example of how it is supposed to be used?
I don't have time to write the docs right now, but the generated API docs are up to date: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md
See the remote read/write specs of the PrometheusSpec. If someone figures it out, please contribute docs! :slightly_smiling_face: (as a hint, the config itself is pretty much exactly the same as the upstream ones)
After trying to understand for a few hours I still can't find the file where I should add the remote configuration lines (like in a regular prometheus.yml). Can you just point to that file please? Thanks
Say you have a Prometheus object, like this:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  labels:
    prometheus: prometheus
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager
      port: web
```
Then the remote storage fields just go under the spec field:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  labels:
    prometheus: prometheus
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager
      port: web
  remoteWrite:
  - url: "https://my-remote-write-backend/"
    # ... more fields ...
  remoteRead:
  - url: "https://my-remote-read-backend/"
    # ... more fields ...
```
As pointed out, the PrometheusSpec contains the remoteWrite and remoteRead fields, each of which has its own configuration.
Let us know if that is useful and as always it would be awesome to have anyone new to the matter submit a pull request to improve the docs, as we are very close to the development of these topics so it's often hard to judge what's clear enough and what needs to be elaborated further. Thanks! :slightly_smiling_face:
We want to use the Helm Chart at https://github.com/coreos/prometheus-operator/tree/master/helm/kube-prometheus to deploy Prometheus. How exactly would we go about adding remoteWrite there? I cannot see it in the values.yaml or templates.
@StianOvrevage the helm chart allows you to specify remoteRead and remoteWrite configurations: https://github.com/coreos/prometheus-operator/blob/39fe3f673a81886345ccdecd521cd90407eb31de/helm/prometheus/values.yaml#L126-L136
Yes, I saw that.
But prometheus-operator/helm/prometheus contains ONLY prometheus right? I'm using prometheus-operator/helm/kube-prometheus which is the umbrella chart for all the components, alertmanager, grafana and prometheus.
What I can't figure out is how to set remoteWrite when using the kube-prometheus umbrella chart. If I don't have to I would like to not manually install and keep up with all the charts referenced in kube-prometheus just to change this one setting.
cc @gianrubio
I added a deployPrometheus toggle to the helm chart. In the same process I tried adding/copying over the remoteWrite and remoteRead sections from helm/prometheus/values.yaml to helm/kube-prometheus/values.yaml and placed them under the prometheus object.
However when testing they did not show up in the Prometheus resource in K8s.
How do variables get propagated from kube-prometheus/values.yaml to prometheus/ ?
I had a go at this and the following worked, for authenticated remote write:
```sh
kubectl create secret generic --namespace=monitoring prometheus-secret \
  --from-literal=username=username --from-literal=password=password

cat <<EOF >values.yaml
prometheus:
  remoteWrite:
  - url: http://foo/bar
    basicAuth:
      username:
        name: prometheus-secret
        key: username
      password:
        name: prometheus-secret
        key: password
EOF

helm install coreos/kube-prometheus --values values.yaml --name kube-prometheus --namespace monitoring
```
HTH
I ended up here after searching around for a bit. I'm trying to add the bearer_token to every remote_write request however it is not being added to the request. The url field is working as expected.
When I exec into the prometheus container and cat out the generated config file, only the url is there with a remote_timeout setting.
Anyone run into this yet?
@incognick there were recently some refactorings on basic auth secret retrieval. Could you open a new issue please?
@brancz Thanks! https://github.com/coreos/prometheus-operator/issues/1788
I didn't look into basic auth too much but it didn't look like that was being passed either.
We are trying to figure out how to configure remote storage without resorting to custom configuration.
It would be great if global -> external_labels could be automatically configured when remote_write -> url is set. Every Prometheus instance must have a distinct label value in the external_labels section when writing to remote storage. Otherwise metrics with identical labels from distinct Prometheus instances will be merged into a single metric in the remote storage.
Also, is it possible to configure various options below the remote_write section, such as remote_write -> queue_config -> max_samples_per_send? We need to increase this, since the default value is too small for sending large amounts of data points.
We are experimenting with VictoriaMetrics remote storage.
External labels are already automatically configured, starting from the v0.19.0 release, which was released over a year ago. It was introduced with: https://github.com/coreos/prometheus-operator/commit/50d7b9f56ece09352928c70d0e8669f3342a3550. That enforces the prometheus: <namespace>/<prometheus-object-name> and prometheus_replica: <pod-name> labels. Various remote write queue configurations are also available: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#queueconfig. Note though that the queuing approach is being replaced in Prometheus with an approach where Prometheus tails the write-ahead-log for remote-write rather than queuing scraped content (RE: https://github.com/prometheus/prometheus/pull/4588).
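To illustrate the queue configuration mentioned above, here is a hedged sketch of tuning remote-write through the PrometheusSpec (the endpoint and all values are illustrative assumptions, not recommendations):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  remoteWrite:
  - url: "http://victoriametrics:8428/api/v1/write"  # illustrative endpoint
    queueConfig:
      maxSamplesPerSend: 10000  # raised from the default for larger batches
      capacity: 20000
      maxShards: 30
```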
Hey, is there any doc on how remote read/write works? Is it: 1) local-first - write all data locally and only read from remote when needed (beyond retention); 2) only write a data log locally; or 3) write/read entirely from remote? I think it would be useful to have a simple description of how remote read/write works.
@maozi07 your questions are legit, but are actually about Prometheus not the Prometheus Operator.
1) Deciding when to read from remote storage is not possible as such. When both local and remote storage reply, local storage has precedence. You will need to configure/tune your remote storage to only return data for the time range you want, if this is the behavior you expect.
2) that's how everything works if you don't specify any remote storage
3) As of Prometheus 2.0 this is not a possibility. Among other reasons, now (as of Prometheus 2.8.x, which is being released right now) remote write uses the write-ahead-log of Prometheus for replicating the data, as opposed to just live-enqueueing the samples to be sent to remote storage. The old approach had a lot of problems, as it effectively meant all samples were stored in memory until they were sent, so it used a lot of memory and was not durable. Essentially the write-ahead-log is now being used as a persistent on-disk buffer.
@brancz Thanks for your answer
This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions.
Closing due to inactivity.