Most of the telemetry metrics are missing when they are fetched by Prometheus because: metrics are only instantiated once the corresponding event first happens, and when prometheus_retention_time expires they also get removed.
This is bad practice for Prometheus. Prometheus expects all metrics to be present all the time, even if they are constantly 0, because if they are missing it's hard to write proper alerts.
For example, if I wanted an alert telling me that the cluster has a leader, it could look something like this:
consul_raft_state_leader > 0 and consul_raft_state_candidate == 0
This probably isn't the correct way to check this; I haven't tested it much, because I no longer see these metrics due to the retention time. Even if I set the retention to a higher value, not all of these values will be available when a server starts. For example, consul_raft_state_leader will be missing on hosts which start up and don't become the leader; it is only instantiated when a node becomes the leader. The same is true for all of the metrics.
Why not instantiate all metrics up front, drop prometheus_retention_time, and keep them for the lifetime of the process?
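To make the suggestion concrete, here is a minimal, editor-supplied sketch of what "instantiate all metrics up front" looks like with the official Prometheus Go client (client_golang). The metric names mirror Consul's, but this is not Consul's code and Consul does not currently work this way:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// The gauges are declared and registered once at startup, so they are always
// present in the /metrics output (initially 0), even on nodes where the event
// they track has never happened.
var (
	raftStateLeader = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "consul_raft_state_leader",
		Help: "1 if this server is the Raft leader, 0 otherwise.",
	})
	raftStateCandidate = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "consul_raft_state_candidate",
		Help: "1 if this server is currently a Raft candidate, 0 otherwise.",
	})
)

func main() {
	prometheus.MustRegister(raftStateLeader, raftStateCandidate)

	// Elsewhere the gauges are simply updated when state changes, e.g.
	// raftStateLeader.Set(1) on winning an election and .Set(0) on losing it.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9102", nil)
}
```

With this pattern, an alert like the one above works immediately after startup, because both series always exist.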
Just run Consul in server mode, enable telemetry, and set prometheus_retention_time > 0.
Client info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 2
build:
prerelease =
revision = 0bddfa23
version = 1.4.0
consul:
acl = enabled
known_servers = 5
server = false
runtime:
arch = amd64
cpu_count = 4
goroutines = 48
max_procs = 4
os = linux
version = go1.11.1
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 44
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 3537
members = 595
query_queue = 0
query_time = 67
Server info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 3
build:
prerelease =
revision = 0bddfa23
version = 1.4.0
consul:
acl = enabled
bootstrap = false
known_datacenters = 1
leader = false
leader_addr = 10.130.42.1:8300
server = true
raft:
applied_index = 7994308
commit_index = 7994308
fsm_pending = 0
last_contact = 15.405233ms
last_log_index = 7994308
last_log_term = 104
last_snapshot_index = 7993052
last_snapshot_term = 98
latest_configuration = [{Suffrage:Voter ID:b7d504c0-c8bd-6f6f-3879-a64584424560 Address:10.130.42.1:8300} {Suffrage:Voter ID:23f079be-a5c2-778c-1297-c6ef9632ba1f Address:10.130.42.2:8300} {Suffrage:Voter ID:8bff380e-8abc-4624-1d98-f35ca8e2a5ef Address:10.130.42.3:8300} {Suffrage:Voter ID:7c91a590-1986-cb35-f8ce-dbab3abe496e Address:10.130.42.4:8300} {Suffrage:Voter ID:7ff9aadc-2ec7-983f-76be-cc2f0e99df1c Address:10.130.42.0:8300}]
latest_configuration_index = 7993763
num_peers = 4
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Follower
term = 104
runtime:
arch = amd64
cpu_count = 4
goroutines = 1230
max_procs = 4
os = linux
version = go1.11.1
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 44
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 3537
members = 595
query_queue = 0
query_time = 67
serf_wan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 240
members = 5
query_queue = 0
query_time = 1
I'm not familiar enough with Prometheus to say confidently whether this is possible for us to change without affecting other metrics providers, but I will tag @pierresouchay to get his opinion on this.
@pearkes @kustodian unfortunately, with the current abstraction used for metrics, it sounds difficult to:
In our setup, we use a very large retention time and it is good enough for us (you might use, for instance, 1 month).
I don't have any easier solution than this (but that's why I let you configure the retention time).
I disagree on both these points.
Most metrics are things like HTTP endpoints (known+documented), SERF and RAFT metrics (known, documented). Only a very small number of metrics have ad-hoc labels applied to them (notably metrics dealing with cross-DC requests).
The metrics should be non-ephemeral. There are no monitoring systems out there that get into trouble for a metric constantly emitting zeroes. But many (RRD, Prometheus, Wavefront, Circonus, for example) do not play nice with metric values that are forgotten.
The proper way to handle metrics in prometheus is to declare them as stateful objects. They are then written to appropriately.
I understand you use go-metrics as an abstraction layer; you'll have to figure out how to operate it appropriately.
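For contrast, this is roughly the pattern the go-metrics abstraction encourages today (an editor-supplied sketch of how the armon/go-metrics Prometheus sink is typically wired up, not Consul's actual code, so treat the exact wiring as illustrative): nothing is declared up front, a metric only exists after it has been written at least once, and the sink forgets it again once the configured retention elapses.

```go
package main

import (
	"net/http"

	metrics "github.com/armon/go-metrics"
	metricsprom "github.com/armon/go-metrics/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The Prometheus sink keeps a metric only for a limited time after it was
	// last written; this is the behaviour that prometheus_retention_time
	// configures in Consul.
	sink, err := metricsprom.NewPrometheusSink()
	if err != nil {
		panic(err)
	}
	metrics.NewGlobal(metrics.DefaultConfig("consul"), sink)

	// Nothing is pre-declared: consul_raft_state_leader appears in the scrape
	// output only after this call, i.e. only on the node that actually became
	// leader, and it vanishes again once the retention window passes without
	// a new write.
	metrics.SetGauge([]string{"raft", "state", "leader"}, 1)

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9102", nil)
}
```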
I completely agree with @kustodian that the current way of exposing prometheus metrics is incorrect/improper for prometheus, and that it creates a lot of headaches with respect to monitoring and operating consul. :/
A comment that I do not understand is "save metrics at shutdown or something similar".
Modern monitoring systems are explicitly designed to keep as little state as possible in the monitored binary (consul in this instance) and instead solve all of this in the monitoring system. There is no saving required (and in fact, most systems may behave in unexpected ways when metrics are saved && resumed).
@nahratzah you hit the nail on the head with the go-metrics thing. Pierre was not, as I understand it, defending the current situation as ideal; he was sharing his current workaround, which is sufficient for him.
We all agree that declaring them up-front is the "correct" thing to do for prometheus and would be ideal, but changing go-metrics's abstraction sufficiently to allow for it is a lot of work in an upstream lib, followed by a lot of refactoring in Consul to use the new abstraction. I hope it will be done eventually, but it's hard to know how to prioritize it!
Contributions or thoughts on how to allow that in go-metrics are very welcome though!
Does anyone know whether using the Consul Exporter (https://github.com/prometheus/consul_exporter) would overcome this issue? That is, does it access some other API or endpoint such that it can at least offer all the metrics it advertises? Just curious, because I think defining alerts within Prometheus and developing/testing Grafana dashboards will be difficult if we can't really see some of the metrics we're interested in. Thanks!
@ntgdi: The way I see consul_exporter, I would say it monitors (consul's idea of) the services that are registered.
So it would track the load-balancer service, web-server service, database service, memcache service, etc., for things like how many instances are available and their health check state.
It does expose some consul information (notably Raft), but I would not say that's sufficient to monitor a consul cluster itself.
@nahratzah Appreciate the response. Yeah, as to whether it will meet our monitoring requirements for Consul, I'll leave that to the PM. ;) However, from a functional perspective, I did notice that the consul_exporter hits different API endpoints (than the Telemetry endpoint), so I still have hope that all the metrics the consul_exporter advertises will be available on every scrape.
Yes, prometheus_exporter does the right thing. <3
@nahratzah Does prometheus_exporter cover all the metrics you need?
I think the telemetry endpoint (metrics) provided by the consul system is not very comprehensive.
No, we never said prometheus_exporter gives us all metrics we require. In fact, it can't do that, as that would require reaching deep into the internals of consul. :)
But what it does regarding metrics, it does right. It does not lose track of metrics or reset them.
Many of the metrics we require, we require so we can set up SLAs and set up expectations (that we can be beholden to and fulfill) for clients. Consul's metric system is inadequate for this use case. :(
@nahratzah What are the metrics that you absolutely need to be initialized?
On our side, as I explained, we are using a very large retention time and it is not a very big deal (you can use 365 days if you want to). Since our clusters are usually quite loaded with recurrent patterns (I mean, the calls are most of the time the same), the data comes back very quickly.
Some metrics are quite ephemeral by nature (e.g. consul_health_service_query{service="my_database"}); for these we probably cannot do anything easily, but if you have specific ones you really need, I might try to find a solution to initialize those at startup (or when data is cleared after the retention time has elapsed).
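For what it's worth, a rough editor's sketch of what "initialize those at startup" could look like in go-metrics terms. The helper and key list are hypothetical, and, as noted above, this only helps for metrics whose names (and labels) are known up front:

```go
package main

import (
	metrics "github.com/armon/go-metrics"
)

// initGauges writes an explicit zero to each gauge once, so the metric shows
// up in the scrape output immediately instead of only after the first real
// event. It would be called right after the telemetry sink is configured, and
// would need to run again whenever the retention window clears the data.
// The key list below is illustrative, not Consul's actual metric set, and
// labelled metrics such as consul_health_service_query{service=...} cannot
// be enumerated this way.
func initGauges(keys [][]string) {
	for _, k := range keys {
		metrics.SetGauge(k, 0)
	}
}

func main() {
	initGauges([][]string{
		{"raft", "state", "leader"},
		{"raft", "state", "candidate"},
	})
}
```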
That is not a small list...
So, this is from the epic we use:
And this is from what I suspect we'll need:
And we need these metrics to all be stable.
Like, if we use a prometheus query avg(some_metric), it should correctly compute the average, not "the average except for the metrics that haven't changed for so long that we forgot them". :)
Our current dashboard is full of charts where we write something like "empty is good" at the top. (Example: our election chart.)
The thing is, I cannot distinguish between an "empty, everything is good" metric and an "empty because the binary is hanging" metric. (And yeah, we've had complete consul outages due to deadlock, where metrics were completely useless.)
But fixing all this is not a single PR kind of task. I've planned multiple quarters for this in just my team.
And this is important stuff for using consul in production and being able to rely on it. :)
Which is why a response that it's not on the roadmap is exceptionally disappointing:
I hope it will be done eventually, but hard to know how to prioritize it!
This tripped me up today: a leadership change happened and I couldn't find the metric in prometheus; it only appeared as a data point once I zoomed in on the particular time it happened.
Since Hashicorp has joined the Cloud Native Computing Foundation, it would be great to treat Prometheus as a first-class citizen. Rewriting these docs (https://www.consul.io/docs/agent/telemetry.html) with the names of the prometheus metrics would also be awesome, or at least adding a section for them.
Thanks