Tendermint: Consensus-related metrics not showing up in metrics endpoint for Prometheus

Created on 21 Sep 2020 · 3 Comments · Source: tendermint/tendermint

Note: Though this is observed by an App on top of Tendermint, the behavior is due to the underlying Tendermint code

I'm running Terra validator and sentry nodes with instrumentation.prometheus set to true.
Once the nodes are up and running and have synced a few blocks, I hit the metrics endpoint but it doesn't show any consensus_ related metrics described here: https://docs.tendermint.com/master/tendermint-core/metrics.html

The mempool metrics are not showing up either even though the nodes are successfully committing blocks:

I[2020-09-21|17:41:47.741] Indexed block                                module=txindex height=1933683
I[2020-09-21|17:41:47.821] Executed block                               module=state height=1933684 validTxs=1 invalidTxs=0
I[2020-09-21|17:41:47.863] Committed state                              module=state height=1933684 txs=1 appHash=686936B6120542009A5B184437BF089D8A38A86F1EDD62440194F82C691D66A3

I looked at the Tendermint code: when a new Node is initialized, createConsensusReactor creates a new State struct. NewState initializes that struct with defaults, including NopMetrics, and then performs some operations, including an updateToState call: https://github.com/tendermint/tendermint/blob/master/consensus/state.go#L189

The updateToState call writes metrics in addition to performing the necessary state change logic. However, because it runs before the options are applied, the metrics it updates go to NopMetrics and are discarded: https://github.com/tendermint/tendermint/blob/master/consensus/state.go#L195
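To make the ordering problem concrete, here is a minimal, self-contained sketch. It is not the Tendermint code: the types `state`, `gaugeMetrics`, `nopMetrics` and the constructors `newStateOptionsLast` / `newStateOptionsFirst` are hypothetical stand-ins that only mimic the pattern described above (defaults, then updateToState, then options).

```go
package main

import "fmt"

// heightRecorder is a hypothetical stand-in for the consensus metrics sink.
type heightRecorder interface {
	SetHeight(h int64)
}

// nopMetrics discards every update, like a no-op default.
type nopMetrics struct{}

func (nopMetrics) SetHeight(int64) {}

// gaugeMetrics remembers the last height it was given.
type gaugeMetrics struct{ height int64 }

func (g *gaugeMetrics) SetHeight(h int64) { g.height = h }

// state is a toy version of a consensus state holding a metrics sink.
type state struct {
	height  int64
	metrics heightRecorder
}

type stateOption func(*state)

// withMetrics swaps the default no-op sink for a real one.
func withMetrics(m heightRecorder) stateOption {
	return func(s *state) { s.metrics = m }
}

// updateToState changes state and records the height as a side effect.
func (s *state) updateToState(h int64) {
	s.height = h
	s.metrics.SetHeight(h)
}

// newStateOptionsLast mirrors the order described in the issue:
// the metric write happens while the no-op sink is still installed.
func newStateOptionsLast(h int64, opts ...stateOption) *state {
	s := &state{metrics: nopMetrics{}}
	s.updateToState(h)
	for _, opt := range opts {
		opt(s)
	}
	return s
}

// newStateOptionsFirst applies options before the first metric write.
func newStateOptionsFirst(h int64, opts ...stateOption) *state {
	s := &state{metrics: nopMetrics{}}
	for _, opt := range opts {
		opt(s)
	}
	s.updateToState(h)
	return s
}

func main() {
	late := &gaugeMetrics{}
	newStateOptionsLast(1933684, withMetrics(late))
	fmt.Println("options applied last: ", late.height)

	early := &gaugeMetrics{}
	newStateOptionsFirst(1933684, withMetrics(early))
	fmt.Println("options applied first:", early.height)
}
```

In this toy version the sink passed to `newStateOptionsLast` never sees the initial height, while the one passed to `newStateOptionsFirst` does. Whether reordering is safe in the real NewState depends on what the other options touch, which is the question asked below.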

I had a few questions:

  • Would it be problematic to apply options before calling updateToState? Are there any issues or risks associated with applying options before performing the necessary logic in NewState?
  • updateToState updates the Height metric, but the bulk of the other consensus-related metrics are updated in recordMetrics. However, it looks like recordMetrics is only called in the consensus state machine when a node is finalizing a commit, so a node that is still replaying blocks through the ABCI would not end up recording any consensus info. Would it be possible to expose the consensus metrics when replaying blocks? At the very least, exposing the block height can be useful for instrumentation to view app block height and latest height to see how far behind a node is in its progression.

Metrics results from hitting endpoint: https://pastebin.com/R1dRm3dS
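For anyone who wants to check a scrape for the missing families, the following is a rough sketch that lists the distinct metric names with a given prefix in Prometheus text-format output. The helper `metricNamesWithPrefix` is hypothetical, the sample input is made up for illustration (it is not the pastebin output), and the prefixes assume the `tendermint` namespace from the config in this report.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// metricNamesWithPrefix returns the distinct metric names in Prometheus
// text exposition output that start with prefix, in order of appearance.
func metricNamesWithPrefix(exposition, prefix string) []string {
	seen := make(map[string]bool)
	var names []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and "# HELP" / "# TYPE" comments.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The metric name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 {
			end = len(line)
		}
		name := line[:end]
		if strings.HasPrefix(name, prefix) && !seen[name] {
			seen[name] = true
			names = append(names, name)
		}
	}
	return names
}

func main() {
	// Illustrative sample only; paste real endpoint output here instead.
	sample := `# TYPE go_goroutines gauge
go_goroutines 42
# TYPE tendermint_p2p_peers gauge
tendermint_p2p_peers 3
`
	for _, subsystem := range []string{"consensus", "mempool", "p2p"} {
		prefix := "tendermint_" + subsystem + "_"
		names := metricNamesWithPrefix(sample, prefix)
		fmt.Printf("%s: %d metric(s) %v\n", prefix, len(names), names)
	}
}
```

A count of zero for the `tendermint_consensus_` and `tendermint_mempool_` prefixes would match the behavior reported here.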

Tendermint version (use tendermint version or git rev-parse --verify HEAD if installed from source):
v0.33.7, but I've also observed this behavior in v0.34.0

Environment:

What happened:
An app built on top of Tendermint started up and exposed metrics, but the consensus and mempool metrics were missing.

What you expected to happen:
Expected consensus and mempool metrics to be shown when hitting the prometheus endpoint with instrumentation.prometheus set to true

Ideally, would like to see consensus metrics even when node is still replaying blocks. Alternatively, would like to see consensus info once Tendermint and App are synced.

Have you tried the latest version: yes

How to reproduce it (as minimally and precisely as possible):
Can reproduce by running local terra testnet: https://github.com/terra-project/core#running-a-local-testnet
Or join a terra testnet found here: https://github.com/terra-project/testnet

Logs (paste a small part showing an error (< 10 lines) or link a pastebin, gist, etc. containing more of the log file):
https://pastebin.com/R1dRm3dS

Config (you can paste only the changes you've made):

[instrumentation]
    prometheus = true
    max_open_connections = 3
    namespace = "tendermint"

node command runtime flags:
Can use any from testnet or: https://github.com/terra-project/testnet/tree/master/soju-0014

Please provide the output from the http://<ip>:<port>/dump_consensus_state RPC endpoint for consensus bugs
I don't believe this is necessary since consensus is working fine but please let me know if this is still required

Anything else we need to know:
This is a bit of a mix of both bug and feature request. I also have a lot of experience with distributed systems and high performance Golang, and would be happy to submit PRs for any necessary changes. Feedback would be much appreciated.

bug observability

All 3 comments

Would it be problematic to apply options before calling updateToState?

Not at all 😄

Would it be possible to expose the consensus metrics when replaying blocks? At the very least, exposing the block height can be useful for instrumentation to view app block height and latest height to see how far behind a node is in its progression.

We can expose the height, for sure. Other metrics should probably be exposed as well (except block_parts and block_interval_seconds).

As for the mempool (and other reactors), we replay blocks before starting those. Exposing inactive or zero-valued metrics before the component (mempool, evidence, ...) is started makes little sense to me.

I also have a lot of experience with distributed systems and high performance Golang, and would be happy to submit PRs for any necessary changes

PRs are very welcome 💯
