Note: Though this is observed by an App on top of Tendermint, the behavior is due to the underlying Tendermint code
I'm running Terra validator and sentry nodes with instrumentation.prometheus set to true.
Once the nodes are up and running and have synced a few blocks, I hit the metrics endpoint, but it doesn't show any of the consensus_ metrics described here: https://docs.tendermint.com/master/tendermint-core/metrics.html
The mempool metrics are not showing up either even though the nodes are successfully committing blocks:
I[2020-09-21|17:41:47.741] Indexed block module=txindex height=1933683
I[2020-09-21|17:41:47.821] Executed block module=state height=1933684 validTxs=1 invalidTxs=0
I[2020-09-21|17:41:47.863] Committed state module=state height=1933684 txs=1 appHash=686936B6120542009A5B184437BF089D8A38A86F1EDD62440194F82C691D66A3
I looked at the Tendermint code. When a new node is initialized, createConsensusReactor creates a new State struct via NewState. NewState initializes the struct with defaults, including NopMetrics, and then performs some operations, including a call to updateToState: https://github.com/tendermint/tendermint/blob/master/consensus/state.go#L189
The updateToState call writes metrics in addition to performing the necessary state change logic. However, because this call runs before the options are applied, the metrics it updates go to NopMetrics and are discarded: https://github.com/tendermint/tendermint/blob/master/consensus/state.go#L195
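To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It is not the Tendermint code: the types and names (heightRecorder, memRecorder, newStateOptionsLast, newStateOptionsFirst, withMetrics) are hypothetical stand-ins, and only the height update is modeled.

```go
package main

import "fmt"

// heightRecorder is a hypothetical stand-in for the consensus metrics sink.
type heightRecorder interface {
	SetHeight(h int64)
}

// nopRecorder discards every update, like a no-op metrics default.
type nopRecorder struct{}

func (nopRecorder) SetHeight(int64) {}

// memRecorder remembers the last height it was given.
type memRecorder struct{ height int64 }

func (m *memRecorder) SetHeight(h int64) { m.height = h }

// state is a toy consensus state holding only what the example needs.
type state struct {
	height  int64
	metrics heightRecorder
}

// option is a functional option applied to a state.
type option func(*state)

// withMetrics replaces the default no-op sink with m.
func withMetrics(m heightRecorder) option {
	return func(s *state) { s.metrics = m }
}

// updateToState stores the height and reports it to whichever sink is set.
func (s *state) updateToState(h int64) {
	s.height = h
	s.metrics.SetHeight(h)
}

// newStateOptionsLast applies options after the initial update,
// which is the ordering described in this issue.
func newStateOptionsLast(h int64, opts ...option) *state {
	s := &state{metrics: nopRecorder{}}
	s.updateToState(h) // reported to nopRecorder, so the value is lost
	for _, o := range opts {
		o(s)
	}
	return s
}

// newStateOptionsFirst applies options before the initial update.
func newStateOptionsFirst(h int64, opts ...option) *state {
	s := &state{metrics: nopRecorder{}}
	for _, o := range opts {
		o(s)
	}
	s.updateToState(h) // reported to the configured sink
	return s
}

func main() {
	late, early := &memRecorder{}, &memRecorder{}
	newStateOptionsLast(1933684, withMetrics(late))
	newStateOptionsFirst(1933684, withMetrics(early))
	fmt.Println("options applied last: ", late.height)
	fmt.Println("options applied first:", early.height)
}
```

In this toy version, the recorder passed to newStateOptionsLast never sees the initial height, while the one passed to newStateOptionsFirst does. Whether reordering is safe in the real NewState depends on what the other options and initialization steps assume.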
I had a few questions:
Would it be problematic to apply options before calling updateToState? Are there any issues or risks associated with applying options before performing the necessary logic in NewState?
updateToState updates the Height metric, but the bulk of the other consensus-related metrics are updated in recordMetrics. However, it looks like recordMetrics is only called in the consensus state machine when a node is finalizing a commit, and any App using the ABCI would not end up recording any consensus info.
Would it be possible to expose the consensus metrics when replaying blocks? At the very least, exposing the block height would be useful for instrumentation: comparing the app block height with the latest height shows how far behind a node is.
Metrics results from hitting endpoint: https://pastebin.com/R1dRm3dS
Tendermint version (use tendermint version or git rev-parse --verify HEAD if installed from source):
v0.33.7, but I've also observed this behavior in v0.34.0
Environment:
What happened:
App built on top of tendermint started up and exposed metrics but was missing consensus and mempool related metrics
What you expected to happen:
Expected consensus and mempool metrics to be shown when hitting the prometheus endpoint with instrumentation.prometheus set to true
Ideally, would like to see consensus metrics even when node is still replaying blocks. Alternatively, would like to see consensus info once Tendermint and App are synced.
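One way to check the expectation above is to scan the text returned by the Prometheus endpoint for metric names that start with the configured namespace and subsystem. The sketch below assumes the usual `<namespace>_<subsystem>_<name>` naming (for example tendermint_consensus_height). The helper presentSubsystems is hypothetical, and the sample body in main is made up for illustration, not copied from the pastebin.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// presentSubsystems reports, for each subsystem, whether the scraped text
// contains at least one sample line starting with "<namespace>_<subsystem>_".
func presentSubsystems(body, namespace string, subsystems []string) map[string]bool {
	found := make(map[string]bool, len(subsystems))
	for _, s := range subsystems {
		found[s] = false
	}
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and "# HELP" / "# TYPE" comment lines.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		for _, s := range subsystems {
			if strings.HasPrefix(line, namespace+"_"+s+"_") {
				found[s] = true
			}
		}
	}
	return found
}

func main() {
	// Made-up sample resembling a scrape that lacks consensus and mempool metrics.
	sample := "# TYPE go_goroutines gauge\n" +
		"go_goroutines 42\n" +
		"tendermint_p2p_peers 3\n"

	got := presentSubsystems(sample, "tendermint", []string{"consensus", "mempool", "p2p"})
	for _, s := range []string{"consensus", "mempool", "p2p"} {
		fmt.Printf("%s present: %v\n", s, got[s])
	}
}
```

To use it against a live node, the body would come from an HTTP GET on the node's Prometheus listen address, and the namespace would match the [instrumentation] namespace setting.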
Have you tried the latest version: yes
How to reproduce it (as minimally and precisely as possible):
Can reproduce by running local terra testnet: https://github.com/terra-project/core#running-a-local-testnet
Or join a terra testnet found here: https://github.com/terra-project/testnet
Logs (paste a small part showing an error (< 10 lines) or link a pastebin, gist, etc. containing more of the log file):
https://pastebin.com/R1dRm3dS
Config (you can paste only the changes you've made):
[instrumentation]
prometheus = true
max_open_connections = 3
namespace = "tendermint"
node command runtime flags:
Can use any from testnet or: https://github.com/terra-project/testnet/tree/master/soju-0014
Please provide the output from the http://<ip>:<port>/dump_consensus_state RPC endpoint for consensus bugs
I don't believe this is necessary since consensus is working fine but please let me know if this is still required
Anything else we need to know:
This is a bit of a mix of both bug and feature request. I also have a lot of experience with distributed systems and high performance Golang, and would be happy to submit PRs for any necessary changes. Feedback would be much appreciated.
Would it be problematic to apply options before calling updateToState ?
not at all 😄
Would it be possible to expose the consensus metrics when replaying blocks? At the very least, exposing the block height can be useful for instrumentation to view app block height and latest height to see how far behind a node is in its progression.
We can expose the height, for sure. Other metrics should probably be exposed as well (except block_parts and block_interval_seconds).
As for the mempool (and other reactors), we replay blocks before starting those. Exposing inactive / zero metrics before a component (mempool, evidence, ...) is started makes little sense to me.
I also have a lot of experience with distributed systems and high performance Golang, and would be happy to submit PRs for any necessary changes
PRs are much welcomed 💯
Duplicate of https://github.com/tendermint/tendermint/issues/3507