The elasticsearch/index metricset (when xpack.enabled: true is set) consumes memory proportional to the number of indices in the ES cluster and the size of the cluster state, specifically the sizes of the GET _stats and GET _cluster/state responses.
This is somewhat expected, as the metricset needs data from those two API calls to create type: index_stats documents in .monitoring-es-* indices.
However, it may be possible to reduce the memory consumed by this metricset's code. Concretely, it would be worth looking into exactly which fields from the API responses are actually used (and whether the rest could be excluded), and also whether switching to a streaming JSON parser might help.
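To make the second idea concrete, here is a minimal Go sketch (not the metricset's actual code) of what a streaming approach could look like: it uses encoding/json's Decoder to walk a GET _stats-shaped body, decodes one index entry at a time into a small struct, and skips everything else, so peak memory scales with a single entry rather than the full response. The indexStats struct, the streamIndexStats/skipValue helpers, and the choice of fields are all illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// indexStats keeps only the fields this sketch pretends the metricset needs.
type indexStats struct {
	Primaries struct {
		Docs  struct{ Count int64 `json:"count"` }               `json:"docs"`
		Store struct{ SizeInBytes int64 `json:"size_in_bytes"` } `json:"store"`
	} `json:"primaries"`
}

// skipValue consumes and discards the next JSON value in the stream.
func skipValue(dec *json.Decoder) error {
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); ok && (d == '{' || d == '[') {
		for depth := 1; depth > 0; {
			tok, err := dec.Token()
			if err != nil {
				return err
			}
			if d, ok := tok.(json.Delim); ok {
				switch d {
				case '{', '[':
					depth++
				case '}', ']':
					depth--
				}
			}
		}
	}
	return nil
}

// streamIndexStats walks a GET _stats-shaped body and calls fn once per index,
// so peak memory is roughly one decoded entry rather than the whole response.
func streamIndexStats(r io.Reader, fn func(name string, s indexStats)) error {
	dec := json.NewDecoder(r)
	if _, err := dec.Token(); err != nil { // consume the '{' of the root object
		return err
	}
	for dec.More() {
		tok, err := dec.Token()
		if err != nil {
			return err
		}
		key, _ := tok.(string)
		if key != "indices" { // skip everything except the top-level "indices" object
			if err := skipValue(dec); err != nil {
				return err
			}
			continue
		}
		if _, err := dec.Token(); err != nil { // consume the '{' that opens "indices"
			return err
		}
		for dec.More() {
			nameTok, err := dec.Token() // index name (object keys are always strings)
			if err != nil {
				return err
			}
			var s indexStats
			if err := dec.Decode(&s); err != nil {
				return err
			}
			fn(nameTok.(string), s)
		}
		if _, err := dec.Token(); err != nil { // consume the '}' that closes "indices"
			return err
		}
	}
	return nil
}

func main() {
	body := `{"_shards":{"total":2},"indices":{"logs-1":{"primaries":{"docs":{"count":42},"store":{"size_in_bytes":1024}}}}}`
	_ = streamIndexStats(strings.NewReader(body), func(name string, s indexStats) {
		fmt.Printf("%s: docs=%d, store=%d bytes\n", name, s.Primaries.Docs.Count, s.Primaries.Store.SizeInBytes)
	})
}
```

A similar token-level walk could be applied to the GET _cluster/state response, and it could be combined with server-side response filtering (e.g. Elasticsearch's filter_path query parameter) so the payload shrinks before it even reaches the Beat.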
Pinging @elastic/stack-monitoring (Stack monitoring)
@jsoriano kindly let me pick his brain on this over Zoom. Summarizing our conversation:
prometheus module. Thanks @jsoriano for chatting about this and validating some of my thinking! ❤️
@ycombinator We are still seeing Metricbeat get OOMKilled. After rolling out Metricbeat 7.6.2 memory consumption was lower and we could run Metricbeat with a 500Mi pod limit (before that we needed much more). Two weeks later we now have more indices and shards (between 300 and 400 shards), and Metricbeat is again being OOMKilled on the K8s node hosting the current master.
You mentioned 1.79 MB of memory usage in #16538, so still needing a 500Mi pod limit seems like a lot.
Perhaps you can give some insight into whether this number of shards is simply too high relative to your memory usage figures. As it stands, the Metricbeat DaemonSet reserves an unnecessarily large memory limit on each node.
Or, in the end, should I open a new support case?