Beats: Metricbeat Elasticsearch module stats

Created on 28 Dec 2017  ·  8 Comments  ·  Source: elastic/beats

I built an Elasticsearch cluster that is used to store, search, and analyze large amounts of log data.
To keep the cluster healthy, any issues need to be fixed as quickly as possible.
So I will build a monitoring and alerting system based on Elasticsearch performance metrics.

Based on my experience, the following metrics are most important to keep an eye on.

  • Indexing rate: The rate of writes into Elasticsearch. It can be derived from the rate at which index_total increases.

  • Indexing latency: The latency of writes into Elasticsearch. It can be derived by sampling index_total and index_time_in_millis at regular intervals and dividing the change in time by the change in count.

  • Search rate: The rate of reads from Elasticsearch. It can be derived from the rate at which query_total increases.

  • Search latency: The latency of reads from Elasticsearch. It can be derived by sampling query_total and query_time_in_millis at regular intervals and dividing the change in time by the change in count.

  • JVM heap: Elasticsearch runs in the JVM, so heap memory usage is an important area to monitor. If heap usage stays very high you are in trouble, and at worst you will hit out-of-memory (OOM) exceptions. Watch heap_used_in_bytes, heap_max_in_bytes and heap_used_percent.

  • JVM GC: JVM garbage collection can stop the world, so the duration and frequency of GC are another important area to monitor. Watch collection_count and collection_time_in_millis.

  • Bulk threadpool: If the queue of threadpool.index is always at its maximum, requests get rejected and you lose data. Adding nodes can fix it.

  • Search threadpool: Just as important as the bulk threadpool.

  • Transport bytes: You can look at the rate of bytes sent and received to see how much traffic your network is carrying.

  • File system: You must make sure the file system has enough disk space. You can learn about that by checking total_in_bytes and available_in_bytes.
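
To make the rate and latency calculations in the list above concrete, here is a minimal sketch in Python. It assumes you already have two samples of the cumulative counters taken some seconds apart. The `Sample` class and the function names are hypothetical helpers made up for illustration and are not part of Metricbeat or Elasticsearch; only the counter names (index_total, index_time_in_millis, heap_used_in_bytes, heap_max_in_bytes) come from the list. The same arithmetic would apply to query_total and query_time_in_millis.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One poll of cumulative counters (hypothetical container, not a Metricbeat type)."""
    timestamp_s: float          # when the sample was taken, in seconds
    index_total: int            # cumulative number of index operations
    index_time_in_millis: int   # cumulative time spent indexing, in milliseconds


def indexing_rate(prev: Sample, curr: Sample) -> float:
    """Index operations per second between two samples."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.index_total - prev.index_total) / elapsed


def indexing_latency_ms(prev: Sample, curr: Sample) -> float:
    """Average milliseconds per index operation between two samples."""
    ops = curr.index_total - prev.index_total
    if ops <= 0:
        # No operations in the interval (or the counter was reset).
        return 0.0
    return (curr.index_time_in_millis - prev.index_time_in_millis) / ops


def heap_used_percent(heap_used_in_bytes: int, heap_max_in_bytes: int) -> float:
    """Heap usage as a percentage of the maximum heap size."""
    return 100.0 * heap_used_in_bytes / heap_max_in_bytes


# Two made-up samples taken 10 seconds apart.
prev = Sample(timestamp_s=0.0, index_total=1000, index_time_in_millis=5000)
curr = Sample(timestamp_s=10.0, index_total=1500, index_time_in_millis=6000)

print(indexing_rate(prev, curr))        # 50.0 operations per second
print(indexing_latency_ms(prev, curr))  # 2.0 ms per operation
```

Because the counters are cumulative, a node restart resets them to zero; the sketch treats a non-positive change in count as "no data" for latency rather than reporting a negative value.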

But the elasticsearch module does not collect enough performance metrics yet, so I opened PRs 5279 and 5931.

Other, better suggestions are welcome.


All 8 comments

I've just installed the Elastic beat and wanted to switch over from our custom metric collection to the official beat. But some of the stats are missing, e.g. heap usage percent, which we use for alerting.

Is there any movement on this issue?

We had a lot of internal discussions on this and now plan to align the elasticsearch module with the internal Elasticsearch monitoring. PRs related to that should follow pretty soon.

Please add the values "heap_used_in_bytes" and "heap_used_percent" in the next version of Metricbeat.
Thanks a lot.

@wangdisdu The time has come to move forward here. I'm happy to review and accept PRs to extend our coverage of Elasticsearch metrics. Still interested in working on this?

Yes, sure. Please extend the Elasticsearch module metrics.

heap_used_percent data would be really useful in this module for debugging OOM errors.

Pinging @elastic/stack-monitoring

Echoing above. JVM heap_used_percent or at least heap_used_in_bytes would be very helpful.
