Beats: Beats central monitoring Phase 1

Created on 19 Jan 2017 · 13 Comments · Source: elastic/beats

When you deploy a large number of Beats, it becomes challenging to monitor the Beats themselves.
A solution would be for the Beats to report their health status to a collection point, such as Elasticsearch, where it can be visualized with Kibana.

The following health metrics should be sent to Elasticsearch:

[x] libbeat.beat.cpu_usage
[x] libbeat.beat.memory_usage
[x] libbeat.publisher.published_events
[x] libbeat.publisher.messages_in_worker_queues
[x] libbeat.outputs.acked_events #3423
[x] libbeat.outputs.messages_dropped
[x] libbeat.outputs.send_bytes #3423
[x] libbeat.outputs.failures #3423

Each Beat exports more metrics via expvar, but it should send only a subset of these metrics to Elasticsearch.

By default, the health metrics are sent directly to the Elasticsearch cluster configured in outputs.elasticsearch, but you can also configure a separate Elasticsearch cluster to receive the monitoring data.

TODO:

[x] Differentiate between the metrics exported via expvar to send only a subset
[x] Send the health metrics to Elasticsearch

Configuration:

monitoring:
    enabled: true
    period: 10s
    elasticsearch: ["localhost:9201"]
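The fallback described above (use the output cluster unless a separate monitoring cluster is configured) could be modeled roughly as follows. The `MonitoringConfig` type and the `targetHosts` helper are hypothetical names for this sketch, and `localhost:9200` is an assumed address for the regular output cluster.

```go
package main

import (
	"fmt"
	"time"
)

// MonitoringConfig mirrors the three keys of the proposed configuration.
// It is a hypothetical type, not the struct used by libbeat.
type MonitoringConfig struct {
	Enabled       bool
	Period        time.Duration
	Elasticsearch []string
}

// targetHosts returns the hosts that would receive health metrics:
// nothing when monitoring is disabled, the dedicated monitoring hosts when
// they are set, and otherwise the hosts of the regular Elasticsearch output.
func targetHosts(m MonitoringConfig, outputHosts []string) []string {
	if !m.Enabled {
		return nil
	}
	if len(m.Elasticsearch) > 0 {
		return m.Elasticsearch
	}
	return outputHosts
}

func main() {
	// "10s" is the period value from the configuration above.
	period, err := time.ParseDuration("10s")
	if err != nil {
		panic(err)
	}
	outputHosts := []string{"localhost:9200"} // assumed regular output cluster

	cfg := MonitoringConfig{
		Enabled:       true,
		Period:        period,
		Elasticsearch: []string{"localhost:9201"},
	}
	fmt.Println(targetHosts(cfg, outputHosts)) // dedicated monitoring cluster

	cfg.Elasticsearch = nil
	fmt.Println(targetHosts(cfg, outputHosts)) // falls back to the output cluster
}
```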

UPDATE: The CPU usage is exported under different fields. See https://github.com/elastic/beats/issues/3422

cc-ed @bohyun-e @brandonmensing

enhancement libbeat meta

monicasarbu

👍11 🎉3 ❤1

Most helpful comment

I'd love it if Beats followed the same model as Logstash and simply exposed a local metrics endpoint. We're struggling to write robust monitoring for Filebeat, as it's very much a "black box" when it comes to state. I'm not sure an Elasticsearch metrics integration would help all that much. Beats log files are not geared towards getting at the "current state" of the Beat (e.g. "is the Beat able to ship data to Logstash _right now_?"). Enabling a local metrics endpoint à la Logstash that exposes items such as "queued events" and "percentage of dead/alive shipper targets", so that we could pick up that info locally using a monitoring tool/agent such as Prometheus or Datadog, would be of much higher value to us than getting it in Elasticsearch.

trondhindenes on 25 Oct 2017

👍7

All 13 comments

As reference, here is the old issue where this all started: https://github.com/elastic/beats/issues/463

ruflin on 20 Jan 2017

enabled: true

Would the default value be true here? or false?

period: 10s

I'm not 100% sure what happens when you have the collection period that is different from the ES monitoring collection interval. But reading from the doc, my gut feeling is that whatever ES's collection interval is - should be applied in other products, such as Kibana Monitoring collection interval. I'm guessing it would be the same for Beats, but it'd be a good idea to confirm.

bohyun-e on 20 Jan 2017

I'm not 100% sure what happens when you have the collection period that is different from the ES monitoring collection interval. But reading from the doc, my gut feeling is that whatever ES's collection interval is - should be applied in other products, such as Kibana Monitoring collection interval. I'm guessing it would be the same for Beats, but it'd be a good idea to confirm.

I don't think we should have this restriction (that all collection intervals are equal). I also don't know how we can even enforce it.

Different systems may need different intervals, and the monitoring UI should deal with that.

uboness on 20 Jan 2017

👍3

It would be extremely nice to simply have one more commandline option to turn on only expvar variables, rather than the httpprof commandline option. This would allow other tools to scrape each beat type on their own interval.

lswith on 24 Jan 2017

👍4

cc: @pickypg @tsullivan @skearns64

bohyun-e on 25 Jan 2017

Hi,

Any progress on this? Is there a way to export this as JSON instead of sending everything to Elasticsearch?

Thank you!

servergeeks on 22 Aug 2017

👍1

+1 for a progress update.

It would be extremely nice to simply have one more commandline option to turn on only expvar variables, rather than the httpprof commandline option. This would allow other tools to scrape each beat type on their own interval.

It would also be very nice to support outputs other than Elasticsearch. Ideally any of the already supported outputs (e.g. Kafka, Redis, Logstash, etc.) would work:

monitoring:
    enabled: true
    period: 10s
    output.elasticsearch: 
        hosts: ["localhost:9201"]
        ...
    output.kafka:
        hosts: ["localhost:9092"]
        ...

Thanks!

jeremydonahue on 5 Sep 2017

Unfortunately, we haven't made much progress here. The plan is to store all the monitoring data in Elasticsearch only. In the first version, we are sending the monitoring data to Elasticsearch, but we are considering sending it to other supported outputs in the future.

monicasarbu on 5 Sep 2017

For Kafka, it would be nice to send to a different topic that can be defined under the monitoring struct. We're heavily scaling Filebeat in our infra (18k+) and none of them ships directly to Elasticsearch.