Beats: Metricbeat elasticsearch module when output is Kafka

Created on 28 Mar 2019 · 20 Comments · Source: elastic/beats

I am linking this to one of the items on the main issue on monitoring ES via metricbeat (https://github.com/elastic/beats/issues/7035).

One aspect we haven't talked about much (or documented) is what happens when the user's metricbeat is configured to route all events through Kafka (using output.kafka).

Per our guidelines today (https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html), the configuration of the Elasticsearch module requires the output to be output.elasticsearch.
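For context, the documented setup looks roughly like the following. This is a minimal sketch based on that guide; the monitoring-cluster host name is illustrative.

# metricbeat.yml (documented baseline: collect from the local ES node,
# ship directly to the monitoring cluster via output.elasticsearch)
metricbeat.modules:
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:9200"]

output.elasticsearch:
  hosts: ["http://monitoring-cluster:9200"]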

What is the recommended set up here for output.kafka users?

  1. Will they send everything through output.kafka, and have separate Logstash ES outputs downstream: one for regular events to the production cluster, and one for routing metricbeat ES stack module events to .monitoring-es* indices on the remote monitoring cluster? (The Kafka leg common to options 1 and 2 is sketched after this list.)

  2. Or is there a way to reuse Logstash's xpack.monitoring.elasticsearch.hosts for the connection to route the metricbeat ES stack module events to the remote monitoring cluster?

  3. Will they have to set up a 2nd metricbeat (with output.elasticsearch just for the ES stack modules) to route events directly to the remote monitoring cluster, while the original metricbeat instance continues to send other events through Kafka?
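For reference, the Kafka leg shared by options 1 and 2 would look roughly like the sketch below. The host and topic names (kafka1:9092, a beats topic) are illustrative, not taken from any official guide.

# metricbeat.yml: ship all events to Kafka
output.kafka:
  hosts: ["kafka1:9092"]
  topic: "beats"

# Logstash: consume the same topic; Beats events are JSON-encoded
input {
  kafka {
    bootstrap_servers => "kafka1:9092"
    topics => ["beats"]
    codec => "json"
  }
}

The open question is then what the Logstash output section should do with these events.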

Until we figure out our story on this, it will be helpful to update https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html with some information on our current guidelines for when metricbeat does not have an output.elasticsearch (or maybe it's simply a not-currently-supported statement, etc.). Thx!

Labels: Metricbeat, Stack monitoring, Stalled, enhancement, needs_team

All 20 comments

Pinging @elastic/stack-monitoring

There has been some discussion around this question and from what I have seen thus far, we lean toward option three in your list. @ycombinator, do you concur?

Option 3 has been tested and is known to work, so I'd start out by documenting that right now.

However, in theory, option 1 could also work, so I think it's worth testing it out and coming up with docs around that too.

Tested option 1 briefly; metricbeat (ES module) -> Kafka -> LS -> ES seems to work as well :)

For example, a conditional can be added to the output section of the LS config to route metricbeat ES module metrics to a separate monitoring cluster (for 6.5+ to 6.latest):

# route ES monitoring metrics collected by metricbeat elasticsearch module
# to ES monitoring cluster
# https example
if [metricset][module] == "elasticsearch" {
  elasticsearch {
    index => ".monitoring-es-6-mb-%{+YYYY.MM.dd}"
    hosts => ["https://node1:9200"]
    cacert => "/path_to/ca.crt"
    user => "elastic"
    password => "password"
  }
} else {
  # ... <where your non-monitoring events will go>
}

Instead of having a conditional statement with two ES outputs, the alternative would be to build out hosts, index, etc. as variables upstream in the pipeline and substitute them into a single elasticsearch output.
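One hypothetical shape for that alternative, reusing the illustrative connection settings from the fragment above and a made-up @metadata field name (target_index):

filter {
  if [metricset][module] == "elasticsearch" {
    # stash the index base name in @metadata so it is never indexed itself
    mutate { add_field => { "[@metadata][target_index]" => ".monitoring-es-6-mb" } }
  } else {
    mutate { add_field => { "[@metadata][target_index]" => "metricbeat" } }
  }
}

output {
  elasticsearch {
    index => "%{[@metadata][target_index]}-%{+YYYY.MM.dd}"
    hosts => ["https://node1:9200"]
    cacert => "/path_to/ca.crt"
    user => "elastic"
    password => "password"
  }
}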

Good stuff, @ppf2, thanks so much for testing this out!

I wonder if it's safe for the conditional to just test for [metricset][module] == "elasticsearch". After all, this could be true even if xpack.enabled is false in the corresponding metricbeat module config.

Imagine a case where the user has configured the elasticsearch module in the same Metricbeat instance twice for some reason, once with xpack.enabled: true and once without. Or that there are two Metricbeat instances feeding data to the same LS instance, one configured with xpack.enabled: true in the elasticsearch module and one without.
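To make that concrete, a hypothetical metricbeat.yml along these lines would emit both kinds of events:

metricbeat.modules:
# stack monitoring collection, meant for .monitoring-es-*
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:9200"]

# regular metrics collection from the same node, meant for metricbeat-*
- module: elasticsearch
  metricsets: ["node", "node_stats"]
  period: 10s
  hosts: ["http://localhost:9200"]

Both sets of events match [metricset][module] == "elasticsearch", so the conditional above would misroute the second set into the monitoring indices.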

I wonder if there's another piece of data/metadata in the event that Logstash receives from Metricbeat that we could use to make this check more robust. If there isn't, I wonder if we should inject something to this effect from the Elasticsearch Metricbeat module when xpack.enabled is set to true.

How about this if clause instead?

# route ES monitoring metrics collected by metricbeat elasticsearch module
# to ES monitoring cluster
# https example
if [@metadata][index] =~ /^\.monitoring-es/ {
  elasticsearch {
    index => ".monitoring-es-6-mb-%{+YYYY.MM.dd}"
    hosts => ["https://node1:9200"]
    cacert => "/path_to/ca.crt"
    user => "elastic"
    password => "password"
  }
} else {
  # ... <where your non-monitoring events will go>
}

Perhaps we could generalize this a bit to work not just for ES stack monitoring data collected by Metricbeat but also other stack products' monitoring data? So something like:

# route monitoring metrics collected by metricbeat Elastic stack product module
# to ES monitoring cluster
# https example
if [@metadata][index] =~ /^\.monitoring-/ {
  if [@metadata][id] {
    # some monitoring documents carry a deterministic ID from Metricbeat;
    # preserving it via document_id avoids duplicate documents
    elasticsearch {
      index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
      document_id => "%{[@metadata][id]}"
      hosts => ["https://node1:9200"]
      cacert => "/path_to/ca.crt"
      user => "elastic"
      password => "password"
    }
  } else {
    elasticsearch {
      index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
      hosts => ["https://node1:9200"]
      cacert => "/path_to/ca.crt"
      user => "elastic"
      password => "password"
    }
  }
} else {
  # ... <where your non-monitoring events will go>
}

If we indeed have a solution that works and that we agree on, then to satisfy the original request we need to document the recommended setup.

@lcawl Any suggestions on a home for this sort of information in the docs? I'm happy to discuss over slack/zoom as well to give more context.

Polite bump, @lcawl. Thanks!

I implemented option 1 in a 3-node test cluster today, setting Metricbeat to output to Logstash, which then outputs to Elasticsearch. At first glance it seemed to work fine, but then I noticed that the shard count on the nodes page is way off: it keeps incrementing from the correct number to several thousands over time. So I may start with 50, then every 10 secs it gets to 100, 150, 200 and so on. This was tested using 7.5.1 on RHEL.

My motivation for this is trying to come up with a fix so I don't have to disable the system module in Metricbeat, as it provides very valuable insights into other performance characteristics of a given node.

@lcawl and @ycombinator I've removed myself as owner of this issue after switching teams. Would one of you like to pick it up?

@cachedout Sure.

@lcawl Can you take up the bit about documenting Option 1? I can answer @kkh-security-distractions's question.

@kkh-security-distractions I assume you were using the Logstash fragment I had posted in my comment above:

# route monitoring metrics collected by metricbeat Elastic stack product module
# to ES monitoring cluster
# https example
if [@metadata][index] =~ /^\.monitoring-/ {
  elasticsearch {
    index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
    hosts => ["https://node1:9200"]
    cacert => "/path_to/ca.crt"
    user => "elastic"
    password => "password"
  }
} else {
  # ... <where your non-monitoring events will go>
}

Unfortunately, this fragment is not quite right. It works for _most_ stack monitoring data except data about shards, as you obviously found out the hard way 😞. Shard documents rely on the Beat-assigned document ID; without document_id, each collection cycle indexes brand-new documents, so the shard count keeps growing. Sorry about that.

I've now updated the comment with a better fragment; please try that out. Note that you will need to clear out your existing monitoring data (DELETE .monitoring-es-*-mb*) first.
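For example, with the same illustrative connection settings as the fragments above:

# delete previously indexed (duplicated) metricbeat monitoring data
curl -u elastic:password --cacert /path_to/ca.crt -X DELETE "https://node1:9200/.monitoring-es-*-mb*"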

@ycombinator, yesterday when I was comparing a working versus a non-working setup, I did notice the rather odd "id" of the working setup, but I was not able to determine whether it was important or not ;)

I changed my pipeline according to your suggestions and it seems to have fixed the problem. Shard count seems steady now, as it should be. I also added some stuff in the filter section to get rid of the ECS fields from Metricbeat, as they are not needed as I see it (see the sketch below).
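For illustration only (which fields to remove is an assumption here, not necessarily what was actually dropped):

filter {
  mutate {
    # hypothetical list of unwanted ECS fields added by Metricbeat
    remove_field => ["ecs", "agent"]
  }
}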

I will leave it running and see tomorrow how it ends up. I suggest that someone improve the documentation on the main metricbeat page to incorporate this setup. I think many people with ingest going through e.g. Kafka will appreciate this setup while still being able to keep the system module, which was my goal :)

Thanks for the input.

@lcawl Can you take up the bit about documenting Option 1?

Sorry, I somehow missed the earlier notifications on this one.

From what I understand in this issue, this is a less common configuration option. Since the basic setup steps (https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html) are already quite complex, I don't think it would be ideal to try to squeeze it in there. Instead, I think this would be appropriate for a separate piece of content describing a more advanced configuration scenario.

I had a chat with @dedemorton and I think our suggestion would be to put this in a blog, at least initially. If it becomes a common enough use case that we want to actively maintain it in the docs, we can revisit incorporating it.

Sounds good and makes sense, @lcawl and @dedemorton. I'll start working on a blog post soon.

Blog post sounds good; let's cross-link from the https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html page to the public blog post location (once it is published). thx!

@ppf2 Blog post is live: https://www.elastic.co/blog/elastic-stack-monitoring-with-metricbeat-via-logstash-or-kafka.

@lcawl WDYT about the linking idea that @ppf2 mentioned in the previous comment?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue doesn't have a Team:<team> label.
