Logstash: [7.7] [Monitoring] Collection changes

Created on 9 Dec 2019 · 15 comments · Source: elastic/logstash

Background

Presently, there are two ways for Logstash to send monitoring data to Elasticsearch.

Internal collection

Using internal collection, Logstash collects and ships its own metrics data. It ships data by connecting to an Elasticsearch monitoring cluster at a special endpoint at _monitoring/bulk on the Elasticsearch cluster which is managed by a monitoring plugin that ships with Elasticsearch. This endpoint is responsible for collecting monitoring data but it also reshapes that data before writing it to Elasticsearch.
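For orientation, the request internal collection makes can be sketched as follows. This is an illustrative sketch, not Logstash's actual implementation: the `_monitoring/bulk` query parameters (`system_id`, `system_api_version`, `interval`) follow the Elasticsearch monitoring bulk API, and the payload shown is abbreviated rather than a full Logstash stats document.

```python
import json

def monitoring_bulk_request(docs, interval="10s"):
    """Compose the path and NDJSON body for a _monitoring/bulk request.

    `docs` is a list of (doc_type, source) pairs, e.g.
    ("logstash_stats", {...}). The action line carries only the document
    type; Elasticsearch's monitoring plugin chooses the target index and
    reshapes the body before writing it.
    """
    path = ("/_monitoring/bulk?system_id=logstash"
            "&system_api_version=7&interval=" + interval)
    lines = []
    for doc_type, source in docs:
        lines.append(json.dumps({"index": {"_type": doc_type}}))
        lines.append(json.dumps(source))
    # NDJSON: newline-delimited JSON with a trailing newline
    body = "\n".join(lines) + "\n"
    return path, body

path, body = monitoring_bulk_request([("logstash_stats", {"events": {"in": 0}})])
```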

Metricbeat collection

Metricbeat can also use Logstash's HTTP endpoint to collect monitoring data about Logstash and then ship that monitoring data to a monitoring cluster.

Monitoring changes in 7.x

From 7.7 through the end of the 7.x series in 7.9, components in the Elastic Stack that use internal collection should introduce changes that ease the migration path for users connecting a 7.7-7.9 cluster to an 8.x monitoring cluster. Those changes are described below.

Monitoring changes in 8.0

In the 8.0 version of the stack, internal collection will be removed.

Enhancement request to internal collection

Endpoint switch

The first change which needs to happen is to have internal collection in Logstash ship directly to the monitoring indices via the _bulk endpoint, bypassing _monitoring/bulk.

Configuration option

Introduce a config option that enables writing directly via the _bulk endpoint
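For illustration only, a logstash.yml fragment for such an option might look like the sketch below. The setting names use the monitoring.* prefix that eventually shipped in Logstash 7.7 (replacing the older xpack.monitoring.* prefix); treat the exact names and values here as assumptions to verify against the released documentation.

```yaml
# Hypothetical sketch of enabling direct shipping to a monitoring cluster.
monitoring.enabled: true
monitoring.elasticsearch.hosts: ["https://monitoring-cluster.example:9200"]
monitoring.elasticsearch.username: "logstash_system"
monitoring.elasticsearch.password: "changeme"
```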

Document reshaping

As described above, Elasticsearch currently exposes a monitoring plugin that reshapes the document before it is written. With this change, incoming data will no longer be routed through that plugin, so Logstash itself will need to send data shaped consistently with what the plugin produces today.
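The reshaping can be sketched as a small transformation: wrap the original payload under its type key and prepend the header fields seen in the reshaped example below. This is a sketch under stated assumptions, not Logstash's implementation; the field set mirrors the example documents, with source_node omitted (the discussion in the comments concludes Logstash does not need it).

```python
from datetime import datetime, timezone

def reshape(doc_type, source, cluster_uuid, interval_ms=1000):
    """Wrap a raw Logstash payload into the monitoring document shape:
    header fields at the top level, original payload nested under the
    type key (e.g. "logstash_stats")."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "cluster_uuid": cluster_uuid,
        "timestamp": now,
        "interval_ms": interval_ms,
        "type": doc_type,
        doc_type: source,  # payload nested under its own type key
    }

doc = reshape("logstash_stats", {"events": {"in": 0}}, "nSdvccf0QEuPCMRWCMxKMQ")
```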

Below are documents which describe this change. (Thank you to @jakelandis for help in creating these):

Original sent by Logstash

{"index":{"_id":null,"_index":"","_type":"logstash_stats","routing":null}}
{"os":{"cgroup":{"cpuacct":{"usage_nanos":101371322684,"control_group":"/"},"cpu":{"control_group":"/","stat":{"number_of_elapsed_periods":0,"number_of_times_throttled":0,"time_throttled_nanos":0}}},"cpu":{"load_average":{"1m":5.91,"5m":2.18,"15m":1.18}}},"reloads":{"successes":0,"failures":0},"events":{"out":0,"filtered":0,"in":0,"duration_in_millis":0},"process":{"cpu":{"percent":8},"max_file_descriptors":1048576,"open_file_descriptors":136},"queue":{"events_count":0},"timestamp":"2019-12-05T23:11:18.022Z","pipelines":[{"queue":{"queue_size_in_bytes":0,"events_count":0,"type":"memory","max_queue_size_in_bytes":0},"reloads":{"successes":0,"failures":0},"id":"main","vertices":[{"pipeline_ephemeral_id":"89c705fb-504d-45bc-a07d-2943823b70d8","events_out":0,"id":"52687a79e86cea720a4b9d64e611b7d4c3b7893131d2da4b06f5f8903b464f4c","queue_push_duration_in_millis":0},{"pipeline_ephemeral_id":"89c705fb-504d-45bc-a07d-2943823b70d8","events_out":0,"id":"f63963c4d9e18ffe255df09283327cb8c8ec2227ef06210795cb1d33b944a132","duration_in_millis":493,"events_in":0}],"events":{"queue_push_duration_in_millis":0,"out":0,"filtered":0,"duration_in_millis":0,"in":0},"ephemeral_id":"89c705fb-504d-45bc-a07d-2943823b70d8","hash":"1e3da6bfb66bd25bbb98769a5dcdc16c06f98a333675d3b71cccad117e103b99"}],"jvm":{"mem":{"heap_max_in_bytes":259522560,"heap_used_percent":71,"heap_used_in_bytes":186258096},"gc":{"collectors":{"old":{"collection_time_in_millis":86,"collection_count":2},"young":{"collection_time_in_millis":982,"collection_count":29}}},"uptime_in_millis":42551},"logstash":{"snapshot":false,"host":"ed04eab92604","uuid":"dc5b29f6-3812-4657-a640-869ebfc0e173","ephemeral_id":"d33ab810-e286-4080-94f6-fa5f6cdae2f4","version":"7.3.0","http_address":"127.0.0.1:9600","name":"ed04eab92604","status":"green","pipeline":{"batch_size":125,"workers":8}}}
{"index":{"_id":null,"_index":"","_type":"logstash_state","routing":null}}
{"pipeline":{"representation":{"version":"0.0.0","hash":"1e3da6bfb66bd25bbb98769a5dcdc16c06f98a333675d3b71cccad117e103b99","type":"lir","graph":{"vertices":[{"plugin_type":"input","config_name":"beats","id":"52687a79e86cea720a4b9d64e611b7d4c3b7893131d2da4b06f5f8903b464f4c","type":"plugin","explicit_id":false,"meta":{"source":{"protocol":"str","id":"pipeline","line":2,"column":3}}},{"explicit_id":false,"meta":null,"id":"__QUEUE__","type":"queue"},{"plugin_type":"output","config_name":"stdout","id":"f63963c4d9e18ffe255df09283327cb8c8ec2227ef06210795cb1d33b944a132","type":"plugin","explicit_id":false,"meta":{"source":{"protocol":"str","id":"pipeline","line":8,"column":3}}}],"edges":[{"from":"52687a79e86cea720a4b9d64e611b7d4c3b7893131d2da4b06f5f8903b464f4c","to":"__QUEUE__","id":"99f9721053e9be1f70710dd41e539775532eeb682d4bd559d014a235642438ab","type":"plain"},{"from":"__QUEUE__","to":"f63963c4d9e18ffe255df09283327cb8c8ec2227ef06210795cb1d33b944a132","id":"4ea4b20dfa33784f75260d100448bb073b60a79a874fbe2875e9099a631bb20e","type":"plain"}]}},"id":"main","batch_size":125,"ephemeral_id":"89c705fb-504d-45bc-a07d-2943823b70d8","workers":8,"hash":"1e3da6bfb66bd25bbb98769a5dcdc16c06f98a333675d3b71cccad117e103b99"}}

Same document after being reshaped by Elasticsearch

{"index":{"_index":".monitoring-logstash-7-2019.12.05"}}
{"cluster_uuid":"nSdvccf0QEuPCMRWCMxKMQ","timestamp":"2019-12-05T23:11:18.283Z","interval_ms":1000,"type":"logstash_stats","source_node":{"uuid":"sMohkkOkRFakKH1Y5gayDg","host":"172.30.0.4","transport_address":"172.30.0.4:9300","ip":"172.30.0.4","name":"cluster1-node1","timestamp":"2019-12-05T23:11:18.284Z"},"logstash_stats":{"os":{"cgroup":{"cpuacct":{"usage_nanos":101371322684,"control_group":"/"},"cpu":{"control_group":"/","stat":{"number_of_elapsed_periods":0,"number_of_times_throttled":0,"time_throttled_nanos":0}}},"cpu":{"load_average":{"1m":5.91,"5m":2.18,"15m":1.18}}},"reloads":{"successes":0,"failures":0},"events":{"out":0,"filtered":0,"in":0,"duration_in_millis":0},"process":{"cpu":{"percent":8},"max_file_descriptors":1048576,"open_file_descriptors":136},"queue":{"events_count":0},"timestamp":"2019-12-05T23:11:18.022Z","pipelines":[{"queue":{"queue_size_in_bytes":0,"events_count":0,"type":"memory","max_queue_size_in_bytes":0},"reloads":{"successes":0,"failures":0},"id":"main","vertices":[{"pipeline_ephemeral_id":"89c705fb-504d-45bc-a07d-2943823b70d8","events_out":0,"id":"52687a79e86cea720a4b9d64e611b7d4c3b7893131d2da4b06f5f8903b464f4c","queue_push_duration_in_millis":0},{"pipeline_ephemeral_id":"89c705fb-504d-45bc-a07d-2943823b70d8","events_out":0,"id":"f63963c4d9e18ffe255df09283327cb8c8ec2227ef06210795cb1d33b944a132","duration_in_millis":493,"events_in":0}],"events":{"queue_push_duration_in_millis":0,"out":0,"filtered":0,"duration_in_millis":0,"in":0},"ephemeral_id":"89c705fb-504d-45bc-a07d-2943823b70d8","hash":"1e3da6bfb66bd25bbb98769a5dcdc16c06f98a333675d3b71cccad117e103b99"}],"jvm":{"mem":{"heap_max_in_bytes":259522560,"heap_used_percent":71,"heap_used_in_bytes":186258096},"gc":{"collectors":{"old":{"collection_time_in_millis":86,"collection_count":2},"young":{"collection_time_in_millis":982,"collection_count":29}}},"uptime_in_millis":42551},"logstash":{"snapshot":false,"host":"ed04eab92604","uuid":"dc5b29f6-3812-4657-a640-869ebfc0e173","ephemeral_id":"d33ab810-e286-4080-94f6-fa5f6cdae2f4","version":"7.3.0","http_address":"127.0.0.1:9600","name":"ed04eab92604","status":"green","pipeline":{"batch_size":125,"workers":8}}}}
{"index":{"_index":".monitoring-logstash-7-2019.12.05"}}
{"cluster_uuid":"nSdvccf0QEuPCMRWCMxKMQ","timestamp":"2019-12-05T23:11:18.283Z","interval_ms":1000,"type":"logstash_state","source_node":{"uuid":"sMohkkOkRFakKH1Y5gayDg","host":"172.30.0.4","transport_address":"172.30.0.4:9300","ip":"172.30.0.4","name":"cluster1-node1","timestamp":"2019-12-05T23:11:18.284Z"},"logstash_state":{"pipeline":{"representation":{"version":"0.0.0","hash":"1e3da6bfb66bd25bbb98769a5dcdc16c06f98a333675d3b71cccad117e103b99","type":"lir","graph":{"vertices":[{"plugin_type":"input","config_name":"beats","id":"52687a79e86cea720a4b9d64e611b7d4c3b7893131d2da4b06f5f8903b464f4c","type":"plugin","explicit_id":false,"meta":{"source":{"protocol":"str","id":"pipeline","line":2,"column":3}}},{"explicit_id":false,"meta":null,"id":"__QUEUE__","type":"queue"},{"plugin_type":"output","config_name":"stdout","id":"f63963c4d9e18ffe255df09283327cb8c8ec2227ef06210795cb1d33b944a132","type":"plugin","explicit_id":false,"meta":{"source":{"protocol":"str","id":"pipeline","line":8,"column":3}}}],"edges":[{"from":"52687a79e86cea720a4b9d64e611b7d4c3b7893131d2da4b06f5f8903b464f4c","to":"__QUEUE__","id":"99f9721053e9be1f70710dd41e539775532eeb682d4bd559d014a235642438ab","type":"plain"},{"from":"__QUEUE__","to":"f63963c4d9e18ffe255df09283327cb8c8ec2227ef06210795cb1d33b944a132","id":"4ea4b20dfa33784f75260d100448bb073b60a79a874fbe2875e9099a631bb20e","type":"plain"}]}},"id":"main","batch_size":125,"ephemeral_id":"89c705fb-504d-45bc-a07d-2943823b70d8","workers":8,"hash":"1e3da6bfb66bd25bbb98769a5dcdc16c06f98a333675d3b71cccad117e103b99"}}}
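One detail visible in the reshaped action lines is the daily index name. A minimal sketch of computing it (assuming the .monitoring-logstash-7-YYYY.MM.DD pattern shown above; note a later comment in this thread settles that the leading dot is kept):

```python
from datetime import datetime, timezone

def monitoring_index(date):
    """Build the daily monitoring index name for a given UTC date."""
    return ".monitoring-logstash-7-" + date.strftime("%Y.%m.%d")

name = monitoring_index(datetime(2019, 12, 5, tzinfo=timezone.utc))
```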

Stack co-ordination

This is a change that we would like to have happen in 7.7 across a variety of stack components. We would like to ask if all teams could have their corresponding PRs ready to be merged by March 1st, 2020 so that we can ensure that all changes are ready and there is not any inconsistency between stack components. It is imperative that these changes _not_ be merged into 7.6, however, because we may not be ready for them on the Stack Monitoring Kibana application end. (I will update this issue if that changes.)

Labels: enhancement, monitoring

All 15 comments

A question about upgrading during 7.x

If one is using LS 7.5 -> ES 7.5 and wants to upgrade the stack to 7.7, will upgrading Logstash first break monitoring if it's now sending a different document format to a different endpoint in Elasticsearch 7.5? I know we typically recommend upgrading Elasticsearch first, but this could easily be seen as a breaking change.


Regarding the format changes, looking at the two sets of two documents, the differences seem to be that:

  1. the bulk action header should only contain the index in the format {"index":{"_index":".monitoring-logstash-7-YYYY.MM.DD"}}

Question: if we still send other fields as null and remove _type, will it still work? e.g. {"index":{"_id":null,"_index":".monitoring-logstash-7-YYYY.MM.DD","routing":null}}

  2. a content "header" block is added:
{
  "cluster_uuid": "nSdvccf0QEuPCMRWCMxKMQ",
  "timestamp": "2019-12-05T23:11:18.283Z",
  "interval_ms": 1000,
  "type": "logstash_stats",
  "source_node": {
    "uuid": "sMohkkOkRFakKH1Y5gayDg",
    "host": "172.30.0.4",
    "transport_address": "172.30.0.4:9300",
    "ip": "172.30.0.4",
    "name": "cluster1-node1",
    "timestamp": "2019-12-05T23:11:18.284Z"
  },

Question: can logstash know about source_node parameters? This seems to refer to a node of the ES cluster. Is it fine if we don't provide this source_node object at all?

  3. The metrics content is nested inside a logstash_stats or logstash_state top-level key.

This one is easy, no issue here.

If one is using LS 7.5 -> ES 7.5 and wants to upgrade the stack to 7.7, will upgrading Logstash first break monitoring...?

You are correct that this would break if the monitoring cluster is ES 7.5 and a user tries to use it to monitor LS 7.7. This is an unsupported configuration, per https://www.elastic.co/guide/en/elasticsearch/reference/current/monitoring-overview.html. Specifically:

In general, the monitoring cluster and the clusters being monitored should be running the same version of the stack. A monitoring cluster cannot monitor production clusters running newer versions of the stack. If necessary, the monitoring cluster can monitor production clusters running the latest release of the previous major version.

Question: can logstash know about source_node parameters? This seems to refer to a node of the ES cluster. Is it fine if we don't provide this source_node object at all?

This is a good question. From looking at the code on the Kibana side, this is used primarily with Elasticsearch metrics. I confess that I don't know why the ES plugin adds this. @jakelandis do you know?

In our support matrix, Logstash and Beats can interact with a monitoring cluster whose version is anywhere between their own version and the latest minor.

So an instance of Logstash 7.5 should be able to interact with a 7.7 monitoring/management cluster.

Per our stack upgrade guidelines, the only thing we don't support is Logstash 7.7 pushing monitoring data to a lesser minor version of ES, like 7.5.

So an instance of Logstash 7.5 should be able to interact with a 7.7 monitoring/management cluster.

This should work. A 7.7 cluster will still have the _monitoring/bulk endpoint.

There are actually 4 parts to the enhancement request here:

  • Endpoint switch
  • Document reshaping
  • Index name switch (from .monitoring* -> monitoring*, i.e. no dot; 8.0 will not allow writing to dot indexes)
  • Configuration to make this optional, ideally mirroring the beats and Kibana config that does the same

Since the config is optional, LS upgrade passivity should not be impacted. However, for monitoring to continue working once a client starts upgrading to 8.0, they will need to enable this config or move to Metricbeat-based monitoring.

Once the config is enabled, it will simply cut out the middleman of ES, and LS will start sending documents as they are currently indexed (but to a non-dot index). The main concern around version-to-version compatibility is the index templates for the new non-dot indexes, which won't land until a new ES (7.7, hopefully) installs them on startup.

if we still send other fields as null and remove _type, will it still work? e.g. {"index":{"_id":null,"_index":".monitoring-logstash-7-YYYY.MM.DD","routing":null}}

_id and routing set to null should be fine. However, you will need to exclude _type, and the index name will need to exclude the dot.
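The adjustment to the bulk action line described in that answer can be sketched as follows. This is an illustrative helper (not Logstash code): since null _id and routing are tolerated, the simplest approach is to drop them along with _type.

```python
def strip_action(action):
    """Drop _type and any null-valued fields from a bulk action line,
    keeping only meaningful fields such as _index."""
    inner = {k: v for k, v in action["index"].items()
             if k != "_type" and v is not None}
    return {"index": inner}

old_action = {"index": {"_id": None,
                        "_index": ".monitoring-logstash-7-2019.12.05",
                        "_type": "logstash_stats",
                        "routing": None}}
new_action = strip_action(old_action)
```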

I confess that I don't know why the ES plugin adds this. @jakelandis do you know?

I am not 100% sure, but I believe it is so that the UI can associate it with the correct cluster. Metricbeat monitoring and internal Beats monitoring (with flags) have the same concern, I believe, so maybe LS can follow suit with what those already do. @ycombinator - can you help us understand how (and why) Metricbeat (or internal Beats) adds the source_node to the monitoring document?

Configuration to make this optional, ideally mirroring the beats and Kibana config that does the same

Guh. I was on the wrong tab when I filed this issue and I sent an earlier draft. Apologies. I corrected the body of the issue to reflect this.

@ycombinator - can you help us understand how (and why) Metricbeats (or internal beats) adds the source_node to the monitoring document ?

Sure! The short version is that Logstash will not need to add this field when it ships its internally-collected monitoring data directly to the monitoring Elasticsearch cluster, toggled via the new config option. For the longer version, read on.

The original intent of the source_node field was for debugging purposes. The idea was that, when using internal collection (which was the only method back in the day), monitoring data would pass through the production Elasticsearch cluster on its way to the monitoring Elasticsearch cluster. The production Elasticsearch cluster injected the source_node field as a way of recording which production cluster node the data passed through, in case it might be helpful for debugging (e.g. looking at that node's logs) later.

This was done for all stack monitoring data that passed through the Elasticsearch cluster, including Elasticsearch monitoring data as well (i.e. data that was collected internally by the production Elasticsearch cluster about itself).

Unfortunately, over time, the Stack Monitoring UI in Kibana started to rely on the source_node field in some queries about Elasticsearch monitoring data. For instance, I believe the Elasticsearch Node Listing page in the Stack Monitoring UI uses this field.

Thankfully, none of the Logstash pages in the Stack Monitoring UI rely on this field (at least that was the case last I checked — perhaps @cachedout can confirm current state?). Indeed, when Metricbeat collection is used against Logstash, it does not inject this field into documents it's shipping to the monitoring Elasticsearch cluster either. So when Logstash internally mimics the Metricbeat collection+shipping functionality, it should not bother creating this field either.

Thanks very much, @ycombinator. That's really helpful!

Thankfully, none of the Logstash pages in the Stack Monitoring UI rely on this field (at least that was the case last I checked — perhaps @cachedout can confirm current state?)

tl;dr I could not find any case where it appears that this is used.

Methodology

I looked for source_node in the codebase. There were a few instances and I looked at them individually.

  1. x-pack/legacy/plugins/monitoring/server/lib/logstash/__tests__/get_pipeline.js

I took the source_node field out and re-ran the test (node scripts/mocha.js --grep get_pipeline). The test still passes.

  2. Instances in server/lib/elasticsearch. Ignored these per the comment by @ycombinator above that the field is used in some queries about Elasticsearch monitoring data.

  3. Instances in server/lib/metrics/elasticsearch. Ignored these for the same reason as the above.

  4. Instance in server/routes/api/v1/elasticsearch/node_detail.js. Ignored for the same reason.

  5. server/lib/metrics/__test__/__snapshots__/metrics.test.js.snap. Looked at these instances and verified they all correspond to Elasticsearch and not Logstash.

  6. server/lib/__tests__/create_query.js. These specifically test Elasticsearch metrics and so shouldn't apply.

Finally, I wanted to do some functional testing so I sent a bunch of documents with Logstash internal collection and then I removed the source_node field from them using the following:

POST /.monitoring-logstash-7-2019.12.11/_update_by_query
{
  "script" : "ctx._source.remove('source_node')"
}

After checking indexed documents to be sure that I didn't see source_node present, I went through the Stack Monitoring application looking for anything broken. I couldn't find anything out of order.

Overall, not sending source_node in Logstash seems like a safe path forward. Any additional suggestions for testing or corrections are welcome. Thanks!

I'm working on PR #11541 and I have some questions about the index name switch (from .monitoring* -> monitoring*).
Those indices are created from the .monitoring-logstash template, which is created by Elasticsearch's x-pack monitoring exporters, in this location:
https://github.com/elastic/elasticsearch/blob/378b27b9fb40459cc4792aa06616c131f498584a/x-pack/plugin/monitoring/src/main/java/org/elasticsearch/xpack/monitoring/exporter/local/LocalExporter.java#L168-L182

seems to create the template named .monitoring-logstash:
https://github.com/elastic/elasticsearch/blob/378b27b9fb40459cc4792aa06616c131f498584a/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/monitoring/exporter/MonitoringTemplateUtils.java#L74-L76

After that, when LS monitoring writes a doc to ES, the index field is empty. So my question is:

  • will the instantiation of templates/indices in ES always be the responsibility of the ES x-pack plugin?

@jakelandis @chrisronline Please see @andsel's latest comment above. Is this change still necessary?

It's really a question for @jakelandis and/or @danhermann as I don't know if it's official that we will be able to leverage the hidden indices concept yet.

I just confirmed that .dot names will continue to be allowed, so there is no need to change from .monitoring-* -> monitoring-*!

However, the 8.0 index template should declare the .monitoring-* indices as hidden, so that they are excluded from wildcard expansion and some general-purpose UIs.

will the instantiation of templates/indices in ES always be the responsibility of the ES x-pack plugin?

I believe the responsibility should ideally be with the system that is populating the indices. However, in this case, at least during the 7.x timeframe, I think this responsibility should remain with ES to avoid two systems trying to control the index template (depending on which minor version of ES LS is talking to).

A couple of nuances around this work:

  • Make sure you are passing the appropriate type directly in the source document
  • Make sure you are nesting the source document under the type key (such as logstash_stats)
  • Make sure you are correctly creating the index name and sending that along too
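The three nuances above can be pulled together in one sketch: the type travels inside the source document, the payload is nested under the type key, and Logstash computes the index name itself. This is an illustrative sketch, not Logstash code; the cluster_uuid value is a placeholder and the timestamp handling is simplified.

```python
import json
from datetime import datetime, timezone

def bulk_lines(doc_type, payload, cluster_uuid, when):
    """Build the action and source lines for one monitoring document."""
    # Nuance 3: Logstash builds the daily index name itself.
    index = ".monitoring-logstash-7-" + when.strftime("%Y.%m.%d")
    action = {"index": {"_index": index}}
    source = {
        "cluster_uuid": cluster_uuid,
        "timestamp": when.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "interval_ms": 1000,
        "type": doc_type,    # nuance 1: type travels in the source document
        doc_type: payload,   # nuance 2: payload nested under the type key
    }
    return json.dumps(action), json.dumps(source)

action, source = bulk_lines("logstash_state", {"pipeline": {"id": "main"}},
                            "nSdvccf0QEuPCMRWCMxKMQ",
                            datetime(2019, 12, 5, tzinfo=timezone.utc))
```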

I'm closing this because the discussed work is in master and has also been backported to 7.7 with commit d9016163730f7d679302bce4bfdfefe2d7c10a60. It was decided to keep out the embedding of Metricbeat. The meta issue tracking the work done is #11573.
