Kibana: Telemetry & Monitoring: Kibana Monitoring & BulkUploader

Created on 12 Jun 2020 · 12 Comments · Source: elastic/kibana

Lately we've noticed that when adding some cluster-level stats (#68603 and #64935), those collectors are not registered as Usage or Stats collectors in Kibana, because they are not supposed to be reported as part of the stack_stats.kibana.plugins payload.

This results in missing some information when monitoring is enabled. As far as I could understand from taking a look at the code in x-pack/plugins/monitoring, the information reported to the Monitoring cluster is collected via the code in x-pack/plugins/monitoring/server/kibana_monitoring. More specifically, in the bulk_uploader.js file.

I'm creating this issue to review the logic in BulkUploader to:

[ ] Do not start collecting until Self-Monitoring is enabled and fully started.
[x] Provide a way to report Kibana-collected cluster-stats to be used later on when sending telemetry from the Monitoring cluster.
[x] When Kibana belongs to a monitoring cluster, but that same cluster is not self-monitoring itself (only has monitoring info from other clusters), it won't report any telemetry about itself.
[x] Are there any differences when Metricbeat is used vs. the legacy collector mechanism? Has this API https://github.com/elastic/kibana/blob/3d0552e03ce1ff2112a7ca50555575c2cd828c1b/src/legacy/server/status/routes/api/register_stats.js#L67 anything to do with it?
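For the first item in the list above, one possible shape is a small gate that only runs the collection loop while self-monitoring is enabled and the plugin has finished starting. This is a minimal sketch under stated assumptions: `GatedUploader`, `CollectionState` and their fields are hypothetical names made up for illustration and are not taken from bulk_uploader.js.

```typescript
// Hypothetical sketch: none of these names come from bulk_uploader.js.
interface CollectionState {
  selfMonitoringEnabled: boolean; // assumed flag: self-monitoring is turned on
  pluginStarted: boolean; // assumed flag: the plugin has fully started
}

class GatedUploader {
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(
    private readonly collect: () => void,
    private readonly intervalMs: number
  ) {}

  // Start or stop the collection loop to match the current state.
  // Safe to call repeatedly: it never starts a second timer.
  update(state: CollectionState): void {
    const shouldRun = state.selfMonitoringEnabled && state.pluginStarted;
    if (shouldRun && this.timer === undefined) {
      this.timer = setInterval(this.collect, this.intervalMs);
    } else if (!shouldRun && this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  get isRunning(): boolean {
    return this.timer !== undefined;
  }
}
```

Calling `update` whenever either flag changes keeps the loop stopped until both conditions hold, and stops it again if self-monitoring is later disabled.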

Stack Monitoring Telemetry Meta KibanaTelemetry Monitoring

afharo

All 12 comments

Pinging @elastic/stack-monitoring (Team:Monitoring)

elasticmachine on 12 Jun 2020

Pinging @elastic/pulse (Team:KibanaTelemetry)

afharo on 12 Jun 2020

When monitoring is enabled, cluster_stats usage data is read from the .monitoring-es-* indices. This means that any usage data not collected from Kibana needs to be added to those indices (pushed by elasticsearch and beats) in order to retain parity between local and monitoring collection. We need to decide if this is appropriate and, if not, determine the best path forward for monitoring-shipped usage data.

TinaHeiligers on 25 Jun 2020

@Bamieh
I ran a terms agg to see the % of usage data that's reported through local and monitoring collection:

{
  "aggs" : {
    "telemetry_collection" : {
      "terms" : { "field" : "stack_stats.xpack.monitoring.collection_enabled" }
    }
  }
}

Roughly 25% of data that's reported is through monitoring.
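As a rough sketch of how a percentage like this can be derived, the bucket counts returned by a terms aggregation can be divided by their total. The `key`, `key_as_string` and `doc_count` fields follow the usual terms aggregation bucket shape; the function name and the sample counts are made up for illustration and are not the numbers from the query above.

```typescript
// One bucket of a terms aggregation response.
interface TermsBucket {
  key: string | number | boolean;
  key_as_string?: string;
  doc_count: number;
}

// Convert bucket counts into percentage shares of all bucketed documents.
function bucketShares(buckets: TermsBucket[]): Record<string, number> {
  const total = buckets.reduce((sum, b) => sum + b.doc_count, 0);
  const shares: Record<string, number> = {};
  for (const b of buckets) {
    const label = b.key_as_string !== undefined ? b.key_as_string : String(b.key);
    shares[label] = total === 0 ? 0 : (b.doc_count * 100) / total;
  }
  return shares;
}

// Made-up counts for illustration only; not the real query results.
const exampleBuckets: TermsBucket[] = [
  { key: 0, key_as_string: 'false', doc_count: 8 },
  { key: 1, key_as_string: 'true', doc_count: 2 },
];
const exampleShares = bucketShares(exampleBuckets);
```

Note that documents missing the field fall outside every bucket, so the shares are relative to documents that have a value for it.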

TinaHeiligers on 30 Jun 2020

And it could happen that those monitored clusters are also reporting the local telemetry themselves? 😅

afharo on 30 Jun 2020

@afharo maybe, maybe not. We can't tell, since we don't combine local collection and collection through monitoring ATM.

TinaHeiligers on 30 Jun 2020

I mean: the monitored cluster reporting local telemetry + the monitoring cluster reporting on its behalf.
It would be nice to know that ratio because if, for instance, 90% of the clusters that are reported via monitoring also report locally collected telemetry themselves, then disabling telemetry from monitoring would affect even fewer clusters: we would only lose 2.5% of the clusters (the remaining 10% of the 25% reported via monitoring).
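The arithmetic behind that estimate, as a small sketch. The 25% comes from the aggregation earlier in the thread; the 90% overlap is only the hypothetical figure used in this comment, and the function name is made up for illustration.

```typescript
// Share of all reporting clusters that would be lost if telemetry sent
// via monitoring were disabled.
//   viaMonitoring:     fraction of clusters reported through monitoring
//   alsoReportLocally: fraction of those that also report local telemetry
function clustersLostShare(viaMonitoring: number, alsoReportLocally: number): number {
  return viaMonitoring * (1 - alsoReportLocally);
}

// 25% via monitoring, hypothetical 90% overlap: about 0.025, i.e. 2.5%.
const lostShare = clustersLostShare(0.25, 0.9);
```

With no overlap at all the loss would be the full 25%, so the real number sits somewhere between 0% and 25% depending on the ratio.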

afharo on 30 Jun 2020

Are there any differences when Metricbeat is used vs. the legacy collector mechanism? Has this API

There should not be a difference here. The bulk uploader (which is used by the monitoring plugin when collecting monitoring data for legacy collection) and the stats API (which is used by Metricbeat when collecting monitoring data for Metricbeat collection) should use the exact same code, or at the very least, return the same output. There is a ticket to better consolidate this, but it hasn't been worked on yet. It's worth noting that we have a collection of parity tests that ensure Metricbeat-collected monitoring documents are identical to documents collected through legacy collection.

For future proofing, the bulk uploader is going away in 8.0. We are currently deprecating that behavior for 7.x and will completely remove it in 8.0 so it might not be worth it to invest much in that area of the code.

We still want to be sure we understand the telemetry story here, but I'm not sure I'm entirely up to date on it. Happy to help any way I can, though.

chrisronline on 30 Jun 2020

I see this is targeted for 7.10. Are we still on track for that release? Many production clusters have monitoring enabled and we'll want to start receiving additional telemetry for them as soon as possible. Let me know if there is anything I can do to help expedite.

alexfrancoeur on 5 Aug 2020

AFAIK, we are discussing an RFC to possibly remove the Kibana-related telemetry from the monitoring collection entirely. If that happens, I think we can close or repurpose this issue to make that happen 🙂

afharo on 10 Aug 2020

++ I believe we capture data from multiple Kibana instances today when monitoring is enabled, so we'd have to understand the impact of removing it completely.

I do think not having telemetry data from monitoring clusters will become more visible soon as we begin to trust and use the data. If 25% of clusters really have monitoring enabled, and most production clusters have monitoring enabled (assumption), then we're really only capturing a small subset of production clusters. Should we have a sync specifically to discuss the RFC?

alexfrancoeur on 18 Aug 2020

After the discussions in the RFC and the changes in https://github.com/elastic/kibana/pull/82638, I think we can close this issue.

There will be one outstanding item:

[ ] Do not start collecting until Self-Monitoring is enabled and fully started.

But since bulk_uploader is going to be removed in 8.0, maybe we can let it be for now?

afharo on 6 Nov 2020
