Beats: [Discuss] Monitoring deployment

Created on 2 Dec 2019 · 20 Comments · Source: elastic/beats

For the 8.0.0 release of the stack, all stack components, including Beats, must be monitored using Metricbeat instead of Beats shipping their own monitoring data.

Some stack teams are choosing to bundle Metricbeat. Below are links to open issues tracking these requests:

Elasticsearch
Kibana
Logstash

This issue is being opened to discuss the deployment story for monitoring with Beats/APM. (I have filed a separate issue in the APM repo but was asked by @graphaelli to link to a discussion with the Beats folks.)

Some high-level questions to be discussed:

Would it be OK to require a user to download/install Metricbeat to monitor a beat for 8.0?
Should Beats adopt a "bundling" strategy similar to the rest of the stack? If not, can the Fleet/Agent/EPM project meet the need of provisioning/enabling monitoring for a Beat?
For any management strategy, how would it work in the case of containerization?

cc: @elastic/beats-contributors @elastic/apm-server

Stack monitoring Stalled needs_team

cachedout

Most helpful comment

There have been a couple of discussions off-issue that might help move this issue forward now, I think.

In 7.x, we've decided to keep the monitoring.* settings in all Beats. These settings allow Beats to ship their monitoring data directly to a monitoring cluster (i.e. without going through the custom monitoring bulk API on their production cluster). Keeping these settings around in 7.x allows Beats users a smoother upgrade path to 8.0.

Perhaps we should keep these settings around for all Beats even in 8.x, at least until Agent is GA? Once Agent is GA, all Beats that would be managed by Agent could be monitored via a Metricbeat that's spun up by the Agent (or not, TBD). But for Beats that wouldn't be managed by Agent (e.g. Functionbeat), maybe we keep the monitoring.* settings and underlying functionality available just for them?
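
For readers unfamiliar with the settings under discussion, the fragment below is a rough sketch of what the monitoring.* block looks like in a 7.x Beat configuration file. The host name and password are placeholders, and the exact keys available may differ between versions, so treat it as an illustration rather than a reference.

```yaml
# Sketch of a Beat's config file (e.g. filebeat.yml); values are placeholders.
monitoring:
  enabled: true
  elasticsearch:
    # Points at the monitoring cluster directly, not the production cluster.
    hosts: ["https://monitoring-cluster.example.com:9200"]
    username: beats_system
    password: "changeme"
```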

ycombinator on 6 Dec 2019

👍3

All 20 comments

Pinging @elastic/stack-monitoring (Stack monitoring)

elasticmachine on 2 Dec 2019

What are the plans for Functionbeat? Metricbeat should not be deployed with any cloud function on a cloud provider. Are we going to ask people to monitor the functions using the providers' dashboards? Or do we want to keep internal monitoring of Functionbeat and use our stack instead?

kvch on 3 Dec 2019

Or do we want to keep internal monitoring of Functionbeat and use our stack instead?

This is a great question. In looking at the current published documentation across the suite of Beats, it seems that all Beats but Functionbeat mention both Metricbeat monitoring and internal collection as monitoring strategies. However, Functionbeat mentions only internal collection.

Given this, I'm speculating that at some point there was an intentional decision to forgo Metricbeat monitoring for Functionbeat but I don't recall being personally involved in that conversation. @ycombinator is this a discussion you might have been involved in?

If we do not require Metricbeat to monitor Functionbeat, then I would presume that we would also need to keep the monitoring code around in libbeat which would affect these plans.

I will wait for @urso and @ycombinator to chime in to see if there are past discussions which could shed more light on this. Thanks for raising this point, @kvch

cachedout on 3 Dec 2019

@kvch @ph What's the plan for Functionbeat as compared to other Beats when it comes to Agent? Last I recall we were not going to bundle Metricbeat with other Beats given that they would be managed by the Agent, which could also manage a Metricbeat instance to monitor the Beat. But will this also apply to Functionbeat, since it doesn't run in the same type of environments as other Beats?

ycombinator on 3 Dec 2019

You cannot monitor Functionbeat the way you monitor the other Beats. Functionbeat is active only for a short amount of time.

Collecting CloudWatch metrics would be a good start, I think: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-metrics.html

We can also create custom CloudWatch metrics or have the Beat send metrics to a monitoring endpoint before it shuts down (which would negatively impact the Lambda execution time).
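
To make the execution-time trade-off concrete, here is a minimal Python sketch of a function handler that flushes metrics before it returns. Everything in it is hypothetical (`handle_invocation`, `publish`, `send_metrics`, and the metric names are invented for illustration); Functionbeat itself is written in Go and this is not its actual code.

```python
import time
from typing import Callable, Dict, List


def handle_invocation(
    events: List[dict],
    publish: Callable[[List[dict]], None],
    send_metrics: Callable[[Dict[str, float]], None],
) -> Dict[str, float]:
    """Publish events, then flush metrics before the function returns."""
    started = time.monotonic()
    publish(events)
    published_at = time.monotonic()

    metrics = {
        "events_published": float(len(events)),
        "publish_seconds": published_at - started,
    }
    # The flush runs inside the invocation, so its duration is added
    # to the function's billed execution time.
    send_metrics(dict(metrics))
    metrics["flush_seconds"] = time.monotonic() - published_at
    return metrics


# Usage with stand-in callables (no network access needed).
sent: List[Dict[str, float]] = []
result = handle_invocation(
    [{"message": "a"}, {"message": "b"}],
    publish=lambda batch: None,
    send_metrics=sent.append,
)
print(sent[0]["events_published"])  # prints 2.0
```

The returned `flush_seconds` value is the extra time the invocation spent on monitoring, which is the cost mentioned above.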

urso on 3 Dec 2019

👍2

For 8.0, I think the Elastic Agent will solve the monitoring problem for all Beats. @ph Any issue you could link to for this?

ruflin on 4 Dec 2019

@ruflin said:

For 8.0, I think the Elastic Agent will solve the monitoring problem for all Beats. @ph Any issue you could link to for this?

Even for Functionbeat, keeping in mind that it gets deployed to a serverless environment?

ycombinator on 4 Dec 2019

@ycombinator No, that is the exception. I would also recommend having the discussion around Functionbeat separately, as it works very differently.

ruflin on 4 Dec 2019

@ruflin Yup, that is the discussion we are having above (started by @kvch's comment) 😄. I can see a couple of options for it:

1. Per @urso's comment, we forgo Stack Monitoring for Functionbeat. Instead we rely on the cloud provider's monitoring (e.g. AWS CloudWatch).
2. We keep the monitoring.* settings around for Functionbeat but remove them from other Beats, starting with 8.0.0 (and deprecate them for other Beats starting with 7.x). These settings allow users to ship the Beat's monitoring data directly to an Elasticsearch monitoring cluster (as opposed to routing it through an Elasticsearch production cluster which exports it to an Elasticsearch monitoring cluster).

As @urso noted, option 2 may negatively impact the Functionbeat Lambda's execution time.

Note that option 2 does not necessarily preclude us from also pursuing option 1 in the future, if we choose to do that.

One related thing that might be good to understand (at least for me) is how much telemetry we have today about Functionbeat. If there is telemetry, it means users are configuring it for Stack Monitoring (since Telemetry uses the same data path). @cachedout is this something you could chase down?

ycombinator on 4 Dec 2019

For 8.0, I think the Elastic Agent will solve the monitoring problem for all Beats.

I'm not sure this is true. We will continue having the single Beats for 8.x. Plus: who solves stack monitoring for non-Beats?

One related thing that might be good to understand (at least for me) is how much telemetry we have today about Functionbeat.

As of today we have no telemetry on Functionbeat. We could send telemetry info while publishing events and stop sending once the events are done. But then telemetry data are rather stats. Metrics are not.

urso on 4 Dec 2019

@cachedout is this something you could chase down?

Yes, I am working on it.

cachedout on 5 Dec 2019

There are actually a few ways we can get metrics for Functionbeat, given how Lambda executables start and can be frozen:

We could send the monitoring data while the function is actively sending data to Elasticsearch.
Dump the metrics in the logs; these logs would be sent to CloudWatch Logs and we could parse them.
CloudWatch metrics, as noted by @urso.
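
The log-parsing option could look roughly like the Python sketch below. It assumes each metrics log line ends with a JSON object nested under "monitoring" and "metrics", and that the line contains the marker text shown; both the marker and the sample line are assumptions made for illustration, not a guaranteed log format.

```python
import json
from typing import Optional

# Assumed marker text that identifies a metrics log line.
MARKER = "Non-zero metrics in the last"


def extract_metrics(log_line: str) -> Optional[dict]:
    """Return the metrics dict embedded in a log line, or None if absent."""
    if MARKER not in log_line:
        return None
    # The JSON payload is assumed to start at the first "{" after the marker.
    start = log_line.find("{", log_line.index(MARKER))
    if start == -1:
        return None
    try:
        payload = json.loads(log_line[start:])
    except json.JSONDecodeError:
        return None
    return payload.get("monitoring", {}).get("metrics")


# Hypothetical sample line, as it might appear in a log stream.
sample = (
    "2019-12-05T10:00:00.000Z\tINFO\t[monitoring]\tlog/log.go:145\t"
    "Non-zero metrics in the last 30s\t"
    '{"monitoring": {"metrics": {"libbeat": {"output": {"events": {"acked": 10}}}}}}'
)
metrics = extract_metrics(sample)
print(metrics["libbeat"]["output"]["events"]["acked"])  # prints 10
```

Lines without the marker, or with a malformed payload, return None so that a consumer can skip them.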

ph on 5 Dec 2019

@cachedout @ycombinator Did any teams express concerns about the artifact size increase?

ph on 5 Dec 2019

@ph The only comment I have heard in that vein was that the Kibana team is "evaluating" the artifact size but I haven't heard any serious reservations from anyone.

cachedout on 5 Dec 2019

I followed up with @alexfrancoeur today on the Functionbeat question. Unfortunately, this isn't really something that telemetry can answer for us right now. He and I plan to talk again in the coming weeks about introducing some additional telemetry data that might be able to answer this question in the future, but for now I'm afraid what we have won't be of much use.

cachedout on 6 Dec 2019

There have been a couple of discussions off-issue that might help move this issue forward now, I think.

ycombinator on 6 Dec 2019

👍3

Perhaps we should keep these settings around for all Beats even in 8.x, at least until Agent is GA?

👍 to this plan. I like that this basically just allows Agent to mature and then we swap it in as the default once we feel comfortable.

cachedout on 6 Dec 2019

Perhaps we should keep these settings around for all Beats even in 8.x, at least until Agent is GA? Once Agent is GA, all Beats that would be managed by Agent could be monitored via a Metricbeat that's spun up by the Agent (or not, TBD). But for Beats that wouldn't be managed by Agent (e.g. Functionbeat), maybe we keep the monitoring.* settings and underlying functionality available just for them?

Agree with your proposal @ycombinator

ph on 6 Dec 2019

👍1

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.