Beats already support outputting to any combination of one file, ES, LS, redis, kafka, and console at the same time. Could Beats be enhanced to support multiple outputs of the same type? An example use case is shown below.
           |--> Logstash input host (dev)  -> redis queue (dev)  -> logstash filter hosts (dev)  -> elasticsearch (dev)
Filebeat --|
           |--> Logstash input host (prod) -> redis queue (prod) -> logstash filter hosts (prod) -> elasticsearch (prod)
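To make the ask concrete, a purely hypothetical filebeat.yml for this topology might look like the sketch below. The list-of-outputs syntax is invented here for illustration (today Filebeat only accepts a single logstash output section), and the hostnames are placeholders.

```yaml
# Hypothetical syntax -- NOT supported by Filebeat today.
# Outputs configured as a list, mirroring how prospectors are configured.
filebeat.prospectors:
  - paths: ["/var/log/app/*.log"]

outputs:
  - logstash:                                   # dev pipeline entry point
      hosts: ["logstash-dev.example.com:5044"]
  - logstash:                                   # prod pipeline entry point
      hosts: ["logstash-prod.example.com:5044"]
```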
Assuming there are no issues around multi-instantiation of an output type, I think the biggest challenge here is how to handle this in configuration (without breaking backwards compatibility).
Outputs kinda act like plugins and can be instantiated multiple times (we want to double-check that no outputs have any global variables).
It's mostly a matter of configuration handling, which might get more complicated. So far we have relied on logstash for event routing and processing.
@urso If we used the same model for outputs that we use for prospectors (and, in the future, for metricbeat), this would probably be quite simple (at least from a config perspective).
Thinking about this scenario: sending to dev and production at the same time from within beats or logstash is a bad idea. Both systems block if an output is unresponsive (which can clearly happen with dev environments), which would also affect the production processing pipeline.
Better to decouple the systems, e.g. via kafka, and use distinct consumer groups reading events from the same topic.
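For illustration, a minimal sketch of that decoupled setup, assuming a Kafka output in filebeat.yml (hosts and topic name are placeholders); the dev and prod Logstash instances would then each read the same topic with their own consumer group, so a slow dev pipeline never blocks prod:

```yaml
# filebeat.yml -- ship each event once, to Kafka (hosts/topic are examples).
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]
  topic: "filebeat-logs"

# Consuming side (sketch): the dev and prod Logstash kafka inputs read the
# same "filebeat-logs" topic but use distinct consumer group ids
# (e.g. "logstash-dev" vs "logstash-prod"), so each environment keeps its
# own offset and a stalled dev consumer does not hold back prod.
```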
Closing. See urso's recommendation above. Feel free to reopen or leave additional comments.
Sending to dev and production simultaneously is just one use case. There could potentially be many. One example I can think of is
Just a thought..
+1
@richardjq I'm trying to do exactly that and now I have to create a new init service to have 2 filebeat processes watch the same files.
For now, I plan to make logstash 1 forward logs to logstash 2. But it is a non-resilient setup: if logstash 1 fails, logstash 2 won't see the logs.
@richardjq @prevostc See my original comment. Problem is, when having multiple logstash outputs in beats (doing event routing essentially), these logstash instances implicitly get coupled via beats. If one instance is down or unresponsive, the others won't get any data. A message queue like kafka will help to uncouple these systems as long as kafka is operating.
Using logstash for event-routing will suffer similar problems (due to back-pressure), until persistent queuing will be added to logstash.
TBH I'm hesitant to knowingly add a shoot-yourself-in-the-foot feature. Maybe in future versions it might be sensible to do some form of event routing, but not yet.
This feature would be useful when migrating from one cluster to another, when you want to send all beat traffic to both clusters for a while to make sure everything works (and maybe even compare the results in the two clusters to really make sure the new one works). The only way to solve this now is to run two beat agents on each VM, or set up some fancy proxy that duplicates the traffic.
Hi there. I have the same requirement.
Sadly, due to this design, I have to run a second filebeat instance on the same host, with a different config file and a different init.d script.
I see from the dev comments that blocking is a concern. I assume you have already looked at options, but wouldn't running one independent process per output handle that?
Another use case:
I think sending events from Filebeat to multiple logstash or elasticsearch instances is essential for a couple of use cases. I have a design with two completely independent cloud platforms, for nonprod and prod. I have filebeat outside these two cloud environments and want to push one set of log files to the prod logstash and another set of log files to the nonprod logstash server from a single Filebeat instance. If filebeat does not support routing to multiple logstash instances, then as a workaround I need to run a second filebeat on the same server. Why run 2 filebeat instances for one simple task?
There are recommendations to use kafka but unfortunately we are not using Kafka.
Hi, this feature would be useful if you have two services running on a single machine that have radically different log formats and you want the logs to go to Logstash on different ports, so that Logstash pipelines can treat them differently (e.g. a different grok filter and sending to a different index in Elasticsearch).
Please consider this.
Another valid use case is if you are sending from filebeat to logstash and you want to use multiple Logstash pipelines. The pipeline inputs would be on separate ports; since they run on the same logstash (indexers), the blocking concern may not be valid.
The Logstash pipelines feature looks really good, but it isn't possible with one filebeat (assuming the logs are all on one server). (Please correct me if I am wrong.)
@hurrycaine That is the same use-case we have. We have separate Logstash pipelines for our IIS, Tomcat, JBoss, Log4Net, Log4J, and RabbitMQ log files, so since we cannot ship to independent outputs within a single Filebeat, we have resorted to deploying a separate agent per log type, which is not ideal.
I have a similar requirement here, but actually slightly different. I need to be able to take the same data output from a single beat and push it to two different elasticsearch indices in the same cluster. The end result right now is that I have to run two different daemons to accomplish this, which is less than ideal as it adds unneeded load and complexity to our systems.
Yet another use case is pushing two different log types to Kafka, yet enriching them on the fly in two different ways (e.g. adding some fields to one of the outputs).
We have the same problem: we are migrating our kafka cluster from classic network to VPC, so we want to send all beat traffic to both clusters for a while to make sure everything works.
Hey folks, Beats is meant to be a lightweight shipper. Moreover, one of the reasons it sends only to one endpoint is to be able to guarantee delivery. If that endpoint stops responding, it stops sending and simply queues up the messages. If it supported multiple outputs, the logic to guarantee delivery (especially when one of the destinations is unavailable for a while) would become very tricky.
Beats will not go there by design.
So the best way for you to accomplish a fan out is to send everything to Logstash, which can then split out to whatever outputs you have in mind. If you do this, please know that Beats guarantees delivery to Logstash. After that, it depends on what you do, and Logstash will be the system dealing with difficult delivery guarantees in cases of multiple outputs.
I see, thank you for the explanation. However, I would like to point out some problems with the solution you are suggesting, @webmat.
In my situation using logstash would just not work due to the sheer volume of incoming data. We would need to spin up a very large logstash cluster to handle it all, just so that we can have two or three outputs for data to be duplicated to in the same elasticsearch cluster. Right now we are handling TB of metrics data a day via metricbeat coming from 1000+ nodes spread out over the world. Each individual metricbeat instance is capable of handling the load, as is the large elasticsearch cluster acting as the data sink. Please correct me if I am wrong, but our current usage of logstash for the logging portion of our cluster is already proving very resource intensive, and that system collects on the order of hundreds of GB a day, let alone TB of additional data :confused:
That being said, we have worked around the problem by duplicating the metricset collectors and having them change the index that they are mapped to based on the collector's fields. This works but is less than ideal, as we need to duplicate 10-15 collectors multiple times just so we can have this functionality. Would a PR that made this possible, while keeping the guaranteed delivery mechanism, be reviewed and possibly accepted? Or is what you are saying that there is no realistic way it would be considered?
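For reference, a rough sketch of the duplication workaround described above, assuming Metricbeat's system module and an elasticsearch output whose index format string can reference event fields; the module choice, field name, and index names are illustrative:

```yaml
# metricbeat.yml -- each module entry is duplicated and tagged, and the
# output routes on the tag; this multiplies the module definitions per index.
metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory"]
    period: 10s
    fields:
      target_index: "metrics-primary"
  - module: system
    metricsets: ["cpu", "memory"]
    period: 10s
    fields:
      target_index: "metrics-secondary"

output.elasticsearch:
  hosts: ["es.example.com:9200"]
  # Pick the index per event from the tag added above.
  index: "%{[fields.target_index]}-%{+yyyy.MM.dd}"
```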
Without knowing the specifics, it's hard to make solid recommendations.
But keep in mind that it's possible to deploy another set of Logstash servers which only do this splitting of the event stream. This would be a much lighter use case for Logstash than a cluster doing heavy text parsing with grok, ruby filters & the like.
@jakauppila Have you looked at Logstash pipelines and pipeline to pipeline communication?
@webmat Logstash pipelines were my argument for filebeat sending to two ports (on the same server). See my response above. As of right now you cannot use filebeat to send to a logstash that has multiple pipelines.
Or am I missing something?
@hurrycaine If you want to separate your event streams into multiple pipelines, you can try the distributor pattern. Only one output is needed on the Beats side, and the separation of the event streams happens inside Logstash.
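For anyone landing here, a rough pipelines.yml sketch of that distributor idea, assuming a Logstash version with pipeline-to-pipeline communication; the pipeline ids, routing field, filters, and index names are placeholders, not a tested config:

```yaml
# pipelines.yml -- one Beats entry point, routing to per-type pipelines.
- pipeline.id: beats-distributor
  config.string: |
    input { beats { port => 5044 } }
    output {
      if [fields][log_type] == "iis" { pipeline { send_to => ["iis-logs"] } }
      else                           { pipeline { send_to => ["app-logs"] } }
    }
- pipeline.id: iis-logs
  config.string: |
    input  { pipeline { address => "iis-logs" } }
    # IIS-specific grok/filters would go here.
    output { elasticsearch { hosts => ["es.example.com:9200"] index => "iis-%{+YYYY.MM.dd}" } }
- pipeline.id: app-logs
  config.string: |
    input  { pipeline { address => "app-logs" } }
    # Application-log filters would go here.
    output { elasticsearch { hosts => ["es.example.com:9200"] index => "app-%{+YYYY.MM.dd}" } }
```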
For high throughput scenarios like @supernomad describes, you can also have one set of Logstash instances whose only role is receiving everything and splitting it out to multiple queues (e.g. multiple Redis instances or multiple Kafka topics). Then later down the line you have another set of Logstash servers that works off these queues to do the actual parsing and enrichment.
In either case, Beats is able to uphold its delivery guarantee to one destination.
@webmat I'm guessing the solution would be for filebeat to support a similar pipelines model.
syslog -> elasticsearch (Kibana "Logs" using filebeat index naming)
application log -> logstash (to be mutated and run through additional filters before being sent to ES, most likely to a different index name, i.e. logstash-*)
and so on.
Even if I wanted to bypass logstash altogether and send to multiple indexes:
syslog -> elasticsearch filebeat-* index
application (maybe already in json) -> elasticsearch logs-* index
This would allow me to send to multiple different index names, even with the same elasticsearch output.
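If it helps anyone: depending on the Filebeat version, the elasticsearch output can already do per-event index routing via the indices conditions setting. A sketch along those lines (the index names and the log_type field are illustrative):

```yaml
# filebeat.yml -- a single elasticsearch output, two target indices.
output.elasticsearch:
  hosts: ["es.example.com:9200"]
  index: "filebeat-%{+yyyy.MM.dd}"        # default, e.g. for syslog
  indices:
    - index: "logs-%{+yyyy.MM.dd}"        # application logs (already JSON)
      when.contains:
        fields.log_type: "application"
```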