Logstash: all outputs fail if one of many outputs fails

Created on 28 Jan 2015 · 16 comments · Source: elastic/logstash

We have an issue in logstash when our elasticsearch cluster slows down.

Essentially, we use logstash with multiple outputs: elasticsearch, graphite, and hdfs.

If any of the above outputs has issues, all outputs will fail.

It would be great if only the output with the issue failed, without impacting the others.

enhancement

All 16 comments

Could you describe in a bit more detail what issues one of the outputs might have? Do you mean exceptions? Wrong data? Connection problems?

@rgardam

In the current design we have a single buffer in front of all the outputs; this ensures that an event is only removed from the buffer once every output has processed it.
This can cause issues for the other outputs if one output is slowed down or unavailable.

I think you are after a solution where every output has its own buffer, so that if Output A is slowed down or unavailable, Output B still continues to function without any events being lost for Output A.

Am I correct?

Yeah, exactly.
In our particular case we use the elasticsearch_http output, and I imagine it's not able to get confirmation back from Elasticsearch, so the buffer fills for that output and then everything stops.

We are working on a new feature that will implement much better buffering/queueing inside Logstash (https://github.com/elasticsearch/logstash/pull/1939), which is currently targeted for Logstash 2.0.

With the current design it's impossible to do what you want, I'm afraid.

Awesome! I'm looking forward to seeing this as it does cause us some pain when it happens.

We have a similar issue when shipping logs to Elasticsearch and HDFS. If HDFS goes down (or, more often, Elasticsearch), the buffer fills up and Logstash stops sending logs to all outputs. It ends up being very brittle, since a failure of any downstream output causes all downstream outputs to stop receiving data. We'd actually like to hook up more non-critical downstream outputs, but we don't want more failure points for ES and HDFS, which are the two critical ones.

Looking forward to 2.0!

Is this still the case in 2.1?

From my understanding, @alex88, even with the new changes recently introduced in the pipeline, if one output is stuck all outputs are stuck. But I'm sure @andrewvc would be able to confirm or deny; what do you think?

+1, this is a critical ability for messaging pipelines where strong consistency across outputs isn't required

Message durability and consistency are of the utmost importance to Logstash. If you need complete consistency to one output, and don't care about complete consistency to another, you'll need two pipelines. A single Logstash instance cannot yet provide this. As far as delayed delivery goes, the dead letter queue functionality that Logstash is providing will prevent a buffer-overflow level backup of the pipeline. As stated, though, that will not function for a single pipeline, as the backup will result in all messages going to the DLQ, not just the ones not making it to the slow/offline output.

If you have this sort of issue, the best option I can currently offer is to add a second (or additional) output to a broker (Redis, for example), with a second pipeline/Logstash instance reading from that broker and feeding the slow/lossy output.
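
For concreteness, a minimal sketch of that broker approach, assuming a local Redis and hypothetical host/key names (the options shown are those of the stock redis input and output plugins). The main pipeline writes to the critical output and mirrors events into a Redis list:

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
  redis {
    host => "localhost"            # hypothetical broker host
    data_type => "list"
    key => "slow_output_events"    # hypothetical list key
  }
}

A second pipeline/Logstash instance drains that list and feeds the slow or lossy output, so its backpressure stops at Redis instead of stalling the main pipeline:

input {
  redis {
    host => "localhost"
    data_type => "list"
    key => "slow_output_events"
  }
}
output {
  graphite {                       # example slow/lossy output
    host => "localhost"
    port => 2003
  }
}

Note that if Redis itself becomes unavailable the main pipeline will still stall; the broker only isolates the outputs sitting behind it.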

So long as Logstash has a single pipeline, there will be no way to do what is described in this issue in a single stage. When (or if) Logstash adopts multiple pipelines and adds a DLQ, that will be a different story, but until then there is nothing that will permit Logstash to address your request, as it is in opposition to the design specs of (at least the best attempt at) guaranteed delivery of all messages to all outputs.

Closing. Please feel free to re-open if this answer is not satisfactory. We'd be happy to continue the discussion.

@untergeek thanks for the response.

Please consider updating the documentation to make the current behavior clear. I had the complete opposite expectation of how Logstash would operate if one of my outputs disconnected (for example, a rabbitmq output disconnecting causes all of the other outputs to fail).

I'm confident that most people would not expect this behavior from Logstash by default, and when things go wrong they will lose messages when using a tcp/http input (unless all of the producers are smart enough to handle it). Understanding how this works will help developers architect their systems as you suggest, early on.

Is this behavior still present in 5.4? We have a use case with multiple Elasticsearch destinations and multiple syslog servers as outputs. There are times when there's a typo in the hostname of a syslog server and Logstash can't reach it via TCP. In that case, would events stop flowing to all the other outputs as well?

Yes, it seems so, since we also recently faced the same issue. We have outputs configured for Solr/ES and HDFS, and ES being down caused Logstash to stop writing to all of them.
Is there a fix for this issue? Please help.

Still present for 5.4 and tcp in my environment. I don't understand how this isn't fixed.

I have deleted all the empty "+1" comments. Please refrain from this noise in the future.


As for the request, you can use the Logstash multiple pipelines feature to route data from your main pipeline to a secondary pipeline where you don't care about delivery.

Please advise if this does not work for your needs.
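
For concreteness, here is a sketch of that layout (the "output isolator" pattern), assuming a Logstash version with pipeline-to-pipeline communication and using hypothetical pipeline ids and config paths. The persisted queues on the downstream pipelines are what actually absorb backpressure from a stalled output:

# pipelines.yml -- ids and paths are hypothetical
- pipeline.id: intake
  path.config: "/etc/logstash/intake.conf"
- pipeline.id: es
  path.config: "/etc/logstash/es.conf"
  queue.type: persisted
- pipeline.id: syslog
  path.config: "/etc/logstash/syslog.conf"
  queue.type: persisted

intake.conf fans each event out to both downstream pipelines:

output {
  pipeline { send_to => ["es", "syslog"] }
}

Each downstream pipeline reads from its own address and owns exactly one output, e.g. es.conf:

input {
  pipeline { address => "es" }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
  }
}

With this layout, a stalled syslog output fills the syslog pipeline's on-disk queue while intake and es keep flowing, until that queue itself fills up.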

This is happening to me in 7.0.0

With something as simple as:

output {
  elasticsearch {
    hosts => "[::1]:9200"
    manage_template => false
    index => "test-%{+YYYY.MM.dd}"
  }
  http {
    http_method => "post"
    format => "json_batch"
    url => "http://[::1]:8080"
  }
}

If Logstash doesn't receive a 200 from that url, it will keep retrying without sending anything to Elasticsearch.

I also tried the suggestion from @jordansissel of splitting the output into a separate pipeline, but the behavior is the same.
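
One possible explanation, assuming the split was done with in-memory pipeline-to-pipeline communication: without a persisted queue on the downstream pipeline, backpressure from the stuck http output still propagates upstream. A hypothetical pipelines.yml entry for the isolated output pipeline would be:

- pipeline.id: http_out                        # hypothetical id
  path.config: "/etc/logstash/http_out.conf"
  queue.type: persisted                        # buffer on disk instead of blocking upstream
  queue.max_bytes: 1gb                         # backpressure resumes only once this fills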
