Logstash: all outputs fail if one of many outputs fails

Created on 28 Jan 2015 · 16 comments · Source: elastic/logstash

We have an issue in logstash when our elasticsearch cluster slows down.

Essentially, we use logstash with multiple outputs: elasticsearch, graphite, and hdfs.

If any of the above outputs has issues, all outputs will fail.

It would be great if only the output with the issue failed, without impacting the others.

enhancement

All 16 comments

Could you describe in a bit more detail what issues one of the outputs might have? Do you mean exceptions? Wrong data? Connection problems?

@rgardam

In the current design we have a single buffer in front of all the outputs; this ensures that an event is only removed from the buffer once every output has processed it.
This can cause issues for the other outputs if one output is slowed down or unavailable.

I think you are after a solution where every output has its own buffer, so that if Output A is slowed down or unavailable, Output B still continues to function without any events being lost for Output A.

Am I correct?

Yeah, exactly.
In our particular case we use the elasticsearch_http output, and I imagine it's not able to get confirmation back from Elasticsearch, so the buffer fills for that output and then everything stops.

We are working on a new feature that will implement much better buffering/queueing inside Logstash (https://github.com/elasticsearch/logstash/pull/1939), which is currently targeted for Logstash 2.0.

With the current design it's impossible to do what you want, I'm afraid.

Awesome! I'm looking forward to seeing this as it does cause us some pain when it happens.

We have a similar issue when shipping logs to Elasticsearch and HDFS. If HDFS goes down (or, more often, Elasticsearch), the buffer fills up and Logstash stops sending logs to all outputs. It ends up being very brittle, since a failure of any downstream output causes all downstream outputs to stop receiving data. We'd actually like to hook up more non-critical downstream outputs, but we don't want more failure points for ES and HDFS, which are the two critical ones.

Looking forward to 2.0!

Is this still the case in 2.1?

From my understanding, @alex88, even with the new changes recently introduced in the pipeline, if one output is stuck all outputs are stuck. But I'm sure @andrewvc would be able to confirm or deny; what do you think?

+1, this is a critical ability for messaging pipelines where strong consistency across outputs isn't required

Message durability and consistency are of the utmost importance to Logstash. If you need complete consistency to one output, and don't care about complete consistency to another, you'll need two pipelines. A single Logstash instance cannot yet provide this. As far as delayed delivery goes, the dead letter queue functionality that Logstash is providing will prevent a buffer-overflow level backup of the pipeline. As stated, though, that will not function for a single pipeline, as the backup will result in all messages going to the DLQ, not just the ones not making it to the slow/offline output.

If you have this sort of issue, the best option I can currently offer is to add a second (or additional) output to a broker (Redis, for example), with a second pipeline/Logstash instance reading from that broker and feeding the slow/lossy output.
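
For concreteness, a minimal sketch of that broker approach, assuming a local Redis and hypothetical host/key names (the options shown are those of the stock redis input and output plugins). The main pipeline writes to the critical output and mirrors events into a Redis list:

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
  redis {
    host => "localhost"            # hypothetical broker host
    data_type => "list"
    key => "slow_output_events"    # hypothetical list key
  }
}

A second pipeline/Logstash instance drains that list and feeds the slow or lossy output, so its backpressure stops at Redis instead of stalling the main pipeline:

input {
  redis {
    host => "localhost"
    data_type => "list"
    key => "slow_output_events"
  }
}
output {
  graphite {                       # example slow/lossy output
    host => "localhost"
    port => 2003
  }
}

Note that if Redis itself becomes unavailable the main pipeline will still stall; the broker only isolates the outputs sitting behind it.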

So long as Logstash has a single pipeline, there will be no way to do what is described in this issue in a single stage. When (or if) Logstash adopts multiple pipelines and adds a DLQ, that will be a different story, but until then there is nothing that will permit Logstash to address your request, as it is in opposition to the design specs of (at least the best attempt at) guaranteed delivery of all messages to all outputs.

Closing. Please feel free to re-open if this answer is not satisfactory. We'd be happy to continue the discussion.

@untergeek thanks for the response.

Please consider updating the documentation to make the current behavior clear. I had the complete opposite expectation of how Logstash would operate if one of my outputs disconnected (for example, a rabbitmq output disconnecting causes all of the other outputs to fail).

I'm confident that most people would not expect this behavior from Logstash by default, and when things go wrong they will lose messages when using a tcp/http input (unless all of the producers are smart enough to handle it). Understanding how this works will help developers architect their systems as you suggest, early on.

Is this behavior still present in 5.4? We have a use case with multiple Elasticsearch destinations and multiple syslog servers as outputs. There are times when there's a typo in the hostname of a syslog server and Logstash can't reach it via TCP. In that case, would events stop flowing to all the other outputs as well?

Yes, it seems so, since we also recently faced the same issue. We have outputs configured for Solr/ES and HDFS, and ES being down caused Logstash to stop writing to all of them.
Is there a fix for this issue? Please help.

Still present for 5.4 and tcp in my environment. I don't understand how this isn't fixed.

I have deleted all the empty "+1" comments. Please refrain from this noise in the future.


As for the request, you can use the Logstash multiple pipelines feature to route data from your main pipeline to a secondary pipeline where you don't care about delivery.

Please advise if this does not work for your needs.
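
For concreteness, here is a sketch of that layout (the "output isolator" pattern), assuming a Logstash version with pipeline-to-pipeline communication and using hypothetical pipeline ids and config paths. The persisted queues on the downstream pipelines are what actually absorb backpressure from a stalled output:

# pipelines.yml -- ids and paths are hypothetical
- pipeline.id: intake
  path.config: "/etc/logstash/intake.conf"
- pipeline.id: es
  path.config: "/etc/logstash/es.conf"
  queue.type: persisted
- pipeline.id: syslog
  path.config: "/etc/logstash/syslog.conf"
  queue.type: persisted

intake.conf fans each event out to both downstream pipelines:

output {
  pipeline { send_to => ["es", "syslog"] }
}

Each downstream pipeline reads from its own address and owns exactly one output, e.g. es.conf:

input {
  pipeline { address => "es" }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
  }
}

With this layout, a stalled syslog output fills the syslog pipeline's on-disk queue while intake and es keep flowing, until that queue itself fills up.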

This is happening to me in 7.0.0

With something as simple as:

output {
  elasticsearch {
    hosts => "[::1]:9200"
    manage_template => false
    index => "test-%{+YYYY.MM.dd}"
  }
  http {
    http_method => "post"
    format => "json_batch"
    url => "http://[::1]:8080"
  }
}

If Logstash doesn't receive a 200 from that url, it will keep retrying without sending anything to Elasticsearch.

I also tried the suggestion from @jordansissel of splitting the output into a separate pipeline, but the behavior is the same.
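
One possible explanation, assuming the split was done with in-memory pipeline-to-pipeline communication: without a persisted queue on the downstream pipeline, backpressure from the stuck http output still propagates upstream. A hypothetical pipelines.yml entry for the isolated output pipeline would be:

- pipeline.id: http_out                        # hypothetical id
  path.config: "/etc/logstash/http_out.conf"
  queue.type: persisted                        # buffer on disk instead of blocking upstream
  queue.max_bytes: 1gb                         # backpressure resumes only once this fills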
