Logstash problem: message=>"Failed to flush outgoing items", <Errno::EBADF: Bad file descriptor - Bad file descriptor>

Created on 15 Dec 2014 · 26 comments · Source: elastic/logstash

For the past couple of months I have been experiencing a problem with Logstash; the logs contain several errors like the following:

{:timestamp=>"2014-12-15T10:30:25.482000-0700", :message=>"Failed to flush outgoing items", :outgoing_count=>252, :exception=>#<Errno::EBADF: Bad file descriptor - Bad file descriptor>, :backtrace=>["org/jruby/RubyIO.java:2097:in `close'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:173:in `connect'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:168:in `connect'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:156:in `connect'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:139:in `connect'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:406:in `connect'", "org/jruby/RubyProc.java:271:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/pool.rb:48:in `fetch'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:403:in `connect'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:319:in `execute'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:217:in `post!'", "/opt/logstash/lib/logstash/outputs/elasticsearch/protocol.rb:106:in `bulk_ftw'", "/opt/logstash/lib/logstash/outputs/elasticsearch/protocol.rb:80:in `bulk'", "/opt/logstash/lib/logstash/outputs/elasticsearch.rb:315:in `flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1339:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:216:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:193:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:112:in `buffer_initialize'", "org/jruby/RubyKernel.java:1521:in `loop'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:110:in `buffer_initialize'"], :level=>:warn}

I have Elasticsearch 1.1.1 and Logstash 1.4.2, and my Logstash configuration looks like this:

input {
    udp {
        type => "udp"
        port => 30000
    }
}

filter {
   if [type] == "udp" {
      mutate {
        rename => ["@host", "host"]
      }
      dns {
         reverse => ["host"]
         action => "replace"
      }
      grok {
          patterns_dir => "/etc/logstash/patterns"
          match => ["message", "%{OLD_MESSAGE}",
                    "message", "%{NEW_MESSAGE}"
                   ]
      }
   }
}

output {
    elasticsearch {
        protocol => "http"
        cluster => "logstash"
        host => "127.0.0.1"
        index => "logstash-syslog-%{+YYYY.MM.dd}"
    }

}

What is wrong? Is there any bug in Logstash? Should I upgrade it?

Thanks in advance!

logging improvements

Most helpful comment

So this isn't a bug _per se_, but perhaps we could catch this exception and return a more descriptive error message in the log. This would be especially informative in cases where Logstash is at the mercy of the remote service.

Added the "logging improvements" label.

All 26 comments

Hi,

Interesting issue.
Can you confirm your ES instance is not having any issues?

I checked the ES instance every time I had this error: there were no errors in the ES log files, but the ES process had died. I find it surprising that the problem ends up on ES rather than on Logstash.
Any idea?

Can you check your system logs?
Perhaps ES got OOM-killed (out of memory), or there is something else there that might give an indication.
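
If it was the OOM killer, the kernel log will usually say so; something along these lines should show it (the exact log file varies by distribution):

    # look for OOM-killer activity in the kernel ring buffer
    dmesg | grep -iE 'out of memory|killed process'
    # or in the persistent system logs, depending on the distro
    grep -i 'oom' /var/log/syslog /var/log/messages 2>/dev/null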

@nadalie can you check your file descriptor limit? What OS are you running?

Sorry for not answering earlier; I was away on holiday.
The maximum number of file descriptors for ES is already set to the maximum (64k):

  "process" : {
    "refresh_interval" : 1000,
    "id" : 862,
    "max_file_descriptors" : 65535,
    "mlockall" : false
  }

And for the Logstash process the limit is set to a maximum of 10,000 open files in /etc/security/limits.conf.
Even so, I'm not logging that much data, so this limit should not be reached: the maximum is 15,000 messages per day, so even if each connection needs some time to close, it shouldn't be a problem.
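
For what it's worth, a quick way to double-check the limit that actually applies to the running process (assuming a single Logstash java process) is something like:

    # the open-files limit the running Logstash process really has
    cat /proc/$(pgrep -f logstash | head -n1)/limits | grep 'Max open files'
    # how many descriptors it currently holds
    ls /proc/$(pgrep -f logstash | head -n1)/fd | wc -l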

I'm running on Linux, and I'm wondering whether this is happening because the machine runs out of RAM. My server has only 3 GB of RAM, with only 256 MB assigned to ES; my ES configuration uses 5 shards, creates one index per day, and keeps just the latest 30 indices.
Could it be this?

I had an issue with the very same symptoms. I'm running the same versions of LS and ES as described, in containers with plenty of resources.

My issue was that output -> elasticsearch -> host was set to just the hostname "es1", and the search domains in my resolv.conf were in the wrong order, so the name resolved to a public IP on which access to Elasticsearch port 9200 is forbidden. Once I provided the FQDN (or just the IP), everything started to work.

@nadalie this may not be your case, but I suggest double-checking that your ES is accessible from where Logstash is running and that nothing is blocking the port. Something like telnet 127.0.0.1 9200 is a quick first check.
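
A couple of other quick checks, assuming ES is listening on the default HTTP port 9200:

    # both calls should return a JSON response if ES is reachable
    curl -s http://127.0.0.1:9200/
    curl -s 'http://127.0.0.1:9200/_cluster/health?pretty'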

@vasekch I'm running both on the same machine, so I'm able to connect to the ES port; it is accessible from Logstash. Thanks!

I just came across this issue and, similar to what @vasekch mentioned, it turned out to be a networking-stack problem on the path to ES. I am also running a non-data ES node on the same Logstash box; it turns out an iptables change I made was missing an exception for localhost traffic. I was able to curl the localhost ES server before I put in the iptables fix, which initially threw me off.
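
For anyone hitting the same thing: the fix was a loopback exception, something along these lines (the exact position depends on how your ruleset is ordered):

    # accept all traffic on the loopback interface before any REJECT/DROP rules
    iptables -I INPUT 1 -i lo -j ACCEPT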

Why do you suggest it's a networking stack problem? I didn't change anything in iptables (in fact, I allow connections to port 9200: "-A INPUT -m state --state NEW -m tcp -p tcp --dport 9200 -j ACCEPT"). I don't understand why, without changing anything in the configuration, I am suddenly experiencing problems.

@nadalie: in my case I got the error message every time; if I'm reading you correctly, you only see this message in the logs from time to time.
If you have a firewall on and cannot turn it off to test, I would suggest reviewing your DoS-prevention rules (requests/s etc.); maybe they come into play when you don't expect it.
And yes, it's most probably a network stack problem AFAIK.

I can confirm that the Errno::EBADF: Bad file descriptor message _can_ occur when a remote port is inaccessible or disconnects in-flight.

Same exception as in issue #2896, but in the output to InfluxDB.

+1

I still have the problem, but I don't think it's a matter of the remote port being unavailable. It just happens after a couple of months of running without problems, and there is no network disconnection. I don't have a firewall.

I have the same error. I have two instances of Logstash: one is used as an indexer and reads from a Redis server, and the other is a shipper and forwards to Redis.
The Logstash indexer that reads from Redis always gets this error in the logs after some time.

I had the same message in the logs. It was due to too many CLOSE_WAIT connections on the Logstash port.
Once these connections were cleaned up, the error was gone.

Hope this helps someone.
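
A quick way to see whether you are in the same situation is to count sockets stuck in CLOSE_WAIT, something like:

    # count sockets stuck in CLOSE_WAIT (filter by port if you only care about ES/Logstash traffic)
    netstat -ant | grep CLOSE_WAIT | wc -l
    # or with ss on newer systems
    ss -tan state close-wait | wc -l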

My issue was resolved by adding a cron job that restarts the Logstash instance reading from redis-server every 6 hours; it now works perfectly.

We have this problem whenever the upstream Elasticsearch or the load balancer for the Logstash output is temporarily unavailable; from that point on, Logstash fails to process any logs and they go into the abyss until the process is restarted.

{:timestamp=>"2016-01-07T07:41:54.590000+1100", :message=>"Got error to send bulk of actions: connect timed out", :level=>:error}
{:timestamp=>"2016-01-07T07:41:54.590000+1100", :message=>"Failed to flush outgoing items", :outgoing_count=>1, :exception=>"Manticore::ConnectTimeout", :backtrace=>["/opt/logstash/vendor/bundle/jruby/1.9/gems/manticore-0.4.4-java/lib/manticore/response.rb:35:in `initialize'", "org/jruby/RubyProc.java:281:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/manticore-0.4.4-java/lib/manticore/response.rb:70:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/manticore-0.4.4-java/lib/manticore/response.rb:245:in `call_once'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/manticore-0.4.4-java/lib/manticore/response.rb:148:in `code'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.15/lib/elasticsearch/transport/transport/http/manticore.rb:71:in `perform_request'", "org/jruby/RubyProc.java:281:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.15/lib/elasticsearch/transport/transport/base.rb:201:in `perform_request'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.15/lib/elasticsearch/transport/transport/http/manticore.rb:54:in `perform_request'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.15/lib/elasticsearch/transport/client.rb:125:in `perform_request'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/elasticsearch-api-1.0.15/lib/elasticsearch/api/actions/bulk.rb:87:in `bulk'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.1.0-java/lib/logstash/outputs/elasticsearch/protocol.rb:105:in `bulk'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.1.0-java/lib/logstash/outputs/elasticsearch.rb:548:in `submit'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.1.0-java/lib/logstash/outputs/elasticsearch.rb:547:in `submit'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.1.0-java/lib/logstash/outputs/elasticsearch.rb:572:in `flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.1.0-java/lib/logstash/outputs/elasticsearch.rb:571:in `flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.21/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1342:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.21/lib/stud/buffer.rb:216:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.21/lib/stud/buffer.rb:193:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.21/lib/stud/buffer.rb:159:in `buffer_receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.1.0-java/lib/logstash/outputs/elasticsearch.rb:537:in `receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.6-java/lib/logstash/outputs/base.rb:88:in `handle'", "(eval):296:in `output_func'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.6-java/lib/logstash/pipeline.rb:244:in `outputworker'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.6-java/lib/logstash/pipeline.rb:166:in `start_outputs'"], :level=>:warn}

What version of logstash and what version of the logstash-output-elasticsearch plugin?

@sdembra How are you managing the cleanup of connections in CLOSE_WAIT? Please shed some light. I am hitting the same issue.

I just restart the Logstash process when there are too many CLOSE_WAIT connections.

One more reason for getting too many CLOSE_WAIT connections can be Elasticsearch running out of memory.

In that case, you will have to increase memory for ES.

Another thing you can do, if you are using a multi-node ES cluster, is to set appropriate parameters in the elasticsearch.yml file:

  discovery.zen.minimum_master_nodes: (number of master-eligible nodes / 2) + 1
  discovery.zen.ping.timeout: 30s (the default ping timeout is 3s)
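
For example, on a hypothetical cluster with 3 master-eligible nodes, that formula gives (3 / 2) + 1 = 2:

    # elasticsearch.yml (hypothetical 3-node cluster, all nodes master-eligible)
    discovery.zen.minimum_master_nodes: 2
    discovery.zen.ping.timeout: 30s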

Hope this helps.


@sdembra Thank you for the info. We made some changes to the Logstash output configuration for Elasticsearch and it works well now.
We added a flush_size of 500 and an idle_flush_time of 5 seconds. Connections were going to CLOSE_WAIT without these settings; now it's fine.
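
For reference, the relevant part of our output section looked roughly like this (option names as they existed in the elasticsearch output plugin at the time; the rest of the block is unchanged):

    output {
      elasticsearch {
        # flush a batch once it reaches 500 events...
        flush_size => 500
        # ...or force a flush if 5 seconds have passed since the last one
        idle_flush_time => 5
      }
    }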

So this isn't a bug _per se_, but perhaps we could catch this exception and return a more descriptive error message in the log. This would be especially informative in cases where Logstash is at the mercy of the remote service.

Added the "logging improvements" label.

I found that this issue was the same as https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/127, which is resolved by the proper output configuration described in https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/127#issuecomment-261374678.

I asked our admin to restart the server and upgrade the RAM to 8 GB, and the issue was solved.
ES was started and Logstash was started; while sending a bulk request, ES was killed for some reason, maybe OOM (out of memory).

Since this is primarily an issue in the Elasticsearch output plugin, this is being migrated to https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/542
