Describe the bug
I set up Fluent Bit to forward logs from our k8s cluster to Elasticsearch.
In front of Elasticsearch I run an nginx proxy with HTTP basic auth, and Fluent Bit forwards logs to Elasticsearch through that proxy using basic auth.
Occasionally, however, Fluent Bit hits the error below and cannot continue pushing logs to Elasticsearch.
I need to recreate the pods to make the service work again.
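For context, a sketch of what such a basic-auth proxy typically looks like; the paths and upstream below are illustrative, not the actual config. Note that nginx's `client_max_body_size` defaults to 1m, which becomes relevant later in this thread:

```
server {
    listen      80;
    server_name elasticlog.example.com;

    location / {
        # Basic auth in front of Elasticsearch
        auth_basic           "Elasticsearch";
        auth_basic_user_file /etc/nginx/htpasswd;   # illustrative path

        # nginx rejects bodies above this limit with HTTP 413 and may
        # close the connection mid-request, which the client can see
        # as a broken pipe (the default is only 1m).
        client_max_body_size 100m;

        proxy_pass http://elasticsearch:9200;       # illustrative upstream
    }
}
```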
Your Environment
```
[OUTPUT]
    Name            es
    Match           *servicename*
    Host            elasticlog.example.com
    Port            80
    HTTP_User       user
    HTTP_Passwd     pass
    Logstash_Format On
    Retry_Limit     False
    Type            flb_type
    Time_Key        @timestamp
    Replace_Dots    On
    Logstash_Prefix servicename
```
```
[2019/11/11 10:55:07] [error] [io fd=44] error sending data to: elasticlog.example.com:80 (Broken pipe)
[2019/11/11 10:55:07] [error] [src/flb_http_client.c:804 errno=25] Inappropriate ioctl for device
[2019/11/11 10:55:07] [ warn] [out_es] http_do=-1 URI=/_bulk
```
What can I do to make our Fluent Bit service more stable?
Additional context
A broken pipe means that the remote endpoint has closed the TCP connection.
Would you please try the latest version, v1.3.2? I cannot match that specific error.
@thodquach: Could you solve the problem? I'm facing the same issue.
Using fluent-bit 1.3.3 as a DaemonSet.
```
[2019/11/29 12:44:51] [error] [io fd=50] error sending data to: elasticsearch-es-http:9200 (Broken pipe)
[2019/11/29 12:44:51] [error] [src/flb_http_client.c:844 errno=25] Inappropriate ioctl for device
[2019/11/29 12:44:51] [ warn] [out_es] http_do=-1 URI=/_bulk
[2019/11/29 12:46:28] [error] [src/flb_http_client.c:844 errno=32] Broken pipe
[2019/11/29 12:46:28] [ warn] [out_es] http_do=-1 URI=/_bulk
[2019/11/29 12:51:40] [error] [io fd=50] error sending data to: elasticsearch-es-http:9200 (Broken pipe)
[2019/11/29 12:51:40] [error] [src/flb_http_client.c:844 errno=25] Inappropriate ioctl for device
[2019/11/29 12:51:40] [ warn] [out_es] http_do=-1 URI=/_bulk
```
Note that despite the error message, a _broken pipe_ means the remote connection has been closed, so on the Fluent Bit side we trap the exception and issue a retry after a period of seconds.
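For reference, a minimal sketch of how that retry behavior can be bounded in the es output (host and port here are illustrative); `Retry_Limit False`, as used in the configs in this thread, means the scheduler retries without limit:

```
[OUTPUT]
    Name        es
    Match       *
    Host        elasticsearch-es-http
    Port        9200
    # An integer caps how many times a failed chunk is re-scheduled
    # before it is discarded; 'False' removes the cap entirely.
    Retry_Limit 5
```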
Do you see any anomaly in your Elasticsearch logs? Or is this Elastic Cloud?
@ledroide I do not know exactly why the connection is broken, but when this issue happens, log forwarding to ES stops, and I need to restart the service to forward logs again.
Would you please provide more details about the environment where that Elasticsearch is running?
Note: I am trying to replicate that "ioctl error" without success; I can trap a broken pipe and see Fluent Bit succeed after a retry once the service is up again.
Hello @edsiper,
My k8s cluster, kubectl version:
Client Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-dispatcher", GitCommit:"2e298c7e992f83f47af60cf4830b11c7370f6668", GitTreeState:"clean", BuildDate:"2019-09-19T22:26:40Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-gke.14", GitCommit:"56d89863d1033f9668ddd6e1c1aea81cd846ef88", GitTreeState:"clean", BuildDate:"2019-11-07T19:12:22Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}
Helm version:
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Elasticsearch version and helm chart:
https://github.com/elastic/helm-charts/tree/7.3.0/elasticsearch
And I have used Fluent Bit:
1.3-debug
Configuration of fluent-bit:
```
[FILTER]
    Name            kubernetes
    Match           kube.var.log.containers.*
    Kube_Tag_Prefix kube.var.log.containers.
    Kube_URL        https://kubernetes.default.svc:443
    Kube_CA_File    /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_Log       On
    Merge_Log_Key   message

[FILTER]
    Name    grep
    Match   *
    Exclude log health_check

[FILTER]
    Name         nest
    Match        *
    Operation    lift
    Nested_under kubernetes
    Add_prefix   kubernetes.

[FILTER]
    Name   modify
    Match  *
    Remove kubernetes.annotations

[INPUT]
    Name             tail
    Path             /var/log/containers/backend.log
    Parser           nginx
    Tag              kube.
    Refresh_Interval 5
    Mem_Buf_Limit    5MB
    Skip_Long_Lines  On

[OUTPUT]
    Name            es
    Match           backend
    Host            elasticlog.example.com
    Port            80
    HTTP_User       user
    HTTP_Passwd     password
    Logstash_Format On
    Retry_Limit     False
    Type            flb_type
    Time_Key        @timestamp
    Replace_Dots    On
    Logstash_Prefix backend

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf
    Parsers_File parsers_custom.conf

@INCLUDE fluent-bit-service.conf
@INCLUDE fluent-bit-input.conf
@INCLUDE fluent-bit-filter.conf
@INCLUDE fluent-bit-output.conf
```
And sometimes, when I restart the Fluent Bit pods, I see that the HTTP status Fluent Bit gets when trying to connect to Elasticsearch is HTTP=413. Could this be the reason why the pipe is broken?
Or sometimes I can see this log too:
```
[debug] [input] tail.2 paused (mem buf overlimit)
```
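That `paused` message is the tail input hitting its `Mem_Buf_Limit` (5MB in the config above). A minimal sketch, assuming the same input, of giving it more headroom so it pauses less often:

```
[INPUT]
    Name             tail
    Path             /var/log/containers/backend.log
    Parser           nginx
    Tag              kube.
    Refresh_Interval 5
    # When in-memory records exceed this limit the input is paused
    # until the output drains its backlog; a larger value delays the
    # pause at the cost of more memory.
    Mem_Buf_Limit    50MB
    Skip_Long_Lines  On
```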
I think these things may help you find out something :D
Thanks for your support.
Hi guys. I was able to solve the problem in my case, but I'm not sure you have the same one.
I had to force TLS and authentication settings for both fluent-bit and elasticsearch.
The main issue, from my point of view, is that the error message gives the wrong clues about the real problem. "Inappropriate ioctl for device" is confusing: the log should report an authentication or certificate issue, not a device issue.
Here is the _spec_ section of my fluent-bit DaemonSet:
```
spec:
  containers:
  - name: fluent-bit
    image: fluent/fluent-bit:1.3.3-debug
    ports:
    - containerPort: 2020
    env:
    - name: FLUENT_ELASTICSEARCH_HOST
      value: "elasticsearch-es-http"
    - name: FLUENT_ELASTICSEARCH_PORT
      value: "9200"
    - name: FLUENT_ELASTICSEARCH_SCHEME
      value: "https"
    - name: FLUENT_ELASTICSEARCH_USER
      value: elastic
    - name: FLUENT_ELASTICSEARCH_PASSWORD
      valueFrom:
        secretKeyRef:
          name: elasticsearch-es-elastic-user
          key: elastic
```
And the _output-elasticsearch.conf_ section of my ConfigMap:
```
output-elasticsearch.conf: |
  [OUTPUT]
      Name            es
      Match           *
      Host            ${FLUENT_ELASTICSEARCH_HOST}
      Port            ${FLUENT_ELASTICSEARCH_PORT}
      Logstash_Format On
      Replace_Dots    On
      Retry_Limit     False
      HTTP_User       ${FLUENT_ELASTICSEARCH_USER}
      HTTP_Passwd     ${FLUENT_ELASTICSEARCH_PASSWORD}
      # Trace_Output On
      tls             On
      tls.verify      Off
```
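A possible hardening of that last part: if the Elasticsearch CA certificate can be mounted into the pod (the path below is hypothetical), verification does not have to stay off:

```
[OUTPUT]
    Name        es
    Match       *
    Host        ${FLUENT_ELASTICSEARCH_HOST}
    Port        ${FLUENT_ELASTICSEARCH_PORT}
    HTTP_User   ${FLUENT_ELASTICSEARCH_USER}
    HTTP_Passwd ${FLUENT_ELASTICSEARCH_PASSWORD}
    tls         On
    tls.verify  On
    # Hypothetical mount path for the Elasticsearch CA certificate
    tls.ca_file /fluent-bit/tls/ca.crt
```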
Hope it helps
Hello @ledroide, thanks for your help :D
I have already applied the configuration you described, but the problem still happens.
Hello all,
I think the problem is related to Elasticsearch: we send a lot of documents at once, the bulk endpoint cannot handle them correctly (HTTP status 413, Payload Too Large), the connection gets dropped, and Fluent Bit can no longer forward logs to Elasticsearch.
And @edsiper,
How can I know how many logs I can forward in a single bulk request?
Then I could decrease it to match the Elasticsearch threshold, and hopefully this problem will not happen again :D
Hi @thodquach! Were you able to solve the problem? We are having a similar issue where we send 1M+ logs and are getting 413 response codes from ES and then Fluent Bit stops sending logs... I could not find how to instruct the ES output plugin to limit the size of the bulk requests or to apply some kind of rate limiting in order to make the indexing in ES less aggressive. Any help would be appreciated!
@chinniehendrix one trick you can try is limiting the memory usage of the Fluent Bit input plugin, which will slow down data ingestion a bit:
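A minimal sketch of that idea, using the tail input's buffer limits so that each flush, and therefore each bulk request, stays small (the values are illustrative):

```
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # A small memory cap pauses ingestion early, throttling how much
    # data is flushed (and bulked to Elasticsearch) at a time.
    Mem_Buf_Limit     1MB
    Buffer_Chunk_Size 32k
    Buffer_Max_Size   64k
```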
@thodquach please confirm whether you can still reproduce the issue with v1.5.
Hello @edsiper, I don't encounter this problem anymore for now, but I still don't know what fixed it.