Describe the bug
I set up Fluent Bit to forward logs from our k8s cluster to Elasticsearch.
In front of Elasticsearch I run an nginx proxy with HTTP basic auth, and Fluent Bit forwards logs to Elasticsearch through that proxy using basic auth.
Occasionally, however, Fluent Bit hits the error below and cannot continue pushing logs to Elasticsearch.
I need to recreate the pods to make the service work again.
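For context, a sketch of what such a basic-auth proxy typically looks like; the paths and upstream below are illustrative, not the actual config. Note that nginx's `client_max_body_size` defaults to 1m, which becomes relevant later in this thread:

```
server {
    listen      80;
    server_name elasticlog.example.com;

    location / {
        # Basic auth in front of Elasticsearch
        auth_basic           "Elasticsearch";
        auth_basic_user_file /etc/nginx/htpasswd;   # illustrative path

        # nginx rejects bodies above this limit with HTTP 413 and may
        # close the connection mid-request, which the client can see
        # as a broken pipe (the default is only 1m).
        client_max_body_size 100m;

        proxy_pass http://elasticsearch:9200;       # illustrative upstream
    }
}
```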
Your Environment
```
[OUTPUT]
    Name            es
    Match           *servicename*
    Host            elasticlog.example.com
    Port            80
    HTTP_User       user
    HTTP_Passwd     pass
    Logstash_Format On
    Retry_Limit     False
    Type            flb_type
    Time_Key        @timestamp
    Replace_Dots    On
    Logstash_Prefix servicename
```
```
[2019/11/11 10:55:07] [error] [io fd=44] error sending data to: elasticlog.example.com:80 (Broken pipe)
[2019/11/11 10:55:07] [error] [src/flb_http_client.c:804 errno=25] Inappropriate ioctl for device
[2019/11/11 10:55:07] [ warn] [out_es] http_do=-1 URI=/_bulk
```
What can I do to make our Fluent Bit service more stable?
Additional context
A broken pipe means that the remote endpoint has closed the TCP connection.
Would you please try the latest version, v1.3.2? I cannot match that specific error.
@thodquach: Could you solve the problem? I'm facing the same issue.
Using fluent-bit 1.3.3 as a DaemonSet.
```
[2019/11/29 12:44:51] [error] [io fd=50] error sending data to: elasticsearch-es-http:9200 (Broken pipe)
[2019/11/29 12:44:51] [error] [src/flb_http_client.c:844 errno=25] Inappropriate ioctl for device
[2019/11/29 12:44:51] [ warn] [out_es] http_do=-1 URI=/_bulk
[2019/11/29 12:46:28] [error] [src/flb_http_client.c:844 errno=32] Broken pipe
[2019/11/29 12:46:28] [ warn] [out_es] http_do=-1 URI=/_bulk
[2019/11/29 12:51:40] [error] [io fd=50] error sending data to: elasticsearch-es-http:9200 (Broken pipe)
[2019/11/29 12:51:40] [error] [src/flb_http_client.c:844 errno=25] Inappropriate ioctl for device
[2019/11/29 12:51:40] [ warn] [out_es] http_do=-1 URI=/_bulk
```
Note that despite the error message, a _broken pipe_ means the remote connection has been closed, so on the Fluent Bit side we trap the exception and issue a retry after a period of seconds.
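For reference, a minimal sketch of how that retry behavior can be bounded in the es output (host and port here are illustrative); `Retry_Limit False`, as used in the configs in this thread, means the scheduler retries without limit:

```
[OUTPUT]
    Name        es
    Match       *
    Host        elasticsearch-es-http
    Port        9200
    # An integer caps how many times a failed chunk is re-scheduled
    # before it is discarded; 'False' removes the cap entirely.
    Retry_Limit 5
```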
Do you see any anomaly in your Elasticsearch logs? Or is this Elastic Cloud?
@ledroide I do not know exactly why the connection is broken, but when this issue happens, log forwarding to ES stops, and I need to restart the service to forward logs again.
Would you please provide more details about the environment where that Elasticsearch is running?
Note: I am trying to replicate that "ioctl error" without success; I can trap a broken pipe and see Fluent Bit succeed after a retry once the service is up again.
Hello @edsiper,
My k8s cluster, kubectl version:
Client Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-dispatcher", GitCommit:"2e298c7e992f83f47af60cf4830b11c7370f6668", GitTreeState:"clean", BuildDate:"2019-09-19T22:26:40Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-gke.14", GitCommit:"56d89863d1033f9668ddd6e1c1aea81cd846ef88", GitTreeState:"clean", BuildDate:"2019-11-07T19:12:22Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}
Helm version:
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Elasticsearch version and helm chart:
https://github.com/elastic/helm-charts/tree/7.3.0/elasticsearch
And I have used Fluent Bit:
1.3-debug
Configuration of fluent-bit:
```
[FILTER]
    Name            kubernetes
    Match           kube.var.log.containers.*
    Kube_Tag_Prefix kube.var.log.containers.
    Kube_URL        https://kubernetes.default.svc:443
    Kube_CA_File    /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_Log       On
    Merge_Log_Key   message

[FILTER]
    Name    grep
    Match   *
    Exclude log health_check

[FILTER]
    Name         nest
    Match        *
    Operation    lift
    Nested_under kubernetes
    Add_prefix   kubernetes.

[FILTER]
    Name   modify
    Match  *
    Remove kubernetes.annotations

[INPUT]
    Name             tail
    Path             /var/log/containers/backend.log
    Parser           nginx
    Tag              kube.
    Refresh_Interval 5
    Mem_Buf_Limit    5MB
    Skip_Long_Lines  On

[OUTPUT]
    Name            es
    Match           backend
    Host            elasticlog.example.com
    Port            80
    HTTP_User       user
    HTTP_Passwd     password
    Logstash_Format On
    Retry_Limit     False
    Type            flb_type
    Time_Key        @timestamp
    Replace_Dots    On
    Logstash_Prefix backend

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf
    Parsers_File parsers_custom.conf

@INCLUDE fluent-bit-service.conf
@INCLUDE fluent-bit-input.conf
@INCLUDE fluent-bit-filter.conf
@INCLUDE fluent-bit-output.conf
```
And sometimes, when I restart the Fluent Bit pods, I see that the HTTP status Fluent Bit gets when trying to connect to Elasticsearch is HTTP=413. Could this be the reason why the pipe is broken?
Or sometimes I can see this log too:
```
[debug] [input] tail.2 paused (mem buf overlimit)
```
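That `paused` message is the tail input hitting its `Mem_Buf_Limit` (5MB in the config above). A minimal sketch, assuming the same input, of giving it more headroom so it pauses less often:

```
[INPUT]
    Name             tail
    Path             /var/log/containers/backend.log
    Parser           nginx
    Tag              kube.
    Refresh_Interval 5
    # When in-memory records exceed this limit the input is paused
    # until the output drains its backlog; a larger value delays the
    # pause at the cost of more memory.
    Mem_Buf_Limit    50MB
    Skip_Long_Lines  On
```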
I think these things may help you find out something :D
Thanks for your support.
Hi guys. I was able to solve the problem in my case, but I'm not sure you have the same one.
I had to force TLS and authentication settings for both fluent-bit and elasticsearch.
The main issue, from my point of view, is that the error message gives the wrong clues about the real problem. "Inappropriate ioctl for device" is confusing: the log should report an authentication or certificate issue, not a device issue.
Here is the _spec_ section of my fluent-bit DaemonSet:
```
spec:
  containers:
  - name: fluent-bit
    image: fluent/fluent-bit:1.3.3-debug
    ports:
    - containerPort: 2020
    env:
    - name: FLUENT_ELASTICSEARCH_HOST
      value: "elasticsearch-es-http"
    - name: FLUENT_ELASTICSEARCH_PORT
      value: "9200"
    - name: FLUENT_ELASTICSEARCH_SCHEME
      value: "https"
    - name: FLUENT_ELASTICSEARCH_USER
      value: elastic
    - name: FLUENT_ELASTICSEARCH_PASSWORD
      valueFrom:
        secretKeyRef:
          name: elasticsearch-es-elastic-user
          key: elastic
```
And the _output-elasticsearch.conf_ section of my ConfigMap:
```
output-elasticsearch.conf: |
  [OUTPUT]
      Name            es
      Match           *
      Host            ${FLUENT_ELASTICSEARCH_HOST}
      Port            ${FLUENT_ELASTICSEARCH_PORT}
      Logstash_Format On
      Replace_Dots    On
      Retry_Limit     False
      HTTP_User       ${FLUENT_ELASTICSEARCH_USER}
      HTTP_Passwd     ${FLUENT_ELASTICSEARCH_PASSWORD}
      # Trace_Output On
      tls             On
      tls.verify      Off
```
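A possible hardening of that last part: if the Elasticsearch CA certificate can be mounted into the pod (the path below is hypothetical), verification does not have to stay off:

```
[OUTPUT]
    Name        es
    Match       *
    Host        ${FLUENT_ELASTICSEARCH_HOST}
    Port        ${FLUENT_ELASTICSEARCH_PORT}
    HTTP_User   ${FLUENT_ELASTICSEARCH_USER}
    HTTP_Passwd ${FLUENT_ELASTICSEARCH_PASSWORD}
    tls         On
    tls.verify  On
    # Hypothetical mount path for the Elasticsearch CA certificate
    tls.ca_file /fluent-bit/tls/ca.crt
```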
Hope it helps
Hello @ledroide, thanks for your help :D
I have already applied the configuration you described, but the problem still happens.
Hello all,
I think the problem is related to Elasticsearch: we send a lot of documents at once, the bulk endpoint cannot handle them correctly (HTTP status 413, Payload Too Large), the connection gets dropped, and Fluent Bit can no longer forward logs to Elasticsearch.
And @edsiper,
How can I know how many logs I can forward in a single bulk request?
Then I could decrease it to match the Elasticsearch threshold, and hopefully this problem will not happen again :D
Hi @thodquach! Were you able to solve the problem? We are having a similar issue where we send 1M+ logs and are getting 413 response codes from ES and then Fluent Bit stops sending logs... I could not find how to instruct the ES output plugin to limit the size of the bulk requests or to apply some kind of rate limiting in order to make the indexing in ES less aggressive. Any help would be appreciated!
@chinniehendrix one trick you can try is limiting the memory usage of the Fluent Bit input plugin, which will slow down data ingestion a bit:
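A minimal sketch of that idea, using the tail input's buffer limits so that each flush, and therefore each bulk request, stays small (the values are illustrative):

```
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # A small memory cap pauses ingestion early, throttling how much
    # data is flushed (and bulked to Elasticsearch) at a time.
    Mem_Buf_Limit     1MB
    Buffer_Chunk_Size 32k
    Buffer_Max_Size   64k
```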
@thodquach please confirm whether you can still reproduce the issue with v1.5.
Hello @edsiper, I don't encounter this problem anymore for now, but I still don't know what fixed it.