Describe the bug
fluent-bit is receiving errors from Elasticsearch but it's not warning the user. All we see is "new retry created for task_id".
To Reproduce
I think you can easily reproduce this by adding an output to Elasticsearch that feeds entries of a different "type". From ES 6.x on, [multiple mapping types are not supported in indices created in 6.0](https://www.elastic.co/guide/en/elasticsearch/reference/6.0/breaking-changes-6.0.html), as was [common practice](https://www.elastic.co/blog/index-vs-type). As such, ES will reject the records with:
{"took":13,"errors":true,"items":[{"index":{"_index":"mylog","_type":"syslog","_id":"asdfADGNsdfn2344n","status":400,"error":{"type":"illegal_argument_exception","reason":"Rejecting mapping update to [mylog] as the final mapping would have more than 1 type: [syslog, docker]"}}}]}
Note that the error refers to a record of _type=syslog, whereas another _type=docker was already being used in the cluster.
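To illustrate, here is a minimal sketch of a config that would trigger this on ES >= 6.x (host, tags, and index name are placeholders, not from the original report): two es outputs writing to the same index with different `Type` values, so the second mapping type gets rejected.

```
# Hypothetical reproduction: same index, two different Type values.
[OUTPUT]
    Name   es
    Match  syslog.*
    Host   es.example.com
    Index  mylog
    Type   syslog

[OUTPUT]
    Name   es
    Match  docker.*
    Host   es.example.com
    Index  mylog
    Type   docker
```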
However, fluent-bit only shows this to the user:
[0] syslog.udp: [1562582290.719936179, {"timefield"=>"Jul 8 10:36:17", "ident"=>....}]
[2019/07/08 10:38:10] [debug] [task] created task=0x7fd172a3ef00 id=0 OK
[2019/07/08 10:38:11] [debug] [out_es] HTTP Status=200 URI=/_bulk
[2019/07/08 10:38:11] [debug] [retry] new retry created for task_id=0 attemps=1
[2019/07/08 10:38:11] [debug] [sched] retry=0x7fd172a0b7c0 0 in 7 seconds
We can only tell that there was a problem because we see a retry, but we have no idea what the problem was. In other situations I've seen fluent-bit show the ES message; I don't know why it doesn't happen in this case.
Note: the output above was collected with Log_Level trace!
Expected behavior
Print the ES response in case of error. In the special case of (at least) Log_Level trace, I'd just print it anyway, so that the user knows what's going on! :)
Your Environment
Me too.
[2019/12/16 17:53:40] [debug] [out_es] HTTP Status=400 URI=/_bulk
[2019/12/16 17:53:40] [debug] [task] task_id=0 reached retry-attemps limit 2/1
[2019/12/16 17:53:40] [ warn] [engine] Task cannot be retried: task_id=0 thread_id=2 output=es.0
[2019/12/16 17:53:40] [debug] [task] destroy task=0x7f0b4ca48500 (task_id=0)
To force Fluent Bit to use HTTPS, you need to add the following lines to your "td-agent-bit.conf" file:
tls On
tls.verify Off
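For context, here is a sketch of an es output section with those lines in place (the host and port are placeholders, not from this thread):

```
[OUTPUT]
    Name        es
    Match       *
    Host        my-es-endpoint.example.com
    Port        443
    # Force HTTPS; skip certificate verification (e.g. self-signed certs)
    tls         On
    tls.verify  Off
```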
Any update? I'm facing the same situation.
@ntavares did you find the reason for the issue?
Hi there,
I am getting a similar message when trying to add a forward OUTPUT to Elasticsearch in our VPC using HTTP authentication. Any suggestions?
Feb 21 10:38:06 td-agent-bit: [4] cpu.local: [1582281486.001175326, {"cpu_p"=>11.000000, "user_p"=>8.500000, "system_p"=>2.500000, "cpu0.p_cpu"=>9.000000, "cpu0.p_user"=>7.000000, "cpu0.p_system"=>2.000000, "cpu1.p_cpu"=>14.000000, "cpu1.p_user"=>11.000000, "cpu1.p_system"=>3.000000}]
Feb 21 10:38:06 td-agent-bit: [2020/02/21 10:38:06] [debug] [out_es] HTTP Status=400 URI=/_bulk
Feb 21 10:38:06 td-agent-bit: [2020/02/21 10:38:06] [debug] [retry] new retry created for task_id=2 attemps=1
Feb 21 10:38:06 td-agent-bit: [2020/02/21 10:38:06] [ warn] [engine] failed to flush chunk '9843-1582281482.1391136.flb', retry in 8 seconds: task_id=2, input=cpu.0 > output=es.1
Here is our td-agent-bit config file:
```
[INPUT]
Name cpu
Tag cpu.local
# Interval Sec
# ====
# Read interval (sec) Default: 1
Interval_Sec 1
[OUTPUT]
Name stdout
Match *
[OUTPUT]
Name es
Match *
Host vpcXXXXX.es.amazonaws.com
Port 443
index ec2-test-index
Logstash_Format On
Retry_limit 1
Type _doc
Replace_dots On
Logstash_Prefix tf-res-test-ec2
Time_Key @timestamp
HTTP_user es-access-user
HTTP_Passwd **
tls "on"
tls_verify on
tls.debug 1
```
Hello
In my case, I just edited the name of the record from "log" to "log_message" and it works fine. By the way, you can debug the error by capturing the JSON with an additional output (file) and sending the JSON manually with an XPOST request to the ES server.
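A sketch of that debugging approach (the path is a placeholder): add a second output that writes the same records to a file, so you can replay them against ES by hand and see the full error response.

```
# Capture records to disk for manual replay against Elasticsearch
[OUTPUT]
    Name    file
    Match   *
    Path    /tmp/flb-debug
```

You can then wrap a captured record in a bulk payload and send it with something like `curl -XPOST -H 'Content-Type: application/json' 'https://<es-host>/_bulk' --data-binary @payload.json` to see exactly why ES rejects it.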
Thanks for the comment
For reference, regarding my HTTP Status=400 error above:
tls "on" was the culprit; ES was rejecting the request because of the quotes.
I removed the quotes and it worked:
tls on
Hi, I can't remember what the problem was or how I fixed it, but this issue was more about the lack of a descriptive message explaining why it failed, not the particular (syntax?) problem that was causing the error.
@edsiper being bitten again by this (lack of verbosity)... can we have some input from you?
A bit more verbosity would be nice. Logging to a file and checking the error message from ES manually, as ntdetect mentioned, feels a bit dirty to me.