Describe the bug
I upgraded to 1.6.5 yesterday and Fluent Bit stopped transferring logs: the transfer just stops with "tail.0 paused (mem buf overlimit)" and never resumes. A restart did not resume reading either.
I downgraded to 1.6.4; there it works and the tail.0 input keeps reading.
To Reproduce
Start Fluent Bit with a large backlog of unread logs in the tailed files.
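To trigger the pause more quickly, a deliberately small Mem_Buf_Limit on the tail input should help; this is a minimal sketch with illustrative values, not the reporter's exact setup:

[INPUT]
    Name           tail
    Path           /var/log/containers/*.log
    Mem_Buf_Limit  1MB
    DB             /tail-db/tail-containers-state.db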
Expected behavior
The tail input resumes reading the files once buffered data is flushed.
Your Environment
[SERVICE]
    Flush             1
    Daemon            Off
    Log_Level         info
    Parsers_File      /fluent-bit/etc/parsers.conf
    HTTP_Server       On
    HTTP_Listen       0.0.0.0
    HTTP_Port         2020

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kubernetes.*
    Refresh_Interval  5
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    DB                /tail-db/tail-containers-state.db
    DB.Sync           Normal

[FILTER]
    Name              kubernetes
    Match             kubernetes.*
    Kube_Tag_Prefix   kubernetes.var.log.containers.
    Kube_URL          https://kubernetes.default.svc:443
    tls.debug         4
    tls.verify        Off
    Kube_CA_File      /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File   /var/run/secrets/kubernetes.io/serviceaccount/token
    # Do not parse the inner json
    Merge_Log         Off
    K8S-Logging.Parser   On
    K8S-Logging.Exclude  On

[FILTER]
    name              lua
    match             *
    script            filters.lua
    call              transform

[OUTPUT]
    Name              forward
    Match             *
    Host              10.x.x.x
    Port              24224
    Retry_Limit       False
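As a side note (independent of the regression reported here), the usual way to keep a tail input from hard-pausing on Mem_Buf_Limit is to buffer chunks on the filesystem; a minimal sketch, assuming /var/log/flb-storage/ is a writable path (illustrative, not part of the original setup):

[SERVICE]
    storage.path               /var/log/flb-storage/
    storage.sync               normal
    storage.backlog.mem_limit  5M

[INPUT]
    Name           tail
    Path           /var/log/containers/*.log
    storage.type   filesystem
    Mem_Buf_Limit  5MB

With storage.type filesystem, records past the memory limit spill to disk chunks instead of pausing ingestion.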
I also see some other unwanted behavior in 1.6.5.
We use filesystem-based storage, and with storage.total_limit_size set to a high value (e.g. 30G) on the output, the buffer keeps growing even though the backend/output is clearly reachable. This does not happen with 1.6.4.
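For context, the setup described would look roughly like the sketch below (paths and sizes are illustrative). With HTTP_Server On and storage.metrics on, the built-in HTTP server's /api/v1/storage endpoint shows whether chunks keep accumulating:

[SERVICE]
    storage.path     /var/log/flb-storage/
    storage.metrics  on
    HTTP_Server      On

[OUTPUT]
    Name                      forward
    Match                     *
    storage.total_limit_size  30G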
I can confirm that behavior changed in 1.6.5: the logs from the fluent-bit container itself are sent to the output, but no logs from other containers arrive (tested with both the stdout and the loki output plugins).
I also saw the "tail.0 paused (mem buf overlimit)" message once while testing, but could not reproduce it.
Configuration:
[SERVICE]
    Parsers_File      parsers.conf
    Log_Level         info

[INPUT]
    Name              tail
    Tag               kube.*
    Path              C:\\var\\log\\containers\\*.log
    Parser            docker
    DB                C:\\fluent-bit\\tail_docker.db
    Mem_Buf_Limit     7MB
    Refresh_Interval  10

[FILTER]
    Name              kubernetes
    Match             kube.*
    Kube_URL          https://kubernetes.default.svc.cluster.local:443
    Labels            off
    Merge_Log         on

[OUTPUT]
    name              loki
    Match             kube.*
    host              loki.loki.svc.cluster.local
    port              3100
    tenant_id         ""
    labels            job=containerlogs
Environment:
Troubleshooting...
Found the root cause of the problem and the fix.
Thinking about how to implement a test case to avoid this kind of issue.
Could "emitter.1 paused (mem buf overlimit)" be connected? Or is it a general issue on the output side?
We've got the same issue with v1.6.5 outputting to Fluentd and Loki.
It looks like rolling back to v1.6.4 gets us running again, but then we can't use Loki. Is there a ballpark ETA for a fix, so I can figure out whether I need to redeploy Promtail to feed Loki?
FYI: v1.6.6 will be out in a few hours.
Thank you @edsiper as always!!
Please upgrade to v1.6.6: