Describe the bug
After upgrading, I get a lot of error messages about upstream connection errors.
To Reproduce
[2020/12/03 10:17:36] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/12/03 10:17:36] [error] [io] connection #49 failed to: kubernetes.default.svc:443
[2020/12/03 10:17:36] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/12/03 10:17:36] [error] [io] connection #49 failed to: kubernetes.default.svc:443
[2020/12/03 10:17:36] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/12/03 10:17:36] [error] [io] connection #49 failed to: kubernetes.default.svc:443
[2020/12/03 10:17:36] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/12/03 10:17:36] [error] [io] connection #49 failed to: kubernetes.default.svc:443
[2020/12/03 10:17:36] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/12/03 10:17:36] [error] [io] connection #49 failed to: kubernetes.default.svc:443
[2020/12/03 10:17:36] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/12/03 10:17:36] [error] [io] connection #49 failed to: kubernetes.default.svc:443
[2020/12/03 10:17:36] [error] [filter:kubernetes:kubernetes.0] upstream connection error
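To gauge how many of these failures hit each endpoint, the log lines above could be tallied with a short script. This is only a sketch: the function name `count_failed_targets` is hypothetical, and the regular expression assumes the `[io] connection #N failed to: host:port` format shown in the log above.

```python
import re
from collections import Counter

# Assumed format, taken from the log lines above:
#   [2020/12/03 10:17:36] [error] [io] connection #49 failed to: host:port
FAILED_RE = re.compile(
    r"\[error\] \[io\] connection #\d+ failed to: (?P<target>\S+)"
)


def count_failed_targets(lines):
    """Count '[io] connection ... failed to' errors per target host:port."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group("target")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: kubectl logs <pod> | python count_failures.py
    for target, count in count_failed_targets(sys.stdin).most_common():
        print(f"{count:6d}  {target}")
```

Because it uses `search` rather than `match`, the same pattern also picks up lines that carry a syslog prefix in front of the Fluent Bit timestamp.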
Expected behavior
No error
Your Environment
[FILTER]
Name kubernetes
Match kube.*
Kube_Tag_Prefix kube.var.log.containers.
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Merge_Log On
K8S-Logging.Parser On
K8S-Logging.Exclude On
[INPUT]
Name tail
Path /var/log/containers/*.log
Parser docker
Tag kube.*
Refresh_Interval 5
Mem_Buf_Limit 6MB
Skip_Long_Lines On
read_from_head on
DB /tail-db/tail-containers-state.db
DB.Sync Off
DB.locking true
[OUTPUT]
Name es
Match *
Host XXXX
Port 443
Logstash_Format On
Retry_Limit 5
Type _doc
Trace_Error true
Time_Key @timestamp-flb
Replace_Dots On
HTTP_User XXX
HTTP_Passwd XXXX
tls on
tls.verify on
I can confirm I am also seeing this issue on AWS EKS after upgrading to the version released a few hours ago. The URL is correct.
Environment name and version (e.g. Kubernetes? What version?): 1.17
Server type and version: Amazon AMI Linux 2
Operating System and version:
Filters and plugins: input tail, filter kubernetes, output http
The problem does not exist in 1.6.6.
The problem is present in 1.6.7.
[FILTER]
Name record_modifier
Match *
Record cluster_name ${CLUSTER_NAME}
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc.cluster.local:443
Merge_Log On
Merge_Log_Trim On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude Off
Labels On
Annotations On
Buffer_Size 1m
[FILTER]
Name lua
Match kube.*
script /fluent-bit/etc/dedot.lua
call dedot
[FILTER]
Name modify
Match kube.*
Condition Key_exists kubernetes.labels.app
Rename kubernetes.labels.app kubernetes.labels.app_name
Can confirm on EKS 1.18. Release 1.6.7 is bugged.
Taking a look at this.
troubleshooting:
root cause of the problem:
errno properly for connections in progress
Fixes:
v1.6.8 is in the release process.
Container images for v1.6.8 are already available, tags:
fluent/fluent-bit:1.6.8
fluent/fluent-bit:latest
fluent/fluent-bit:1.6
fluent/fluent-bit:1.6.8-debug
fluent/fluent-bit:1.6-debug
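For context on the root-cause note above about errno and connections in progress: a non-blocking connect() normally reports EINPROGRESS first, and the real outcome is only known once the socket becomes writable and SO_ERROR is read. Below is a minimal Python sketch of that general pattern. It is not Fluent Bit's actual C code; the function name `connect_nonblocking` and the 5-second default timeout are assumptions for illustration, and it assumes a POSIX-like system.

```python
import errno
import select
import socket


def connect_nonblocking(host, port, timeout=5.0):
    """Try a non-blocking TCP connect; return 0 on success, else an errno value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # connect_ex returns the errno value instead of raising an exception.
        rc = sock.connect_ex((host, port))
        if rc == 0:
            return 0  # connected immediately
        if rc not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            return rc  # a real failure, e.g. ECONNREFUSED

        # EINPROGRESS is not an error: the handshake is still running.
        # Wait until the socket is writable, then read the final status.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return errno.ETIMEDOUT
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()
```

Treating EINPROGRESS itself as a failure, or reading a stale errno after the wait, would make healthy connections look broken, which would be consistent with the flood of "failed to" messages reported above. Note that the hostname lookup inside `connect_ex` still blocks; only the TCP handshake is non-blocking here.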
Seeing a larger number of HTTP health check failures to / after updating to 1.6.8. Could there be a socket leak from this?
Yes, I am seeing it shut down intermittently, with the following strace:
[pid 20022] close(207) = 0
[pid 20022] epoll_ctl(8, EPOLL_CTL_DEL, 203, NULL) = 0
[pid 20022] write(203, "\25\3\3\0\32\0\0\0\0\0\0/\346\214P\205{\263\263\35\245\210s\236\334\5\333K[\2\7", 31) = 31
[pid 20022] close(203) = 0
[pid 20022] close(165) = 0
[pid 20022] close(166) = 0
[pid 20022] write(0, "\335\335\335\335\0\0\0\0", 8) = -1 EBADF (Bad file descriptor)
[pid 20022] write(2, "write: Bad file descriptor\n", 27) = 27
[pid 20022] close(181) = 0
[pid 20022] close(182) = 0
[pid 20022] write(11, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
[pid 20023] <... epoll_wait resumed> [{EPOLLIN, {u32=3070406656, u64=140006119354368}}], 16, -1) = 1
[pid 20022] <... write resumed> ) = 8
[pid 20023] read(10, <unfinished ...>
[pid 20022] futex(0x7f55b6fff9d0, FUTEX_WAIT, 9, NULL <unfinished ...>
[pid 20023] <... read resumed> "\1\0\0\0\0\0\0\0", 8) = 8
[pid 20023] madvise(0x7f55b67ff000, 8335360, MADV_DONTNEED) = 0
[pid 20023] exit(0) = ?
[pid 20023] +++ exited with 0 +++
[pid 20022] <... futex resumed> ) = 0
[pid 20022] close(9) = 0
[pid 20022] close(10) = 0
[pid 20022] close(11) = 0
[pid 20022] close(18) = 0
[pid 20022] close(3) = 0
[pid 20022] close(4) = 0
[pid 20022] close(18) = -1 EBADF (Bad file descriptor)
[pid 20022] close(19) = 0
[pid 20022] close(6) = 0
[pid 20022] close(7) = 0
[pid 20022] close(169) = 0
[pid 20022] close(170) = 0
[pid 20022] close(171) = 0
[pid 20022] epoll_ctl(8, EPOLL_CTL_DEL, 167, NULL) = 0
[pid 20022] close(167) = 0
[pid 20022] close(184) = 0
[pid 20022] close(8) = 0
[pid 20022] madvise(0x7f55b8232000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b82b6000, 131072, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594b56000, 2101248, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55893d9000, 5251072, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55886d4000, 4988928, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55880c4000, 5668864, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b6511000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55944f8000, 32768, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b650d000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b648d000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b650f000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55945eb000, 331776, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5ae2000, 36864, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5911000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b58ec000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5a1f000, 20480, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5947000, 16384, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5ab3000, 86016, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b58f4000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b6495000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5c1d000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5a14000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b58b2000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594d8e000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5930000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594e94000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594250000, 49152, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594aa1000, 49152, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55942a2000, 122880, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594dd7000, 73728, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55947f7000, 73728, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5aca000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5971000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b58e3000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594e83000, 49152, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594768000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5b05000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55943c2000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594a90000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594e2a000, 98304, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594ac0000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5a8c000, 110592, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55943cf000, 49152, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5982000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5908000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594b35000, 73728, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594b25000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594689000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f559466a000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594a40000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594582000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b59cb000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5a66000, 106496, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5a85000, 20480, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55944bd000, 20480, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b59bd000, 20480, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5927000, 28672, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f559464c000, 98304, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f559419b000, 200704, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5940000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5920000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b597a000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b596f000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5900000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b58db000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5afd000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5a02000, 61440, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b6487000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b6481000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5b0b000, 4096, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5942000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5c0c000, 20480, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b59e0000, 106496, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55944e0000, 94208, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f559491f000, 147456, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594865000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f559473a000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55944af000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5954000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f559452e000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55949e0000, 98304, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594408000, 540672, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5a25000, 32768, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55944d4000, 32768, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594558000, 151552, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5a3d000, 32768, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b59d0000, 32768, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55944c5000, 53248, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5ad3000, 57344, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f559453c000, 94208, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594507000, 90112, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b595d000, 36864, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5a49000, 61440, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b598e000, 36864, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f559458b000, 212992, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b649f000, 446464, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55945cb000, 81920, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b63e6000, 12288, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b63ea000, 593920, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5b1d000, 593920, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b58a9000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b585a000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b597e000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b586d000, 12288, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b587d000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5aee000, 36864, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b59a5000, 49152, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5915000, 40960, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b7047000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b587a000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5876000, 8192, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b7052000, 32768, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5871000, 12288, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b702e000, 40960, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f5594497000, 86016, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b585d000, 61440, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b6516000, 16384, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b582d000, 24576, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b61c0000, 753664, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b627e000, 1470464, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5b11000, 40960, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5834000, 143360, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5894000, 69632, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b58b9000, 110592, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5882000, 61440, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b7001000, 36864, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b6521000, 40960, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5baf000, 335872, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b700d000, 122880, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b55ff000, 2281472, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b705f000, 1708032, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b5c25000, 5873664, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b652f000, 2949120, MADV_DONTNEED) = 0
[pid 20022] madvise(0x7f55b75ff000, 8335360, MADV_DONTNEED) = 0
[pid 20022] exit(0) = ?
[pid 20022] +++ exited with 0 +++
<... futex resumed> ) = 0
epoll_ctl(5, EPOLL_CTL_DEL, 6, NULL) = -1 EBADF (Bad file descriptor)
close(5) = 0
exit_group(0) = ?
+++ exited with 0 +++
@edsiper ^ this is causing some large stability issues
is this hitting this code block?
https://github.com/fluent/fluent-bit/blob/399953ae372f119b8a1eff1c0a96ad3a317ce2ee/lib/monkey/mk_core/deps/libevent/epoll.c#L336
going to make a separate ticket for this
https://github.com/fluent/fluent-bit/issues/2830 created for the 1.6.8 issue described above
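When digging through a long trace like the one above, it can help to pull out only the calls that failed with EBADF, since those point at file descriptors being used after they were closed. The following is a rough sketch for doing that; the function name `find_bad_fd_calls` is hypothetical, and the regular expression assumes the default strace line format seen in the trace above (with or without a `[pid N]` prefix).

```python
import re

# Matches completed calls such as:
#   [pid 20022] close(18) = -1 EBADF (Bad file descriptor)
#   epoll_ctl(5, EPOLL_CTL_DEL, 6, NULL) = -1 EBADF (Bad file descriptor)
# Lines with "<unfinished ...>" or "<... resumed>" are skipped on purpose.
CALL_RE = re.compile(
    r"^(?:\[pid\s+(?P<pid>\d+)\]\s+)?"
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)"
    r"(?:\s+(?P<err>E[A-Z]+))?"
)


def find_bad_fd_calls(lines):
    """Return (pid, syscall, args) for every call that failed with EBADF."""
    hits = []
    for line in lines:
        match = CALL_RE.match(line.strip())
        if match and match.group("err") == "EBADF":
            hits.append(
                (match.group("pid"), match.group("name"), match.group("args"))
            )
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_ebadf.py < strace.log
    for pid, name, args in find_bad_fd_calls(sys.stdin):
        print(f"pid={pid or '-'} {name}({args})")
```

On the trace above, this would flag the `write(0, ...)` call, the second `close(18)`, and the final `epoll_ctl(5, EPOLL_CTL_DEL, 6, NULL)`.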
We were running into comparable issues with the aws-cloudwatch plugin, which gave us the following error:
Dec 10 08:11:11 ipc1 td-agent-bit[483]: [2020/12/10 08:11:11] [error] [io] connection #69 failed to: logs.eu-central-1.amazonaws.com:443
Hope it helps.
Thank you!