Describe the bug
Updating the image tag to 1.3.0 using the latest stable Helm chart now throws a TLS error on startup.
To Reproduce
1. Install fluent-bit 1.2.2 on k8s using the latest stable helm chart.
2. Verify everything works as expected using the kubernetes filter.
3. Update the image tag to 1.3.0 (also taking into account the fix listed in https://github.com/fluent/fluent-bit/issues/1608).
4. Observe that the fluent-bit pods now throw a TLS error:
```
[2019/10/03 15:04:31] [error] [io_tls] flb_io_tls.c:165 X509 - Read/write of file failed
[2019/10/03 15:04:31] [error] [TLS] error reading certificates from /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
```
Expected behavior
Updating to 1.3.0 with the same configuration used in 1.2.2 should still work.
Your Environment
Verified that ca.crt and the token both exist at /var/run/secrets/kubernetes.io/serviceaccount and are readable. However, since the helm chart creates these as a configmap based on a secret, they are symlinks:
```
drwxr-xr-x 2 root root 100 Oct 3 16:00 ..2019_10_03_16_00_13.071035063
lrwxrwxrwx 1 root root 31 Oct 3 16:00 ..data -> ..2019_10_03_16_00_13.071035063
lrwxrwxrwx 1 root root 13 Oct 3 16:00 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root root 16 Oct 3 16:00 namespace -> ..data/namespace
lrwxrwxrwx 1 root root 12 Oct 3 16:00 token -> ..data/token
```
Replacing just the 1.3.0 flb_io_tls.c with the one from the 1.2 branch and rebuilding the Docker image fixes the issue. A diff of 1.2.2 vs. 1.3.0:
```
121a122
> char *vhost,
135a137
> ctx->vhost = vhost;
308a311
> mbedtls_ssl_close_notify(&session->ssl);
333c336,339
< mbedtls_ssl_set_hostname(&session->ssl,u->tcp_host);
---
> if (!u->tls->context->vhost) {
>     u->tls->context->vhost = u->tcp_host;
> }
> mbedtls_ssl_set_hostname(&session->ssl, u->tls->context->vhost);
```
It appears that https://github.com/fluent/fluent-bit/pull/1313 somehow broke the TLS connection for the kubernetes filter.
Thanks for pointing out the issue. I am working on the fix now.
I found the root cause of the issue, and surprisingly it is not #1313; it is actually a bad prototype in the TLS context creation that can lead to undefined behavior at runtime.
Would you please validate whether this image works properly in your environment?
edsiper/flb-tls-fix:3
Seems OK for now:
```
kubectl logs -f fluent-bit-cwd44 -n logging [f7bf60c]
Fluent Bit v1.3.1
Copyright (C) Treasure Data
[2019/10/04 09:57:51] [ info] [storage] initializing...
[2019/10/04 09:57:51] [ info] [storage] in-memory
[2019/10/04 09:57:51] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2019/10/04 09:57:51] [ info] [engine] started (pid=1)
[2019/10/04 09:57:51] [ info] [in_systemd] seek_cursor=s=0fdc9ccbd5794c1b8297787919010f03;i=89a... OK
[2019/10/04 09:57:51] [ info] [filter_kube] https=1 host=kubernetes.default.svc port=443
[2019/10/04 09:57:51] [ info] [filter_kube] local POD info OK
[2019/10/04 09:57:51] [ info] [filter_kube] testing connectivity with API server...
[2019/10/04 09:57:51] [ info] [filter_kube] API server connectivity OK
[2019/10/04 09:57:51] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2019/10/04 09:57:51] [ info] [sp] stream processor started
```
Logs look OK with k8s metadata.
Thanks for the update.
@gamer22026 can you re-confirm on your end?
This fix is working for me as well.
Thanks! I am working on the new release.
All good. Thanks for the quick turnaround.
New tags are already available:
Official release notes will be out during the day.
The official release is out:
https://fluentbit.io/announcements/v1.3.1/
Thanks, everyone, for your help!