Hi,
We updated our Kubernetes cluster to 1.7.2 recently (running via Rancher 1.6.6), and needed to do some adaptions for fluent-bit to discover the logs, and during that we also updated fluent-bit to version 0.12.0. This caused the following error in the fluent-bit logs:
[2017/09/01 07:46:34] [error] [filter_kube] upstream connection error
[2017/09/01 07:46:34] [error] [io_tls] flb_io_tls.c:287 X509 - Certificate verification failed, e.g. CRL, CA or signature check
This seems to be related to your updating mbedtls from 2.4.2 to 2.5.1 for release 0.12.0, but I can unfortunately not tell whether this is something which should render an error, or if it's something that is not a problem usually. I will also not rule out that it's Rancher related, and/or that the Rancher certificates are not correctly created when the cluster was initially set up. On different cluster (one provisioned from scratch using acs-engine on Azure), the issue did not occur.
The issue occurs when fluent-bit tries to access the Kubernetes API. Reverting fluent-bit to 0.11.17 resolves the issue, it starts to work again, and logs are yet again forwarded.
Any ideas? Is an option to ignore certificate validation errors an option?
Best regards,
Martin
It's related to filter_kubernetes where it requires valid certificates when connecting to the API server.
We just got a PR with an enhancement where it makes this validation optional, it's coming as part of 0.12.1 release.
FYI: 0.12.1 is already available:
http://fluentbit.io/announcements/v0.12.1/
to get rid of the TLS cert problem you can specify _tls.verify off_ in your Kubernetes filter.
Awesome. Will check Monday!
@DonMartin76 did 0.12.1 work for you?
Haven't gotten around to testing it yet :-( Will come back with info soon.
@edsiper Thanks for the fix, I tested it and it works, the error is no longer reproducible with 0.12.1 and tls.verify Off
2/3 confirmations so closing this issue as fixed. If you face any problem again please comment it out so we can reopen it.
[error] [io_tls] flb_io_tls.c:305 X509 - Certificate verification failed, e.g. CRL, CA or signature check
my versrion is 0.14.0
@edsiper I ran into the same issue with v1.0.5 after a bit of digging I figured out the default kubernetes config linked in the docs don't make sense.
Having the Kube_URL
https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS} will always fail tls verification (the ca cert isn't valid for the host-ip, only for the host-name).
Removing it or pointing it to the default Kube_URL https://kubernetes.default.svc.cluster.local:443 will fix it.