Describe the bug
The configuration works in 0.14.9, but after upgrading to 1.0.2 the kubernetes filter complains with the following error:
[2019/01/21 12:32:37] [ info] [filter_kube] https=1 host=10.233.0.1 port=443
[2019/01/21 12:32:37] [ info] [filter_kube] local POD info OK
[2019/01/21 12:32:37] [ info] [filter_kube] testing connectivity with API server...
[2019/01/21 12:32:37] [debug] [filter_kube] API Server (ns=kube-system, pod=fluent-bit-2bsvt) http_do=0, HTTP Status: 200
[2019/01/21 12:32:37] [ info] [filter_kube] API server connectivity OK
[2019/01/21 12:32:38] [debug] [filter_kube] API Server (ns=online-tst, pod=services.xx-prerender-cache-redis-ha-server-2) http_do=0, HTTP Status: 404
[2019/01/21 12:32:38] [debug] [filter_kube] API Server response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"services.xx-prerender-cache-redis-ha-server-2\" not found","reason":"NotFound","details":{"name":"services.xx-prerender-cache-redis-ha-server-2","kind":"pods"},"code":404}
Suddenly it cannot find the pods anymore.
Configuration:
Kubernetes 1.8.4
Docker version 17.05.0-ce, build 89658be
OS: Redhat Linux 4.17.11 x86_64 x86_64 x86_64 GNU/Linux
[PARSER]
Decode_Field_As escaped log
Format json
Name json
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Key time
[PARSER]
Decode_Field_As escaped log
Format json
Name docker
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep true
Time_Key time
[PARSER]
Format regex
Name mongo
Regex ^(?<time>[^ ]*)\s+(?<severity>\w)\s+(?<context>[^ ]+)\s+\[(?<connection>[^\]]+)]\s+(?<message>.*)$
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep true
Time_Key time
[SERVICE]
Flush 1
Daemon Off
Log_Level debug
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_PORT 2020
Parsers_File parsers_custom.conf
[INPUT]
Buffer_Chunk_Size 1MB
Buffer_Max_Size 25MB
DB /var/log/containers/fluent-bit.db
Exclude_Path *kube-system*.log
Mem_Buf_Limit 25MB
Name tail
Parser docker
Path /var/log/containers/*.log
Refresh_Interval 5
Skip_Long_Lines On
Tag kube.services.*
[INPUT]
Buffer_Chunk_Size 1MB
Buffer_Max_Size 25MB
DB /var/log/containers/fluent-bit-nginx.db
Mem_Buf_Limit 25MB
Name tail
Parser docker
Path /var/log/containers/*nginx*.log
Refresh_Interval 5
Skip_Long_Lines On
Tag kube.ingress.*
[FILTER]
K8S-Logging.Parser On
K8S-Logging.exclude True
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_URL https://${KUBERNETES_SERVICE_HOST}:443
Match kube.*
Merge_Log On
Merge_Log_Key k8s
Name kubernetes
tls.verify On
[OUTPUT]
Host xxx.xxx.xxx.xxx
Include_Tag_Key true
Logstash_DateFormat %G.%V
Logstash_Format On
Logstash_Prefix k8s-services-tst
Match kube.services.*
Name es
Port 9200
Retry_Limit False
[OUTPUT]
Host xxx.xxx.xxx.xxx
Include_Tag_Key true
Logstash_DateFormat %G.%V
Logstash_Format On
Logstash_Prefix k8s-ingress-tst
Match kube.ingress.*
Name es
Port 9200
Retry_Limit False
Am I missing some change in the kubernetes filter? I can't find anything. It has already cost me a day of work.
Already tried:
Changing Kube_URL to IP address
Changing Kube_URL to DNS name
Removed port
Changed port to 6443
Changing back to 0.14.9 makes it all work again... :-S
Are the namespace and pod in your error log correct?
I had the same issue but found out that this happens because of #814.
The tag is where the plugin gets the namespace and pod info from, and the change in the regex is the cause of this.
https://github.com/fluent/fluent-bit/blob/v1.0.2/plugins/filter_kubernetes/kube_regex.h#L25
I haven't dug deep enough to see whether this is intentional or a bug, but you can work around it by adding a custom 'Regex_Parser' to parse your tag correctly.
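For reference, a Regex_Parser workaround might look like the following sketch (untested; the parser name kube-custom-tag is made up, and the regex assumes expanded tags of the form kube.services.&lt;pod&gt;_&lt;namespace&gt;_&lt;container&gt;-&lt;64-hex id&gt;.log, as seen in the 404 above — adjust the prefix to whatever your tags actually look like). The parser must capture pod_name, namespace_name, container_name and docker_id:

```
[PARSER]
    # Hypothetical parser name; the regex must extract pod_name,
    # namespace_name, container_name and docker_id from the tag.
    Name    kube-custom-tag
    Format  regex
    Regex   ^kube\.services\.(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

[FILTER]
    Name          kubernetes
    Match         kube.services.*
    Regex_Parser  kube-custom-tag
```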
I am also facing the same issue.
The info logs show the namespace getting prepended to the pod name with a '.' instead of a '/':
[2019/01/25 06:47:40] [debug] [in_tail] file=/var/log/containers/prod-interactions-consumer-6bf8f96c84-nr789_production_fluent-bit-f28d0640ffa2945125b3d8d6ed7c8b1fe59950f462b3f4781a2079eeb1e35a2c.log event
[2019/01/25 06:47:40] [debug] [filter_kube] API Server (ns=production, pod=production.prod-interactions-consumer-6bf8f96c84-nr789) http_do=0, HTTP Status: 404
[2019/01/25 06:47:40] [debug] [filter_kube] API Server response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"production.prod-interactions-consumer-6bf8f96c84-nr789\" not found","reason":"NotFound","details":{"name":"production.prod-interactions-consumer-6bf8f96c84-nr789","kind":"pods"},"code":404}
Fluent Bit version: 1.0.2
@puremad Well... actually... no... I did not even notice. Thank you... that must be the problem. That explains why it can't find anything: there is no "services" namespace.
@edsiper Is this a bug or a new feature? If it's a new feature, I have to create my own regex, but that takes some time to test, etc.
I tried to investigate the new regex, but failed to find a good regex tester; Regex101.com is probably not the right one.
Old Regex:
var\\.log\\.containers\\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\\.log$
New Regex:
(?<tag>[^.]+)?\\.?(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\\.log$
(I removed the escaping backslashes during testing.)
Test String:
/var/log/containers/idp-54f445cbfd-9zgsf_online-tst_idp-73dbd7f11c1f0b367df1fd179963ed45c8b6e773a485f35e9fe5f409138ad317.log
Fluent-bit logs
[in_tail] file=/var/log/containers/idp-54f445cbfd-9zgsf_online-tst_idp-73dbd7f11c1f0b367df1fd179963ed45c8b6e773a485f35e9fe5f409138ad317.log read=747 lines=1
[filter_kube] API Server (ns=online-tst, pod=services.idp-54f445cbfd-9zgsf) http_do=0, HTTP Status: 404
[filter_kube] API Server response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"services.idp-54f445cbfd-9zgsf\" not found","reason":"NotFound","details":{"name":"services.idp-54f445cbfd-9zgsf","kind":"pods"},"code":404}
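The mis-capture can be reproduced outside Fluent Bit. Below is a minimal Python sketch (not Fluent Bit code) that translates the quoted v1.0.2 regex into Python's `(?P<name>...)` syntax and applies it to a tag of the shape implied by the logs above, using a placeholder 64-character container id:

```python
import re

# Python approximation of the v1.0.2 tag regex from kube_regex.h
# (named groups converted from (?<name>...) to (?P<name>...)).
KUBE_TAG_RE = re.compile(
    r"(?P<tag>[^.]+)?\.?"
    r"(?P<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?"
    r"(?:\.[a-z0-9](?:[-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+)"
    r"_(?P<container_name>.+)"
    r"-(?P<docker_id>[a-z0-9]{64})\.log$"
)

docker_id = "a" * 64  # placeholder 64-hex container id
# Tag of the shape produced with `Tag kube.services.*`:
tag = f"kube.services.idp-54f445cbfd-9zgsf_online-tst_idp-{docker_id}.log"

m = KUBE_TAG_RE.match(tag)
# (?P<tag>[^.]+) consumes only the first dot-separated component ("kube"),
# so the extra "services." prefix leaks into pod_name:
print(m.group("pod_name"))        # services.idp-54f445cbfd-9zgsf
print(m.group("namespace_name"))  # online-tst
```

This matches the 404 in the logs: the API server is asked for a pod literally named "services.idp-54f445cbfd-9zgsf".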
Sure, I can replace the new regex with the old one and build a new version, but that takes a lot of time, and I would need to roll it out to many clusters.
@edsiper Can you please shine a light on this?
Also to note: if you use Tag_Regex in your INPUT and then have to use kube.<namespace>.<pod> or similar in the Match of the kubernetes FILTER, it won't work; it skips over the logs. If you switch the INPUT back to remove the Tag_Regex and simply use Tag kube.*, the kubernetes filter works.
I think the kubernetes filter expects the log tag to match that regex, and it won't match anything else, even if specified with <pattern>.
@chiefy Sorry for responding so late. I'm not sure what you meant. I think my configuration looks fine, but you said something about Tag_Regex in my INPUT. My kubernetes filter is not skipping the records; it's just not using the right regex.
Probably I'm one of the few having this issue.
Also to note, if you use Tag_Regex in your INPUT and then have to use kube.<namespace>.<pod> or such in your Match of the kubernetes FILTER it won't work, skips over the logs. If you switch the INPUT back to remove the Tag_Regex and simply use Tag kube.* the kubernetes filter works. I think the kubernetes filter is looking for the log tag to match that regex and it won't match anything else, even if specified w/ using <pattern>
Yes, I think I hit this issue. How should we format the Match pattern (on the FILTER) for this to work? With anything other than kube.*, the filter plugin seems to skip the logs.
@mailtoraja18 you're correct, there seems to be a bug with the K8s filter where it only accepts a certain kube.* pattern.
I also have a problem with this:
[2019/03/01 10:56:33] [debug] [in_tail] file=/var/log/containers/coredns-7966c859dd-t4nm8_kube-system_coredns-cd673551bf0e11ccf0d184142c4aa9544f0d5f28b78c4889bed299baabb69be0.log read=2398 lines=10
[2019/03/01 10:56:33] [debug] [in_tail] file=/var/log/containers/coredns-7966c859dd-77dss_kube-system_coredns-7ffe22f9bb5558856a878c5bee658b07811d8813a972927f23e7a752b5489059.log event
[2019/03/01 10:56:33] [debug] [filter_kube] API Server (ns=kube-system, pod=kubernetes.coredns-7966c859dd-77dss) http_do=0, HTTP Status: 404
[2019/03/01 10:56:33] [debug] [filter_kube] API Server response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"kubernetes.coredns-7966c859dd-77dss\" not found","reason":"NotFound","details":{"name":"kubernetes.coredns-7966c859dd-77dss","kind":"pods"},"code":404}
In my case an extra 'kubernetes.' is, for some reason, prepended to the pod name, even though it should not be there:
# (Note: log updated. Initially I accidentally took the logs from a different system, so the pod names did not match the ones above. Now coredns-7966c859dd-77dss appears in both logs.)
$ kubectl get all --all-namespaces | grep coredns
kube-system deploy/coredns 2 2 2 2 2d
kube-system rs/coredns-7966c859dd 2 2 2 2d
kube-system po/coredns-7966c859dd-77dss 1/1 Running 0 2d
kube-system po/coredns-7966c859dd-t4nm8 1/1 Running 0 2d
kube-system svc/coredns ClusterIP 10.3.0.10 <none> 53/UDP,53/TCP 2d
@donbowman @edsiper you touched the above-mentioned line
https://github.com/fluent/fluent-bit/blob/v1.0.2/plugins/filter_kubernetes/kube_regex.h#L25 — could you please comment on this issue?
As I read at https://github.com/fluent/fluent-bit-docs/blob/master/installation/kubernetes.md : "... a built-in filter plugin called kubernetes talks to the Kubernetes API Server to retrieve relevant information such as the pod_id, labels and annotations, other fields such as pod_name, container_id and container_name are retrieved locally from the log file names. All of this is handled automatically, no intervention is required from a configuration aspect."
So the pod name is supposed to come from the log file name? Hmm, but how does this 'kubernetes.' get prepended to the pod name?
The relevant parts from our configmap (was done by a colleague):
[INPUT]
Name tail
Tag edge.kubernetes.*
Exclude_Path /var/log/containers/fluent*.log
Path /var/log/containers/*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 10MB
Skip_Long_Lines On
Refresh_Interval 10
Buffer_Chunk_Size 1MB
Buffer_Max_Size 1MB
[FILTER]
Name kubernetes
Match edge.kubernetes.*
Kube_URL https://kubernetes.default.svc.cluster.local:443
Merge_JSON_Log On
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
Decode_Field_As escaped log
Due to a lack of resources, I'm not able to spend time fixing this, so for now I'll stick with 0.14.9.
Update: my problem was that a colleague had earlier changed the kube.* in the input plugin's Tag (and the filter's Match) to edge.kubernetes.* for some reason (maybe he just wanted a more descriptive/custom name). This worked prior to 1.0.0, but not with 1.0.x. I debugged it by temporarily adding some extra debug logging to flb_regex_do() (in src/flb_regex.c). After changing this setting back to kube.*, it seems to work fine with fluent-bit 1.0.4.
The main problem I see (based on the discussion above, I guess the issue is something like this) is that the regex does not seem to be configurable, yet the docs (https://github.com/fluent/fluent-bit-docs/blob/master/installation/kubernetes.md, which tells users to use the following ConfigMap: https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-configmap.yaml) do not say the tag must not be changed, so users may change it.
This gives users a false sense that the Tag is configurable.
@edsiper you seemed to have edited this configmap, what do you think?
@edsiper Could you please give some input on this? I confirm what @attila123 mentioned above. Changed the tag on the input plugin from kube.services* to kube.* and it works again. But this will have some consequences. I have multiple input plugins and multiple output plugins that are controlled by different tags.
FYI: I will take a look today.
thanks everyone for your feedback on this issue.
I've pushed some improvements to Git master that will be reflected in the 1.1 release and that aim to address the main issues. Please refer to the relevant commits and their explanations:
Note that from 1.1 on, having different tail sections with expanded tag prefixes that need the kubernetes filter will require separate kubernetes filter sections with defined prefixes (to be documented shortly).
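For example, a per-prefix setup might look like the following sketch (based on the 1.1 upgrade notes; it assumes the documented Kube_Tag_Prefix option and that the tail input expands * to the dotted file path, so adjust the prefix to your actual tags):

```
[INPUT]
    Name  tail
    Path  /var/log/containers/*.log
    Tag   kube.services.*

[FILTER]
    Name             kubernetes
    Match            kube.services.*
    # Custom tag prefix plus the dotted path portion of the expanded tag:
    Kube_Tag_Prefix  kube.services.var.log.containers.
```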
FYI:
I've updated our dev docs for 1.1 (not published yet) but you can see the new explanation of the Workflow here:
https://github.com/fluent/fluent-bit-docs/blob/master/filter/kubernetes.md#workflow-of-tail--kubernetes-filter
@edsiper Thank you very much! That explains a lot. The changes you made are great and give us back the flexibility. Any idea when version 1.1 will be released?
Just FYI, I can't confirm this, but I believe we are experiencing similar issues after going from 1.0.4 to 1.1.2. Was this a breaking change, @edsiper?
Also, this is weird, but the pod name is getting truncated in the logs (this happens for other pods as well):
[2019/06/04 12:00:14] [debug] [in_tail] file=/var/log/containers/fluent-bit-6mh28_halo-system_fluent-bit-d6fe25e85927ddd3365ba71eea5c16bcbfd885d48a00fb1dfe993622bae80706.log read=1214 lines=6
[2019/06/04 12:00:14] [debug] [in_tail] file=/var/log/containers/fluent-bit-6mh28_halo-system_fluent-bit-d6fe25e85927ddd3365ba71eea5c16bcbfd885d48a00fb1dfe993622bae80706.log event
[2019/06/04 12:00:14] [debug] [filter_kube] API Server (ns=halo-system, pod=luent-bit-6mh28) http_do=0, HTTP Status: 404
[2019/06/04 12:00:14] [debug] [filter_kube] API Server response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"luent-bit-6mh28\" not found","reason":"NotFound","details":{"name":"luent-bit-6mh28","kind":"pods"},"code":404}
Upgrade notes:
https://docs.fluentbit.io/manual/installation/upgrade_notes
If that doesn't cover the issue you are facing, let me know.
@edsiper I figured it out - we were using k8s.* as a tag, so the new default didn't work.
closing as fixed.