Beats: Infinite loop while watching for Kubernetes events

Created on 7 Mar 2018 · 29 comments · Source: elastic/beats

Several users have reported this issue, which causes an infinite loop, both when using add_kubernetes_metadata and with the kubernetes autodiscover provider:

2018-03-05T08:16:36.021Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error EOF
2018-03-05T08:16:36.021Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.030Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-05T08:16:36.030Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.031Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-05T08:16:36.031Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.033Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

More details can be found here: https://github.com/elastic/beats/pull/6353
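
For illustration, here is a minimal Go sketch of the failure mode (this is not the actual libbeat watcher code, and brokenWatch is a hypothetical stand-in): if the watch is restarted immediately after every error, with no resource-version resync and no backoff, a persistent EOF or protobuf decode error produces exactly the tight ERROR/INFO loop shown above.

// Minimal sketch of the reported failure mode; brokenWatch stands in for
// one streaming watch call that keeps failing (stream cut, undecodable
// frame, etc.).
package main

import (
	"errors"
	"log"
)

func brokenWatch() error {
	return errors.New("proto: wrong wireType = 6 for field ServiceAccountName")
}

func main() {
	for {
		log.Println("kubernetes: Watching API for resource events")
		if err := brokenWatch(); err != nil {
			log.Printf("kubernetes: Watching API error %v", err)
			continue // retried immediately, so the loop never makes progress
		}
	}
}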

Labels: bug, libbeat

Most helpful comment

This error is reproducible with filebeat 6.8.9:

{"level":"error","timestamp":"2020-06-19T11:37:51.250Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}
{"level":"error","timestamp":"2020-06-19T11:37:51.251Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName"}
{"level":"error","timestamp":"2020-06-19T12:24:33.883Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}

We're missing some container logs when this error occurs.

All 29 comments

I've pushed a branch that ignores unmarshal errors: https://github.com/exekias/beats/tree/ignore-k8s-errors

I've also built an image from it: exekias/filebeat:ignore-k8s-errors. @simioa, it would be awesome if you could do another test; your help here is much appreciated.

@exekias Looks good, no problems till now.

Good to hear, @simioa. I'm wondering, did you see the ERROR happen?

The code doesn't really avoid that message but should recover from it in all cases:
kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

Not that ERROR message, but this one:

2018-03-08T15:33:50.114Z        ERROR   kubernetes/watcher.go:198       kubernetes: Watching API error EOF

I just looked into a Filebeat 6.2.2 log and saw that the infinite loop started right after the kubernetes: Watching API error EOF error appeared.

This is no longer the case; the only thing logged after the EOF error that relates to the Watching API is:

2018-03-08T15:33:50.114Z        INFO    kubernetes/watcher.go:181       kubernetes: Watching API for resource events

Ok, please keep an eye on it and report if it happens :innocent:

In the meantime, it seems we now have a way to reproduce it, so I will be doing some testing myself.

Thank you for your effort! It's really helping with this issue

A fix has been merged and should go out in both 6.3.0 and 6.2.3
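
For context, here is a rough, self-contained Go sketch of the recovery behaviour the fix aims for (this is not the merged code; watchOnce, resourceVersion, and the error matching are hypothetical): on a decode error, drop the broken event, clear the stored resource version so the next watch re-lists from the latest state, and back off before retrying instead of spinning.

// Rough sketch only, with hypothetical names; simulates the errors reported
// in this issue and the intended "skip, resync, back off" handling.
package main

import (
	"errors"
	"log"
	"strings"
	"time"
)

type watcher struct {
	resourceVersion string
}

// watchOnce stands in for one streaming watch call against the API server.
func (w *watcher) watchOnce() error {
	time.Sleep(200 * time.Millisecond)
	if w.resourceVersion == "" {
		w.resourceVersion = "12345" // pretend a re-list succeeded
		return errors.New("EOF")
	}
	return errors.New("proto: wrong wireType = 6 for field ServiceAccountName")
}

func (w *watcher) run() {
	for i := 0; i < 5; i++ { // bounded here only so the demo terminates
		log.Println("kubernetes: Watching API for resource events")
		if err := w.watchOnce(); err != nil {
			log.Printf("kubernetes: Watching API error %v", err)
			if strings.Contains(err.Error(), "proto:") {
				// Undecodable event: skip it and resync rather than replay it.
				log.Println("kubernetes: Ignoring event, moving to most recent resource version")
				w.resourceVersion = ""
			}
			time.Sleep(time.Second) // back off instead of spinning
		}
	}
}

func main() {
	(&watcher{}).run()
}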

What's the ETA for the new 6.2.3 container image with the fix merged?

It should be out soon. In any case, for convenience, I've pushed a snapshot build with the fix: exekias/filebeat:6.2.3-snapshot

An update on this: new images for 6.2.3 are already available at https://www.docker.elastic.co/

@exekias this error also appears in 6.4.0 with Kubernetes 1.6.4. Is there any workaround for this?

2018-08-19T01:53:44.655Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:53:44.655Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:53:53.051Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:53:53.051Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:53:53.293Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:53:53.294Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:53:54.065Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:53:54.065Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:54:05.405Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:54:05.405Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:54:07.414Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:54:07.414Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events

Still appears in 6.3.2:

2018-08-25T14:50:05.364Z        INFO    [beat]  instance/beat.go:725    Build info      {"system_info": {"build": {"commit": "45a9a9e1561b6c540e94211ebe03d18abcacae55", "libbeat": "6.3.2", "time": "2018-07-20T04:18:19.000Z", "version": "6.3.2"}}}
...
2018-08-25T18:40:57.180Z        ERROR   kubernetes/watcher.go:195       kubernetes: Watching API error EOF
2018-08-25T18:40:57.180Z        INFO    kubernetes/watcher.go:179       kubernetes: Watching API for resource events

@hobti01 Do you have memory issues with the filebeat process? As the error appears more and more frequently, memory usage may grow to 5 GB and more. In my cluster, this error appears about 1 hour after filebeat restarts.
I think the root cause may be this line, where w.r returns an empty string, which is abnormal.

https://github.com/elastic/beats/blob/0b32c2d7604145eabb7b5652dc8de7bcccf64baf/vendor/github.com/ericchiang/k8s/watch.go#L100
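
For what it's worth, the "wrong wireType = 6" message is consistent with the decoder being handed truncated or misaligned bytes. Here is a minimal, standard-library-only Go sketch (not the ericchiang/k8s code) of why that happens: protobuf decoders read whatever bytes they are given as a stream of (field number, wire type) tags, and garbage input easily yields wire type 6, which no real field uses.

// Sketch only: shows how bytes taken from the middle of a frame decode to
// an invalid protobuf wire type. The byte values below are arbitrary.
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	misaligned := []byte{0x2e, 0x0a, 0x03, 0x66, 0x6f, 0x6f}

	tag, _ := binary.Uvarint(misaligned) // first varint is read as a field tag
	field := tag >> 3                    // upper bits: field number
	wire := tag & 0x7                    // lower 3 bits: wire type (valid values are 0-5)
	fmt.Printf("field=%d wireType=%d\n", field, wire)
	// Prints field=5 wireType=6; a real decoder in this state reports
	// "proto: wrong wireType = 6 for field ...".
}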

@yu-yang2 we do not have memory issues. We have noticed that log shipping for an application container sometimes stops when we update filebeat and does not restart.

This is the only ERROR in our logs, but it seems to be just a warning about the Kubernetes watch, which is automatically restarted.

Same here with filebeat version 6.3.2:
2018-09-05T04:47:25.540Z ERROR kubernetes/watcher.go:195 kubernetes: Watching API error EOF
2018-09-05T04:47:25.541Z INFO kubernetes/watcher.go:179 kubernetes: Watching API for resource events

Same error in filebeat v6.4.2
Installed from https://github.com/helm/charts/tree/master/stable/filebeat

2018-11-09T15:03:07.979+0600    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF

I also got this error in v6.4.3.
The pod with this error stops collecting Kubernetes metadata. When I restart the pod, metadata collection resumes, but the error appears again after a few hours.

2018-11-13T12:23:44.563Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-11-13T12:23:44.563Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events

I also have the same errors with v6.4.3; my filebeat memory usage grows until it's killed by Kubernetes at about 1 GB of usage per pod.

same error for v6.4.3:
2018-11-13T21:11:43.472+0800 ERROR kubernetes/watcher.go:254 kubernetes: Watching API error EOF
2018-11-13T21:11:43.472+0800 INFO kubernetes/watcher.go:238 kubernetes: Watching API for resource events
2018-11-13T21:11:43.474+0800 ERROR kubernetes/watcher.go:254 kubernetes: Watching API error proto: illegal wireType 6
2018-11-13T21:11:43.474+0800 INFO kubernetes/watcher.go:258 kubernetes: Ignoring event, moving to most recent resource version
2018-11-13T21:11:43.474+0800 INFO kubernetes/watcher.go:238 kubernetes: Watching API for resource events

Can we have this reopened, since the issue seems to still be occurring?

@opsnull Do you also see filebeat consuming large amounts of memory?

I have opened a new ticket here
https://github.com/elastic/beats/issues/9078

6.6.1 also has this error (screenshot attached).

Have this issue in filebeat-oss:6.8.0

2019-06-18T13:04:47.479Z    ERROR   kubernetes/watcher.go:258   kubernetes: Watching API error EOF
2019-06-18T13:04:47.479Z    INFO    kubernetes/watcher.go:242   kubernetes: Watching API for resource events
2019-06-18T13:05:05.263Z    INFO    [monitoring]    log/log.go:144  Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":470,"time":{"ms":2}},"total":{"ticks":2840,"time":{"ms":18},"value":2840},"user":{"ticks":2370,"time":{"ms":16}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":17},"info":{"ephemeral_id":"b4b54edd-c52c-4b5a-a29b-512f3f10c76e","uptime":{"ms":3360067}},"memstats":{"gc_next":52123232,"memory_alloc":34918008,"memory_total":303615072}},"filebeat":{"harvester":{"open_files":8,"running":8}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":900,"active":-573,"batches":1,"total":327},"read":{"bytes":42},"write":{"bytes":33899}},"pipeline":{"clients":2,"events":{"active":4118,"retry":327}}},"registrar":{"states":{"current":35}},"system":{"load":{"1":0.09,"15":0.1,"5":0.12,"norm":{"1":0.045,"15":0.05,"5":0.06}}}}}}

Have the same issue in 6.8.1

I'm also seeing this on 6.8.1

Repeated output of "2019-09-04T01:17:29.852Z ERROR kubernetes/watcher.go:258 kubernetes: Watching API error EOF", coupled with dropped logs.

Some pods on the machine do not have missed logs, while others do. We're seeing this issue in development and production clusters.

I have the same issue in metricbeat 7.3.1, but it only happens in Alibaba Cloud Container Service; it has not happened in AWS.

kubernetes module config

kubernetes.yml: |-
  - module: kubernetes
    metricsets:
      - event
    period: 10s
    host: ${NODE_NAME}
    # If using Red Hat OpenShift remove the previous hosts entry and
    # uncomment these settings:
    #hosts: ["https://${HOSTNAME}:10250"]
    #bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    #ssl.certificate_authorities:
    #  - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    #ssl.verification_mode: "none"
    add_metadata: true
    in_cluster: true
    fields_under_root: true
    fields:
      environment: ${ENVIRONMENT}
      REGION: ${REGION}

log:

2019-09-05T06:29:12.239Z    DEBUG   [cfgfile]   cfgfile/reload.go:197   Scan for new config files
2019-09-05T06:29:12.239Z    DEBUG   [cfgfile]   cfgfile/cfgfile.go:175  Load config from file: /usr/share/metricbeat/modules.d/kubernetes.yml
2019-09-05T06:29:12.239Z    DEBUG   [cfgfile]   cfgfile/reload.go:216   Number of module configs found: 1
2019-09-05T06:29:12.239Z    DEBUG   [reload]    cfgfile/list.go:62  Starting reload procedure, current runners: 1
2019-09-05T06:29:12.239Z    DEBUG   [reload]    cfgfile/list.go:80  Start list: 0, Stop list: 0
2019-09-05T06:29:20.456Z    DEBUG   [kubernetes]    kubernetes/watcher.go:261   EOF while watching API
2019-09-05T06:29:20.456Z    INFO    kubernetes/watcher.go:242   kubernetes: Watching API for resource events
2019-09-05T06:29:20.479Z    DEBUG   [kubernetes]    kubernetes/watcher.go:261   EOF while watching API
2019-09-05T06:29:20.479Z    INFO    kubernetes/watcher.go:242   kubernetes: Watching API for resource events
2019-09-05T06:29:20.579Z    DEBUG   [kubernetes]    kubernetes/watcher.go:261   EOF while watching API

We are also seeing this error again in both our AWS and on-prem environments on filebeat 6.8.1:

2019-09-26T08:58:13.420Z        ERROR   kubernetes/watcher.go:258       kubernetes: Watching API error EOF
2019-09-26T09:51:33.050Z        ERROR   kubernetes/watcher.go:258       kubernetes: Watching API error EOF
2019-09-26T09:51:33.054Z        ERROR   kubernetes/watcher.go:258       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

Many sets of logs are not being shipped either.

This error is reproducible with filebeat 6.8.9:

{"level":"error","timestamp":"2020-06-19T11:37:51.250Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}
{"level":"error","timestamp":"2020-06-19T11:37:51.251Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName"}
{"level":"error","timestamp":"2020-06-19T12:24:33.883Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}

We're missing some container logs when this error occurs.

@exekias This is still seen on later versions.
