Beats: Infinite loop while watching for Kubernetes events

Created on 7 Mar 2018 · 29 comments · Source: elastic/beats

Several users have reported this issue, which causes an infinite loop, both when using add_kubernetes_metadata and with the kubernetes autodiscover provider:

2018-03-05T08:16:36.021Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error EOF
2018-03-05T08:16:36.021Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.030Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-05T08:16:36.030Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.031Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName
2018-03-05T08:16:36.031Z        INFO    kubernetes/watcher.go:173       kubernetes: Watching API for resource events
2018-03-05T08:16:36.033Z        ERROR   kubernetes/watcher.go:188       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

More details can be found here: https://github.com/elastic/beats/pull/6353
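
For illustration, here is a minimal Go sketch of the failure mode (this is not the actual libbeat watcher code, and brokenWatch is a hypothetical stand-in): if the watch is restarted immediately after every error, with no resource-version resync and no backoff, a persistent EOF or protobuf decode error produces exactly the tight ERROR/INFO loop shown above.

// Minimal sketch of the reported failure mode; brokenWatch stands in for
// one streaming watch call that keeps failing (stream cut, undecodable
// frame, etc.).
package main

import (
	"errors"
	"log"
)

func brokenWatch() error {
	return errors.New("proto: wrong wireType = 6 for field ServiceAccountName")
}

func main() {
	for {
		log.Println("kubernetes: Watching API for resource events")
		if err := brokenWatch(); err != nil {
			log.Printf("kubernetes: Watching API error %v", err)
			continue // retried immediately, so the loop never makes progress
		}
	}
}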

Labels: bug, libbeat

Most helpful comment

This error is reproducible with filebeat 6.8.9:

{"level":"error","timestamp":"2020-06-19T11:37:51.250Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}
{"level":"error","timestamp":"2020-06-19T11:37:51.251Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName"}
{"level":"error","timestamp":"2020-06-19T12:24:33.883Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}

We're missing some container logs when this error occurs.

All 29 comments

I've pushed a branch that ignores unmarshal errors: https://github.com/exekias/beats/tree/ignore-k8s-errors

I've also built an image from it: exekias/filebeat:ignore-k8s-errors. @simioa, it would be awesome if you could do another test; your help here is much appreciated.

@exekias Looks good, no problems till now.

Good to hear, @simioa. I'm wondering, did you see the ERROR happen?

The code doesn't really avoid that message but should recover from it in all cases:
kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

Not that ERROR message, but this one:

2018-03-08T15:33:50.114Z        ERROR   kubernetes/watcher.go:198       kubernetes: Watching API error EOF

I just looked into a Filebeat 6.2.2 log and saw that the infinite loop started right after the kubernetes: Watching API error EOF error appeared.

This is no longer the case; the only thing logged after the EOF error that relates to the Watching API is:

2018-03-08T15:33:50.114Z        INFO    kubernetes/watcher.go:181       kubernetes: Watching API for resource events

Ok, please keep an eye on it and report if it happens :innocent:

In the meantime, it seems we now have a way to reproduce it, so I will be doing some testing myself.

Thank you for your effort! It's really helping with this issue

A fix has been merged and should go out in both 6.3.0 and 6.2.3
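
For context, here is a rough, self-contained Go sketch of the recovery behaviour the fix aims for (this is not the merged code; watchOnce, resourceVersion, and the error matching are hypothetical): on a decode error, drop the broken event, clear the stored resource version so the next watch re-lists from the latest state, and back off before retrying instead of spinning.

// Rough sketch only, with hypothetical names; simulates the errors reported
// in this issue and the intended "skip, resync, back off" handling.
package main

import (
	"errors"
	"log"
	"strings"
	"time"
)

type watcher struct {
	resourceVersion string
}

// watchOnce stands in for one streaming watch call against the API server.
func (w *watcher) watchOnce() error {
	time.Sleep(200 * time.Millisecond)
	if w.resourceVersion == "" {
		w.resourceVersion = "12345" // pretend a re-list succeeded
		return errors.New("EOF")
	}
	return errors.New("proto: wrong wireType = 6 for field ServiceAccountName")
}

func (w *watcher) run() {
	for i := 0; i < 5; i++ { // bounded here only so the demo terminates
		log.Println("kubernetes: Watching API for resource events")
		if err := w.watchOnce(); err != nil {
			log.Printf("kubernetes: Watching API error %v", err)
			if strings.Contains(err.Error(), "proto:") {
				// Undecodable event: skip it and resync rather than replay it.
				log.Println("kubernetes: Ignoring event, moving to most recent resource version")
				w.resourceVersion = ""
			}
			time.Sleep(time.Second) // back off instead of spinning
		}
	}
}

func main() {
	(&watcher{}).run()
}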

What's the ETA for the new 6.2.3 container image with the fix merged?

It should be out soon. In any case, for convenience, I've pushed a snapshot build with the fix: exekias/filebeat:6.2.3-snapshot

An update on this: new images for 6.2.3 are already available at https://www.docker.elastic.co/

@exekias this error also appears in 6.4.0 with Kubernetes 1.6.4. Is there any workaround for this?

2018-08-19T01:53:44.655Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:53:44.655Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:53:53.051Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:53:53.051Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:53:53.293Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:53:53.294Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:53:54.065Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:53:54.065Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:54:05.405Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:54:05.405Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events
2018-08-19T01:54:07.414Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-08-19T01:54:07.414Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events

Still appears in 6.3.2:

2018-08-25T14:50:05.364Z        INFO    [beat]  instance/beat.go:725    Build info      {"system_info": {"build": {"commit": "45a9a9e1561b6c540e94211ebe03d18abcacae55", "libbeat": "6.3.2", "time": "2018-07-20T04:18:19.000Z", "version": "6.3.2"}}}
...
2018-08-25T18:40:57.180Z        ERROR   kubernetes/watcher.go:195       kubernetes: Watching API error EOF
2018-08-25T18:40:57.180Z        INFO    kubernetes/watcher.go:179       kubernetes: Watching API for resource events

@hobti01 Do you have memory issues with the filebeat process? As the error appears more and more frequently, memory usage may grow to 5 GB and more. In my cluster, this error appears about 1 hour after filebeat restarts.
I think the root cause may be this line, where w.r returns an empty string, which is abnormal.

https://github.com/elastic/beats/blob/0b32c2d7604145eabb7b5652dc8de7bcccf64baf/vendor/github.com/ericchiang/k8s/watch.go#L100
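
For what it's worth, the "wrong wireType = 6" message is consistent with the decoder being handed truncated or misaligned bytes. Here is a minimal, standard-library-only Go sketch (not the ericchiang/k8s code) of why that happens: protobuf decoders read whatever bytes they are given as a stream of (field number, wire type) tags, and garbage input easily yields wire type 6, which no real field uses.

// Sketch only: shows how bytes taken from the middle of a frame decode to
// an invalid protobuf wire type. The byte values below are arbitrary.
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	misaligned := []byte{0x2e, 0x0a, 0x03, 0x66, 0x6f, 0x6f}

	tag, _ := binary.Uvarint(misaligned) // first varint is read as a field tag
	field := tag >> 3                    // upper bits: field number
	wire := tag & 0x7                    // lower 3 bits: wire type (valid values are 0-5)
	fmt.Printf("field=%d wireType=%d\n", field, wire)
	// Prints field=5 wireType=6; a real decoder in this state reports
	// "proto: wrong wireType = 6 for field ...".
}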

@yu-yang2 we do not have memory issues. We have noticed that log shipping for an application container sometimes stops when we update filebeat and does not restart.

This is the only ERROR in our logs, but it seems to be just a warning about the Kubernetes watch, which is automatically restarted.

Same here with filebeat version 6.3.2:
2018-09-05T04:47:25.540Z ERROR kubernetes/watcher.go:195 kubernetes: Watching API error EOF
2018-09-05T04:47:25.541Z INFO kubernetes/watcher.go:179 kubernetes: Watching API for resource events

Same error in filebeat v6.4.2
Installed from https://github.com/helm/charts/tree/master/stable/filebeat

2018-11-09T15:03:07.979+0600    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF

I also got this error in v6.4.3.
The pod with this error stops collecting Kubernetes metadata. When I restart the pod, metadata collection resumes, but the error appears again after a few hours.

2018-11-13T12:23:44.563Z    ERROR   kubernetes/watcher.go:254   kubernetes: Watching API error EOF
2018-11-13T12:23:44.563Z    INFO    kubernetes/watcher.go:238   kubernetes: Watching API for resource events

I also have the same errors with v6.4.3; my filebeat memory usage grows until it's killed by Kubernetes at about 1 GB of usage per pod.

same error for v6.4.3:
2018-11-13T21:11:43.472+0800 ERROR kubernetes/watcher.go:254 kubernetes: Watching API error EOF
2018-11-13T21:11:43.472+0800 INFO kubernetes/watcher.go:238 kubernetes: Watching API for resource events
2018-11-13T21:11:43.474+0800 ERROR kubernetes/watcher.go:254 kubernetes: Watching API error proto: illegal wireType 6
2018-11-13T21:11:43.474+0800 INFO kubernetes/watcher.go:258 kubernetes: Ignoring event, moving to most recent resource version
2018-11-13T21:11:43.474+0800 INFO kubernetes/watcher.go:238 kubernetes: Watching API for resource events

Can we have this reopened, since the issue seems to still be occurring?

@opsnull Do you also see filebeat consuming large amounts of memory?

I have opened a new ticket here
https://github.com/elastic/beats/issues/9078

6.6.1 also has this error (screenshot attached).

Have this issue in filebeat-oss:6.8.0

2019-06-18T13:04:47.479Z    ERROR   kubernetes/watcher.go:258   kubernetes: Watching API error EOF
2019-06-18T13:04:47.479Z    INFO    kubernetes/watcher.go:242   kubernetes: Watching API for resource events
2019-06-18T13:05:05.263Z    INFO    [monitoring]    log/log.go:144  Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":470,"time":{"ms":2}},"total":{"ticks":2840,"time":{"ms":18},"value":2840},"user":{"ticks":2370,"time":{"ms":16}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":17},"info":{"ephemeral_id":"b4b54edd-c52c-4b5a-a29b-512f3f10c76e","uptime":{"ms":3360067}},"memstats":{"gc_next":52123232,"memory_alloc":34918008,"memory_total":303615072}},"filebeat":{"harvester":{"open_files":8,"running":8}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":900,"active":-573,"batches":1,"total":327},"read":{"bytes":42},"write":{"bytes":33899}},"pipeline":{"clients":2,"events":{"active":4118,"retry":327}}},"registrar":{"states":{"current":35}},"system":{"load":{"1":0.09,"15":0.1,"5":0.12,"norm":{"1":0.045,"15":0.05,"5":0.06}}}}}}

Have the same issue in 6.8.1

I'm also seeing this on 6.8.1

Repeated output of "2019-09-04T01:17:29.852Z ERROR kubernetes/watcher.go:258 kubernetes: Watching API error EOF", coupled with dropped logs.

Some pods on the machine do not have missed logs, while others do. We're seeing this issue in development and production clusters.

I have the same issue in metricbeat 7.3.1, but it only happens in Alibaba Cloud Container Service; it has not happened in AWS.

kubernetes module config

kubernetes.yml: |-
  - module: kubernetes
    metricsets:
      - event
    period: 10s
    host: ${NODE_NAME}
    # If using Red Hat OpenShift remove the previous hosts entry and
    # uncomment these settings:
    #hosts: ["https://${HOSTNAME}:10250"]
    #bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    #ssl.certificate_authorities:
    #  - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    #ssl.verification_mode: "none"
    add_metadata: true
    in_cluster: true
    fields_under_root: true
    fields:
      environment: ${ENVIRONMENT}
      REGION: ${REGION}

log:

2019-09-05T06:29:12.239Z    DEBUG   [cfgfile]   cfgfile/reload.go:197   Scan for new config files
2019-09-05T06:29:12.239Z    DEBUG   [cfgfile]   cfgfile/cfgfile.go:175  Load config from file: /usr/share/metricbeat/modules.d/kubernetes.yml
2019-09-05T06:29:12.239Z    DEBUG   [cfgfile]   cfgfile/reload.go:216   Number of module configs found: 1
2019-09-05T06:29:12.239Z    DEBUG   [reload]    cfgfile/list.go:62  Starting reload procedure, current runners: 1
2019-09-05T06:29:12.239Z    DEBUG   [reload]    cfgfile/list.go:80  Start list: 0, Stop list: 0
2019-09-05T06:29:20.456Z    DEBUG   [kubernetes]    kubernetes/watcher.go:261   EOF while watching API
2019-09-05T06:29:20.456Z    INFO    kubernetes/watcher.go:242   kubernetes: Watching API for resource events
2019-09-05T06:29:20.479Z    DEBUG   [kubernetes]    kubernetes/watcher.go:261   EOF while watching API
2019-09-05T06:29:20.479Z    INFO    kubernetes/watcher.go:242   kubernetes: Watching API for resource events
2019-09-05T06:29:20.579Z    DEBUG   [kubernetes]    kubernetes/watcher.go:261   EOF while watching API

We are also seeing this error again in both our AWS and on-prem environments on filebeat 6.8.1:

2019-09-26T08:58:13.420Z        ERROR   kubernetes/watcher.go:258       kubernetes: Watching API error EOF
2019-09-26T09:51:33.050Z        ERROR   kubernetes/watcher.go:258       kubernetes: Watching API error EOF
2019-09-26T09:51:33.054Z        ERROR   kubernetes/watcher.go:258       kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName

Many sets of logs are not being shipped either.

This error is reproducible with filebeat 6.8.9:

{"level":"error","timestamp":"2020-06-19T11:37:51.250Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}
{"level":"error","timestamp":"2020-06-19T11:37:51.251Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error proto: wrong wireType = 6 for field ServiceAccountName"}
{"level":"error","timestamp":"2020-06-19T12:24:33.883Z","caller":"kubernetes/watcher.go:258","message":"kubernetes: Watching API error EOF"}

We're missing some container logs when this error occurs.

@exekias This is still seen on later versions.
