Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Version of Helm and Kubernetes:
Kubernetes v1.9.6
Helm v2.9.1
Which chart:
stable/fluentd-elasticsearch
What happened:
The Fluentd pod stopped sending logs to Elasticsearch and stopped logging itself. It also stopped updating its file buffer. Prometheus reported almost no CPU usage for the pod. This happened shortly after the pod started up; a file buffer was already left over from the previous pod.
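To confirm the stall, something along these lines can be used (the pod name is hypothetical, and the buffer path assumes the chart's default under /var/log/fluentd-buffers):
$ kubectl top pod -n kube-system fluentd-elasticsearch-abcde
$ kubectl exec -n kube-system fluentd-elasticsearch-abcde -- ls -l /var/log/fluentd-buffers/
If the collector has really stalled, CPU is near zero and the buffer chunk mtimes stop advancing.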
What you expected to happen:
I expected the liveness probe to fail and the container to be restarted.
How to reproduce it (as minimally and precisely as possible):
Run Fluentd with file buffers over 100 MB, and delete the pod so a new one starts up.
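A rough sketch of the reproduction (the pod name is hypothetical; this assumes the buffer lives under the hostPath-mounted /var/log, as in the upstream addon config, so it survives pod deletion):
# Confirm the file buffer has grown past ~100 MB:
$ kubectl exec -n kube-system fluentd-elasticsearch-abcde -- du -sh /var/log/fluentd-buffers/
# Delete the pod; the replacement starts up with the leftover buffer:
$ kubectl delete pod -n kube-system fluentd-elasticsearch-abcde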
Anything else we need to know:
We found that the liveness probe hangs, or takes a very long time to complete.
We found this in the kubelet logs:
1fa2fc59b7030b872da3dff852b5947dd0270452d8e3e in container c8bbe9de50f77d111420addce3805d20cb03d6434944ac28f30b71d892cef876 terminated but process still running!
Running ps shows multiple bash commands still running in the container.
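A hedged way to see the piled-up probe invocations (the pod name is hypothetical, and the probe command placeholder stands for whatever the chart's exec livenessProbe actually runs):
# Count the bash processes left behind by earlier probe runs:
$ kubectl exec -n kube-system fluentd-elasticsearch-abcde -- ps aux | grep -c bash
# Time one probe invocation by hand; if it blocks on the large buffer it will exceed the probe's timeoutSeconds:
$ time kubectl exec -n kube-system fluentd-elasticsearch-abcde -- /bin/sh -c '<liveness probe command from the chart>'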
We are running a custom image based on gcr.io/google-containers/fluentd-elasticsearch:v2.3.1; the only changes are the installed fluent-plugin-kubernetes_metadata_filter 2.1.2, concat, and rewrite-tag-filter gems.
Seeing the same issue. 7 of the 15 fluentd instances in our cluster have this problem: they stopped logging and sending data to Elasticsearch, and the liveness probe is not triggering a restart. When manually restarted, they start sending logs again.
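For the manual restart, a delete along these lines works, since the DaemonSet recreates the pods (same pattern as the log command below; the grep string matches our release name):
$ kubectl get pods -n kube-system | grep fluentd-elasticsearch-fluentd-elasticsearch | awk '{print $1}' | xargs -L 1 kubectl delete pod -n kube-system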
All seven show the same last log entry. (And we are having some issues with Elasticsearch as well):
$ kubectl get pods -n kube-system |grep fluentd-elasticsearch-fluentd-elasticsearch | awk '{print $1}' | xargs -L 1 kubectl logs -n kube-system --tail=1 | sort
2018-10-24 07:41:41 +0000 [warn]: [elasticsearch] failed to write data into buffer by buffer overflow action=:block
2018-10-24 07:47:59 +0000 [warn]: [elasticsearch] failed to write data into buffer by buffer overflow action=:block
2018-10-24 07:51:59 +0000 [warn]: [elasticsearch] failed to write data into buffer by buffer overflow action=:block
2018-10-24 07:54:07 +0000 [warn]: [elasticsearch] failed to write data into buffer by buffer overflow action=:block
2018-10-24 07:58:30 +0000 [warn]: [elasticsearch] failed to write data into buffer by buffer overflow action=:block
2018-10-24 08:29:58 +0000 [warn]: [elasticsearch] failed to write data into buffer by buffer overflow action=:block
2018-10-24 09:24:20 +0000 [warn]: [elasticsearch] failed to write data into buffer by buffer overflow action=:block
...
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
Is there any news on this issue? I am still experiencing it with a fresh deployment of this chart on Kubernetes v1.12.3.
Same here - the liveness probe hangs in a high-load environment.