**Describe the bug**
The `fluentbit.io/exclude` annotation is ignored when it is added to an already-running pod with `kubectl annotate`: fluent-bit continues to stream logs from the pod.

**To Reproduce**
Log messages look like:

```json
{"log":"YOUR LOG MESSAGE HERE","stream":"stdout","time":"2018-06-11T14:37:30.681701731Z"}
```

Create the following Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: logger
  namespace: default
spec:
  containers:
  - name: logger
    image: k8s.gcr.io/logs-generator:v0.1.1
    args:
    - /bin/sh
    - -c
    - |-
      /logs-generator --logtostderr --log-lines-total=${LOGS_GENERATOR_LINES_TOTAL} --run-duration=${LOGS_GENERATOR_DURATION}
      # Sleep forever to prevent restarts
      while true; do
        sleep 3600;
      done
    env:
    - name: LOGS_GENERATOR_LINES_TOTAL
      value: "100000"
    - name: LOGS_GENERATOR_DURATION
      value: "600s"
```
Then annotate the pod:

```shell
$ kubectl annotate po logger fluentbit.io/exclude=true
```

`fluentbit.io/exclude=true` is ignored and fluent-bit continues to stream logs from this pod, because `kubectl annotate` does not restart the pod. This behaviour is also not described in the docs; a warning could be added there.

**Expected behavior**
Fluent-bit stops processing logs from the pod once it is annotated.
**Screenshots**

**Your Environment**
```
[SERVICE]
    Flush            30
    Daemon           Off
    Log_Level        warn
    Parsers_File     parsers.conf

[INPUT]
    Name             tail
    Tag              kubernetes.*
    Path             /var/log/containers/*.log
    Parser           docker
    DB               /var/log/flb_kube.db
    Mem_Buf_Limit    12MB
    Refresh_Interval 10
    ignore_older     1800s

[OUTPUT]
    Name             stdout
    Match            *

[FILTER]
    Name                kubernetes
    Match               kubernetes.*
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    Buffer_Size         1M
    Merge_Log           On
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On
    Annotations         Off
    tls.verify          Off
```
**Additional context**
Also, `fluentbit.io/exclude` is unusable with StatefulSets:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: logger
  namespace: default
spec:
  serviceName: logger
  selector:
    matchLabels:
      app: logger
  template:
    metadata:
      labels:
        app: logger
    spec:
      nodeSelector:
        kubernetes.io/hostname: # put a node name here
      containers:
      - name: logger
        image: k8s.gcr.io/logs-generator:v0.1.1
        args:
        - /bin/sh
        - -c
        - |-
          /logs-generator --logtostderr --log-lines-total=${LOGS_GENERATOR_LINES_TOTAL} --run-duration=${LOGS_GENERATOR_DURATION}
          # Sleep forever to prevent restarts
          while true; do
            sleep 3600;
          done
        env:
        - name: LOGS_GENERATOR_LINES_TOTAL
          value: "100000"
        - name: LOGS_GENERATOR_DURATION
          value: "600s"
```
Add the `fluentbit.io/exclude: "true"` annotation to the pod template so that the pod `logger-0` is recreated with it:

```shell
$ kubectl edit sts logger
statefulset.apps/logger edited
```
`fluentbit.io/exclude: "true"` is ignored and fluent-bit continues to receive events from the newly created pod `logger-0`. Note that the sts has a `nodeSelector` that makes it easy to spawn the pod on the same node (and thus the same fluent-bit) to reproduce this issue more clearly.

We need to implement watch.
We should probably move to a model where the watch is filtered by the node we are on and, when a change arrives, invalidates the cache.
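A minimal sketch of that model in Python pseudocode (fluent-bit's kubernetes filter is actually written in C; every name here is illustrative, not fluent-bit's real API): keep the metadata cache keyed the way the filter does today, watch only pods scheduled on the local node, and drop the matching cache entries whenever an event arrives for a pod.

```python
# Sketch of the proposed watch model: filter events by the node we run on
# and invalidate cached pod metadata when a pod changes. All names are
# hypothetical; the real filter lives in fluent-bit's C code.

cache = {}  # "<namespace>:<pod_name>:<container_name>" -> metadata dict

NODE_NAME = "node-1"  # in a DaemonSet this would come from the downward API

def handle_watch_event(event):
    """Invalidate cache entries for the pod named in a watch event."""
    pod = event["object"]
    if pod["spec"].get("nodeName") != NODE_NAME:
        return  # pod lives on another node; not our concern
    prefix = f'{pod["metadata"]["namespace"]}:{pod["metadata"]["name"]}:'
    for key in [k for k in cache if k.startswith(prefix)]:
        del cache[key]  # next log record re-fetches fresh annotations

# Example: a MODIFIED event (annotation added) clears the stale entry, so
# the next lookup would see fluentbit.io/exclude and stop emitting logs.
cache["default:logger-0:logger"] = {"annotations": {}}
handle_watch_event({
    "type": "MODIFIED",
    "object": {
        "metadata": {"namespace": "default", "name": "logger-0"},
        "spec": {"nodeName": "node-1"},
    },
})
```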
@ialidzhikov on the StatefulSet, won't it do a RollingUpdate and thus get the annotation on all the Pods owned by the StatefulSet controller?
I would really like to see this feature. My team is experiencing the same problem when running FluentBit as a DaemonSet.
@ialidzhikov Did you add the fluentbit.io/exclude annotation to .metadata, or to .spec.template.metadata (which should be the right one)?
@donbowman, it does a RollingUpdate and sets the annotation on all of the pods. As far as I can see, the cache key is built from `<namespace>:<pod_name>:<container_name>`. After the sts update it creates a new pod with the newly added annotation, but the cache key stays the same: a sts always follows the pattern `<pod-name>-0`, `<pod-name>-1`, `<pod-name>-2` for its pod names. To recap: the cache key does not change with the sts update, fluent-bit never goes back to kube-apiserver to read the newly added annotation, and so `fluentbit.io/exclude` is completely ignored.
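To illustrate the collision, here is a toy Python model of the key scheme described above (not fluent-bit's actual C implementation): because a StatefulSet recreates its pods under the same names, the key is byte-for-byte identical before and after the rolling update, so the stale cached metadata without the new annotation keeps winning.

```python
# The kubernetes filter caches pod metadata under a key containing only
# the namespace, pod name and container name. StatefulSet pods keep their
# names across a rolling update, so the key never changes.

def cache_key(namespace, pod_name, container_name):
    return f"{namespace}:{pod_name}:{container_name}"

key_before_update = cache_key("default", "logger-0", "logger")  # old pod
key_after_update = cache_key("default", "logger-0", "logger")   # new pod

# Identical key -> the cached (annotation-less) entry is reused.
assert key_before_update == key_after_update
```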
@lbogdan, I reproduced it just now and I added the annotation to `.spec.template.metadata`. You could also give it a try with the steps above. :)
@donbowman We have the same issue. It would be really helpful to implement the watch model to catch the `exclude` annotation.
We use fluent-bit in a production environment with more than 4,000 pods running, and it would be great to be able to stop collecting logs from specific pods, or even all pods in a given namespace, simply by adding the annotation.
/kind bug
@donbowman While waiting for the "watch" model, would it be possible to add a parameter to refresh the cache periodically, or an HTTP endpoint to specify the name of the resource to refresh (a pod or a namespace)?
Or add a TTL for the cache entries which can be set via configuration. It could be a good start.
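The TTL suggestion can be sketched in a few lines of Python (an illustrative model only, with hypothetical names; fluent-bit's cache is implemented in C): entries expire after a configurable number of seconds, so a newly added `fluentbit.io/exclude` annotation would be picked up within one TTL period, at the cost of periodic extra requests to kube-apiserver.

```python
import time

# Sketch of a configurable TTL for cached pod metadata: an expired entry
# forces the caller to re-fetch from kube-apiserver, picking up newly
# added annotations such as fluentbit.io/exclude.

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (inserted_at, metadata)

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.entries[key] = (now, value)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        hit = self.entries.get(key)
        if hit is None or now - hit[0] > self.ttl:
            self.entries.pop(key, None)  # expired: caller must re-fetch
            return None
        return hit[1]

cache = TTLCache(ttl_seconds=300)
cache.put("default:logger-0:logger", {"annotations": {}}, now=0)
fresh = cache.get("default:logger-0:logger", now=100)    # within TTL: hit
expired = cache.get("default:logger-0:logger", now=400)  # past TTL: miss
```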
One use case I'm currently having is that I would like to exclude the fluentbit daemonset itself.
I've seen this "fixed" after a few hours of running, but I haven't seen anything deterministic yet.
If anyone knows how I can trigger the exclude, even manually, that would be great.