When I run Filebeat on a cloud provider, it always picks the wrong Kubernetes node. After some investigation, I found https://github.com/elastic/beats/blob/master/libbeat/common/kubernetes/util.go#L104. Filebeat uses the machine-id to match a Kubernetes Node object to the machine it is running on, but on a cloud provider the machine-id may be the same on every host, because the OS image is cloned. I'd love to fix the problem; my idea is to use the IP for node matching.
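To illustrate the problem described above, here is a minimal Go sketch. It is not the Beats implementation: the `Node` struct, `matchByMachineID`, and `matchByIP` are hypothetical names made up for this example, and `Node` only mirrors the `status.nodeInfo.machineID` and `status.addresses` fields of a real Kubernetes Node object. It shows that two nodes cloned from the same image are indistinguishable by machine-id, while their internal IPs still differ.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a Kubernetes Node object.
type Node struct {
	Name        string
	MachineID   string   // mirrors status.nodeInfo.machineID
	InternalIPs []string // mirrors status.addresses entries of type InternalIP
}

// matchByMachineID returns the names of all nodes whose machine ID equals id.
func matchByMachineID(nodes []Node, id string) []string {
	var names []string
	for _, n := range nodes {
		if n.MachineID == id {
			names = append(names, n.Name)
		}
	}
	return names
}

// matchByIP returns the names of all nodes that list ip as an internal address.
func matchByIP(nodes []Node, ip string) []string {
	var names []string
	for _, n := range nodes {
		for _, addr := range n.InternalIPs {
			if addr == ip {
				names = append(names, n.Name)
				break
			}
		}
	}
	return names
}

func main() {
	// Two nodes cloned from the same OS image share one machine ID.
	nodes := []Node{
		{Name: "node-a", MachineID: "abc123", InternalIPs: []string{"10.0.0.1"}},
		{Name: "node-b", MachineID: "abc123", InternalIPs: []string{"10.0.0.2"}},
	}

	// Ambiguous: both nodes match, so taking the first hit can pick the wrong one.
	fmt.Println("by machine-id:", matchByMachineID(nodes, "abc123"))

	// Unambiguous, as long as internal IPs are unique within the cluster.
	fmt.Println("by IP:", matchByIP(nodes, "10.0.0.2"))
}
```

IP matching only helps if internal IPs are unique within the cluster, which is an assumption of this sketch rather than something the thread confirms.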
We use the machine ID as a last resort. Before that, the code will try to (in this order):
I would say having a repeated machine id is a bug, as it should be unique. Did you try passing the host setting like this https://github.com/elastic/beats/blob/e1fa1981a086f00fddb159ea10e1a293ecceb574/deploy/kubernetes/filebeat-kubernetes.yaml#L17?
Any ideas on how to improve this, @ChrsMark @blakerouse?
@pigletfly When a machine is cloned it would also be a good idea to have it clear the cloud-init instance data so cloud-init will run again on that machine when it is created.
That will ensure that your machines have unique cloud-ids and that the hostname of each machine matches the cloud provider instance name. This is also important for metrics collection, so that CPU usage is reported correctly per node in the cluster. Without it, you would see only one node in the cluster when in reality you could have several.
OK, in my case I will run systemd-machine-id-setup to ensure the machine ID on each host is unique. Thanks for the explanation.
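Once the IDs have been regenerated, one way to confirm they are now unique is to collect each host's machine ID and look for values shared by more than one host. The Go sketch below assumes the IDs have already been gathered into a map keyed by hostname (for example from each host's `/etc/machine-id`). `duplicateMachineIDs` is a hypothetical helper written for this example, not part of Beats or Kubernetes.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateMachineIDs groups hostnames by machine ID and returns only the
// IDs that are shared by more than one host. An empty result means every
// host has a unique machine ID.
func duplicateMachineIDs(idsByHost map[string]string) map[string][]string {
	hostsByID := make(map[string][]string)
	for host, id := range idsByHost {
		hostsByID[id] = append(hostsByID[id], host)
	}

	dups := make(map[string][]string)
	for id, hosts := range hostsByID {
		if len(hosts) > 1 {
			sort.Strings(hosts) // stable order for reporting
			dups[id] = hosts
		}
	}
	return dups
}

func main() {
	// Hypothetical inventory: hostname -> machine ID as reported by each host.
	idsByHost := map[string]string{
		"node-a": "abc123",
		"node-b": "abc123", // cloned image, ID not regenerated
		"node-c": "def456",
	}

	for id, hosts := range duplicateMachineIDs(idsByHost) {
		fmt.Printf("machine ID %s is shared by %v\n", id, hosts)
	}
}
```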