Currently Filebeat treats symlinks as normal files. If a file appears in the glob both as a symlink and as a regular file, its content is read twice. The following changes should be made:
Is the second option needed?
For reference also see: https://discuss.elastic.co/t/filebeat-fails-to-harvest-if-a-file-and-a-symlink-to-that-file-is-in-the-same-directory/49743/3
I would try to go without the second one, and see what backlash we get. I think in this case symlinks can provoke a lot of corner cases.
@tsg What do you mean by the second option? Both changes are required.
I was thinking of simply not supporting symlinks. It removes
functionality, I know, but I think it is functionality that was unintentionally
introduced, right?
Tudor
@tsg Yes. Ok, then we are on the same page. First remove it and potentially reintroduce it in a second step.
We are also seeing this issue. We use github.com/golang/glog for logging. glog typically creates the log file with a long name and places a symlink to that file in the same folder, the one specified by the log_dir flag. In this case, Filebeat reads the file twice.
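To make the duplicate read concrete, here is a minimal Go sketch of how glob matches could be reduced to distinct underlying files. It is not Filebeat's implementation; `uniqueFiles`, `runDemo`, and the file names are hypothetical and only mimic a glog-like layout. It relies on `os.Stat` following symlinks, so that `os.SameFile` reports a symlink and its target as the same file.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// uniqueFiles returns the paths that refer to distinct underlying files.
// os.Stat follows symlinks, so a symlink and its target compare as the
// same file under os.SameFile. (Hypothetical helper, not Filebeat code.)
func uniqueFiles(paths []string) ([]string, error) {
	var seen []os.FileInfo
	var unique []string
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil {
			return nil, err
		}
		dup := false
		for _, s := range seen {
			if os.SameFile(s, info) {
				dup = true
				break
			}
		}
		if !dup {
			seen = append(seen, info)
			unique = append(unique, p)
		}
	}
	return unique, nil
}

// runDemo builds a glog-like layout (made-up names) in a temp dir: one
// real log file plus a symlink to it, both matched by the same glob.
func runDemo() ([]string, error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "app.host.user.log.INFO.20160523")
	if err := os.WriteFile(target, []byte("line\n"), 0o644); err != nil {
		return nil, err
	}
	if err := os.Symlink(target, filepath.Join(dir, "app.INFO")); err != nil {
		return nil, err
	}

	matches, err := filepath.Glob(filepath.Join(dir, "app.*"))
	if err != nil {
		return nil, err
	}
	return uniqueFiles(matches)
}

func main() {
	unique, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(unique), "unique file(s) behind the glob matches")
}
```

The glob matches two paths, but only one underlying file remains after the comparison. A real harvester would also have to decide which of the two names to report.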
With https://github.com/elastic/beats/pull/1767, Filebeat no longer follows symlinks. I suggest keeping it that way and only introducing a config option to enable it if we get feature requests for it.
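As a rough illustration of what skipping symlinks with an opt-in could look like, the Go sketch below uses `os.Lstat`, which inspects the link itself instead of its target. This is an assumption about one possible approach, not the code from the PR; `shouldHarvest`, `demo`, and the `followSymlinks` switch are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// shouldHarvest decides whether a path is picked up. followSymlinks is a
// hypothetical opt-in switch, not a confirmed Filebeat option.
func shouldHarvest(path string, followSymlinks bool) (bool, error) {
	info, err := os.Lstat(path) // Lstat does not follow the link
	if err != nil {
		return false, err
	}
	if info.Mode()&os.ModeSymlink != 0 {
		return followSymlinks, nil
	}
	return info.Mode().IsRegular(), nil
}

// demo creates a file and a symlink to it, then returns the decisions for
// the file, the symlink with the switch off, and the symlink with it on.
func demo() (file, linkOff, linkOn bool, err error) {
	dir, err := os.MkdirTemp("", "lstat-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "real.log")
	link := filepath.Join(dir, "link.log")
	if err = os.WriteFile(target, nil, 0o644); err != nil {
		return
	}
	if err = os.Symlink(target, link); err != nil {
		return
	}
	if file, err = shouldHarvest(target, false); err != nil {
		return
	}
	if linkOff, err = shouldHarvest(link, false); err != nil {
		return
	}
	linkOn, err = shouldHarvest(link, true)
	return
}

func main() {
	file, linkOff, linkOn, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("file:", file, "symlink (off):", linkOff, "symlink (on):", linkOn)
}
```

With the switch off, the regular file is harvested and the symlink is ignored, so a file plus a symlink to it is only read once.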
We use Filebeat to collect Docker logs in a Kubernetes cluster. Kubernetes provides a handy path with all container logs, which are symlinks to /var/lib/docker/containers. Now that Filebeat doesn't have that functionality, we can no longer collect the container logs from Kubernetes.
We could just use the original path instead of the Kubernetes symlinked one, but we rely on the filename and create fields based on it. So we won't be able to upgrade to version 5 until we can re-enable symlinks.
@shamil Thanks for reporting this and sharing the insights. Can you share some more details on how you use the filename and what the original filename would look like? I'm not too familiar with kubernetes.
@ruflin, I'm doing something similar to this: https://github.com/ApsOps/filebeat-kubernetes
let me know if you need more information...
Thanks for the link. The interesting part for me is the following:
"/var/log/containers/%{DATA:pod_name}_%{DATA:namespace}_%{GREEDYDATA:container_name}-%{DATA:container_id}.log"
As far as I understand, these files are symlinks to a file somewhere else. The file name can be used in Logstash to add additional metadata to the event. Some questions:
In the past we discussed following symlinks and reading the original file to avoid some symlink edge cases. But that does not seem to work in your case, as it would send the original file name and not the one you have above.
The symlinks are automatically updated by Kubernetes. The original files are regular Docker logs in /var/lib/docker/containers. The symlinks are never updated. They persist for the lifecycle of the container. The original Docker container logs are rotated and haven't caused any issues so far.
Sorry for all the questions. But I'm kind of surprised that it didn't cause any issues so far and want to understand it in more detail. My assumption so far:
Seems like I need to write some tests to see what the actual behaviour is.
OK, let me explain.
kubelet manages this. Rotation uses the copytruncate strategy, so the symlink target always stays the same; no new file is ever created or replaced. I guess the Kubernetes folks thought about possible problems and took the necessary steps to avoid the issues you mentioned. I think if it is possible to have a non-default option for enabling symlinks, that would be great!
Here is an example symlink from /var/log/containers
kubernetes-dashboard-2037278273-exjgk_kube-system_POD-41a6c77b2591440b37e393d843b9a0183eea468df3d68316f14e98119372467d.log -> /var/lib/docker/containers/41a6c77b2591440b37e393d843b9a0183eea468df3d68316f14e98119372467d/41a6c77b2591440b37e393d843b9a0183eea468df3d68316f14e98119372467d-json.log
Thanks
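To show why the symlink name matters here, the following Go sketch extracts the metadata fields from a name like the example above. The regular expression is a hand-written, stricter approximation of the grok pattern quoted earlier in the thread, not an exact translation; `logName` and `parseLogName` are hypothetical names. It assumes pod names and namespaces contain no underscores and that the container ID is lowercase hex.

```go
package main

import (
	"fmt"
	"regexp"
)

// logName approximates the grok pattern from the thread. Assumptions: no
// underscores in pod name or namespace, lowercase hex container ID.
var logName = regexp.MustCompile(
	`^(?P<pod_name>[^_]+)_(?P<namespace>[^_]+)_(?P<container_name>.+)-(?P<container_id>[0-9a-f]+)\.log$`)

// parseLogName returns the named fields of a /var/log/containers file
// name, or false if the name does not match.
func parseLogName(name string) (map[string]string, bool) {
	m := logName.FindStringSubmatch(name)
	if m == nil {
		return nil, false
	}
	fields := make(map[string]string)
	for i, n := range logName.SubexpNames() {
		if n != "" {
			fields[n] = m[i]
		}
	}
	return fields, true
}

func main() {
	name := "kubernetes-dashboard-2037278273-exjgk_kube-system_POD-" +
		"41a6c77b2591440b37e393d843b9a0183eea468df3d68316f14e98119372467d.log"
	fields, ok := parseLogName(name)
	if !ok {
		fmt.Println("name did not match")
		return
	}
	for _, k := range []string{"pod_name", "namespace", "container_name", "container_id"} {
		fmt.Printf("%s = %s\n", k, fields[k])
	}
}
```

The target file name, `<container_id>-json.log`, carries only the container ID, which is why reading the original path would lose the pod name, namespace, and container name.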
A quote from the logrotate docs:
Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost
With Filebeat tailing the files and not harvesting the new file (as it is copied and not found), the chance of losing some log lines is even higher.
My current conclusion is that this seems to be an acceptable trade-off for people. It would mean loosening the Filebeat guarantee that all log lines are sent for symlinks, but the same actually applies to all copytruncate use cases.
@shamil Let's move our discussion to the open issue here so it is more visible: https://github.com/elastic/beats/issues/2277
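The trade-off can be illustrated with a small in-memory simulation in Go. It is only a sketch of the copytruncate idea, not logrotate's or Filebeat's code. `copyTruncate` models data appended between the copy and the truncate, and `nextOffset` shows one common way a tailer can notice truncation (file size smaller than the offset already read). Both names are hypothetical.

```go
package main

import "fmt"

// copyTruncate simulates copytruncate on an in-memory "file". lateWrite
// stands for data the application appends after the copy but before the
// truncate; those bytes end up in neither the copy nor the live file.
func copyTruncate(live, lateWrite []byte) (rotated, remaining []byte, lost int) {
	rotated = append([]byte(nil), live...) // step 1: copy the live file
	live = append(live, lateWrite...)      // application keeps writing
	lost = len(live) - len(rotated)        // bytes missing from the copy
	remaining = live[:0]                   // step 2: truncate in place
	return rotated, remaining, lost
}

// nextOffset returns where a tailer should resume reading. If the file is
// now smaller than the consumed offset, it was truncated and reading
// restarts from the beginning.
func nextOffset(offset, size int64) (start int64, truncated bool) {
	if size < offset {
		return 0, true
	}
	return offset, false
}

func main() {
	rotated, remaining, lost := copyTruncate([]byte("a\nb\n"), []byte("c\n"))
	fmt.Printf("rotated=%q remaining=%q lost=%d bytes\n", rotated, remaining, lost)

	start, truncated := nextOffset(4, int64(len(remaining)))
	fmt.Println("resume at", start, "truncated:", truncated)
}
```

In this simulation, a tailer that had not yet read the late write before the truncate never sees it, which matches the caveat in the logrotate docs.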
For people interested in this issue, there is now also a PR with a potential implementation: https://github.com/elastic/beats/pull/2478
@shamil @ruflin Hi, I use Kubernetes too. Will this issue be fixed? In which version?