Setup: a zpool (compression=lz4) w/ lots of file systems (using zfs create pool/...), and I occasionally create new ones. The log files match /path/to/logs/logfile-*.txt (actual names look like /path/to/logs/logfile-STUFF-YYYY-MM-DD.txt), and include_lines is designed to capture a tiny subset of the log file content.
Here's some python I used to look at the registry (because I was curious):
>>> import json
>>> def loadj(n):
...     f=open(n)
...     j=json.load(f)
...     f.close()
...     return j
...
>>> registry=loadj("/var/lib/filebeat/registry")
>>> files83={a['source']:a for a in registry if a['FileStateOS']['device']==83}
>>> files91={a['source']:a for a in registry if a['FileStateOS']['device']==91}
>>> files91_not83 = [a for a in files91 if not a in files83]
>>> files_83_less_91=[[files83[a],files91[a]] for a in files83 if a in files91 and files83[a]['offset'] < files91[a]['offset']]
>>> files_91_less_83=[[files83[a],files91[a]] for a in files83 if a in files91 and files91[a]['offset'] < files83[a]['offset']]
>>> map(len,[files91, files83, files91_not83, files_83_less_91, files_91_less_83])
[2719, 2639, 80, 67, 13]
>>> offset_check=[[a[0]['source'][-14:],a[0]['offset'],a[1]['offset']] for a in files_91_less_83]
>>> offset_check
[[u'2017-11-13.txt', 16381394402, 5175661735], [u'2017-11-07.txt', 8495794954, 5221706109], [u'2017-11-06.txt', 7292885516, 5286674534], [u'2017-12-10.txt', 8619599723, 5246991550], [u'2017-12-11.txt', 10040266323, 5221623440], [u'2017-11-15.txt', 19147829467, 5194955114], [u'2017-12-04.txt', 34748715454, 5155669419], [u'2017-12-09.txt', 7199195022, 5332507314], [u'2017-11-10.txt', 12397086731, 5205001580], [u'2017-11-09.txt', 10995929363, 5209715654], [u'2017-11-11.txt', 13719864719, 5184755763], [u'2017-12-08.txt', 5778746908, 5321055999], [u'2017-11-08.txt', 9710451263, 5220711257]]
Essentially, at some point my log files were on a volume that was assigned device id 83. Then the system rebooted, and now the device id for the same volume is 91. After that point, the system ran for a bit over a day and 80 new files (~75 from the second day) appeared.
The files I care most about are the ones in offset_check -- they're all >4GB, and filebeat had them all open. I believe it was slowly making progress through them, but it was doing a really bad job of it.
My understanding is that device ids are not guaranteed past reboot, and any process trying to use them past that point is "doing it wrong". I believe that filebeat is in this category.
Expected result (conceptually):
if [ /proc/1 -nt /var/lib/filebeat/registry ]; then
    echo "Do not rely on Device ID or INode as they are no longer meaningful"
fi
I don't have any particular opinion about what one should do if a volume is unmounted and remounted while filebeat is running. Offhand, I think that if the device id changes you probably can't rely on the inode either. Personally, I'd expect the process to consider the datestamp of the file: if the file hasn't changed since the last time it was seen, then it should normally be treated as the same file. If the file has a different inode and the same device id as the last time something looked, then it's reasonable to think it may have changed.
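To sketch that heuristic in python (the prev dict and its field names are made up for illustration -- they're not filebeat's registry schema):
import os

def probably_same_file(path, prev):
    # prev holds what was recorded the last time this path was seen;
    # the keys (device/inode/mtime) are hypothetical, not filebeat's.
    st = os.stat(path)
    if st.st_dev == prev['device']:
        # Device id unchanged: the usual device+inode comparison applies.
        return st.st_ino == prev['inode']
    # Device id changed (reboot / remount): don't trust the inode either.
    # If the file hasn't been modified since it was last seen, treat it
    # as the same file and keep the saved offset.
    return st.st_mtime <= prev['mtime']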
I guess this is happening because ZFS sets a new device ID when mounting the partition?
Each volume gets a device ID on a first-come, first-served basis; device IDs are assigned dynamically by the running system without any persistence.
The implementation here happens to be linear (1, 2, 3, 4, ...), but there's no particular requirement for that, and absolutely nothing requires persistence across boots.
I should also note that you're only recording the minor id, and the minor alone doesn't identify a device -- you need major+minor to uniquely identify a device, even on a running system.
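For what it's worth, stat exposes both halves; something like this (the path is just an example) shows what would need to be recorded:
import os

st = os.stat('/path/to/logs')                    # any path on the volume in question
print(os.major(st.st_dev), os.minor(st.st_dev))  # both halves are needed to name the device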
I think this is an issue specific to ZFS and not a general "doing it wrong" ;-) So far it has worked really well for most file systems, where the combination of device + inode is the unique identifier of a file (Windows has 3 identifiers). The issue you are describing reminds me a lot of some "interesting" behaviour on shared file systems, and it is the reason we recommend installing filebeat on the edge node. But for ZFS I think this can't be applied.
The main question is which methods we have to identify a file over its lifetime:
1. the file path / name
2. OS-level identifiers (device + inode and the like)
3. a hash of a subset of the file's content
All have their pros and cons. Currently we do option 2 with the most common identifiers. We also discussed options 1 and 3 in the past. Option 1 would work really well in cases where files are not moved / rotated / renamed, so the file name is the unique identifier. Option 3 we discussed for cases where the identifiers from option 2 do not stay the same, as on shared volumes; we would identify a file based on a hash of a subset of its content. One additional option you brought up above is to take option 2 but only enforce a subset of the identifiers.
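To make option 3 concrete, a rough sketch of such a fingerprint could look like this (the 1 KiB prefix length and the helper name are arbitrary choices for illustration, not something filebeat does today):
import hashlib

def fingerprint(path, nbytes=1024):
    # Hash a fixed-size prefix of the file. Files shorter than nbytes need
    # extra care, since their fingerprint would change as they grow.
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read(nbytes)).hexdigest()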
It seems that for your case only option 3 would work, as the path of the same file changes over time?
@exekias I removed the bug label and changed it to enhancement, as the above behaviour is, from my point of view, expected and by design. I was not aware that ZFS behaves like this, so we should probably add a note to our docs about it.
This isn't limited to zfs.
If you use Google Compute Engine (or AWS, or Azure, or Linode, or...) and dynamically add/remove physical storage, the device nodes you get will be reassigned minor ids. The same should happen with classic hot plugging.
I just tested w/ GCE. I had a /dev/sda1 (/), I attached a disk (which provided /dev/sdb1) and mounted it as /media. Then I unmounted it, created a new disk, attached it, partitioned+formatted it, and mounted it as /media (it became /dev/sdb1 -- and had the same major/minor as the previous /dev/sdb1). Then I reattached the previous disk (it became /dev/sdc1) and mounted it as /mnt -- this time it had a new minor number (because the newer disk was sitting in its old minor slot...).
Really, anything that involves any amount of dynamism is problematic.
[It would probably apply to nfs, samba, afs, fuse, but you'd wave that off...]
Thanks a lot for putting all this effort in and testing the different systems. It's definitely a problem we need to start tackling more actively with filebeat. I think the most important step on our side is to make it pluggable how files are compared, so any of the 3 options mentioned above could be used and more could be added. First steps have been made, but we are not there yet.
You are definitely right about the other network file systems you mentioned. We are aware of this limitation, see https://www.elastic.co/guide/en/beats/filebeat/current/faq.html#filebeat-network-volumes Let me quickly explain why I replaced the bug label with an enhancement label, in case that concerns you. If we treated it as a bug, it would mean we should / must backport fixes to older branches because it is broken. But from a support perspective we know it doesn't work and we don't recommend it. That doesn't mean we should not add this feature.
@jsoref I suggest renaming the title to something like "Add support for network volumes in Filebeat" or "Add additional file identification mechanisms to Filebeat" to be more explicit about what we need to add.
For your specific ZFS use case, would it be enough to just ignore the device id or can the inode also change?
I don't have a particularly good sense of the story for inodes.
https://lists.freebsd.org/pipermail/freebsd-hackers/2010-February/030746.html seems to indicate that inodes are figments created as requested.
https://github.com/zfsonlinux/zfs is the project that handles the Linux kernel implementation.
Afaict, these things are probably moderately reliable until a computer reboots or a device detaches, and entirely unreliable after either of those events occurs. Again, the algorithm I suggested of only relying on these pieces of information up to the point where the system has rebooted should work for most cases.
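Along the lines of the shell check above, a rough (Linux-specific) version of that cut-off could look like this -- the registry path is the one from earlier, and reading btime from /proc/stat is just one way to get the boot time:
import os

def booted_since(path='/var/lib/filebeat/registry'):
    # If the machine booted after the registry was last written, the device
    # ids (and possibly inodes) recorded in it should no longer be trusted.
    with open('/proc/stat') as f:
        btime = next(int(line.split()[1]) for line in f if line.startswith('btime'))
    return btime > os.stat(path).st_mtime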
I don't have any advice for how to deal w/ the case where physical devices come/go while a system is running. And, fwiw that's probably going to happen much more often. (This morning we hot swapped a disk on a physical server because it failed. This afternoon I started talking about plans for various migrations between systems, some models could involve ejecting disks and moving them to other systems.)
@jsoref Thanks for updating the title. Swapping the physical disk is also a very interesting use case. I'm kind of surprised we haven't been hit by this issue yet (or we just haven't heard of it).
@ph FYI as you are currently doing quite a bit of cleanup / work on filebeat.
@ruflin Any update on this?
We face the same issue with nfs volumes mounted in a Kubernetes pod.
@bquartier It's still something we are planning to do to improve our story on shared file systems; we haven't started working on it yet.
@bquartier: thanks for the note (we're considering playing w/ kubernetes, so you've saved me a check...)
@ruflin Any updates on this enhancement?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue doesn't have a Team:<team> label.
that's very unhelpful.