If I run fd --follow something / (on Linux) fd will use 100% of my CPU and slowly allocate more and more ram until the system runs out and stops responding (unless I kill fd).
Killing fd after a few minutes frees a ton of ram:

Other details
fd --follow --hidden something ~ causes it too, if the user has installed Wine, since that creates a symlinks to / from .wine/dosdevices/z:.--follow it only takes a second/sys (but not /proc) it completes in ~10 seconds.821,92s user 1006,11s system 819% cpu 3:42,93 total). Ram allocation peaks about halfway, and then starts freeing up ram.--threads 1 (on a system with 8 vcores) oddly makes it run significantly faster :open_mouth: ~2:30m (78,85s user 99,34s system 119% cpu 2:29,18 total). It peaks at about the same amount of RAM though.I'm using fd 7.2.0 and /proc isn't the problem, so this isn't #288, though the symptoms are nearly identical. Maybe the very large number of symlinks in /sys is the problem (symlinks to folders containing symlinks to folders to ...), causing an even much larger number of full paths that the ignore crate needs to store? Maybe this warrants gigabytes of ram allocated?
I'm not sure if this is to be considered a bug or just an unfortunate combination, but since the outcome is quite serious I thought it deserved it's own issue.
Thank you very much for the detailed bug report. I can reproduce this.
Maybe the very large number of symlinks in /sys is be the problem (symlinks to folders containing symlinks to folders to ...),
I am pretty sure this is the case. It's not just an explosion of the number of symlinks. There are loops, too. find reports these:
> cd /sys
> find -L -name '*something*'
[...]
find: File system loop detected; ‘./class/thermal/cooling_device1/device/physical_node/node0/cpu7/firmware_node/subsystem/devices/PNP0C14:00/physical_node/driver/module/holders/ideapad_laptop/drivers/platform:ideapad_acpi/VPC2004:00/rfkill/rfkill0/subsystem/rfkill2/device/device/leds/phy0-led/subsystem/input25::kana/device/device/driver/0003:046D:C318.0004/subsystem/devices/0003:09DA:9090.0001/input/input22/subsystem/input14/device/controlC0/subsystem/pcmC1D0c/device/device/driver/0000:00:03.0/subsystem/devices/0000:00:1f.2/ata1/host0/subsystem/devices/target2:0:0/2:0:0:0/block/sda/device’ is part of the same file system loop as ‘./class/thermal/cooling_device1/device/physical_node/node0/cpu7/firmware_node/subsystem/devices/PNP0C14:00/physical_node/driver/module/holders/ideapad_laptop/drivers/platform:ideapad_acpi/VPC2004:00/rfkill/rfkill0/subsystem/rfkill2/device/device/leds/phy0-led/subsystem/input25::kana/device/device/driver/0003:046D:C318.0004/subsystem/devices/0003:09DA:9090.0001/input/input22/subsystem/input14/device/controlC0/subsystem/pcmC1D0c/device/device/driver/0000:00:03.0/subsystem/devices/0000:00:1f.2/ata1/host0/subsystem/devices/target2:0:0/2:0:0:0’.
[...]
Interestingly, find also seems to have something like a maximum number of redirections through symlinks ("Too many levels of symbolic links"):
find: ‘./class/thermal/cooling_device1/device/physical_node/node0/cpu7/firmware_node/subsystem/devices/PNP0C14:00/physical_node/driver/module/holders/ideapad_laptop/drivers/platform:ideapad_acpi/VPC2004:00/rfkill/rfkill0/subsystem/rfkill2/device/device/leds/phy0-led/subsystem/input25::kana/device/device/driver/0003:046D:C318.0004/subsystem/devices/0003:09DA:9090.0001/input/input22/subsystem/input14/device/controlC0/subsystem/pcmC1D0c/device/device/driver/0000:00:03.0/subsystem/devices/0000:00:1f.2/ata1/host0/scsi_host/host0/subsystem/host5/device/target5:0:0/5:0:0:0/block/sdb/device’: Too many levels of symbolic links
Still, even with these checks in place, find -L -name '*something*' also does not seem to finish, when run from /sys.
In principle (as I understand it), loops should be detected by the ignore crate, but we should double-check if we handle this correctly in fd. In a small experiment with the following structure, fd did not end up in an infinite loop:
▶ tree
.
└── bar
└── foo -> ..
2 directories, 0 files
So something must be different inside /sys. Maybe it is related to the fact that these files do not really exist, but are created by the kernel "on the fly".
Interestingly,
findalso seems to have something like a maximum number of redirections through symlinks ("Too many levels of symbolic links"):
This is just ELOOP coming back from the kernel. path_resolution(7) says the limit on modern kernels is 40.
So something must be different inside
/sys. Maybe it is related to the fact that these files do not really exist, but are created by the kernel "on the fly".
I don't believe there is much special about /sys except for symlinks that point back up the tree very high, and high fanout. Cycle detection in find only triggers when the directory it's examining is the same as one of its parents. Theoretically this is enough to avoid going infinite, but we can still get combinatorial blowup in the number of paths. Imagine this directory structure:
root/i/j/link -> ../../(i + 1)%10
so root/1/2/link points to root/2 etc. That's only 211 files, but it's well over 100,000,000,000 cycle-free traversals, since you only get a loop after you follow 10 links (root/A/B/link/C/link/.../K/link is a loop, but nothing before it is).
/sys seems to have a similar thing going on with traversals like
/sys/class/thermal/cooling_device13/device/physical_node/driver/cpu21/firmware_node/subsystem/devices/device:6b/physical_node/subsystem/devices/0000:00:17.0/firmware_node/device:b2/physical_node/ata_device/dev4.0/subsystem/dev3.0/device/firmware_node/subsystem
Something that might be nice is a "visit files only once" mode, that remembers which files (by identity, i.e. st_dev/st_ino pair on *nix) have already been seen.
There's some discussion about that in #318
I have been thinking about this for some time.
I currently don't want to implement a "visit files only once" mode, as it would require proper synchronization across threads with a global set of visited device/inode pairs (on Unix. No idea about Windows). This would (1) increase code complexity and (2) might have a performance impact, as the global set has to be managed on the receiver side (in a single thread).
However, if someone wants to experiment with this, I have an implementation of this exact concept in https://github.com/sharkdp/diskus/
A much simpler (but also more "hacky") solution would be the following: Let's assume that it is very unlikely that someone wants to find something within /sys without explicitly using /sys as a search path. If this assumption is correct, we could ignore the /sys directory by default IF --follow is used AND /sys is not explicitly listed as a search root. Similar to what rg -uuu does for binary files, we could use the third occurence of -u/--unrestricted to lift that ban of /sys and include it in this case.
It looks like this has been resolved in v7.5.0 for some (yet unknown) reason:
❯ /usr/bin/time -f "%M" fd-v7.4.0 --follow --max-depth 8 non-existing /sys
741792
❯ /usr/bin/time -f "%M" fd-v7.5.0 --follow --max-depth 8 non-existing /sys
7600
Timing is still the same though:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| fd-v7.4.0 --follow --max-depth 7 non-existing /sys | 23.867 ± 0.341 | 23.603 | 24.252 | 1.03 ± 0.03 |
| fd-v7.5.0 --follow --max-depth 7 non-existing /sys | 23.077 ± 0.462 | 22.556 | 23.438 | 1.00 |
Since it's no longer hogging up all my ram and causing the system to crash when I run the command I consider this fixed :slightly_smiling_face:
I actually looked into this a bit and it seems to be another instance of an unexpected performance ~regression~ improvement!
If I compile fd v7.4.0 with Rust 1.35, I get this:
❯ /usr/bin/time -f "%M" fd --follow --max-depth 6 non-existing /sys
62104
If I compile it with Rust 1.36 or newer, I get this:
❯ /usr/bin/time -f "%M" fd --follow --max-depth 6 non-existing /sys
7232
Rust 1.36 introduced some changes concerning memory management, so this seems plausible at least (https://blog.rust-lang.org/2019/07/04/Rust-1.36.0.html). But I have no idea what exactly caused the previous memory consumption.
Most helpful comment
It looks like this has been resolved in v7.5.0 for some (yet unknown) reason:
Timing is still the same though:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
|
fd-v7.4.0 --follow --max-depth 7 non-existing /sys| 23.867 ± 0.341 | 23.603 | 24.252 | 1.03 ± 0.03 ||
fd-v7.5.0 --follow --max-depth 7 non-existing /sys| 23.077 ± 0.462 | 22.556 | 23.438 | 1.00 |