Fd: Heavy resource usage with "--follow" flag and /sys

Created on 31 Dec 2018  ·  8Comments  ·  Source: sharkdp/fd

If I run fd --follow something / (on Linux) fd will use 100% of my CPU and slowly allocate more and more ram until the system runs out and stops responding (unless I kill fd).

Killing fd after a few minutes frees a ton of ram:

fd resource usage 2

Other details

  • fd --follow --hidden something ~ causes it too, if the user has installed Wine, since that creates a symlinks to / from .wine/dosdevices/z:.
  • Without --follow it only takes a second
  • If I exclude /sys (but not /proc) it completes in ~10 seconds.
  • With --max-depth 10 it takes ~3:40 minutes (821,92s user 1006,11s system 819% cpu 3:42,93 total). Ram allocation peaks about halfway, and then starts freeing up ram.

    • Adding --threads 1 (on a system with 8 vcores) oddly makes it run significantly faster :open_mouth: ~2:30m (78,85s user 99,34s system 119% cpu 2:29,18 total). It peaks at about the same amount of RAM though.

I'm using fd 7.2.0 and /proc isn't the problem, so this isn't #288, though the symptoms are nearly identical. Maybe the very large number of symlinks in /sys is the problem (symlinks to folders containing symlinks to folders to ...), causing an even much larger number of full paths that the ignore crate needs to store? Maybe this warrants gigabytes of ram allocated?

I'm not sure if this is to be considered a bug or just an unfortunate combination, but since the outcome is quite serious I thought it deserved it's own issue.

bug help wanted

Most helpful comment

It looks like this has been resolved in v7.5.0 for some (yet unknown) reason:

❯ /usr/bin/time -f "%M" fd-v7.4.0 --follow --max-depth 8 non-existing /sys
741792

❯ /usr/bin/time -f "%M" fd-v7.5.0 --follow --max-depth 8 non-existing /sys
7600

Timing is still the same though:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| fd-v7.4.0 --follow --max-depth 7 non-existing /sys | 23.867 ± 0.341 | 23.603 | 24.252 | 1.03 ± 0.03 |
| fd-v7.5.0 --follow --max-depth 7 non-existing /sys | 23.077 ± 0.462 | 22.556 | 23.438 | 1.00 |

All 8 comments

Thank you very much for the detailed bug report. I can reproduce this.

Maybe the very large number of symlinks in /sys is be the problem (symlinks to folders containing symlinks to folders to ...),

I am pretty sure this is the case. It's not just an explosion of the number of symlinks. There are loops, too. find reports these:

> cd /sys
> find -L -name '*something*'
[...]
find: File system loop detected; ‘./class/thermal/cooling_device1/device/physical_node/node0/cpu7/firmware_node/subsystem/devices/PNP0C14:00/physical_node/driver/module/holders/ideapad_laptop/drivers/platform:ideapad_acpi/VPC2004:00/rfkill/rfkill0/subsystem/rfkill2/device/device/leds/phy0-led/subsystem/input25::kana/device/device/driver/0003:046D:C318.0004/subsystem/devices/0003:09DA:9090.0001/input/input22/subsystem/input14/device/controlC0/subsystem/pcmC1D0c/device/device/driver/0000:00:03.0/subsystem/devices/0000:00:1f.2/ata1/host0/subsystem/devices/target2:0:0/2:0:0:0/block/sda/device’ is part of the same file system loop as ‘./class/thermal/cooling_device1/device/physical_node/node0/cpu7/firmware_node/subsystem/devices/PNP0C14:00/physical_node/driver/module/holders/ideapad_laptop/drivers/platform:ideapad_acpi/VPC2004:00/rfkill/rfkill0/subsystem/rfkill2/device/device/leds/phy0-led/subsystem/input25::kana/device/device/driver/0003:046D:C318.0004/subsystem/devices/0003:09DA:9090.0001/input/input22/subsystem/input14/device/controlC0/subsystem/pcmC1D0c/device/device/driver/0000:00:03.0/subsystem/devices/0000:00:1f.2/ata1/host0/subsystem/devices/target2:0:0/2:0:0:0’.
[...]

Interestingly, find also seems to have something like a maximum number of redirections through symlinks ("Too many levels of symbolic links"):

find: ‘./class/thermal/cooling_device1/device/physical_node/node0/cpu7/firmware_node/subsystem/devices/PNP0C14:00/physical_node/driver/module/holders/ideapad_laptop/drivers/platform:ideapad_acpi/VPC2004:00/rfkill/rfkill0/subsystem/rfkill2/device/device/leds/phy0-led/subsystem/input25::kana/device/device/driver/0003:046D:C318.0004/subsystem/devices/0003:09DA:9090.0001/input/input22/subsystem/input14/device/controlC0/subsystem/pcmC1D0c/device/device/driver/0000:00:03.0/subsystem/devices/0000:00:1f.2/ata1/host0/scsi_host/host0/subsystem/host5/device/target5:0:0/5:0:0:0/block/sdb/device’: Too many levels of symbolic links

Still, even with these checks in place, find -L -name '*something*' also does not seem to finish, when run from /sys.

In principle (as I understand it), loops should be detected by the ignore crate, but we should double-check if we handle this correctly in fd. In a small experiment with the following structure, fd did not end up in an infinite loop:

▶ tree
.
└── bar
    └── foo -> ..

2 directories, 0 files

So something must be different inside /sys. Maybe it is related to the fact that these files do not really exist, but are created by the kernel "on the fly".

Interestingly, find also seems to have something like a maximum number of redirections through symlinks ("Too many levels of symbolic links"):

This is just ELOOP coming back from the kernel. path_resolution(7) says the limit on modern kernels is 40.

So something must be different inside /sys. Maybe it is related to the fact that these files do not really exist, but are created by the kernel "on the fly".

I don't believe there is much special about /sys except for symlinks that point back up the tree very high, and high fanout. Cycle detection in find only triggers when the directory it's examining is the same as one of its parents. Theoretically this is enough to avoid going infinite, but we can still get combinatorial blowup in the number of paths. Imagine this directory structure:

root/i/j/link -> ../../(i + 1)%10

so root/1/2/link points to root/2 etc. That's only 211 files, but it's well over 100,000,000,000 cycle-free traversals, since you only get a loop after you follow 10 links (root/A/B/link/C/link/.../K/link is a loop, but nothing before it is).

/sys seems to have a similar thing going on with traversals like

/sys/class/thermal/cooling_device13/device/physical_node/driver/cpu21/firmware_node/subsystem/devices/device:6b/physical_node/subsystem/devices/0000:00:17.0/firmware_node/device:b2/physical_node/ata_device/dev4.0/subsystem/dev3.0/device/firmware_node/subsystem

Something that might be nice is a "visit files only once" mode, that remembers which files (by identity, i.e. st_dev/st_ino pair on *nix) have already been seen.

There's some discussion about that in #318

I have been thinking about this for some time.

I currently don't want to implement a "visit files only once" mode, as it would require proper synchronization across threads with a global set of visited device/inode pairs (on Unix. No idea about Windows). This would (1) increase code complexity and (2) might have a performance impact, as the global set has to be managed on the receiver side (in a single thread).

However, if someone wants to experiment with this, I have an implementation of this exact concept in https://github.com/sharkdp/diskus/

A much simpler (but also more "hacky") solution would be the following: Let's assume that it is very unlikely that someone wants to find something within /sys without explicitly using /sys as a search path. If this assumption is correct, we could ignore the /sys directory by default IF --follow is used AND /sys is not explicitly listed as a search root. Similar to what rg -uuu does for binary files, we could use the third occurence of -u/--unrestricted to lift that ban of /sys and include it in this case.

It looks like this has been resolved in v7.5.0 for some (yet unknown) reason:

❯ /usr/bin/time -f "%M" fd-v7.4.0 --follow --max-depth 8 non-existing /sys
741792

❯ /usr/bin/time -f "%M" fd-v7.5.0 --follow --max-depth 8 non-existing /sys
7600

Timing is still the same though:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| fd-v7.4.0 --follow --max-depth 7 non-existing /sys | 23.867 ± 0.341 | 23.603 | 24.252 | 1.03 ± 0.03 |
| fd-v7.5.0 --follow --max-depth 7 non-existing /sys | 23.077 ± 0.462 | 22.556 | 23.438 | 1.00 |

Since it's no longer hogging up all my ram and causing the system to crash when I run the command I consider this fixed :slightly_smiling_face:

I actually looked into this a bit and it seems to be another instance of an unexpected performance ~regression~ improvement!

If I compile fd v7.4.0 with Rust 1.35, I get this:

❯ /usr/bin/time -f "%M" fd --follow --max-depth 6 non-existing /sys
62104

If I compile it with Rust 1.36 or newer, I get this:

❯ /usr/bin/time -f "%M" fd --follow --max-depth 6 non-existing /sys
7232

Rust 1.36 introduced some changes concerning memory management, so this seems plausible at least (https://blog.rust-lang.org/2019/07/04/Rust-1.36.0.html). But I have no idea what exactly caused the previous memory consumption.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ariecattan picture ariecattan  ·  3Comments

longcp picture longcp  ·  3Comments

sharkdp picture sharkdp  ·  4Comments

ChengCat picture ChengCat  ·  3Comments

matu3ba picture matu3ba  ·  4Comments