Fd: Discussion: show Git-ignored files by default?

Created on 7 Jun 2020 · 32 Comments · Source: sharkdp/fd

Since fd was first published, the feature to hide Git-ignored files by default has always been controversial. It's the number one pitfall for new users, as witnessed by the numerous issues that have been opened over time (even though this is the first point in the Troubleshooting section). Even experienced users will likely run into this from time to time.

We have had past discussions about this (see #179, #220, #18), but I'm not so sure anymore if this default is the best possible option for the "average user".

I thought it might make sense to discuss this again and see what others think. Whatever we choose as the default, it will always be easy for users to select a different default via an alias.

Pro current behavior (do not show .gitignored entries by default):

  • Most searches are faster if we take .gitignore files into account. .gitignored directories tend to contain huge amounts of automatically generated build artifacts or downloaded dependency files. Pruning these directories from the search tree typically results in a faster search overall. There are counterexamples to this where the parsing of long .gitignore files takes longer than actually traversing these directories.
  • Most of the time, .gitignored results are not "interesting" to the user (however, see counterpart below).
  • When running fd without any arguments, I typically don't want to see .gitignored files.
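
To make the pruning argument concrete, here is a minimal Rust sketch that counts how many entries a traversal touches with and without skipping an ignored directory. It uses a toy in-memory tree rather than a real filesystem; the `Node` type, `count_visited`, `sample_project`, and the directory names are all hypothetical and are not taken from fd's code.

```rust
use std::collections::HashSet;

/// Toy directory tree: a name plus child entries (files have no children).
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

/// Count the entries a traversal touches, skipping any entry whose name
/// is in `ignored` together with everything below it.
fn count_visited(node: &Node, ignored: &HashSet<&str>) -> usize {
    if ignored.contains(node.name) {
        return 0; // pruned: the subtree is never descended into
    }
    1 + node
        .children
        .iter()
        .map(|child| count_visited(child, ignored))
        .sum::<usize>()
}

/// Hypothetical project: two source files plus a `target` directory
/// holding `artifacts` generated files.
fn sample_project(artifacts: usize) -> Node {
    fn file(name: &'static str) -> Node {
        Node { name, children: Vec::new() }
    }
    Node {
        name: "project",
        children: vec![
            Node { name: "src", children: vec![file("main.rs"), file("lib.rs")] },
            Node {
                name: "target",
                children: (0..artifacts).map(|_| file("artifact.o")).collect(),
            },
        ],
    }
}
```

With 1000 generated files, the unpruned walk touches 1005 entries and the pruned walk touches 4. The sketch leaves out the cost of reading and matching the ignore rules themselves, which is where the counterexamples mentioned above come from.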

Cons:

  • It can be very confusing to (new) users. If 10% of users go so far as to create a ticket on GitHub to ask about their problem, there must be hundreds of users that ran into this problem at some point.
  • Even if you know about the default, it can be annoying to repeat the search because you forgot to add -I or -u. There are a lot of valid use cases where users are - in fact - interested in results from ignored directories or files. Personally, I would estimate that I use -uu or -HI in roughly 20% of my searches, which is quite high.
help wanted question

All 32 comments

I want to add that the nature of the files in .gitignore depends a lot on the nature of our projects. In my case for example most of the time the ignored files are files with sensitive information (not crap) that I want to be able to search with fd.

But I understand that for other people often the files in .gitignore have to be ignored by fd as well.

For this reason the desired default behavior is not the same for everyone.

In my opinion, the default behavior should be "search all", because it's easier to figure out why there are too many results than it is to figure out why there are missing results.

But such a change in behavior will not be backward compatible, which is never good. To overcome this, people must be allowed to easily return to the old default behavior. Hence the need to be able to configure the default behavior of fd (#362).

because it's easier to figure out why there are too many results than it is to figure out why there are missing results.

I think this is an excellent point.

But such a change in behavior will not be backward compatible, which is never good.

Agreed. This is probably the main reason why I never changed the behavior. Still, if it should turn out that 90% of users would like a different default, I'm more than willing to make a breaking change.

To overcome this, people must be allowed to easily return to the old default behavior. Hence the need to be able to configure the default behavior of fd (#362).

Okay, I have reopened the ticket once again. Let's discuss this aspect in #362. There are other ways to configure the default behavior as well (aliases, wrappers, environment variables).

Personally, I think the pros outweigh the cons. But setting up an alias is easy, so I wouldn't be too upset if this changed.

I am a new user and think this is a cool feature but I also think it should be the default as it is not obvious from any short description and is not intuitive.

"Powerusers" could easily setup an alias as was commented earlier.

I would add my vote to search all files except hidden files by default.

Hello.

I'd like to make the point that fd is a general-purpose file-searching utility that is not git-specific, so having it take into account .gitignore files lying around in the filesystem does not feel right.
In fact, I stumbled upon this issue the very first time I tried fd: I tried to find something, starting from a non-git directory whose subdirectories happened to be git repos, and found nothing, although I knew it was there. After that, the very next thing I did was patch fd locally so it wouldn't read .gitignore files by default.

The dilemma

I think that for this issue and for #362 the question is:

Should fd behave as a "general" or a "git-style" tool?

What I mean by this is summarized in the following table

| | general tool | git style tool |
|:--------------|:-------------:|:-----------------------------------:|
| examples | find,grep | git,rg |
| ignore files | no | yes |
| configuration | flags only | flags, files, environment variables |

Actually fd is in between the two worlds. And respecting the ignore files without the configuration feature is bad, IMO. So fd should choose the red pill or the blue pill ;)

My proposal

Or maybe we don't have to choose and we can have both.

I think that:

  • fd should be a general tool (by default)
  • there should be a flag --git that makes it act in a "git style"
  • fdg should be another version compiled from the same code but with different default behavior: equivalent to fd --git, with a --no-git flag to switch it back to "general tool".

My arguments for creating the additional fdg are the following:

  • it should be easy to set up the build automation to produce two versions, one with --no-git as the default, the other with --git as the default.
  • the alias solution discussed already in #362 is very "shell specific" :

    • there is no general alias that can work in all shells (powershell and cmd included)

    • every time we download fd we should setup the aliases, so the advantage of "no install, just download and use it" is not valid any more

    • aliases do not work when you want to write a general script (let's say in Python) that uses fd

  • this solves backward compatibility with ignore files, as any user can simply switch to fdg, or even rename fdg to fd, and continue using it as before (respecting the ignore files by default)
  • this allows to solve not only this issue but also #362 and all other issues that ask to change the default behavior of fd (this would be done using the configuration file in fdg)
  • having both versions will make the supporters on both sides ("pro-general" and the "pro-git") happy

I like that idea, with one caveat. Instead of having a separately compiled version of fd, fdg should just be a symlink to fd, and fd would check the name it is called with: if it is "fd", use the general behavior, and if it is "fdg", use the "git" behavior. Alternatively, distribute OS-specific wrapper scripts for fdg (for example, one that does something like exec fd --ignore, or whatever the Windows equivalent is).

@tmccombs The problems with all workarounds (aliases, scripts, symlinks, ...) are the same:

  • they are all system/shell specific
  • in general they are OK in the Linux world but not on Windows
  • you have to configure all this before you can start using the tool
  • they can generate additional problems that will create more issues here

Do you have arguments against having fdg in addition? Once the CI/CD tool is configured to compile both tools (for all systems), this does not create more work for the maintainers.

The only drawback I see is that having two versions can leave novice users unsure which one to pick. But this can easily be solved by promoting fd and mentioning fdg only in a section at the end of the README.

Sorry, I am suggesting that the symlink or shell would be distributed with fd. Having separate, but nearly identical binaries means the package is larger (longer to download and more space on disk). fd is small enough it's probably not that big of a deal for most people, but it still feels wrong to me.

Once the CI/CD tool is configured to compile both tools (for all systems), this does not create more work for the maintainers.

It means builds take longer, which affects anyone who builds it from source.

Sorry, I am suggesting that the symlink or shell would be distributed with fd.

This means that all scripts would have to be tested and maintained, and they will raise more issues here.
I have had very bad experiences with bundled scripts on Windows, for example with conda (the Python package manager).
Providing additional scripts is the end of the "single executable tool".

Having separate, but nearly identical binaries means the package is larger (longer to download and more space on disk). fd is small enough it's probably not that big of a deal for most people, but it still feels wrong to me.

I don't know what you call "the package". The builds are larger, but the sources are not. And this argument is valid for all OS-specific builds. If we want smaller builds it is easy: provide only one (or zero) builds and tell people to build their own, but this is not very user friendly.

Once the CI/CD tool is configured to compile both tools (for all systems), this does not create more work for the maintainers.

It means builds take longer, which affects anyone who builds it from source.

No. You build only the version that you need. This is already the case for all OS-specific builds.
Only the CI/CD builds all the versions.

And if we have to choose only one type of build (general vs. git style), which one should it be?
Giving a good solution to only one category of users and telling the other category to "get by with scripts" is not very user friendly, IMO.

This means that all scripts would have to be tested and maintained, and they will raise more issues here.

The script I'm proposing is extremely simple: it would just call fd with an option and the arguments passed to the script. If we had a separate executable, that would have to be tested and maintained too :).

I don't know what you call "the package".

I mean the zip, tarball, or deb you download from the releases page, or a package you install with a package manager like apt, dnf, pacman, homebrew, winget (maybe?), etc.

Providing additional scripts is the end of "single executable tool".

From your next paragraph, I think what you mean by this is that you need to install more than just a single executable. But afaik, this is not a design goal of fd. And fd is not currently distributed as just an executable. All packages include command completion scripts, and on Linux they also include man pages.

The builds are larger, but the sources are not. And this argument is valid for all OS-specific builds. If we want smaller builds it is easy: provide only one (or zero) builds and tell people to build their own, but this is not very user friendly.

So you're saying we would have twice as many packages on the release page and in package managers/app stores? I really don't like that idea. I think it would make it even more confusing for users to know which package to install/download, and it roughly doubles the amount of work for packagers to maintain the package for package managers like apt, dnf, pacman, etc.

No. You build only the version that you need.

You're assuming no one would want both. This also complicates building from source, since the user now needs to be able to configure which version they want to build somehow (the OS they want is implicit in the OS they are running the build from, unless they are cross-building).

And if we have to choose only one type of build (general vs. git style), which one should it be?
Giving a good solution to only one category of users and telling the other category to "get by with scripts" is not very user friendly, IMO.

That is not what I'm suggesting.

If using a script really is such a problem for windows, why not the symlink/command name approach?

fd would have something like:

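use std::path::Path;

// argv[0] is the name fd was invoked as (for example via a symlink).
let arg0 = std::env::args().next().unwrap_or_default();
// Strip the directory and any extension (such as ".exe") before comparing.
let cmd_name = Path::new(&arg0).file_stem().and_then(|s| s.to_str()).unwrap_or("fd");
if cmd_name == "fdg" {
  // use git-style default
} else {
  // use "general" default
}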

Then on Unix-like systems, the package would just have a symlink from fdg to fd (or vice versa). If Windows doesn't have an equivalent, it could just have a copy of fd called fdg, or have a separate package if that makes more sense, and/or have instructions in the readme to rename the file depending on what functionality you want.

We already have _~/.fdignore_. Maybe this file could somehow 'include' gitignore (via something like _@~/.gitignore_ or some other character/directive)? With this approach, showing git-ignored files could be enabled by default, while still allowing users to pull in the git-ignored entries that are already in _~/.gitignore_ (or _./.git/ignore_ when inside a repository) in an easy way.

Personally, I wasn't even aware that git-ignored files are omitted: https://imgur.com/a/UlLD8ED
For now my _.fdignore_ contains almost 100% of _.gitignore_ plus other patterns. It would be great to be able to 'include' the file as a whole rather than copying its content manually.

Maybe this file could somehow 'include' gitignore (via something like @~/.gitignore or some other character/directive)

Respecting .gitignore is more than just respecting a global ~/.gitignore file though. It also uses .gitignore files in ancestor directories and in descendant directories (scoped to those directories). You _could_ have some syntax in .fdignore to enable --ignore-vcs, but that isn't really an include so much as a flag. And we would still probably want a way to overrule that with --no-ignore-vcs on the command line while keeping the other ignores from .fdignore, which seems kind of weird to me.
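
As a rough illustration of why this is more than a single include, the sketch below lists the .gitignore locations that could apply to a search root by walking up its ancestor directories. The function name `candidate_gitignores` is hypothetical. It only builds paths without checking that the files exist, it walks all the way to the filesystem root (git itself stops at the top of the working tree), and it leaves out the per-directory .gitignore files found while descending. It is not how fd or the ignore crate implement this.

```rust
use std::path::{Path, PathBuf};

/// List the `.gitignore` paths in `search_root` and each of its ancestors,
/// nearest directory first. No filesystem access is performed.
fn candidate_gitignores(search_root: &Path) -> Vec<PathBuf> {
    search_root
        .ancestors()
        .map(|dir| dir.join(".gitignore"))
        .collect()
}
```

For a Unix-style root such as /home/user/project this yields four candidates, from /home/user/project/.gitignore up to /.gitignore, and every directory below the search root can add one more.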

FWIW, ripgrep (which I imagine a lot of fd users also use) also respects gitignore by default.

Thought dump:

git is primarily concerned about the contents of files, their state, but not their presence.
This means that .gitignore files are also about the state of the files, but not their presence.

ripgrep, just like git, is also primarily concerned about the contents of files,
and this shared concern makes its choice to consider .gitignore files understandable,
although it could also be opt-in.

fd, on the other hand, is not concerned about the state of the files, but is concerned about their presence,
which differs from the concerns of git and ripgrep and makes its consideration of .gitignore files slightly less fitting.

fd, on the other hand, is not concerned about the state of the files, but is concerned about their presence

That's a good way of looking at it, my stance is pretty neutral now. As a long-time user I'd have no problem making an inverse alias to fdh: aliased to fd --no-ignore-vcs -H -E '.git/', but I still feel there's value in performant (I assume most fd users have git repos in their filesystems) defaults. Plus, it's already established and would take some effort to flip.

Just to throw my opinion into the ring. I'm in favor of changing the default.

When I use fd with no options/arguments other than a pattern, 99% of the time I'm just using it to quickly narrow down the list of files I need to look at to find what I want. In that case I'm okay if I get some things that I don't care about in the search, but I'm much more annoyed if I don't find something that's actually there because I forgot to specify that I wanted to search .gitignored files as well. @sharkdp said that he adds -H or -u around 20% of the time, meaning those flags mattered 20% of the time. But I'm willing to bet that if those flags were enabled by default then they would have to be disabled much less than 20% of the time.

Also, from a scripting/reducing noise perspective, normally when I'm doing something more precise than just quickly narrowing down a list of files, I'm more willing to add flags and check the documentation in order to narrow down the search results to be only what I care about.

And concerning adding an fdg binary (or symlink), I don't see how that's better than just adding an optional flag. It feels like it would complicate CI and packaging a lot for something that essentially just flips a flag on by default.

I have an additional suggestion, which could complement the proposal or make it unnecessary: add a corresponding description to tldr.
The tldr page for rg/ripgrep has an entry for rg -uu pattern; it is the second example and can thus be found in about a second.

Typing the flag in 20% of searches would then still take less time overall.
A bonus is that -uu could become established as meaning "use hidden GitHub stuff" or "do more work".

One client for tldr is tealdeer.

Some thoughts on a couple of points brought up in this thread, though nothing terribly new.

  1. Conflating _git_ ignore with general ignore

I'm also in favour of changing the default, because IMO a path being present in .gitignore means literally what it says on the tin - "not interesting _for the purposes of version control_" and I wish that tools like fd as well as ripgrep did not overload this definition to mean roughly "not interesting to search/scan in by default". I, like many others here, have had false negatives due to this. It's (subjectively!) a bit sad that in order to be confident in a negative search result one has to either provide extra flags, or rerun the search using a "legacy" tool.

  2. Special treatment of .gitignore with regards to other similar files

fd, or to be more precise the ignore crate that it uses, appears to only support git's ignore files, which means fd's behaviour for say Mercurial users will be different. Firefox, arguably the poster child for Rust, uses Mercurial for example.

  3. Prior art re: aliases/separate binaries

This applies to grep, and is likely Linux (or perhaps even Debian) specific:

root@9fb4e89aea1b:/# man grep | head -4
GREP(1)                                                                User Commands                                                               GREP(1)

NAME
       grep, egrep, fgrep, rgrep - print lines that match patterns

So, using aliases for commonly-used flags is at the very least nothing new.

Some thoughts on a couple of points brought up in this thread, though nothing terribly new.

1. Conflating _git_ ignore with general ignore

I'm also in favour of changing the default, because IMO a path being present in .gitignore means literally what it says on the tin - "not interesting _for the purposes of version control_" and I wish that tools like fd as well as ripgrep did not overload this definition to mean roughly "not interesting to search/scan in by default". I, like many others here, have had false negatives due to this. It's (subjectively!) a bit sad that in order to be confident in a negative search result one has to either provide extra flags, or rerun the search using a "legacy" tool.

2. Special treatment of `.gitignore` with regards to other similar files

Did you ever run grep on repos with huge binary files (>5 GB) or a large number of files ignored by .gitignore?
Binary data in particular (without newlines) takes linear time to scan, and that is why ripgrep has a different default than grep.
The same argument can be made for fd when very many files are covered by .gitignore, e.g. after compiling the Linux kernel.

fd, or to be more precise the ignore crate that it uses, appears to only support git's ignore files, which means fd's behaviour for say Mercurial users will be different. Firefox, arguably the poster child for Rust, uses Mercurial for example.

An argument from authority is not a technical argument about usage. And you can't make everyone who uses the tool happy. Here is a short catalogue for decision making:

  1. Usage consistency: With which other tools (arguments and effects) should fd be consistent? (For me that is Rust and ripgrep, if possible.)
  2. Usage purpose: Ignoring build files supports dev environments, where you frequently want to search for (relative) file paths in a complex tree. (The effect is a clear speed win, as fewer paths and files need to be traversed.)
  3. Oriented user base: If the author chooses to support such a thing, when should a version control tool be supported? (Market share, user base?)
  4. Clarification: How should it be documented? (manual, cheat sheet, tealdeer, tldr)
  5. Technical feasibility: How many build files can be "hosted" (as a result of codegen) on Mercurial to justify ignoring them?
  6. Usage feasibility: How many build files are "hosted" (as a result of codegen) on Mercurial to justify ignoring them?

3. Prior art re: aliases/separate binaries

This applies to grep, and is likely Linux (or perhaps even Debian) specific:

root@9fb4e89aea1b:/# man grep | head -4
GREP(1)                                                                User Commands                                                               GREP(1)

NAME
       grep, egrep, fgrep, rgrep - print lines that match patterns

So, using aliases for commonly-used flags is at the very least nothing new.

How should this be maintained and name clashes prevented? cfdisk, df, efi, rfkill are already used. Do you have specific names in mind?

This applies to grep, and is likely Linux (or perhaps even Debian) specific:

root@9fb4e89aea1b:/# man grep | head -4
GREP(1)                                                                User Commands                                                               GREP(1)

NAME
       grep, egrep, fgrep, rgrep - print lines that match patterns

So, using aliases for commonly-used flags is at the very least nothing new.

On Ubuntu at least, egrep, fgrep and rgrep are simply shell scripts that run grep with the given options. I would be ok with that, or with a symlink approach. But I'd rather not have separate but nearly identical compiled binaries.

(repo maintainer - this reply certainly cuts close to being off-topic, please feel free to remove it)

Did you ever run grep on repos with huge binary files (>5 GB) or a large number of files ignored by .gitignore?
Binary data in particular (without newlines) takes linear time to scan, and that is why ripgrep has a different default than grep.
The same argument can be made for fd when very many files are covered by .gitignore, e.g. after compiling the Linux kernel.

  • grep ignores binaries by default as well
  • most repos don't contain 5GB binaries, nor nearly as many build artifacts as the Linux kernel

I'd like to refer back to the original issue, which talks about picking the best default for the "average user", not about removing support for ignoring based on .gitignore altogether. Overall, the ability to piggyback on .gitignore is tremendously helpful. However, I have been bitten by it myself and have seen others in the same situation, hence my _personal_ stance of "don't look at .gitignore by default". It's an anecdotal account and you can absolutely find more valid critiques against it, but I don't think that's gonna advance the overall discussion much more.

An argument from authority is not a technical argument about usage. And you can't make everyone who uses the tool happy. Here is a short catalogue for decision making:

My argument (2) is definitely a weak one, and you're right that it's impossible to make everyone happy (e.g. look at us having this very discussion!). I mentioned it because fd (ripgrep is arguably more guilty, but that's offtopic) talks about version control ignore files in a general sense, but in fact supports only git. From fd --help:

  • "<...>that would otherwise be ignored by '.*ignore' files"
  • the very name of --no-ignore-vcs flag

Again, this is only a quibble.

How should this be maintained and name clashes prevented? cfdisk, df, efi, rfkill are already used. Do you have specific names in mind?

I don't have any specific suggestions here. My aim is only to highlight that using shell scripts to invoke certain flags is something that already ships with popular Linux distros.

fd (ripgrep is arguably more guilty, but that's offtopic) talks about version control ignore files in a general sense, but in fact supports only git

I believe this is for forwards compatibility. So if at some future point the ignore crate adds support for additional VCS systems (such as mercurial), then the --no-ignore-vcs flag will just work, without having to add a new --no-ignore-hg flag and similar for each VCS system that is added.

* grep ignores binaries by default as well

* most repos don't contain 5GB binaries, nor nearly as many build artifacts as the Linux kernel

Not when you did not write a binary-data header or the like.

My argument (2) is definitely a weak one, and you're right that it's impossible to make everyone happy (e.g. look at us having this very discussion!). I mentioned it because fd (ripgrep is arguably more guilty, but that's offtopic) talks about version control ignore files in a general sense, but in fact supports only git. From fd --help:

* "<...>that would otherwise be ignored by '.*ignore' files"

* the very name of `--no-ignore-vcs` flag

Again, this is only a quibble.

True.
@sharkdp
Why is checking .*ignore files for files/folders to ignore not possible? The file paths need to be checked against .gitignore anyway, or is there a limit of 1 or 2?

I don't have any specific suggestions here. My aim is only to highlight that using ~aliases~ shell scripts to invoke certain flags is something that already ships with popular Linux distros.

I don't like them at all and prefer shell aliases.

If you do make this change, _please_ consider using separate flags for .gitignore, .fdignore, etc. I have run into valid use cases for (observe .fdignore, ignore .gitignore) and vice versa.

Examples:
-I/--ignore -- Ignore file patterns in .gitignore and .fdignore
-Ig/--ignore-gitignore -- Ignore file patterns in .gitignore
-If/--ignore-fdignore -- Ignore file patterns in .fdignore
-N/--no-ignore -- do Not ignore file patterns in .gitignore and .fdignore
-Ng/--no-ignore-gitignore -- do Not ignore file patterns in .gitignore
-Nf/--no-ignore-fdignore -- do Not ignore file patterns in .fdignore

Unfortunately these all use double-negatives, and there is a potential confusion about the double-meaning of "ignore .gitignore" (ignoring the .gitignore file and ignoring the files within it have opposite meaning). Other terms that may be less confusing:

  • E/N - --Exclude-from (exclude files listed in .{}ignore), do Not exclude
  • U/N - Use, do Not use

There is precedent for fine-grained ignore params in ripgrep:
(--no-ignore-dot, --no-ignore-global, --no-ignore-vcs, etc.)

[ If the above comment about supporting non-git repositories is implemented, then Ig might become Iv (for vcs) ]

I agree with having more granular control over ignore files. However, I'd rather use flags consistent with what ripgrep uses, both because of consistency and because I think they are a little less confusing.
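
To illustrate what granular control could look like with ripgrep-style flag names, here is a minimal Rust sketch. The `IgnoreSources` type and `apply_flag` are hypothetical and are not fd's or ripgrep's actual option parsing. The flag names are the ones mentioned above; mapping `--no-ignore-dot` to .fdignore is an assumption made for this example.

```rust
/// Which kinds of ignore files a search should respect.
#[derive(Debug, Clone, Copy, PartialEq)]
struct IgnoreSources {
    vcs: bool,    // VCS ignore files such as .gitignore
    dot: bool,    // tool-specific files such as .fdignore (assumed mapping)
    global: bool, // user-wide ignore files
}

impl IgnoreSources {
    /// Start by respecting every kind of ignore file.
    fn all() -> Self {
        IgnoreSources { vcs: true, dot: true, global: true }
    }

    /// Apply one command-line flag; unknown flags leave the settings unchanged.
    fn apply_flag(mut self, flag: &str) -> Self {
        match flag {
            "--no-ignore" => self = IgnoreSources { vcs: false, dot: false, global: false },
            "--no-ignore-vcs" => self.vcs = false,
            "--no-ignore-dot" => self.dot = false,
            "--no-ignore-global" => self.global = false,
            _ => {}
        }
        self
    }
}
```

Under this scheme the "observe .fdignore, ignore .gitignore" case is `IgnoreSources::all().apply_flag("--no-ignore-vcs")`, and each flag only ever switches something off, which avoids the double negatives.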

I think the point of having a tool like this is that it's opinionated. If I have to start adding flags to reach the default behavior/length of find, I might as well use find.

Like rg and the rest of the modern tools, what makes them great is their defaults. If the only advantage is a very minor speed bump, people would just use the preinstalled tools they already know.

The fact that it ignores hidden and gitignored files is in the main bullet point feature list. If one doesn't bother to read that...

Wow, thanks for pinning this issue. I manage my dotfiles by creating a git repo in ~ and then adding * to ~/.gitignore. I was _so_ confused at why I wasn't getting any results in my home directory.

I think we should follow the convention from grep and git grep:

  • fd does not exclude files from .gitignore
  • git fd excludes files from .gitignore

Workarounds: I need to make sure that any time fd is invoked (by any script!) it's called with --no-ignore-vcs. I'd love to be able to set this in a configuration file or environment variable.
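
For callers that are programs rather than shell sessions, one way to apply this workaround is to build every fd invocation through a single helper. A minimal Rust sketch, assuming fd is on the PATH; `fd_command` is a hypothetical helper name, and the only fd flag used is `--no-ignore-vcs`.

```rust
use std::process::Command;

/// Build (but do not run) an fd invocation that always shows Git-ignored files.
fn fd_command(pattern: &str) -> Command {
    let mut cmd = Command::new("fd");
    // Always disable VCS ignore handling, then pass the search pattern.
    cmd.arg("--no-ignore-vcs").arg(pattern);
    cmd
}
```

A caller would then run something like `fd_command("config").output()` and read the paths from stdout. This only covers code that goes through the helper, so third-party scripts that call fd directly are unaffected, which is why a configuration file or environment variable would still be useful.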

Wow, thanks for pinning this issue. I manage my dotfiles by creating a git repo in ~ and then adding * to ~/.gitignore. I was _so_ confused at why I wasn't getting any results in my home directory.

I think we should follow the convention from grep and git grep:

* `fd` does not exclude files from `.gitignore`

* `git fd` excludes files from `.gitignore`

Workarounds: I need to make sure that any time fd is invoked (by any script!) it's called with --no-ignore-vcs. I'd love to be able to set this in a configuration file or environment variable.

This is not a convention.
git grep is a subcommand of git while grep is a completely different program.

This is not a convention.

This is a convention -- the git-grep binary is provided by the Git project, but from the perspective of a user it's clear that git-grep takes into account .gitignore whereas grep does not. I'm suggesting that Git-specific usage should be provided under git-fd rather than fd. In summary:

  • foo -- should aim to be as general-purpose as practical
  • git-foo -- should aim to be as Git-specific as practical

git grep is a subcommand of git while grep is a completely different program.

Yes, I agree.

fd does not exclude files from .gitignore
git fd excludes files from .gitignore

I have a few issues with this:

  1. git grep works very differently than fd. While git grep runs grep on each file that is checked in (basically equivalent to git ls-files | xargs grep), fd does file traversal itself, in parallel, which is where you get a lot of the speed.
  2. git subcommands are generally implied to be operating on a single git repository. But I sometimes want to use fd (and rg) in a directory that contains multiple git repos and still respect .gitignore files.
  3. It means adding a new executable that needs to be installed, and two levels of indirection when calling it.

In short, I don't think it is any better of an option than making --no-ignore-vcs the default rather than --ignore-vcs.
