Ripgrep: Expand glob paths on Windows

Created on 13 Nov 2016 · 22 Comments · Source: BurntSushi/ripgrep

On Windows, glob expansion is not done by the shell (cmd.exe) but is left to the individual program.
That means that something like this currently doesn't work:

>rg PATTERN *.txt
*.txt: The filename, directory name, or volume label syntax is incorrect. (os error 123)
No files were searched, which means ripgrep probably applied a filter you didn't expect. Try running again with --debug.

It tries to open a file called *.txt, which obviously doesn't exist.

icebox question


What do other command line tools like grep do?

grep does work as expected for me.
I installed it with cygwin though, so I'm not sure if this is a cygwin or grep feature.

But all native Windows tools I know do the expansion themselves, at least if it makes sense to expand.

Usually you just use FindFirstFile/FindNextFile for this.
Not sure whether it uses exactly the same rules as Unix globbing.
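To make "use FindFirstFile/FindNextFile" concrete, here is a portable Rust sketch of the same idea. It does not call the Win32 API: it lists the directory with `std::fs::read_dir` and matches each name against a pattern that only understands `*` and `?`, applied to the last path component only. The names `wildcard_match` and `expand_leaf` are hypothetical, the ASCII case-insensitive comparison is an assumption about typical Windows behavior, and no attempt is made to reproduce FindFirstFile quirks such as also matching against 8.3 short names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Match `name` against a pattern that only understands `*` (any run of
/// characters, possibly empty) and `?` (exactly one character).
/// ASCII case-insensitive: an assumption about typical Windows behavior.
fn wildcard_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().map(|c| c.to_ascii_lowercase()).collect();
    let n: Vec<char> = name.chars().map(|c| c.to_ascii_lowercase()).collect();
    let (mut pi, mut ni) = (0, 0);
    // Pattern index of the last `*` and the name index it currently covers up to.
    let mut star: Option<(usize, usize)> = None;
    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((sp, sn)) = star {
            // Backtrack: let the last `*` swallow one more character.
            pi = sp + 1;
            ni = sn + 1;
            star = Some((sp, sn + 1));
        } else {
            return false;
        }
    }
    // Trailing `*`s can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Expand wildcards in the final path component only, roughly the scope of
/// FindFirstFile/FindNextFile. Matching entries are returned sorted.
fn expand_leaf(pattern: &str) -> io::Result<Vec<PathBuf>> {
    let path = Path::new(pattern);
    let leaf = match path.file_name().and_then(|s| s.to_str()) {
        Some(s) => s,
        None => return Ok(vec![path.to_path_buf()]),
    };
    // `Path::parent` yields an empty path for a bare file name: use ".".
    let dir = match path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let mut hits = Vec::new();
    for entry in fs::read_dir(dir)? {
        let name = entry?.file_name();
        // Non-UTF-8 names are skipped in this sketch.
        if let Some(name) = name.to_str() {
            if wildcard_match(leaf, name) {
                // Keep the caller's directory prefix as written.
                hits.push(path.with_file_name(name));
            }
        }
    }
    hits.sort();
    Ok(hits)
}
```

For example, `expand_leaf("src/*.rs")` would return the matching files in `src`. Wildcards in directory components (like `path/*/*.file`) are deliberately not handled, mirroring the leaf-only behavior of the Windows APIs.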

_sigh_ Indeed, it looks like globbing is done as part of the command line program: https://cygwin.com/ml/cygwin/2009-12/msg01097.html

Other instances of the same problem:

I think what this means is that I need to add a glob iterator to globset. Alternatively, we could use the existing iterator in the glob crate, but it doesn't support {a,b} syntax and gets some non-UTF-8 corner cases wrong.

I think what this means is that I need to add a glob iterator to globset. Alternatively, we could use the existing iterator in the glob crate, but it doesn't support {a,b} syntax and gets some non-UTF-8 corner cases wrong.

The other other alternative is to use the standard Windows APIs for this. It seems like the kinds of globs they support are not as nice as Unix-style globbing, but perhaps that's what Windows users expect, so it could be defensible to use that.

I don't anticipate working on this soon unfortunately.

I think using the Windows API is better than nothing, and probably easier to implement.
I might give it a try if I find the time.

@troplin Thanks! If you do, I would hope to see the Windows logic put behind a separate crate. :-) (Which could live inside ripgrep, or could be yours to maintain.)

Question: should ripgrep support Unix-style globbing here, or should it use the standard FindFirstFile/FindNextFile APIs, presumably to be consistent with other Windows CLI tools?

cc @retep998 @roblourens

Do the Windows APIs support anything besides * and ?? It's not clear.

This doesn't affect vscode scenarios, but as a CLI user, I'd prefer more powerful patterns.

How about -g <GLOB> using Unix-style globs (as it does already) and just <GLOB> falling back to Windows API?

Basically, there is already such a separation of behavior between ripgrep and shell globbing on Linux. And the Windows CLI is just an "inferior shell", one might argue. So it would be natural for (emulated) "shell globbing" to work in the shell-native way.
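The split proposed above can be sketched as a small routing step. This assumes the argument parser has already separated the search pattern and any `-g` values from the positional paths, so only paths reach `expand_paths` (a hypothetical name); `-g` globs would keep going to ripgrep's existing Unix-style matcher. The `expand` closure stands in for a Windows-style expander, such as a wrapper around FindFirstFile/FindNextFile.

```rust
/// Hypothetical routing step: positional *path* arguments that contain `*` or
/// `?` are handed to a Windows-style expander; everything else passes through.
/// `-g` globs are assumed to have been split off earlier by the argument parser.
fn expand_paths<F>(paths: &[&str], expand: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    let mut out = Vec::with_capacity(paths.len());
    for &arg in paths {
        if arg.contains(|c: char| c == '*' || c == '?') {
            let hits = expand(arg);
            if hits.is_empty() {
                // No match: keep the literal argument so the usual
                // "cannot open" error is still reported for it.
                out.push(arg.to_string());
            } else {
                out.extend(hits);
            }
        } else {
            out.push(arg.to_string());
        }
    }
    out
}
```

Keeping an unmatched pattern as-is is a design choice: ripgrep would then report an error for that argument instead of silently searching nothing.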

FYI:
cmd.exe does NOT support anything but * and ? (specifically NOT [a-z] character classes).
PowerShell does support character classes as part of "wildcards" (what it calls globbing).

The current -g switch in ripgrep seems to work fine with ?, * and [class].

It does NOT work without -g (which is by design, but a bit unexpected for a Windows-only user).

It might be worth emitting a Windows-specific warning when the last argument is *.txt or similar and there is nothing to search, instead of the current error:

*.txt: The filename, directory name, or volume label syntax is incorrect. (os error 123)

Maybe add,

use '-g FilePattern' for globbing (wildcard file patterns)
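A minimal sketch of the proposed message, assuming a hypothetical `open_error_message` helper that receives the raw path argument and the I/O error. It only checks whether the argument still contains `*` or `?`; a real patch would probably restrict the hint to Windows (for example with `cfg!(windows)`), which the sketch leaves out so it behaves the same everywhere.

```rust
use std::io;

/// Hypothetical helper: format the error for a path argument that could not be
/// opened, adding the hint proposed in this thread when the argument still
/// contains wildcard characters (i.e. nothing expanded it).
fn open_error_message(path_arg: &str, err: &io::Error) -> String {
    let mut msg = format!("{}: {}", path_arg, err);
    if path_arg.contains(|c: char| c == '*' || c == '?') {
        msg.push_str("\nhint: wildcards in paths are not expanded; ");
        msg.push_str("use '-g FilePattern' for globbing (wildcard file patterns)");
    }
    msg
}
```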

+1 for @HerbM's suggestion about mentioning the -g flag with the error message. This would have saved me time Googling how to grep against a file pattern, to eventually find this GitHub issue.

I've hit this issue before, but due to being in the middle of something, I've just moved over to searching in vscode instead of using RipGrep. I'm guessing that plenty of other Windows users have hit the same issue. Updating the error message would resolve this and make it immediately obvious how to do what the user was trying to do.

ps. Awesome tool btw! This is the only issue I've hit - other than that, it's amazing! Thanks for the great work!

@dracan Thanks for the feedback! Funny note though: VS Code's search uses ripgrep. :)

Definite +1 for yelling at the operator about -g, at the very least.

Counterpoint though:

findstr PATTERN *.txt

Functions as one would expect with glob-expansion ... as does ag.

I know Windows path expansion is a pain in the behind, but there's a case to be made for functional parity when even findstr manages it :D

Folks, +1 comments are not all that useful. Instead of basically just saying "me too", it would help if Windows users could answer questions I've asked. For example: https://github.com/BurntSushi/ripgrep/issues/234#issuecomment-362597070

So I suppose you didn't like my proposal from https://github.com/BurntSushi/ripgrep/issues/234#issuecomment-424727971 ?

I have two answers to my question so far. Both of them are different. Yours is one of them. More input from others would be great. In particular, I would love to hear how existing CLI utilities work. Do they use standard Unix globbing? Or shell-native globbing? And more importantly, do users actually like that?

I believe most utilities use FindFirstFile / FindNextFile. For sure the standard ones like findstr.
But also e.g. pcre2grep and git (see https://github.com/git/git/blob/master/compat/win32/dirent.c).

IMO "consistency" is more important than "liking". So I would vote for using shell native globbing with FindFirstFile / FindNextFile on Windows.
However, as long as a special -g parameter is considered it would be fine IMO to make it behave consistently across all OSes (i.e. use Unix globbing).

findstr has limitations - it supports file globbing but not directory. findstr /s PATTERN path/to/*.file works but findstr /s PATTERN path/*/*.file doesn't.

That would be an acceptable minimum but I would strongly prefer unix parity here so that tools that rely on ripgrep don't need to split code paths for windows.

Question: should ripgrep support Unix-style globbing here, or should it use the standard FindFirstFile/FindNextFile APIs

As a Windows user, IMO just * and ?. Reasons:

  • They are standard for Windows tools. It's what the MS C runtime does when building argv, so command line users will be used to it.
  • They are invalid filename characters, so you don't need to worry about escaping mechanisms (which is good, as backslash is unavailable due to it being the path separator).

I'd argue for globbing in all elements of the path, not just on the leaf element, and supporting ** meaning "one or more levels of subdirectory", but these are nice to have rather than essential, and are not standard for the CRT, so it's entirely justifiable not to offer them.

Unix-style globbing is powerful, and might be a bit more consistent cross-platform, but (a) you'd have to add an escaping mechanism (see above) and (b) not all Unix shells use exactly the same globbing syntax, so it's still not entirely portable. I don't think it's worth it, personally.

@pfmoore Thanks for the feedback! I think the two choices here are "support Windows-style globbing using the corresponding winapi calls" or "support Unix-style globbing." I don't think I'm willing to do some in-between state.

Also, note that ripgrep already supports Unix-style globbing on Windows with the -g/--glob flag, along with the various gitignore support. This is why the underlying glob library permits disabling backslash as an escape, which is indeed disabled on Windows by default. Moreover, globs already have an additional escaping mechanism built in. e.g., You can use [*] to refer to a literal *.

I have filed issue #1667 after browsing open issues which brought me here.

IMHO the best option would be for ripgrep to behave

  • Windows-like when running from CMD or PowerShell
  • UNIX-like when running from Git Bash (MINGW64) or CygWin
    including input and output path separators, not only globbing mechanisms.

So, this is no in-between state: it is just a matter of checking the type of environment (shell) and behaving accordingly.
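Detecting the environment could start from a heuristic like the one below. It assumes that Git Bash and MSYS2 shells export `MSYSTEM` (for example `MINGW64`), and it does not try to recognize Cygwin. `ShellFlavor` and `detect_flavor` are hypothetical names; the environment lookup is passed in as a closure so the logic can be exercised without touching the real process environment. In practice it would be called as `detect_flavor(|k| std::env::var(k).ok())`.

```rust
/// Which globbing/path conventions to follow (hypothetical).
#[derive(Debug, PartialEq)]
enum ShellFlavor {
    WindowsNative,
    UnixLike,
}

/// Heuristic (assumption): Git Bash and MSYS2 set MSYSTEM, e.g. "MINGW64".
/// Cygwin is not detected by this sketch.
fn detect_flavor<F>(get_env: F) -> ShellFlavor
where
    F: Fn(&str) -> Option<String>,
{
    match get_env("MSYSTEM") {
        Some(v) if !v.is_empty() => ShellFlavor::UnixLike,
        _ => ShellFlavor::WindowsNative,
    }
}
```

Two caveats: in Git Bash, bash normally expands unquoted globs itself before ripgrep starts, so the difference would mostly show up for quoted patterns and for path separators in output; and because environment variables are inherited, a cmd.exe started from Git Bash would likely be misclassified.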

There's no Rust for OS/2, but the situation there is the same: the normal OS/2 CMD (or 4OS2) and a Unix-like environment with Dash (instead of Bash).
