Literal file sets end up parsed as globs, so the following are currently equivalent:
sources=globs('*.scala')
sources=['*.scala']
Parsing literal filesets as globs is not how things used to work, but it raises an interesting question: given that glob syntax supports escapes, is there any advantage to having special syntax for globs anymore?
We "mostly" support a subset of the .gitignore syntax, although as mentioned below we would need to add support for excludes via !, i.e.:
sources=['*.scala', '!ButNotThisOne.scala'],
We do differ from the spec in (at least) one important way though: there is an implicit / on the front of all of our globs, which means that they are not recursive unless a ** is prepended.
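To make the anchoring difference concrete, here is a minimal sketch of how an implicitly anchored glob could be matched. The helper names glob_to_regex and matches are hypothetical and this is not the engine's actual implementation; it only handles *, ? and a leading or embedded **/ and ignores character classes and escapes.

```python
import re


def glob_to_regex(pattern):
    """Translate a simplified glob into a regex anchored at the BUILD file's directory.

    Assumption: this mimics the "implicit leading /" behaviour described above,
    so a pattern never matches in subdirectories unless it contains '**/'.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")  # never crosses a directory separator
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # exactly one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern, path):
    """Return True if the whole relative path matches the anchored pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None
```

With this sketch, matches('*.scala', 'sub/A.scala') is False while matches('**/*.scala', 'sub/A.scala') is True, whereas plain .gitignore semantics would let a slash-free pattern like *.scala match at any depth.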
is there any advantage to having special syntax for globs anymore?
seems like glob form is unique enough that its syntax alone is enough to differentiate vs filenames (e.g. you typically won't find files with * ? or [xyz] literals in their filenames outside of odd cases) - so I'd say killing/obviating the globs symbol seems reasonable and like a great simplification to me.
one possible mitigation for any future odd cases of glob syntax in literal filenames might be to invert the existing form - i.e. provide a literal_files('my_[wacky]_nodejs_file.?', '??') etc.
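A rough sketch of what such an inverted form might look like. literal_files here is purely hypothetical (it is the name floated above, not an existing symbol), and it leans on Python's glob.escape, which escapes metacharacters by wrapping them in brackets rather than using gitignore-style backslashes.

```python
import fnmatch
import glob


def literal_files(*names):
    """Hypothetical helper: escape glob metacharacters so each name matches only itself."""
    return [glob.escape(name) for name in names]


patterns = literal_files("my_[wacky]_nodejs_file.?", "??")

# The escaped pattern matches the literal filename...
assert fnmatch.fnmatchcase("my_[wacky]_nodejs_file.?", patterns[0])
# ...but no longer matches names the unescaped glob would have picked up.
assert not fnmatch.fnmatchcase("my_w_nodejs_file.x", patterns[0])
assert fnmatch.fnmatchcase("my_w_nodejs_file.x", "my_[wacky]_nodejs_file.?")
```

The upside of this shape is that the matching code only ever sees globs; the literal form is just sugar for escaping.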
I presume that in blazel, the literal files syntax started as a way to avoid listing a directory in order to build a target in a directory. But since scandir-alikes are more and more common, it does seem like a literal-files syntax would be an outlier that you'd use only in very large directories...?
How do you specify excludes?
Excellent question!
As it stands, they would not be supported. But there is a fairly easy way to include them, which would be to support exactly the .gitignore syntax for this: !something would be an inverted pattern.
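For illustration, a minimal sketch of the inverted-pattern idea, assuming flat filenames and using Python's fnmatch as a stand-in for the real matcher. split_patterns and select are hypothetical names, and unlike real .gitignore handling this sketch is order-independent (an exclude always wins).

```python
import fnmatch


def split_patterns(patterns):
    """Partition patterns into includes and '!'-prefixed excludes (prefix stripped)."""
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    return includes, excludes


def select(files, patterns):
    """Keep files matching any include pattern and no exclude pattern."""
    includes, excludes = split_patterns(patterns)
    return [
        f
        for f in files
        if any(fnmatch.fnmatchcase(f, p) for p in includes)
        and not any(fnmatch.fnmatchcase(f, p) for p in excludes)
    ]


selected = select(
    ["A.scala", "ButNotThisOne.scala", "README.md"],
    ["*.scala", "!ButNotThisOne.scala"],
)
```

Here selected contains only A.scala.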
I'm not convinced that that is more clear to read than:
globs(["*.foo"], excludes = ["moo.foo"])
but I guess if you're coming from a place where you know git and don't know pants, the git syntax may be more intuitive...
we could just leave the explicit globs symbol for excludes definition, but just not encourage its use outside of that or other one-off cases?
Multiple ways of doing things make me sad :( But an option, for sure :)
or possibly have symbols in the list form that incur special handling (essentially, a type-wrapped form of the gitignore syntax):
sources=['*.py*', excluded('*.pyc')]
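A small sketch of how that type-wrapped form could reduce to the gitignore-style strings. Both the excluded class and to_gitignore_style are hypothetical and only show that the two spellings can carry the same information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class excluded:
    """Hypothetical wrapper marking a pattern as an exclude."""

    pattern: str


def to_gitignore_style(sources):
    """Rewrite excluded(...) entries as '!'-prefixed strings; pass other entries through."""
    return [f"!{s.pattern}" if isinstance(s, excluded) else s for s in sources]


normalized = to_gitignore_style(["*.py*", excluded("*.pyc")])
```

normalized ends up as ['*.py*', '!*.pyc'], so the wrapper would be a readability choice rather than a separate mechanism.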
Today I wanted to combine a globs() with an rglobs(), and couldn't figure out an easy way to do it (although @jsirois just reminded me on slack I could use zglobs()). I think it would make everything much more natural to kill globs() and rglobs() and only allow zglobs() so specifying files is exactly like on a zsh command line (natural for many people, even non zsh users as it is used elsewhere). It would also remove the (in my opinion) weird distinction between globs() and rglobs() that exists now.
I don't really like the idea of parsing sources=[...] as a glob automatically -- when I look at a BUILD file now, I know seeing sources=[...] is definitely going to refer to a specific list of files, and I would like to not have to check each element to see if there are any stars. I also _really_ like the idea of checking whether all the files specified in a literal sources= list exist (see the TODO in wrapped_globs.py)
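As a sketch of the existence check, assuming an entry counts as "literal" when it contains none of the characters *, ? or [. The name missing_literal_sources is hypothetical and this is not what the TODO in wrapped_globs.py specifies, just one way the check could look.

```python
import os
import tempfile

GLOB_CHARS = set("*?[")


def missing_literal_sources(build_dir, sources):
    """Return the literal (glob-free) entries that do not exist under build_dir."""
    literals = [s for s in sources if not GLOB_CHARS & set(s)]
    return [s for s in literals if not os.path.isfile(os.path.join(build_dir, s))]


with tempfile.TemporaryDirectory() as build_dir:
    open(os.path.join(build_dir, "a.py"), "w").close()
    # b.py is reported as missing; *.txt is skipped because it is a glob.
    missing = missing_literal_sources(build_dir, ["a.py", "b.py", "*.txt"])
```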
I also think generally that mixing globs and literal source files in a sources= argument is a smell, and would prefer to _encourage_ instead making separate targets if you need to select by glob or by specific file paths. An exclude component really only makes sense for globs -- otherwise, you could just remove the excludes from your literal file list. For those reasons I feel like turning literal strings into globs is not a natural or expected interpretation of sources= unless we explicitly use a *globs() verb (which we could cut down to just zglobs()).
I also think generally that mixing globs and literal source files in a sources= argument is a smell, and would prefer to encourage instead making separate targets if you need to select by glob or by specific file paths.
If the target(s) are in one directory, it's a smell regardless of whether it is one target or two. See 1:1:1
Today I wanted to combine a globs() with an rglobs()
Due to 1:1:1, that might also be a smell. But because the v2 glob syntax (mostly) implements globs as defined in the gitignore spec, the way to do that would be to do: ['*.txt', '**/otherstuff.txt'].
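To show how such a mixed list would resolve, here is a sketch using pathlib's Path.glob as a stand-in for the v2 glob engine (an assumption; the real matcher may differ in edge cases). The expand helper and the file layout are made up for the example.

```python
import tempfile
from pathlib import Path


def expand(root, patterns):
    """Union the matches of every pattern, as sorted POSIX paths relative to root."""
    matched = set()
    for pattern in patterns:
        matched.update(
            p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
        )
    return sorted(matched)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deep").mkdir(parents=True)
    for rel in ["top.txt", "sub/otherstuff.txt", "sub/skipped.txt", "sub/deep/otherstuff.txt"]:
        (root / rel).touch()
    result = expand(root, ["*.txt", "**/otherstuff.txt"])
```

The non-recursive '*.txt' picks up only top.txt, while '**/otherstuff.txt' finds the file at every depth; sub/skipped.txt is matched by neither.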
I also really like the idea of checking whether all the files specified in a literal sources= list exist (see the TODO in wrapped_globs.py)
Agreed: opened #5430 for that.
Bump on this thread given the developments with filesystem specs (https://docs.google.com/document/d/15xphZcFnowytF0Qu2sO1PG7wHAWoyLof7hK0jVa5T8E/edit#) and https://github.com/pantsbuild/pants/pull/8985.
Filesystem specs allow for globs to be expressed the same way as the sources field, e.g. ./pants cloc2 'src/python/pants/**/*.py' (note the quotes). Filesystem specs _do not_ allow globs(), rglobs(), and zglobs(), understandably.
We now have an extra motivation to deprecate globs et al. Beyond fixing the problem of multiple ways to declare sources, we'd be unifying filesystem specs with the sources field.
Some possible steps to get here:
1) [x] Add support for excludes via !
2) [x] Improve glob match warning. Possible improvements:
* [x] Remove the ignore option. This is not a safe choice for users, especially with FS specs.
* [x] Refer to the source of the failure, e.g. FS spec vs. BUILD file -- which BUILD file?
* [x] Improve the wording, e.g. don't show excludes if there are no excludes
3) [x] Update documentation
4) [x] Deprecate globs, rglobs, and zglobs.
Thoughts? cc @gshuflin @benjyw @jsirois
We now have an extra motivation to deprecate globs et al. Beyond fixing the problem of multiple ways to declare sources, we'd be unifying filesystem specs with the sources field.
Fully in favor. The other important unification to acknowledge here is the unification with git's (and gitignore's) syntax.
Also fully in favor.
SGTM