When possible, Pants will generate multiple smaller targets from one original target. Each generated target will have exactly one sources file.
For example, if a target owns 5 files, Pants would generate 5 new targets. All of the metadata gets copied from the original, except for the address and the sources field.
If a file has multiple owning targets, then we will never attempt to generate a subtarget because it is ambiguous where to copy the metadata from.
--changed-since: this is essentially sugar for file arguments. Same semantics as file arguments.Imagine you have the target :utils. strutil.py depends on dirutil.py. In the BUILD file, :utils does not include a dependency on dirutil.py because that is already in the sources field; Dependency inference is also disabled. Then, ./pants dependencies strutil.py will end up not including dirutil.py nor :utils, and we won鈥檛 include dirutil.py anywhere in the closure.
The only way to get the feature to not work is that you have multiple owning targets.
We could possibly add a global flag.
lint, fmt, and test due to special logic../pants dependencies foo.py will only show the deps of that file (if dep inference is enabled).files(), resources(), or python_requirement_library() targets, you will always have conventional targets../pants list foo.py. Now this gives you back the file name foo.py.--query, where it's unclear how we should handle generated subtargets.dependees https://github.com/pantsbuild/pants/issues/10354Thanks for opening! I'll take a look tomorrow morning.
cc @cosmicexplorer, I'm curious your thoughts as you've had to interact with this recently with your --query work.
- File addresses are deceptive.
- We output a file name for generated subtargets, but it's not really a file. It's a target that owns exactly one file and was generated from some other target.
- This nuance risks confusion; it hides key information about how Pants works, i.e. that Pants cares about the metatadata for that target, not only the source file.
There is a path forward for this issue described in the design doc by making the syntax non-ambiguous: ie, applying the @ to select the owner. This works both as a Spec input and as an Address output.
- We now must teach two concepts, rather than one.
- If you have any
files(),resources(), orpython_requirement_library()targets, you will always have conventional targets.
- Because it's so common to use those target types, we will still need to teach conventional targets.
This doesn't feel like a second concept... afaict those target types would work the same as any others. Inference is never going to choose to depend on just one file from one of these targets (if it ever figured out how to infer a dependency on those files), but if a human did, it would be perfectly fine to depend on just one file explicitly.
- A user's experience varies substantially depending on if they use file arguments or address arguments, and dep inference or not.
- This has a risk for confusion why something works one way, but not another.
- The tool feels less unified and less consistent.
- See https://github.com/pantsbuild/pants/pull/10420#issue-454768171 for a concrete issue we had in Toolchain because of this. If we use file args, Pants works; otherwise, it errors.
As mentioned on that ticket, this general class of issues feels like it could be fixed by adjusting the generation of file targets. The generation of both the base target and the file targets needs to be deterministic and consistent, regardless of which of those a caller is depending on.
With regard to the difference between ./pants test :: and ./pants test '**/*': this was acknowledged during the design session, with consensus that it was reasonable for those to have different effects. AFAICT, it continues to be simplest to treat "address Specs" as expanding to targets, and "file Specs" as expanding to files.
- No mechanism to determine the owner of a file.
- Before, you would use
./pants list foo.py. Now this gives you back the file namefoo.py.
- It's not clear what metadata is being applied for a file, and where to go to edit that metadata.
- Substantially increases our code complexity and maintenance burden.
- This is particularly tricky code to reason about, particularly because there are so many edge cases.
- Example: this has interfered with adding
--query, where it's unclear how we should handle generated subtargets.
- Example: confusing how to integrate this with
dependeeshttps://github.com/pantsbuild/pants/issues/10354
Both of these issues can be resolved by a model change that I have suggested: both files and targets should be nodes in the dependency graph, and the graph introspection goals should gain filtering options (or we should do that via query). The effect of both files and targets being nodes in the graph would be that:
dependees --direct --targets (ie, with a filter to render only targets).dependees would automatically work for either files or targets (or both, with no filter specified).filedeps would merge with dependencies via an identical filtering option on output.query (assuming it lands) would be able to operate on both files and targets.In my mind, this model change falls into the camp of "Pros"... right now our graph introspection goals have gaps and are bifurcated due to the fact that they each operate on either files or targets, but not both. It would also make file deps inherent: ie, a target always depends on its files, regardless of whether inference is used.
I'll spend some time this morning looking concretely at what API changes would be involved in the model change.
These are my notes based on a convo I just had with Eric.
Stepping back, I'd like to distinguish two concepts, for extra clarity:
A) File-level dependencies.
B) File-level subtargets.
A) is a feature - it is what we want to achieve. B) is "merely" an implementation decision - it was our chosen way to implement A). We chose B) because it would allow downstream rules to work as usual, since they already operate on targets.
However it appears that the implementation is over-complex, and that complexity is leaking out in user-facing ways. "Downstream rules can work as usual" was too naive an assumption.
So, I'd like to propose an alternative implementation of A). One that does require downstream rules to change, but is conceptually simpler, and requires a lot fewer changes elsewhere.
Basically, instead of the basic 'currency' of operation being "list of targets", it would be "list of (source_path, target) pairs".
This would require a bunch of changes in various rules, yes. But it is easy to go from "list of targets" to "list of (source_path, target) pairs" by enumerating the source paths in the target. (*) And it's obviously even easier to go the other way.
So, for example, when a rule needs the transitive dep closure, it requests it for a list of (source_path, target) pairs and gets back a bigger list of (source_path, target) pairs. This is where file-level deps (and dep inference) could kick in.
Basically, if we're saying that "a target is metadata attached to a file" then we probably need to model that fact explicitly in the rules that care, rather than trying to emulate that via these up-front generated subtargets.
I haven't thought through this enough to get a sense of the problems that might ensue, but wanted to see what y'all thought.
(*) Eric mentioned that an intrinsic to evaluate globs without snapshotting the contents would make this more performant.
I haven't thought through this enough to get a sense of the problems that might ensue, but wanted to see what y'all thought.
This is nearly how things work at HEAD: Address now has an optional field that indicates which "base target" it is associated with. So in practice, Address today acts roughly like a union of:
Union[metadata_source, Tuple[File, metadata_source]]
But yes, @benjyw : I think that what you and I are suggesting is fairly similar. The effect of the model change I mentioned would be that Target was roughly:
Target[metadata_source, Optional[File]]
... ie, that every target would own exactly one or zero files, and that when we generated "subtargets", the effect would be that the "base" target owned no files, and instead just depended on its subtargets/file-targets.
The difference between these two models is that I think that you still want to have "named/base target" nodes in the graph (even if they don't "own" files). They exist, people can depend on them (and probably need to in the case of resources), and graph introspection goals should optionally show them.
EDIT: My types were not aligning. The change I'm suggesting has more to do with Targets than addresses, afaict.
Glad we're converging on something here. My main point is that we should never generate a Target. Those should only represent actual BUILD file entities. In my (source file, target) pair scheme, what I really mean is (source file, metadata), where metadata is represented by a target, or - possibly - by some Metadata type derived from a target (even if all that Metadata type is is a copy of the target's fields).
And for example if there is no target at all owning a file that we inferred a dep on, we can just stick in some empty/default metadata. I think.
I think Benjy's proposal is great. It's an extension and improvement on the idea of AddressWithOrigin. There, we are able to use that WithOrigin part to sometimes do things more precisely, like with test and fmt/lint. Here, we can have something like AddressWithSource, which is a pair of (Address, Optional[str]).
If you use an address arg or traditional dependency, then you will depend on every single AddressWithSource that belongs to the target; if you use dependency inference or file arguments, then we can only use the AddressWithSources that are relevant.
Given an AddressWithSource, it's easy to go to Address. This means that some rules that don't care about files can continue to take Addresses as their input, e.g. list.
This allows us to differentiate clearly between what is truly a target vs. what is a file. Currently, ./pants test outputs "Running on target foo.py" for generated subtargets, but "Running on target :foo" if dealing with a normal address. This is a leak to the end user. Instead, we would be able to always choose to output the file name or the address name, like "Running on file foo.py" or "Running on target :foo". Regardless of if the user used file args vs. address args, we have access to both the address and the file. There is no weird mixing of the two ideas.
--
This is very feasible to implement, and we know that because of the implementation of AddressWithOrigin.
Also, I think we could reimplement rules to use this concept with no behavior changes at all relative to the "classic" behavior (pre dep inference). Then introducing file-level dep inference on-top-of/after reworking the rules should be straightforward.
Again, I think...
Also, I think we could reimplement rules to use this concept with no behavior changes at all relative to the "classic" behavior (pre dep inference). Then introducing file-level dep inference on-top-of/after reworking the rules should be straightforward.
Yes, I think so. I think I have a pretty good idea of how to implement this.
Cool... I think that at some level these changes are pretty similar.
This allows us to differentiate clearly between what is truly a target vs. what is a file. Currently, ./pants test outputs "Running on target foo.py" for generated subtargets, but "Running on target :foo" if dealing with a normal address. This is a leak to the end user. Instead, we would be able to always choose to output the file name or the address name, like "Running on file foo.py" or "Running on target :foo". Regardless of if the user used file args vs. address args, we have access to both the address and the file. There is no weird mixing of the two ideas.
AFAICT, it would already be possible to render things this way at HEAD. An Address with a base_target_name is a "file", always, and an address without one is a "target", always. It's possible that AddressWithOrigin is a more typesafe way to represent this difference though, because if you want the owning target, you request an Address, but otherwise you request an AddressWithOrigin?
I'll maintain though that I think changes to Target would further simplify this.
It's possible that AddressWithOrigin is a more typesafe way to represent this difference though, because if you want the owning target, you request an Address, but otherwise you request an AddressWithOrigin?
Yeah, I think so. We could figure out how to get this, but it requires plugin authors being fluent with the property .generated_base_target_name. This is a much more natural way to get the original benefits that prompted generated subtargets.
Also, nit: s/AddressWithOrigin/AddressWithSource. I'm not sure what should happen to WithOrigin. We only use it right now to support precise file arguments, and that is now superseded by WithSource.
Are we happier with this now? Would like to close if we think @stuhood's work has helped allay concerns.
Not all resolved, but we decided to lean more into the experiment to see if that will address the concerns. It's possible that we still end up reverting file dependencies. If we decide that, we can re-open this as relevant.