In https://github.com/pantsbuild/pants/issues/7715, we decided to inject __init__.py files for every module.
Turns out, even if the repo already had an __init__.py file created, like src/util/__init__.py, then we still will inject our own, rather than using the original. This is because the file is never specified in the BUILD for that target.
The downside to this approach is that we do not support custom __init__.pys, i.e. those with content in them.
Instead, should we somehow implicitly link a folder's Python targets with that folder's __init__.py? I don't think we want to start making this an explicit dependency because of how much noise there would be.
If a target declares an __init__.py and captures it in its sources, then it will be included. And if you follow 1:1:1 and have one target per directory, the "default" sources globs will capture that file as well. See: https://www.pantsbuild.org/build_files.html#sources
If a target declares an __init__.py and captures it in its sources, then it will be included. And if you follow 1:1:1 and have one target per directory, the "default" sources globs will capture that file as well.
Agreed that this is currently how things work. My question is if we should change this, so that we start to implicitly associate each file in a directory with that directory's __init__.py? In general, horrible idea, but __init__.py is a very special type of file whereby I think this implicit dependency could be worth considering. Or, at least, update our documentation to mention that we do not special case __init__.py and custom files must be depended upon in the corresponding BUILD.
I believe this issue will be key to getting contrib unit tests to work with V2, specifically due to namespace packages.
Namespace packages are Python packages where the source files aren't necessarily all in the same folder. pants and pants_test are namespace packages because we can call pants.contrib.mypy and pants.base, even though the root pants folders are in different places. We achieve this via setuptools:
This line declares the __name__ (i.e. pants) to be a namespace package. Any other __init__.py with the same line and same __name__ will be added into the namespace package at import time.
--
Currently, if we want to ensure that every single pants/python/src file depends on this root src/python/pants/__init__.py file, we would need to explicitly mark that in every BUILD entry! Not a very elegant or feasible idea.
Instead, I propose that V2 will search for each source file if it has __init__.py, including a recursive search through its parents. So, using src/python/pants/util/strutil.py in a V2 test would result in the generated PEX also including src/python/pants/__init__.py and src/python/pants/util/__init__.py. This is how Python normally works, that it implicitly imports __init__.py.
I think it might make sense to make a rust intrinsic for this particular filesystem traversal you've described (ascending directories to look for a filename matching some pattern). I think that since many, many languages have the idea of "namespace packages" (see: node.js), it wouldn't really be too much "special-case" logic (perhaps it can share code with AscendantAddresses?), which makes me more comfortable about introducing an intrinsic. This could enable something like:
class NamespacePackage(datatype([('pkg_name', optional_string)])):
pass
# The engine provides an intrinsic from NamespacePackageRequest => NamespacePackage.
@rule(PythonNamespacePackage, [PythonSourceFile])
def get_namespace_package(python_source_file):
rel_path_from_buildroot = python_source_file.path
maybe_namespace_package = yield Get(NamespacePackage, NamespacePackageRequest(rel_path_from_buildroot, "__init__.py"))
yield PythonNamespacePackage(maybe_namespace_package)
From thinking about how python and node.js might use this feature, I think that extracting this particular filesystem traversal described in https://github.com/pantsbuild/pants/issues/8082#issuecomment-522261334 into a rust intrinsic might be a really useful and terse decoupling.
I've been trying to land no longer using __init__.py so that we instead use implicit namespace packages (PEP 420). Closing this.
If any community members end up wanting this feature, then we can open it back up.
I'll note that this is not a featue, it's a bug fix. I suggest leaving this open since it's not a bug about Pants itself, but Pants support for python. It just so happens Pants dogfooding itself is safe here.
Other new and existing python Pants users will not be so lucky. It's true we suggest 1:1:1 and all of that. Until we outlaw non 1:1:1 though, this is a bug in pur python support. We allow multiple targets in directory that is a package and we support non-empty __init__.py, and, for now, we support python 2.7.
The v1 solution is to inject setuptools style ns packages in generated PEXes since these are forward compatible with implicit namespace packages:
https://github.com/pantsbuild/pants/blob/93047bdaeff92cc8b1e31e4b1754469fdd7b6cd3/src/python/pants/backend/python/subsystems/pex_build_util.py#L321-L327
https://github.com/pantsbuild/pants/blob/93047bdaeff92cc8b1e31e4b1754469fdd7b6cd3/src/python/pants/backend/python/subsystems/pex_build_util.py#L306-L316
https://github.com/pantsbuild/pants/blob/93047bdaeff92cc8b1e31e4b1754469fdd7b6cd3/src/python/pants/backend/python/subsystems/pex_build_util.py#L82-L95
That's not a fully satisfying solution, but it has proven robust (the old solution had all this occurring inside PEX itself). I don't see any reason the same solution wouldn't work just fine in v2.
Bumping this, as I came across it today.
I think the right way to handle this is via dep inference. Basically, there is always an implied dependency of any code on all the __init__.py files above it up to the source root, even if there are no imports from any of those files.
@benjyw : Automatically using parent __init__.py files might be a reasonable way to avoid boilerplate here. But it also runs into a few issues: if this happens via "inferring dependencies", then the assumptions are that 1) there are parent targets, 2) they own __init__.py files. If there aren't, or they don't, then it fails, and people then need to go and create those files and targets. And unless we are implicitly creating targets per __init__.py file independent of other stuff in the module, it also seems very likely to cause dependency cycles (which we might want to allow for in user code, but would not want to trigger eagerly by ourselves).
@jsirois's comment is helpful for cleanly handling the case where __init__.py files do _not_ already exist. But that's half of the problem here.
The alternative that @cosmicexplorer and @Eric-Arellano mentioned of not using target _dependencies_, but instead slurping in parent __init__.py files via filesystem patterns (ie, every target below some __init__.py would end up "owning" it via globs) would avoid those issues. But it would be weird for targets to have implicit (shared) ownership of any __init__.py files between them and their source root: for one thing, our owners-detection code currently assumes that targets only own files located at-or-below themselves in the directory structure.
Either implicit targets or glob-based capturing would naturally report targets being merged with mismatched package declarations via MergeDirectories failing with a summary of the mismatched file content... so that's nice.
So it seems like the question is: do we want to get into the business of implicitly creating targets for __init__.py files (and in general)? It goes beyond dependency inference, and I don't think that we've really considered the implications (beyond previously having flirted with "targets that don't exist on disk" via v1 synthetic targets).
This just brings me back to "targets" not being the vital concept we treat it as... Targets don't exist in the user's conception of their codebase. We force them on the user, because supposedly they help. But in this case they appear to hinder.
The right thing to do is (I _think_) fairly obvious: Those __init__.py files need to be in any pexes/chroots we create. That is (I _think_, could be wrong) what the user will expect. If the target concept is getting in the way of that than the problem is with the concept, not the functionality.
Targets don't exist in the user's conception of their codebase. We force them on the user, because supposedly they help. But in this case they appear to hinder.
The issue is just that it is impossible for targets to go away completely, because there will ~always be configuration that needs to be attached in various contexts (setup-py config, python version, un-inferrable dependencies, etc).
So it's less a question of whether we "can" implicitly create targets (we can), and more a question of how to deal with a mix of implicitly created targets and targets that actually live on disk. To some degree, people are already familiar with targets that only appear in list, due to macros and synthetic targets (as used in Go buildgen). But even macros are much less magical then this... you can learn that the macro means N targets are going to be created in a directory. And how do you handle the case where some target already owns the source, for example?
On the flip side, "implicit targets" might be a more principled way to resolve #7022 as well... essentially, a set of implicit targets created for requirements.txt files. But thinking about that example makes it clearer (imo) that this would need to be tackled very carefully.
There will always be configuration, perhaps attached to files or directories or globs of files, but we may not have to impose the target concept on end users (e.g., we can imagine getting rid of target specs entirely), and we may not have to create targets implicitly in order to consume source files.