Bazel is creating __init__.py files automatically. This is an issue in several situations (see https://github.com/bazelbuild/rules_python/issues/55).
This behavior can be suppressed using:
https://docs.bazel.build/versions/master/be/python.html#py_test.legacy_create_init
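On a single target that looks roughly like this (a minimal sketch with made-up names; depending on the Bazel version the attribute is a boolean or a tristate integer):

# BUILD (illustrative)
py_test(
    name = "lib_test",
    srcs = ["lib_test.py"],
    # Ask Bazel not to synthesize empty __init__.py files for this target.
    legacy_create_init = False,
)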
But if you have a large code base with a lot of Python rules, this is not a good solution: the flag would need to be set on every single rule whenever a folder in the root of the workspace has the same name as a Python package.
When this happens, it is not easy to figure out what is going on. For modern Python developers, __init__.py files are source files, not generated files, so it is not intuitive that the error could be caused by init files that are generated automatically.
As a first step: there should be an option to control the behavior of legacy_create_init globally.
As a second step: the default should be changed so that __init__.py files are not generated.
The same as https://github.com/bazelbuild/rules_python/issues/55, or any setup where the root of the workspace contains a folder named like a built-in Python module (time, logging, platform, queue, etc.).
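For illustration, a made-up layout that triggers the clash:

workspace/
    logging/          <--- unrelated to the standard-library logging module
        BUILD
        handlers.py

Because Bazel drops an auto-generated logging/__init__.py into the runfiles, and the runfiles root is on sys.path, import logging inside any py_binary picks up this empty package instead of the standard library and typically fails later with an AttributeError.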
Ubuntu 18.04
bazel info release: release 0.22.0
The end state shouldn't be a global option but rather removing the legacy create init behavior altogether. This might look like the following.
1. Go through an incompatible change migration to set the default value of legacy_create_init to false.
2. Go through another incompatible change migration to force the value to false for everyone, effectively making the attribute a no-op.
3. Go through a final incompatible change migration to delete the now-redundant attribute.
The first step of the above is implemented in @Faqa's PR #9271, which is in the process of being merged. It introduces an incompatible change flag tracked by #10076.
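Once that flag is available, opting in early should just be a matter of a .bazelrc entry (sketch; the flag name is the one tracked by #10076):

# .bazelrc
build --incompatible_default_to_explicit_init_py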
Steps 2 and 3 above can come after that. They can actually be merged into a single step. But that'll come down the line, since I don't anticipate flipping the first flag for a while. (We need to see how difficult the migration is, and if it requires new tooling, that'll be hard to prioritize.)
We can use this bug to track the overall deprecation effort.
A concern came up in internal review of #9271: How will a user know that a failure at execution time (from missing __init__.py) is due to flipping this flag?
We already have some logic in the stub script to give users a more informative error message when a failure may be due to a different incompatible change. We can adopt a similar strategy for this change, so that when the incompatible flag is enabled, the exit code is 1, and the stderr matches ^(ModuleNotFoundError|ImportError):, we print an additional message linking to the incompatible change docs. (The second incompatible change, to remove legacy_create_init altogether, won't need any special diagnostic because it'll be a straightforward analysis-time error.)
The downside of this approach is that it means Bazel injects stderr spam into failing py_binarys. But since it would only trigger when legacy_create_init is set to the default value, auto, the user could opt out of the spam on a per-target basis by explicitly setting this attr to true or false -- which works until the second incompatible change to delete the attr. We'd remove the special message from the stub script once the first incompatible flag is flipped.
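A minimal sketch of what that diagnostic could look like, with made-up message text and helper names (the real stub script is structured differently; this only illustrates the detection logic):

import re
import subprocess
import sys

INIT_PY_HINT = (
    "NOTE: this failure may be caused by the incompatible change that stops "
    "generating __init__.py files automatically; see the incompatible change docs.\n"
)

def run_and_diagnose(args):
    # Run the user's program, mirroring its stderr.
    proc = subprocess.run(args, stderr=subprocess.PIPE)
    stderr = proc.stderr.decode("utf-8", errors="replace")
    sys.stderr.write(stderr)
    # Only print the hint for the exact signature described above:
    # exit code 1 and an import-related error at the start of a stderr line.
    if proc.returncode == 1 and re.search(
            r"^(ModuleNotFoundError|ImportError):", stderr, re.MULTILINE):
        sys.stderr.write(INIT_PY_HINT)
    return proc.returncode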
@brandjon Curious, are there some existing best practices for writing BUILD files when automatic __init__.py file generation is switched off?
Suppose that I have the following setup:
foo/
    __init__.py
    BUILD
    lib/
        __init__.py
        lib.py
        BUILD
    bin/
        __init__.py
        bin.py
        BUILD
With automatic __init__.py file generation, there need not be checked-in __init__.py files, and the BUILD files could look like this:
# foo/lib/BUILD
py_library(
    name = "lib",
    srcs = ["lib.py"],
)

# foo/bin/BUILD
py_library(
    name = "bin",
    srcs = ["bin.py"],
    deps = [
        "//foo/lib",
    ],
)
Without automatic __init__.py file generation, I guess I'd have to set up something like this:
# foo/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)

# foo/lib/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)

py_library(
    name = "lib",
    srcs = ["lib.py"],
    deps = [":init"],
)

# foo/bin/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)

py_library(
    name = "bin",
    srcs = ["bin.py"],
    deps = [
        ":init",
        "//foo/lib",
        "//foo:init",
    ],
)
This is unfortunately pretty verbose, but somewhat doable for the most part, I guess. To me, the most annoying thing is that we have to explicitly add dependencies on the top-level //foo:init target in order to have foo/__init__.py show up in runfiles and make import foo.<something> work at runtime. It does not seem to scale well as the repository grows: if I have a target //a/b/c/d:lib, it would need to depend on all parent __init__.py targets, like //a:init, //a/b:init, and //a/b/c:init. Encouraging a pattern where child libs mechanically depend on their parent packages could also be seen as problematic - usually it is good for unit test cacheability if leaf directories are independent.
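Concretely, the deep-nesting case would end up looking something like this (hypothetical paths, assuming every parent package exposes an init target as above):

# a/b/c/d/BUILD (hypothetical)
py_library(
    name = "lib",
    srcs = ["lib.py"],
    deps = [
        ":init",         # a/b/c/d/__init__.py
        "//a:init",      # a/__init__.py
        "//a/b:init",    # a/b/__init__.py
        "//a/b/c:init",  # a/b/c/__init__.py
    ],
)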
Or maybe I'm missing something - are there existing bazel code bases that have switched automatic __init__.py generation off?
There's no recommended best practice for what dependency structure to use for __init__.py files. You could have it be included in the srcs of multiple py_librarys in the same package, or factor it into its own target and add it to deps. For parent init files you would have to depend on the parent packages.
I will take the liberty to answer: I would disagree with having the __init__.py files in separate dependencies.
In this example, as you said, with automatic __init__.py generation it would look something like this; the issue I have with automatic __init__.py files is that several of them are not needed:
foo/
    __init__.py  <--- not needed
    BUILD
    lib/
        __init__.py
        lib.py
        BUILD
    bin/
        __init__.py  <--- not needed
        bin.py
        BUILD
An __init__.py file indicates that a directory is a package; for libraries, it should be part of the library itself. I would not recommend removing it from the srcs of the library and putting it in a separate target.
# foo/lib/BUILD
py_library(
    name = "lib",
    srcs = [
        "__init__.py",  # part of the library itself: it marks the directory as a package
        "lib.py",
    ],
    imports = ["."],
)
The imports "." is needed to import from this library, I think that this is unfortunate but probably relates to be able to import always from the root of the workspace. If it would be a big repository with a lot of different languages mix I don't think that scales always starting your import from the workspace.
Here you can see the example that you put how is best to my eyes but it will depend a lot from the project and this is just my personal opinion.
https://github.com/limdor/bazel-examples/tree/master/python
I would love to extend the example with more complex situations; I am also preparing an example of how pylint could be integrated into Bazel.
I am sure that together we will find the best approach for each situation.
Likely a naive idea: would it be possible to create a merging py_library variant that properly merges two libraries with a clashing namespace?
For example, with google-cloud-datastore and google-cloud-bigquery, the result would be a single dependency that offers google.cloud.* based on these two packages, and that merged target could then be a dependency of other targets. You'd have a BUILD file something like:
py_test(
    name = "test_ns",
    srcs = [
        "test_ns.py",
    ],
    deps = [
        ":merged_google_cloud",
    ],
)

py_merged_library(  # new explicit merge action
    name = "merged_google_cloud",
    deps = [
        requirement("google-cloud-datastore"),
        requirement("google-cloud-bigquery"),
    ],
)
This magical new py_merged_library would essentially pip-install both dependencies and repack them into a single wheel.
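As a very rough sketch of just the interface (hypothetical file and macro names; this version only aggregates the deps into one target and does not actually repack a wheel, which is the hard part):

# merged.bzl (hypothetical)
def py_merged_library(name, deps, **kwargs):
    # Placeholder: a real implementation would install both distributions and
    # repack them into a single tree so that google.cloud.* resolves correctly.
    native.py_library(
        name = name,
        deps = deps,
        **kwargs
    )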