Bazel is creating __init__.py files automatically. This is an issue in several situations (see https://github.com/bazelbuild/rules_python/issues/55).
This behavior can be suppressed using:
https://docs.bazel.build/versions/master/be/python.html#py_test.legacy_create_init
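On a single target that looks roughly like this (a minimal sketch with made-up names; depending on the Bazel version the attribute is a boolean or a tristate integer):

# BUILD (illustrative)
py_test(
    name = "lib_test",
    srcs = ["lib_test.py"],
    # Ask Bazel not to synthesize empty __init__.py files for this target.
    legacy_create_init = False,
)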
But if you have a large code base with a lot of Python rules, this is not a good solution: the flag would need to be set on every single rule whenever a folder in the root of the workspace has the same name as a Python package.
When this happens, it is not easy to figure out what is going on. For modern Python developers, __init__.py files are source files, not generated files, so it is not intuitive that the error could be caused by init files that are generated automatically.
As a first step: there should be an option to control the behavior of legacy_create_init globally.
As a second step: the default should be changed so that __init__.py files are not generated.
The same as https://github.com/bazelbuild/rules_python/issues/55, or any setup where the root of the workspace contains a folder named like a built-in Python module (time, logging, platform, queue, etc.).
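For illustration, a made-up layout that triggers the clash:

workspace/
    logging/          <--- unrelated to the standard-library logging module
        BUILD
        handlers.py

Because Bazel drops an auto-generated logging/__init__.py into the runfiles, and the runfiles root is on sys.path, import logging inside any py_binary picks up this empty package instead of the standard library and typically fails later with an AttributeError.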
Ubuntu 18.04
bazel info release: release 0.22.0
The end state shouldn't be a global option but rather removing the legacy create init behavior altogether. This might look like the following.
1. Go through an incompatible change migration to set the default value of legacy_create_init to false.
2. Go through another incompatible change migration to force the value to false for everyone, effectively making the attribute a no-op.
3. Go through a final incompatible change migration to delete the now-redundant attribute.
The first step of the above is implemented in @Faqa's PR #9271, which is in the process of being merged. It introduces an incompatible change flag tracked by #10076.
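Once that flag is available, opting in early should just be a matter of a .bazelrc entry (sketch; the flag name is the one tracked by #10076):

# .bazelrc
build --incompatible_default_to_explicit_init_py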
Steps 2 and 3 above can come after that. They can actually be merged into a single step. But that'll come down the line, since I don't anticipate flipping the first flag for a while. (We need to see how difficult the migration is, and if it requires new tooling, that'll be hard to prioritize.)
We can use this bug to track the overall deprecation effort.
A concern came up in internal review of #9271: How will a user know that a failure at execution time (from missing __init__.py) is due to flipping this flag?
We already have some logic in the stub script to give users a more informative error message when a failure may be due to a different incompatible change. We can adopt a similar strategy for this change, so that when the incompatible flag is enabled, the exit code is 1, and the stderr matches ^(ModuleNotFoundError|ImportError):, we print an additional message linking to the incompatible change docs. (The second incompatible change, to remove legacy_create_init altogether, won't need any special diagnostic because it'll be a straightforward analysis-time error.)
The downside of this approach is that it means Bazel injects stderr spam into failing py_binarys. But since it would only trigger when legacy_create_init is set to the default value, auto, the user could opt out of the spam on a per-target basis by explicitly setting this attr to true or false -- which works until the second incompatible change to delete the attr. We'd remove the special message from the stub script once the first incompatible flag is flipped.
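A minimal sketch of what that diagnostic could look like, with made-up message text and helper names (the real stub script is structured differently; this only illustrates the detection logic):

import re
import subprocess
import sys

INIT_PY_HINT = (
    "NOTE: this failure may be caused by the incompatible change that stops "
    "generating __init__.py files automatically; see the incompatible change docs.\n"
)

def run_and_diagnose(args):
    # Run the user's program, mirroring its stderr.
    proc = subprocess.run(args, stderr=subprocess.PIPE)
    stderr = proc.stderr.decode("utf-8", errors="replace")
    sys.stderr.write(stderr)
    # Only print the hint for the exact signature described above:
    # exit code 1 and an import-related error at the start of a stderr line.
    if proc.returncode == 1 and re.search(
            r"^(ModuleNotFoundError|ImportError):", stderr, re.MULTILINE):
        sys.stderr.write(INIT_PY_HINT)
    return proc.returncode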
@brandjon Curious, are there some existing best practices for writing BUILD files when automatic __init__.py file generation is switched off?
Suppose that I have the following setup:
foo/
    __init__.py
    BUILD
    lib/
        __init__.py
        lib.py
        BUILD
    bin/
        __init__.py
        bin.py
        BUILD
With automatic __init__.py file generation, there need not be checked-in __init__.py files, and the BUILD files could look like this:
# foo/lib/BUILD
py_library(
    name = "lib",
    srcs = ["lib.py"],
)

# foo/bin/BUILD
py_library(
    name = "bin",
    srcs = ["bin.py"],
    deps = [
        "//foo/lib",
    ],
)
Without automatic __init__.py file generation, I guess I'd have to set up something like this:
# foo/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)

# foo/lib/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)

py_library(
    name = "lib",
    srcs = ["lib.py"],
    deps = [":init"],
)

# foo/bin/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)

py_library(
    name = "bin",
    srcs = ["bin.py"],
    deps = [
        ":init",
        "//foo/lib",
        "//foo:init",
    ],
)
This is unfortunately pretty verbose, but somewhat doable for the most part, I guess. To me, the most annoying thing is that we have to explicitly add dependencies on the top-level //foo:init target in order to have foo/__init__.py show up in runfiles and make import foo.<something> work at runtime. It does not seem to scale well as the repository grows: if I have a target //a/b/c/d:lib, it would need to depend on all parent __init__.py targets, like //a:init, //a/b:init, and //a/b/c:init. Encouraging a pattern where child libs mechanically depend on their parent packages could also be seen as problematic - usually it is good for unit test cacheability if leaf directories are independent.
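Concretely, the deep-nesting case would end up looking something like this (hypothetical paths, assuming every parent package exposes an init target as above):

# a/b/c/d/BUILD (hypothetical)
py_library(
    name = "lib",
    srcs = ["lib.py"],
    deps = [
        ":init",         # a/b/c/d/__init__.py
        "//a:init",      # a/__init__.py
        "//a/b:init",    # a/b/__init__.py
        "//a/b/c:init",  # a/b/c/__init__.py
    ],
)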
Or maybe I'm missing something - are there existing bazel code bases that have switched automatic __init__.py generation off?
There's no recommended best practice for what dependency structure to use for __init__.py files. You could have it be included in the srcs of multiple py_librarys in the same package, or factor it into its own target and add it to deps. For parent init files you would have to depend on the parent packages.
I will take the liberty to answer: I would disagree with having the __init__.py files in separate dependencies.
In this example, as you said, with automatic __init__.py generation it would look something like this; the issue I have with automatic __init__.py files is that several of them are not needed:
foo/
    __init__.py  <--- not needed
    BUILD
    lib/
        __init__.py
        lib.py
        BUILD
    bin/
        __init__.py  <--- not needed
        bin.py
        BUILD
An __init__.py file indicates that a directory is a package; for libraries, it should be part of the library itself. I would not recommend removing it from the srcs of the library and putting it in a separate target.
# foo/lib/BUILD
py_library(
    name = "lib",
    srcs = [
        "__init__.py",  # part of the library itself: it marks the directory as a package
        "lib.py",
    ],
    imports = ["."],
)
The imports "." is needed to import from this library, I think that this is unfortunate but probably relates to be able to import always from the root of the workspace. If it would be a big repository with a lot of different languages mix I don't think that scales always starting your import from the workspace.
Here you can see the example that you put how is best to my eyes but it will depend a lot from the project and this is just my personal opinion.
https://github.com/limdor/bazel-examples/tree/master/python
I would love to extend the example with more complex situations; I am also preparing an example of how pylint could be integrated into Bazel.
I am sure that together we will find the best approach for each situation.
Likely a naive idea: would it be possible to create a merging py_library variant that properly merges two libraries with a clashing namespace?
For example, with google-cloud-datastore and google-cloud-bigquery, the result would be a single dependency that offers google.cloud.* based on these two packages, and that merged target could then be a dependency of other targets. You'd have a BUILD file something like:
py_test(
    name = "test_ns",
    srcs = [
        "test_ns.py",
    ],
    deps = [
        ":merged_google_cloud",
    ],
)

py_merged_library(  # new explicit merge action
    name = "merged_google_cloud",
    deps = [
        requirement("google-cloud-datastore"),
        requirement("google-cloud-bigquery"),
    ],
)
This magical new py_merged_library would essentially pip-install both dependencies and repack them into a single wheel.
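As a very rough sketch of just the interface (hypothetical file and macro names; this version only aggregates the deps into one target and does not actually repack a wheel, which is the hard part):

# merged.bzl (hypothetical)
def py_merged_library(name, deps, **kwargs):
    # Placeholder: a real implementation would install both distributions and
    # repack them into a single tree so that google.cloud.* resolves correctly.
    native.py_library(
        name = name,
        deps = deps,
        **kwargs
    )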