This bug tracks designing, implementing, and migrating the built-in Python rules to use the toolchain mechanism for resolving the Python runtime. This replaces the legacy mechanism of specifying --python_top to globally make all Python targets look at a specific py_runtime, or omitting --python_top to use the system default "python" command.
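For context, a hedged sketch of the legacy setup (package path and interpreter location invented):

```python
# tools/python/BUILD (hypothetical package and interpreter path):
py_runtime(
    name = "my_runtime",
    interpreter_path = "/usr/bin/python2.7",
)

# Legacy usage, applied globally to every Python target in the build:
#   bazel build //app:bin --python_top=//tools/python:my_runtime
# Omitting --python_top falls back to whatever "python" is on PATH.
```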
This is prioritized. Design doc to follow.
(Plan is to use this work to also address #4815, rather than implement an ad hoc solution to that bug.)
Design doc here.
Discussion can proceed in this thread. Initial reviewers are @katre (Bazel, Toolchains), @mrovner (Python), and @nlopezgi (Remote Execution).
Overall, this looks good. A few points:
Ok, I'll remove host_python_toolchain. I don't have a strong opinion about its impact on performance, but it's worth remembering that it will still be superseded by any other applicable toolchain that the user registers.
By "delete" I really mean "fail-fast if the user tries to specify them while the incompatible flag is enabled". This can be implemented by checking the flag value for null, and logging an error in the fragment's reportInvalidOptions()
(see example). The true deletion would happen around the time the old code paths and incompatible flag are removed.
I think the distinction between "delete" and "ignore with a warning" is worth calling out.
(See also Nicolas's comments here.)
Replying to @nlopezgi's concern about toolchains and extensibility here:
I think you need something similar as there are rules that execute the native py_binary (e.g., rules_docker does here: https://github.com/bazelbuild/rules_docker/blob/master/python/image.bzl#L84) and there might be something special that these types of users need to do to work with the new py toolchains properly.
Users who operate at the level of BUILD targets and macros only need to worry about having definitions for their platforms and Python toolchains, i.e. the py_runtime_pair target and an associated toolchain target referencing it, with appropriate constraints. Whatever the build graph looks like, it should Just Work.
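Concretely, such a setup might look like the following sketch (package paths and interpreter locations are illustrative):

```python
# BUILD (illustrative labels and paths): define a runtime per version,
# pair them, and expose the pair under the Python toolchain type.
load("@bazel_tools//tools/python:toolchain.bzl", "py_runtime_pair")

py_runtime(
    name = "my_py2_runtime",
    interpreter_path = "/usr/bin/python2.7",
    python_version = "PY2",
)

py_runtime(
    name = "my_py3_runtime",
    interpreter_path = "/usr/bin/python3",
    python_version = "PY3",
)

py_runtime_pair(
    name = "my_py_runtime_pair",
    py2_runtime = ":my_py2_runtime",
    py3_runtime = ":my_py3_runtime",
)

toolchain(
    name = "my_py_toolchain",
    toolchain = ":my_py_runtime_pair",
    toolchain_type = "@bazel_tools//tools/python:toolchain_type",
    # exec_compatible_with / target_compatible_with would carry the
    # platform constraints, if any.
)
```

The toolchain target then gets registered, e.g. via register_toolchains() in the WORKSPACE file, and resolution picks it based on the declared constraints.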
Users who are writing new rules that interoperate with the Python rules may need to manipulate Python toolchains. To receive a Python toolchain, they should declare @bazel_tools//tools/python:toolchain_type as a required toolchain type in their rule and retrieve it with ctx.toolchains[<same label>]. This gives them back the ToolchainInfo with the schema described in the doc. This would be necessary, for instance, to write a replica of py_binary in Starlark. Conversely, users can write a replacement for py_runtime or py_runtime_pair by simply returning an appropriate PyRuntimeInfo / ToolchainInfo provider.
I didn't go into too much detail about how to write rules that produce or consume Python toolchains (aside from a description of the provider schema) because this is already covered by the general toolchain documentation linked in the doc's abstract.
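That said, a minimal sketch of the consumer side may help (assuming the py2_runtime/py3_runtime schema described in the doc; rule and file names are invented):

```python
# my_rules.bzl (names invented): a rule that consumes the Python toolchain.
def _my_py_tool_impl(ctx):
    # Look up the resolved toolchain's ToolchainInfo.
    py_toolchain = ctx.toolchains["@bazel_tools//tools/python:toolchain_type"]

    # Per the schema in the doc, it carries a PyRuntimeInfo per version.
    runtime = py_toolchain.py3_runtime

    # A real rule would register an action here that invokes
    # runtime.interpreter_path (platform runtime) or
    # runtime.interpreter (in-build runtime).
    return []

my_py_tool = rule(
    implementation = _my_py_tool_impl,
    toolchains = ["@bazel_tools//tools/python:toolchain_type"],
)
```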
(please do not assign issues to more than one team-*)
Let me add my two cents as an end user. I would like to see the same naming convention and signatures for the Python toolchain types as are used for all the other toolchains:
py_runtime_pair -> py_toolchain
Also, related: the section "Why not split Python 2 and Python 3 into two separate toolchain types?" argues that the naming convention for toolchain types is to literally name the target "toolchain_type". Based on the JDK toolchain implementation it doesn't look like a problem - see tools/jdk/BUILD.
Also, @bazel_tools//tools/python2:toolchain_type and @bazel_tools//tools/python3:toolchain_type wouldn't break the convention, but would help avoid problems with a missing py_runtime attribute in the pair, and would provide an opportunity to force a version, as in:
--python3_toolchain .../python:remote3.7.15 --python2_toolchain ../python:remote2.7.13
It would also make it possible to get rid of the python_version attribute on the py_binary/py_test rules and select the runtime based on the provided toolchain type.
Hi Guro, thanks for the comments. There are a few reasons I didn't organize it that way.
py_runtime_pair -> py_toolchain
It's not clear what the ideal way to deal with multiple Python runtimes will be long-term. Currently the rules are organized around this PY2/PY3 distinction, but in the future that will become less important and there may be more emphasis on choosing minor versions. Building the current schema into the rule name gives us a little more room to redesign it in the future without extra breaking changes.
Why not split Python 2 and Python 3 into two separate toolchain types?
Based on the JDK toolchain implementation it doesn't look like a problem
Again, future-proofing is part of the reason. It's easier to migrate the schema of a ToolchainInfo from the py2_runtime = ..., py3_runtime = ... layout to some other layout, and even to migrate a toolchain rule to have different attributes, than it is to migrate the Python rules to accept different toolchain types.
For platforms that do not provide an implementation for one of these toolchains, we'd still have to define a dummy toolchain that fails at analysis time, since even if a target does not use the toolchain its rule type will still require both toolchain types.
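As a rough sketch of what such a dummy might look like under that hypothetical split (all names invented):

```python
# fail_toolchains.bzl (hypothetical): a stand-in for platforms that
# have no Python 2 runtime. Analyzing it fails immediately, which is
# the only sensible behavior once a rule is forced to depend on it.
def _unavailable_py2_impl(ctx):
    fail("No Python 2 runtime is available on this platform")

unavailable_py2_toolchain = rule(
    implementation = _unavailable_py2_impl,
)
```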
Also, creating .../python[2|3]:toolchain_type implies the creation of two additional directories and BUILD files whose only purpose is to provide naming disambiguation for the toolchain types. I'd sooner just break convention and name them .../python:py[2|3]_toolchain_type.
provide an opportunity to force a version
I don't believe there's anything in the toolchain framework that would allow you to directly force the toolchain implementation to use for a given toolchain type, as opposed to just letting Bazel select one based on constraints, though I could be wrong.
It would also make it possible to get rid of the python_version attribute on the py_binary/py_test rules and select the runtime based on the provided toolchain type.
I checked with @katre before starting the design doc, and it turns out that currently toolchains can really only differentiate themselves based on platform constraints, not other rule logic constraints. You can't easily express that a given py_binary has a constraint of "I want Python 2".
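For contrast, the design in the doc keeps the version as a target-level attribute, and the resolved toolchain supplies both runtimes; a typical target would look like:

```python
py_binary(
    name = "tool",
    srcs = ["tool.py"],
    # Target-level version selection: the rule reads the PY2 runtime
    # out of whichever py_runtime_pair toolchain Bazel resolves.
    python_version = "PY2",
)
```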
Looks good to me.
Thanks for addressing my question about extending python rules. It would be ideal to get this info as part of user docs whenever the right time for that comes.
Looks good to me.
@brandjon is it really not possible to express a version constraint for e.g. a py_binary using toolchains? There is an example here: https://docs.bazel.build/versions/master/platforms.html#defining-constraints-and-platforms talking about defining Version as a constraint. Could that not be used?
This is sidetracking a bit, but I am also curious about this, as I was planning to add a constraint to the nodejs toolchains so it would be possible to use different versions of nodejs in the same workspace and choose one at the nodejs_binary level.
In that example the platform either supports one version or the other, and targets build/run on whichever platform has the version they require. Here the same platform is generally expected to provide both Python versions. Currently all values for a constraint setting are considered mutually exclusive.
I think the bigger issue is that Python version is a property of the target, not the platform. @katre could comment more.
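To illustrate the mutual-exclusivity point with a hedged sketch (labels invented):

```python
# BUILD (labels invented): two values of one constraint_setting are
# mutually exclusive, so a platform can declare at most one of them.
constraint_setting(name = "python_version")

constraint_value(
    name = "py2",
    constraint_setting = ":python_version",
)

constraint_value(
    name = "py3",
    constraint_setting = ":python_version",
)

platform(
    name = "my_platform",
    # Can list :py2 or :py3, but not both.
    constraint_values = [":py3"],
)
```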
Ah yeah, I see: I can say a platform is compatible with a certain version, but I can not say that a platform is compatible with multiple versions and then just choose the version on a per-target level. It would be nice to hear more details from @katre about how to make such a use case possible in a generic way, as I imagine this could come up again and again. I am also happy to pull this into a separate issue or bazel-discuss, as this is only tangentially related to this issue.
I also wonder how rules_go manages to allow setting toolchain attributes on a per-target basis, e.g. I can set go_binary(os = "windows", ...) and then that will compile for Windows.
We don't yet have a good story for handling constraints on specific toolchain versions. I know that rules_scala also has problems with this, but I am not sure what the best UX would be (ie, what is the easiest way for users to express the idea that "this target needs nodejs X.Y to build").
I don't think there's an existing issue in Github to track this, so feel free to open one and make any suggestions as to how you think it should be handled.
I'm running into some bootstrapping issues with the new provider definition that I won't have time to address before we cut the baseline for 0.24. So this feature and the associated --incompatible change to preview it will be available in 0.25.
Legacy code cleanup tracked by #8169.
Reopening this because the flag flip failed due to downstream breakages. Follow the migration tracking bug #7899.
This seems to be resolved, but as a casual user of Python and new user of rules_python, it would be helpful to have simple documentation explaining how to do a hermetic build.
(Somehow I am able to run my py_binary targets at the moment even though I know the dependencies in the BUILD.bazel files are inadequate.)