This bug tracks designing, implementing, and migrating the built-in Python rules to use the toolchain mechanism for resolving the Python runtime. This replaces the legacy mechanism of specifying --python_top to globally make all Python targets look at a specific py_runtime, or omitting --python_top to use the system default "python" command.
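For context, a hedged sketch of the legacy setup (package path and interpreter location invented):

```python
# tools/python/BUILD (hypothetical package and interpreter path):
py_runtime(
    name = "my_runtime",
    interpreter_path = "/usr/bin/python2.7",
)

# Legacy usage, applied globally to every Python target in the build:
#   bazel build //app:bin --python_top=//tools/python:my_runtime
# Omitting --python_top falls back to whatever "python" is on PATH.
```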
This is prioritized. Design doc to follow.
(Plan is to use this work to also address #4815, rather than implement an ad hoc solution to that bug.)
Design doc here.
Discussion can proceed in this thread. Initial reviewers are @katre (Bazel, Toolchains), @mrovner (Python), and @nlopezgi (Remote Execution).
Overall, this looks good. A few points:
Ok, I'll remove host_python_toolchain. I don't have a strong opinion about its impact on performance, but it's worth remembering that it will still be superseded by any other applicable toolchain that the user registers.
By "delete" I really mean "fail-fast if the user tries to specify them while the incompatible flag is enabled". This can be implemented by checking the flag value for null, and logging an error in the fragment's reportInvalidOptions()
(see example). The true deletion would happen around the time the old code paths and incompatible flag are removed.
I think the distinction between "delete" and "ignore with a warning" is worth calling out.
(See also Nicolas's comments here.)
Replying to @nlopezgi's concern about toolchains and extensibility here:
I think you need something similar as there are rules that execute the native py_binary (e.g., rules_docker does here: https://github.com/bazelbuild/rules_docker/blob/master/python/image.bzl#L84) and there might be something special that these types of users need to do to work with the new py toolchains properly.
Users who operate at the level of BUILD targets and macros only need to worry about having definitions for their platforms and Python toolchains, i.e. the py_runtime_pair target and an associated toolchain target referencing it, with appropriate constraints. Whatever the build graph looks like, it should Just Work.
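Concretely, such a setup might look like the following sketch (package paths and interpreter locations are illustrative):

```python
# BUILD (illustrative labels and paths): define a runtime per version,
# pair them, and expose the pair under the Python toolchain type.
load("@bazel_tools//tools/python:toolchain.bzl", "py_runtime_pair")

py_runtime(
    name = "my_py2_runtime",
    interpreter_path = "/usr/bin/python2.7",
    python_version = "PY2",
)

py_runtime(
    name = "my_py3_runtime",
    interpreter_path = "/usr/bin/python3",
    python_version = "PY3",
)

py_runtime_pair(
    name = "my_py_runtime_pair",
    py2_runtime = ":my_py2_runtime",
    py3_runtime = ":my_py3_runtime",
)

toolchain(
    name = "my_py_toolchain",
    toolchain = ":my_py_runtime_pair",
    toolchain_type = "@bazel_tools//tools/python:toolchain_type",
    # exec_compatible_with / target_compatible_with would carry the
    # platform constraints, if any.
)
```

The toolchain target then gets registered, e.g. via register_toolchains() in the WORKSPACE file, and resolution picks it based on the declared constraints.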
Users who are writing new rules that interoperate with the Python rules may need to manipulate Python toolchains. To receive a Python toolchain, they should declare @bazel_tools//tools/python:toolchain_type as a required toolchain type in their rule and retrieve it with ctx.toolchains[<same label>]. This gives them back the ToolchainInfo with the schema described in the doc. This would be necessary, for instance, to write a replica of py_binary in Starlark. Conversely, users can write a replacement for py_runtime or py_runtime_pair by simply returning an appropriate PyRuntimeInfo / ToolchainInfo provider.
I didn't go into too much detail about how to write rules that produce or consume Python toolchains (aside from a description of the provider schema) because this is already covered by the general toolchain documentation linked in the doc's abstract.
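That said, a minimal sketch of the consumer side may help (assuming the py2_runtime/py3_runtime schema described in the doc; rule and file names are invented):

```python
# my_rules.bzl (names invented): a rule that consumes the Python toolchain.
def _my_py_tool_impl(ctx):
    # Look up the resolved toolchain's ToolchainInfo.
    py_toolchain = ctx.toolchains["@bazel_tools//tools/python:toolchain_type"]

    # Per the schema in the doc, it carries a PyRuntimeInfo per version.
    runtime = py_toolchain.py3_runtime

    # A real rule would register an action here that invokes
    # runtime.interpreter_path (platform runtime) or
    # runtime.interpreter (in-build runtime).
    return []

my_py_tool = rule(
    implementation = _my_py_tool_impl,
    toolchains = ["@bazel_tools//tools/python:toolchain_type"],
)
```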
(please do not assign issues to more than one team-*)
Let me add my two cents as an end user. I would like to see the same naming convention and signatures for the Python toolchain types as are used for all the other toolchains:
py_runtime_pair -> py_toolchain
Also, related: the section "Why not split Python 2 and Python 3 into two separate toolchain types?" argues that the naming convention for toolchain types is to literally name the target "toolchain_type". Based on the JDK toolchain implementation it doesn't look like a problem - see tools/jdk/BUILD.
Also, @bazel_tools//tools/python2:toolchain_type and @bazel_tools//tools/python3:toolchain_type wouldn't break the convention, but would help avoid problems with a missing py_runtime attribute in the pair, and would provide an opportunity to force a version, as in:
--python3_toolchain .../python:remote3.7.15 --python2_toolchain ../python:remote2.7.13
It would also make it possible to get rid of the python_version attribute on the py_binary/py_test rules and select the runtime based on the provided toolchain type.
Hi Guro, thanks for the comments. There are a few reasons I didn't organize it that way.
py_runtime_pair -> py_toolchain
It's not clear what the ideal way to deal with multiple Python runtimes will be long-term. Currently the rules are organized around this PY2/PY3 distinction, but in the future that will become less important and there may be more emphasis on choosing minor versions. Building the current schema into the rule name gives us a little more room to redesign it in the future without extra breaking changes.
Why not split Python 2 and Python 3 into two separate toolchain types?
Based on the JDK toolchain implementation it doesn't look like a problem
Again, future-proofing is part of the reason. It's easier to migrate the schema of a ToolchainInfo from the py2_runtime = ..., py3_runtime = ... layout to some other layout, and even to migrate a toolchain rule to have different attributes, than it is to migrate the Python rules to accept different toolchain types.
For platforms that do not provide an implementation for one of these toolchains, we'd still have to define a dummy toolchain that fails at analysis time, since even if a target does not use the toolchain its rule type will still require both toolchain types.
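As a rough sketch of what such a dummy might look like under that hypothetical split (all names invented):

```python
# fail_toolchains.bzl (hypothetical): a stand-in for platforms that
# have no Python 2 runtime. Analyzing it fails immediately, which is
# the only sensible behavior once a rule is forced to depend on it.
def _unavailable_py2_impl(ctx):
    fail("No Python 2 runtime is available on this platform")

unavailable_py2_toolchain = rule(
    implementation = _unavailable_py2_impl,
)
```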
Also, creating .../python[2|3]:toolchain_type implies the creation of two additional directories and BUILD files whose only purpose is to provide naming disambiguation for the toolchain types. I'd sooner just break convention and name them .../python:py[2|3]_toolchain_type.
provide an opportunity to force a version
I don't believe there's anything in the toolchain framework that would allow you to directly force the toolchain implementation to use for a given toolchain type, as opposed to just letting Bazel select one based on constraints, though I could be wrong.
It would also make it possible to get rid of the python_version attribute on the py_binary/py_test rules and select the runtime based on the provided toolchain type.
I checked with @katre before starting the design doc, and it turns out that currently toolchains can really only differentiate themselves based on platform constraints, not other rule logic constraints. You can't easily express that a given py_binary has a constraint of "I want Python 2".
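For contrast, the design in the doc keeps the version as a target-level attribute, and the resolved toolchain supplies both runtimes; a typical target would look like:

```python
py_binary(
    name = "tool",
    srcs = ["tool.py"],
    # Target-level version selection: the rule reads the PY2 runtime
    # out of whichever py_runtime_pair toolchain Bazel resolves.
    python_version = "PY2",
)
```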
Looks good to me.
Thanks for addressing my question about extending python rules. It would be ideal to get this info as part of user docs whenever the right time for that comes.
Looks good to me.
@brandjon is it really not possible to express a version constraint for e.g. a py_binary using toolchains? There is an example here: https://docs.bazel.build/versions/master/platforms.html#defining-constraints-and-platforms talking about defining Version as a constraint. Could that not be used?
This is sidetracking a bit, but I am also curious about this, as I was planning to add a constraint to the nodejs toolchains so it would be possible to use different versions of nodejs in the same workspace and choose one at the nodejs_binary level.
In that example the platform either supports one version or the other, and targets build/run on whichever platform has the version they require. Here the same platform is generally expected to provide both Python versions. Currently all values for a constraint setting are considered mutually exclusive.
I think the bigger issue is that Python version is a property of the target, not the platform. @katre could comment more.
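To illustrate the mutual-exclusivity point with a hedged sketch (labels invented):

```python
# BUILD (labels invented): two values of one constraint_setting are
# mutually exclusive, so a platform can declare at most one of them.
constraint_setting(name = "python_version")

constraint_value(
    name = "py2",
    constraint_setting = ":python_version",
)

constraint_value(
    name = "py3",
    constraint_setting = ":python_version",
)

platform(
    name = "my_platform",
    # Can list :py2 or :py3, but not both.
    constraint_values = [":py3"],
)
```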
Ah yeah, I see: I can say a platform is compatible with a certain version, but I can not say that a platform is compatible with multiple versions and then just choose the version on a per-target level. It would be nice to hear more details from @katre about how to make such a use case possible in a generic way, as I imagine this could come up again and again. I am also happy to pull this into a separate issue or bazel-discuss, as this is only tangentially related to this issue.
I also wonder how rules_go manages to allow setting toolchain attributes on a per-target basis, e.g. I can set go_binary(os = "windows", ...) and then that will compile for Windows.
We don't yet have a good story for handling constraints on specific toolchain versions. I know that rules_scala also has problems with this, but I am not sure what the best UX would be (ie, what is the easiest way for users to express the idea that "this target needs nodejs X.Y to build").
I don't think there's an existing issue in Github to track this, so feel free to open one and make any suggestions as to how you think it should be handled.
I'm running into some bootstrapping issues with the new provider definition that I won't have time to address before we cut the baseline for 0.24. So this feature and the associated --incompatible change to preview it will be available in 0.25.
Legacy code cleanup tracked by #8169.
Reopening this because the flag flip failed due to downstream breakages. Follow the migration tracking bug #7899.
This seems to be resolved, but as a casual user of Python and new user of rules_python, it would be helpful to have simple documentation explaining how to do a hermetic build.
(Somehow I am able to run my py_binary targets at the moment even though I know the dependencies in the BUILD.bazel files are inadequate.)