Bazel: Feature request: pypi_package

Created on 10 Dec 2015 · 29 comments · Source: bazelbuild/bazel

... similar to maven_jar, but downloads Python code from PyPI and makes it available in the build.

P2 feature request

Most helpful comment

So I have been messing around with this for a while and came up with something very short. I loved @benley's implementation of pex_rules and ended up learning a ton from it and implementing my own, but the missing part was the pex, pip, and setuptools dependencies that had to come in during the analysis stage. After reviewing @yugui's repo for her implementation of the pypi_rules, I ended up creating something somewhat similar in concept.

Sorry for posting it in raw form. I am kinda short on time and just wanted to share for others.

WORKSPACE

load('//tools/build_rules:pypi_rules.bzl', 'pypi_repositories')

pypi_repositories(['pex', 'protobuf'])

The syntax for pypi_repositories is exactly the same as pip's, since that function just pipes the list into get-pip.py. One could use specifiers like "setuptools==x.x.x", etc.

pypi_rules.bzl

_BUILD_FILE = """
filegroup(
    name = 'pip_tools',
    srcs = glob(
        include = ['bin/**/*', 'site-packages/**/*'],
        exclude = [
            # Illegal as Bazel labels but are not required by pip.
            "site-packages/setuptools/command/launcher manifest.xml",
            "site-packages/setuptools/*.tmpl",
        ]
    ),
    visibility = ['//visibility:public']
)
"""


def _pip_tools_impl(ctx):
    getpip = ctx.path(ctx.attr._getpip)
    tools = ctx.path('site-packages')

    command = ['python3', str(getpip)]
    command += list(ctx.attr.packages)
    command += ['--target', str(tools)]
    command += ['--install-option', '--install-scripts=%s' % ctx.path('bin')]
    command += ['--no-cache-dir']
    ctx.execute(command)
    ctx.file('BUILD', _BUILD_FILE, False)


_pip_tools = repository_rule(
    _pip_tools_impl,
    attrs={
        'packages': attr.string_list(),
        '_getpip': attr.label(
            default=Label('@getpip//file:get-pip.py'),
            allow_single_file=True,
            executable=True,
            cfg='host'
        )
    }
)


def pypi_repositories(packages=None):
    native.http_file(
        name="getpip",
        url="https://bootstrap.pypa.io/get-pip.py",
        sha256="19dae841a150c86e2a09d475b5eb0602861f2a5b7761ec268049a662dbd2bd0c"
    )

    _pip_tools(
        name="pypi",
        visibility=['//visibility:public'],
        packages=packages if packages else []
    )

    native.bind(
        name="pip_tools",
        actual="@pypi//:pip_tools",
    )

I am including the pex_rules just to show how it is actually used. Concentrate on the ctx.action() inside pex_binary_impl() and the _pip_tools attr in pex_bin_attrs to see how the dependencies end up getting propagated from the cache. I ended up using the --no-cache-dir flag for now, but obviously one could expand on this and make it more advanced by letting it check against whatever is installed on the local machine, to avoid forcing a download every time a build is made.
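The local-machine check suggested above could be sketched roughly as follows. This is a hypothetical helper, not part of the rules posted here; it uses the stdlib importlib.metadata (Python 3.8+) and only understands the plain `name` and `name==version` requirement forms:

```python
# Sketch of the "check what is already installed locally" idea.
# `missing_requirements` is a hypothetical helper, not part of the rules above.
from importlib import metadata


def missing_requirements(reqs):
    """Return the requirement strings not already satisfied locally.

    Only the plain `name` and `name==version` forms are handled, for brevity.
    """
    missing = []
    for req in reqs:
        name, _, wanted = req.partition("==")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(req)
            continue
        if wanted and installed != wanted:
            missing.append(req)
    return missing
```

The repository rule could then pass only the missing requirements to get-pip.py instead of always adding --no-cache-dir and re-downloading everything.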

pex_rules.bzl

pex_file_types = FileType([".py"])


def collect_transitive_srcs(ctx):
    transitive_srcs = set(order="compile")
    for dep in ctx.attr.deps:
        transitive_srcs += dep.transitive_srcs
    transitive_srcs += pex_file_types.filter(ctx.files.srcs)
    return transitive_srcs


def collect_transitive_reqs(ctx):
    transitive_reqs = set(order="compile")
    for dep in ctx.attr.deps:
        transitive_reqs += dep.transitive_reqs
    transitive_reqs += ctx.attr.reqs
    return transitive_reqs


def pex_library_impl(ctx):
    build_path = '/'.join(ctx.build_file_path.split('/')[:-1])
    transitive_srcs = collect_transitive_srcs(ctx)
    transitive_reqs = collect_transitive_reqs(ctx)
    transitive_reqs += set([build_path])
    return struct(
        files=set(),
        transitive_srcs=transitive_srcs,
        transitive_reqs=transitive_reqs
    )


def pex_binary_impl(ctx):
    build_path = '/'.join(ctx.build_file_path.split('/')[:-1])
    transitive_srcs = collect_transitive_srcs(ctx)
    transitive_reqs = collect_transitive_reqs(ctx)
    transitive_reqs += set([build_path])

    command = 'external/pypi/bin/pex ' + \
              '%s ' % ' '.join([f for f in transitive_reqs]) + \
              ('-v ' if ctx.attr.verbose else ' ') + \
              '--entry-point=%s ' % ctx.attr.entry_point + \
              '--output-file=%s ' % ctx.outputs.executable.path + \
              '--python=%s' % ctx.attr.interpreter

    ctx.action(
        mnemonic='PexCompile',
        inputs=list(transitive_srcs + ctx.attr._pip_tools.files),
        command=command,
        outputs=[ctx.outputs.executable],
        env={
            'PATH': '/bin:/usr/bin:/usr/local/bin',
            'PYTHONPATH': 'external/pypi/site-packages',
            'LANG': 'en_US.UTF-8',
            'PEX_ROOT': '.pex'
        },
    )

    return struct(files=set([ctx.outputs.executable]))


pex_attrs = {
    'srcs': attr.label_list(allow_files=True),
    'reqs': attr.string_list(),
    'deps': attr.label_list(
        providers=[
            'transitive_srcs',
            'transitive_reqs'
        ],
        allow_files=False
    )
}

pex_bin_attrs = pex_attrs + {
    'entry_point': attr.string(mandatory=True),
    'interpreter': attr.string(default='python3.5'),
    'verbose': attr.bool(default=False),
    "_pip_tools": attr.label(default=Label("//external:pip_tools"))
}

pex_library = rule(
    pex_library_impl,
    attrs=pex_attrs
)

pex_binary = rule(
    pex_binary_impl,
    attrs=pex_bin_attrs,
    executable=True
)

All 29 comments

@Shaywei

I am definitely interested in this!
Can't promise anything, but if I get some time to hack on Bazel, I'll definitely give this a shot!

Registering interest in this as well.

In the meantime, https://github.com/mihaibivol/bazel_pipy_rules provides some hacks for getting PyPI-based libraries into the build system.

/cc @meteorcloudy

This is more of a matter of naming but would it be confusing to have workspace rules called *_library that appear to be regular build rules? Would it be better to name this rule pypi_package instead?

I have prototyped a workspace rule for PyPI packages. I hope it helps us implement this feature in the main Bazel repository.
https://github.com/gengo/rules_pypi

Currently this prototype does not support building extensions in the Bazel sandbox.
Now I am trying to extract extension metadata from setup.py and to let cc_library build the extensions, as cgo support in rules_go does.

BTW, @damienmg, is there any good way to get the paths to the python2 and python3 interpreters in Skylark rules?

There is not AFAIK :(

The pypi rules would be OK to contribute back. I am however a bit hesitant about using pypi directly to do it (rather than a more reproducible way, like a plain download); can we have confidence in what pypi does?

@damienmg

I am however a bit hesitant about using pypi directly to do it

I agree. In the prototype, I used pip download just for ease of implementation.
But we could write a wrapper script around pip.index.PackageFinder to parse the PyPI page.
Then we can use ctx.download in Bazel.
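As a variation on wrapping pip.index.PackageFinder, the same lookup can be done against PyPI's JSON API (https://pypi.org/pypi/&lt;name&gt;/json), which returns per-file URLs and sha256 digests that could be fed straight into ctx.download. A rough sketch of extracting those fields from an already-fetched response; the `example` dict below is a fabricated minimal response, not real PyPI data:

```python
def sdist_url_and_sha256(pypi_json):
    """Pick the source distribution's URL and sha256 out of a parsed
    response from PyPI's JSON API (https://pypi.org/pypi/<name>/json)."""
    for release_file in pypi_json["urls"]:
        if release_file["packagetype"] == "sdist":
            return release_file["url"], release_file["digests"]["sha256"]
    raise ValueError("no sdist found")


# Fabricated minimal response, just to show the shape of the data.
example = {"urls": [
    {"packagetype": "bdist_wheel",
     "url": "https://example.com/foo-1.0-py3-none-any.whl",
     "digests": {"sha256": "aaa"}},
    {"packagetype": "sdist",
     "url": "https://example.com/foo-1.0.tar.gz",
     "digests": {"sha256": "bbb"}},
]}
url, sha256 = sdist_url_and_sha256(example)
```

With the URL and digest in hand, the repository rule can verify the download with the sha256, which gives the reproducibility that piping everything through pip lacks.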

Sounds like a good plan.


There is not AFAIK :(

:(
It would be great if I could have ctx.fragments.python so that I could call the appropriate interpreters in repository rules.

Repository rules do not even have access to that information :( I guess depending on a python target could work, though. But this target does not exist in Bazel :(


I found third party build rules for that:
https://github.com/benley/bazel_rules_pex

rules_pex looks to be mainly responsible for building a PEX from the given packages.
It just records the given list of PyPI packages in the PEX manifest and lets PEX fetch and build the packages.
So I don't think rules_pex solves the issue we are trying to solve with this feature. Instead, rules_pex implements a simple workaround with some compromise of {fast, correct}.

That's correct: rules_pex delegates pypi handling to the pex application. It does provide a mechanism for staying fast and correct, at least: the eggs attribute passes python eggs or wheels (downloaded by bazel http_file rules) through to pex. I have been trying to come up with a way of extracting pex's pypi handling into repository rules, so I'm quite pleased to see @yugui's implementation come along!

Now I am trying to extract extension metadata from setup.py and to let cc_library build extensions as cgo support in rules_go does.

Small update.
I tried to extract extension metadata from setup.py in a branch, but I realized that it is nearly impossible because setup.py is fundamentally arbitrary code execution.

The basic architecture of setup.py is that (1) the package author gives a build-step description to distutils.core.setup -- they can also use setuptools to collect such a description from the source tree; (2) the build steps are then executed, driven by that description metadata.
However, some popular packages like numpy or psycopg2 customize the implementation of the build steps themselves. So we cannot expect the given data to be sufficient to construct the dependency graph Bazel needs, nor that the description metadata is written in a common format. The only way to know what setup.py will do seems to be to actually execute it.
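A small illustration of why static extraction breaks down: parsing setup.py with the stdlib ast module recovers keyword arguments to setup() only when they are literals; anything computed at runtime (as numpy and psycopg2 do) comes back unknown. The two setup.py snippets below are fabricated examples, not real packages:

```python
import ast


def setup_kwargs(source):
    """Statically extract the literal keyword arguments of setup() from
    setup.py source. Values that are not plain literals (i.e. computed at
    runtime) come back as None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "setup"):
            kwargs = {}
            for kw in node.keywords:
                try:
                    kwargs[kw.arg] = ast.literal_eval(kw.value)
                except ValueError:
                    kwargs[kw.arg] = None  # not a literal: arbitrary code
            return kwargs
    return {}


# A fully literal setup() is recoverable; a computed one is not.
static = setup_kwargs("setup(name='demo', ext_modules=[])")
dynamic = setup_kwargs("setup(name='demo', ext_modules=find_extensions())")
```

Here `static` recovers every field, while `dynamic` loses ext_modules entirely, which is exactly the situation with packages that build their extension list in code.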

Tried another approach that assumes wheel files are available locally or remotely.
https://github.com/gengo/rules_pypi/tree/feature/wheel

However, this approach does not solve the problem of building wheels from sources as-is.
Also, it is not so easy to build wheels without resolving package dependencies; e.g. pip wheel pyxDamerauLevenshtein fails unless we install numpy in the virtualenv first.

Can we provide a way to build a virtualenv? It would be awesome to have a comprehensive suite of tooling to work with pip.

/cc @meteorcloudy

The pex rules use virtualenv in a genrule to bootstrap the pex builder:
https://github.com/benley/bazel_rules_pex/blob/master/pex/BUILD#L4

It's a bit gross, but I suspect it wouldn't be too hard to generalize into a skylark py_virtualenv rule. What I'm unsure about is how to handle the output in a way that makes it useful to other rules without turning it into a tarball.

Here's our tentative solution.
https://gist.github.com/yugui/b7e80987dc2077754d91750b26eca3de

I have implemented it as a repository_rule so that the output is available as a plain py_library without archiving the files -- the rule needs to generate an unpredictable set of files, which ordinary build rules cannot do.

Findings from my prototype.

  1. It is possible to locate source archives or python wheels and download them with ctx.download.
  2. We cannot know what build steps are required to build the target PyPI package, because setup.py is actually arbitrary code execution even though it looks metadata-driven.
  3. We cannot build a PyPI package alone in a sandbox, because its setup.py often depends on another PyPI package. This is another reason why we need to actually install the package in a virtualenv.
  4. Not all PyPI packages are zip-safe, so we cannot assume the installed files can stay inside an archive.
  5. Some instructive examples among major PyPI packages:

    • numpy -- it does not require any C libraries to be installed, but its build process is complicated. It is not zip-safe.

    • pyyaml -- it depends on a C library, libyaml.

    • pyxDamerauLevenshtein -- pip install pyxDamerauLevenshtein fails unless numpy is installed first. pip install numpy pyxDamerauLevenshtein also fails.

One unfortunate thing I ran into while experimenting with your implementation: on macOS, the default case-preserving-but-insensitive HFS+ filesystem causes setup.py builds to break. Setuptools (and presumably distutils) insists on creating a build/ directory in the source root, but bazel's BUILD file makes that impossible.

(edit: only applies when applying py_requirements to a source repo, not when using the -r requirements.txt style)

@yugui Will you be able to share pypi_universal_repository, required by https://gist.github.com/yugui/b7e80987dc2077754d91750b26eca3de?


Is there any progress on this?

No sorry.

Thanks @trivigy for the code snippet. I modified the code to work with python 2.7.

Here's the full working example: https://github.com/tanin47/bazel-pex-pip/blob/master/pypi.bzl

Just found this thread, but I have a version of the rules (in progress) here built around the pip concept of .whl files.

Basically, it relies on pip wheel to translate requirements.txt into .whl files (either by fetching them or by building them). Once in .whl form, it imports the content of each .whl into a py_library, importing dependency data from foo.dist-info/metadata.json.
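Assuming the JSON metadata format that older wheels ship in foo.dist-info/metadata.json (a run_requires list of {"requires": [...]} groups, per the since-withdrawn PEP 426 draft), the dependency import step could be sketched like this. The helper name and the sample dict are illustrative, not taken from the PR:

```python
def wheel_run_requires(metadata_json):
    """Collect requirement strings from a wheel's parsed metadata.json.

    Only unconditional dependency groups are kept; groups gated on an
    `extra` or an environment marker are skipped for simplicity.
    """
    requires = []
    for group in metadata_json.get("run_requires", []):
        if "extra" in group or "environment" in group:
            continue
        requires.extend(group["requires"])
    return requires


# Fabricated metadata.json contents, just to show the shape.
meta = {"run_requires": [
    {"requires": ["six"]},
    {"extra": "test", "requires": ["pytest"]},
    {"environment": "python_version < '3'", "requires": ["enum34"]},
]}
deps = wheel_run_requires(meta)
```

Each extracted requirement would then map to the py_library generated for that wheel, giving Bazel a real dependency graph instead of one opaque pip install.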

I'd love feedback on the PR, or reports of any issues people may have with them.

My PR is merged, so I'm going to close this :)

