... similar to maven_jar, but downloads Python code from PyPI and makes it available in the build.
@Shaywei
I am definitely interested in this!
Can't promise anything, but if I get some time to hack on Bazel, I'll definitely give this a shot!
On the roadmap, but it might take some time; see https://docs.google.com/document/d/1jKbNXOVp2T1zJD_iRnVr8k5D0xZKgO8blMVDlXOksJg/
Registering interest in this as well.
In the meantime, https://github.com/mihaibivol/bazel_pipy_rules provides some hacks for getting PyPI-based libraries into the build system.
/cc @meteorcloudy
This is more a matter of naming, but would it be confusing to have workspace rules called *_library that appear to be regular build rules? Would it be better to name this rule pypi_package instead?
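For illustration, a hypothetical WORKSPACE entry in that style might look like the following (the rule and attribute names here are placeholders from this discussion, not an agreed design):

pypi_package(
    name = "six",
    package = "six",
    version = "1.10.0",
    sha256 = "...",  # hypothetical; a checksum would keep the fetch reproducible
)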
I have prototyped a workspace rule for PyPI packages. I hope it helps us to implement this feature in the main Bazel repository.
https://github.com/gengo/rules_pypi
Currently this prototype does not support building extensions in the Bazel sandbox. Now I am trying to extract extension metadata from setup.py and to let cc_library build extensions, as the cgo support in rules_go does.
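As a rough sketch of that idea, assuming the extracted metadata names the extension sources, and assuming some repository exposing the CPython headers (the @python_headers label below is made up), the extension could be built with ordinary C++ rules:

# Hypothetical BUILD snippet: linkshared = 1 makes cc_binary produce a
# loadable .so, which is then shipped alongside the pure-Python code.
cc_binary(
    name = "_speedups.so",
    srcs = ["pkg/_speedups.c"],
    deps = ["@python_headers//:headers"],  # assumed target providing Python.h
    linkshared = 1,
)

py_library(
    name = "pkg",
    srcs = glob(["pkg/**/*.py"]),
    data = [":_speedups.so"],
)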
BTW, @damienmg, is there any good way to get the paths to the python2 and python3 interpreters in Skylark rules?
There is not AFAIK :(
The pypi rules would be OK to contribute back. However, I am a bit hesitant about using pypi directly to do it (rather than a more reproducible way, like a plain download); can we have confidence in what pypi does?
@damienmg

> I am however a bit hesitant in using directly pypi to do it

I agree. In the prototype, I used pip download just for ease of implementation. But we can write a wrapper script around pip.index.PackageFinder to parse the PyPI index page, and then use ctx.download in Bazel.
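A rough sketch of that wrapper idea, written against pip's internal API as it existed around pip 8/9 (pip.index.PackageFinder is not a stable interface, and the exact constructor arguments here are an assumption):

# Resolve a requirement to a concrete download URL that a Bazel
# repository rule could then fetch with ctx.download.
from pip.download import PipSession
from pip.index import PackageFinder
from pip.req import InstallRequirement

def resolve(requirement):
    finder = PackageFinder(
        find_links=[],
        index_urls=["https://pypi.python.org/simple/"],
        session=PipSession(),
    )
    link = finder.find_requirement(
        InstallRequirement.from_line(requirement), upgrade=False)
    # The returned URL typically carries an #md5=... fragment; a checksum
    # like that is what would let the ctx.download step stay reproducible.
    return link.url

if __name__ == "__main__":
    print(resolve("six==1.10.0"))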
Sounds like a good plan.
> There is not AFAIK :(

:( It would be great if I could have ctx.fragments.python, so that I can call the appropriate interpreters in repository rules.
Repository rules don't even have access to that information :( I guess depending on a python target could work, though. But this target does not exist in Bazel :(
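One non-hermetic workaround that does work in repository rules today is probing PATH with repository_ctx.which; a minimal sketch, with illustrative names:

def _py_interpreters_impl(ctx):
    # which() searches PATH, so this is not hermetic, but it makes the
    # interpreter lookup explicit and fails early if python3 is missing.
    py2 = ctx.which("python2")
    py3 = ctx.which("python3")
    if py3 == None:
        fail("python3 not found on PATH")
    ctx.file("BUILD", "exports_files(['interpreters.bzl'])")
    ctx.file("interpreters.bzl",
             "PYTHON2 = '%s'\nPYTHON3 = '%s'\n" % (py2, py3))

py_interpreters = repository_rule(_py_interpreters_impl)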
I found third-party build rules for that:
https://github.com/benley/bazel_rules_pex
rules_pex looks to be mainly responsible for building a PEX from the given package. It just records the given list of PyPI packages in the PEX manifest and lets PEX fetch and build the packages.
So I don't think rules_pex solves the issue we are trying to solve with this feature; instead, rules_pex implements a simple workaround with some compromise on {fast, correct}.
That's correct: rules_pex delegates pypi handling to the pex application. It does provide a mechanism for staying fast and correct, at least: the eggs attribute can be used to pass Python eggs or wheels (downloaded by bazel http_file rules) to pex. I have been trying to come up with a way of extracting pex's pypi handling into repository rules, so I'm quite pleased to see @yugui's implementation come along!
> Now I am trying to extract extension metadata from setup.py and to let cc_library build extensions as the cgo support in rules_go does.
Small update.
I tried to extract extension metadata from setup.py in a branch, but I realized that it is nearly impossible, because the build setup in setup.py is fundamentally arbitrary code execution.
The basic architecture of setup.py is: (1) the package author passes a description of the build steps to distutils.core.setup -- they can also use setuptools to collect such a description from the source tree; (2) the build steps are then executed, driven by that description metadata.
However, some popular packages like numpy or psycopg2 customize the implementation of the build steps themselves. So we can expect neither that the given data is sufficient to construct the dependency graph Bazel needs, nor that the description metadata is written in a common format. The only way to know what setup.py will actually do looks to be to execute it.
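A contrived setup.py in the spirit of what breaks static extraction (an illustration, not from any real package): the extension list only exists after arbitrary code has run.

import os
from distutils.core import setup, Extension

# The build description is computed at run time, so parsing the file
# statically cannot recover it.
ext_modules = []
if os.environ.get("WITH_SPEEDUPS"):
    ext_modules.append(Extension("pkg._speedups", sources=["pkg/_speedups.c"]))

setup(
    name="pkg",
    version="1.0",
    packages=["pkg"],
    ext_modules=ext_modules,
)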
I tried another approach that assumes wheel files are available locally or remotely:
https://github.com/gengo/rules_pypi/tree/feature/wheel
However, this approach does not solve the problem of building wheels from source as-is. Also, it is not easy to build wheels without resolving package dependencies -- e.g. pip wheel pyxDamerauLevenshtein fails unless numpy is installed in the virtualenv.
Can we provide a way to build a virtualenv? It would be awesome to have a comprehensive suite of tooling for working with pip.
/cc @meteorcloudy
The pex rules use virtualenv in a genrule to bootstrap the pex builder:
https://github.com/benley/bazel_rules_pex/blob/master/pex/BUILD#L4
It's a bit gross, but I suspect it wouldn't be too hard to generalize into a skylark py_virtualenv rule. What I'm unsure about is how to handle the output in a way that makes it useful to other rules without turning it into a tarball.
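A stripped-down sketch of that genrule approach (the labels and the virtualenv entry point are illustrative, not the actual bazel_rules_pex code): the virtualenv has to be tarred up because a genrule must declare all of its outputs in advance.

genrule(
    name = "pex_builder_env",
    srcs = ["@virtualenv//file"],  # assumed http_file pointing at virtualenv.py
    outs = ["pex_env.tar"],
    cmd = " && ".join([
        "python $(location @virtualenv//file) --quiet env",
        "env/bin/pip install --quiet pex",
        "tar -cf $@ env",  # archive so the genrule has a single declared output
    ]),
)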
Here's our tentative solution.
https://gist.github.com/yugui/b7e80987dc2077754d91750b26eca3de
I have implemented it as a repository_rule so that the output is available as a plain py_library without archiving files -- the approach requires the rule to generate an unpredictable set of files, which only a repository rule can do.
Findings from my prototype:

- ctx.download …
- setup.py is actually arbitrary code execution, even though it looks to be metadata-driven.
- setup.py often depends on another PyPI package. This is another reason why we need to actually install the package in a virtualenv.
- genrule -- it needs to archive the virtualenv to avoid unpredictable output, and it requires a trick to unarchive the archive file at runtime, like this.
- libyaml …
- pip install pyxDamerauLevenshtein will fail without installing numpy first; pip install numpy pyxDamerauLevenshtein will also fail.

One unfortunate thing I ran into while experimenting with your implementation: on MacOS, the default case-preserving-but-insensitive HFS+ filesystem causes setup.py builds to break. Setuptools (and presumably distutils) insists on creating a directory called build/ in the source root, but bazel's BUILD file makes that impossible.
(edit: this only applies when applying py_requirements to a source repo, not when using the -r requirements.txt style)
@yugui Will you be able to share pypi_universal_repository, required by https://gist.github.com/yugui/b7e80987dc2077754d91750b26eca3de?
So I have been messing around with this for a while and came up with something very short. I loved @benley's implementation of pex_rules, ended up learning a ton from it, and implemented my own; the missing part was the pex, pip, and setuptools dependencies that had to come in during the analysis stage. After reviewing @yugui's repo with her implementation of the pypi_rules, I ended up creating something similar in concept.
Sorry for posting it in raw form. I am kinda short on time and just wanted to share it for others.
WORKSPACE
load('//tools/build_rules:pypi_rules.bzl', 'pypi_repositories')
pypi_repositories(['pex', 'protobuf'])
The syntax for pypi_repositories is exactly the same as the one used with pip, since that function just pipes things into get-pip.py. One could use syntax like "setuptools==x.x.x", etc.
pypi_rules.bzl
_BUILD_FILE = """
filegroup(
    name = 'pip_tools',
    srcs = glob(
        include = ['bin/**/*', 'site-packages/**/*'],
        exclude = [
            # Illegal as Bazel labels, but not required by pip.
            "site-packages/setuptools/command/launcher manifest.xml",
            "site-packages/setuptools/*.tmpl",
        ],
    ),
    visibility = ['//visibility:public'],
)
"""

def _pip_tools_impl(ctx):
    # Run get-pip.py to install pip plus the requested packages into this
    # repository's site-packages/ and bin/ directories.
    getpip = ctx.path(ctx.attr._getpip)
    tools = ctx.path('site-packages')
    command = ['python3', str(getpip)]
    command += list(ctx.attr.packages)
    command += ['--target', str(tools)]
    command += ['--install-option', '--install-scripts=%s' % ctx.path('bin')]
    command += ['--no-cache-dir']
    ctx.execute(command)
    ctx.file('BUILD', _BUILD_FILE, False)

_pip_tools = repository_rule(
    _pip_tools_impl,
    attrs = {
        'packages': attr.string_list(),
        '_getpip': attr.label(
            default = Label('@getpip//file:get-pip.py'),
            allow_single_file = True,
            executable = True,
            cfg = 'host',
        ),
    },
)

def pypi_repositories(packages=None):
    native.http_file(
        name = "getpip",
        url = "https://bootstrap.pypa.io/get-pip.py",
        sha256 = "19dae841a150c86e2a09d475b5eb0602861f2a5b7761ec268049a662dbd2bd0c",
    )
    _pip_tools(
        name = "pypi",
        visibility = ['//visibility:public'],
        packages = packages if packages else [],
    )
    native.bind(
        name = "pip_tools",
        actual = "@pypi//:pip_tools",
    )
I am including the pex_rules just to visualize how it is actually used. Concentrate on the ctx.action() inside pex_binary_impl() and the _pip_tools attr in pex_bin_attrs to see how the dependencies end up getting propagated from the cache. I ended up using the --no-cache-dir flag for now, but obviously one could expand on this and make it more advanced by letting it check what is already installed on the local machine, to avoid forcing a download every time a build is made.
pex_rules.bzl
pex_file_types = FileType([".py"])

def collect_transitive_srcs(ctx):
    transitive_srcs = set(order="compile")
    for dep in ctx.attr.deps:
        transitive_srcs += dep.transitive_srcs
    transitive_srcs += pex_file_types.filter(ctx.files.srcs)
    return transitive_srcs

def collect_transitive_reqs(ctx):
    transitive_reqs = set(order="compile")
    for dep in ctx.attr.deps:
        transitive_reqs += dep.transitive_reqs
    transitive_reqs += ctx.attr.reqs
    return transitive_reqs

def pex_library_impl(ctx):
    build_path = '/'.join(ctx.build_file_path.split('/')[:-1])
    transitive_srcs = collect_transitive_srcs(ctx)
    transitive_reqs = collect_transitive_reqs(ctx)
    transitive_reqs += set([build_path])
    return struct(
        files=set(),
        transitive_srcs=transitive_srcs,
        transitive_reqs=transitive_reqs,
    )

def pex_binary_impl(ctx):
    build_path = '/'.join(ctx.build_file_path.split('/')[:-1])
    transitive_srcs = collect_transitive_srcs(ctx)
    transitive_reqs = collect_transitive_reqs(ctx)
    transitive_reqs += set([build_path])
    # Invoke the pex tool installed into @pypi by the repository rule above.
    command = 'external/pypi/bin/pex ' + \
        '%s ' % ' '.join([f for f in transitive_reqs]) + \
        ('-v ' if ctx.attr.verbose else ' ') + \
        '--entry-point=%s ' % ctx.attr.entry_point + \
        '--output-file=%s ' % ctx.outputs.executable.path + \
        '--python=%s' % ctx.attr.interpreter
    ctx.action(
        mnemonic='PexCompile',
        inputs=list(transitive_srcs + ctx.attr._pip_tools.files),
        command=command,
        outputs=[ctx.outputs.executable],
        env={
            'PATH': '/bin:/usr/bin:/usr/local/bin',
            'PYTHONPATH': 'external/pypi/site-packages',
            'LANG': 'en_US.UTF-8',
            'PEX_ROOT': '.pex',
        },
    )
    return struct(files=set([ctx.outputs.executable]))

pex_attrs = {
    'srcs': attr.label_list(allow_files=True),
    'reqs': attr.string_list(),
    'deps': attr.label_list(
        providers=[
            'transitive_srcs',
            'transitive_reqs',
        ],
        allow_files=False,
    ),
}

pex_bin_attrs = pex_attrs + {
    'entry_point': attr.string(mandatory=True),
    'interpreter': attr.string(default='python3.5'),
    'verbose': attr.bool(default=False),
    '_pip_tools': attr.label(default=Label("//external:pip_tools")),
}

pex_library = rule(
    pex_library_impl,
    attrs=pex_attrs,
)

pex_binary = rule(
    pex_binary_impl,
    attrs=pex_bin_attrs,
    executable=True,
)
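To make the snippet concrete, a BUILD file using these rules might look like the following (the file names and the requirement are made up for illustration):

load("//tools/build_rules:pex_rules.bzl", "pex_binary", "pex_library")

pex_library(
    name = "lib",
    srcs = ["lib.py"],
    reqs = ["six==1.10.0"],  # hypothetical PyPI requirement
)

pex_binary(
    name = "app",
    srcs = ["main.py"],
    entry_point = "main",
    deps = [":lib"],
)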
Is there any progress on this?
No, sorry.
Thanks @trivigy for the code snippet. I modified the code to work with Python 2.7.
Here's the full working example: https://github.com/tanin47/bazel-pex-pip/blob/master/pypi.bzl
Just found this thread, but I have a version of the rules (in progress) here, built around the pip concept of .whl files.
Basically, it relies on pip wheel to translate requirements.txt into .whl files (either by fetching them or by building them). Once in .whl form, it imports the content of each .whl into a py_library, importing dependency data from foo.dist-info/metadata.json.
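The core of the .whl idea can be sketched as a repository rule (the names here are hypothetical; the real implementation lives in the linked PR): a wheel is just a zip archive, so it can be fetched with ctx.download, unpacked, and wrapped in a py_library.

def _whl_library_impl(ctx):
    # Fetch the wheel reproducibly, then unpack it; wheels are plain zips.
    ctx.download(url=ctx.attr.url, output="pkg.whl", sha256=ctx.attr.sha256)
    ctx.execute(["unzip", "-q", "pkg.whl"])
    ctx.file("BUILD", """
py_library(
    name = "pkg",
    srcs = glob(["**/*.py"]),
    data = glob(["**/*"], exclude = ["**/*.py", "BUILD", "WORKSPACE", "*.whl"]),
    imports = ["."],
    visibility = ["//visibility:public"],
)
""")

whl_library = repository_rule(
    _whl_library_impl,
    attrs = {
        "url": attr.string(mandatory=True),
        "sha256": attr.string(),
    },
)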
I'd love feedback on the PR, or reports of any issues people may have with them.
My PR is merged, so I'm going to close this :)