Bazel: Package runfiles along with C++ binaries in pkg_* rules

Created on 3 Jan 2018 · 16 Comments · Source: bazelbuild/bazel

Description of the problem / feature request / question:

(https://groups.google.com/forum/#!topic/bazel-discuss/5r_Ajw_j-ZI for context)

The current pkg_* rules don't package runfiles along with C++ binaries. This behavior makes it difficult to deploy an entire C++ application to machines that don't have source access.

Can we update the packaging rules to include runfiles along with binaries?

Environment info

  • Operating System:
    MacOS 10.13.2

  • Bazel version (output of bazel info release):
    release 0.8.1-homebrew

P4 team-Rules-CPP bug

All 16 comments

cc @lberki @ventrescadeatun
Lukacs do you know if this is by design? It looks broken to me. To me pkg_tar should tar everything that's needed to execute the cc_binary, am I wrong? :)

Yep, this is by design because the runfiles are not always necessary.

We internally have a pkg_runfiles rule that packages the runfiles of a binary, which should simply be open sourced.

Major +1 for this

This also is the same for python. At area17, we want to use this for deployment, but it is difficult currently because there isn't a good way to package everything we need together (for both py and cc).

For example, for our python targets, we often depend on pip targets (which are installed within bazel via the pip_import rule). Those pip imports aren't packaged.

This would be a great feature!

@laszlocsomor Is this something your work on runfiles will make simple(r)? :)

@mhlopko : The runfiles library will make it very easy to look up runfiles, once they are available one way or another. Packaging the runfiles is a separate issue; my work is unrelated to that.

Ok, thanks for the info!

Thanks for looking into this, @mhlopko and @lberki.

Does pkg_runfiles include the associated binary as well? If not, this feature request is for a slightly different behavior: to provide a packaging rule that produces a self-contained package of a binary and its runfiles.

Sorry for the silence. I just looked into the code and it looks like you should be able to write a skylark rule that will provide this behavior. https://docs.bazel.build/versions/master/skylark/lib/runfiles.html and https://docs.bazel.build/versions/master/skylark/rules.html#runfiles should help you get started.

It looks like some commits have made it into 0.15 that 100% resolve this issue on c++ for me (https://github.com/bazelbuild/bazel/commit/f90ed652e223fffdf3f64cf1d9f49663be540b18#diff-73cc3e84377e7c63ef4406039e060016), but they don't necessarily fix it for python.

The new include_runfiles parameter to the pkg_tar rule will copy over all the required files for both c++ and python, but it doesn't correctly update the python runfile paths to match. The c++ paths work fine.
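
For readers who have not used the parameter, a minimal BUILD sketch might look like the following. The target names, file names, and package_dir are assumptions chosen to line up with the src:foo_tar example in this comment; they are not copied from the linked repository. The load path is the one for the packaging rules bundled with Bazel at the time.

```python
# src/BUILD (Starlark, which uses Python syntax).
# Hypothetical layout: all names below are illustrative only.
load("@bazel_tools//tools/build_defs/pkg:pkg.bzl", "pkg_tar")

py_binary(
    name = "foo",
    srcs = ["foo.py"],
    # deps on pip-provided packages (e.g. numpy) would go here.
)

pkg_tar(
    name = "foo_tar",
    srcs = [":foo"],
    package_dir = "/opt",
    # Also add the runfiles of every target in srcs to the archive.
    include_runfiles = True,
)
```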

I created a repository that demonstrates the issue here https://github.com/curtismuntz/bazel_pkg_tar

After building via bazel build src:foo_tar and extracting the produced tarball, the following tree structure exists:

$ tree  -L 3
.
├── opt
│   ├── foo
│   └── foo.py
└── pypi__numpy_1_13_1
    ├── numpy
    │   ├── add_newdocs.py
    │   ├── compat
    │   ├── __config__.py
    │   ├── core
    │   ├── ctypeslib.py
    │   ├── _distributor_init.py
    │   ├── distutils
    │   ├── doc
    │   ├── dual.py
    │   ├── f2py
    │   ├── fft
    │   ├── _globals.py
    │   ├── _import_tools.py
    │   ├── __init__.py
    │   ├── lib
    │   ├── linalg
    │   ├── ma
    │   ├── matlib.py
    │   ├── matrixlib
    │   ├── polynomial
    │   ├── random
    │   ├── setup.py
    │   ├── testing
    │   ├── tests
    │   └── version.py
    ├── numpy-1.13.1.data
    │   └── scripts
    └── numpy-1.13.1.dist-info
        ├── DESCRIPTION.rst
        ├── METADATA
        ├── metadata.json
        ├── RECORD
        ├── top_level.txt
        └── WHEEL

Attempting to run opt/foo produces:

$ ./opt/foo   
Traceback (most recent call last):
  File "./foo", line 203, in <module>
    Main()
  File "./foo", line 139, in Main
    module_space = FindModuleSpace()
  File "./foo", line 86, in FindModuleSpace
    raise AssertionError('Cannot find .runfiles directory for %s' % sys.argv[0])
AssertionError: Cannot find .runfiles directory for ./foo
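
The assertion is raised by the launcher script that Bazel generates for py_binary, which expects to find a foo.runfiles directory. The following is a simplified, self-contained sketch of that kind of lookup, written to illustrate why the extracted layout fails. It is not the actual stub code: find_module_space is a hypothetical name, and the real launcher handles more cases (symlinked launchers, for example).

```python
import os
import re


def find_module_space(stub_path):
    """Return the runfiles directory for a launcher script, or raise.

    Simplified illustration of the lookup, not Bazel's real stub.
    """
    stub_path = os.path.abspath(stub_path)
    # Case 1: a sibling directory named "<binary>.runfiles".
    candidate = stub_path + ".runfiles"
    if os.path.isdir(candidate):
        return candidate
    # Case 2: the script itself already lives inside a "*.runfiles" tree.
    match = re.match(r"(.*\.runfiles)" + re.escape(os.sep) + ".*", stub_path)
    if match:
        return match.group(1)
    raise AssertionError("Cannot find .runfiles directory for %s" % stub_path)
```

In the extracted tarball shown above there is no opt/foo.runfiles directory and no .runfiles component anywhere in the path, so under these assumptions both checks fail and the launcher gives up.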

And trying to call opt/foo.py directly:

$ ./opt/foo.py 
Traceback (most recent call last):
  File "./foo.py", line 2, in <module>
    import numpy as np
ImportError: No module named 'numpy'

Pretty sure this is a bazel core py_binary issue, but is the plan for pkg_tar to provide this interface via include_runfiles? Or is it best practice to implement a skylark rule for this functionality?

CC @c4urself

@lberki : how hard would it be to opensource pkg_runfiles? The question came up again: https://stackoverflow.com/questions/52823983

Actually, it should be easy to implement pkg_runfiles in Starlark:

  • DefaultInfo.data_runfiles (or DefaultInfo.default_runfiles?) contains the File objects for the runfiles
  • FilesToRunProvider.runfiles_manifest contains the File object for the runfiles manifest
  • It should be easy to implement a custom C++ binary or Bash script that can parse a runfiles manifest and tar/zip up the runfiles it references
  • Using that binary/script, our hypothetical pkg_runfiles rule could create a ctx.actions.run / ctx.actions.run_shell with the binary/script, pass all the File objects for the runfiles and the manifest as inputs, and expect just a tar or zip as the output.
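
As a rough illustration of the last two bullets, here is a small sketch of such a packaging tool, written in Python rather than C++ or Bash purely for brevity. It assumes each manifest line has the form "<runfiles path> <source path>" separated by a single space, and that a line with no source path denotes a file that should be created empty (such as __init__.py). Both are assumptions about the manifest format and should be checked against the Bazel version in use. The function name package_runfiles, its arguments, and the default prefix are hypothetical.

```python
import tarfile


def package_runfiles(manifest_path, output_tar, prefix="foo.runfiles"):
    """Write every file listed in a runfiles manifest into a tar archive."""
    # dereference=True stores the content of symlinked sources, not the links.
    with open(manifest_path) as manifest, \
            tarfile.open(output_tar, "w", dereference=True) as tar:
        for line in manifest:
            line = line.rstrip("\n")
            if not line:
                continue
            # Assumed format: "<runfiles path> <source path>"; the source
            # path is empty for files that should be created empty.
            runfiles_path, _, source = line.partition(" ")
            arcname = prefix + "/" + runfiles_path
            if source:
                tar.add(source, arcname=arcname, recursive=False)
            else:
                # Zero-length regular file with no backing source.
                tar.addfile(tarfile.TarInfo(arcname))
```

A rule would then run this tool through ctx.actions.run, passing the manifest and the runfiles as inputs and declaring the tar as the only output. Runfiles paths containing spaces are not handled by this sketch.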

WDYT?

> FilesToRunProvider.runfiles_manifest contains File object for the runfiles manifest

I think this attribute should be removed from Starlark. The manifest contains absolute paths, so depending on this artifact is a good way to make your rule non-reproducible across machines. I raised this point on the mailing list, but it didn't generate much interest.

What would be helpful would be extending the Starlark runfiles API as I have proposed to allow complete introspection of runfiles objects. With my CL merged, it should be possible for Starlark to construct a runfiles tree identical to the one build-runfiles makes in whatever package is desired.

That's a good point.

Your proposal sounds good to me in general. My only question is how to distinguish runfiles types -- normal symlinks vs. empty files (__init__.py) vs. whatever else there could be? Implementing a pkg_runfiles rule would need that ability.

@benjaminp : let's continue this discussion on the thread: https://groups.google.com/d/msg/bazel-dev/uCfpNnVLJa4/Uomy07-iBgAJ

@laszlocsomor : depends on what you want to do -- we seem to have an API so that one can look into runfiles trees to see which symlinks there are and where they point. There are all sorts of little wrinkles, though, that need to be considered. They are mostly for Google-internal awful hacks we had the sense not to contaminate Bazel with, but still, some auditing will be required.
