Since Bazel needs deterministic outputs for each rule, archive files such as zip, tar, or jar are often used when a rule needs a variable number of outputs.
Unfortunately, off-the-shelf archiving tools often embed metadata (such as timestamps) or add files in a non-deterministic order. So rule authors need to jump through hoops (using touch or custom code to create zips or jars).
It would be nice to have an action like file_action that gives you a directory and the name of an archive file (returned as a pair). Users can copy or write files into that directory, and after the rule completes it is deterministically zipped (or tarred), so that identical directory contents yield a bit-for-bit identical archive.
On the consumption side another function could open the archive and return a directory to read the contents.
This has the additional benefit of sidestepping whatever zip/tar happens to be on the PATH, which can accept different flags on Linux, macOS, or Windows.
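For illustration, here is a minimal Python sketch of what such a deterministic zipping step could do. It is not an existing Bazel API: the function name `deterministic_zip` and the choice of fixed values (1980-01-01 timestamps, 0644 permissions, no compression) are assumptions made for the example.

```python
import os
import zipfile

# Assumed fixed timestamp for every entry; 1980-01-01 is the earliest
# date the zip format can represent.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def deterministic_zip(src_dir, zip_path):
    """Zip src_dir so the output depends only on relative paths and contents."""
    entries = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(root, name)
            # Always store forward slashes, whatever the host OS uses.
            rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
            entries.append((rel, full))
    entries.sort()  # os.walk order depends on the file system

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
        for rel, full in entries:
            info = zipfile.ZipInfo(rel, date_time=FIXED_DATE_TIME)
            info.create_system = 3  # pin "Unix"; the default varies by platform
            info.external_attr = 0o644 << 16  # ignore on-disk permissions
            info.compress_type = zipfile.ZIP_STORED
            with open(full, "rb") as f:
                zf.writestr(info, f.read())
```

Note that this sketch drops the executable bit and skips empty directories; a real implementation would have to decide how to treat both.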
@laurentlb , @vladmos : FYI
Of course, a longer term solution is to allow directories as bazel outputs. So bazel would compute the hash by sorting the paths and hashing in the sorted order.
This would have the additional performance benefit of not creating and opening zips all the time just to use them as RPC between different build steps.
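To make the "sort the paths and hash in sorted order" idea concrete, here is a small Python sketch. It only illustrates the idea from this comment and is not how Bazel computes digests; `directory_digest` is a hypothetical name, and the use of SHA-256 and a length-prefixed path encoding are assumptions.

```python
import hashlib
import os


def directory_digest(root):
    """SHA-256 over (relative path, content hash) pairs in sorted path order."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            paths.append((rel, full))

    digest = hashlib.sha256()
    for rel, full in sorted(paths):
        with open(full, "rb") as f:
            content_hash = hashlib.sha256(f.read()).digest()
        encoded = rel.encode("utf-8")
        # Length-prefix the path so adjacent fields cannot run together.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
        digest.update(content_hash)
    return digest.hexdigest()
```

Timestamps and creation order never enter the digest, so two directories with the same relative paths and contents hash identically. Empty directories are ignored in this sketch.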
@johnynek : We actually have that feature in the Google-internal version of Bazel, and it's something we'd rather forget the existence of, to put it lightly. (You can find traces of it in opensource Bazel too, if you look for "fileset".) The problem isn't so much outputting a directory, as it is consuming it from downstream rules, plus making sure everything works nicely / efficiently / correctly.
The current situation is a real bummer, since at least half the rules I write need this feature, and the first thing I start worrying about is whether I am making the archive a pure function of its inputs.
This might be somewhat related to #1311.
At the moment I'm using my own internal rule to consume targets like cc_library and filegroup to create tarballs and zips with the right file layout. It's not that much of a nightmare from my perspective. Making that cross-platform, however, is blocked for the moment on the issue linked above.
This is kind of the same as #2336.
I recommend @bazel_tools//tools/zip:zipper.
Usage: external/bazel_tools/tools/zip/zipper/zipper [vxc[fC]] x.zip [-d exdir] [[zip_path1=]file1 ... [zip_pathn=]filen]
  v verbose - list all file in x.zip
  x extract - extract files in x.zip to current directory, or an optional directory relative to the current directory specified through -d option
  c create  - add files to x.zip
  f flatten - flatten files to use with create or extract operation
  C compress - compress files when using the create operation
x and c cannot be used in the same command-line.
For every file, a path in the zip can be specified. Examples:
  zipper c x.zip a/b/__init__.py= # Add an empty file at a/b/__init__.py
  zipper c x.zip a/b/main.py=foo/bar/bin.py # Add file foo/bar/bin.py at a/b/main.py
If the zip path is not specified, it is assumed to be the file path.
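As a rough sketch of how a wrapper script might assemble these arguments, the Python helper below builds the argument list for a `zipper c` invocation from a mapping of archive paths to source files. The helper name `zipper_create_args` is hypothetical; it relies only on the `[zip_path=]file` syntax and the `c`/`C` flags shown in the usage text above, and it sorts the entries so the command line itself does not depend on dictionary order.

```python
def zipper_create_args(zip_file, mapping, compress=False):
    """Return arguments for `zipper c`, given {zip_path: source_file_or_None}.

    A source of None produces `zip_path=`, which the usage text describes
    as adding an empty file at that path.
    """
    flags = "cC" if compress else "c"
    args = [flags, zip_file]
    for zip_path in sorted(mapping):
        source = mapping[zip_path]
        args.append(zip_path + "=" + (source or ""))
    return args


args = zipper_create_args(
    "x.zip",
    {"a/b/main.py": "foo/bar/bin.py", "a/b/__init__.py": None},
)
# args == ["c", "x.zip", "a/b/__init__.py=", "a/b/main.py=foo/bar/bin.py"]
```

Sorting the arguments removes one source of variation, but whether the resulting archive is fully deterministic still depends on zipper itself.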
Agreed that file tree primitives would be a very user friendly addition.
@pauldraper Can you confirm this zip is deterministic? That is, does it depend only on the set of files and their content, and not on the order of the files or on any timestamps?
It appears to be. Also, https://github.com/bazelbuild/rules_scala/pull/286
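One way to check a claim like this empirically is to build the archive twice under different conditions and compare the results. The Python sketch below is not tied to zipper; the function names are made up for the example. It compares two zip files byte for byte and, if they differ, gives a hint about which kind of metadata is responsible.

```python
import hashlib
import zipfile


def file_sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def entry_metadata(path):
    """Per-entry fields that commonly differ between otherwise equal zips."""
    with zipfile.ZipFile(path) as zf:
        return [(i.filename, i.date_time, i.external_attr) for i in zf.infolist()]


def explain_difference(zip_a, zip_b):
    """Return None if byte-identical, else a short hint about what differs."""
    if file_sha256(zip_a) == file_sha256(zip_b):
        return None
    meta_a, meta_b = entry_metadata(zip_a), entry_metadata(zip_b)
    names_a = [m[0] for m in meta_a]
    names_b = [m[0] for m in meta_b]
    if names_a != names_b:
        if sorted(names_a) == sorted(names_b):
            return "entry order differs"
        return "entry names differ"
    if meta_a != meta_b:
        return "timestamps or permissions differ"
    return "contents or compression differ"
```

A check like this can only show that two particular builds agree; it cannot prove determinism in general.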
Of course, a longer term solution is to allow directories as bazel outputs.
And this is now the case, with ctx.actions.declare_directory. The support for it can be iffy, e.g. BuildFarm doesn't support it now (though there is a fork that does).
What is the status of the request?
With declare_directory, can we close this issue?
I still think it is useful for building artifacts you might want to deploy.
I agree. Java rules, for example, work with classfile JAR archives, but not with classfile directories.
And Bazel's zipper works pretty well but lacks some nice features, e.g. the ability to map a directory path to another path in the archive.
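Until something like that exists, a directory can be remapped by expanding it into individual `zip_path=file` arguments before calling zipper. Here is a minimal Python sketch of that expansion; `map_directory` is a hypothetical helper, and it assumes the directory contents are available on disk when the script runs (for example inside a wrapper script executed by an action).

```python
import os


def map_directory(src_dir, archive_prefix):
    """Expand src_dir into sorted `zip_path=file` specs under archive_prefix."""
    prefix = archive_prefix.strip("/")
    specs = []
    for dirpath, _dirnames, filenames in os.walk(src_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
            zip_path = prefix + "/" + rel if prefix else rel
            specs.append(zip_path + "=" + full)
    # Sort so the argument order does not depend on the file system.
    return sorted(specs)
```

For example, `map_directory("out/classes", "lib/app")` would map `out/classes/pkg/Foo.class` to `lib/app/pkg/Foo.class` in the archive (the paths here are made up for illustration).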
There seems to be a general lack of portable, reproducible archiving tools (zip, tar, deb, ar, etc.). I started to implement some but stopped: https://github.com/lucidsoftware/rules_archive
Bazel has zipper plus some Python scripts for other archive formats. I think only zipper is accessible though.
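For the tar case, a reproducible archive can be sketched with nothing but the Python standard library. This is not one of Bazel's scripts; `deterministic_tar` is a hypothetical name, and the fixed values (mtime 0, mode 0644, uid/gid 0, GNU format, no compression) are assumptions chosen for the example.

```python
import os
import tarfile


def deterministic_tar(src_dir, tar_path):
    """Write an uncompressed tar whose bytes depend only on paths and contents."""
    entries = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
            entries.append((rel, full))
    entries.sort()  # fix the entry order

    # Pin the format explicitly; tarfile's default format changed between
    # Python versions. Compression is left out on purpose because a gzip
    # header can carry its own timestamp.
    with tarfile.open(tar_path, "w", format=tarfile.GNU_FORMAT) as tf:
        for rel, full in entries:
            info = tarfile.TarInfo(rel)
            info.size = os.path.getsize(full)
            info.mtime = 0  # drop the on-disk timestamp
            info.mode = 0o644  # drop the on-disk permissions
            info.uid = info.gid = 0  # drop the on-disk owner
            info.uname = info.gname = ""
            with open(full, "rb") as f:
                tf.addfile(info, f)
```

As with any such sketch, symlinks, executable bits, and empty directories would need an explicit policy in a real tool.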
I use minizip from zlib in one of my projects to create zip archives through Bazel, and it works well across platforms.
An action to run zip/tar would be useful. However, I think it should be implemented outside Bazel, in Starlark (possibly in https://github.com/bazelbuild/bazel-skylib or another repository).
Related to @laurentlb's comment: https://stackoverflow.com/q/53163137/7778502
By putting it outside Bazel, we have to solve the reproducibility problem of your zip/tar toolchain.
This challenge is considerable and impacts caching. Since this is pretty fundamental, it would be helpful to have it as a library function that does not require setting up any toolchain.
+1
I use the zipper currently
There is now https://github.com/bazelbuild/rules_pkg which is the home of replacements for native rules for tar, zip, deb, and rpm archives.
Currently, it requires installation of Python 2 or 3.
The zip tool is @rules_pkg//:build_zip. (Unlike the tar tools, however, it lacks the ability to compose archive paths.)