I have a binary that requires a large number of data files generated from a single source file.
As far as I can tell, Bazel has no way to express this. The architecture of Bazel forces us to enumerate our outputs during the Loading phase, but we won't have the list of outputs until we parse the source file, and we can't perform any parsing actions during the Loading phase.
Here are some workarounds I've considered:
1) I could zip the runfiles up during the build and declare only that zip as a data file. At runtime, I could run a wrapper script to unzip them into a temp directory before running the real binary. But it's a lot of files; launching the binary will be unacceptably slow.
2) The build is hermetic, so the output files are technically predictable; I could run the generator once, enumerate the files, and declare them all explicitly as outputs in my BUILD file. But then the BUILD file would just duplicate the data from the source file; I'd really prefer not to do that.
3) I tried generating the data files in a directory and declaring just the directory as an output, but I got a scary warning that "dependency checking of directories is unsound."
The currently recommended best practice is 1); this is how Java does things. You may be able to use tar to do it faster. 2) is also valid. Could you reuse the logic that generates the file list in your code and use it to generate the BUILD file?
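To make workaround 1) concrete, here is a minimal shell sketch of the archive-and-unpack wrapper, using tar as suggested above. All the names here (gen_out/, app.runfiles.tar, the dummy data files) are illustrative, not Bazel conventions:

```shell
set -eu

# Build time: stand in for the generator, then archive its output
# directory into a single runfile.
mkdir -p gen_out
echo "hello" > gen_out/data1.txt
echo "world" > gen_out/data2.txt
tar -cf app.runfiles.tar -C gen_out .

# Launch time: the wrapper unpacks into a temp dir, then would exec
# the real binary with RUNDIR as its data root.
RUNDIR="$(mktemp -d)"
tar -xf app.runfiles.tar -C "$RUNDIR"
ls "$RUNDIR"
```

tar avoids zip's per-file compression overhead, but the launch-time cost still scales with the number and size of files, which is the complaint in 1).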
Is this a bug in Bazel that can be fixed? In particular, can you explain what's wrong with option 3 and how/whether dependency checking directories could be done soundly?
I've seen the documentation that says, "dependency checking of directories is unsound," but I don't get it. Verifying the integrity of a zip file containing a tree of files is just as sound as verifying the integrity of the files unextracted. It's admittedly slower to stat/checksum a bunch of files than it is to stat/checksum a single zip file, but it's equally sound, right?
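The soundness claim can be made concrete: a deterministic walk that folds every relative path and file body into one digest is exactly as trustworthy as hashing a single archive of the same tree; only the I/O cost differs. A minimal Python sketch (the name tree_digest is mine, not anything from Bazel):

```python
import hashlib
import os

def tree_digest(root):
    """Deterministically hash a directory tree: visit entries in sorted
    order and fold each relative path and file's contents into one digest.
    Two trees with identical paths and contents produce identical digests;
    any changed, added, or removed file changes the digest."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix traversal order in place
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()
```

This ignores symlinks and permissions for brevity, but the point stands: per-file checksumming of a tree is no less sound than checksumming one zip of it.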
As for Option 2, that's a bad option because it means whenever anyone changes the source file, we'll have to do a manual pre-Bazel step to generate the BUILD file; Bazel can't generate its own BUILD files until after the loading phase.
You are correct about how 2 would have to work. You could add a check that the number of outputs matches the expected number, so that while people still have to re-run a script to regenerate the BUILD file, Bazel will at least catch an incomplete output list.
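As a sketch of that idea, the pre-Bazel script could reuse the generator's file list to render the BUILD file and fail loudly on a count mismatch. Everything here (emit_build_file, the generated_data target name) is hypothetical, not an established Bazel helper:

```python
def emit_build_file(output_names, expected_count):
    """Render a BUILD file declaring every generated output explicitly.
    Raises if the generator's file list doesn't match the expected count,
    so a stale or incomplete list is caught before Bazel ever loads it."""
    if len(output_names) != expected_count:
        raise ValueError(
            "generator produced %d outputs, expected %d; re-run the "
            "BUILD generator" % (len(output_names), expected_count))
    lines = [
        "filegroup(",
        '    name = "generated_data",',
        "    srcs = [",
    ]
    # Sort for a stable BUILD file, so re-runs don't churn the diff.
    for name in sorted(output_names):
        lines.append('        "%s",' % name)
    lines += ["    ],", ")"]
    return "\n".join(lines) + "\n"
```

The check only catches a wrong count, not a renamed file, but combined with Bazel's own missing-input errors it closes most of the gap.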
One of the Bazel devs will have to weigh in to get you more clarity, but my understanding is that Bazel treats each file as a target that can be referenced elsewhere. That's pretty well baked into Bazel, so any changes to that behavior would take a lot of work. You would have to convince them that there is enough value for this to be worth the work, and even then, it would be a while before they would have time. Using a directory also isn't a well-tested mechanism and likely won't be. That puts you on a lightly tested code path, which should also be a consideration.
We are already working on allowing actions to generate a set of output files which is subsequently treated as a unit. As Austin said, it's a significant change, and it will be a while before it's fully working and stable.
Any updates on this? I'm trying to generate API clients for various languages from a single Swagger spec and don't know ahead of time which files will be generated.
You can create a directory output in a Skylark rule with ctx.actions.declare_directory; see:
https://docs.bazel.build/versions/master/skylark/lib/actions.html#declare_directory
I can't say off the top of my head how to consume the directory from a downstream action.
You can consume it by declaring a dependency on the return value from declare_directory. That'll make the directory available in the downstream action. I think it may not work with remote execution yet.
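Putting those two comments together, here is a minimal Starlark sketch of a rule that declares a directory output which downstream targets can depend on. It only runs inside Bazel, so treat it as illustration; the //tools:generator label and attribute names are hypothetical:

```python
# Starlark sketch: a generator rule whose single declared output is a
# directory (tree artifact). Downstream rules that list this target in
# their inputs receive the directory and its contents.

def _generate_impl(ctx):
    # Declare the whole directory as one output; the action may write
    # any number of files underneath it.
    out_dir = ctx.actions.declare_directory(ctx.label.name + "_out")
    ctx.actions.run(
        outputs = [out_dir],
        inputs = [ctx.file.src],
        executable = ctx.executable._generator,
        arguments = [ctx.file.src.path, out_dir.path],
    )
    return [DefaultInfo(files = depset([out_dir]))]

generate_data = rule(
    implementation = _generate_impl,
    attrs = {
        "src": attr.label(allow_single_file = True),
        "_generator": attr.label(
            default = "//tools:generator",  # hypothetical tool target
            executable = True,
            cfg = "exec",
        ),
    },
)
```

A downstream genrule or rule that lists a generate_data target among its sources should then see the expanded tree in its action's inputs, which is the consumption pattern described above.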
Is there a fix for getting declare_directory to work remotely? Or an issue to follow?