pants sends invalid input root to remote execution

Created on 17 Sep 2020  Â·  14Comments  Â·  Source: pantsbuild/pants

Pants appears to be sending an invalid input root to Buildbarn when I use the example-python repo (with protobuf support) as the project.

client_1  | 04:38:14.03 [DEBUG] Got (nested) execute response: result {execution_metadata {worker: "{\"datacenter\":\"paris\",\"hostname\":\"ubuntu-worker.example.com\",\"rack\":\"4\",\"slot\":\"15\",\"thread\":\"5\"}" queued_timestamp {seconds: 1600317494 nanos: 15424400} worker_start_timestamp {seconds: 1600317494 nanos: 15926600} worker_completed_timestamp {seconds: 1600317494 nanos: 28122700} input_fetch_start_timestamp {seconds: 1600317494 nanos: 18120100} input_fetch_completed_timestamp {seconds: 1600317494 nanos: 28122700}}} status {code: 3 message: "Failed to create input directory \".\": Invalid filename: \"\""} message: "Action details (uncached result): /uncached_action_result/remote-execution/0f697c1f9194369758e5e5bb77f7189d2b6b5d2da611725a42be5d6c56171434/378/"

The error message Failed to create input directory \".\": Invalid filename: \"\"" suggests an issue similar to https://github.com/pantsbuild/pants/pull/10707 where relative paths needed to be normalized.

Looking at the input root, it seems that we passed / as the name of a file, which is invalid:

Screen Shot 2020-09-16 at 9 55 52 PM

To reproduce, in the example-python repo:

$ git remote add tdyas https://gitlab.com/tdyas/remote-apis-testing.git
$ git fetch tdyas
$ git checkout pants_client_support_protobuf
$ cd docker-compose
$ ./run.sh -s docker-compose-buildbarn.yml -c docker-compose-pants.yml

All 14 comments

The underlying Action submitted for execution was:


client_1 | 04:38:13.97 [DEBUG] built REv2 request (build_id=pants_run_2020_09_17_04_38_11_744_29884c4f83c94ac9966046ea45b1e419): action=command_digest {hash: "59679df9aaec89215aa24e2ad26f0b92fc246d3369aa5872f91da8cad41a16ac" size_bytes: 162} input_root_digest {hash: "94a887816e440de5c7e1cf1b35728568c029c2333ca5adc296e4c457381702f6" size_bytes: 245}; command=arguments: "./bin/protoc" arguments: "--python_out" arguments: "_generated_files" arguments: "helloworld/util/proto/config.proto" environment_variables {name: "PANTS_CACHE_KEY_TARGET_PLATFORM" value: "none"} output_directories: "_generated_files" platform {properties {name: "OSFamily" value: "Linux"}}; execute_request=instance_name: "remote-execution" action_digest {hash: "367097432c116e1944d46bd802fb0a5d0f5a9b896b5cce00b8d6eb978d38e228" size_bytes: 142}

Very likely an issue with source roots. This repo uses / as a source root, and our handling is sometimes off for it as it normally requires special casing. We need to do a better job testing it, given that Pants and Toolchain do not use this top-level source root.

I’ll take a look tomorrow if still relevant.

Sent from my iPhone.

On Sep 16, 2020, at 9:57 PM, Tom Dyas notifications@github.com wrote:


Pants appears to be sending an invalid input root to Buildbarn when I use the example-python repo (with protobuf support) as the project.

client_1 | 04:38:14.03 [DEBUG] Got (nested) execute response: result {execution_metadata {worker: "{\"datacenter\":\"paris\",\"hostname\":\"ubuntu-worker.example.com\",\"rack\":\"4\",\"slot\":\"15\",\"thread\":\"5\"}" queued_timestamp {seconds: 1600317494 nanos: 15424400} worker_start_timestamp {seconds: 1600317494 nanos: 15926600} worker_completed_timestamp {seconds: 1600317494 nanos: 28122700} input_fetch_start_timestamp {seconds: 1600317494 nanos: 18120100} input_fetch_completed_timestamp {seconds: 1600317494 nanos: 28122700}}} status {code: 3 message: "Failed to create input directory \".\": Invalid filename: \"\""} message: "Action details (uncached result): /uncached_action_result/remote-execution/0f697c1f9194369758e5e5bb77f7189d2b6b5d2da611725a42be5d6c56171434/378/"
The error message Failed to create input directory \".\": Invalid filename: \"\"" suggests an issue similar to #10707 where relative paths needed to be normalized.

Looking at the input root, it seems that we passed / as the name of a file, which is invalid:

To reproduce:

git checkout --branch=pants_client_support_protobuf https://gitlab.com/tdyas/remote-apis-testing.git
cd remote-apis-testing/docker-compose
./run.sh -s docker-compose-buildbarn.yml -c docker-compose-pants.yml
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.

It is relevant. Remote execution should never send an invalid input root regardless of other aspects of Pants (e.g., source roots). So at the very least, Pants needs additional validation work for input roots.

I'm seeing this for the input_digest to the Process that calls bin/protoc:

Snapshot(digest=Digest(fingerprint='e9b36d2cd63642a8ed0a601027fac57c8e5da642b5ee4869b65cc65e1a58cea6', serialized_bytes_length=415), files=('bin/protoc', 'helloworld/util/proto/config.proto', 'include/google/protobuf/any.proto', 'include/google/protobuf/api.proto', 'include/google/protobuf/compiler/plugin.proto', 'include/google/protobuf/descriptor.proto', 'include/google/protobuf/duration.proto', 'include/google/protobuf/empty.proto', 'include/google/protobuf/field_mask.proto', 'include/google/protobuf/source_context.proto', 'include/google/protobuf/struct.proto', 'include/google/protobuf/timestamp.proto', 'include/google/protobuf/type.proto', 'include/google/protobuf/wrappers.proto', 'readme.txt'), dirs=('_generated_files', 'bin', 'helloworld', 'helloworld/util', 'helloworld/util/proto', 'include', 'include/google', 'include/google/protobuf', 'include/google/protobuf/compiler'))

This is from running ./pants_from_sources test helloworld/util:: with this diff applied to Pants (8f4e3988e8f863a92dae4631fced28c85e40f7ec):

diff --git a/src/python/pants/backend/codegen/protobuf/python/rules.py b/src/python/pants/backend/codegen/protobuf/python/rules.py
index 7b6deda98..1f73acfae 100644
--- a/src/python/pants/backend/codegen/protobuf/python/rules.py
+++ b/src/python/pants/backend/codegen/protobuf/python/rules.py
@@ -74,7 +74,7 @@ async def generate_python_from_protobuf(
     )

     input_digest = await Get(
-        Digest,
+        Snapshot,
         MergeDigests(
             (
                 all_sources_stripped.snapshot.digest,
@@ -83,6 +83,9 @@ async def generate_python_from_protobuf(
             )
         ),
     )
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.info(input_digest)

     result = await Get(
         ProcessResult,
@@ -93,7 +96,7 @@ async def generate_python_from_protobuf(
                 output_dir,
                 *target_sources_stripped.snapshot.files,
             ),
-            input_digest=input_digest,
+            input_digest=input_digest.digest,
             description=f"Generating Python sources from {request.protocol_target.address}.",
             level=LogLevel.DEBUG,
             output_directories=(output_dir,),

I don't see an entry for an empty file, but possibly the __repr__ is misleading for the Snapshot and it is still there?

I added some debug logging to digest merging. The digest for the downloaded protoc has the empty filename _before_ any of the digest merging occurs:

07:28:18.17 [INFO] digest = Ok(Some(directories {name: "helloworld" digest {hash: "757487a61bf7eba5b703922f6e32b508ab653298e9fb98e998a266b2112c54c5" size_bytes: 78}}))
07:28:18.17 [INFO] digest = Ok(Some(directories {digest {hash: "ef427fc5175ee72b07538148decff7dd930b4449fcd5ad1d4970268407820c8f" size_bytes: 243}}))
07:28:18.17 [INFO] digest = Ok(Some(directories {name: "_generated_files" digest {hash: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}}))
07:28:18.17 [INFO] merged = Ok(Some(directories {digest {hash: "ef427fc5175ee72b07538148decff7dd930b4449fcd5ad1d4970268407820c8f" size_bytes: 243}} directories {name: "_generated_files" digest {hash: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}} directories {name: "helloworld" digest {hash: "757487a61bf7eba5b703922f6e32b508ab653298e9fb98e998a266b2112c54c5" size_bytes: 78}}))

RemovePrefix intrinsic is being passed a sub-Directory which doesn't have the name:

09:02:11.30 [INFO] strip_prefix: dir=directories {name: "out" digest {hash: "ee8dd84213b9b3c19ab12fe50f5758d5b81b03388aeba4dd7037fc4e7450a3e2" size_bytes: 73}}
09:02:11.30 [INFO] strip_prefix: "out" => directories {digest {hash: "ef427fc5175ee72b07538148decff7dd930b4449fcd5ad1d4970268407820c8f" size_bytes: 243}}

To reproduce:

  1. Ensure you are running on a Linux system (a VM is fine).

  2. Clone the remote-apis-testing project including my fork of it:

$ git clone https://gitlab.com/remote-apis-testing/remote-apis-testing.git
$ cd remote-apis-testing
$ git remote add tdyas https://gitlab.com/tdyas/remote-apis-testing.git
$ git fetch tdyas
$ git checkout --track tdyas/pants_client_support_protobuf
  1. Ensure you have a clone of the example-python repository.

  2. Ensure you have a clone of the Pants repository including my fork and debugging branch as a sibling to the remote-apis-testing clone:

$ git clone https://github.com/pantsbuild/pants.git
$ cd pants
$ git remote add tdyas https://github.com/tdyas/pants.git
$ git fetch tdyas 
$ git checkout --track tdyas/protobuf_input_root_issues

This branch on my fork adds the debugging code that produced the logs in earlier comments.

  1. Start a BuildBarn server in a separate terminal:
$ cd remote-apis-testing/docker-compose
$ DOCKER_BUILDKIT=1 docker build --target=bb-runner -t bb-runner:latest docker
$ sudo rm -rf bb worker storage-*
$ mkdir -p worker/{build,cache,remote-execution}
$ docker-compose -f docker-compose-buildbarn.yml up

You now have a BuildBarn instance running at 127.0.0.1:8980. (It is necessary to use the pants_client_support_protobuf branch in order to ensure Python 3 is installed in the runner environment as well as the unzip command needed by the Pants external tool support.)

  1. Run the example-python tests against that BuildBarn instance:
$ cd example-python
$ rm -rf ~/.cache/pants/lmdb_store/
$ ./pants_from_sources -ldebug --remote-execution --remote-execution-server=127.0.0.1:8980 --remote-store-server=127.0.0.1:8980  --remote-instance-name=remote-execution --remote-execution-extra-platform-properties=OSFamily=Linux test helloworld::

Full log from the running the above instructions:

log.txt

Thanks for adding all of that logging @tdyas . So far I have not run the example (sorry!), but from looking at the log, my hypothesis is that:

  1. When we run a process to extract the protobuf executable in https://github.com/pantsbuild/pants/blob/2fd32f02bd2cb26a1f3fffffee1a2880a0d5631d/src/python/pants/core/util_rules/archive.py#L47-L61, we use an output_directory out/.
  2. Buildgrid interprets the trailing slash as representing a directory with an empty name: ie, a directory made up of the components ["out", ""]. According to the spec (https://github.com/bazelbuild/remote-apis/blob/master/build/bazel/remote/execution/v2/remote_execution.proto#L634-L653) an empty path name is legal, but afaict, it's a non-nonsensical thing to allow.
  3. Remote execution for the extraction process returns a directory for ee8dd84213b9b3c19ab12fe50f5758d5b81b03388aeba4dd7037fc4e7450a3e2 which contains the top-level out directory and a nested "empty"-directory-named child.
  4. The strip prefix call a few lines down from the execution observes the valid parent and invalid child (the invalid child has a digest of ef427fc5175ee72b07538148decff7dd930b4449fcd5ad1d4970268407820c8f) but successfully strips the prefix
  5. etc

So:

  1. I'm going to adjust our Process construction code to either fail-for or normalize unnormalized output_directories/output_files, because we can't guarantee that implementations/servers will behave a particular way. Failing feels maybe-reasonable (for now), because it's easy to loosen the restriction in the future.
  2. I'm considering adjusting https://github.com/pantsbuild/pants/blob/2fd32f02bd2cb26a1f3fffffee1a2880a0d5631d/src/rust/engine/process_execution/bazel_protos/src/verification.rs#L5 to ban empty child names and filing a bug in https://github.com/bazelbuild/remote-apis to add that to the definition of "canonical".

cc @illicitonion

Buildgrid interprets the trailing slash as representing a directory with an empty name: ie, a directory made up of the components ["out", ""]. According to the spec (https://github.com/bazelbuild/remote-apis/blob/master/build/bazel/remote/execution/v2/remote_execution.proto#L634-L653) an empty path name is legal, but afaict, it's a non-nonsensical thing to allow.

The trailing slash is actually illegal according to the spec:

The path MUST NOT include a trailing slash, nor a leading slash, being a relative path.

https://github.com/bazelbuild/remote-apis/blob/f54876595da9f2c2d66c98c318d00b60fd64900b/build/bazel/remote/execution/v2/remote_execution.proto#L507-L508

@stuhood: Just to prove out the hypothesis, would a good test be to just change the output directory to out?

Yes. If you already have the test environment set up, doing that would validate.

That worked. This diff works around the issue:

diff --git a/src/python/pants/core/util_rules/archive.py b/src/python/pants/core/util_rules/archive.py
index cf7ccd39a..d7e993988 100644
--- a/src/python/pants/core/util_rules/archive.py
+++ b/src/python/pants/core/util_rules/archive.py
@@ -44,7 +44,7 @@ async def maybe_extract(extractable: MaybeExtractable) -> ExtractedDigest:
     digest = extractable.digest
     snapshot = await Get(Snapshot, Digest, digest)
     if len(snapshot.files) == 1:
-        output_dir = "out/"
+        output_dir = "out"
         extraction_cmd = get_extraction_cmd(snapshot.files[0], output_dir)
         if extraction_cmd:
             extraction_cmd_str = " ".join(extraction_cmd)
Was this page helpful?
0 / 5 - 0 ratings