Pants: `CreatePex` taking ~3x longer in V2 than on the command line

Created on 22 Nov 2019  路  13Comments  路  Source: pantsbuild/pants

On the command line:

$ time pex --disable-cache black==19.3b0 setuptools -o black.pex
1.14s user 0.37s system 62% cpu 2.415 total

With this diff and running ./pants test src/python/pants/backend/python/rules:integration -- -k test_respects_config_file -s:

diff --git a/src/python/pants/backend/python/rules/pex.py b/src/python/pants/backend/python/rules/pex.py
index 6ffe54c3c..7097228fe 100644
--- a/src/python/pants/backend/python/rules/pex.py
+++ b/src/python/pants/backend/python/rules/pex.py
@@ -134,11 +134,14 @@ def create_pex(
     }
   )

+  import time
+  start = time.time()
   result = yield Get(
     ExecuteProcessResult,
     MultiPlatformExecuteProcessRequest,
     execute_process_request
   )
+  print(f"time: {time.time() - start}")
   yield Pex(directory_digest=result.output_directory_digest, output_filename=request.output_filename)


We get time: 8.380165815353394.

--

This has been an issue since at least June. See https://github.com/pantsbuild/pants/issues/7795.

engine performance

Most helpful comment

Is this actually going to buy us much? Separating the tool pex from the requirements pex is only a win if the requirements change, which isn't that frequent.

Even if so, the code is not much more complex and, importantly, things probably need to be structured this anyhow to have a reasonable hope of implementing tool shading for python like we do for the jvm going forward (#9206).

I agree though that measurement of components of the rule - extrinsic and intrinsic - is still needed. If snapshotting, for example, is slow that's a problem to quantify and fix since it is so crucial to the workings of v2.

All 13 comments

Note that apples to apples should compare time pex --disable-cache black==19.3b0 setuptools -o black.pex. The v2 rule is not leveraging the pex cache if it fires.

Note that apples to apples should compare time pex --disable-cache black==19.3b0 setuptools -o black.pex. The v2 rule is not leveraging the pex cache if it fires.

Good point. I updated the PR description to use this. V2 is still ~3x slower.

How many interpreters does the v2 example use? The CLI example uses 1, the current interpreter. If the v2 example is using interpreter constraints with open bounds it could be collecting more. When PEX collects more than one it resolves for each.

How many interpreters does the v2 example use? The CLI example uses 1, the current interpreter. If the v2 example is using interpreter constraints with open bounds it could be collecting more. When PEX collects more than one it resolves for each.

The example uses "CPython>=3.6". My system has Py36, Py37, and Py38 discoverable.

When running with the below diff, the first time I ran it I got 6.4 (compared to 8.4). But then I ran again two more times and got 8.2 and 8.5.

```diff
diff --git a/src/python/pants/backend/python/subsystems/black.py b/src/python/pants/backend/python/subsystems/black.py
index cc85a73c2..8234b7ef6 100644
--- a/src/python/pants/backend/python/subsystems/black.py
+++ b/src/python/pants/backend/python/subsystems/black.py
@@ -9,7 +9,7 @@ class Black(PythonToolBase):
options_scope = 'black'
default_requirements = ['black==19.3b0', 'setuptools']
default_entry_point = 'black:patched_main'
- default_interpreter_constraints = ["CPython>=3.6"]
+ default_interpreter_constraints = ["CPython==3.6.*"]

@classmethod
def register_options(cls, register):

I spent some time experimenting with running pex on my own linux system. Pants is running version 1.6.12 of pex, and I made sure I had the same version locally. Running this script ( https://gist.github.com/gshuflin/65b793bad41ca152d6c91a39112ca6b0 ), which replicates the argv passed to ExecuteProcessRequest when running ./pants test src/python/pants/backend/python/rules:integration -- -k test_respects_config_file -s:

gregs@gregs-dev-laptop > time test.sh 
54.53user 2.93system 0:59.18elapsed 97%CPU (0avgtext+0avgdata 236820maxresident)k
5800inputs+19808outputs (14major+1300649minor)pagefaults 0swaps
gregs@gregs-dev-laptop > time test.sh
52.20user 2.42system 0:54.13elapsed 100%CPU (0avgtext+0avgdata 236592maxresident)k
0inputs+0outputs (0major+1295682minor)pagefaults 0swaps

So, pex is taking about 54 seconds to run on my system with this set of requirements, completely independently of pants. I noticed that in the pants v2 UI the ExecuteProcessRequest associated with this command took somewhat more than 60 seconds to run, so it's possible there's still some overhead from the EPR abstraction that my local test isn't capturing. Still, it looks like the bulk of the performance problem is happening in pex rather than pants.

OK - the issue here is we create a fat PEX:
https://github.com/pantsbuild/pants/blob/8e5af2b1865befc9838b9a110bd9dc2d2cd03215/src/python/pants/backend/python/rules/python_test_runner.py#L94-L121

This is slightly worse than the v0 python pipeline for Pants @benjyw replaced with the current v1 python pipeline in Pants v1 in that the v2 monolithic pex, in addition to invalidating for user sources changes, user requirement changes or pytest requirement changes, re-builds for any of those changes without the benefit of a resolver cache.

To speed things up we need one or both of the current v1 speed hacks:

  1. Break up the monolthic pex into a pytest pex, a user requirements pex and a sources/resources pex and use PEX_PATH to merge them.
  2. Capture the /pex_root from the EPR, somehow merge these from all such EPRs and use them as inputs to all such EPRs; ie: enable the pex cache to be persistent across EPRs in some sane way.

So - yay! This is very likely footgunning and not anything sneaky like the LMDB store.

Break up the monolthic pex into a pytest pex, a user requirements pex and a sources/resources pex and use PEX_PATH to merge them.

Woot.

Capture the /pex_root from the EPR, somehow merge these from all such EPRs and use them as inputs to all such EPRs; ie: enable the pex cache to be persistent across EPRs in some sane way.

This part is, unfortunately, challenging to do without intentionally busting out of the sandbox into an absolute path. And busting out of the sandbox means the optimization won't work in the context of remote execution (it won't hurt... it just won't work).

I'll knock out the easy part here and seperate the tool pex from the requirements pex. The looming issue of cold resolves for any resolver family (resolver can't use a cache), will have to wait since it is so engine unfriendly.

Is this actually going to buy us much? Separating the tool pex from the requirements pex is only a win if the requirements change, which isn't that frequent.

AFAIK the sources aren't built into the pex but provided as input files to the EPR. Could the extra time be going on all the snapshotting and chrooting of those files? Would the daemon help here?

Is this actually going to buy us much? Separating the tool pex from the requirements pex is only a win if the requirements change, which isn't that frequent.

Yes, I believe it will becauase there are so many possible permutations of requirements pexes. You might have one with just colors, one with colors and ordered-set, one with just ordered-set, etc. Each of those must be their own requirements PEX. Ditto on interpreter constraints. You might have a Py2 colors pex, Py3 colors pex, and so on.

It takes about 3-5 seconds to create just the Pytest tool Pex last time I tried on the CLI, so this change will save us 3-5 seconds every time we encounter a new permutation of the requirements x interpreter constraints.

Is this actually going to buy us much? Separating the tool pex from the requirements pex is only a win if the requirements change, which isn't that frequent.

Even if so, the code is not much more complex and, importantly, things probably need to be structured this anyhow to have a reasonable hope of implementing tool shading for python like we do for the jvm going forward (#9206).

I agree though that measurement of components of the rule - extrinsic and intrinsic - is still needed. If snapshotting, for example, is slow that's a problem to quantify and fix since it is so crucial to the workings of v2.

Should be fixed now that we have "internal_only" PEXes that only resolve for one interpreter.

Was this page helpful?
0 / 5 - 0 ratings