Pants: `--use-first-matching-interpreter` shebang doesn't play nicely with macOS Python 3.7 install

Created on 18 Aug 2020  路  40Comments  路  Source: pantsbuild/pants

pants fmt failure on master as of master plus just some typing changes in https://github.com/pantsbuild/pants/pull/10647 : env: python3.7: No such file or directory

OS is macOS 10.15.6. which python3 reveals interpreter in a venv that is v3.8.5.

```
$ ./pants fmt src/python/pants::
Compiling engine v0.0.1 (XXX)
Finished release [optimized + debuginfo] target(s) in 2m 49s
12:11:49 [INFO] initializing pantsd...
12:12:00 [INFO] pantsd initialized.
12:12:08.46 [INFO] Completed: Building black.pex with 2 requirements: black==19.10b0, setuptools
12:12:08 [WARN] /Users/tdyas/TC/pants/src/python/pants/base/exception_sink.py:359: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
process_title=setproctitle.getproctitle(),

12:12:08 [ERROR] 1 Exception encountered:

Engine traceback:
in select
in fmt goal
in pants.backend.python.lint.python_fmt.format_python_target
in Format with Black
in pants.engine.process.fallible_to_exec_result_or_raise
Traceback (most recent call last):
File "XXX/pants/src/python/pants/engine/process.py", line 229, in fallible_to_exec_result_or_raise
raise ProcessExecutionFailure(
pants.engine.process.ProcessExecutionFailure: Process 'Run Black on 413 files.' failed with exit code 127.
stdout:

stderr:
env: python3.7: No such file or directory

Traceback (most recent call last):
File "XXX/pants/src/python/pants/bin/local_pants_runner.py", line 255, in run
engine_result = self._run_v2()
File "XXX/pants/src/python/pants/bin/local_pants_runner.py", line 166, in _run_v2
return self._maybe_run_v2_body(goals, poll=False)
File "XXX/pants/src/python/pants/bin/local_pants_runner.py", line 183, in _maybe_run_v2_body
return self.graph_session.run_goal_rules(
File "XXX/pants/src/python/pants/init/engine_initializer.py", line 117, in run_goal_rules
exit_code = self.scheduler_session.run_goal_rule(
File "XXX/pants/src/python/pants/engine/internals/scheduler.py", line 552, in run_goal_rule
self._raise_on_error([t for _, t in throws])
File "XXX/pants/src/python/pants/engine/internals/scheduler.py", line 511, in _raise_on_error
raise ExecutionError(
pants.engine.internals.scheduler.ExecutionError: 1 Exception encountered:

Engine traceback:
in select
in fmt goal
in pants.backend.python.lint.python_fmt.format_python_target
in Format with Black
in pants.engine.process.fallible_to_exec_result_or_raise
Traceback (most recent call last):
File "XXX/pants/src/python/pants/engine/process.py", line 229, in fallible_to_exec_result_or_raise
raise ProcessExecutionFailure(
pants.engine.process.ProcessExecutionFailure: Process 'Run Black on 413 files.' failed with exit code 127.
stdout:

stderr:
env: python3.7: No such file or directory

bug python

Most helpful comment

It only would be picked back out of the cache if the BinaryPaths discovery was not re-run

This is false. The BinaryPaths discovery only finds a bootstrap interpreter to run the Pex CLI. The Pex CLI just needs some python 2.7 or python 3.5+. The interpreter used to build any individual PEX file is determined by the Pex CLI from interpreter constraints and platforms and the vagaries of what the Pex CLI happens to find on the system. This process is opaque to Pants today, although #10779 fixes that.

All 40 comments

Preserving the mypex.pex, it has a shebang that references python3.7. macOS installs /usr/bin/python3 but not /usr/bin/python3.7 so the pants or pex assumption that it can find python3.7 is a bit wrong.

Huh, to confirm, what does which -a python3.7 return on your machine?

Context on what's going on. For "internal" Pexes like mypy.pex, we now use the flag --use-first-matching-interpreter. Even if the constraints are loose like >=3.6, we will use the minimum version that matches on your machine. This avoids having to resolve requirements for every single valid interpreter, which results in a dramatic speedup when buildings Pexes. However, this requires that the runtime use the same interpreter that we used to build the Pex, which Pex achieves by setting the shebang.

Huh, to confirm, what does which -a python3.7 return on your machine?

Nothing since macOS does not install a python3.7 executable just /usr/bin/python3.

Gr, why Apple for not putting /usr/bin/python3.7..

@jsirois @benjyw @stuhood any ideas on how to approach this dilemma with --use-first-matching-interpreter? Pex was able to discover a Python 3.7 interpreter through searching for python and python3. But there is no python3.7 binary on the machine, so the shebang set from --use-first-matching-interpreter fails.

I do expect this problem to come up for other macOS users.

--

Tom, two short-term workarounds, neither of which are ideal.

  1. Change this to use Py38. But we unfortunately couldn't check that in.

https://github.com/pantsbuild/pants/blob/d5801ae1503d212ed0cff1b3725a5e1e999f4b40/pants.toml#L74-L78

  1. Install Python 3.7 on your machine via pyenv. (I know we can't really ask users "simply install a new interpreter due to this bug in Pex!")

https://github.com/pantsbuild/pex/blob/436dc554c1ac234e01529f9f275f7bd01130a51b/pex/interpreter.py#L197-L201 - pex assumes that it can form a binary name of pythonMAJOR.MINOR even if that is not actually the Python binary that it found when searching for an interpreter.

I actually think it may be OK to say "Pants requires a binary named python3.x, and therefore doesn't work with macos python, here is how to use pyenv to easily install other pythons"?

I agree with the sentiment. The tricky thing is, there is no one specific minor version we know we need. The Pex shebang could be python2.7 but it could also be python3.6 or python3.7.

For macOS, it appears that recent installations come with Python 3.7 with a missing binary for python3.7. But there is no guarantee that this will even be an issue for users if their built PEXes don't end up using python3.7. On older macOS installs, I expect this same problem would hit Python 3.6.

--

I am wondering, though, if we could handle --use-first-matching-interpreter differently. John changed it so that we do not set any interpreter constraints and we instead set the shebang. I wonder if, alternatively, we could set interpreter constraints, but for the specific interpreter that was chosen?

For macOS, it appears that recent installations come with Python 3.7 with a missing binary for python3.7. But there is no guarantee that this will even be an issue for users if their built PEXes don't end up using python3.7. On older macOS installs, I expect this same problem would hit Python 3.6.

@benjyw: Adding to Eric's point, this is a "standard" misfeature of macOS and, given the number of users likely on macOS, the more ergonomic design choice is for Pants/PEX to work-around this misfeature itself. The issue to me seems to be that an interpreter is found but instead of using the exact path to that interpreter, pex assumes it can just format _another_ binary name that it technically did not yet find in the filesystem (i.e., pythonMAJOR.MINOR). Why not just use the interpreter path that was found in the filesystem?

To further the point, when IntelliJ has me choose a venv for a project, it makes me choose the full path to the interpreter. No guesswork.

Ohhh @tdyas I really like that suggestion! @benjyw do you see any issue with it?

Normally, I'd be afraid it would break speculation, that that interpreter might be available locally but not in the remote environment. But that was always an issue, which we've worked around by marking building a Pex as being Platform specific. That is, the Pex you build locally isn't expected to work on the remote machine to begin with.

An important thing to remember with --use-first-matching-interperter is that it's solely for internal Pexes, like mypy.pex and pytest_runner.pex. Any Pex that gets built for external usage, like from the binary or awslambda goals, will not set the shebang so won't have this major coupling to the host machine.

Normally, I'd be afraid it would break speculation, that that interpreter might be available locally but not in the remote environment. But that was always an issue, which we've worked around by marking building a Pex as being Platform specific. That is, the Pex you build locally isn't expected to work on the remote machine to begin with.

I recommend Pants/Pex should not assume that the interpreter name is the same in both environments. They are different environments. As an example outside of Pants, Bazel's "toolchain" concept models this and I believe a local toolchain and a remote toolchain could resolve different interpreter names (although worth looking to see how Bazel does Python toolchains).

The platforms thing alone doesn't solve the issue, except on MacOS (assuming you're not remoting to a MacOS cluster). But you could still get into trouble creating a pex with a full-path shebang on your linux desktop and then trying to use it remotely.

But then, that's likely true even now, albeit maybe less often.

So the benefit of /usr/bin/env python3.x is that it is system-independent. The downside is that it makes an assumption that isn't true on some MacOS, namely that a python3.x binary exists. Setting the exact interpreter constraints may give us the best of both, but is also a performance hit, as pex has to do interpreter discovery. Can the solution be as simple as creating a python3.7 symlink if needed?

Can the solution be as simple as creating a python3.7 symlink if needed?

I had tried creating that symlink in /usr/bin but macOS security disallowed that even when running as root.

is that it is system-independent

For internal only Pexes, I don't know if we need to care about this, though? For external PEXes like binary, yes, for sure. But for internal, I don't think so. My only concern was remoting, but as discussed above, I think it ends up being fine.

And just checked again to verify:

root@XXX:/usr/bin# id -u
0
root@XXX:/usr/bin# ln -s python python3.7
ln: python3.7: Operation not permitted

We do care about this for internal-only pexes. "system-independent" != "platform-independent".

Imagine building a pex locally and then trying to run it on a remoting cluster, or vice versa. With speculation this can easily happen.

I see - right now we only guard against platform dependence.

Hm, well, I think I'd still advocate we go with this precise shebang fix. Speculation is not ready. And this could have very easily been broken already, like you have Py36 on your local machine but Py37 when remoting, and your constraints are >=3.6, so either is fine. --use-first-matching-interpreter would have broken that already.

True, but it's a less aggressive breakage. But yeah, for right now the precise shebang is probably the way to go. I'll put a change up.

OK so maybe dumb question, but for internal pexes why do we even bother embedding a shebang? What we really want is for pex (to whom we now delegate interpreter selection) to tell us "I selected 3.7", and Pants can then find an appropriate 3.7, depending on platform and invoke the interpreter on the pex.

Pants can then find an appropriate 3.7, depending on platform and invoke the interpreter on the pex.

I don't follow that. Pants no longer knows how to discover an interpreter.

But it's the only thing that can, properly, in that it knows where the pex will be running. pex, at build time, does not.

@tdyas is it possible to put a python3.7 symlink on the PATH but not in /usr/bin? I.e., in some user-owned dir on the PATH? Does that at least work?

But it's the only thing that can, properly, in that it knows where the pex will be running. pex, at build time, does not.

I disagree. The key assumption with internal_only and --use-first-matching-interpreter is that the interpreter you used at build time will continue to be available at runtime. The only way I see this breaking is speculation, which we're for now not worrying about.

For external Pexes, absolutely, it is not safe. Hence not using this optimization.

Pex should be selecting interpreters, no matter how the PEX was built, by looking at the requested requirement "wheels" embedded in the PEX and what local interpreters are compatible with those wheels and recursing. Everything else is a hack. Granted, we may need to hack if no one thinks they can implement this before I get back in town.

To be sure I understand how this works in practice: the hack is "use the shebang to bootstrap pex, pex's interpreter selection prioritizes the bootstrapping interpreter if it's compatible, and if we arrange the shebang correctly it will be" ?

pex's interpreter selection prioritizes the bootstrapping interpreter if it's compatible

Maybe add an option to pex to print out the path to the interpreter it selects? That would allow reusing the pex selection logic in other tools (e.g., pants).

Catching up with some of the above:

The issue to me seems to be that an interpreter is found but instead of using the exact path to that interpreter, pex assumes it can just format another binary name that it technically did not yet find in the filesystem (i.e., pythonMAJOR.MINOR). Why not just use the interpreter path that was found in the filesystem?

Pex fundamentally assumes its building a deployable; so local machine considerations are not in play. A PEX file has to generally assume it will run on some other machine so it cannot assume information from the current build machine as valid on target runtime machines.

I disagree. The key assumption with internal_only and --use-first-matching-interpreter is that the interpreter you used at build time will continue to be available at runtime. The only way I see this breaking is speculation, which we're for now not worrying about.

The --use-first-matching-interpreter feature on the Pex side says nothing about locality, it just says to build wheels for one interpreter (if wheels even need to be built). Its this bit that has been the awkward fit from the get go with this feature. Pants wants to say "build me a non-portable PEX that will only run on this machine" - that's fundamentally non-Pexy.

This Pex issue tracks fixing PEX runtime interpreter selection to just use local knowledge of available Pythons and contained distributions: https://github.com/pantsbuild/pex/issues/1020. That would solve the problem in this issue by making --use-first-matching-interpreter fully coherent on the Pex side as well as cleaning up other existing runtime interpreter selection issues.

Noting another problem - this time on the Pants side - with using absolute path shebangs related to #10751. If a PEX file gets a statically determined python baked in it becomes a time bomb in the CAS. It will get plucked out to run next week when that interpreter is removed and go boom. The only remedy at that point will be to nuke the local lmdb.

It seems in general for local execution that relies on local binaries (not a problem for remote executiuon where images are static) Pants has this problem. We always need to check a cached binary exists and then re-run discovery if it does not before attempting to finally run the desired Process that uses that binary as argv0. In the case of a which style discovery process, this could be baked into the local command runner on the Rust side. In more complex cases like Pex interpreter discovery where the interpreter must be found, executed, and data extracted from the runtime environment to check applicability, it is alot easier to approach from the Python rule side of things.

In the case of PEX files, we could have a rule that uses a ./my.pex -c '' fallible Process to run the cached pex noop which should always exit 0 if an appropriate interpreter is still found. This would mean, once per pantsd lifecycle we'd have to no-op execute each cached pex tool using an absolute path shebang to verify it does not need to be rebuilt much like #10751 does.

If a PEX file gets a statically determined python baked in it becomes a time bomb in the CAS. It will get plucked out to run next week when that interpreter is removed and go boom. The only remedy at that point will be to nuke the local lmdb.

It only would be picked back out of the cache if the BinaryPaths discovery was not re-run, which is handled by #10769. If the discovery was re-run, the inputs used to generate the Process for the real meat of the work would be different, and you wouldn't hit the cache for the new/different Process. So I think that that might be orthogonal to this issue.


On the original topic of this ticket: can this problematic python install be filtered out by setting:

$ cat ~/.pants.rc
[python-setup]
# Avoid ever using a python that is not provided by pyenv.
interpreter_search_paths = ["<PYENV>"]

...which was the solution original proposed on #9760 (which was broken at the time, but should work now that #9760 is resolved)?

It only would be picked back out of the cache if the BinaryPaths discovery was not re-run

This is false. The BinaryPaths discovery only finds a bootstrap interpreter to run the Pex CLI. The Pex CLI just needs some python 2.7 or python 3.5+. The interpreter used to build any individual PEX file is determined by the Pex CLI from interpreter constraints and platforms and the vagaries of what the Pex CLI happens to find on the system. This process is opaque to Pants today, although #10779 fixes that.

This unfortunately is still broken, but in a probably better way.

pants.engine.process.ProcessExecutionFailure: Process 'Building black.pex with 2 requirements: black==20.8b1, setuptools' failed with exit code 1.
stdout:

stderr:
  ERROR: Command errored out with exit status 1:
   command: /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/bin/python3.7 /Users/eric/.cache/pants/named_caches/pex_root/pip.pex/aef609891d42d65c887d1aeee58c46f6713a7e49/.deps/pip/pip install --ignore-installed --no-user --prefix /private/var/folders/sx/pdpbqz4x5cscn9hhfpbsbqvm0000gn/T/process-executionA1rmUx/.tmp/pip-build-env-eeemq7rq/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple/ -- 'setuptools>=41.0' setuptools-scm wheel
       cwd: None
  Complete output (13 lines):
  Looking in indexes: https://pypi.org/simple/
  Collecting setuptools>=41.0
    Using cached setuptools-50.3.0-py3-none-any.whl (785 kB)
  Collecting setuptools-scm
    Using cached setuptools_scm-4.1.2-py2.py3-none-any.whl (27 kB)
  Collecting wheel
    Using cached wheel-0.35.1-py2.py3-none-any.whl (33 kB)
  Installing collected packages: setuptools, setuptools-scm, wheel
  ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/Library/Python/3.7'
  Consider using the `--user` option or check the permissions.

  WARNING: You are using pip version 20.0.dev0; however, version 20.2.3 is available.
  You should consider upgrading via the 'pip install --upgrade pip' command.
  ----------------------------------------
ERROR: Command errored out with exit status 1: /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/bin/python3.7 /Users/eric/.cache/pants/named_caches/pex_root/pip.pex/aef609891d42d65c887d1aeee58c46f6713a7e49/.deps/pip/pip install --ignore-installed --no-user --prefix /private/var/folders/sx/pdpbqz4x5cscn9hhfpbsbqvm0000gn/T/process-executionA1rmUx/.tmp/pip-build-env-eeemq7rq/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple/ -- 'setuptools>=41.0' setuptools-scm wheel Check the logs for full command output.
pid 12981 -> /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/bin/python3.7 /Users/eric/.cache/pants/named_caches/pex_root/pip.pex/aef609891d42d65c887d1aeee58c46f6713a7e49 --disable-pip-version-check --isolated --no-python-version-warning --exists-action a -q --cache-dir /Users/eric/.cache/pants/named_caches/pex_root download --dest /private/var/folders/sx/pdpbqz4x5cscn9hhfpbsbqvm0000gn/T/process-executionA1rmUx/.tmp/tmp436je7ty/Library.Developer.CommandLineTools.Library.Frameworks.Python3.framework.Versions.3.7.bin.python3.7 --index-url https://pypi.org/simple/ --header Cache-Control:max-age=3600 --retries 5 --timeout 15 --constraint 3rdparty/python/constraints.txt black==20.8b1 setuptools exited with 1 and STDERR:
None

It seems that /usr/bin/python is broken in a pathological way on macOS. Ideally, we would avoid this for all users automatically, but I don't know how to go about achieving that. For example, we don't want to ban /usr/bin/python on Linux.

It seems that /usr/bin/python is broken in a pathological way on macOS. Ideally, we would avoid this for all users automatically, but I don't know how to go about achieving that. For example, we don't want to ban /usr/bin/python on Linux.

Oh, John's new script to find the Pex Python might work. It's a Python script. We can add logic to check if we're on darwin, and some other heuristic, so that we can skip it.

It would be great if you could add a lot more information about your Mac environment to this ticket since no approach I can see will avoid this error in a principled way. IE a revert may get things working again, but presumably, like before, by sheer luck.

I have鈥攍ike most macOS users afaict鈥攖wo interpreters in /usr/python that are severely busted:

  • /usr/bin/python{,2} -> /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
  • /usr/bin/python3 -> /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/bin/python3.7. On Benjy's machine, this is /Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/bin/python3.7)

I changed our logging messages for both finding the Bootstrap Interpreter, and finding an interpreter based on interpeter constraints, to log which was chosen. Both were correctly choosing Pants's venv's Py36 interpreter, rather than this busted system interpreter.

For good measure, I added this logic to the interpreter discover to avoid using the bad Pythons:

+                            python = os.path.realpath(sys.executable)
+
+                            # We avoid `/usr/bin/python` on macOS, which has pathological behavior
+                            # like `[Errno 13] Permission denied` when running a Pex.
+                            is_bad_system_installed_python = (
+                                sys.platform == "darwin"
+                                and (
+                                    python.startswith('/System/Library/Frameworks')
+                                    or python.startswith('/Library/Developer/CommandLineTools/')
+                                )
+                            )
+

Didn't make a difference, as Pants was choosing the right interpreter from the start.

From the above error message, it looks like Pex is for some reason running system Python 3.7 when running Pip. I couldn't figure out why it wasn't using the bootstrap interpreter, i.e. the venv's interpreter. The Process's argv was ['/Users/eric/DocsLocal/code/projects/pants/build-support/virtualenvs/Darwin/pants_dev_deps.py36.venv/bin/python3', './pex', '--output-file', 'black.pex', ...].

Afaict, we will want to reland #10779, but either:
1) Figure out why Pex isn't using the virtualenv interpreter, or
2) Teach Pex to always avoid these problematic interpreters.

Am I right that we can close this?

I think so, yes.

Was this page helpful?
0 / 5 - 0 ratings