poetry install fails for the given pyproject.toml (specifically with pyarrow 2.0.0) with the error zipfile.BadZipFile: File is not a zip file. The install works fine when poetry config experimental.new-installer false is set.
I believe this is not a duplicate of https://github.com/python-poetry/poetry/issues/2388 because this error is produced when installing from the public PyPI repository. I believe this is not a duplicate of https://github.com/python-poetry/poetry/issues/2674 because it persists after running poetry cache clear --all pypi.
Full traceback:
Using virtualenv: /Users/kyle/Library/Caches/pypoetry/virtualenvs/tmp-2O282PEO-py3.7
Installing dependencies from lock file
Finding the necessary packages for the current system
Package operations: 1 install, 0 updates, 0 removals, 1 skipped
• Installing numpy (1.19.4): Skipped for the following reason: Already installed
• Installing pyarrow (2.0.0)
Stack trace:
7 ~/.poetry/lib/poetry/installation/executor.py:202 in _execute_operation
200│
201│ try:
→ 202│ result = self._do_execute_operation(operation)
203│ except EnvCommandError as e:
204│ if e.e.returncode == -2:
6 ~/.poetry/lib/poetry/installation/executor.py:276 in _do_execute_operation
274│ return 0
275│
→ 276│ result = getattr(self, "_execute_{}".format(method))(operation)
277│
278│ if result != 0:
5 ~/.poetry/lib/poetry/installation/executor.py:411 in _execute_install
409│
410│ def _execute_install(self, operation): # type: (Install) -> None
→ 411│ return self._install(operation)
412│
413│ def _execute_update(self, operation): # type: (Update) -> None
4 ~/.poetry/lib/poetry/installation/executor.py:449 in _install
447│ args.insert(2, "-U")
448│
→ 449│ return self.run_pip(*args)
450│
451│ def _update(self, operation):
3 ~/.poetry/lib/poetry/installation/executor.py:300 in run_pip
298│ def run_pip(self, *args, **kwargs): # type: (...) -> int
299│ try:
→ 300│ self._env.run_pip(*args, **kwargs)
301│ except EnvCommandError as e:
302│ output = decode(e.e.output)
2 ~/.poetry/lib/poetry/utils/env.py:1042 in run_pip
1040│ pip = self.get_pip_command()
1041│ cmd = pip + list(args)
→ 1042│ return self._run(cmd, **kwargs)
1043│
1044│ def _run(self, cmd, **kwargs):
1 ~/.poetry/lib/poetry/utils/env.py:1332 in _run
1330│ self.unset_env("__PYVENV_LAUNCHER__")
1331│
→ 1332│ return super(VirtualEnv, self)._run(cmd, **kwargs)
1333│
1334│ def execute(self, bin, *args, **kwargs):
EnvCommandError
Command ['/Users/kyle/Library/Caches/pypoetry/virtualenvs/tmp-2O282PEO-py3.7/bin/pip', 'install', '--no-deps', 'file:///Users/kyle/Library/Caches/pypoetry/artifacts/ee/6f/d6/686cc5fcab3e5917f4baa20df3b737a0a33ec8f94b09724c03624424a9/pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl'] errored with the following return code 2, and output:
Processing /Users/kyle/Library/Caches/pypoetry/artifacts/ee/6f/d6/686cc5fcab3e5917f4baa20df3b737a0a33ec8f94b09724c03624424a9/pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
ERROR: Exception:
Traceback (most recent call last):
File "/Users/kyle/Library/Caches/pypoetry/virtualenvs/tmp-2O282PEO-py3.7/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
status = self.run(options, args)
File "/Users/kyle/Library/Caches/pypoetry/virtualenvs/tmp-2O282PEO-py3.7/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
return func(self, options, args)
File "/Users/kyle/Library/Caches/pypoetry/virtualenvs/tmp-2O282PEO-py3.7/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 324, in run
reqs, check_supported_wheels=not options.target_dir
File "/Users/kyle/Library/Caches/pypoetry/virtualenvs/tmp-2O282PEO-py3.7/lib/python3.7/site-packages/pip/_internal/resolution/legacy/resolver.py", line 183, in resolve
discovered_reqs.extend(self._resolve_one(requirement_set, req))
File "/Users/kyle/Library/Caches/pypoetry/virtualenvs/tmp-2O282PEO-py3.7/lib/python3.7/site-packages/pip/_internal/resolution/legacy/resolver.py", line 391, in _resolve_one
dist = abstract_dist.get_pkg_resources_distribution()
File "/Users/kyle/Library/Caches/pypoetry/virtualenvs/tmp-2O282PEO-py3.7/lib/python3.7/site-packages/pip/_internal/distributions/wheel.py", line 29, in get_pkg_resources_distribution
with ZipFile(self.req.local_file_path, allowZip64=True) as z:
File "/Users/kyle/local/anaconda3/lib/python3.7/zipfile.py", line 1258, in __init__
self._RealGetContents()
File "/Users/kyle/local/anaconda3/lib/python3.7/zipfile.py", line 1325, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
WARNING: You are using pip version 20.2.3; however, version 20.2.4 is available.
You should consider upgrading via the '/Users/kyle/Library/Caches/pypoetry/virtualenvs/tmp-2O282PEO-py3.7/bin/python -m pip install --upgrade pip' command.
at ~/.poetry/lib/poetry/utils/env.py:1074 in _run
1070│ output = subprocess.check_output(
1071│ cmd, stderr=subprocess.STDOUT, **kwargs
1072│ )
1073│ except CalledProcessError as e:
→ 1074│ raise EnvCommandError(e, input=input_)
1075│
1076│ return decode(output)
1077│
1078│ def execute(self, bin, *args, **kwargs):
It could be a _pip_ issue. At the point where the failure happens, it is _pip_ that is running (in a subprocess created by _poetry_).
Are you downloading directly from _PyPI_? No private index, cache proxy or anything of the sort?
The error message shows the full path to the wheel file for _pyarrow_, maybe you could try to locate it and inspect it manually: see if it is a well formed _zip_ file, and so on.
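One way to do that inspection is with Python's standard zipfile module, since a wheel is an ordinary zip archive. The following is a minimal sketch, not part of poetry or pip; the helper name inspect_wheel is hypothetical and the path is whatever the error message printed.

```python
import zipfile
from pathlib import Path


def inspect_wheel(path):
    """Return a short diagnosis of whether `path` is a readable zip archive."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    if not zipfile.is_zipfile(path):
        # A truncated download has no end-of-central-directory record,
        # which is the condition that makes ZipFile raise BadZipFile.
        return "not a zip file ({} bytes on disk)".format(path.stat().st_size)
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first member with a bad CRC,
        # or None when every member checks out.
        bad_member = archive.testzip()
    if bad_member is not None:
        return "corrupt member: {}".format(bad_member)
    return "ok"
```

Calling inspect_wheel on the cached pyarrow wheel path from the traceback would report the on-disk size as well, which can then be compared with the size listed on PyPI.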
Pip works fine when called directly.
> pip --version
pip 20.2.2 from /Users/kyle/local/anaconda3/lib/python3.7/site-packages/pip (python 3.7)
> pip install pyarrow==2.0.0
Collecting pyarrow==2.0.0
Using cached pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl (13.4 MB)
Requirement already satisfied: numpy>=1.14 in /Users/kyle/local/anaconda3/lib/python3.7/site-packages (from pyarrow==2.0.0) (1.19.1)
Installing collected packages: pyarrow
Successfully installed pyarrow-2.0.0
Edit: it also works fine when not using the cached file:
> pip install --no-cache-dir pyarrow==2.0.0
Collecting pyarrow==2.0.0
Downloading pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl (13.4 MB)
|████████████████████████████████| 13.4 MB 400 kB/s
Requirement already satisfied: numpy>=1.14 in /Users/kyle/local/anaconda3/lib/python3.7/site-packages (from pyarrow==2.0.0) (1.19.1)
Installing collected packages: pyarrow
Successfully installed pyarrow-2.0.0
Regarding "Pip works fine when called directly" (pip 20.2.2 from /Users/kyle/local/anaconda3/lib/python3.7/site-packages/pip):
I do not think it is as simple as that. Anyway, it is a different _pip_. Check the version number in the stack trace:
WARNING: You are using pip version 20.2.3;
Your stack trace shows this:
file:///Users/kyle/Library/Caches/pypoetry/artifacts/ee/6f/d6/686cc5fcab3e5917f4baa20df3b737a0a33ec8f94b09724c03624424a9/pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
In case that file still exists on the file system, I would encourage you to have a look at it. Maybe _poetry_ downloaded or wrote something wrong on disk.
Trying to install the wheel file at that path directly (with the global pip) fails:
> pip install /Users/kyle/Library/Caches/pypoetry/artifacts/ee/6f/d6/686cc5fcab3e5917f4baa20df3b737a0a33ec8f94b09724c03624424a9/pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
Processing /Users/kyle/Library/Caches/pypoetry/artifacts/ee/6f/d6/686cc5fcab3e5917f4baa20df3b737a0a33ec8f94b09724c03624424a9/pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
ERROR: Exception:
Traceback (most recent call last):
File "/Users/kyle/local/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 216, in _main
status = self.run(options, args)
File "/Users/kyle/local/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
return func(self, options, args)
File "/Users/kyle/local/anaconda3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 325, in run
reqs, check_supported_wheels=not options.target_dir
File "/Users/kyle/local/anaconda3/lib/python3.7/site-packages/pip/_internal/resolution/legacy/resolver.py", line 183, in resolve
discovered_reqs.extend(self._resolve_one(requirement_set, req))
File "/Users/kyle/local/anaconda3/lib/python3.7/site-packages/pip/_internal/resolution/legacy/resolver.py", line 391, in _resolve_one
dist = abstract_dist.get_pkg_resources_distribution()
File "/Users/kyle/local/anaconda3/lib/python3.7/site-packages/pip/_internal/distributions/wheel.py", line 29, in get_pkg_resources_distribution
with ZipFile(self.req.local_file_path, allowZip64=True) as z:
File "/Users/kyle/local/anaconda3/lib/python3.7/zipfile.py", line 1258, in __init__
self._RealGetContents()
File "/Users/kyle/local/anaconda3/lib/python3.7/zipfile.py", line 1325, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Also, the md5sum and file size don't match the file on PyPI: the cached file is about 10 MB, versus 13 MB on PyPI.
> md5sum /Users/kyle/Library/Caches/pypoetry/artifacts/ee/6f/d6/686cc5fcab3e5917f4baa20df3b737a0a33ec8f94b09724c03624424a9/pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
bebffb42b7e1f69affd2fa28632ea4ba /Users/kyle/Library/Caches/pypoetry/artifacts/ee/6f/d6/686cc5fcab3e5917f4baa20df3b737a0a33ec8f94b09724c03624424a9/pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
> md5sum pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
9c430a3e0ec33bb5a2b7171bb00647cd pyarrow-2.0.0-cp37-cp37m-macosx_10_13_x86_64.whl
So I think what happened is that my internet connection cut out in the middle of the download, but the first 10 MB were saved to the cache, and the next time I tried to install pyarrow, that fragment was used for the install. It seems like either pip or poetry should check that the local and remote hashes or file sizes match, and re-download the package if they don't?
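A hash check of that kind could look roughly like the sketch below. It assumes the expected SHA-256 digest is already known (for example from the lock file or from the file listing on PyPI); the function names sha256_of and cached_file_is_valid are hypothetical and are not poetry APIs.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_file_is_valid(path, expected_sha256):
    """Return True only if the cached file matches the expected digest."""
    return sha256_of(path) == expected_sha256.lower()
```

A caller would delete the cached file and download it again whenever cached_file_is_valid returns False. A truncated file like the 10 MB fragment above would fail this check.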
This error popped up after I aborted poetry update while still downloading the wheels; reproducible e.g. with torch/tensorflow dependencies that are pretty large:
$ pip uninstall -y torch
Found existing installation: torch 1.7.0
Uninstalling torch-1.7.0:
Successfully uninstalled torch-1.7.0
$ poetry update
Updating dependencies
Resolving dependencies... (49.5s)
Package operations: 1 install, 0 updates, 0 removals
• Installing torch (1.7.0): Downloading... 3%
^C^CException ignored in: <module 'threading' from '/usr/lib64/python3.9/threading.py'>
Traceback (most recent call last):
File "/usr/lib64/python3.9/threading.py", line 1411, in _shutdown
atexit_call()
File "/usr/lib64/python3.9/concurrent/futures/thread.py", line 31, in _python_exit
t.join()
File "/usr/lib64/python3.9/threading.py", line 1029, in join
self._wait_for_tstate_lock()
File "/usr/lib64/python3.9/threading.py", line 1045, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt:
The artifacts cache now contains broken dists, resulting in a BadZipFile:
$ poetry update
Updating dependencies
Resolving dependencies... (37.4s)
Package operations: 1 install, 0 updates, 0 removals
• Installing torch (1.7.0): Failed
EnvCommandError
Command ['/home/oleg.hoefling/.cache/pypoetry/virtualenvs/neural-knapsack-dE7ihQtM-py3.8/bin/pip', 'install', '--no-deps', 'file:///home/oleg.hoefling/.cache/pypoetry/artifacts/8f/0b/9a/52152cc1a51f13d2081ec3c41b5b16c0b37a819ea651c6130c24d7f3f0/torch-1.7.0-cp38-cp38-manylinux1_x86_64.whl'] errored with the following return code 2, and output:
Processing /home/oleg.hoefling/.cache/pypoetry/artifacts/8f/0b/9a/52152cc1a51f13d2081ec3c41b5b16c0b37a819ea651c6130c24d7f3f0/torch-1.7.0-cp38-cp38-manylinux1_x86_64.whl
ERROR: Exception:
Traceback (most recent call last):
File "/home/oleg.hoefling/.cache/pypoetry/virtualenvs/neural-knapsack-dE7ihQtM-py3.8/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 228, in _main
status = self.run(options, args)
File "/home/oleg.hoefling/.cache/pypoetry/virtualenvs/neural-knapsack-dE7ihQtM-py3.8/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
return func(self, options, args)
File "/home/oleg.hoefling/.cache/pypoetry/virtualenvs/neural-knapsack-dE7ihQtM-py3.8/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 323, in run
requirement_set = resolver.resolve(
File "/home/oleg.hoefling/.cache/pypoetry/virtualenvs/neural-knapsack-dE7ihQtM-py3.8/lib/python3.8/site-packages/pip/_internal/resolution/legacy/resolver.py", line 183, in resolve
discovered_reqs.extend(self._resolve_one(requirement_set, req))
File "/home/oleg.hoefling/.cache/pypoetry/virtualenvs/neural-knapsack-dE7ihQtM-py3.8/lib/python3.8/site-packages/pip/_internal/resolution/legacy/resolver.py", line 391, in _resolve_one
dist = abstract_dist.get_pkg_resources_distribution()
File "/home/oleg.hoefling/.cache/pypoetry/virtualenvs/neural-knapsack-dE7ihQtM-py3.8/lib/python3.8/site-packages/pip/_internal/distributions/wheel.py", line 29, in get_pkg_resources_distribution
with ZipFile(self.req.local_file_path, allowZip64=True) as z:
File "/usr/lib64/python3.8/zipfile.py", line 1269, in __init__
self._RealGetContents()
File "/usr/lib64/python3.8/zipfile.py", line 1336, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
at ~/.local/pipx/venvs/poetry/lib64/python3.9/site-packages/poetry/utils/env.py:1074 in _run
1070│ output = subprocess.check_output(
1071│ cmd, stderr=subprocess.STDOUT, **kwargs
1072│ )
1073│ except CalledProcessError as e:
→ 1074│ raise EnvCommandError(e, input=input_)
1075│
1076│ return decode(output)
1077│
1078│ def execute(self, bin, *args, **kwargs):
Clearing the cache indeed doesn't remove the incomplete downloads. Luckily, poetry prints the full path in the error message as can be seen in the above log. Thus:
$ rm -f /home/oleg.hoefling/.cache/pypoetry/artifacts/8f/0b/9a/52152cc1a51f13d2081ec3c41b5b16c0b37a819ea651c6130c24d7f3f0/torch-1.7.0-cp38-cp38-manylinux1_x86_64.whl
fixes the issue.
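If several downloads were interrupted, the same cleanup can be done for the whole artifacts directory. This is a rough sketch rather than a poetry feature: find_broken_wheels is a hypothetical helper, and the directory to scan is an assumption taken from the paths in the logs above (~/.cache/pypoetry/artifacts on Linux, ~/Library/Caches/pypoetry/artifacts on macOS).

```python
import zipfile
from pathlib import Path


def find_broken_wheels(artifacts_dir):
    """Return all .whl files under `artifacts_dir` that are not valid zip archives."""
    broken = []
    for wheel in sorted(Path(artifacts_dir).rglob("*.whl")):
        # is_zipfile() is False for truncated files because the zip
        # end-of-central-directory record is missing.
        if not zipfile.is_zipfile(wheel):
            broken.append(wheel)
    return broken
```

It is probably safest to print the returned paths and review them first, then call unlink() on each one so that the next poetry install or poetry update downloads those wheels again.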
@finswimmer @kylebarron aside from hash comparison, downloading to a temp file and moving it to artifacts on success could be another improvement, e.g.
--- a/poetry/installation/executor.py
+++ b/poetry/installation/executor.py
@@ -3,6 +3,7 @@ from __future__ import division

 import itertools
 import os
+import shutil
 import threading

 from concurrent.futures import ThreadPoolExecutor
@@ -12,6 +13,7 @@ from subprocess import CalledProcessError
 from poetry.core.packages.file_dependency import FileDependency
 from poetry.core.packages.utils.link import Link
 from poetry.core.pyproject.toml import PyProjectTOML
+from poetry.core.utils.helpers import temporary_directory
 from poetry.io.null_io import NullIO
 from poetry.utils._compat import PY2
 from poetry.utils._compat import WINDOWS
@@ -639,18 +641,21 @@ class Executor(object):
         done = 0
         archive = self._chef.get_cache_directory_for_link(link) / link.filename
         archive.parent.mkdir(parents=True, exist_ok=True)
-        with archive.open("wb") as f:
-            for chunk in response.iter_content(chunk_size=4096):
-                if not chunk:
-                    break
+        with temporary_directory() as tmpdir:
+            tmpfile = Path(tmpdir, link.filename)
+            with tmpfile.open("wb") as f:
+                for chunk in response.iter_content(chunk_size=4096):
+                    if not chunk:
+                        break

-                done += len(chunk)
+                    done += len(chunk)

-                if progress:
-                    with self._lock:
-                        progress.set_progress(done)
+                    if progress:
+                        with self._lock:
+                            progress.set_progress(done)

-                f.write(chunk)
+                    f.write(chunk)
+            shutil.move(str(tmpfile), str(archive))

         if progress:
             with self._lock:
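The same idea can be shown outside of poetry as a standalone sketch. Note that this is a variation on the diff above, not the same code: it creates the temporary file next to the destination and uses os.replace (Python 3 only) instead of temporary_directory plus shutil.move, so the final rename stays on one filesystem. The function name write_atomically and the .part suffix are hypothetical.

```python
import os
import tempfile


def write_atomically(chunks, destination):
    """Write an iterable of byte chunks to `destination` only if all chunks arrive."""
    directory = os.path.dirname(os.path.abspath(destination))
    os.makedirs(directory, exist_ok=True)
    # The temp file lives in the destination directory so that the final
    # os.replace() is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        # Only a complete file ever appears under the final name.
        os.replace(tmp_path, destination)
    except BaseException:
        # BaseException also covers KeyboardInterrupt (Ctrl+C mid-download).
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this shape, an aborted download leaves nothing under the final file name, so a later run would see a cache miss and download again rather than handing a fragment to pip.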