Coverage.py 4.5.1 was released in February. I added wheels for Python 3.7 yesterday: https://pypi.org/project/coverage/4.5.1/#files . Then people said I broke their pipenv environments: https://github.com/nedbat/coveragepy/issues/679#issuecomment-406447940
Did I do the wrong thing? What should they do?
Pipenv keeps a list of artifact hashes in the lock file, checks whether the downloaded package matches any of them, and refuses to install if the downloaded artifact does not match. The new wheel is not recognised by the previous lock result, and is therefore rejected. This is a precautionary measure to encourage users to keep an eye on artifact downloads, and to check back with the maintainer to make sure a new artifact contains nothing malicious.
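Conceptually, the check amounts to something like the following (a rough sketch only, not Pipenv's actual implementation; the filename and hash values are made up for illustration):

```python
import hashlib

# Hashes recorded in the lock file at lock time (made-up values; only an
# sdist hash existed when this hypothetical project was locked).
locked_hashes = {
    "sha256:" + "0" * 64,
}

def sha256_of(path):
    """Return the sha256 of a downloaded artifact in pip's "sha256:<hex>" form."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()

# The newly uploaded wheel is not in the locked set, so installation is refused.
downloaded = "coverage-4.5.1-cp37-cp37m-manylinux1_x86_64.whl"
if sha256_of(downloaded) not in locked_hashes:
    raise SystemExit("Hash mismatch for {}; refusing to install".format(downloaded))
```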
So no, you did nothing wrong; this is normal. Users can run pipenv lock again to get the new hashes, and things will be back to normal.
Sorry for the disturbance. We probably should add this instruction to the hash mismatch error messages directly, so both users and package maintainers know exactly how to react when this happens.
Yeah basically you broke python :(
Just kidding. People will have lockfiles for python 3.7 which only have sdists and hashes for those sdists in them. Now pip and pipenv will prioritize the wheel, which seems like it should be simple to work around but for various reasons it isn’t
We may want to brainstorm a bit to find out whether it is possible to improve the user/maintainer experience on this.
Perhaps the hash storage should include the distribution type? I just ran into the same issue with pluggy, which added wheel files this April, so a Pipfile.lock file committed in March is suddenly suspicious.
Note that the sha256 in that lock file matches the source distribution on PyPI. If Pipenv recorded more information about the distribution type used to install the version, then the error message could be made more informative.
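For illustration only (this structure is hypothetical, not how Pipfile.lock actually stores things), tagging each hash with its distribution type might look like this:

```python
# How the information is (conceptually) stored today: bare hashes, with no hint
# of which artifact each one came from.
current_entry = {
    "pluggy": {
        "version": "==<pinned version>",
        "hashes": ["sha256:<hash of the sdist>"],
    }
}

# A hypothetical richer entry: each hash is tagged with its distribution type
# and filename, so a mismatch could be reported as "a wheel appeared that did
# not exist at lock time" instead of a bare hash error.
proposed_entry = {
    "pluggy": {
        "version": "==<pinned version>",
        "artifacts": [
            {"type": "sdist",
             "filename": "pluggy-<version>.tar.gz",
             "hash": "sha256:<hash of the sdist>"},
        ],
    }
}
```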
Breaking the reproducibility of builds is Very Bad.
_Users can run pipenv lock again to get the new hashes, and things will be back to normal._
Please keep in mind the common use case of resetting a project to an old lockfile in order to troubleshoot. If the original artifact isn't available, users will just want to update the hash of that exact version so the result is as close to the original as humanly possible.
Thank you for your helpful feedback.
We would be happy to review pull requests that modify the installation behavior so that it tries whatever other approaches haven't been attempted yet; I'm not sure it's as simple as you seem to think.
It's rarely as simple as I think when I'm the one who wrote it. I guess the underlying problem here is that pypi won't let you overwrite a particular file, but it will let you add new artifacts to an existing version. So pipenv, to ensure builds are reproducible, hashes all artifacts associated with a version.
Is the solution here on pypi's end? If it had a way to seal a version so that no new artifacts can be added, pipenv could only consider sealed versions.
With the sealed version approach, the most likely scenario is almost nobody would seal, and the situation would be worse. If this is to be resolved on PyPI’s side, the likely solution would be to only allow files to be uploaded in one batch, and disallow any subsequent changes. I doubt if they would want to do this, however.
I guess the better approach may be to add an environment variable or flag for the user to explicitly disable hash checks. If the user wants to forgo security, that’s their decision.
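As a rough sketch of what that opt-out might look like (the variable name below is hypothetical, not an existing Pipenv setting):

```python
import os

def hash_checks_enabled():
    # Hypothetical opt-out: this variable name does not exist in Pipenv today;
    # it only illustrates an explicit, user-chosen escape hatch, with the
    # secure behaviour remaining the default.
    return os.environ.get("PIPENV_SKIP_HASH_CHECK", "").lower() not in ("1", "true", "yes")
```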
Why can't PipEnv be taught to allow for additional distribution formats? I'm fine with an explicit command or switch to include additional hashes for an existing version. That'd be infinitely preferable over completely ignoring hashes altogether (which is an invitation to get hacked).
The functionality would have to make it clear that new distributions were added after the lock file was generated, at which point you can leave it up to the developer to decide whether to trust the new distribution and update the lock file.
The use case is not all that common, but when it does happen, it is confusing as hell for anyone not steeped in the deep arcane magic of Python distribution formats.
I fear that any 'ignore the hashes' advice is going to leave developers vulnerable to attack.
The vector the hashes are trying to prevent is an attacker manipulating download semantics so that a URL-pointed resource points to something else.
Say you want foo==1.3.0. pip uses its internal logic to find a URL for it, say https://pythonhosted.org/foo/1.3.0/foo-1.3.0.tar.gz, and tries to download it. The attacker, however, manipulates the download mechanism, and makes pip download something different from the actual thing it wants.
In this scenario, a hash update on a compromised machine would defeat the hash list completely. An attacker would simply re-route not only the package URL, but also the URL used for fetching hashes, and add the malicious package’s hash into the list. The hash updater would be tricked into thinking “oh, there’s a new file uploaded for this version! better allow it”.
The current hash list feature is designed to allow you to lock on a machine known to be safe, optionally send it through additional screening (e.g. GitHub’s new security alert feature), and use it to install packages. Updating hashes when you install, as mentioned above, would render it useless.
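To make the point concrete: any change to an artifact, however small, produces a completely different digest, so a hash recorded on a trusted machine reliably catches the swap — but if the hashes are regenerated on the compromised machine, the comparison proves nothing. A toy example with in-memory bytes standing in for real packages:

```python
import hashlib

original = b"bytes of foo-1.3.0.tar.gz as published by the maintainer"
tampered = b"bytes of foo-1.3.0.tar.gz as served by the attacker"

locked = hashlib.sha256(original).hexdigest()   # recorded earlier, on a trusted machine
served = hashlib.sha256(tampered).hexdigest()   # what actually comes down the wire

assert locked != served  # the locked hash exposes the substitution --
# unless the lock file itself was regenerated on the compromised machine,
# in which case `locked` would silently become the attacker's hash.
```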
While closing #2815 I have a new idea: maybe Pipenv can fall back to files it knows, instead of erroring out on the first mismatch. Say we have an sdist when the lock file is generated, but later a wheel is added. During sync, when Pipenv sees the wheel, it can ignore it and try to find other files that match the hash. If one does (and is installable), it would just use it and ignore the hash mismatch. This would avoid the “a new file is uploaded after version publish” problem, but still maintain the same level of guarantee from hash verification. Since PyPI lists multiple downloads on the same page anyway, I think we can tweak pip enough to make it work (or just implement our own finder with distlib).
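A rough sketch of that idea, assuming we already have the list of files PyPI offers for the pinned version and the hashes from the lock file (the function and data shapes here are illustrative, not pip's or Pipenv's actual internals):

```python
def pick_locked_artifact(candidates, locked_hashes):
    """Pick an artifact whose hash was recorded at lock time.

    `candidates` is a list of (filename, "sha256:<hex>") pairs as listed on
    PyPI for the pinned version; `locked_hashes` is the set of hashes from the
    lock file. Files uploaded after locking (e.g. a late wheel) are skipped
    rather than causing a hard failure.
    """
    known = [(name, digest) for name, digest in candidates if digest in locked_hashes]
    if not known:
        raise RuntimeError("no artifact matches the locked hashes; refusing to install")
    # Among the known files, prefer a wheel for install speed, else take the sdist.
    known.sort(key=lambda item: not item[0].endswith(".whl"))
    return known[0]
```

With that ordering, a wheel uploaded after the lock file was generated is simply ignored, while the sdist that was actually verified at lock time still installs.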
I've hit this issue today. zope.interface added wheels 6 days ago, there's no announcement (that I can find; this is a transitive dependency, so I'm not familiar with the project's administrative structure in the first place), there's no author information or details on PyPI, etc.
Updating the hashes to use the wheel now instead of the tarball is not a solution - we've performed testing with the tarball, checked over the source, and can reasonably expect a tarball with the same hash to work the same way. We can't make any of those guarantees for a wheel uploaded 6 months after the source release.
Pinning/locking versions is intended to prevent a known working project from suddenly breaking due to external dependency updates. The lock file should be a perfect image of a known working project configuration. If the tar was what was installed when the lock was created, there is no question that pipenv should grab that tar when installing again - especially considering that the original file is still available at the exact location it was previously. Maybe this will require extra metadata in the lock file if the current metadata is insufficient to distinguish a wheel from a tarball.
I don't think it's worth spending significant effort on edge cases (original file disappears, but is available with a different name but same hash) - the current behavior is breaking operation for the 95% to allow flexibility in the circumstances of the 5%, when there's already the option of just installing the new package and running pipenv lock.
For the record, we _are_ using wheels for some dependencies (ex: lxml). We're familiar with how the wheels were generated and they were included in our initial testing. We don't want to use only wheels or only tarballs.
As an added note, updating the lockfile is also a significant obstacle with CI. Our only recourse in that situation is to update the lockfile in _every affected branch_ (or every base branch and then merge the bases in/rebase every branch) which is a ton of work - resolving conflicts, adding commit noise, retesting to make sure the merges didn't mess things up, more merge conflicts (in the lockfile) when merging back, etc.
Yeah we definitely should just use whatever is in the Pipfile if it’s an allowable match. I don’t think I’ve ever held a different position about that but it’s just a matter of implementing it.
Related issue: pypa/pip#5874
pip has implemented hash-based artifact selection, which I believe is included in the recent 19.2 release. So users with an up-to-date pip should get the desired behaviour.