-vvv option). I created an empty project and ran poetry add allennlp. It takes ages to resolve the dependencies.
Could this be due to downloading packages from pypi to inspect their dependencies, when not properly specified?
It seems so. I have checked the detailed log; poetry kept retrying to resolve the dependency for botocore, but without success. So I assume the dependency could eventually be resolved if enough time is given.
However, is there any way to get around this?
BTW, I also think it would be better to give a warning if some dependencies are not properly specified and cannot be resolved after a number of attempts.
Hi,
I encounter a similar problem on macOS. Python version used is 3.7.6, Poetry is 1.0.5. I just created a new project with no dependencies so far in pyproject.toml, just pytest initially. It takes ages until the new virtualenv is set up with all 11 packages installed.
Running it with -vvv does not bring any new findings.
Regards, Thomas
Yes, I'm running into the same problem. Resolving dependencies takes forever. I tried using a VPN to get through the GFW; nevertheless, it is still not working. I also tried changing the pip source and writing a local source in the toml file; neither works. It's driving me nuts.
same here...😱
Same here. I just created an empty project then ran poetry install, and it takes so much time to resolve dependencies.
I'm currently using this workaround:
poetry export -f requirements.txt > requirements.txt
python -m pip install -r requirements.txt
poetry install
It takes much less time to install the package locally since all deps are already installed.
Make sure to run poetry shell first to activate the created virtual environment and install into it instead of into the user/global path.
Poetry being slow to resolve dependencies seems to be a recurring issue:
Maybe there is a dependency conflict.
No conflict. Poetry is slow as hell.
First of all, I want to say there is ongoing work to improve the dependency resolution.
However, there is so much Poetry can do with the current state of the Python ecosystem. I invite you to read https://python-poetry.org/docs/faq/#why-is-the-dependency-resolution-process-slow to know a little more about why the dependency resolution can be slow.
If you report that Poetry is slow, we would appreciate a pyproject.toml
that reproduces the issue so we can debug what's going on and if it's on Poetry's end or just the expected behavior.
@gagarine Could you provide the pyproject.toml file you are using?
Takes about 2 min to resolve dependencies after adding newspaper3k to a fresh project.
Connection: 40 ms ping and 10 Mb/s down.
pyproject.toml
[tool.poetry]
name = "datafox"
version = "0.1.0"
description = ""
authors = ["Me <[email protected]>"]
[tool.poetry.dependencies]
python = "^3.8"
newspaper3k = "^0.2.8"
[tool.poetry.dev-dependencies]
pytest = "^5.2"
[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
Hey dudes - as Sebastian implied, the root cause is the Python eco's inconsistent/incomplete way of specifying dependencies and package metadata. Unfortunately, the Pypi team is treating this as a won't-fix.
In particular, using the Pypi json endpoint, an empty dep list could either mean "no dependencies" or "dependencies not specified". The Pypi team doesn't want to differentiate between these two cases for reasons I don't follow.
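To make the ambiguity concrete, here is a minimal sketch (not Poetry's code, just an illustration) of what the JSON endpoint gives you; the comments describe typical cases rather than guaranteed output:

```python
"""Minimal sketch of querying PyPI's JSON endpoint for dependency metadata."""
import json
import urllib.request

def requires_dist(name, version):
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    # A list of requirement strings, or None.
    return data["info"]["requires_dist"]

# For a package with wheel metadata this prints the requirement strings.
print(requires_dist("requests", "2.25.0"))
# For many sdist-only uploads this is None, which a resolver cannot
# distinguish from "this package genuinely has no dependencies".
```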
The solution is to work around this by maintaining a separate cache from Pypi that properly handles this distinction, and perhaps refusing to use packages that don't properly specify deps. However, this latter aspect may be tough, due to long dep nests.
Python's grown a lot over the decades, and it has much remaining from its early days. There's a culture of no-breaking-changes at any cost.
Having to run arbitrary python code to find dependencies is fucked, but .... we can do this for each noncompliant package, and save it.
First, it's capitalized PyPI.
Second, there is no way for PyPI to know dependencies for all packages without executing arbitrary code -- which is difficult to do safely and expensive (computationally and financially). PyPI is run on donated infrastructure from sponsors, maintained by volunteers and does not have millions of dollars of funding like many other language ecosystems' package indexes.
For anyone interested in further reading, here's an article written by a PyPI admin on this topic: https://dustingram.com/articles/2018/03/05/why-pypi-doesnt-know-dependencies/
It's not as tough as you imply.
You accept some risk by running the arbitrary code, but accepting things as they are isn't the right approach. We're already forcing this on anyone who installs Python packages; it's what triggers the delays cited in this thread.
I have the above repo running on a $10/month Heroku plan, and it works well.
I've made the assumption that if dependencies are specified, they're specified correctly, so only check the ones that show as having no deps. This won't work every time, but does in a large majority of cases.
Related: Projects like Poetry are already taking a swing at preventing this in the future: specifying deps in pyproject.toml, Pipfile, etc.
A personal Heroku app is not going to be as valuable a target as PyPI would be. Neither is a $10/month Heroku app going to be able to support the millions of API requests that PyPI gets every day. The problem isn't in writing a script to run a setup.py file in a sandbox, but in the logistics and challenges of providing it for the entire ecosystem.
"It works 90% of the time" is not an approach that can be taken by the canonical package index (which has to be used by everyone) but can be taken by specific tools (which users opt into using). Similar to how poetry
can use an AST parser for setup.py files which works >90% of the time, to avoid the overhead of a subprocess call, but pip shouldn't.
Anyway, I wanted to call out that "just blame PyPI folks because they don't care/are lazy" is straight up wrong IMO -- there are reasons that things are the way they are. That doesn't mean we shouldn't improve them, but it's important to understand why we're where we are. I'm going to step away now.
Before you step away - Can you think of a reason PyPi shouldn't differentiate between no dependencies, and missing dependency data?
If going through existing releases is too bold, what about for new ones?
I'm new to (more serious) Python and don't understand the big drama. Yet setup.py seems a powerful and very bad idea. Dependency management is terrible in Python because of setup.py?
Can someone post a couple of examples where a txt file is not enough and setup.py was absolutely necessary?
"is a feature that has enabled better compatibility across an increasingly broad spectrum of platforms."
Cargo does it like this: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#platform-specific-dependencies Is this not enough for Python?
Why doesn't poetry create its own package repository, avoiding setup.py and using its own dependency declaration? It could take time... but a bot could automate the pull requests for most Python modules based on the kind of techniques used in https://github.com/David-OConnor/pydeps
I think the root cause is that Python's been around for a while and tries to maintain backwards compatibility. I agree - setup.py isn't an elegant way to do things, and a file that declares dependencies and metadata is a better system. The wheel format causes dependencies to be specified in a METADATA file, but there are still many older packages that don't use this format.
As a new lang, Rust benefited by learning from the successes and failures of existing ones. Ie it has nice tools like Cargo, docs, clippy, fmt etc. It's possible to implement tools / defaults like this for Python, but it involves a big change, and potentially backwards-incompatibility. There are equivalents for many of these (pyproject.toml, black etc), but they're not officially supported or widely adopted. Look at how long it took Python 3 to be widely adopted for a taste of the challenge.
Can someone post a couple of examples where a txt file is not enough and setup.py was absolutely necessary?
Not absolutely necessary, but helpful in the following scenario:
With setup.py, you can follow the DRY principle:
requires_a = ('some', 'thing')
requires_b = requires_a + ('foo', 'bar')
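For illustration, a fuller but hypothetical setup.py wiring those lists into install_requires and an extra (the package name and extra name are made up):

```python
# Hypothetical setup.py showing the DRY pattern: the base requirements are
# declared once and reused when defining an extra.
import setuptools

requires_a = ["some", "thing"]
requires_b = requires_a + ["foo", "bar"]

setuptools.setup(
    name="example",                    # made-up project name
    version="0.1.0",
    install_requires=requires_a,       # base requirements, declared once
    extras_require={"b": requires_b},  # `pip install example[b]` adds the rest
)
```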
For requirements.txt, I'm not sure you can denote extras at all, and even if you can, you would need to repeat the requirements of a within the requirements of b. This is prone to human error.
However, while building the package, the package builder could output a text file containing those requirements.
Why poetry does not create their own package repository
You mean replacing PyPI? Good luck with that. I analyzed the packages on PyPI in January (PyPI Analysis 2020):
I also gave a course about packaging in Python this year to PhD students. They simply want to share their work with a broad audience. I only mentioned poetry briefly because it is such a niche right now.
Changing a big, working system is hard. It took Python 2 -> 3 about 12 years and it is still not completely finished.
Hi,
I would like to invite everyone interested in how dependencies should be declared to this discussion on python.org
fin swimmer
@finswimmer I checked the discussion. Seems like they are reinventing the wheel instead of copying/pasting something that works (Composer, Cargo, ...).
For requirements.txt, I'm not sure you can denote extras at all, and even if you can, you would need to repeat the requirements of a within the requirements of b. This is prone to human error.
For sure requirements.txt is not good.
You mean replacing PyPI? Good luck with that.
Yes. But why make poetry if it's not to replace PyPI and requirements.txt?
If poetry is compatible with PyPI, there is no incentive to add a pyproject.toml. Perhaps I don't even know I should add one. Now if, every time I try to install a package that has no pyproject.toml, the command line proposed that I open an issue on that project with a ready-to-use template, this could speed things up.
Can you think of a reason PyPi shouldn't differentiate between _no dependencies_, and _missing dependency data_?
It'd be more productive to file an issue on https://github.com/pypa/warehouse, to ask this. There's either a good reason, or PyPI would be open to adding this functionality. In the latter case, depending on how the details work out, it might need to be standardized like pyproject.toml was before poetry adopted it, so that the entire ecosystem can depend on and utilize it.
Yes. But why make poetry if it's not to replace PyPI and requirements.txt?
You seem to confuse multiple parts of the ecosystem. I would distinguish those entities:
Under the hood, I think, poetry uses a couple of those base tools. It is just meant to show a more consistent interface to the user.
I realise that now, as I mention in #2338. I'm therefore not that interested in poetry at the moment. I thought it was like composer and https://packagist.org, but it looks mostly like a wrapper around different legacy tools.
[poetry] looks mostly like a wrapper around different legacy tools
That is not the case. All tools I've mentioned are widespread, used by a majority of Python developers, and under active development. Yes, some of the tools are old - pip, for example, is 9 years old. Old is not the same as legacy. The hammer is an old tool. And still people use it. Why? Because it does the job it was designed for.
I don't know PHP well enough to be sure, but I think packagist.org is for PHP what pypi.org is for Python. Composer seems to be a package manager and thus comparable to pip. As composer also supports dependency management during project development, it fills a similar niche as poetry does.
I figured I would add more to this issue. It's taking more than 20 minutes for me:
gcoakes@workstation ~/s/sys-expect (master) [1]> time poetry add --dev 'pytest-asyncio'
The currently activated Python version 3.7.7 is not supported by the project (^3.8).
Trying to find and use a compatible version.
Using python3.8 (3.8.2)
Using version ^0.12.0 for pytest-asyncio
Updating dependencies
Resolving dependencies... (655.1s)
Writing lock file
Package operations: 1 install, 0 updates, 0 removals
- Installing pytest-asyncio (0.12.0)
________________________________________________________
Executed in 20.98 mins fish external
usr time 4.96 secs 0.00 micros 4.96 secs
sys time 0.35 secs 560.00 micros 0.35 secs
This is the pyproject.toml:
[tool.poetry]
name = "sys-expect"
version = "0.1.0"
description = ""
readme = "README.md"
include = [
"sys_expect/**/*.html",
"sys_expect/**/*.js",
]
[tool.poetry.dependencies]
python = "^3.8"
pyyaml = "^5.3.1"
serde = "^0.8.0"
aiohttp = "^3.6.2"
async_lru = "^1.0.2"
astunparse = "^1.6.3"
coloredlogs = "^14.0"
aiofiles = "^0.5.0"
[tool.poetry.dev-dependencies]
pytest = "^5.4"
black = "^19.10b0"
isort = { version = "^4.3.21", extras = ["pyproject"] }
flakehell = "^0.3.3"
flake8-bugbear = "^20.1"
flake8-mypy = "^17.8"
flake8-builtins = "^1.5"
coverage = "^5.1"
pytest-asyncio = "^0.12.0"
[tool.poetry.scripts]
sys-expect = 'sys_expect.cli:run'
[tool.isort]
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true
line_length = 88
[tool.flakehell.plugins]
pyflakes = ["+*"]
flake8-bugbear = ["+*"]
flake8-mypy = ["+*"]
flake8-builtins = ["+*"]
[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
Question: What exactly does poetry do extra here that makes it so much slower than pip's dependency resolution? Does it actually put a lot of extra effort into figuring out dependencies in situations that pip doesn't?
Edit: It doesn't
As it is now, pip doesn’t have true dependency resolution, but instead simply uses the first specification it finds for a project.
https://pip.pypa.io/en/stable/user_guide/#requirements-files
Pip doesn't have dependency resolution.
@David-OConnor It doesn't? But when I install a package in a blank environment, I usually see a lot of packages installed. Isn't that dependency resolution?
That's the extent of it - it'll install sub-dependencies, but whichever one you install last wins. Tools like Poetry, Cargo, npm, pyflow etc. store info about the relationships between all dependencies and attempt to find a solution that satisfies all constraints. The particular issue of this thread is that the Python ecosystem provides no reliable way of determining a package's dependencies without installing the package.
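As a purely illustrative sketch (nothing like Poetry's actual solver, which is based on the PubGrub algorithm), a brute-force search over a tiny made-up index shows what "satisfying all constraints" means, compared with pip's last-one-wins behaviour:

```python
# Toy illustration only: exhaustive search over a hard-coded index.
# A real resolver must pick one version per package that satisfies *every*
# constraint, or report a conflict; pip just keeps whatever it saw last.
from itertools import product

INDEX = {  # package -> {version: {dependency: allowed versions}}
    "app": {"1.0": {"lib": {"1.0", "2.0"}, "six": {"1.13.0"}}},
    "lib": {"1.0": {"six": {"1.0.0"}},    # old lib pins six 1.0.0
            "2.0": {"six": {"1.13.0"}}},  # new lib wants six 1.13.0
    "six": {"1.0.0": {}, "1.13.0": {}},
}

def resolve():
    packages = list(INDEX)
    for combo in product(*(INDEX[p] for p in packages)):
        pick = dict(zip(packages, combo))
        if all(pick[dep] in allowed
               for pkg, ver in pick.items()
               for dep, allowed in INDEX[pkg][ver].items()):
            return pick
    return None  # unsatisfiable, e.g. if "app" had pinned six to 1.0.0 exactly

print(resolve())  # {'app': '1.0', 'lib': '2.0', 'six': '1.13.0'}
```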
Cool, thank you for clarifying :-)
the Python ecosystem provides no reliable way of determining a package's dependencies without installing the package
When a package is in WHEEL format, I see in the dist directory a METADATA file which contains:
Requires-Dist: click
That seems to be the dependency of the package (see pep-0345). Isn't that a way to get the dependencies without installing the package?
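For reference, a rough sketch of pulling those Requires-Dist lines out of a downloaded wheel without installing anything (the wheel filename below is only an example):

```python
# Rough sketch: read Requires-Dist straight out of a wheel archive.
import zipfile
from email.parser import Parser

def wheel_requires(wheel_path):
    with zipfile.ZipFile(wheel_path) as whl:
        meta_name = next(n for n in whl.namelist()
                         if n.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(whl.read(meta_name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []

print(wheel_requires("click-8.0.0-py3-none-any.whl"))  # example filename
```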
Anecdotally, I've found that if dependencies are listed there, they're accurate. Here are the issues:
My proposal is to use the metadata if it's present, and if not, build the package, determine what the dependencies are, and cache them; but most people are more conservative, ie pt #2 is a dealbreaker.
We had a discussion higher up in the thread about this, if you'd like more info.
@David-OConnor, what's your suggestion for resolving things in the immediate term? How can I determine which package is causing the slowdown? I am more than happy to make a PR to whichever project that is, but as it is now, any change to pyproject.toml takes upwards of 20 minutes. When I run with -vvv, I see 1: derived: pyyaml (^5.3.1) as the last line before it hangs for several minutes, but I would assume you are doing installation asynchronously or something.
@gcoakes I don't have the exact dep graph, but from running it through the package manager I use, there are conflicting requirements for six: a constraint somewhere requires >= 1.13.0, and another requires 1.0.0 exactly. Poetry doesn't install more than one version of a dependency, so it's unsolvable. Not sure why it doesn't just say that though instead of hanging.
Just to add another data point to the conversation. Running poetry update on many of our projects now takes > 15 minutes.
I understand that doing a comparison between pip and poetry install is not an apples for apples comparison, and also that there are many variables outside poetry's control - however it is hard to believe that 15 minutes resolving a small number of dependencies is unavoidable.
I created a vaguely representative list of dependencies for our projects and put the identical deps in both a pyproject.toml (see https://gist.github.com/jacques-/82b15d76bab3540f98b658c03c9778ea) and a Pipfile (see https://gist.github.com/jacques-/293e531e4308bd4d6ad8eabea5299f57).
Poetry resolved this on my machine in around 10-11 minutes, while pipenv did the same in around 1 to 1:15 minutes, roughly a 10x difference.
Unless I'm missing a big part of the puzzle here, both pipenv and poetry are doing similar dependency resolution and are working from the same repositories, so there is no external reason the performance should be this different. It would be great to see this issue prioritised and some of the proposed fixes that are ready to merge, e.g. https://github.com/python-poetry/poetry/pull/2149
P.S. thanks for making an awesome tool, poetry has made our lives better since we started using it!
Maybe a stepping stone to a solution could be to add a flag to show some more info regarding dependency resolution - e.g. for each package, how long it took, and issues encountered/processes used. This would at least let us see where slowdowns are coming from, and potentially let us send PRs to other projects to provide better/more appropriate metadata?
geopandas seems to take a particularly long time. This was on a new project as the first dependency:
```
Updating dependencies
Resolving dependencies... (3335.6s)
```
In my case the slow dependency resolution in Poetry was related to an IPv6 issue (also see this related answer on StackOverflow). Temporarily disabling IPv6 solved it. On Ubuntu this can be achieved using the following commands:
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
Following @lmarsden's suggestion, I managed to speed up the process by making sure that sites/servers that prefer IPv4 use IPv4. On Ubuntu I modified /etc/gai.conf by removing the # (uncommenting) from the following line:
# precedence ::ffff:0:0/96 100
I just noticed that the issue for me seemed related to using boto3 without specifying a version in the package I was importing. So I had package A that I built using poetry with boto3 = '*'. That did seem to resolve fairly quickly. But when I tried to import package A into a new package, B, it took >10 minutes to resolve (if it would ever finish). I specified the version used by package A for boto3 in package B, and it resolved my dependencies in < 30 seconds.
Maybe a stepping stone to a solution could be to add a flag to show some more info regarding dependency resolution - e.g. for each package, how long it took, and issues encountered/processes used. This would at least let us see where slowdowns are coming from, and potentially let us send PRs to other projects to provide better/more appropriate metadata?
This is a pragmatic approach to the problem. Are there any logging / debugging flags we can enable to show installation metadata?
For anyone coming from mainland China (如果你来自中国大陆), add this to pyproject.toml
[[tool.poetry.source]]
name = "aliyun"
url = "https://mirrors.aliyun.com/pypi/simple/"
default = true
For anyone coming from mainland China (如果你来自中国大陆), add this to pyproject.toml
[[tool.poetry.source]] name = "aliyun" url = "https://mirrors.aliyun.com/pypi/simple/" default = true
Why?
For anyone coming from mainland China (如果你来自中国大陆), add this to pyproject.toml
[[tool.poetry.source]] name = "aliyun" url = "https://mirrors.aliyun.com/pypi/simple/" default = true
Why?
A pypi mirror with faster network access in mainland China.
Well, "slow" is an understatement, I left poetry update
running overnight and it's still going at 100% CPU and using 10.04 GB of memory:
@intgr is there a pyproject.toml you can share that reproduces this behavior?
@abn Sure, I'll submit a new issue soon. I managed to get it working, poetry update now runs in 22 seconds.
Another example. Adding black took 284 seconds.
% poetry add --dev black
Using version ^20.8b1 for black
Updating dependencies
Resolving dependencies... (284.1s)
Writing lock file
Package operations: 6 installs, 0 updates, 0 removals
• Installing appdirs (1.4.4)
• Installing mypy-extensions (0.4.3)
• Installing pathspec (0.8.0)
• Installing typed-ast (1.4.1)
• Installing typing-extensions (3.7.4.3)
• Installing black (20.8b1)
Unfortunately I can't share the pyproject.toml.
I also experienced the same problem on a fresh Debian 10.6 install. Funnily enough, everything runs fine on my Linux Mint laptop (same poetry version 1.1.4). Also pip itself seemed to be very slow.
Both pip and poetry started being usable again once I disabled IPv6 with:
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
This disables IPv6 only temporarily until the next reboot. Obviously this only treats a symptom and not the root cause. Hopefully others can chip in and trace the breadcrumbs ...
See https://stackoverflow.com/questions/50787522/extremely-slow-pip-installs
Discussions here are quite interesting for a noob like me. I have a very naïve question though. Are packages built/published with poetry "correctly specifying their dependencies"? In other words, imagine I only add packages built with poetry, will the resolving phase be lightning fast? Or will this still apply:
... the Python ecosystem provides no reliable way of determining a package's dependencies without installing the package
I first came here because I thought 40 s for resolving a single package's dependencies was slow, but when I see minutes and hours on the counter I suppose it is normal.
I guess it's not a good idea to use poetry for creating docker images (in CI pipelines for example)?
Packages built using Poetry (or as wheels in general) presumably provide the required information to Pypi, but Poetry doesn't use this; it treats all packages as if they must be installed to verify dependencies.
As far as I know:
If your project and all its dependencies (and their dependencies) are available for your platform (Python interpreter minor version, operating system, and CPU bitness) as wheels, then it is the best case scenario, because in a wheel the dependencies are defined statically (no need to build an _sdist_ to figure out what the exact dependencies are, for example).
The best you, as the developer (maintainer) of a project, can do to help lower the difficulty of dependency resolution for everyone else is to distribute wheels of your project for as many platforms as possible (upload the .whl files to _PyPI_). Often projects (libraries, applications) are made of pure Python code (no C extensions, for example), so just one wheel is enough to cover all platforms.
@David-OConnor Is there a technical reason for that? Isn't it possible to check whether a package correctly specifies its dependencies?
Is there a technical reason for that?
@cglacet What do you mean?
@MartinThoma I was asking why "Poetry doesn't use this (information)".
Ultimately, the issue lies with PyPi having no flag to distinguish properly-specified (ie wheel) vs improperly-specified packages. Poetry accepts this, instead of attempting to work around it.
@cglacet I think what @David-OConnor (please correct me if I'm wrong) might be referring to is the difference between source distributions (_sdist_) and pre-built distributions (such as _wheels_).
In short and simplified:
On _PyPI_ there are 2 types of distribution formats: _sdist_ and _wheel_. For our concern here (dependency resolution), the meaningful difference between these 2 formats is that the info contained in an _sdist_ is not reliable (for various reasons, some legitimate and some much less so). Now when _poetry_ is doing dependency resolution for project App that depends on Lib, and it so happens that Lib is only available as an _sdist_, _poetry_ needs to build that _sdist_ locally (which can take a large amount of time) to figure out if Lib has dependencies of its own and what they really are. If Lib were available as a _wheel_, it would be much easier for _poetry_ to figure out Lib's dependencies, because the _wheel_ format specifies that all the meta-information it contains is static and entirely reliable, and as a consequence no build step is necessary for the dependency resolution of _wheels_.
Yep. Anecdotally, if deps are specified at all on PyPi, they're probably accurate. If not, it could mean either deps aren't specified, or there are no deps.
Pypi not fixing this is irresponsible. Yes, I'm throwing spears at people doing work in their free time, but this needs to be fixed, even if that means being mean.
@sinoroc @David-OConnor So if I get it correctly there are two problems: 1) we need to retrieve the whole package to answer any question about it, and 2) answering dependency questions could potentially take some time because you need to open some files. I can very well understand why 1) is a problem because package registries are slow, but why is 2) really a problem?
Isn't there a cache for storing/reading already installed packages? I mean, when I run pip install on my machine (or even on GitLab) it rarely downloads the package (because I already have it stored in cache). So unless I run some package updates this should only sum up to problem 2 (for which you could build a cache too: package name -> dependencies).
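What I have in mind is nothing more sophisticated than a local map from name/version to the dependency list, e.g. this hypothetical sketch (not how pip's or poetry's caches actually work):

```python
# Hypothetical sketch of a local metadata cache: "name==version" -> deps.
import json
from pathlib import Path

CACHE = Path.home() / ".cache" / "dep-metadata.json"

def cached_deps(name, version, fetch):
    cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
    key = f"{name}=={version}"
    if key not in cache:
        cache[key] = fetch(name, version)  # slow path: download/build once
        CACHE.parent.mkdir(parents=True, exist_ok=True)
        CACHE.write_text(json.dumps(cache))
    return cache[key]
```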
Are there some references I could use to read more about this without adding interference to this thread? (I find it interesting, but that's probably off-topic for maintainers or even most users.)
Thanks again for your time.
@David-OConnor
if deps are specified at all on PyPi, they're probably accurate. If not, it could mean either deps aren't specified, or there are no deps.
I do not understand what you mean. Dependencies are not specified in _PyPI_, they are specified in the distributions.
_PyPI_ does have some insight into what is contained in those distributions. But if that insight is based on the unreliable info contained in the _sdist_ then it is unreliable insight.
Short of building wheels of all existing _sdist_ for all existing combinations of 1. Python interpreter minor version, 2. operating system, 3. CPU bitness, there is no way _PyPI_ can deliver reliable information for the dependency resolver.
Pypi not fixing this is irresponsible. Yes, I'm throwing spears at people doing work in their free time, but this needs to be fixed, even if that means being mean.
The fix is here: use _wheels_! Anyone (you) can go help any project to help them build wheels and upload them to _PyPI_. I do not think this is _PyPI_'s role to intervene here. What other solution do you have in mind?
@cglacet
So if I get it correctly there are two problems: 1) we need to retrieve the whole package to answer any question about it, and 2) answering dependency questions could potentially take some time because you need to open some files. I can very well understand why 1) is a problem because package registries are slow, but why is 2) really a problem?
1. Yes. True for both wheels and sdists. They have to be downloaded. Although there is some ongoing work that would result in the possibility to skip the download for the wheels.
2. Yes and no. True for both wheels and sdists: these archives have to be "_opened_" and some files have to be read to figure out if there are dependencies and what they are. But this is not the part that is slow. The slow part is that for _sdists_ (not for _wheels_) just opening the archive and reading some files is not enough; those files have to be built (executing the setup.py, for example), and in some cases a resource-intensive compilation step is necessary (C extensions, for example, need to be compiled with a C compiler, which is usually the very slow bit of the whole process).
Isn't there a cache for storing/reading already installed packages?
As far as I know, there is, and subsequent dependency resolutions for the same projects should be faster (download and build steps can be partially skipped). The wheels built locally in previous attempts are reused.
Are there some references I could use to read more about this without adding interference to this thread (I find it interesting but that's probably off-topic for maintainers or even most users).
Yes, it is a bit off-topic, but I believe it is helpful for the next users wondering about the slow dependency resolution to read some insight into why.
Some good reading I could find on the spot:
Update:
Thinking about it more, I realize I might have mischaracterised things. Getting the metadata (including dependency requirements) out of an _sdist_ does not require compiling the C extensions (setup.py build). It should be enough to get the egg info (setup.py egg_info).
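Roughly, that route looks like the following hedged sketch (modern tools would rather go through the PEP 517 metadata hooks than invoke setup.py directly):

```python
# Rough sketch: extract dependency metadata from an unpacked sdist by running
# `setup.py egg_info` and reading the generated requires.txt.
import pathlib
import subprocess
import sys
import tempfile

def sdist_requires(sdist_dir):
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            [sys.executable, "setup.py", "egg_info", "--egg-base", tmp],
            cwd=sdist_dir, check=True,
        )
        requires = next(pathlib.Path(tmp).glob("*.egg-info/requires.txt"), None)
        return requires.read_text().splitlines() if requires else []
```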
Ref https://pypi.org/pypi/requests/json info -> requires_dist
There is no reason a package manager needs to download, install, and examine each package for each user that installs it.
I've already implemented a solution, which is the cache and dependency manager I've posted earlier. See how package managers in other languages like Rust handle this. It builds packages/examines dependencies once total for each package/version, then stores it online. Even if all packages switch to wheels, the problem won't be solved until we have a queryable database of the dependencies... ie PyPi.
Ref https://pypi.org/pypi/requests/json _info_ -> _requires_dist_
There is no reason a package manager needs to download, install, and examine each package for each user that installs it.
Yes. I forgot about that. I think _poetry_ uses _PyPI_'s JSON API; _pip_ doesn't. But the same story again: this info is only available for pre-built distributions (_wheels_), and not for _sdists_. Those still currently need to be downloaded and built locally (not installed).
I've already implemented a solution, which is the cache I've posted earlier.
You mean your _pydeps_ project? I should look into that, I do not know it yet. There is for sure some room for improvement on _PyPI_'s side, I do not think anyone would deny that, but it is a slow process.
As far as I know, _PyPI_ would like to stay out of the business of building _sdists_. Maybe some 3rd party organisation would be willing to do that work and deliver the results to _PyPI_. There are platforms such as libraries.io who could be good candidates for such work. They already built a DB of dependencies (no idea how reliable it is).
Even if all packages switch to wheels, the problem won't be solved until we have a queryable database of the dependencies... ie PyPi.
Well the database is "_queryable_" via the JSON API as you have shown. So that is already done. Isn't it?
_[I am writing this off the top of my head according to the bits of info I have gathered here and there along the way. I do not have insight into all the processes involved, so there might be some inaccuracies. Feel free to correct me.]_
Yep, I was referring to Pydeps.
You brought up a point I hadn't considered: different dependencies depending on OS etc. That sounds like an important consideration I haven't looked into. Is this common, in your experience? Ie different dep sets for a single package/version.
The database on the JSON API is queryable, but the info is only useful if there's at least 1 dependency listed. Otherwise, it might be that there are no deps (which is good info), or simply that they're not specified, ie due to not having a wheel. A simple fix would be for the PyPi requires_dist field to return different things for these cases: perhaps an empty list if there are no deps, null etc if not specified. That alone would let Poetry etc reduce the cases it has to download and build for.
Different dependencies depending on OS etc. That sounds like an important consideration I haven't looked into. Is this common, in your experience? Ie different dep sets for a single package/version.
Yes, it is problematic. I think there are some setup.py files that look like this (pseudo code):
import sys
import setuptools

dependencies = []
if sys.platform == 'linux':
    dependencies.append('This')
elif sys.platform == 'win32':
    dependencies.append('That')
setuptools.setup(install_requires=dependencies)
That makes an _sdist_ completely unreliable. It has to be built on the target platform to know what the dependencies truly are. The correct way is (pseudo-code):
import setuptools

DEPENDENCIES = [
    'This; sys_platform == "linux"',
    'That; sys_platform == "win32"',
]
setuptools.setup(install_requires=DEPENDENCIES)
which is much more reliable. Still not 100% reliable, but that would be somewhat workable. Truth is: as long as there is an executable setup.py file in the _sdist_, the resulting metadata cannot be guaranteed until after it is indeed executed. We are talking about _setuptools_ here. Other build backends (such as _poetry_) do not directly rely on an executable script, so things are much more static in the _sdist_.
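Those `'This; sys_platform == "linux"'` strings are PEP 508 environment markers, and they can be evaluated statically, without running any setup.py, for example with the `packaging` library:

```python
# Environment markers (PEP 508) evaluate without executing any setup.py.
# Requires the third-party `packaging` library.
from packaging.requirements import Requirement

for spec in ('This; sys_platform == "linux"', 'That; sys_platform == "win32"'):
    req = Requirement(spec)
    # True/False for the interpreter and platform this script runs on.
    print(req.name, "applies here:", req.marker.evaluate())
```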
the info is only useful if there's at least 1 dependency listed. Otherwise, it might be that there are no deps (which is good info), or simply that they're not specified, ie due to not having a wheel. A simple fix would be for the PyPi _requires_dist_ field to return different things for these cases: perhaps an empty list if there are no deps, _null_ etc if not specified. That alone would let Poetry etc reduce the cases it has to download and build for.
Ah, I didn't know that. I never really looked at that JSON API. That might explain why _pip_ does not use it (which I thought was odd).
I'm sorry - I'd completely forgotten about the sys_platform flag. I do account for that in Pyflow. So it is specified on PyPi for packages built with wheels, and package managers like Poetry can use this info.
The fix is here: use wheels! Anyone (you) can go help any project to help them build wheels and upload them to PyPI. I do not think this is PyPI's role to intervene here. What other solution do you have in mind?
Usually in these situations what works best is lobbying for better (simpler/clearer/broader/...) standards to be promoted by the authority. In this case I guess it's PyPA? Or maybe both the PSF and PyPA?
I never worked on any project that actively creates Python packages, so I might be wrong, but from my point of view packaging is not something that is currently crystal clear. It seems like there are way too many ways of doing one thing, so people like me don't really know what they should do. Because in the end, we have no idea about the impact of our choices. Which, ultimately, leads to the problems you are discussing here.
For what it's worth, I find poetry's documentation to be a good way of preaching for better solutions. It's so clean it makes you want to make things cleaner.
From the user perspective it's already a very good improvement to have standards such as PEP 518 -- Specifying Minimum Build System Requirements for Python Projects, but that's apparently not sufficient? Or maybe the problem you are debating here only arises for older projects?
What about this news: New pip resolver to roll out this year?
@cglacet
Usually in these situations what works best is lobbying for better (simpler/clearer/broader/...) standards to be promoted by the authority. In this case I guess it's PyPA? Or maybe both the PSF and PyPA?
Yes. That would be _PyPA_. They know all about these kinds of issues. They are actively working on solving them. These things take time. There is no need to lobby. There is a need to participate by writing good documentation and good code. And most important of all, donate to fund developers to work full time on it.
What about this news: New pip resolver to roll out this year?
This is a part of the work, yes. Once this rolls out, _PyPA_ will be able to move on to solving other packaging issues. This work was partly done thanks to financial grants (money donations).
You can read more about related, ongoing work (these links are only a short, semi-random selection, but they are all somewhat intertwined):
I never worked on any project that actively creates Python packages, so I might be wrong, but from my point of view packaging is not something that is currently crystal clear. It seems like there are way too many ways of doing one thing, so people like me don't really know what they should do. Because in the end, we have no idea about the impact of our choices. Which, ultimately, leads to the problems you are discussing here.
Yes. From my point of view, the issue is that the overwhelming majority of advice found on the internet (articles, blogs, StackOverflow answers, etc.) is either outdated, misguided, or plain wrong.
A good reference is this website (from _PyPA_ itself):
If you follow _poetry_'s workflows you are already in very good hands, and you should not worry about anything too much. Upload wheels! Well, you need to upload both _sdists_ and _wheels_. The _sdists_ are still very important, do not forget them.
For what its worth, I find poetry's documentation to be a good way of preaching for better solutions. It's so clean it makes you want to make things cleaner.
Yes, it is also doing a very good job at getting rid of outdated, bad practices.
_[Sadly somehow, there are always users pushing for poetry to adapt to their own broken workflows, instead of users changing their habits for the clean workflows of poetry. It is a constant battle.]_
From the user perspective it's already a very good improvement to have standards such as PEP 518 -- Specifying Minimum Build System Requirements for Python Projects, but that's apparently not sufficient? Or maybe the problem you are debating here only arises for older projects?
Yes, this was another great step forward. The Python packaging ecosystem is improving a lot these days.
And yes, exactly, a great hurdle is keeping compatibility with older projects. This slows down the work a lot. In particular, older, broken _setuptools_ / _distutils_ setup.py-based projects are very problematic. Although it is nowadays entirely possible to write clean, well-behaved, and standard-conforming _setuptools_-based projects.
_[I am writing this off the top of my head according to the bits of info I have gathered here and there along the way. I do not have insight into all the processes involved, so there might be some inaccuracies. Feel free to correct me. Feel free to ask me for clarifications.]_
If you are looking for projects to help contribute to that don't yet have wheels, this site lists the top 360 packages, a handful of which don't have wheels: https://pythonwheels.com/
The discussion on python.org might be interesting for some as well: Standardized way for receiving dependencies
Uploading a poetry.lock to CI will avoid resolving dependencies.
Here is a toml that cost more than 700 secs in 'resolving dependencies':
[[tool.poetry.source]]
name = "aliyun"
url = "https://mirrors.aliyun.com/pypi/simple/"
default = true
[tool.poetry]
name = "omega"
version = "0.7.0"
description = "Blazing fast data server for Zillionare"
authors = ["jieyu <[email protected]>"]
license = "MIT"
[tool.poetry.dependencies]
python = "^3.8"
apscheduler = "^3.6"
arrow = "^0.15"
cfg4py = "^0.6"
"ruamel.yaml" = "^0.16"
aioredis = "^1.3"
hiredis = "^1.0"
numpy = "^1.18"
aiohttp = "^3.6"
pytz = "^2020.1"
xxhash = "^1.4"
zillionare-omicron = "0.2.0"
aiocache = "^0.11"
sanic = "^20.3"
psutil = "^5.7"
termcolor = "^1.1"
gino = "^1.0"
asyncpg = "^0.20"
sh = "^1.13"
[tool.poetry.dev-dependencies]
flake8 = "^3.8.4"
flake8-docstrings = "^1.5.0"
tox = "^3.14"
coverage = "^4.5.4"
Sphinx = "^1.8.5"
black = "^20.8b1"
pre-commit = "^2.8.2"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
No difference with/without the private repo defined. Hope it helps.
One thing @zillionare's example has in common with mine is black. They recently had a bug, so they pulled the wheel from PyPI: https://pypi.org/project/black/20.8b1/#files. This could be contributing to this.
One thing @zillionare's example has in common with mine is black. They recently had a bug, so they pulled the wheel from PyPI: https://pypi.org/project/black/20.8b1/#files. This could be contributing to this.
I did the test; sounds to me like black 20.8b1 is not the culprit.
My steps: remove black = xxx from pyproject.toml.
I'm new to poetry, so I'm not sure if my test steps are right. And it sounds to me like clearing the cache is not necessary.
---- update 2020/11/22
Today I tried on another machine:
it's still running...
@zillionare Went through the same issue; fixed it after setting up a VPN to get around the GFW.
@zillionare Went through the same issue; fixed it after setting up a VPN to get around the GFW.
Guess this is the root cause for my case too. I have set up a proxy for pypi; it still runs slow (with the -vvv option on, I can see it's progressing), and I know how it works now. A lot of files need to be downloaded before the deps are resolved.
Hope poetry can support mirror pip sources, so the performance issue will be solved.