Poetry: Allow user to override PyPI URL without modifying pyproject.toml

Created on 26 Nov 2019 · 33 Comments · Source: python-poetry/poetry

  • [X] I have searched the issues of this repo and believe that this is not a duplicate.
  • [X] I have searched the documentation and believe that my question is not covered.

Feature Request

Similar to one of the proposals in https://github.com/sdispater/poetry/issues/1070 (which was recently marked stale), Poetry should allow the user to override the default repository URL (PyPI). The user should be able to do this without modifying pyproject.toml.

In certain environments (e.g. corporate networks) PyPI is unavailable, but a mirror exists. These users should be able to specify the address of the mirror without modifying project files, as the mirror settings are irrelevant to contributors in different environments. Similarly, if a mirror user adds a dependency, the generated lock file should not list the user's mirror as the source. The source should remain the default (which in most cases would refer to standard PyPI).
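Here is a minimal sketch of the requested behavior. It assumes a hypothetical environment variable POETRY_PYPI_MIRROR_URL (not an existing Poetry setting) and uses hypothetical helper names. The mirror changes only where files are downloaded from. The source written to the lock file stays the default PyPI URL.

```python
import os
from typing import Mapping

# Default index used by contributors outside the corporate network.
DEFAULT_PYPI_URL = "https://pypi.org/simple/"

# Hypothetical variable name, chosen only for this illustration.
MIRROR_ENV_VAR = "POETRY_PYPI_MIRROR_URL"


def fetch_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the index to download from: the mirror if set, else PyPI."""
    return env.get(MIRROR_ENV_VAR) or DEFAULT_PYPI_URL


def locked_source_url() -> str:
    """Return the URL recorded in the lock file; the mirror never leaks into it."""
    return DEFAULT_PYPI_URL
```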

This feature exists in pipenv, see https://github.com/pypa/pipenv/issues/2075 (where the need for this functionality is described in greater detail) and https://github.com/pypa/pipenv/pull/2281.

Feature

All 33 comments

This is essential for many business uses, not simply when PyPI is unavailable but also in any case where the organization has its own libraries (not uncommon). Note that since some private repo tools (e.g. Nexus) use basic auth URLs, putting the repo URL into a project config file is absolutely inappropriate and a global config or environment variable (e.g. pip.conf, PIP_INDEX_URL) is necessary.

https://github.com/sdispater/poetry/issues/625 also seems related.

Something I tried, which might be nice to make work:

poetry config repositories.pypi https://.../+simple/

@sdispater, I wonder whether the elaboration of the requested feature in #1070 is usable as is or needs some updating. If so, I volunteer to join efforts with one or a few others, have a call, and try to move this request forward, as this is one of two showstoppers for our usage of poetry (the other is managing versions of the resulting package, but I definitely do not want to discuss that here).

A simple patch for CI/CD:

# Install dependencies from the lock file
ARG origin_pypi_url
ARG private_pypi_cache_url
WORKDIR /opt/app
COPY pyproject.toml poetry.lock /opt/app/

# Use | as the sed delimiter, since the URLs themselves contain /
RUN sed -i "s|${origin_pypi_url}|${private_pypi_cache_url}|g" poetry.lock pyproject.toml

RUN poetry install -vvv

@lovepocky Wouldn't that break the content-hash in poetry.lock? I think this might cause poetry to refresh the lock file.
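Here is a minimal sketch of why the sed approach may invalidate the lock. It assumes the content-hash is a SHA-256 over a JSON dump of selected [tool.poetry] keys that include source. That is roughly how Poetry 1.x computes it, but the exact key list here is an assumption. Under that assumption, rewriting a source URL in pyproject.toml produces a different hash, so the lock file would look out of date.

```python
import hashlib
import json
from typing import Any, Dict

# Keys assumed to feed the content-hash (modeled loosely on Poetry 1.x).
RELEVANT_KEYS = ["dependencies", "dev-dependencies", "source", "extras"]


def content_hash(tool_poetry: Dict[str, Any]) -> str:
    """Hash only the relevant [tool.poetry] tables, independent of key order."""
    relevant = {key: tool_poetry.get(key) for key in RELEVANT_KEYS}
    payload = json.dumps(relevant, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


original = {
    "dependencies": {"python": "^3.8"},
    "source": [{"name": "private", "url": "https://origin.example.com/simple/"}],
}
# The same project after a sed-style rewrite of the source URL.
rewritten = {
    "dependencies": {"python": "^3.8"},
    "source": [{"name": "private", "url": "https://cache.example.com/simple/"}],
}
```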

Poetry needs the url information of a dependency for a private repository. Otherwise, it cannot guarantee the determinism of the lock file since two files, even with the same name, may not have the same information.

And if it's a question of not storing the private index credentials in pyproject.toml, only the base URL should be put in the pyproject.toml file. The credentials should be configured separately via the config command or via environment variables, see https://python-poetry.org/docs/repositories/#configuring-credentials

Poetry needs the url information of a dependency for a private repository. Otherwise, it cannot guarantee the determinism of the lock file since two files, even with the same name, may not have the same information.

The idea here is the private repo specified as the override will be a PyPI mirror. The packages served by the mirror will be exact copies of the ones from https://pypi.org/, without any modifications. Anything else belongs in a separate repo, with URLs included explicitly.

Poetry needs the url information of a dependency for a private repository.

I agree. This contributes to usability of poetry as it provides complete information where to install from.

it cannot guarantee the determinism of the lock file since two files, even with the same name, may not have the same information.

{file = "yarl-1.4.2.tar.gz", hash = "sha256:58cd9c469eced558cd81aa3f484b2924e8897049e06889e8ff2510435b7ef74b"}

I thought that the hash above is calculated from the package file content and does not depend on the filename or URL. It therefore allows checking that two files (even from different URLs) provide exactly the same information.

Treat source url the way git treats remote configuration

The analogy is not perfect, but it is very close to the use case.

git lets you clone a repository with an initial remote configured, and it is easy to change that remote to another git server (e.g. from GitHub to GitLab, or to an alternative repo name). Everything still works. If I configure the remote badly, git complains immediately at the first command dealing with the remote server, because the commit hashes will not match.

I hope poetry will one day allow me to keep the existing pyproject.toml and poetry.lock untouched and accept an alternative URL (e.g. configured via an environment variable) of my private PyPI for a given source (name). This would be a sort of "temporary git remote reconfiguration".

If my alternative private PyPI URL serves exactly the same packages for my installation (checked by comparing hashes), everything should run as usual. If the alternative URL provides different package content, it should fail.

This level of determinism would still provide everything I appreciate about poetry today, while being flexible enough to fit common CI/CD processes.
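The hash check described above can be sketched as follows. The sketch assumes lock entries of the form shown in the yarl example ("sha256:<hex digest>"), and the function name is hypothetical. Only the downloaded bytes are hashed, so the URL they came from plays no part in the result.

```python
import hashlib


def matches_lock(artifact: bytes, locked_hash: str) -> bool:
    """Check downloaded bytes against a lock entry such as "sha256:58cd...".

    Only the content is hashed; the download URL is irrelevant.
    """
    algorithm, _, expected = locked_hash.partition(":")
    digest = hashlib.new(algorithm, artifact).hexdigest()
    return digest == expected
```

A mirror serving byte-identical copies passes this check. A mirror serving different content fails it, which is the git-remote-like behavior proposed here.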

As above, my use case is a private PyPI mirror. At some stage, the public PyPI may even be firewalled off. It doesn't feel right to need a different pyproject.toml behind a firewall than in front of it, for the same code.

I fully agree with https://github.com/python-poetry/poetry/issues/1632#issuecomment-568199401.

IIRC poetry already uses pip under the hood for part of its functionality. Wouldn't it be sufficient if poetry simply adhered to the [global] configuration items index, index-url, and trusted-host in pip.conf (Unix-derived) or pip.ini (Windows)?
(see https://pip.pypa.io/en/stable/user_guide/#config-file)
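Here is a minimal sketch of reading those [global] items with the standard-library configparser. The helper is hypothetical, and this is not something Poetry does. Interpolation is disabled because index URLs may contain percent-encoded characters. Note that pip also reads environment variables such as PIP_INDEX_URL, which this sketch ignores.

```python
import configparser
from typing import Dict, List, Optional, Union


def pip_global_settings(text: str) -> Dict[str, Union[Optional[str], List[str]]]:
    """Extract index-url and trusted-host from the [global] section of pip.conf/pip.ini."""
    # interpolation=None: a URL such as https://u:p%40ss@host/ must be read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("global"):
        return {}
    section = parser["global"]
    return {
        "index-url": section.get("index-url"),
        # Several trusted hosts may be separated by whitespace or newlines.
        "trusted-host": (section.get("trusted-host") or "").split(),
    }
```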

@jhbuhrman
https://github.com/python-poetry/poetry/issues/1554#issuecomment-553113626 said poetry is not going to respect pip.ini.

I'm a little confused. I would've assumed that this would've been sufficient:

poetry config repositories.REPO_NAME https://artifactory.XXX.com/artifactory/api/pypi

But it seems that setting the config globally doesn't negate the need for setting the URL in each pyproject.toml file. Is that by design or is it a bug? If it's by design, then what's the rationale behind it?

This feature would be very useful when a JWT for authenticating with the registry is prepended to the repo URL. AWS CodeArtifact, for example, builds repo URLs like so:

https://aws:<JWT>@<domain>-<aws-account>.d.codeartifact.eu-west-1.amazonaws.com/pypi/python/simple/

The current setup requires poetry users to define this as a static URL inside pyproject.toml. That makes it impossible to use, because these sessions last ~12 hours and the JWT gets re-rolled afterwards.

I see a workaround to the effect of:

re-login whenever the session expires

but that still requires me to set the URL on every project, rather than once and for all for my Docker image builder.

@swist I think that in this case you can manage with the existing poetry. The part in front of @ is the username (aws) and password (<JWT>), and both can be kept out of the pyproject.toml file. Poetry will store them either in ~/.config/pypoetry/auth.toml or in the system credential store, such as seahorse (I am working on Debian Buster).

Just configure the URL in the form https://<domain>-<aws-account>.d.codeartifact.eu-west-1.amazonaws.com/pypi/python (for me, the form without the /+simple suffix works).
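The split described above can be sketched with urllib.parse. The userinfo part (username aws, password the JWT) is removed from the URL that would go into pyproject.toml. It is returned separately so it can be stored as credentials instead. The helper name is hypothetical, and IPv6 hosts are not handled.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (url_without_userinfo, username, password)."""
    parts = urlsplit(url)
    # Rebuild the network location from host and port only, dropping user:password@.
    netloc = parts.hostname or ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, parts.username, parts.password
```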

It turns out there's a magic env var that does the auth (I should have finished reading the docs). It still doesn't quite solve the problem when you're accessing the same repository via different VPC endpoints, for example when building your images in multiple clusters but pushing to the same registry. That would still require rewriting pyproject.toml (and the lock file, I suppose) at build time.
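Poetry's credentials documentation (linked earlier in the thread) describes environment variables of the form POETRY_HTTP_BASIC_<NAME>_USERNAME and POETRY_HTTP_BASIC_<NAME>_PASSWORD. Here is a minimal sketch of building them in a CI step from a freshly fetched token. The helper is hypothetical. The name normalization (non-alphanumeric characters become _, then upper case) is an assumption.

```python
import re
from typing import Dict


def basic_auth_env(source_name: str, username: str, token: str) -> Dict[str, str]:
    """Build the env vars carrying HTTP basic credentials for one named source."""
    # Assumed normalization: "my-org" -> "MY_ORG".
    key = re.sub(r"[^A-Za-z0-9]", "_", source_name).upper()
    return {
        f"POETRY_HTTP_BASIC_{key}_USERNAME": username,
        f"POETRY_HTTP_BASIC_{key}_PASSWORD": token,
    }
```

The resulting mapping could be passed to poetry with subprocess.run(["poetry", "install"], env={**os.environ, **env}). As the comment notes, this covers the rotating token but not a changing endpoint URL.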

@swist Do you mind sharing how exactly you're using Poetry with CodeArtifact? Ignoring the rolling creds bit (I'm aware of it), and assuming a hard-coded or configured set of creds, that's fine. I'm having a hard time getting Poetry to work without 403s and such (and yes, I've seen the docs for using config, env vars, etc).

Apologies for piggybacking off this thread. I'd message you directly or open an issue, but it looks like you have something already :)

@m1hawkgsm it turns out there are two separate URLs you need to use.

If you want to pull, you need to set the URL to:

[[tool.poetry.source]]
name = "my_org"
url = "https://my_org-my_account_id.d.codeartifact.region.amazonaws.com/pypi/repo_name/simple/"

But if you want to push, make the following CLI call:

poetry config repositories.myorg https://my_org-my_account_id.d.codeartifact.region.amazonaws.com/pypi/repo_name
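Both URLs from the comment above can be derived from the same four values. The helper below is hypothetical. The pull URL adds the /simple/ suffix used by [[tool.poetry.source]]. The push URL is the bare repository URL passed to poetry config repositories.<name>.

```python
from typing import Dict


def codeartifact_urls(domain: str, account_id: str, region: str, repo: str) -> Dict[str, str]:
    """Build the pull (simple index) and push (upload) URLs for one repository."""
    base = f"https://{domain}-{account_id}.d.codeartifact.{region}.amazonaws.com/pypi/{repo}"
    return {
        "pull": f"{base}/simple/",  # goes into [[tool.poetry.source]]
        "push": base,  # goes into poetry config repositories.<name>
    }
```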

Is there any update on this? On the one hand, this ticket is still open; on the other hand, this comment seems to hint that this might never be implemented.

As another data point, I tried to hack around this by replacing all mentions of pypi.org with our Nexus URL in POETRY_INSTALL/lib/poetry/repositories/pypi_repository.py. It turns out that Nexus doesn't currently support the package JSON endpoint, so using Nexus would require using the LegacyRepository.

Long story short, it would be great if poetry allowed overriding the PyPI URL and also allowed specifying whether poetry needs to use the legacy endpoint for the repository.
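The two endpoint styles differ as follows. The JSON endpoint has the shape visible in the error message further down this thread (/pypi/zipp/3.4.0/json). A legacy repository instead serves a PEP 503 "simple" page per project under the index URL. The helper names below are hypothetical, and the name normalization follows PEP 503.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of -, _ and . become a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def json_api_url(base: str, name: str, version: str) -> str:
    """PyPI-style JSON endpoint, which (per the comment above) Nexus lacks."""
    return f"{base.rstrip('/')}/pypi/{name}/{version}/json"


def simple_index_url(index: str, name: str) -> str:
    """PEP 503 'simple' project page, which legacy repositories provide."""
    return f"{index.rstrip('/')}/{normalize(name)}/"
```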

Update: it seems I got a workaround working:

  1. Edit POETRY_INSTALL_DIR/lib/poetry/factory.py:
@@ -88,6 +88,14 @@

             poetry.pool.add_repository(repository, is_default, secondary=is_secondary)

+        # Support alternate PyPI repository
+        # https://github.com/python-poetry/poetry/issues/1632
+        pypi_legacy_repository = config.get("repositories.pypi-legacy")
+        if pypi_legacy_repository:
+            source = dict(pypi_legacy_repository, name="pypi-legacy")
+            repository = self.create_legacy_repository(source, config)
+            poetry.pool.add_repository(repository, True, secondary=False)
+
         # Always put PyPI last to prefer private repositories
         # but only if we have no other default source
         if not poetry.pool.has_default():
  2. Run poetry config repositories.pypi-legacy NEXUS_URL.com/repository/pypi/simple
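The patch's effect on repository priority can be modeled roughly as follows. This is a simplified sketch, not Poetry's Pool class. Sources are hypothetical (name, url, is_default) tuples. A globally configured pypi-legacy URL is placed at the front as the default; that ordering is an assumption. The "Always put PyPI last" fallback then no longer adds pypi.org.

```python
from typing import List, Optional, Tuple

PYPI_URL = "https://pypi.org/simple/"


def build_pool(
    project_sources: List[Tuple[str, str, bool]],
    pypi_legacy_url: Optional[str],
) -> List[Tuple[str, str]]:
    """Return (name, url) pairs in the order they would be consulted."""
    pool = [(name, url) for name, url, _ in project_sources]
    has_default = any(is_default for _, _, is_default in project_sources)
    if pypi_legacy_url:
        # The patch: register the configured mirror as the default source.
        pool.insert(0, ("pypi-legacy", pypi_legacy_url))
        has_default = True
    if not has_default:
        # Original fallback: append PyPI last when no default source exists.
        pool.append(("pypi", PYPI_URL))
    return pool
```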

Hello,

  • I spent 2 days evaluating poetry on my workstation and liked it very much.
  • Then I tried to get a build running in our company datacenter and nothing worked, because there I may not access pypi.org directly.
  • Sadly enough, even running poetry install -vvv did not reveal the fact that poetry tries to reach out to "the internet".
  • As a lot of others have already stated, using an internal mirror of pypi.org should be possible without putting another [[tool.poetry.source]] into each and every project.
  • poetry.lock already holds the SHA sums of the wheels' content, so no one would be able to slip in different packages here.
  • The proposal of @brandon-leapyear in https://github.com/python-poetry/poetry/issues/1632#issuecomment-723249167 looks very reasonable to me.
  • For pip I place one file into the default Python Docker image:
$ cat ~/.pip/pip.conf
[global]
cert=/etc/ssl/certs/ca-certificates.crt
index-url = https://repo.example.com/artifactory/api/pypi/pypi-mam/simple

and am done for good for all projects.

I am giving up on poetry: it is close to unusable in a shielded development environment with Nexus, and the maintainer does not seem to understand the frequently raised issues regarding this. This is sad, because I think it has the greatest dependency resolver around.

Keep the :+1: votes on the issue coming; it could eventually land on the feature roadmap. It's already on the first page of issues when you sort by :+1:.

You could also take over or upvote this PR #2074 which, to me, is even better than what this feature request is asking for.

My first impression is that this PR https://github.com/python-poetry/poetry/pull/2074 is going in the right direction as well (I do not know if it is the right implementation, I did not look at the code). I guess if you all manage to collaborate on such a PR, it might get released quicker.

I'd also like to draw attention to this _PyPA_ discussion. In my opinion it gives good background insight why the proposed changes here are the right way to go, and why _indexes_ do not belong in pyproject.toml. I also discussed this in https://github.com/python-poetry/poetry/issues/3355#issuecomment-726683158.

Coming from the Java world: Apache Maven, one of the two de facto standard build tools, also lets you define additional repositories for consumption in the pom.xml, the equivalent of pyproject.toml.

However, using the element that declares them is considered bad practice. It makes scanning artifacts for malware much more difficult, and your projects may start to pull stuff from everywhere on the internet.

For Maven you need to reserve a namespace at Maven Central, normally for a reverse domain you somehow own (poetry could probably reserve com/github/python-poetry/ or org/python-poetry/, for example).

Artifacts are referenced with a complete path, i.e. something like org/python-poetry/poetry-core/1.1.4.
However, a simple caching mirror (e.g. Nginx) pointing to https://repo1.maven.org/maven2/ is sufficient for all caching.

You just state the location of your mirror in the user's .m2/settings.xml file, like you do for pip, and you are done.

Even git allows this kind of mirroring centrally in the user's .gitconfig: https://coderwall.com/p/sitezg/force-git-to-clone-with-https-instead-of-git-urls
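Git's url.<base>.insteadOf setting rewrites any URL that starts with a configured prefix. When several prefixes apply, it uses the longest match. Here is a minimal sketch of the same idea for package index URLs. The rules mapping and the function name are hypothetical, not an existing Poetry or pip option.

```python
from typing import Mapping


def rewrite_url(url: str, rules: Mapping[str, str]) -> str:
    """Replace the longest matching prefix, like git's insteadOf.

    `rules` maps an original prefix to the base that should replace it.
    """
    matches = [prefix for prefix in rules if url.startswith(prefix)]
    if not matches:
        return url
    prefix = max(matches, key=len)
    return rules[prefix] + url[len(prefix):]
```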

@mfriedenhagen it's pretty similar in the Docker world. Images from Docker Hub can be used without any domain prefix, and adding different repositories to the default namespace is considered bad practice. Build results should be repeatable across environments, and the same image name pointing to different stuff breaks that idea. Other sources use the full domain name as a namespace. But it's still possible to set a global registry mirror in the Docker daemon settings, which is the same behaviour as PIP_REGISTRY_URL.
Also keep in mind that as long as you don't use a private repo, poetry's legacy installer seems to work just fine with the PIP_REGISTRY_URL env var or config file, since it just calls pip without specifying registries. The problem arises with a private repo, because PyPI is then added as an extra registry unless a default registry is set. It doesn't break builds, but it makes them really slow, because pip tries to connect to PyPI (in my case, build time went from 2-3 min to 50 min when using 2 private packages).

@Agalin, maybe I misunderstand you here:

  • I already had our internal mirror in pip.conf, but poetry was ignoring it.
  • I have now set up a small test project (https://github.com/mfriedenhagen/poetry-pypitest): poetry ignores both .pip.conf and PIP_REGISTRY_URL during install.
  • In both cases I get:
HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /pypi/zipp/3.4.0/json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2645899f60>: Failed to establish a new connection: [Errno 110] Connection timed out'))

I have tried poetry. Like it. Want to use it, but am bitten too hard when trying to use an AWS CodeArtifact repository. I can't keep pasting the key into pyproject.toml, and I can't check it into git this way. People add their private repos to their pip.conf. Please let poetry read pip.conf, or add a flag for that.

I haven't read the entire thread here, but I feel like automatically adding the proper [[tool.poetry.source]] entry to the pyproject.toml file during poetry init would be useful and not invasive to existing projects. It would be pretty much running this after init. Pardon my shell.

url=$(pip config get global.index-url)
[ -n "$url" ] && echo "
[[tool.poetry.source]]
name = \"pypi-mirror\"
url = \"$url\"
default = true
" >> pyproject.toml

The mirror should not be added to pyproject.toml, since it's likely org-internal. From the description:

In certain environments (e.g. corporate networks) PyPI is unavailable, but a mirror exists. These users should be able to specify the address of the mirror without modifying project files, as the mirror settings are irrelevant to contributors in different environments.

AWS CodeArtifact and many others put the security token in their URL. This would mean storing the current security key (invalid after one day) in the TOML, and therefore in git too, and constantly changing it manually. At worst it's a security risk; at best it's manual and laborious, the exact thing you want tooling such as poetry to make go away.

I completely agree with @mcsheehan and @JacobHenner. The only thing which currently works for me in a corporate environment is to run:

poetry config experimental.new-installer false

Then, as @Agalin pointed out, poetry just seems to use pip. So this works in our data center with a .pip/pip.conf like this:

[global]
cert=/etc/ssl/certs/ca-certificates.crt
index-url = https://artifactory.example.com/artifactory/api/pypi/pypi-mam/simple

pypi-mam is a view which aggregates both a private Python repository and pypi.org.

I’d say that the existing poetry config is nearly sufficient. There is nothing wrong with a source in pyproject.toml and a source name in the lock file. Just don’t require the source to have a URL.
Then there is the already existing repositories config. It stores the URL and credentials of a named repo. It just needs to be merged with the sources at runtime…
No entry in the repositories config? Use the TOML data. Repository matches the source name? Use the URL and credentials from that repo.
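The merge rule proposed above can be sketched like this. The dictionaries stand in for [[tool.poetry.source]] entries and the repositories config, and the function name is hypothetical. A source without a URL that has no matching repository entry is treated as an error, which is one possible design choice.

```python
from typing import Dict, List, Optional

Source = Dict[str, Optional[str]]


def resolve_sources(
    project_sources: List[Source],
    repositories: Dict[str, Dict[str, str]],
) -> List[Source]:
    """Merge pyproject.toml sources with the user's repositories config by name."""
    resolved: List[Source] = []
    for source in project_sources:
        override = repositories.get(source["name"])
        # Repository matches the source name: its URL and credentials win.
        # No entry in the repositories config: use the TOML data as-is.
        merged = {**source, **override} if override else dict(source)
        if not merged.get("url"):
            raise ValueError(f"source {source['name']!r} has no URL configured")
        resolved.append(merged)
    return resolved
```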

This is a huge blocker for us to move completely to poetry. We're using a combination of poetry and twine at the moment: poetry for publishing to our local (on-prem) PyPI server, and twine for publishing to CodeArtifact. This is a real pain point. If there's an agreed-upon spec, I can take a crack at solving this issue...

Also (this is an afterthought) perhaps native poetry support for CodeArtifact can be developed by the AWS team (maybe reaching out to https://twitter.com/bellevuesteve)? It's in their interest as well :)
