Pants: Add Poetry macros to generate Python requirements and constraints

Created on 19 Aug 2020  路  14Comments  路  Source: pantsbuild/pants

Some orgs use Poetry for dependency management because it has an excellent resolver and lockfile support. While they can translate from Poetry into the pip/Pex world of requirements.txt and constraints.txt, this requires running poetry export. Pants can automate some of that via macros, similar to the python_requirements() macro converting requirements.txt into python_requirement_library targets.

We want to distinguish between the high-level pyproject.toml, where the dependencies you want are declared, vs. poetry.lock, where all transitive deps are used. The former should be used to generate python_requirement_library targets, which get used for things like dependency inference. The latter should be used for a constraints file.

Strawman design:

# BUILD
poetry_requirements(
  pyproject="pyproject.toml",   # the default
  lock="poetry.lock",  # the default
  module_mapping={"my_dist": ["module1"]},   # if you want augment Pants's default module mapping
)

Relates to https://github.com/pantsbuild/pants/issues/11710 for better Pipenv support.

Q42020-idea user reported

Most helpful comment

@jsirois mentioned it may be possible to preserve the hashes with Pip and Pex. I think it's out of scope for this ticket, but a possible followup.

It works straight out of the gate in Pex, so if Pants doesn't get in the way and you're already generating a requirements.txt as part of this work, you could just add in hash checking from the get go as shown here: https://github.com/pantsbuild/pex/pull/1310

All 14 comments

@jsirois : This might relate to some of what you're looking at: would be good to sync with Eric on it when you get back.

This is a great feature to have, but not going to make it into 2.0. While this workaround isn't ergonomic, Poetry and Pipenv users can in the meantime use the export commands to export into requirements.txt and constraints.txt files.

We can pick this back up for 2.1 or 2.2, as our focus for the first 2.x releases is futher improving Python, before we venture out to new languages.

Another useful capability if integrating with Poetry and Pipenv is to allow for propagating the constraint ranges to the generated setup.py files when packaging a project. So, rather than setting the install_requires = {'foo==1.2.3'} when Poetry has foo = "^1.0.0", it would generate something like install_requires = {'foo>=1.0.0,<2'}.

Hi @stuhood (long time no see!), my company is evaluating pants as a build tool for some of our python-based ML repositories, which currently use Poetry for package/dependency management.

Is there any way this feature will make it into any of the upcoming Pants 2.X releases? And - especially if that answer is no - would having a few devs like us collaborate on the feature help improve its chances of getting released?

Thanks in advance, and let us know how we can help.

Hey @lucinvitae! Good to hear from you!

Is there any way this feature will make it into any of the upcoming Pants 2.X releases?

It is not currently prioritized: we updated on the roadmap recently over here: https://groups.google.com/g/pants-devel/c/F8Saug3BrFw/m/ZtWDP4YmDwAJ

And - especially if that answer is no - would having a few devs like us collaborate on the feature help improve its chances of getting released?

We would love to collaborate with you on this one!

As you've seen, Pants' existing support for lockfiles is relatively limited:

  1. they are manually generated
  2. only one is supported per repository
  3. only constraints.txt is supported

We have work scheduled to resolve the first two issues in the next month or two: https://github.com/pantsbuild/pants/issues/11165 ... but the last issue is what this ticket deals with.


There are a few very different designs for this feature, and @Eric-Arellano has offered to help sketch those out in the next few days: if you have any thoughts on how you'd like the integration to work, they'd be very helpful in the meantime!

Hey @lucinvitae, awesome to hear! As Stu said, we'd love the help and are eager to assist to pull this off.

--

The first step is checking that we're on the same page for what Poetry support entails. I've updated this PR's title and description to focus on Poetry, rather than including Pipenv. Could you please check that that's what you're envisioning?

Note that Pants will still use Pex to do the actual resolve, which uses pip under the hood. All this PR is doing is automating you as an admin having to run poetry export to generate a requirements.txt and constraints.txt file.

Likewise, this is a lossy conversion. A Poetry lockfile says specifically which file to download, e.g.:

 {file = "psutil-5.7.0-cp37-cp37m-win32.whl", hash = "sha256:d008ddc00c6906ec80040d26dc2d3e3962109e40ad07fd8a12d0284ce5e0e4f8"},

Instead, we will only be able to capture this information:

[[package]]
name = "psutil"
version = "5.7.0"
description = "Cross-platform lib for process and system monitoring in Python."
category = "main"
optional = false
python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"

For example, we would likely convert that into psutil==5.7.0 in a constraints.txt file. Is this lossiness okay with what you're looking for?

Instructions

We will be creating a macro, very similar to the python_requirements() and pipenv_requirements() macros. Both do similar things:

  1. In __call__, take some args that will get used in the BUILD file.
  2. Create a _python_requirements_file(), which is a somewhat hacky way that we ensure correct caching. All the targets you're creating will depend on this, so when pyproject.toml and poetry.lock change, we know to invalidate those targets. The target type is a bit of a misnomer, but it's fine to use here.
  3. Use Path().read_text() to read the file, then parse it. You want back pkg_resources.Requirement objects.

    • The parsing will probably be the trickiest part. We do have access to the toml library. It'd be valuable to see prior art from other tools, including how Poetry itself consumes this file. Definitely do feel free to talk through you're thinking on parsing before getting too far into coding - it's the biggest unknown imo.

  4. Some module_mapping code which can be copy and pasted.
  5. Convert each Requirement object into a python_requirement_library target.

I recommend modeling this as a single macro, something like poetry_project() from the PR description.

Unlike those two macros, we will be creating two types of targets: a python_requirement_library target for each entry in pyproject.toml, which is what gets used for things like dep inference; and a single _python_constraints() target representing the entirety of poetry.lock.

We should also create two _python_requirements_file() targets, one for the pyproject.toml and another for poetry.lock. Set the python_requirement_library targets to depend on the former, and _python_constraints to depend on the latter.

Then, add tests similar to https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/python/macros/python_requirements_test.py.

@lucinvitae FYI the prework has landed with https://github.com/pantsbuild/pants/pull/11724, so you should now be fully unblocked. I simplified the above instructions based on that landing :)

Two thoughts:

1) A plugin system was just merged into Poetry (python-poetry/poetry#3733). A plugin could probably be added to poetry that adds relevant stuff (whatever that looks like - I'm too new to say) in the BUILD files at the same time it is adjusting config in pyproject.toml.

2) Poetry is also written in python, so you could actually just reuse poetry's pyproject.toml / poetry.lock parsing logic. It looks like that is in the poetry-core library (thanks to OdobenusMaximus in the poetry discourse channel for pointing that out to me).

Thanks @cognifloyd! #2 is particularly compelling, that sounds great and you can add the library to 3rdparty/python/requirements.txt.

1 looks interesting, although is likely out of scope. It's much trickier to teach Poetry about Pants imo than to teach Pants how to read Poetry files via poetry-core.

@Eric-Arellano and @stuhood, thanks for the replies and detailed instructions. Replies/comments below:

We will be creating a macro, very similar to the python_requirements() and pipenv_requirements() macros.

e.g. poetry_requirements()?

Is this lossiness okay with what you're looking for?

I'm not sure yet. The "monorepo" approach requires more investigation as to whether or not it suits our internal project's needs. We're just now adding our first source-level dependencies to the internal project. Whether we lose the "hash"-level of detail for the project may not be a huge concern, but we'll see.

As for parsing:

The parsing will probably be the trickiest part. We do have access to the toml library. It'd be valuable to see prior art from other tools, including how Poetry itself consumes this file. Definitely do feel free to talk through you're thinking on parsing before getting too far into coding - it's the biggest unknown imo.

It looks like poetry-core is vendoring its dependencies here, which appears to be a subject of some contention, as seen in https://github.com/python-poetry/poetry/issues/2346. One vendored dependency includes tomlkit, which is used as a base library for parsing the pyproject.toml file (here). The lockfile parsing logic is as of this comment defined in the main poetry repo instead. Currently the class that includes the primary lockfile parsing logic is Locker. Some notes, mostly for myself:

  • Package parsing appears to be handled by the Package class.
  • Dependency and version parsing is mostly handled by Dependency (supports PEP 508 dep declarations).
  • Locker.locked_repository() does the heavy lifting, parsing the packages for each entry in the toml file with the section [[package]] key, as well as dependencies for each package.

There's some discussion to bring poetry.lock support into poetry-core here: https://github.com/python-poetry/poetry-core/pull/83.

Thanks for all that research @lucinvitae! I think it's a non-starter for us to depend on Poetry itself, and we do need to parse poetry.lock to get the constraints file. So, sounds like we need to do our own TOML reading one way or another. At that point, I think we can leave off poetry-core. The spec looks pretty simple to consume for both pyproject.toml and poetry.lock fwict, and we can of course always look to Poetry's code for how they do it. What do you think?

Is this lossiness okay with what you're looking for?

@jsirois mentioned it may be possible to preserve the hashes with Pip and Pex. I think it's out of scope for this ticket, but a possible followup.

e.g. poetry_requirements()?

I recommend calling it poetry_project(), where it's looking both at the top-level requirements and the lockfile for constraints.

--

We've had a couple more users ask for this Poetry integration in the past two weeks, so it's becoming a higher priority for the project and I'd love to help land this.

I'm wondering, how would it sound for you to focus on writing some generic code that reads poetry.lock and/or pyproject.toml, and then I can focus on wiring that up to Pants via the macro? Or perhaps if there are other ways I could help, like focusing on one of the two files while you look at the other? I'm happy to help however :)

@jsirois mentioned it may be possible to preserve the hashes with Pip and Pex. I think it's out of scope for this ticket, but a possible followup.

It works straight out of the gate in Pex, so if Pants doesn't get in the way and you're already generating a requirements.txt as part of this work, you could just add in hash checking from the get go as shown here: https://github.com/pantsbuild/pex/pull/1310

@Eric-Arellano to answer your question about consuming the lockfile:

I think using custom logic instead of adding a dependency on poetry from pants sounds reasonable. And poetry_project() seems like a decent name.

Regarding tackling the implementation:

I'm wondering, how would it sound for you to focus on writing some generic code that reads poetry.lock and/or pyproject.toml, and then I can focus on wiring that up to Pants via the macro? Or perhaps if there are other ways I could help, like focusing on one of the two files while you look at the other? I'm happy to help however :)

My team is still evaluating a monorepo architecture and hasn't committed to using Pants, so I probably can't commit to delivering this feature by a certain date. If you're comfortable with potentially a long wait, I think your proposal that I write some generic lockfile code would work. Or if - due to increased priority from users requesting this feature - you want to self-assign this issue, please do!

I think using custom logic instead of adding a dependency on poetry from pants sounds reasonable.

Great!

so I probably can't commit to delivering this feature by a certain date. If you're comfortable with potentially a long wait, I think your proposal that I write some generic lockfile code would work. Or if - due to increased priority from users requesting this feature - you want to self-assign this issue, please do!

Okay, sounds good. I'll try to land this soon, hopefully before y'all make a decision so you can see how the integration would work out for you :)

My team is still evaluating a monorepo architecture and hasn't committed to using Pants

Please feel free to let us know if there's anything we can help with or you have any questions! Folks are quite friendly on Slack and sometimes share their experience with monorepos, for example. https://www.pantsbuild.org/docs/getting-help#slack

Was this page helpful?
0 / 5 - 0 ratings