The goal is to allow a developer to make changes to multiple Python packages in their repo without having to submit and update lock files along the way.
Cargo, which poetry seems to be modeled after, supports this with two features: path dependencies that can also carry a version requirement, and workspaces. For example, `cargo test --all` will run tests on all packages in a workspace. It looks like some monobuild tools exist for Python (buck, pants, bazel), but they don't seem to interoperate well with other Python build and dependency management systems.
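For reference, a minimal sketch of those two Cargo mechanisms (crate names hypothetical): a root manifest that declares the workspace, and a member that depends on a sibling by path locally while keeping a plain version requirement for publication.

```toml
# Root Cargo.toml of a hypothetical workspace: all members share one
# lock file and one resolved dependency graph.
[workspace]
members = ["crates/foo", "crates/bar"]
```

```toml
# crates/bar/Cargo.toml: the path is used for local development; the
# version requirement is what consumers get once bar is published.
[dependencies]
foo = { path = "../foo", version = "1.2" }
```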
[I'm just somebody driving by]
I've had to deal with the monorepo (anti)pattern1️⃣ and I agree it'd be very helpful to have support for dealing with it and packaging it; but I have also learned _it's really hard_ to do good things for monorepos without making compromises to normal patterns.
To me it sounds like this could be hard to get into Poetry, at least right from the start. It's easier to imagine another library where someone chooses an option like buck/pants/bazel and then creates a helper to make it work well with Poetry, and vice versa.
1️⃣: it's not always an antipattern, I know. But too often it is, and many best practices are abandoned. That can make it hard to develop monorepo-related features without specific good examples that are targeted for support. TL;DR: it could be good to link to an OSS example (or contrive one and link to that).
I understand. I hate the religious view taken towards monorepos. I have found what I call mini-monorepos to be useful, small repos that serve a single purpose.
For example, in the Rust world, I have a package for generic testing of conditions, called `predicates`. I've split it into 3 different packages: `predicates-core` to define the interfaces, `predicates`, and `predicates-tree` for rendering predicate failures in a way similar to pytest. I did these splits so that (1) people aren't forced into dependencies they don't need and (2) Rust is thankfully strict on semver, so the split also represents a splitting of compatibility guarantees. It is more important to provide a stable API for vocab terms (`predicates-core`) than for implementations that aren't generally passed around.
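A rough sketch of that dependency direction (versions illustrative, not the crates' actual manifests):

```toml
# predicates/Cargo.toml (illustrative): the implementation crate depends
# on the small interface crate, so only predicates-core has to keep a
# long-term-stable API; consumers that merely pass predicates around can
# depend on predicates-core alone and skip the heavier implementation.
[dependencies]
predicates-core = "^1.0"
```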
I specifically suggested continuing to follow Cargo's model, as poetry has done in other ways, for monobuild support rather than getting into the more complex requirements of tools like buck/pants/bazel (be as fast as possible, vendor all the deps, etc.). If someone needs what those tools offer, they should probably just use those tools instead. From at least my brief look, it seemed like they don't do a good job of interoperating with other Python build / dependency management systems.
Also would love support here! Specifically for developing a library that has different components each with their own set of potentially bulky dependencies (e.g. core, server type 1, server type 2, client, etc.). Also helpful when trying to expose the same interface that supports different backends (similarly, you don't want to install every backend, just the one you want).
The only OSS library I could find that emulates this approach is toga -- it would be great if poetry could handle dependency resolution for these sorts of libraries.
The toga quickstart explains how the dependencies are managed with a bunch of `setup.py` files.
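For what it's worth, Poetry's extras mechanism already covers a slice of this today (optional dependencies, not workspaces). A minimal sketch with hypothetical package names:

```toml
# pyproject.toml of a hypothetical "mylib": each bulky backend is optional
# and grouped into an extra, so users install only the component they
# need, e.g. `pip install mylib[server-one]`.
[tool.poetry.dependencies]
python = "^3.8"
mylib-server-one = { version = "^1.0", optional = true }
mylib-server-two = { version = "^1.0", optional = true }

[tool.poetry.extras]
server-one = ["mylib-server-one"]
server-two = ["mylib-server-two"]
```

This doesn't solve multi-package development in one repo, but it does address the "don't force every backend on every user" half of the request.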
What about conda?
I'm interested in this also. How far does the current implementation of editable installs get you towards your use case?
Regarding my use cases: Rust can solve this at two levels, (1) path dependencies that can be mixed with version requirements and (2) workspaces. So regarding the first, editable installs might cover this if you can mix path dependencies with version dependencies, which is the key feature needed to partially handle my specified use cases.
I tried some of this out today and it looks like the first feature you describe is nearly, but not quite, supported.
You can declare a dependency both as a dev and non-dev dependency; when you do this the dev version _should_ take precedence (at least that's how I interpret this), allowing a package to be installed as editable during local dev. Then for the final build the `--no-dev` flag would remove the dev version from consideration.
I've made it work with a toy example which contains a path dependency in dev and non-dev mode, but not for a local vs PyPI dependency.
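For concreteness, a sketch of the layout being described (hypothetical names). The hope is that the dev declaration wins locally while `--no-dev` falls back to the released version; as noted above, this works for a path dependency declared in both groups, but not yet for the local-vs-PyPI mix:

```toml
# pyproject.toml of "bar": the same dependency declared in both groups.
# Ideally the dev entry takes precedence locally (editable path install),
# and `--no-dev` leaves only the registry version in consideration.
[tool.poetry.dependencies]
foo = "^1.2.3"

[tool.poetry.dev-dependencies]
foo = { path = "../foo", develop = true }
```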
I am currently trying to find a way to build a Python monorepo without using heavy-duty tools like Bazel.
I have seen a repository which uses `yarn` and `lerna` from the JavaScript world to build Python packages: https://github.com/pymedphys/pymedphys
Using yarn you can declare workspaces, and using lerna you can execute commands in all workspaces. You can then use the `scripts` section of `package.json` in order to run Python build commands.
I've yet to try this technique with Poetry (and the pymedphys repository does not use it), however I feel it might be worth exploring.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I still think this could be useful
`yarn workspaces` makes it possible to have a set of shared libs that are used across multiple apps in a monorepo, so I think it's a good model to follow or at least borrow ideas from.
Something I am running into right now is that using Poetry with path dependencies is very unfriendly to the Docker cache.
To correctly leverage the cache (read: don't install all dependencies from scratch on every build) I would want to install the dependencies first, and then copy my own code over and install it. However, Poetry refuses to do anything (even `poetry export -f requirements.txt`) if my code has not already been copied in.
I am considering writing my own parser for the `poetry.lock` file, to `pip install` my dependencies from there before I copy my code over.
One of two things could help me:

- if the `path = ...` dependencies were part of the workspace, some command could install the other dependencies while ignoring those
- `poetry export` could only need `pyproject.toml` and `poetry.lock`, and work even if the code is not there (I'm not sure why it needs it right now)

~~It will be interesting to see how https://github.com/python-poetry/poetry/issues/1993 will be solved - it does imply the Poetry repo becoming a kind of monorepo, right?~~ Edit: Err, sorry, I just assumed it would be done in the current repo, but it seems to not follow a monorepo structure (python-poetry/core)
Any progress on this feature request? I've been using poetry for about a year now inside a monorepo in a research lab. We have a single global pyproject.toml that dictates the global set of dependencies, since there isn't a way to have multiple separate pyproject.toml files (like cargo workspaces).
Needless to say, this creates a lot of pain, because we continually experience problems when one developer wants a dependency that is incompatible with another developer's code. As a result, most of the researchers go off the grid and don't use poetry at all, instead having their own venvs that only work on their machines.
Related: #2270
@TheButlah this is something I wish to pick up sometime this year. Would be great to make sure the use case is listed in the issue.
I've quickly made a prototype for handling monorepos, with success. In fact it nearly works out of the box already! Are you interested if I make a PR?
I put here some unordered details and thoughts. I don't guarantee it'd resolve all cases, but at least it's fine for my needs. I'm basing this workflow on the one I use for JS/TS projects using yarn and lerna.
Repo structure:

```
./
|- .git/
|- packages/
|  |- foo/
|  |  |- sources (foo.py or foo/__init__.py, with tests...)
|  |  |- pyproject.toml
|  |- bar/
|  |  |- sources (...)
|  |  |- pyproject.toml
|- pyproject.toml
```
The idea is to have a private/virtual pyproject.toml at the root. Each package has its own pyproject.toml specifying the package name, version, dependencies, as usual. I could have created one virtualenv per package, but I find that too cumbersome (I'm using VS Code and it would be very annoying to switch virtualenvs every time I change the file I'm working on), so I created a global virtualenv, and that's the purpose of the root pyproject.toml.
This file contains all the project packages as its sole dependencies (using the format `foo = { path = "./packages/foo", develop = true }`). Doing so, poetry resolves the dependencies of the packages. If bar depends on foo, packages/bar/pyproject.toml shall declare a regular version constraint (`foo = "^1.2.3"`). The root pyproject.toml has the dev dependencies (black, pytest...) since they are used for all packages; however the packages may have some as well (read below for CICD).
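Concretely, a sketch of such a root pyproject.toml for the structure above (metadata values are placeholders; the root project itself is never built or published):

```toml
# Root pyproject.toml: a private "virtual" project whose only runtime
# dependencies are the local packages, installed in editable mode so one
# shared virtualenv covers the whole repo.
[tool.poetry]
name = "my-monorepo"
version = "0.0.0"
description = "Virtual root project, never published."
authors = ["Dev Team <dev@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"
foo = { path = "./packages/foo", develop = true }
bar = { path = "./packages/bar", develop = true }

[tool.poetry.dev-dependencies]
# Shared tooling for all packages.
black = "^22.3.0"
pytest = "^6.2"
```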
There is often a need to execute a command in every package directory. For instance, and this already works, running `poetry build` or `poetry publish` from packages/foo successfully builds or publishes the package. For these two use cases I'm proposing new commands `poetry packages build/publish`, but a generic `poetry packages exec -- args...` would allow for doing anything else.
On the CICD server, it makes sense to create one virtualenv per package, at least for one thing: checking that each package only imports other packages that are declared as its dependencies. In that sense, packages may also want their own dev dependencies, e.g. if they have some specific tests which are not shared with others.
I'm used to delegating version bumping to the CICD pipeline: every merged PR automatically bumps versions and publishes packages, based on what has changed (using the conventional commits spec). Dependent packages need to be bumped as well: if bar depends on foo and foo has changes of any kind, bar still needs a patch bump even though bar itself hasn't changed. Actually this is only true if bar depends on the current version of foo: if an earlier version is specified, the dependency constraint is not updated.
Lerna takes care of `git add`ing the changed files (`package.json` and `CHANGELOG.md` for every changed package) and can even `git push`... but thereafter, one may want to customize the commit message, so there's an option for that... well, I think it's too much; the CICD pipeline can do it as well. I'd be happy if poetry could just bump versions and append messages to changelogs, and after that I can `git add --all` and `git commit` myself.
For publishing, lerna can retrieve the package versions from the registries and publish only the new ones. Here in Python this is tedious, since PyPI (Warehouse) and pypiserver do not have a common route to get version info. My hack for now is just to publish everything and ignore `409 Conflict` errors.
Caveats: packages shall not declare conflicting dependencies if there's only one global virtualenv.
Again, what I'm proposing here addresses my needs, but this feature should fit other use cases as well, so please give feedback about what you would need for your own workflow.
I prefer not to use "monorepo" in the names, which is too coercive.
New commands:

- `poetry packages bump` - with the logic described above.
- `poetry packages clean` - not sure about this one. In my prototype it removes `dist` and `__pycache__` in each package, but one may have other files to clean as well, so I'm hesitating: should the list of files to be cleaned be customizable, and how? Or should developers rely on `poetry packages exec -- rm ...`?
- `poetry packages exec -- ...`
- `poetry packages new <name>`
- `poetry packages publish`
- `poetry packages show`
- `poetry packages show dependencies` - not sure about the actual command syntax, but the need is to show the packages with their dependencies, with a selectable format. FTR I made this package and the `dot` format could be nice to have built-in too.
- `poetry package <name> add / remove ...` - manages dependencies for each package.

New `pyproject.toml` section:

```toml
[tool.poetry.packages]
paths = ["packages/*"] # where to find packages
version = "independent" # optional flag letting each package have its own version; without it, all packages share the same version
```
> I am currently trying to find a way to build a Python monorepo without using heavy-duty tools like Bazel.
>
> I have seen a repository which uses `yarn` and `lerna` from the JavaScript world to build Python packages: https://github.com/pymedphys/pymedphys
>
> Using yarn you can declare workspaces, and using lerna you can execute commands in all workspaces. You can then use the `scripts` section of `package.json` in order to run Python build commands.
>
> I've yet to try this technique with Poetry (and the pymedphys repository does not use it), however I feel it might be worth exploring.
Yes indeed, `pymedphys` was using the yarn `scripts` section at some point (till their `v0.11.x`) but then restructured their repo and migrated to using `poetry` only. Trying to dig into their justification for this move.
Edit: Some hints in https://github.com/pymedphys/pymedphys/issues/192#issuecomment-485613901. Have requested the author to shed some more light though.
Edit 2: Thanks to SimonBiggs. https://github.com/pymedphys/pymedphys/issues/192#issuecomment-739557570
@KoltesDigital I would really like to try out the modifications you have made to allow poetry to manage the dependencies for a monorepo.
Might you be able to share a branch to see these changes? You might have already shared but I just could not find it. Thanks!
@hpgmiskin thanks for your interest! My prototype is more a workflow PoC, I actually haven't changed poetry yet, instead I just added some scripts on top of it. These scripts are in JS/TS, and of course the final solution should only use Python. Moreover, I haven't implemented my whole proposition, and to do so I'll have to modify poetry.
So before doing this work, I want to first have some feedback from contributors, in order to know whether they're OK with the direction I'm heading.
What about having Poetry correctly update its lock file when a sub-pyproject.toml file changes, without having to run `poetry lock` from scratch? Or having all dependencies be collected in a single top-level poetry.lock (and possibly a single virtualenv), à la Cargo? Or installing dependencies without the code being there, for Docker cache and build system friendliness?
Those commands are nice, but I'm unlikely to use them, and I'd rather see the monorepo use-case be properly supported as a first step, and the opinionated utility commands (and CI integration) added as a second step.
@remram44 that alone would be a big step forward
@remram44 it's actually what happens with the root-level pyproject.toml: it creates a single virtual environment and leads to a single root-level poetry.lock. It's indeed the same with Cargo workspaces and Yarn workspaces. And this is already working without any of my additions; you can try the repo structure I described.
I also believe that poetry should not reinvent the wheel, and should leave alone the things that CICD does best. That's why I mentioned that I prefer invoking git myself (i.e. the CICD takes care of that), because every CICD pipeline is different. But IMHO bumping versions of the subpackages is something everybody will need; that's why I propose making this part of poetry, or alternatively a plug-in to poetry. After that, users can version, publish, etc. the way they want.
If I run `poetry add` in a subdirectory, it will generate a poetry.lock there instead of updating the root-level one. Same with running `poetry run` or `poetry shell` in a subdirectory. This is different from what workspace-aware package managers do.
@remram44 exactly, that's why I've proposed new root-level commands `poetry package <name> add/remove`, which mimic Yarn's and Cargo's CLI features.
My experience with monorepos is that users should not run commands from subdirectories. If one were to `add` from a subdirectory with either Yarn or Cargo, this would create undesired lock files too. And this would be reasonable: I don't expect a project manager tool to find out whether the current project is actually part of a larger monorepo. Monorepo settings define which subdirectories are to be considered subpackages (`workspaces` in `package.json`, in `Cargo.toml`, and in my proposition), not the other way around.
Cargo gets this right; I don't know why you say this is unreasonable. The shortcomings of some tools are no argument for ignoring the behavior of working tools.
@remram44 interesting, I wasn't aware of this feature of Cargo. But well, we have different views about what the monorepo support should look like. Our two views are different, but not exclusive, so let's have both.
Jumping in here regarding yarn behavior: running a `yarn add` in a yarn workspace does not generate a subdirectory lock file, though it does modify the subdirectory package.json. This may not be the case if you haven't specified the subdirectory as a workspace, but I think what we're hoping for with this issue is something that enables us to have a multi-package lockfile akin to the yarn workspaces feature (since the multi-lockfile monorepo is close to being/already supported?).