The goal is to allow a developer to make changes to multiple Python packages in their repo without having to submit and update lock files along the way.
Cargo, which poetry seems to be modeled after, supports this with two features: path dependencies that can also carry a version requirement, and workspaces. For example, `cargo test --all` will run tests on all packages in a workspace. It looks like some monobuild tools exist for Python (buck, pants, bazel), but they don't seem to interoperate well with other Python build and dependency management systems.
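For reference, a minimal sketch of those two Cargo mechanisms (crate names hypothetical): a root manifest that declares the workspace, and a member that depends on a sibling by path locally while keeping a plain version requirement for publication.

```toml
# Root Cargo.toml of a hypothetical workspace: all members share one
# lock file and one resolved dependency graph.
[workspace]
members = ["crates/foo", "crates/bar"]
```

```toml
# crates/bar/Cargo.toml: the path is used for local development; the
# version requirement is what consumers get once bar is published.
[dependencies]
foo = { path = "../foo", version = "1.2" }
```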
[I'm just somebody driving by]
I've had to deal with the monorepo (anti)pattern1️⃣ and I agree it'd be very helpful to have support for dealing with it and packaging it; but I have also learned _it's really hard_ to do good things for monorepos without making compromises to normal patterns.
To me it sounds like this could be hard to get into Poetry, at least right from the start. It's easier to imagine another library where someone chooses an option like buck/pants/bazel and then creates a helper to make it work well with Poetry, and vice versa.
1️⃣: it's not always an antipattern, I know. But too often it is, and many best practices are abandoned. That can make it hard to develop monorepo-related features without specific good examples that are targeted for support. TL;DR: it could be good to link to an OSS example (or contrive one and link to that).
I understand. I hate the religious view taken towards monorepos. I have found what I call mini-monorepos to be useful, small repos that serve a single purpose.
For example, in the Rust world, I have a package for generic testing of conditions, called `predicates`. I've split it into 3 different packages: `predicates-core` to define the interfaces, `predicates`, and `predicates-tree` for rendering predicate failures in a way similar to pytest. I did these splits so that (1) people aren't forced into dependencies they don't need and (2) Rust is thankfully strict on semver, so the split also represents a splitting of compatibility guarantees. It is more important to provide a stable API for vocab terms (`predicates-core`) than for implementations that aren't generally passed around.
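A rough sketch of that dependency direction (versions illustrative, not the crates' actual manifests):

```toml
# predicates/Cargo.toml (illustrative): the implementation crate depends
# on the small interface crate, so only predicates-core has to keep a
# long-term-stable API; consumers that merely pass predicates around can
# depend on predicates-core alone and skip the heavier implementation.
[dependencies]
predicates-core = "^1.0"
```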
I specifically suggested continuing to follow Cargo's model, as poetry has done in other ways, for monobuild support rather than getting into the more complex requirements of tools like buck/pants/bazel (be as fast as possible, vendor all the deps, etc.). If someone needs what those tools offer, they should probably just use those tools instead. From at least my brief look, it seemed like they don't do a good job of interoperating with other Python build / dependency management systems.
Also would love support here! Specifically for developing a library that has different components each with their own set of potentially bulky dependencies (e.g. core, server type 1, server type 2, client, etc.). Also helpful when trying to expose the same interface that supports different backends (similarly, you don't want to install every backend, just the one you want).
The only OSS library I could find that emulates this approach is toga -- it would be great if poetry could handle dependency resolution for these sorts of libraries.
The toga quickstart explains how the dependencies are managed with a bunch of `setup.py` files.
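For what it's worth, Poetry's extras mechanism already covers a slice of this today (optional dependencies, not workspaces). A minimal sketch with hypothetical package names:

```toml
# pyproject.toml of a hypothetical "mylib": each bulky backend is optional
# and grouped into an extra, so users install only the component they
# need, e.g. `pip install mylib[server-one]`.
[tool.poetry.dependencies]
python = "^3.8"
mylib-server-one = { version = "^1.0", optional = true }
mylib-server-two = { version = "^1.0", optional = true }

[tool.poetry.extras]
server-one = ["mylib-server-one"]
server-two = ["mylib-server-two"]
```

This doesn't solve multi-package development in one repo, but it does address the "don't force every backend on every user" half of the request.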
What about conda?
I'm interested in this also. How far does the current implementation of editable installs get you towards your use case?
Regarding my use cases: Rust can solve this at two levels, (1) path dependencies that can be mixed with version requirements and (2) workspaces. So regarding the first, editable installs might cover this if you can mix path dependencies with version dependencies, which is the key feature needed to partially handle my specified use cases.
I tried some of this out today and it looks like the first feature you describe is nearly, but not quite, supported.
You can declare a dependency both as a dev and non-dev dependency; when you do this the dev version _should_ take precedence (at least that's how I interpret this), allowing a package to be installed as editable during local dev. Then for the final build the `--no-dev` flag would remove the dev version from consideration.
I've made it work with a toy example which contains a path dependency in dev and non-dev mode, but not for a local vs PyPI dependency.
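For concreteness, a sketch of the layout being described (hypothetical names). The hope is that the dev declaration wins locally while `--no-dev` falls back to the released version; as noted above, this works for a path dependency declared in both groups, but not yet for the local-vs-PyPI mix:

```toml
# pyproject.toml of "bar": the same dependency declared in both groups.
# Ideally the dev entry takes precedence locally (editable path install),
# and `--no-dev` leaves only the registry version in consideration.
[tool.poetry.dependencies]
foo = "^1.2.3"

[tool.poetry.dev-dependencies]
foo = { path = "../foo", develop = true }
```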
I am currently trying to find a way to build a Python monorepo without using heavy-duty tools like Bazel.
I have seen a repository which uses `yarn` and `lerna` from the JavaScript world to build Python packages: https://github.com/pymedphys/pymedphys
Using yarn you can declare workspaces, and using lerna you can execute commands in all workspaces. You can then use the `scripts` section of `package.json` in order to run Python build commands.
I've yet to try this technique with Poetry (and the pymedphys repository does not use it), however I feel it might be worth exploring.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I still think this could be useful
`yarn workspaces` makes it possible to have a set of shared libs that are used across multiple apps in a monorepo, so I think it's a good model to follow or at least borrow ideas from.
Something I am running into right now is that using Poetry with path dependencies is very unfriendly to the Docker cache.
To correctly leverage the cache (read: don't install all dependencies from scratch on every build) I would want to install the dependencies first, and then copy my own code over and install it. However, Poetry refuses to do anything (even `poetry export -f requirements.txt`) if my code has not already been copied in.
I am considering writing my own parser for the `poetry.lock` file, to `pip install` my dependencies from there before I copy my code over.
One of two things could help me:

- if the `path = ...` dependencies were part of the workspace, some command could install the other dependencies while ignoring those
- `poetry export` could only need `pyproject.toml` and `poetry.lock`, and work even if the code is not there (I'm not sure why it needs it right now)

~~It will be interesting to see how https://github.com/python-poetry/poetry/issues/1993 will be solved - it does imply the Poetry repo becoming a kind of monorepo, right?~~ Edit: Err, sorry, I just assumed it would be done in the current repo, but it seems to not follow a monorepo structure (python-poetry/core)
Any progress on this feature request? I've been using poetry for about a year now inside a monorepo in a research lab. We have a single global pyproject.toml that dictates the global set of dependencies, since there isn't a way to have multiple separate pyproject.toml files (like cargo workspaces).
Needless to say, this creates a lot of pain, because we continually experience problems when one developer wants a dependency that is incompatible with another developer's code. As a result, most of the researchers go off the grid and don't use poetry at all, instead having their own venvs that only work on their machines.
Related: #2270
@TheButlah this is something I wish to pick up sometime this year. Would be great to make sure the use case is listed in the issue.
I've quickly made a prototype for handling monorepos, with success. In fact it nearly works out of the box already! Are you interested if I make a PR?
I put here some unordered details and thoughts. I don't guarantee it'd resolve all cases, but at least it's fine for my needs. I'm basing this workflow on the one I use for JS/TS projects using yarn and lerna.
Repo structure:

```
./
|- .git/
|- packages/
|  |- foo/
|  |  |- sources (foo.py or foo/__init__.py, with tests...)
|  |  |- pyproject.toml
|  |- bar/
|  |  |- sources (...)
|  |  |- pyproject.toml
|- pyproject.toml
```
The idea is to have a private/virtual pyproject.toml at the root. Each package has its own pyproject.toml specifying the package name, version, dependencies, as usual. I could have created one virtualenv per package, but I find that too cumbersome (I'm using VS Code and it would be very annoying to switch virtualenvs every time I change the file I'm working on), so I created a global virtualenv, and that's the purpose of the root pyproject.toml.
This file contains all the project packages as its sole dependencies (using the format `foo = { path = "./packages/foo", develop = true }`). Doing so, poetry resolves the dependencies of the packages. If bar depends on foo, packages/bar/pyproject.toml shall declare a regular version constraint (`foo = "^1.2.3"`). The root pyproject.toml has the dev dependencies (black, pytest...) since they are used for all packages; however the packages may have some as well (read below for CICD).
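Concretely, a sketch of such a root pyproject.toml for the structure above (metadata values are placeholders; the root project itself is never built or published):

```toml
# Root pyproject.toml: a private "virtual" project whose only runtime
# dependencies are the local packages, installed in editable mode so one
# shared virtualenv covers the whole repo.
[tool.poetry]
name = "my-monorepo"
version = "0.0.0"
description = "Virtual root project, never published."
authors = ["Dev Team <dev@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"
foo = { path = "./packages/foo", develop = true }
bar = { path = "./packages/bar", develop = true }

[tool.poetry.dev-dependencies]
# Shared tooling for all packages.
black = "^22.3.0"
pytest = "^6.2"
```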
There is often a need to execute a command in every package directory. For instance, and this already works, running `poetry build` or `poetry publish` from packages/foo successfully builds or publishes the package. For these two use cases I'm proposing new commands `poetry packages build/publish`, but a generic `poetry packages exec -- args...` would allow for doing anything else.
On the CICD server, it makes sense to create one virtualenv per package, at least for one thing: checking that each package only imports other packages that are declared as its dependencies. In that sense, packages may also want their own dev dependencies, e.g. if they have some specific tests which are not shared with others.
I'm used to delegating version bumping to the CICD pipeline: every merged PR automatically bumps versions and publishes packages, based on what has changed (using the conventional commits spec). Dependent packages need to be bumped as well: if bar depends on foo and foo has changes of any kind, bar still needs a patch bump even though bar itself hasn't changed. Actually this is only true if bar depends on the current version of foo: if an earlier version is specified, the dependency constraint is not updated.
Lerna takes care of `git add`ing the changed files (`package.json` and `CHANGELOG.md` for every changed package) and can even `git push`... but thereafter, one may want to customize the commit message, so there's an option for that... well, I think it's too much; the CICD pipeline can do it as well. I'd be happy if poetry could just bump versions and append messages to changelogs, and after that I can `git add --all` and `git commit` myself.
For publishing, lerna can retrieve the package versions from the registries and publish only the new ones. Here in Python this is tedious, since PyPI (Warehouse) and pypiserver do not have a common route to get version info. My hack for now is just to publish everything and ignore `409 Conflict` errors.
Caveats: packages shall not declare conflicting dependencies if there's only one global virtualenv.
Again, what I'm proposing here addresses my needs, but this feature should fit other use cases as well, so please give feedback about what you would need for your own workflow.
I prefer not to use "monorepo" in the names, which is too coercive.
New commands:

- `poetry packages bump` - with the logic described above.
- `poetry packages clean` - not sure about this one. In my prototype it removes `dist` and `__pycache__` in each package, but one may have other files to clean as well, so I'm hesitating: should the list of files to be cleaned be customizable, and how? Or should developers rely on `poetry packages exec -- rm ...`?
- `poetry packages exec -- ...`
- `poetry packages new <name>`
- `poetry packages publish`
- `poetry packages show`
- `poetry packages show dependencies` - not sure about the actual command syntax, but the need is to show the packages with their dependencies, with a selectable format. FTR I made this package and the `dot` format could be nice to have built-in too.
- `poetry package <name> add / remove ...` - manages dependencies for each package.

New `pyproject.toml` section:

```toml
[tool.poetry.packages]
paths = ["packages/*"] # where to find packages
version = "independent" # optional flag letting each package have its own version; without it, all packages share the same version
```
> I am currently trying to find a way to build a Python monorepo without using heavy-duty tools like Bazel.
>
> I have seen a repository which uses `yarn` and `lerna` from the JavaScript world to build Python packages: https://github.com/pymedphys/pymedphys
>
> Using yarn you can declare workspaces, and using lerna you can execute commands in all workspaces. You can then use the `scripts` section of `package.json` in order to run Python build commands.
>
> I've yet to try this technique with Poetry (and the pymedphys repository does not use it), however I feel it might be worth exploring.
Yes indeed, `pymedphys` was using the yarn `scripts` section at some point (till their `v0.11.x`) but then restructured their repo and migrated to using `poetry` only. Trying to dig into their justification for this move.
Edit: Some hints in https://github.com/pymedphys/pymedphys/issues/192#issuecomment-485613901. Have requested the author to shed some more light though.
Edit 2: Thanks to SimonBiggs. https://github.com/pymedphys/pymedphys/issues/192#issuecomment-739557570
@KoltesDigital I would really like to try out the modifications you have made to allow poetry to manage the dependencies for a monorepo.
Might you be able to share a branch to see these changes? You might have already shared but I just could not find it. Thanks!
@hpgmiskin thanks for your interest! My prototype is more a workflow PoC, I actually haven't changed poetry yet, instead I just added some scripts on top of it. These scripts are in JS/TS, and of course the final solution should only use Python. Moreover, I haven't implemented my whole proposition, and to do so I'll have to modify poetry.
So before doing this work, I want to first have some feedback from contributors, in order to know whether they're OK with the direction I'm heading.
What about having Poetry correctly update its lock file when a sub-pyproject.toml file changes, without having to run `poetry lock` from scratch? Or having all dependencies be collected in a single top-level poetry.lock (and possibly a single virtualenv), à la Cargo? Or installing dependencies without the code being there, for Docker cache and build system friendliness?
Those commands are nice, but I'm unlikely to use them, and I'd rather see the monorepo use-case be properly supported as a first step, and the opinionated utility commands (and CI integration) added as a second step.
@remram44 that alone would be a big step forward
@remram44 it's actually what happens with the root-level pyproject.toml: it creates a single virtual environment and leads to a single root-level poetry.lock. It's indeed the same with Cargo workspaces and Yarn workspaces. And this is already working without any of my additions; you can try the repo structure I described.
I also believe that poetry should not reinvent the wheel, and should leave alone the things that CICD does best. That's why I mentioned that I prefer invoking git myself (i.e. the CICD takes care of that), because every CICD pipeline is different. But IMHO bumping versions of the subpackages is something everybody will need; that's why I propose making this part of poetry, or alternatively a plug-in to poetry. After that, users can version, publish, etc. the way they want.
If I run `poetry add` in a subdirectory, it will generate a poetry.lock there instead of updating the root-level one. Same with running `poetry run` or `poetry shell` in a subdirectory. This is different from what workspace-aware package managers do.
@remram44 exactly, that's why I've proposed new root-level commands `poetry package <name> add/remove`, which mimic Yarn's and Cargo's CLI features.
My experience with monorepos is that users should not run commands from subdirectories. If one were to `add` from a subdirectory with either Yarn or Cargo, this would create undesired lock files too. And this would be reasonable: I don't expect a project manager tool to find out whether the current project is actually part of a larger monorepo. Monorepo settings define which subdirectories are to be considered subpackages (`workspaces` in `package.json`, in `Cargo.toml`, and in my proposition), not the other way around.
Cargo gets this right; I don't know why you say this is unreasonable. The shortcomings of some tools are no argument for ignoring the behavior of working tools.
@remram44 interesting, I wasn't aware of this feature of Cargo. But well, we have different views about what the monorepo support should look like. Our two views are different, but not exclusive, so let's have both.
Jumping in here regarding yarn behavior: running a `yarn add` in a yarn workspace does not generate a subdirectory lock file, though it does modify the subdirectory package.json. This may not be the case if you haven't specified the subdirectory as a workspace, but I think what we're hoping for with this issue is something that enables us to have a multi-package lockfile akin to the yarn workspaces feature (since the multi-lockfile monorepo is close to being/already supported?).