Pipenv: Proposal: `pipenv` patterns and antipatterns for python library project

Created on 5 Apr 2018 · 74 Comments · Source: pypa/pipenv

Hacking on maya I learned a few lessons, which resulted in the following proposal for recommended usage of pipenv in python libraries. I expect others to review the proposal, and if we reach agreement, the (updated) text could end up in the pipenv docs.

pipenv patterns and antipatterns for python library project

EDIT
The following applies best to general (mostly Open Source) python libraries, which are supposed to run on different python versions and OSes. Libraries developed in a strict Enterprise environment may be a different case (but be sure to review all the Problem sections anyway).

END OF EDIT

TL;DR: Adding pipenv files to a python library project is likely to introduce extra complexity and can hide some errors, while not adding anything to library security. For this reason, keep Pipfile, Pipfile.lock and .env out of library source control.

You will still be able to use the full power of pipenv even though its files live in .gitignore.

Python library versus python application

By python library I mean a project, typically having a setup.py, that is targeted for distribution and usage on various platforms differing in python version and/or OS.

Examples being maya, requests, flask etc.

On the other side (not python libraries) there are applications targeted at a specific python interpreter and OS, often deployed in a strictly consistent environment.

pipfile describes these differences very well in its Pipfile vs setup.py section.

What is pipenv (deployment tool)

I completely agree with the statement that pipenv is a deployment tool, as it allows one to:

  • define strict requirements (Pipfile.lock) for deployment of a virtual environment
  • apply those strict requirements in a reproducible manner on different machines

It helps when one has to deploy an application or develop in a python environment that is kept very consistent across multiple developers.

Calling pipenv a packaging tool is misleading if one expects it to create python libraries or to be deeply involved in their creation. Yes, pipenv can help a lot (in local development of libraries), but it can also do harm (often in CI tests, when used without deeper thought).

Applying "security reasons" in wrong context

TL;DR: pipenv provides a secure environment by applying approved concrete dependencies described in a Pipfile.lock file, while a python library may only define abstract dependencies (and thus cannot provide a Pipfile.lock).

pipenv shines in deployment scenarios following these steps:

  • define abstract dependencies (via Pipfile)
  • generate concrete dependencies from them, resulting in Pipfile.lock
  • create a (virtual) python environment reflecting those concrete dependencies
  • run tests to make sure the given environment works as expected and is secure
  • release the tested "golden" Pipfile.lock as the definition of the approved python environment
  • others can use pipenv sync to apply the "golden" Pipfile.lock elsewhere, getting an identical python environment.
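
As a rough sketch of that flow (the package name is only illustrative, and pytest is assumed to be among the dev dependencies):

$ # on the machine preparing the "golden" environment
$ pipenv install requests          # abstract dependency recorded in Pipfile
$ pipenv lock                      # concrete, hashed dependencies written to Pipfile.lock
$ pipenv run pytest                # test the resulting environment
$ # ship the approved Pipfile.lock; then, on the target machine:
$ pipenv sync                      # recreate exactly the approved environment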

With the development of a python library one cannot achieve such security, because libraries must not define concrete dependencies. Breaking this rule (i.e. a python library trying to declare concrete dependencies) results in problems such as:

  • problems finding a satisfying version of shared libraries (each strictly pinning package defines an exact version of a shared library, and it is very likely the versions will differ and prevent finding a commonly acceptable version)
  • concrete dependencies may depend on python version, OS or other environment markers, and trying to install the package in a different context can easily fail to satisfy some of the rules defined in the original abstract dependencies.

Problem: Hiding broken setup.py defined dependencies

setup.py shall define all abstract dependencies via install_requires.

If Pipfile defines those dependencies too, it may easily hide problems such as:

  • missing dependency in install_requires
  • Pipfile defines specific rules (version ranges etc.) for a dependency and install_requires does not.

To prevent it, follow these rules:

  • library defined dependencies must not appear in Pipfile
  • the [packages] section in Pipfile shall be either empty or define only a single dependency on the library itself.
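
For illustration, after pipenv install -e . (and, say, pipenv install --dev pytest) the Pipfile of a library called mypackage (a placeholder name) would contain roughly:

[packages]
mypackage = {editable = true, path = "."}

[dev-packages]
pytest = "*"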

Problem: Pipfile.lock in repository

Keeping Pipfile.lock (typically for "security reasons") in a library repository is wrong, because:

  • the described dependencies are likely to be invalid for different python versions or on another OS
  • developers are forced to update the file not only when they add/remove a dependency, but also whenever other libraries are updated and might be usable within the library.

To prevent it, one should:

  • remove Pipfile.lock from repository and add it into .gitignore

Problem: Competing with tox (hiding usedevelop)

If tox.ini contains entries such as the following in its commands section:

  • pipenv install
  • pipenv install --dev
  • pipenv lock

it is often a problem, because:

  • pipenv install shall install only the library itself, and tox does that too (by default). Apart from the duplication, it also prevents switching between usedevelop=True and usedevelop=False in tox.ini, because Pipenv can express it in only one variant (while tox.ini allows differences between environments).

To prevent it, one should:

  • keep pipenv invocations out of the tox.ini commands section and let tox itself (develop-)install the package under test, as sketched below
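
A minimal tox.ini sketch of that approach (pytest is just an example test runner; the point is that tox, not pipenv, installs the package):

[testenv]
usedevelop = true
deps = pytest
commands = pytest {posargs}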

Problem: Breaking builds if pipenv fails

pipenv is under heavy development and things break sometimes. If such an issue breaks your CI build, that is a failure which could have been prevented by not using pipenv and sticking to traditional tools (which are often a bit more mature).

To prevent it, one should:

  • think twice before adding pipenv into a CI build script, tox.ini or a similar place. Do you know what value you get from adding it? Could the job be done with existing tooling?
  • do not add it "just for security reasons" or because "everybody does".

Summary

Key questions regarding pipenv's role in the development of a python library are:

  • What value does pipenv really bring? A: It is a virtualenv management tool.
  • What is the relevant use case for pipenv? A: Managing a virtualenv.
  • Shall it appear in the library repository? A: NO.

Few more details and tricks follow.

pipenv will not add any security to your package

Do not push it into a project just because everybody does it or because you expect extra security. It will disappoint you.

Securing things by using concrete (and approved) dependencies shall take place in a later phase, in the application that is going to use your library.

Keep Pipfile, Pipfile.lock and .env files out of repository

Put the files into .gitignore.
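
For example, the pipenv-related entries in .gitignore would be:

Pipfile
Pipfile.lock
.env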

Pipfile is easy to recreate, as demonstrated below, because most or all requirements are already defined in your setup.py. And the .env file probably contains private information, which shall not be shared.

Keeping these files out of the repository will prevent all the problems which may happen in CI builds when pipenv is used in situations where it is not appropriate.

pipenv as developer's private toolbox

pipenv may simplify a developer's work as a virtualenv management tool.

The trick is to learn how to quickly recreate your (private) pipenv-related files, e.g.:

$ cd <project_repository>
$ # your library will bring the dependencies (via install_requires in setup.py)
$ pipenv install -e .   
$ # add more dev tools you prefer
$ pipenv install --dev ipython pdbpp
$ # start hacking
$ pipenv shell
...

Use the .env file if you need a convenient method for setting up environment variables.

Remember: Keep pipenv usage out of your CI builds and your life will be simpler.

Trick: Use setup.py ability to declare extras dependencies

In your setup.py use the extras_require section:

from setuptools import setup

setup(
    name='mypackage',
    ....,
    install_requires=["jinja2", "simplejson"],
    extras_require={
        'tests': ['pytest', 'pyyaml'],
        'pg': ['psycopg2'],
    },
    ....
)

To install all dependencies declared for tests extra:

$ pipenv install -e .[tests]

Note that it will always include the install_requires dependencies.

This method does not allow splitting dependencies into default and dev sections, but this shall not be a real problem in the expected scenarios.


Most helpful comment

@Moritz90 Several of Python’s mailing lists would be good venues to hold this discussion.

pypa-dev is the most fitting venue for discussions centring on Python packaging and the ecosystem around it. I’d probably start there if I were to post a similar discussion.

python-ideas is a place to get ideas discussed, and has quite high visibility to the whole Python community. It would also be a good starting point if you want to push this to the PEP level (eventually you would, I think).

All 74 comments

This is very impressive, thanks a ton for compiling. Will definitely review in more detail in a bit

/cc @uranusjr @jtratner @ncoghlan

Some references to maya issues:

  • kennethreitz/maya#138 (Remove Pipfile.lock from repository)
  • kennethreitz/maya#139 (Skip pipenv run in tox.ini ...)
  • kennethreitz/maya#145 (fix pendulum>=1.0 in setup.py: version was in Pipfile but was missing in setup.py)
  • kennethreitz/maya#143 (PR showing how pipenv issue broke whole Travis run)
  • kennethreitz/maya#144 (PR Refactor pipenv usage according to semi-official best practices)

I love this too. Maybe we should add this to Pipenv’s documentation, or even the Python Packaging User Guide.

The corollary of the above advice appears to be "forego deterministic/reproducible CI builds", which strikes me as a very large anti-pattern.

What are you proposing as an alternative which would still allow for determinism?

@tsiq-oliverc Deterministic builds have their place at the moment an application is to be built.

Imagine the following attempt to perform really deterministic builds of a python library:

  • builds have to be based on Pipfile.lock
  • each execution context (a combination of python variant and OS variant) may have a different Pipfile.lock resulting from the library's abstract dependencies defined in Pipfile
  • the repository would have to provide separate Pipfile.lock instances for all of them. Note that building Pipfile.lock automatically during a CI build does not add any determinism

This is a lot of extra effort. And what you get is a library which will be installed in a different context anyway (e.g. a week later a standard installation will pick up an upgraded dependency or two), and which gains nothing from the fact that you used a Pipfile.lock that is by then obsolete.

The conflict lies in the fact that a library must never define strict dependencies inside itself.

If you think there is another alternative to gain deterministic builds for a python library - describe it.

@vlcinsky - If a consumer of your library uses different versions of dependencies, etc. then that's out of your control. So I agree there's no feasible way for a library maintainer to manage that.

But the goal here is presumably much smaller scope. In particular, I'd see the goals for a library maintainer as the following (which are roughly equivalences):

  1. If you run your CI twice, you're guaranteed to get the same result (network issues notwithstanding!).
  2. You can locally recreate (and thus debug) behaviour you observe on CI, even if that means running Docker/etc. locally.
  3. You can confidently say "My library behaves as expected with dependency versions X, Y, Z" to your consumers.

If any of those three things don't hold, it strikes me as antithetical to quality control.

So yes, I'd say that if you guarantee to support Python variants A, B and C to your consumer, and they behave differently enough that one lockfile (etc.) doesn't cut it, then you should have three lockfiles (or whatever).

I haven't used Pipenv enough to know how easy that would be in practice, though.

I'm currently considering adding Pipfiles to a few library projects for the CI system as well.

I absolutely need the dependency locking (+hashing) for complying with company-wide security guidelines and I currently don't need to test with different Python versions, since there's only one that's officially supported. And the fact that pipenv simplifies setting up a local development environment, including the virtualenv, is a nice side-effect.

And what you get is a library which will be installed in a different context anyway (e.g. a week later a standard installation will pick up an upgraded dependency or two), and which gains nothing from the fact that you used a Pipfile.lock that is by then obsolete.

This is not universally true. In the world of enterprise software, you still have very specific environments that are officially supported and a security issue in a dependency results in your product being updated rather than the customer updating the dependency themselves.

(Yes, I'm talking about a library, not an application here...)

@Moritz90 your scenario is for a python library in an enterprise environment, and there pipenv may help because it is a much more deterministic environment.

My description is aimed at general python libraries such as flask, requests, maya etc., where the context is much more variable. Trying to fix a couple of things in maya I got frustrated learning that in many cases the usage of pipenv introduced real problems (typically hiding problems which would normally be detected) while not providing much or any added value.

Getting deterministic builds is a good thing, but it incurs costs. And if done wrong, you may pay extra for a lower quality result - and this is what I wanted to prevent.

I’d argue this is one of the instances where we don’t want the builds to be absolutely deterministic. If you don’t pin your dependencies with ==, you’re committing to maintain support for multiple versions by default, and should design the library that way. A dependency upgrade breaking the build on CI is actually a good thing because it exposes a bug in the library. Completely deterministic dependencies (as managed by Pipenv) would mask that. It would still be beneficial to be able to be deterministic when you want it, but that is generally not the best default.

@uranusjr - Sure. I agree that if the desire is "non-deterministic builds", then the advice up top may well make sense. In fact, it's almost a logical equivalence, and could be stated much more succinctly: "If you don't want deterministic builds, then don't use a tool (pipenv) whose purpose is to ensure deterministic builds" 😄.

But that's certainly not a desirable goal in general.

@tsiq-oliverc nice scope definition - it supports focused discussion. I would add one more requirement: the CI determinism shall not hide possible issues within the tested library.

If we use Pipfile.lock, create a virtualenv based on it and run the library's CI test, we have already done part of what the library is supposed to do itself - installing the proper dependencies. If the library is somehow broken in this regard, the preinstalled environment would hide the problem.

To me it seems more important to detect issues within a library than to run CI in a deterministic way. If there is a way to do both (e.g. running the tests behind a private pypi index, which could also support determinism) I have no problem, but if there is a conflict, I have my priorities.

Do not take me wrong: there is no desire to run non-deterministic builds; my desire is to run CI builds which will detect as many issues as possible.

@vlcinsky Sure, I just wanted to share my experience to make sure that the updated documentation reflects it as well. The current documentation does a great job at explaining the tradeoffs:

For libraries, define abstract dependencies via install_requires in setup.py. [...]
For applications, define dependencies and where to get them in the Pipfile and use this file to update the set of concrete dependencies in Pipfile.lock. [...]
Of course, Pipfile and pipenv are still useful for library developers, as they can be used to define a development or test environment.
And, of course, there are projects for which the distinction between library and application isn’t that clear. In that case, use install_requires alongside pipenv and Pipfile.

(Highlighted the part that applies in my case.)

I just want to make sure it stays that way. I think your original post contains too many blanket statements without a disclaimer that you're talking about an open-source project that's going to be published on PyPI.

@Moritz90 I completely agree. I was trying to highlight that focus but I can make it even more visible.

@Moritz90 I added introductory note reflecting your comment.

@vlcinsky - That makes sense. I understand that you don't explicitly want non-deterministic builds, but I think that it's unavoidably equivalent to what you do want (i.e. to catch issues when your upstream dependencies update).

Thinking out loud, what's the best way to resolve these two conflicting goals? One possibility is to have a two-phase CI process:

  1. The deterministic phase. Leverages a Pipfile.lock in your repo, so it's entirely reproducible.
  2. The non-deterministic phase. Runs pipenv update and then runs the tests, so that it pulls in the latest of all your dependencies (which is basically the same as the behaviour with no lockfile, I think?).
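
A rough sketch of such a two-phase job (assuming pytest as the test runner and a Pipfile.lock available for phase 1):

$ # phase 1: deterministic - reproduce the recorded lock file exactly
$ pipenv sync --dev
$ pipenv run pytest
$ # phase 2: non-deterministic - pull in the latest versions allowed by Pipfile
$ pipenv update --dev
$ pipenv run pytest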

@tsiq-oliverc To get deterministic builds, I would think of the following setup:

  • build pypi cache job: run every once in a while, producing some form of pypi index cache (as a directory of files or anything similar)
  • library testing job: using the pypi cache, but avoiding pipenv

Using pipenv to do the installation is similar to what installing the library itself would do, but it is definitely different, because it is different code doing the work.

build pypi cache job

$ git clone <repo_url> <project_dir>
$ cd <project_dir>
$ pip install pipenv
$ # clean pypi cache and make it ready to cache somehow - not described here
$ pipenv install -e .[test]
$ # if we need extra testing packages in pipenv
$ pipenv install <extra_test_packages>
$ # record current requirements expressed in `Pipfile.lock`
$ pipenv lock
$ # if needed, record the `Pipfile.lock` somewhere

Outputs of such job are:

  • Pipfile.lock as recorded dependencies (may help developers to reproduce the environment easily)
  • pre-populated local pypi cache

library testing job

there are phases:

  • configure the environment to force tox, pip etc. to use only our local pypi cache
  • run the CI tests (avoid using pipenv)
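
A possible sketch of that configuration (the index URL is purely illustrative; it could point to a local devpi instance or a directory served via --find-links):

$ # force pip (and therefore tox) to resolve packages only from the local cache/index
$ export PIP_INDEX_URL=http://localhost:3141/root/ci-cache/+simple/
$ tox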

what we get

  • library is tested in deterministic environment
  • the library is tested incl. its ability to install itself on its own
  • Pipfile.lock records the pypi packages which were used to install the library. It can be used to reproduce the environment at a developer's site.
  • adaptation to upgraded packages on (possibly external) pypi is simple (rerun the "build pypi cache" job) and is done in a controlled manner (the content of pypi is recorded incl. hashes)

Another advantage is that this setup does not require developers to maintain Pipfile or Pipfile.lock. Also, running the tests in different contexts is always the same (Pipfile.lock is always rebuilt in the given context).

What is still missing (and can be done)

The pypi cache is the part which needs some research. I guess a simple directory would be sufficient, and maybe pipenv is already able to help with that. Maybe issue #1731 is the missing piece.

As a package that does dependency resolution many of our own tests rely on deterministic builds — that is, taking known stuff and expecting a resolved graph. We use pytest-pypi for this.

Love the lively discussion on this topic. I think the nuance is important and you should always test against known dependencies as well as unpinned ones

you should always test against known dependencies as well as unpinned ones

I second this suggestion. It's a good idea to always have an explicit "known good state" for reproducible builds and to simplify debugging in case an update breaks something in addition to making sure that newer minor/bugfix versions work as well.

(In my very personal opinion, the ideal situation would be that the package manager installs the latest minor versions by default so that libraries can always specify the concrete dependency versions that they were tested with, but I realize that's a highly controversial opinion and requires everyone to follow semver.)

@Moritz90 @techalchemy @uranusjr @tsiq-oliverc

Here is my summary from previous discussion.

Particular problems and proposed solutions

Many execution contexts - who shall maintain Pipfile.lock file(s)?

Each supported OS and python interpreter contributes to the matrix of possible execution contexts.

E.g. Flask supports (at least judging by the CI setup visible in the repository):

  • OS Windows (python 2.7 and Python 3.6)
  • Linux (python 2.7, 3.4, 3.5, 3.6, nightly, pypy)
  • OSX (py - not sure if there are more versions)

It makes 9 different execution contexts which may differ.

Each execution context may have different Pipfile.lock.

Who shall maintain them?

Options are:

  • Let developers do that manually (NO WAY)
  • Maintain only one Pipfile.lock for the main development platform (which platform enjoys being ignored?)
  • Automate creation via CI (YES)

Proposal: Let CI generate the file via pipenv install -e . and do not include it in the repo; help developers to pick the proper Pipfile.lock as a result of automated builds.

Developers need a predictable environment

When fixing an issue which may be caused by changes of dependencies on pypi, a developer may need a simple means to reproduce the environment from the failing test.

Proposal:

  • for many packages, changes of pypi dependencies are so rare that they are not a real problem
  • to fix the environment on their own, a developer may generate the Pipfile.lock via pipenv install -e . followed by pipenv lock.
  • to replicate the environment from a failing test, the developer picks the Pipfile.lock from the failing test.
  • todo: show examples of how to apply Pipfile.lock in tox.ini (a possible sketch follows below).
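
One possible sketch of the tox.ini part (not tested here; it relies on pipenv installing into the already activated tox virtualenv, as noted later in this thread, and assumes pytest is among the dev dependencies in Pipfile.lock):

[testenv]
deps = pipenv
commands =
    pipenv sync --dev
    pipenv run pytest {posargs}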

CI must reveal broken setup.py

A library's setup.py may be broken (missing dependency in install_requires, missing version specifier etc.) and the CI test must not hide such a problem (by preinstalling omitted dependencies on its own).

Proposal:

  • trust pipenv install -e . to provide the same result as plain installation (there are currently some issues with that).
  • run a plain installation test (without pipenv) and possibly check that the resulting pip freeze output is a subset of what is installed by pipenv, see the sketch below.
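
A sketch of such a plain-installation check (file names are illustrative; pipenv-freeze.txt would come from the pipenv-based run):

$ python -m venv .venv-plain
$ .venv-plain/bin/pip install .
$ .venv-plain/bin/pip freeze > plain-freeze.txt
$ # lines present only in the plain install indicate a discrepancy between setup.py and Pipfile
$ comm -23 <(sort plain-freeze.txt) <(sort pipenv-freeze.txt)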

Updated pypi dependencies may break things, CI shall detect such cases

Some dependency update may break a library using it. CI shall detect a failure on such a problem.

Proposal:

  • at least one test must run against unpinned versions
  • if CI always generates a new Pipfile.lock, this is a non-issue (as we run in unpinned mode anyway)

CI modes for different library types

In all proposed modes I tried to avoid keeping pipenv files in the repository, saving developers from maintaining this really complex stuff (automation!!!).

In contrast to my original text, the 2nd and 3rd modes do use pipenv in CI scripts.

Mode: Run, Forrest, Run!

A simple package with a small number of dependencies which do not change often.

Simply run as before the pipenv era and keep things simple.

The rare cases when dependencies cause trouble are easy to fix and do not justify making CI more complex.

Mode: Generate and seal

Each time the CI test is run, generate a new Pipfile.lock which completely describes the environment used at that moment.

The Pipfile.lock shall become a CI artefact.

If things go wrong, a developer can pick the Pipfile.lock from the broken build, apply it locally and do the testing and fixing.

If someone wants to deploy, the Pipfile.lock from the last successful build can be used.
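
A sketch of such a "Generate and seal" CI step (the artifact directory is illustrative; pytest stands in for the real test command):

$ pip install pipenv
$ pipenv install -e .                   # resolve abstract dependencies from setup.py
$ pipenv install --dev pytest           # plus whatever test tools are needed
$ pipenv lock                           # seal what was actually used
$ cp Pipfile.lock "$CI_ARTIFACTS_DIR/"  # archive the lock file as a build artifact
$ pipenv run pytest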

Mode: Ice Age

When changing dependencies are a real problem, CI shall create a Pipfile.lock every once in a while and keep using it for a certain period (a month?).

This makes the CI setup more difficult, as there must be at least two different jobs (one generating the Pipfile.lock, the other one applying it and using it in the tests).

Warning: Pipfile.lock must also be updated whenever setup.py changes its dependencies.

Note that the Ice Age mode requires a Scrat-the-squirrel type of test which ignores the frozen status and checks against unpinned versions.

Closing remarks

As seen, the determinism and complexity grow mode by mode.

My proposal would be:

  • start simple ("Run, Forrest, Run"). You gain efficiency and speed.
  • if things get too complex because of changing dependencies, move on to "Generate and seal". You gain repeatability in the local environment.
  • if things are really bad, go to "Ice Age" mode. You gain (temporary) determinism.

All gains cost something.

If the goal here is to update the advice in the docs, then honestly it feels irresponsible to say something dramatically different to "Follow best practice (reproducible builds) by default, until you have no choice."

@vlcinsky Under the headline "Mode: Generate and seal", it might make sense to mention that the last successful Pipfile.lock should always be kept around, e.g. by declaring it as a Jenkins artifact. With that change, it would be fine to recommend that setup for most projects. Like @tsiq-oliverc, I wouldn't recommend the first mode, ever.

The more I think about it, the more I feel like this documentation will become a section on why using pipenv for CI builds is a great idea, even if you're developing a library.

@tsiq-oliverc the vast majority of general python packages are in the "Run, Forrest, Run" mode. I have helped a few of these packages with introducing tox and pytest, because I felt it would contribute to the given package's quality and because I had a quite clear idea of how it could be done well.

Now there is another great tool, and I wonder how to use pipenv properly in general python projects so that it contributes to their quality. I want to find one or two well-working recipes which are justified and easy to follow.

What would I say to Flask project?

  1. Follow best practice (reproducible builds) by default, until you have no choice?
  2. Add 9 Pipfile.lock files and set up a policy for updating them?
  3. Refactor the CI scripts for Travis and Appveyor to work in a two-phase manner following the Ice Age mode?
  4. Modify the CI scripts for Travis and Appveyor to generate a Pipfile.lock artefact for cases when someone needs to reproduce a failing test on their own computer?
  5. no comments, apart from "Thanks a lot for Flask."

The goal is to find a functional working style. If it ends up in the docs, nice; if not, no problem.

@vlcinsky I'd say (1) and (4) should be the recommendation for such projects. While without a pre-existing Pipfile.lock you won't know the versions used in the build in advance (which is fine outside corporate environments), you'll still get a reproducible result if you generate and archive the lock file during the build.

Edit: The tl;dr version of my recommendation would be:

  • Always make sure your builds are reproducible, regardless of whether you're developing a library or an application. pipenv can help you achieve this goal.
  • If you're developing an application, commit Pipfile.lock to your repository and use it for deployment. (This is already covered by the existing documentation.)
  • If you're developing an open-source library, generate Pipfile.lock on-the-fly in your CI build and archive it for later.
  • If you're developing a library and working in a restrictive corporate environment, maintain the appropriate number of lock files and use those in your CI builds.

(Of course, the actual documentation should have a bit more detail and examples.)

@Moritz90 I modified the "Generate and Seal" as you proposed.

Re (1): easy to say, impossible to execute without being more specific.

Re (4): yes, I also think that "Generate and Seal" is the most feasible mode. But in the case of Flask I would not dare (at least not at the moment).

Re a pre-existing Pipfile.lock in an enterprise environment: it has to be created somehow, either (semi)manually or automatically. I guess in a corporate environment you do not install directly from the public pypi but use some private one that provides only approved packages (devpi-server provides a great service for that - multiple indexes, controlled volatility of published packages, approvals for external packages etc.). If the process of building Pipfile.lock runs in such an environment, it can only use what is approved, so if a new version is to appear there, someone has to stand up and get it approved. The following CI build will test that it does not break things. And with pipenv check, testing for security issues may also be automated.

I guess such a workflow would be more secure compared to someone creating it (semi)manually. But my knowledge of enterprise environments is very limited.

Hello pipenv team. I share a lot of what is said in this text; it helps any developer better understand the limitations of Pipfile/pipenv when developing a library. I would like to see this text, or part of it, integrated into the official pipenv documentation.

I do have the following amendment I would like to discuss:

For our internal python packages, fully reusable, published on our internal pypi, etc., and even for my own python packages (ex: cfgtree, txrwlock, pipenv-to-requirements), I use a package that some may already know or even use, which abstracts these details and makes the life of a python developer easier: PBR.
PBR basically reads the requirements.txt found at the root folder of a distribution package and injects it into the install_requires of the setup.py. The developer simply needs to maintain a requirements.txt with loose dependency declarations. Until support for Pipfile is officially integrated into PBR, I have to use pipenv-to-requirements, which automatically generates requirements.txt from the Pipfile so that they are both synchronized and both committed to source, and PBR does the injection correctly after the distribution package has been built. I think one could use pipenv to generate this requirements.txt.

I am working on Pipfile support for PBR, so that it will be able to read the Pipfile (and not the lock file) and inject it into install_requires like it does with requirements.txt.

I do not know if other similar packages exist, because it also does other things people might not want (versioning from git history, auto-generation of AUTHORS and ChangeLog).

But in the end, I really feel it is so much easier to write, maintain and handle the versioning of a Python library this way that I would be sad not to share this experience. I am promoting it as the "recommended" way of writing modern python libraries in my company.

I do reckon that it is like "cheating" on all the difficulties around libraries and pipenv, but in the end the work is done and developers are happy to use it so far. Part of the python training I am giving to new python developers in my company involves first writing a python library maintaining install_requires manually, and then switching to PBR to see how it becomes easier (and frankly I am a fan of the semantic commit feature of pbr to automatically create the right semver version tag).

Part of the reason to declare a library's dependencies using a dedicated file is also to be able to use tools such as readthedocs or pyup (even if pyup makes more sense when linked to an application).

I do not necessarily want to promote this method as the "standard" way of doing python packages - it is actually the "OpenStack" way - but I would like to share my experience, and if others have similar or contradictory experiences, I'll be happy to hear them and update my point of view.

Team, what do you think of a kind of "community" section in the documentation? So that users like me can share their experience of how they use pipenv, without necessarily the full endorsement of the pipenv team?

PS: I can move this to a dedicated issue if you do not want to pollute this thread.

@vlcinsky (1) is very easy to execute - put your lockfile in your repo.

I think what you instead mean is: it's impossible to give specific advice once this basic strategy is no longer sufficient. That's certainly true, but that's because the specific problem probably differs on a case-by-case basis.

Or to put it another way, the solution depends on what additional guarantees you want your CI workflow to provide.

@gsemet you know what? All my python packages created in last two years are based on pbr - it is really great. And I follow your attempts to support Pipfile in pbr whenever I can (some thumbs up, votes etc).

In case of this issue (searching for pipenv patterns and antipatterns for general python libraries) I intentionally omitted pbr for two reasons:

  • it would make the conceptual discussion more complex
  • some people do not like pbr for other reasons (you mentioned them) and it would probably sidetrack the discussion

On the other hand, I am really looking forward to a recipe of yours for pbr-lovers. I will read it.

@tsiq-oliverc you hit the nail: put your lockfile in your repo

This is exactly the problem which motivated me to start this issue. If you reread the start of this issue, you will find descriptions of a few cases where adding Pipfile.lock can break your CI tests (either breaking the build run, or hiding issues which would otherwise be detected, or installing wrong dependencies for the given context...).

If you show me a repo where this is done properly (a general python library), I would be happy. Or I would demonstrate what risks there are or what things are unfinished.

Cool ! I also maintain this cookiecutter :)

@vlcinsky Right, so let's enumerate the specific problems and find solutions for them 😄 (I don't know of any high-quality library that uses Pipenv, but that's mainly because I haven't looked.)

As best as I can tell, these are the specific symptoms in your original post:

  • Hiding broken setup.py dependencies. This sounds like a non-issue - pipenv install -e ., right?
  • Dependencies are likely to be invalid for different python versions or in another OS. I can see that this could be an issue, but could you provide a concrete example of where this has mattered in practice? (in the context of a lockfile)
  • Developers are forced to update ... when other libraries are updated and may be usable within the library. They're not forced to do that. They do that if they want to provide a guarantee that their library works against version n+1 rather than version n of their dependency. But note that I already proposed an alternative which provides the best of both worlds.
  • Competing with Tox. I don't really know anything about Tox. But yes, using two tools simultaneously to manage your dependencies sounds like a recipe for disaster. I'd say use whichever one is superior for that particular task.
  • Pipenv fails. This sounds like another non-issue - you can just pin the version of Pipenv (that's my current solution, just like I pin my Docker image, my version of Pip, etc.)

@tsiq-oliverc I have to say, your comments inspired me, and I know they contributed to a higher level of reproducibility in the proposed solution.

The following is related to your proposal to put the lockfile (Pipfile.lock) into the repo to ensure repeatability:

re Hiding broken setup.py defined dependencies: pipenv install -e . follows what I propose, but note that this is not a usage of Pipfile.lock, it is a method to (re)create it. If someone keeps Pipfile.lock and uses it to create the virtualenv before installing the package, the problem is present.

re Dependencies are likely to be invalid for different python versions or on another OS: examples are many. doit installed for Python 2.7 must be an older version, as the newer one dropped support for Python 2.x. The watchdog dependency requires platform-dependent libraries: inotify on Linux, something else on Windows, something else on OSX. My former client used to say "This will never happen", and in 50% of situations it happened within 2 weeks. This is not the best practice for CI scripts.

re Developers are forced to update ...: imagine an Open Source library with 15 contributors. It is so easy for a newcomer or a tired core developer to forget to regenerate Pipfile.lock. E.g. in the maya package I was asked to regenerate the Pipfile.lock when a new dependency was added to setup.py. Was that necessary? Did I update it properly? Did I update it for all supported execution contexts? The answers are no, not sure, no. Anyway, thanks for your proposal (it inspired the solution described next to your comment).

re Competing with tox: tox allows the creation of multiple virtualenvs and automates running tests within them. A typical tox.ini defines different virtualenvs for python 2.7, 3.4, 3.5, 3.6 and any others you need, and allows installing the package into them and running the test suite. It is an established power tool of serious testers. pipenv is not the tool for this purpose, but may interfere with installing the things needed. In a way I followed your advice and proposed to use the superior tool (tox) over pipenv where possible.

re Pipenv fails: this is really unfortunate. I had a CI test (tox based) which ran well on localhost, but when run via Travis it failed due to a pipenv issue. If I want to use it now, pinning does not help until a fix is released. But this is how it goes - I will wait.

Note that some parts of my original post will have to be updated, as it seems using pipenv in CI scripts has its justified place ("sealing" the virtualenv configuration for possible later use).

@tsiq-oliverc While I initially liked your suggestion of testing against both the "known good" and the latest versions, I find it harder and harder to justify the effort the more I think about it. I think you should decide to do either one or the other, not both.

The only thing you gain is that you'll immediately know whether a failure was caused by a dependency update or a code change. But you can achieve the same by simply making separate commits (when manually updating locked dependencies) or trying to reproduce the bug with the latest lock file produced by a successful build (when always using the latest versions). And in restricted environments, you cannot "just update" anyway...

@vlcinsky While I agree with your general point about differences between environments, the "one lock file per configuration" argument sounds like a straw man to me. In practice, you will be able to share the lock files between at least some of the environments.

One remaining open question that nobody has answered yet is how to deal with the case where you both need to test in different environments and lock your dependencies. I have to admit that I don't know anything about tox other than that it exists, but it seems like there's a need for some kind of glue between tox and pipenv that solves this problem somehow.

@Moritz90

Meeting the strawman

Regarding too many variants of Pipfile.lock serving as straw man (to keep others off my field):

Flask

I took the flask project (considering it very mature) and ran the tox tests:

Here you see the list of variants tested (just locally on Linux; multiply it by 3, as Windows and OSX will run the same set of tests but may result in different environments).

There are 16 different test runs on one OS; 5 of them failed as I do not have those interpreters installed (which is fine), one is dealing with building the docs (it requires an importable library) and another one with coverage (which also requires an importable library):

  coverage-report: commands succeeded
  docs-html: commands succeeded
  py27-devel: commands succeeded
  py27-lowest: commands succeeded
  py27-simplejson: commands succeeded
  py27: commands succeeded
  py35: commands succeeded
  py36-devel: commands succeeded
  py36-lowest: commands succeeded
  py36-simplejson: commands succeeded
  py36: commands succeeded
ERROR:   py34: InterpreterNotFound: python3.4
ERROR:   pypy-devel: InterpreterNotFound: pypy
ERROR:   pypy-lowest: InterpreterNotFound: pypy
ERROR:   pypy-simplejson: InterpreterNotFound: pypy
ERROR:   pypy: InterpreterNotFound: pypy

For each of the created virtualenvs I have created a requirements.txt file via pip freeze > {venv_name}.txt
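
Roughly like this (assuming the default .tox layout; filtering out non-virtualenv directories such as .tox/log is omitted for brevity):

$ for venv in .tox/*/; do
>     "${venv}bin/pip" freeze > "$(basename "$venv").txt"
> done
$ sha256sum *.txt | sort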

Then I calculated hashes for the files and sorted them according to the hash values, so that all identical ones are grouped together. Here comes the strawman:

b231a4cc8f30e3fd1ca0bfb0397c4918f5ab5ec3e56575c15920809705eb815e  py35.txt
b231a4cc8f30e3fd1ca0bfb0397c4918f5ab5ec3e56575c15920809705eb815e  py36.txt
cdf69aa2a87ffd0291ea65265a7714cc8c417805d613701af7b22c8ff2b5c0e4  py27-devel.txt
dfe27df6451f10a825f4a82dfe5bd58bd91c7e515240e1b102ffe46b4c358cdf  py36-simplejson.txt
e48cd24ea944fc9d8472d989ef0094bf42eb55cc28d7b59ee00ddcbee66ea69f  py36-lowest.txt
f8c745d16a20390873d146ccb50cf5689deb01aad6d157b77be203b407e6195d  py36-devel.txt
053e107ac856bc8845a1c8095aff6737dfb5d7718b081432f7a67f2125dc87ef  docs-html.txt
45b90aa0885182b883b16cb61091f754b2d889036c94eae0f49953aa6435ece5  py27-simplejson.txt
48bd0f6e66a6374a56b9c306e1c14217d224f9d42490328076993ebf490d61b5  coverage-report.txt
564580dad87c793c207a7cc6692554133e21a65fd4dd6fc964e5f819f9ab249c  py27.txt
8b8ff4633af0897652630903ba7155feee543a823e09ced63a14959b653a7340  py27-lowest.txt

Scary, isn't it? Of all the tests, only two share the same frozen dependencies.

This is the reality of a general python library with a good test suite. You will now probably admit this is something quite different from a python library tested in an enterprise environment.

Jinja2

Checking jinja2, which seems to be a much simpler beast:

  coverage-report: commands succeeded
  py26: commands succeeded
  py27: commands succeeded
  py33: commands succeeded
  py35: commands succeeded
  py36: commands succeeded
ERROR:   docs-html: commands failed
ERROR:   py34: InterpreterNotFound: python3.4
ERROR:   pypy: InterpreterNotFound: pypy

Seeing the checksums I am surprised that py27.txt and py26.txt differ:

047a880804009107999888a3198f319e5bbba2fa461b74cfdfdc81384499864e  py26.txt
047a880804009107999888a3198f319e5bbba2fa461b74cfdfdc81384499864e  py33.txt
047a880804009107999888a3198f319e5bbba2fa461b74cfdfdc81384499864e  py35.txt
047a880804009107999888a3198f319e5bbba2fa461b74cfdfdc81384499864e  py36.txt
48bd0f6e66a6374a56b9c306e1c14217d224f9d42490328076993ebf490d61b5  coverage-report.txt
743ad9e4b59d19e97284e9a5be7839e39e5c46f0b9653c39ef8ca89c7b0bc417  py27.txt

@vlcinsky That is indeed scary. I'm wondering whether Flask is a special case or whether that's actually the norm, but you've definitely proven me wrong.

I'm now hoping our Python library will not suffer from the same problem someday and that the differences will be more manageable there.

@Moritz90 Your internal library is serving a completely different audience, so you can afford to keep the execution context much narrower.

General python libraries are often flexible and configurable, e.g. Flask allows alternative json parsers to be installed and used, which is covered by a separate test run.

One can learn a lot about testing and tox from Flask's tox.ini

The lowest test variants take care to test against the oldest dependency versions.

The devel variants test against development versions of core dependencies.

I would say Flask is at the higher end of complexity and exhibits a careful test suite.

pyramid's tox.ini shows a similar number of environments (they aim at 100% code coverage too).

maya's tox.ini is very fresh (2 days old) and simple; even here there are 4 different environments, and py27 differs in frozen requirements from py35 and py36.

@Moritz90
Regarding glue between pipenv and tox

  • pipenv --man shows some instructions on how to use pipenv within tox.ini commands
  • tox-pipenv is attempting to provide some extra integration but it is confusing me at the moment.

The tox.ini file allows running arbitrary commands, so this includes pipenv.

pipenv has a great feature: when run in an already activated virtualenv (which is the case within a tox-based test), it installs into that virtualenv. This is really nice.

As we probably need the Pipfile.lock generated, some extra effort must be spent to get it and move it to a proper place (e.g. into .tox/py36/Pipfile.lock to prevent it being overwritten by the following test). This shall be possible, but some simplification would be welcome. Maybe some trick with an environment variable for the location of the Pipfile would make it even simpler.

@vlcinsky

  • setup.py - Still not sure I understand the problem. You run pipenv install -e . once so that setup.py is now tracked via your lockfile. And then run pipenv install whenever you add new packages to setup.py.
  • Developers forgetting to update the lockfile - pipenv install --deploy is designed to catch this. Run it in your CI!
  • Pipenv fails - agreed, if there are bugs in the tool, that sucks. But bugs generally get fixed. That's not a reason to throw away an entire philosophy 😞
  • Tox

    • If Tox is good for managing tests, that's great. If it's also great for managing packages and deterministic builds, that's even better.

    • But if that were true, there wouldn't be a reason for Pipenv to exist. So I can only assume there's some kind of limitation with Tox.

    • Similarly to above, the state of the world today (the lack of good interop) doesn't sound like a reason to reject the philosophy.

  • Multiple envs

    • It's clear that there are at least some special cases here, like Flask.

    • I don't have a better suggestion that multiple lockfiles (although maybe there's a future feature for Pipenv in this regard?)

    • But even in this case, I'm still not convinced that managing multiple lockfiles is an issue in practice. Worst case, it seems that you could create a simple script update-all-lockfiles.sh locally, and run pipenv install --deploy on your CI to catch errors.


@Moritz90 - Agreed, the "two phase" approach may be overkill in most cases. In particular, if you're making deliberate/intentional "manual" updates to your lockfile, it's completely unnecessary.


More generally, it would be good to ensure this "proposal" focuses on the things that are actually hard problems (in my view, that's (A) serving multiple envs, (B) wanting to catch changes in upstream dependencies). It shouldn't be based on transient things (bugs in Pipenv) or potential misunderstandings of how the tool is intended to be used.

But even for those "hard" problems, the framing should be like "in some complex edge cases, you may find that a basic Pipenv workflow is insufficient, so here are some things to think about". IMO, it should not be framed as the default approach (because most people won't have those concerns).

The documentation example @vlcinsky provided would become simpler and less confusing if Pipenv/Pipfile allowed handling of lib-dependencies, app-dependencies and dev-dependencies. The docs could look something like this:

Use the lib-dependencies option if your package is a shared library. Example Pipfile:

[lib-dependencies]
some-lib = "*"
another-lib = "*"
yet-another-one = ">=1.0"

[dev-dependencies]
some-dev-tool = "1.1"

For shared libraries it's important to keep the version ranges under [lib-dependencies] as wide as possible, to prevent version conflicts on the consumer system.

If your package is an application (intended to be installed by pipenv on the target system) that requires exact dependency versions you should use the [app-dependencies] option. Example Pipfile:

[app-dependencies]
some-lib = "1.0.12"
another-lib = "1.*"
yet-another-one = "2.0"

[dev-dependencies]
some-dev-tool = "1.1"

/End doc example

Another approach could be a Pipfile.lib and a Pipfile.app.

I think something like this would remove the need for a chunk of anti-pattern sections and third-party tools to fill the gap.

Calling pipenv a packaging tool is misleading if one expects it to create python libraries or to be deeply involved in their creation.

I think this is a real problem, which leads to a lot of confusion, especially among people that are used to package managers in other programming languages (e.g. JS, Rust, Elm). It took me several months and occasional reading of GitHub issues until I realized that I was using Pipenv and setup.py the wrong way.

@feluxe

Your [lib-dependencies] or Pipfile.lib is what we have today in Pipfile (as abstract dependencies - being as wide as possible).

Your [app-dependencies] or Pipfile.app is what we have in Pipfile.lock (as specific dependencies).

pipenv and its files can be used in two different situations - developing a library or preparing an application deployment - but probably not for both at once. For this reason I do not see strong reasons for adding extra sections to the Pipfile. It is the developer's responsibility to know what purpose the Pipfile is going to serve.

I think this is a real problem, which leads to a lot of confusion, especially among people that are used to package managers in other programming languages (e.g. JS, Rust, Elm). It took me several months and occasional reading of GitHub issues until I realized that I was using Pipenv and setup.py the wrong way.

Agreed. The three-section solution is also a very interesting solution I’ve never considered, and it seems to be correct and (surprisingly!) simple.

Coming from a Python background myself, I’ve always felt Node’s package.json is doing it wrong (Rust is better because it has a compiler and linker, and can resolve this at a later stage). Treating app and lib dependencies the same way simply won’t work for a scripting language like Python, at least in an abstract sense—i.e. it might work for you, but a generic tool like Pipenv can’t do it because it needs to be generic.

While I do like the three-section solution in concept, it is still a rather incompatible change to the existing ecosystem. There are already setup.py, setup.cfg, and (potentially) pyproject.toml filling this space. If Pipenv (Pipfile, to be exact) wants to move into the space, it needs to consolidate with related projects, such as pip (library support should ideally be supported directly by it) and flit.

As I mentioned in other issues regarding lib/app dependency handling, this discussion needs to be escalated to pypa-dev (the mailing list) and/or the PEP process, so it can be better heard by other parties and relevant persons, before Pipenv (Pipfile) can move in any directions.

@vlcinsky

Your [lib-dependencies] or Pipfile.lib is what we have today in Pipfile (as abstract dependencies - being as wide as possible).

Sorry if this wasn't clear. My lib-dependencies are meant to be what people currently put into setup.py / install_requires. Maybe pypi-dependencies would be a better name for what I meant.

@uranusjr

There are already setup.py, setup.cfg, and (potentially) pyproject.toml filling this space.

Pipenv (the command line tool) could interface setup.py. Just the dependency section of setup.py would have to move to Pipfile. At least in my imagination :)

As I mentioned in other issues regarding lib/app dependency handling, this discussion needs to be escalated to pypa-dev (the mailing list) and/or the PEP process, so it can be better heard by other parties and relevant persons, before Pipenv (Pipfile) can move in any directions.

Ok, sorry for bothering ;) If I find some time I'll write something up for the mailing list.

Within the scope of this proposal, however, I would suggest it to focus on the currently possible best practices, instead of going into the rabbit hole of working out a new workflow for the whole Python packaging community. It’d be more productive to propose a best practice within the current constraints, and then start the discussion for improvements.

@uranusjr - I come from a "compiled" background, so I'm curious why this is the case?

Treating app and lib dependencies the same way simply won’t work for a scripting language like Python

@tsiq-oliverc Since the best practice for apps requires you to pin your dependencies, libraries would start to pin theirs as well if they used the same source of requirement files. This would lead to problems in dependency resolution.

Say my app has two dependencies A and B, both of them depend on C, but A pins v1, while B pins v2. Compiled languages allow the toolchain to detect this at compile time, and resolve it in many ways. Rust, for example, does this during linking time—The end executable would contain two copies of C (v1 and v2), with A and B linking to each of them. In C++ land this would be solved with dynamic libraries; the symbol lookup is done even later (at runtime), but the idea is the same—the compiler knows what you need (from the interface you use), and can act accordingly.

Scripting languages can’t do this because it doesn’t know what you really want to do until it actually reaches the call. Node works around this by always assuming the dependencies are incompatible (A and B always get their own C, even if the two copies are identical), but that leads to a new class of problems, and results in awkward hacks like peer dependencies that everyone (I hope?) agrees are terrible. Python probably don’t want to go there (it can’t, anyway, since that would likely break all existing Python installations).

Another way to work around this is to do something clever in the packaging tools that “unpins” the dependency version. Bundler (of Ruby) sort of does this, by recommending people to not include the lock file into the gem, so Bundler can use the unpinned versions in Gemfile, instead of pinned versions in Gemfile.lock. But people tend to ignore advices and do whatever they want, so you still get pinned versions everywhere.

I was probably a bit too strong to say that it simply won’t work. But at least all previous tries have failed, and many of those who tried are very smart people, much smarter than myself. I don’t think this can be done, personally, and I’d continue to think this way until I see the very brilliant proposal that actually does it.

@tsiq-oliverc Pieter Hintjens wrote somewhere about the concept of "Comments are welcome in the form of a pull request".

I like that because it moves the focus from philosophical advice to really tangible and practical things. And it also limits the number of comments, because a commenter often learns along the way that the idea is incomplete or somehow broken in real use.

I asked you for an example of a python library where pipenv is used properly (or at least used), and you did not provide any.

You comment on tox's qualities but admit you are not familiar with it, while still repeating something about best practices in the world of python package development.

You say Flask is possibly a special case. So I searched GitHub for python projects using the word "library", sorted them according to the number of forks (as it probably reflects how many people are doing some development with them), ignored all "curated list of something" entries and counted the number of environments for one OS (typically Linux):

The real number of environments to run tests in will mostly be 2 (+Windows) or 3 (+OSX) times higher.

tox is used in 2 projects out of 3 (I do not compare it to Travis or Appveyor, as they do another level of testing besides).

The number of environments to test in is rather high; Flask is definitely not the wildest one.

The number of environments to define fixed dependencies for is really not manageable manually.

Simply dropping Pipfile.lock into a repository is rather easy, but it brings no magic improvement (if it does, show me a real scenario where it will improve the situation).

Maybe you know the golden rule from the "compiled" world and feel that determinism (or repeatability) is a must for Python too. As you see, really many Python projects live without it rather well, so maybe the golden rule does not apply so strictly here.

I will be happy if we find a usage of pipenv for python libraries which improves the situation. And I want to prevent usage which would harm overall quality.

To reach that goal, my approach is to iterate over questions:

  • Shall I use the tool?
  • How?
  • Why, what value did I get?
  • Did it introduce some problems? (extra work, errors being hidden...)

@feluxe

Sorry if this wasn't clear. My lib-dependencies are meant to be what people currently put into setup.py / install_requires. Maybe pypi-dependencies would be a better name for what I meant.

See the pbr discussion in this issue. It is the effort to support library dependencies via Pipfile.

I think that one Pipfile shall not be used for two purposes (lib and app); these things shall be done separately. If you feel it is really needed, could you describe the purpose of a project using it? I usually try to keep library development and deployment projects separated, as they have quite different usage over time.

@vlcinsky I'm not really sure where you want to take this (I'm not sure what kind of PR you're asking for !), so I'm going to bow out of this conversation for now.

To restate the TL;DR of my position:

  1. Deterministic builds are highly desirable, commonplace elsewhere in the software industry, and easy to achieve with Pipenv.
  2. There are definitely some edge cases, and someone should push the state-of-the-art forward there (via workarounds or via better tooling).
  3. It would be irresponsible for the Pipenv docs to blanket recommend against (1) simply because (2) affects a small subset of cases.

@uranusjr Got it. Though I don't think there's anything language-specific here, it's simply that different communities have settled on different heuristics for dealing with a problem with no generic solution - if you have version conflicts, you have a problem.

Maven/Java (for example) forces you to think about it at build time. The NPM way means you have runtime issues if the mismatched versions cross an interface. Runtime resolution (e.g. Python, dynamic libraries) means that a dependent may crash/etc. if the dependency version is not what it expected.

@vlcinsky

See the pbr discussion in this issue. It is an effort to support library dependencies via Pipfile.

pbr seems nice and all, but it falls under the category that I was trying to address with this:

I think something like this would omit the need for a chunk of anti-pattern sections and third-party tools to fill the gap.

I think such tools shouldn't be necessary in the first place.

If you feel it is really needed, could you describe the purpose of a project using it? I usually try to keep library development and deployment projects separate, as they have quite different usage over time.

When it comes to pypi packages, I ended up using Pipenv for handling dev-dependencies, Pipfile to describe dev-dependencies, setup.py to describe lib dependencies with install_requires, and setuptools in setup.py to publish my package by running pipenv run python setup.py bdist_wheel upload. This is what I consider complicated.

In other modern languages I have to learn one command line tool (package manager) plus one dependency file format. Documentation is in one place and easier to follow, and a newcomer will get all this sorted out in a couple of hours. It's a matter of npm init, npm install foo --dev, npm publish. Pipenv/Pipfile can do most of it already; if it could do all of it, issues such as this one would not exist.
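For concreteness, here is a minimal sketch of the split described above (the project name and dependencies are illustrative): the abstract library dependencies live in setup.py, while Pipenv only manages the development environment around it.

```python
# setup.py -- abstract (unpinned) library dependencies live here, not in a lock file
from setuptools import find_packages, setup

setup(
    name="mylib",                  # illustrative project name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "requests>=2.18",          # flexible specifier, no pinning
    ],
)
```

The development environment is then built with pipenv install -e . plus pipenv install --dev pytest (or similar), and the wheel with pipenv run python setup.py bdist_wheel, which indeed means juggling two files and two tools, as described above.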

I reiterate my call for a kind of "community" section/wiki for this discussion. There are several "patterns" that are legit, and some of us might want to share their "way of doing python libraries": some, like me, use pbr, and others might have a very good pattern. But I am not sure a page inside the pipenv documentation is a good idea.

PS: to prepare the migration to the new PyPI, you should use twine and not python setup.py upload. Using "upload" should be considered an antipattern.

Maybe pipenv could grow a "publish" command?

@feluxe You might want to take a look at poetry. I just stumbled across it and it seems that it's what you are looking for.

It does what pipenv does and more, and it seems they do it better, especially regarding dependency management (at least that's what they claim). It handles dependency management, packaging and publishing, all in a single tool: poetry.

I wonder if pipenv and poetry could gather effort to finally give Python a true package manager.

I want to reiterate myself again before this discussion goes too far. Pipenv cannot simply grow a publish command, or do anything that tries to take over the packaging duty. This would only fragment the ecosystem more because not everyone does it this way, and with app and lib dependencies being theoretically different, you cannot tell someone to merge them back together once the distinction is made in their workflow.

It may seem almost everyone is onboard with this merge, but the truth is there are a lot more people not joining this discussion because things work for them and they are doing something else. I’ve repeatedly said it: Discussion about improving the design of toolchains and file formats should happen somewhere higher in the Python packaging hierarchy, so it receives more exposure to people designing more fundamental things that Pipenv relies on. Please take the discussion there. There is no use suggesting it here, because Pipenv is not at the position to change it.

I’ve repeatedly said it: Discussion about improving the design of toolchains and file formats should happen somewhere higher in the Python packaging hierarchy, so it receives more exposure to people designing more fundamental things that Pipenv relies on.

I agree that the discussion on this bug spirals out of control now that packaging and publishing came up (this bug is only about dependency management!), but could you please point us at the right place to have this discussion? People are having it here because pipenv is seen as a much-needed step in the right direction, not because they want to impose additional responsibilities upon the pipenv maintainers.

Edit: Sorry, I must have missed the post in which you did exactly that when reading the new comments the first time.

Within the scope of this proposal, however, I would suggest it to focus on the currently possible best practices, instead of going into the rabbit hole of working out a new workflow for the whole Python packaging community. It’d be more productive to propose a best practice within the current constraints, and then start the discussion for improvements.

I very much agree with this. We should first figure out what the best possible workflow for library maintainers is right now before we come up with big plans. So let's focus on that again, as we did at the start of this thread. I don't think we've reached a conclusion yet.

Back to topic: Quoting @uranusjr's post about why dependencies should be defined in a different file for libraries:

Another way to work around this is to do something clever in the packaging tools that “unpins” the dependency version. Bundler (of Ruby) sort of does this by recommending that people not include the lock file in the gem, so Bundler can use the unpinned versions in Gemfile instead of the pinned versions in Gemfile.lock. But people tend to ignore advice and do whatever they want, so you still get pinned versions everywhere.

I was probably a bit too strong to say that it simply won’t work. But at least all previous attempts have failed

I still don't see why the official recommendation for libraries for now cannot be to use pipenv for their CI builds, but keep the Pipfile.lock out of source control. Since, as a few people pointed out, pipenv doesn't currently have anything to do with the packaging process, we shouldn't run into the problem you outlined above.

And I also don't see why this is an argument against defining your abstract dependencies in the same file that applications use to define their abstract dependencies. It's okay if pipenv doesn't want to implement an elaborate solution for integrating the Pipfile with setup.py, but I don't see why that's a bad idea in general.

@vlcinsky

I think that one Pipfile shall not be used for two purposes (lib and app); these things shall be done separately.

See my above post. Could you please elaborate on why you think that? I simply cannot see any downside in principle. Right now, it might be a bad idea to include a Pipfile, since you'll then have to define the dependencies in the same way in two different files, but I haven't yet seen any argument that explains why it would be a bad idea to use the Pipfile for dependency declarations in general.

Note that I've already agreed that Pipfile.lock should not be in source control for libraries unless you're in the same situation I'm in.

Edit: Also, if it turns out that pipenv itself actually needs to know about the difference, you might just introduce something like cargo's crate-type field before you start introducing app-dependencies and lib-dependencies - that sounds overly complicated.

@Moritz90 Several of Python’s mailing lists would be good venues to hold this discussion.

pypa-dev is the most definite for discussions centring Python packaging, and the ecosystem around it. I’d probably start here if I were to post a similar discussion.

python-ideas is a place to get ideas discussed, and has quite high visibility to the whole Python community. It would also be a good starting point if you want to push this to the PEP level (eventually you would, I think).

@tsiq-oliverc

By PR I mean: show an example proving your concept viable.

So pick some existing library, fork it, apply your (1) - you say it shall be easy with pipenv - and show me. I tried pretty hard and ran into difficulties.

If your (2) means "someone else has to do the work", your PR will not exist.

In (3) you talk about a "small subset of cases" without giving any real number. Are all the top libraries I described, with regard to their number of virtualenvs, considered a "small subset"?

To conclude this discussion, I created a short summary of what was found during the discussion.

Focus: pipenv (anti)patterns for python libraries and applications

I changed the focus a bit: it talks not only about (general) python libraries but also about applications, as it was rather cheap to include them and it demonstrates the differences well.

I intentionally excluded anything proposing changes in existing tooling such as pipenv, tox etc.

What is pipenv and what it is not

  • it is a deployment tool, allowing one to define and apply concrete dependencies by means of Pipfile.lock.
  • it is a virtualenv management tool.
  • it is NOT a packaging tool in the sense of generating a python package.

Libraries and applications

The (python software) product is either ready to be used in another product (thus a library) or it is a final application ready to be run.

Personally I think even "enterprise libraries" fall into the library category (the same rules apply, only the number of execution contexts is smaller).

Types of software products

  • library: to be used in another product (library or an application)
  • application: to be deployed and run

Installation methods:

  • library: pipenv install <package> thus "get the package into play (resolving versions for other libraries around)"
  • application: pipenv sync thus "apply concrete dependencies"

Abstract and concrete dependencies

Software product dependencies:

  • abstract dependencies: must name the used libraries, may restrict versions or usage, but must remain flexible enough (shall not pin versions)
  • concrete dependencies: must pin versions, ideally with hashes of the used libraries

pipenv artefacts (see the sketch below):

  • Pipfile: abstract dependencies
  • Pipfile.lock: concrete (locked) dependencies
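A tiny sketch of the difference (package names and versions are illustrative; it assumes the third-party packaging library is installed): one abstract requirement admits many concrete versions, while a lock file pins exactly one of them.

```python
# One abstract requirement vs. the many concrete versions that satisfy it
# (assumes the third-party 'packaging' library: pip install packaging).
from packaging.requirements import Requirement

abstract = Requirement("requests>=2.18")          # what a library declares
for candidate in ("2.17.0", "2.18.4", "2.19.1"):  # possible concrete versions
    print(candidate, abstract.specifier.contains(candidate))
# A Pipfile.lock would record a single pinned entry such as requests==2.18.4,
# ideally together with its hashes.
```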

Execution contexts

Typical number of different execution contexts

  • library:

    • python virtualenvs on one OS: 3 to 9 (tornado using 30)

    • number of OS: 1 to 3 (Linux, OSX, Windows)

    • total number: 3 to 18

  • application:

    • python virtualenvs on one OS: 1

    • number of OS: 1

    • total number: 1 (or very few)

CI goals, priorities and determinism

CI goal

  • library:

    • the code, incl. its abstract dependencies, allows installation and expected function within all expected variants of execution contexts.

    • when the (private/public) pypi gets an update of a dependency library, fail if it affects the tested library's installation or function.

  • application:

    • when installed (using concrete/pinned dependencies), all expected functionality is provided within one, pre-defined execution context

Particular CI goals regarding functionality:

  • library:

    • abstract dependencies declared by the library are complete and include all necessary restrictions (only where needed): library installs itself properly

    • all expected use cases function properly

  • application:

    • concrete dependencies are complete and all are pinned, best incl. hashes: application installs properly

    • all expected use cases function properly

Different CI test modes

  • Mode: "Run, Forrest, Run"

    • The vast majority of python libraries nowadays are tested this way.

    • use tox or similar testing SW

    • No usage of pipenv and concrete dependencies (can be fine for libraries)

  • Mode: "Generate and seal"

    • No Pipfile in repository

    • Pipfile.lock created by pipenv install -e .

    • Pipfile.lock documents (seals) the environment and allows later reproduction of the virtualenv for analysing issues (see the sketch after this list).

  • Mode: "Ice Age"

    • two-phase test

      • lock regeneration run: when abstract dependencies (defined within setup.py install_requires) change or a dependency package on pypi is updated, regenerate Pipfile.lock via pipenv install -e .

      • function test run: run when the library code changes; it runs within the virtualenv created by pipenv sync from Pipfile.lock
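As a rough illustration of the "Generate and seal" mode, here is a hypothetical CI helper script; it assumes pipenv is on PATH and that test dependencies are declared as a "tests" extra in setup.py (see also the Common issues section below).

```python
# ci_generate_and_seal.py -- hypothetical helper for the "Generate and seal" CI mode.
# Nothing pipenv-related is committed; Pipfile and Pipfile.lock are generated from
# setup.py during the build, and the lock file is archived as a build artifact.
import subprocess

def generate_and_seal():
    # resolve the abstract dependencies (plus the "tests" extra) into a fresh virtualenv
    subprocess.run(["pipenv", "install", "-e", ".[tests]"], check=True)
    # run the test suite inside that environment
    subprocess.run(["pipenv", "run", "pytest"], check=True)
    # Pipfile.lock now documents ("seals") the exact environment the tests ran in;
    # the CI job should archive it so the virtualenv can be reproduced later.

if __name__ == "__main__":
    generate_and_seal()
```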

How and by whom Pipfile.lock can be created

  • manually by a developer (may work for applications)

  • automatically by a CI build (and after passing all tests, it is declared a verified artefact)

Priority of determinism versus flexibility

  • library: flexibility (run wherever possible)
  • application: determinism (run in exactly the same way within the selected execution context)

What can affect determinism of the installed product:

  • public pypi (low determinism, packages are updated at any time)
  • private pypi (higher determinism, package updates may be controlled)
  • abstract requirements within libraries (shall not be used for determinism)
  • concrete requirements (Pipfile.lock): total determinism

Miscellaneous

Some use cases for Pipfile.lock:

  • (antipattern) define library abstract dependencies (because it must be abstract)
  • (antipattern) set up virtualenv for tested library (may hide broken library abstract dependencies)
  • document exact virtualenv ("seal"), where a test was run (thus allow developer to recreate it later for broken tests and experiment in it)
  • set up virtualenv for tested application
  • deploy application into production

Other hints

  • the pbr library allows definition of abstract library dependencies via requirements.txt. An update to read Pipfile is on the way.

  • the poetry package tries something similar to pyenv

Common issues

  • "drop lockfile into repo" and you get deterministic builds:

    • Shall work for applications.
    • Will not work for libraries, as there are really numerous execution contexts and each has good potential to result in a different Pipfile.lock. Seriously: flask, with its 11 different virtualenvs (on one OS), produces 10 different sets of locked dependencies. Who is going to create and commit them?
    • Note, that with "Generate and seal" CI mode, you can still get Pipfile.lock (but generated by CI script) allowing to regenerate the virtualenv elsewhere.
  • Pipfile.lock in library repository

    • if used to create the virtualenv, it can hide a broken definition of library dependencies within setup.py.

  • Pipfile in library repository

    • if it repeats abstract dependencies (which are to be defined within setup.py), it may hide a broken setup.py dependency declaration.

    • recommended: generate the Pipfile via pipenv install -e ., or pipenv install -e .[tests] if you also need test dependencies and they are declared as a "tests" extra in setup.py (see the sketch after this list)

  • adding pipenv install <something> into CI scripts

    • it does not improve determinism much on its own

    • see "Generate and seal" CI mode (then all installation into virtualenv must go via pipenv).

Conclusions

Python libraries (especially general ones) exhibit an unexpectedly high number of execution contexts. The reason is that with libraries, the goal is proven flexibility under different conditions. This flexibility seems more important than deterministic builds. For people coming from the "compiled" world this might feel like a very bad antipattern. The fact is, most (possibly all) python libraries do not provide deterministic builds (if you are aware of some, let me know) and Python is still doing very well. One reason Python applications are still alive might be that python as a scripting language differs from the compiled world. The other reason could be that determinism can (and shall) be resolved one step later, as soon as an application (built from a set of libraries) has to satisfy the (natural and justified) requirement for determinism.

For applications the situation is just the opposite, and here determinism is really easy to reach with a tool such as pipenv.

What to do next?

  • I will not have time to deal with this for next week or two.
  • I can imagine creating a series of blog entries somewhere (not sure where). If you know the place (ideally allowing some discussion), it would be a natural place to let the content be referenced, discussed and finally possibly (if it survives) put somewhere in stone :-).
  • I propose that @uranusjr take over control of this issue (close it, decide what to do next, redirect people elsewhere or whatever seems practical)

Thanks everyone for a very inspiring discussion - to me it feels like the message "I am totally lost in this topic" refactored three times - which naturally means we got better.

@vlcinsky poetry has nothing to do with pyenv. It's a lot like pipenv (but with a much better implementation regarding the management of libraries and applications, IMO), with the packaging and publishing parts added.

You have a pyproject.toml file that defines your project and its dependencies (abstract dependencies), and a pyproject.lock which describes the pinned dependencies, pinned for every python version and platform the pyproject.toml file has specified, so there is only one deterministic lock file and the problems pipenv is facing are avoided. Only when installing does poetry check which packages to install, by checking them against the environment.

And when it packages your library, it will use the abstract dependencies (and not the pinned ones) so you keep the flexibility when distributing your package (via PyPI for example).

The advantage of this is that it will use abstract dependencies for libraries and the lock file for applications. This is the best of both worlds.

@zface poetry not using pinned dependencies is literally defeating the entire purpose. Pipenv is _idempotent_ and this requires _reproduction_ of an environment. Please stop using this issue as a platform to try and sell everyone something whose first listed reason to use it over pipenv is that the author doesn't like the cli. At the end of the day, our software is deployed across hundreds of thousands of machines and actually acknowledges and uses the best practices around packaging. If you don't want an idempotent environment and you do want to blur the lines between development and packaging, please don't participate in this discussion, because we are not moving in that direction and it will not be productive.

Essentially we spend a lot of time and effort on resiliency that small projects which make lofty claims don’t have to spend as much effort on because people aren’t hitting edge cases. If you truly believe that another tool offers you the best of all worlds then I encourage you to use it— pipenv itself is not going to handle packaging for you in the near term, if ever.

@techalchemy I am not selling anything, really, I am merely directing towards ideas that could be used in pipenv.

And poetry does pin dependencies in the pyproject.lock, just like pipenv does in Pipfile.lock. So you get reproduction just like pipenv provides. If you have a lock file, it will be used to install pinned dependencies, and if I am not mistaken that's also what pipenv does.

The only time it uses abstract dependencies is when it packages the project for distribution (so basically for libraries) since in this case you do not want pinned dependencies.

@vlcinsky There are still a few points that need to be sorted out, corrected, or expanded on, but I am still very keen on this going into documentation form, Pipenv or otherwise. Would you be interested in sending in a pull request? I’d be more than happy to help flesh out the article.

Regarding poetry, I am not personally a fan as a whole, but it does do many correct things. It should probably not be mentioned in Pipenv docs because it violates a few best practices Pipenv devs want to push people towards, but it should be mentioned if the discussion is held in pypa-dev or similar, to provide a complete picture of how the packaging ecosystem currently is.

poetry could also use more attention and contribution. This would be the best for the community, including Pipenv. With viable choices, people can weigh their options instead of going into Pipenv head first and complaining about it not doing what they expect. Good competition between libraries can also spur forward technical improvements on the dependency resolution front, which Pipenv and poetry both do (and neither perfectly). We can learn a lot from each other.

@uranusjr Yes, I think a few things were clarified and deserve sharing with a wider audience. Your assistance is really welcome.

What about "pair documentation drafting"? I think that at this moment it would be most effective to work on it on a small scale of two people only.

Things to do (possibly with one or two iterations):

  • where exactly we could publish that
  • identify documentation items (articles, sections)
  • clarify scope and goal of each item
  • agree on outline
  • identify open issues
  • work them out
  • write the doc
  • publish (and hope it would be accepted)

If you feel like writing it on your own (based on what was discussed) and have me as a reviewer, I would not complain.

I will contact you by e-mail to agree on next actions.

@vlcinsky Also I’m available as @uranusjr on PySlackers (A Slack workspace) if you prefer realtime interaction. Pipenv has a channel there (#pipenv).

@uranusjr That's what I meant by gathering effort. Python desperately needs a good package manager like cargo. The Python ecosystem pales in comparison with other languages due to the lack of a standard way to do things. And pipenv will not help with that, I think.

What bothers me is that pipenv advertises itself as the officially recommended Python packaging tool while it's not a packaging tool, far from it, which is misleading for users. It's merely a dependency manager coupled with a virtualenv manager.

Also, you say that it was inspired by cargo, npm and yarn, which are packaging tools along with dependency managers, while pipenv is not.

And here is the flaw of pipenv: it just muddies the water, since people will still make the same mistakes as before with requirements.txt vs setup.py. Projects will still be badly packaged, with badly defined dependencies in their setup.py, because of that. That's what projects like cargo did right: they handle all the aspects of developing projects/applications to ensure consistency, while a project like pipenv does not.

And when you say:

which Pipenv and poetry both do (and neither perfectly)

What do you mean? From what I have seen, their dependency manager is much more resilient than the one provided by pipenv. The only downside is that they use the PyPI JSON API which sometimes does not have dependency information due to badly published packages.

Anyway, I think, like you said, that both projects can learn from each other.

And, one more thing, what's the future of pipenv if, ultimately, pip handles the Pipfile? Will it just be a virtualenv manager?

If the poetry dependency manager relies on the json api it’s not only sometimes wrong due to ‘badly published packages’, it’s going to be very limited in what it can actually resolve correctly. The warehouse json api posts the _most recent_ dependencies even if you’re dealing with an old version, and that’s if it has that info at all. We used to incorporate the json api too, it was great because it was fast, but the infrastructure team told us not to trust it. It seems a bit disingenuous to call something resilient if it relies on an unreliable source to start off with.

Ultimately the challenges are around actually building a dependency graph, which requires executing a setup file, because currently that’s how packaging works. There is just no way around it. A dependency graph that resolves on my machine may be different from one that resolves on your machine, even for the same package.

It’s easy to hand wave and say ‘well doesn’t that just make pipenv a virtualenv manager if pip can read a pipfile?’ No. Pipenv is a dependency manager. It manages idempotent environments and generates a reproducible lockfile. I realize this must seem trivial to you because you are waving it away and reducing this tool to a virtualenv manager, but it isn’t. We resolve lockfiles and include markers for python versions that you don’t have, aren’t using, and keep that available so that you can precisely deploy and reproduce across platforms and python versions. We use several resolution methods including handling local wheels and files, vcs repositories (we resolve the graph there too) remote artifacts, pypi packages, private indexes, etc.

At the end of the day pip _will_ handle pipfiles, that’s the plan, it’s been the plan since the format was created. But that is the same as asking ‘but what about when pip can handle requirements files?’ The question is basically identical. Pip can install that format. It’s not really relevant to any of the functionality I described other than that we also install the files (using pip, by the way).

@techalchemy

The warehouse json api posts the most recent dependencies even if you’re dealing with an old version, and that’s if it has that info at all

This is just plain wrong; you can get a specific version's dependencies by calling https://pypi.org/pypi/{project}/{release}/json. If you just call https://pypi.org/pypi/{project}/json, sure, you will only get the latest dependencies, but you can actually get the right set of dependencies.

And the packaging/publishing part of python projects really need to be improved because in the end it will benefit everyone, since it will make it possible to use the JSON API reliably.
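For reference, a minimal sketch of querying the release-specific endpoint mentioned above (the package name and version are illustrative; it requires the requests library, and requires_dist can still be missing for badly published packages):

```python
# Query PyPI's per-release JSON endpoint for declared dependencies.
import requests

def get_requires_dist(project, release):
    url = "https://pypi.org/pypi/{}/{}/json".format(project, release)
    data = requests.get(url, timeout=10).json()
    # requires_dist is None when the package metadata was published without it.
    return data["info"]["requires_dist"] or []

print(get_requires_dist("requests", "2.18.4"))
```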

It manages idempotent environments and generates a reproducible lockfile.
We resolve lockfiles and include markers for python versions that you don’t have, aren’t using, and keep that available so that you can precisely deploy and reproduce across platforms and python versions.

And so does poetry. And you can make it not use the JSON API, so that it uses the same resolution method as pipenv (using pip-tools). See https://github.com/sdispater/poetry/issues/37#issuecomment-379071989 and it will still be more resilient than pipenv (https://github.com/sdispater/poetry#dependency-resolution)

@zface I will say this one final time, please take this to somewhere higher in the hierarchy. Pipenv does not self-proclaim to be the officially recommended Python packaging tool; it says that because it is. If you feel that is inappropriate, tell it to the officials that recommend Pipenv. Please do not put these things on Pipenv dev. This is the wrong place to complain, and you cannot possibly get resolutions for your complaints here. You can also get better answers on technical questions you have there. This is an issue tracker for Pipenv, not a discussion board for Python packaging tools and how Python packaging is done.

Pipenv doesn't just rely on pip-tools for resolution, please stop reducing our software to one liners that demonstrate a lack of understanding. I know very well how the PyPI api works, I talked directly to the team that implemented it.

This is just plain wrong;

This kind of attitude is not welcome here. Do not assume we don't understand what we are talking about. Please practice courtesy.

it will still be more resilient than pipenv (https://github.com/sdispater/poetry#dependency-resolution)

Pipenv does not currently flatten dependency graphs. Pointing to one specific issue where a tree has been flattened and claiming the entire tool is therefore both better and more resilient is foolish; you are proving over and over again that you are simply here to insult pipenv and promote poetry. Please be on your way, this behavior is not welcome.

I agree the discussion is way off-topic; this issue was trying to capture the "good practices" around pipenv.

However,

[...] will still make the same mistakes as before with requirements.txt vs setup.py. Projects will still be badly packaged, with badly defined dependencies in their setup.py, because of that.

I share this opinion; getting new developers to successfully package their own Python code is actually complex, too complex, and requires reading way too much online documentation.
But it's not up to pipenv or any other dependency manager to deal with that entirely. We cannot rewrite history. We, as a community, need to find a way to modernize the python toolchain, step by step.

And pipenv (and probably poetry) is a very good step forward.

Having to maintain a Pipfile for applications on one side and a setup.py for libraries on the other is far from a no-brainer. No matter how hard we explain, with lots of words and long articles and good-practice guides, it's too complex for what it is. I completely agree it is like this for the moment, but it should not prevent us from imagining a better and safer way.
In the end, as a developer, I want a single tool, maybe with two different modes, to help me and make my life as easy as possible.

There should be a way of extracting just the part that handles requirements.txt/Pipfile from libs such as PBR, to propose a kind of 'easy setup.py': a Pipfile-aware wrapper around install_requires, without all the unwanted behavior pbr brings, packaged in a dedicated setuptools wrapper that only does that.

So we would be able to have the best of both worlds:

  • pipenv to maintain Pipfile (versioned in both libraries and applications)
  • pipenv to maintain Pipfile.lock (versioned only for applications)
  • one would use this magic wrapper package (pipfile_setuptools, install_requires_pipfile?) as a first-level dependency whose job is only to inject the Pipfile into install_requires.

This would be another project, not related to pipenv, but it still needs a generic Pipfile parser library. What do you think?
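A rough, hypothetical sketch of that idea (the helper name is made up; it parses the [packages] section of a Pipfile with the toml library, while a real implementation would rather reuse pypa/pipfile or pipenvlib and handle git/path/marker entries properly):

```python
# setup.py -- hypothetical "inject Pipfile into install_requires" wrapper sketch
import toml
from setuptools import find_packages, setup

def pipfile_to_install_requires(path="Pipfile"):
    packages = toml.load(path).get("packages", {})
    requires = []
    for name, constraint in packages.items():
        # "*" means any version; a string like ">=2.18" becomes "name>=2.18".
        # Table-style entries (git, path, markers, ...) are ignored in this sketch.
        if isinstance(constraint, str):
            requires.append(name if constraint == "*" else name + constraint)
    return requires

setup(
    name="mylib",                  # illustrative
    version="0.1.0",
    packages=find_packages(),
    install_requires=pipfile_to_install_requires(),
)
```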

@gsemet From my understanding PyPA has been trying to fill that with pyproject.toml instead, led by flit. You’ll need to talk to them first (at pypa-dev or distutils-sig) about this before proceeding to use Pipfile as the source format. As for parsing Pipfile (and the lock file), that is handled in pypa/pipfile (which Pipenv vendors to provide the core parsing logic).


Edit: Please drop me a message if you decide to start a discussion about this in either mailing list. I do have some ideas how we can bring the two parts of Python packaging distribution together.

I must admit I am a bit sad seeing dependencies declared in pyproject.toml (which takes the role setup.cfg plays for PBR), while PyPA also supports Pipfile....

Thanks for the pointers to flit and pipfile. There is also Kenneth Reitz's pipenvlib, which seems lighter.

PBR's setup.cfg support seems more complete compared to the official documentation (e.g. data_files), and it reuses a file already shared with several tools (flake8, pytest, ...), reducing the number of files at the root of a python project.
