Carrying forward discussion from #8210
The discussion concerns a constraints.txt file with the following content:
packaging @ git+https://github.com/pypa/[email protected]
To me, this means: whenever `packaging` is requested, use this link I provided. Further version specifiers should still apply (and fail resolution if they are not satisfied by the link target). The behaviour would be similar to `pip install <packages-i-want> "packaging @ git+https://github.com/pypa/[email protected]"`, but without `packaging` being marked as `user_requested` (unless it is listed in `<packages-i-want>`, of course).
I think of it as constraining the location from which distributions can be downloaded, just as a version expression like >=20.1 would constrain the version(s) that could be downloaded.
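As a purely illustrative sketch of that reading (the `<some-ref>` placeholder and the file names are made up here):

```
# constraints.txt -- constrain where "packaging" may come from, if it is needed at all
packaging @ git+https://github.com/pypa/packaging@<some-ref>
```

followed by something like `pip install -r requirements.txt -c constraints.txt`: the git link would only be used if `packaging` is actually pulled in by `requirements.txt` or its dependencies, and `packaging` itself would not be marked as user-requested.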
OK. I have done a more detailed investigation into the way that we'd implement `NAME @ URL` style constraints. The idea in principle is straightforward, but it has a number of interactions with other features of pip that make the overall behaviour far more complex.
The current implementation will have behaviours for all of these points that arise "by accident" as a result of the implementation choice to treat constraints as "just requirements that we don't install". The new resolver doesn't have that option, because the underlying model is completely different.
It's also true that there are likely reasonable choices to be made as to "how should this interaction work". It's not necessarily hard to choose a "correct" behaviour for any of the following points. But the implementation is decidedly non-trivial in the new resolver's existing model, so the cost of these features is disproportionately high, and we don't really have any idea at the moment if there's any actual use cases for these sorts of interactions (that's actual use cases as opposed to "yes I can see how this might be useful" 🙂)
If a constraint said `foo @ https://some/url` and a requirements file said `foo --hash=xxx`, what should happen? Should we hash-check the URL? Is it of any practical use? Should we select one of multiple URL constraints based on the hash? There are quite likely other questions. These are basically just some that came up in the process of trying to see how to implement this and saying "ouch, that would be a problem" a few times 🙂
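Concretely, the conflicting inputs in that question would be something like the following (both files and the digest are invented):

```
# constraints.txt
foo @ https://some/url

# requirements.txt (hash-checking mode)
foo --hash=sha256:<expected-digest>
```

It isn't obvious whether the artifact behind the URL should be downloaded and checked against that hash, or whether the combination should simply be rejected.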
As I said, it's almost certainly possible to come up with a well defined spec for this feature. And maybe that spec would even match the current behaviour. But we don't have that spec right now, and no-one has really done any of the work needed to come up with one.
So I propose that we consider URLs as constraints as "deprecated pending work being done to properly define the feature"[1], and assume that they won't be supported in at least the initial version of the new resolver. People currently using URLs in constraint files will need to either stick to the old resolver, or find a workaround.
The only actual user of URLs as constraints that I am aware of is @sbidoul, who explained his use here. Maybe there are (more or less clumsy) workarounds that could be used instead? @dhellmann seemed OK here with not including this in the new resolver, at least initially.
To reiterate, this is not a blanket refusal to accept this feature. It's simply noting that we need to clarify the design in order to rewrite it for the new resolver, and that we're not blocking the release of the new resolver on this feature.
[1] This has horrible echoes of the dependency links mess. I really want to not have that happen again, so I want to be very clear that the plan this time is to drop the feature without any replacement if necessary, and not to leave it hanging around forever "until we have the replacement defined".
@pfmoore don't the edge cases you mention also arise when direct URLs are used as regular install-requires?
@sbidoul I have no idea to be honest. I've not spotted any test failures that do that, but if there are any, then yes, that's something we'd need to address (and having addressed it, we may well then have a better chance of implementing URLs as constraints).
OTOH I very much doubt there are robust tests or user experience on this. The legacy resolver handles non-user-specified direct URLs very poorly, so there are likely many edge cases currently precluded by other restrictions. I’m +1 on delaying thoughts on this until we are able to make the new resolver widely available for a while, to let the users come up with crazy cool use cases that expose considerations we missed.
Agreed, this whole area is really emergent behaviour from a set of features that were implemented independently and never really considered in combination.
This is probably something that needs to be covered under #6536 (new resolver rollout), which is something that needs to be prioritised at some point. There will be areas where the new resolver works differently, that either aren't covered by tests or where we've made a decision to keep a difference, and we need to look at how we get feedback on that.
I had opened an issue here https://github.com/pypa/pip/issues/8757
Just wanted to add a mention here that it is very helpful to be able to provide, as a constraint, the URL to install a package from while waiting for a contributed patch to be applied by upstream and released. I.e. something like `https://github.com/some/repo/archive/bugfix-issue-foo.tar.gz#egg=somepackage` to allow installing the bugfix for package "somepackage" until upstream can cut a new release. Otherwise all packages have to be vendored and renamed, which is very complicated and duplicates a lot of work.
Right now if you drop `https://github.com/some/repo/archive/bugfix-issue-foo.tar.gz#egg=somepackage` in a constraint file and then install a requirements.txt that lists `somepackage`, you get `https://github.com/some/repo/archive/bugfix-issue-foo.tar.gz`. Please do not remove that workflow if you are able to keep it in some form.
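Spelled out, the workflow is roughly this (names and URLs are the made-up ones from above):

```
# constraints.txt
https://github.com/some/repo/archive/bugfix-issue-foo.tar.gz#egg=somepackage

# requirements.txt
somepackage
```

With the legacy resolver, `pip install -r requirements.txt -c constraints.txt` then installs `somepackage` from the patched archive rather than from the index.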
@thenewguy Thanks for your input. I've seen a few people describe use cases for URLs as constraints that all seem to have the same underlying motivation, and I'm fairly clear that we should have this feature in some form.
However, the key here is defining the details. I posted above a list of questions about how URLs as constraints should interact with other features of pip, and so far, no-one has come up with any answers. I don't personally have any intuition on what the answers should be, so it's essentially impossible to implement the feature for the new resolver. To be clear, the existing implementation is tightly linked to the old resolver code - it's not a case of "keeping what's there", we absolutely have to do a complete re-implementation, and as such, we need to know what to implement.
Please do not remove that workflow if you are able to keep it in some form
As I say, we aren't able to keep it. What we can do is to re-implement it. But to do that we need to understand what to implement - and "what the old implementation does" isn't even meaningful in terms of the new resolver...
Gotcha - well in my experience - this is typically used for temporary bugfix releases. It sounds like there is a lot more to consider than my knowledge of pip's internals allows, but imo, if a named package URL is provided, it should be forced and installed regardless of any version specifier. If that is too much, perhaps failing if the version doesn't match. I've never used it in combination with strict versioning (because, as a bugfix, there really isn't a version to it, since I am not in control of the release process for upstream). So perhaps, conceptually, it is an override rather than a constraint. I've not come across another way to do this.
However, it is also handy when working with a pre-release package that isn't published somewhere like pypi yet. Although this case is easier to work around than the previous use.
I don't use the feature, but it sounds like maybe one interpretation of how others use it is to (partially?) disable the resolver for a specific dependency and use the given distribution as the only available location and version. Maybe that implies some other behaviors about how that affects the other dependencies, I'm not sure. For example, should the resolver assume that the distribution at the given URL meets the version requirements of anything that depends on it, even if perhaps the version number in the distribution metadata says otherwise?
That would be kind of a "just do what I tell you" mode, which given the conversation elsewhere about how to reduce the scope of pip itself and encourage an ecosystem of surrounding tools, might make a lot of these sorts of features buildable outside of pip. For example, if pip could report the results of resolving a requirement set by producing a list of URLs without installing them (maybe it can do this already, or it means building another tool on top of the resolver library?), then someone could build that full list from a partial requirements.txt, edit the results to replace the URL for the dependency they are patching, and then install from that list of URLs without resolving dependencies at all (with pip, or something else) to get exactly what they want. That would give the user the convenience of pip helping to build the full dependency list, and mean that PyPA developers wouldn't have to entertain every variation of dependency management that anyone can come up with.
The “do as I tell you” usage is more difficult to implement, and should be discussed separately IMO (#8076 covers it). Constraints are currently implemented as adding to existing requirements. This makes them fit more nicely with the rest of the dependency resolution logic, and relatively easier to make sense of. This is enough for most use cases in practice as well, since existing dependencies are correct most of the time, and a URL constraint just needs to add additional information to tell the resolver to specifically use that one particular artifact, much like version constraints do.
if a named package URL is provided, it should be forced and installed regardless of any version specifier
Conceptually, that behaviour doesn't feel like a "constraint" to me, but more like an "override". I think it's important that constraints do what the name suggests, for both discoverability and understandability reasons. Overrides are being discussed separately in #8076 as @uranusjr says.
For me, conceptually, a URL constraint says "you can only get this project from this specific URL. If what's there doesn't satisfy the dependencies pip has calculated, you're out of luck".
To be fair, that logic mostly answers my above questions:
For the last question, "should we reject the edge cases that we don't support" I would be pragmatic and say that if we end up in a code branch that the above logic doesn't give us an answer for, we fail with an error saying it's unsupported. On the other hand, we don't go out of our way to check for special cases, and we document the intention and explicitly state in the documentation that any other usage is not supported, and behaviour is undefined and may change or be removed without notice.
If that's an acceptable approach, we can go ahead on that basis. (However, note that the period of paid work on developing the new resolver has completed now, so implementing this will be done on volunteer time - personally, I'd like to look at it but I don't know when I'll next have time to do so).
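To make the semantics described above concrete, here is a rough Python sketch of the rule being proposed — purely illustrative pseudocode, not pip's actual internals or API (all names here are hypothetical):

```python
def candidates_for(project, version_specifier, index_candidates, url_constraint):
    """Apply a URL constraint: the project may only come from that URL.

    `index_candidates` is whatever the finder would normally offer;
    `url_constraint` is the single artifact named in the constraints file,
    or None if the project is not URL-constrained.
    """
    if url_constraint is None:
        # No URL constraint: normal resolution over index candidates.
        return [c for c in index_candidates if version_specifier.contains(c.version)]

    # URL-constrained: the constraint's artifact is the only allowed source.
    if version_specifier.contains(url_constraint.version):
        return [url_constraint]

    # "If what's there doesn't satisfy the dependencies pip has
    # calculated, you're out of luck" -- resolution fails for this project.
    return []
```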
The “do as I tell you” usage is more difficult to implement, and should be discussed separately IMO (#8076 covers it).
Sure, I'll read that, too.
Constraints are currently implemented as adding to existing requirements. This makes them fit more nicely with the rest of the dependency resolution logic, and relatively easier to make sense of. This is enough for most use cases in practice as well, since existing dependencies are correct most of the time, and a URL constraint just needs to add additional information to tell the resolver to specifically use that one particular artifact, much like version constraints do.
Sure, that's another way to look at it. I was trying to point out that if the phases of resolving the dependencies and installing the packages were exposed explicitly, then many (most?) other use cases, such as this one, could be addressed by modifying the output of the resolver before passing the list to the installer. Those modifications could be left up to individual users or authors of other tools that integrate with pip, but pip's resolver wouldn't have to contain the additional complexity.
I was trying to point out that if the phases of resolving the dependencies and installing the packages were exposed explicitly, then many (most?) other use cases, such as this one, could be addressed by modifying the output of the resolver before passing the list to the installer.
That's an interesting idea. We could have something like `pip install --resolve-only --out=some_file.txt` and `pip install --from-resolve-data=some_file.txt`. I could see some pretty significant issues to consider here (what if the pip options used in the two phases were different, for a start?) and I can't imagine that we'd ever support editing that intermediate file (it's expected that people will do this, but it would have to be on a "we won't help if you break stuff" basis, I'd have thought), but I can see it would be useful.
(Of course, taking that idea to its logical conclusion, we'd break pip up into a suite of tools doing the various "bits" of the process, much like the Unix idea of combining many small tools - and I doubt we're actually likely to go down that route in reality).
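Put together, the two-phase workflow being imagined with those (purely hypothetical, currently non-existent) flags would presumably look something like:

```
pip install --resolve-only --out=resolved.txt -r requirements.txt   # hypothetical: resolve, write the plan, install nothing
# hand-edit resolved.txt, e.g. swap in a patched URL for one package
pip install --from-resolve-data=resolved.txt                        # hypothetical: install exactly what the plan lists
```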
I was trying to point out that if the phases of resolving the dependencies and installing the packages were exposed explicitly, then many (most?) other use cases, such as this one, could be addressed by modifying the output of the resolver before passing the list to the installer.
That's an interesting idea. We could have something like `pip install --resolve-only --out=some_file.txt` and `pip install --from-resolve-data=some_file.txt`. I could see some pretty significant issues to consider here (what if the pip options used in the two phases were different, for a start?)
Yes, that's close to what I was thinking. It might be easier to break down which options apply to each phase by thinking about new sub-commands with names like `pip resolve` and `pip deploy`, with `pip install` encompassing both phases transparently. Most of the options to the existing `install` command would apply to the `resolve` command, but not to `deploy`. Separate sub-commands also have the benefit of separate argument parsers, and if the options aren't available for `pip deploy`, they can't be different from the values given to `pip resolve`. :-)
and I can't imagine that we'd ever support editing that intermediate file (it's expected that people do this, but it would have to be on a "we won't help if you break stuff" basis, I'd have thought), but I can see it would be useful.
Yes, exactly. The `deploy` step would just take the data as input and do what it needs to do so that the listed packages end up on the import path (downloading things, turning them into wheels, writing files to the filesystem, etc.). It would only work from the list, though, without applying any additional rules or processing. It has to assume that the list is "correct". It would be up to the user to make the list contain what they want, and if what they want is broken somehow then that's not pip's problem.
I originally said a list of URLs, but the output of the `resolve` phase might be easier to consume if it includes more of the data that the resolver has. For example, the original requirement, the reason for a dependency being added if it wasn't in the original requirement list, the URL to the package, an optional second URL for a local cached copy, etc. A lot of that data could be optional because `deploy` wouldn't need it, but a program that wanted to modify the data could use it to make decisions about the modifications.
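As a purely hypothetical sketch of what one entry in that intermediate data might carry (names invented here, not an actual pip format):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResolvedEntry:
    """One entry in the hypothetical output of the resolve phase."""
    name: str                           # project name, e.g. "somepackage"
    version: str                        # version the resolver settled on
    url: str                            # where the deploy step should fetch it from
    cached_url: Optional[str] = None    # optional second URL for a local cached copy
    requested_by: Optional[str] = None  # original requirement, or the dependant that
                                        # caused this entry to be added
```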
(Of course, taking that idea to its logical conclusion, we'd break pip up into a suite of tools doing the various "bits" of the process, much like the Unix idea of combining many small tools - and I doubt we're actually likely to go down that route in reality).
Separate sub-commands may make it easier for users to understand conceptually, but I wouldn't go so far as to create separate executables or main programs. And I wouldn't necessarily say that `pip install` should write the intermediate data to a file before deploying, but internally it should build the same data structure with the resolver and pass it to the deployment code.
IIUC, we're going in the direction of #53, lockfiles, #7819 here.
Looking at the implementation of the install command, I see there are probably some options that would be needed by both phases. Whether to install to the user dir, the target directory, etc. would affect what is seen as already present by the resolver and then would also be needed by the deployer code to know where to put things. So, maybe some of those sorts of values need to be part of the data file produced by the resolve phase.
For what it's worth, "constraints file containing lines of the form `package @ file:///local/path`" was my interpretation of the following lines from the constraints file documentation:
Constraints files are used for exactly the same reason as requirements files when you don’t know exactly what things you want to install. For instance, say that the “helloworld” package doesn’t work in your environment, so you have a local patched version. Some things you install depend on “helloworld”, and some don’t.
One way to ensure that the patched version is used consistently is to manually audit the dependencies of everything you install, and if “helloworld” is present, write a requirements file to use when installing that thing.
Constraints files offer a better way: write a single constraints file for your organisation and use that everywhere. If the thing being installed requires “helloworld” to be installed, your fixed version specified in your constraints file will be used.
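That is, I read it as allowing a line like this (path and version invented) in the organisation-wide constraints file:

```
# constraints.txt
helloworld @ file:///opt/patches/helloworld-1.0+local1.tar.gz
```

so that anything which depends on "helloworld" picks up the patched build.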
If that's not the intention, should there be a documentation bug or enhancement?
If that's not the intention, should there be a documentation bug or enhancement?
Yes. The existing documentation is inaccurate for the new resolver, in that it doesn't explain exactly what is valid in a constraints file (and the new resolver has changed the specific details there). If anyone wants to raise a documentation PR for this, that would be helpful.
Or a PR to implement URL constraints along the path sketched in https://github.com/pypa/pip/issues/8253#issuecomment-674372005 ;)
Or a PR to implement URL constraints along the path sketched in #8253 (comment) ;)
Indeed, or that 🙂 I'm trying to do too many things at once at the moment, and not doing any of them well...
I'd definitely like to see URL constraints along those lines. So, I'm right now working my way through the developer documentation, in hopes of eventually understanding things well enough to write a PR.
I'm currently minimally familiar with pip's codebase, so I probably won't make good time on my own. I'll keep on with this on my own if need be, but if anyone with more experience decides to work on this, I'd probably be more effective if I were helping with their efforts in some way.
Looking over the code, here's what I think needs to happen:
I'm not sure if I accidentally read code that's exclusive to the legacy resolver at some point, which would confuse things. In any case, it looks to me like most of the required changes follow naturally from representing URL constraints as Constraint objects; all I know about those so far is that they should carry hashes and some way of representing the URL.
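For concreteness, the kind of representation I have in mind is roughly the following — a hypothetical sketch only, not pip's existing Constraint class:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Constraint:
    """Hypothetical per-project constraint record (illustrative only)."""
    specifier: str = ""                              # e.g. ">=20.1", merged version constraints
    hashes: List[str] = field(default_factory=list)  # allowed artifact hashes, if any
    link: Optional[str] = None                       # URL the project must come from, if URL-constrained
```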
Are there any big stumbling blocks I'm missing out on?