Pip: Remove the pip search command

Created on 13 Apr 2018  ·  32 Comments  ·  Source: pypa/pip

Currently pip allows you to search a repository by running pip search, which will then print out a bunch of packages that match. For example:

$ pip search requests
negotiator-3k (1.0.0)                                                  - Proper Content Negotiation for Python

    The Negotiator is a library for decision making over Content Negotiation requests.
    It takes the standard HTTP Accept headers (Accept, Accept-Language, Accept-Charset,
    Accept-Encoding) and rationalises them against the parameters acceptable by the
    server; it then makes a recommendation as to the appropriate response format.

    This version of the Negotiator also supports the SWORDv2 extensions to HTTP Accept
    in the form of Accept-Packaging.
odoo10-addon-sql-request-abstract (10.0.1.0.0.99.dev1)                 - Abstract Model to manage SQL Requests
odoo9-addon-sql-request-abstract (9.0.1.0.0.99.dev6)                   - Abstract Model to manage SQL Requests
odoo8-addon-sql-request-abstract (8.0.1.0.0.99.dev7)                   - Abstract Model to manage SQL Requests
zenodo-accessrequests (1.0.0a2)                                        - Zenodo module for providing access request feature.
requests-wsgi-adapter (0.4.0)                                          - WSGI Transport Adapter for Requests
requests-celery-adapters (2.0.9)                                       - Requests lib adapters to send Celery messages (tasks)
odoo9-addon-hr-holiday-notify-employee-manager (9.0.1.0.0.99.dev1)     - Notify employee's manager by mail on Leave Requests creation.
odoo9-addon-purchase-request-operating-unit (9.0.1.0.0)                - Operating Unit in Purchase Requests
odoo10-addon-hr-holidays-notify-employee-manager (10.0.1.0.0.99.dev4)  - Notify employee's manager by mail on Leave Requests creation.
odoo9-addon-sql-export (9.0.1.0.0.99.dev12)                            - Export data in csv file with SQL requests
odoo8-addon-sql-export (8.0.1.0.0.99.dev9)                             - Export data in csv file with SQL requests

...

The output here goes on for ~900 lines and the results are just complete trash. This is better on Warehouse:

$ pip search --index https://pypi.org/pypi requests
requests (2.18.4)                         - Python HTTP for Humans.
  INSTALLED: 2.18.4 (latest)
aiohttp-requests (0.1.0)                  - A thin wrapper for aiohttp client with Requests simplicity
anonymous-requests (0.2)                  - 
apiclient-requests (0.1.2)                - A simple python base package for building good api clients on
careful-requests (0.1.4)                  - Requests for header-sensitive servers (like Accept-Encoding)
crawl-requests (2.2.8)                    - crawl_requests(like requests) can update ua and proxy automatically.
gcloud-requests (1.1.9)                   - Thread-safe client functionality for gcloud-python via requests.
jsonapi-requests (0.6.0)                  - Python client implementation for json api. http://jsonapi.org/
jsonrpc-requests (0.4.0)                  - A JSON-RPC client library, backed by requests
nav-requests (1.1.4)                      - Renamed to `nav`
parse-requests (1.0.7)                    - parse-rest-python - A fast and simple Python library to interact with Parse.com REST API
play-requests (0.0.3)                     - pytest-play plugin driving the famous Python requests library for making HTTP calls
PyGithub-requests (1.26.0)                - Use the full Github API v3
Randomized-Requests (1.0.2)               - Python package that makes post and get request with random proxy and user agent
requests-aeaweb (0.0.1)                   - Requests wrapper to log onto AEAweb.org.
requests-aliyun (0.3.1)                   - authentication for aliyun service
requests-auth (1.0.2)                     - Easy Authentication for Requests
requests-aws (0.1.8)                      - AWS authentication for Amazon S3 for the python requests module
requests-aws4auth (0.9)                   - AWS4 authentication for Requests
requests-bce (0.0.5)                      - authentication for bce service

...

Which gives us ~111 lines of output, and which actually returns some meaningful output.

I believe that this command is a fairly regular source of confusion for users, primarily because it uses a different source of truth than pip install does, which means they need to configure the location to search differently from the location to install from. In addition, the search API is not standardized and, to the best of my knowledge, very few alternative implementations support it.

There has been a long-standing idea of switching search to use the PackageFinder() class to try to resolve these issues, but I don't think that is going to work reasonably either. The problem is that while that would reconcile the differences between search and install, PEP 503 doesn't provide any mechanism to pass information like the summary that we print alongside each result above. Speaking with my PyPI hat on, I would be very opposed to adding such information to the PEP 503 repository API, because it would bloat the responses and have them take up more bandwidth for a very minor edge case. The other problem is that the PackageFinder() API itself doesn't fall back to /simple/ anymore. That's resolvable, but the larger issue is that /simple/ is 7MB as of right now, and that is likely to continue to grow. Having pip search issue a 7MB HTTP request is a pretty crummy experience.
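
To make the PEP 503 limitation concrete, here is a minimal sketch of what a search over the root index page could look like. The sample HTML, the class name and the function name are all illustrative assumptions, not pip code. The point is that each anchor carries only a project name, so there is no summary to display, and a real client would have to download the whole index before filtering it.

```python
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect project names from the anchors of a PEP 503 root index page."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # The anchor text is the only per-project information available.
        if self._in_anchor and data.strip():
            self.names.append(data.strip())


def search_simple_index(html, query):
    """Return project names containing `query` (hypothetical helper)."""
    parser = SimpleIndexParser()
    parser.feed(html)
    needle = query.lower()
    return [name for name in parser.names if needle in name.lower()]


# Tiny stand-in for the real index, which is several megabytes.
SAMPLE_INDEX = """<!DOCTYPE html>
<html><body>
<a href="/simple/requests/">requests</a>
<a href="/simple/requests-aws/">requests-aws</a>
<a href="/simple/odoo8-addon-sql-export/">odoo8-addon-sql-export</a>
</body></html>"""

matches = search_simple_index(SAMPLE_INDEX, "requests")
print(matches)
```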

So that leaves us in a bit of a sticky situation. The current implementation is confusing and practically speaking only searches PyPI and not anywhere else, but our best path forward for resolving that is a non-starter due to other concerns.

So I think we should just rip the bandaid off and deprecate and eventually remove the pip search command. The only other alternative I can really think of that would actually resolve this, is to switch to using /simple/, but that would then mean getting hit with a 7MB download just to try and search.

Thoughts @pypa/pip-committers?


All 32 comments

I've rarely used pip search, and never really got useful data from it. But then again, I've never got much use out of PyPI's search facilities anyway.

I'd like to see a good search facility on PyPI, and if there were one, I'd want to be able to use it on the command line as pip search. I'd be happy for pip search to only work on indexes that support a (yet to be defined in a PEP :smile:) standard search API, assuming that PyPI is one such index.

So I guess I'm -1 on removing pip search, but happy to acknowledge that it's currently useless - possibly even to the extent of having it simply report "Search is currently disabled because there is no search API defined for package indexes - please use the index search page directly". Long term I do think it's worth having though.

(Another option is that we could develop a plugin API and delegate producing a good pip search to 3rd party contributors. But that's a whole other debate ;-))

@pfmoore Is there a functional difference between deprecating/removing the search command now and, if a standardized search API gets designed and implemented, adding it back at a later point? Having the command there but always returning an error seems the worst of all possible outcomes.

Two, in my view. Both minor.

  1. It signals our intent to have a search command, not simply to dump it. (Assuming that is our intent...)
  2. If there are people using it, even in its current bad state, leaving it alone avoids harming them, and costs us little. Obviously going as far as a "Currently disabled" message removes this benefit (so this one only applies if we stop at the "acknowledge it's useless" level :smile:).

But honestly, I don't care enough to fight for its retention. I do think it deserves a deprecation cycle, but you included that in your proposal anyway, so that's fine.

I tried a couple examples (pip search websockets and pip search trio), and the results seem reasonable / useful.

For the "requests" example, it looks like the reason it's picking up ~900 instead of ~100 is that the first implementation is also searching the summary string instead of just the project name, and "requests" is a common English word. For example, it picks up this:

odoo8-addon-sql-export (8.0.1.0.0.99.dev9)         - Export data in csv file with SQL requests
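
A small sketch of that difference, using a few name/summary pairs taken from the output above. The `search` function and its `fields` parameter are hypothetical and only illustrate the matching behaviour; they are not how either index implements search.

```python
# (name, summary) pairs copied from the example output in this issue.
PACKAGES = [
    ("requests", "Python HTTP for Humans."),
    ("requests-aws", "AWS authentication for Amazon S3 for the python requests module"),
    ("odoo8-addon-sql-export", "Export data in csv file with SQL requests"),
    ("negotiator-3k", "Proper Content Negotiation for Python"),
]


def search(packages, query, fields=("name",)):
    """Return names of packages where `query` appears in any of `fields`."""
    needle = query.lower()
    hits = []
    for name, summary in packages:
        record = {"name": name, "summary": summary}
        if any(needle in record[field].lower() for field in fields):
            hits.append(name)
    return hits


name_only = search(PACKAGES, "requests")
name_and_summary = search(PACKAGES, "requests", fields=("name", "summary"))

print(name_only)         # matches on project name only
print(name_and_summary)  # also picks up summaries mentioning "requests"
```

With a common English word as the query, the second call grows much faster than the first as the package list grows.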

Do we expect the XML-RPC API to be removed before we manage to make a search API?

If that's the case, we should go ahead and pull off the band-aid. Otherwise, what Paul suggests (making it loudly known that we expect pip search to "lie" and be not-so-useful) is also fine.

Any interest in this? I don't mind deprecating it for removal.

If pip search is deprecated, which tool should we use to search for packages on PyPI (not in warehouse, in public)? I ask because we have several internal cases where we use "pip search". Thanks in advance for the answer!

@xnuinside I'm not sure I understand your question, specifically the "(not in warehouse, in public)" part, can you elaborate?

The pip search command uses PyPI's XML-RPC API that is available for anyone to use: https://warehouse.readthedocs.io/api-reference/xml-rpc/
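
For anyone curious what such a call looks like on the wire, here is a rough sketch that only builds the XML-RPC request body and does not send anything. The `search` method name, the name/summary spec and the "or" operator are assumptions based on how pip's search command is understood to call the API; check the linked documentation before relying on them. An actual call would be a POST to the index's /pypi endpoint.

```python
import xmlrpc.client


def build_search_request(query):
    """Serialize a search call as an XML-RPC request body (assumed signature)."""
    # Assumption: match the query against either the name or the summary.
    spec = {"name": query, "summary": query}
    return xmlrpc.client.dumps((spec, "or"), methodname="search")


payload = build_search_request("requests")

# Decode the payload again to show what the server would receive.
params, method = xmlrpc.client.loads(payload)
print(method)
print(params)
```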

@di, Dustin, PyPI's API is what I need :) thx

No one is opposed to actually deprecating the pip search command and, as far as I can tell, there's no reason to not do this soon. I've gone ahead and added this to pip 20.2 release, since we've missed the window for pip 20.1.

FWIW, a while back I asked what people used pip search for and got some responses: https://twitter.com/di_codes/status/1131243583078588418

@di Thanks, I was aware of that and if that's representative of our userbase (it's not, but let's roll with the assumption), Ernest's response looks like the main reason that endpoint is hit so often.

None the less, deprecation is probably the best tool we have to surface any concerns users might have with removal/replacement of functionality. That should tell us if pip search is the reason for PyPI's XML-RPC search endpoint numbers. :)

Just confirming, the intent is to add the deprecation warning by 20.2? The way we're talking it sounds like we might drop it in 20.2, but there's no deprecation warning being emitted to warn people!

The plan is definitely to deprecate in 20.2.

Quoting @nlhkabu's excellent suggestion from Zulip:

I don't think it is a good look to mark something as "for deprecation" and then backtrack on that decision. Maybe instead we can add a warning asking for user feedback. E.g.
"The pip team is considering refactoring or removing this command. Please let us know how you use it here: url"

I think I'm going to go ahead and file a new issue, and use that as a URL for this message. If anyone has inputs, please let me know. I'd really like to slip this into 20.2 if possible.

I don't think directing users to a GitHub issue is going to be the best way to collect this information.

I think the UX team should put together a survey (or similar) to collect feedback on what we should do with pip search.
As Georgia mentioned on Zulip, I also think we should ask about it during user interviews.

@pradyunsg let's talk about how we might be able to get something set up for 20.2 in our next team meeting, and report back here.

One (maybe minor) point. If we do have the message point to a survey, presumably we'd need to keep that survey available, or at least replace it with a "this survey is now closed" message directing the user somewhere else, until we alter the message again and have some assurance that a reasonably high proportion of users will have upgraded? My assumption being that we don't want to present the user with a broken link.

Not so much an objection, I'm just not sure how something like this should be handled.

We could also bake some sort of time gating right into the message, like in the Python code check if the date is before some cut off date before showing the message.
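
A minimal sketch of that time gating, assuming a hypothetical cut-off date and function name. The message text is the wording suggested earlier in the thread, with the URL left as a placeholder.

```python
import datetime

# Hypothetical cut-off: stop showing the notice once the survey has closed.
CUTOFF = datetime.date(2021, 1, 1)

SURVEY_MESSAGE = (
    "The pip team is considering refactoring or removing this command. "
    "Please let us know how you use it here: url"
)


def survey_notice(today=None, cutoff=CUTOFF):
    """Return the notice while the survey is open, otherwise None."""
    if today is None:
        today = datetime.date.today()
    if today < cutoff:
        return SURVEY_MESSAGE
    return None


# An old pip installation run after the cut-off stays silent
# instead of pointing users at a closed survey.
print(survey_notice(datetime.date(2020, 7, 1)))
print(survey_notice(datetime.date(2021, 6, 1)))
```

This relies on the user's system clock, so it is best effort rather than a guarantee.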

I've put together a draft survey for collecting feedback from users:
https://forms.gle/qY7PA3U4QHmo9Ao66

Please review (feel free to fill it out, I will delete all responses before sharing the link publicly) and let me know if you think I've missed any important questions, or could improve what is there.

Thanks :man_dancing:

I'm not sure I understand the difference between "trustworthy" and "useful" in the 2 questions about pip search results. This should either be made clearer, or we should just have the "useful" question.

Also, it would be nice if, when asking whether pip search should search on anything other than name, we also asked whether searching on other data should need a flag (or something). My personal response was "Yes - description. But only on request, default should be on name only." My point being that if we get responses like "name, description and keywords", we still don't know how to implement that (one reason the current command is useless IMO is because it searches indiscriminately across multiple fields).

@pfmoore I suggested adding a question to understand trustworthiness of pip search results. This is because current user behaviour is to download a piece of software (possibly) from the Internet - sight unseen - based on search results from a search in pip.

It would be useful to understand if users have had any issues - malicious packages, packages purporting to be a different package, etc.

Usefulness (as I understand it here) is more about the usefulness of the pip search results.

About "when asking whether pip search should search on anything other than name" I agree, but let's start with a minimum viable change - start with name. Depending on the results to that question we have an idea of what people would want. We don't need to worry (too much) about implementation until we know what people want. ;)

current user behaviour is to download a piece of software (possibly) from the Internet - sight unseen - based on search results from a search in pip.

Really? I'm genuinely surprised - do we have evidence of that from other surveys? My assumption (and what drove the idea of just deprecating pip search) was that current user behaviour is to ignore pip search because it isn't sufficiently helpful, fire up a browser and use PyPI to search, then download software based on that PyPI search. That's definitely what I do...

It would be useful to understand if users have had any issues

Cool. No argument if you think it's useful data to collect. But when I read the question, I didn't understand what it meant by "trustworthy", so I just treated it as the same as "useful" and answered the same. So maybe the question needs to be clearer, to clarify the intention?

Also, I found it hard to answer the questions, because personally I find pip search in its current form so useless that I have no real opinions beyond that (I don't know if I consider data that's of no use to me whatsoever to be "trustworthy", for example).

We don't need to worry (too much) about implementation until we know what people want.

Again, I'm fine with that. All I'm saying is that unless we get "just search on name and nothing else" as an answer, we will have follow-up questions, otherwise the information will be of limited use. But if we want to keep this survey simple and defer those questions to a follow-up survey, that works fine for me.

Thanks for your feedback @pfmoore @ei8fdb.

I've rephrased the "trustworthy" question as

I assume that packages found with pip search are...
Very insecure -> Very secure

I hope this is clearer.

Thanks for flagging those potential follow up questions @pfmoore . I'm thinking the best way to address these will be to design a couple of alternative implementations and conduct follow up research on which solution is best. I'm hoping that this initial survey will give us enough data to inform those initial designs.

Marking this as a blocker for 20.2, since we've got quite a few smaller questions to cover here:

  • what is the exact wording for the message?
  • should users be able to "silence" this message? (this will need exposing a flag, which is... more work. :P)
  • where in the output should it be located?

    • Run pip search water and pip search httpx for two cases of what the current output looks like.

  • when/how often should it be printed? (every run, every tuesday, something else?)

    • anything other than "every run" will involve significant implementation complexity, so I strongly prefer that. :)

@pypa/pip-committers @nlhkabu @ei8fdb I'd love to get your inputs on all of these.

Marking this as a blocker for 20.2

Dropping this from a blocker status, since on second thoughts, this doesn't need to block the main 20.2 release.

(1) there's a lot of open questions here
(2) If we really want to, I'm happy to introduce the nudge-users-to-a-survey message in 20.2.1 (a future bugfix).
(3) I no longer think this is super critical to do right now.

Status: finalising the survey on my side.

XMLRPC search is far and away the #1 route to PyPI's backends, consuming over 90% of the resources we use to host PyPI. It's getting increasingly untenable, especially as other clients and services (ab)use this fragile and resource-intensive endpoint.

I'm a proponent of removing pip search and deprecating the API endpoint that services it. This will reduce the cost of running PyPI, and may be necessary even if pip search continues to exist.

XMLRPC search is far and away the #1 route to PyPI's backends

Somewhat off-topic, but is this specifically the search, or the XMLRPC API in general? I ask because as far as I know the only XMLRPC APIs that don't have equivalent or better APIs elsewhere are search and the mirroring API (changelog et al.).

More or less specifically search.

Here is an approximate breakdown of the count of requests over the last 7 days (note "pypi" here is the POST /pypi that xmlrpc uses):

[Image: Screen Shot 2020-10-21 at 4 05 24 PM]

So on average 57rps to our backends for all XMLRPC.

And here is the breakdown of all XMLRPC requests in the same time period:

[Image: Screen Shot 2020-10-21 at 4 07 33 PM]

So on average ~54rps to our backends for XMLRPC search (which includes but is not exclusive to pip).
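
As a quick back-of-the-envelope check on those two averages, the sketch below works out what share of XMLRPC traffic is search and roughly how many search requests that is per week. Both inputs are the approximate figures quoted above, so the results are only as precise as those figures.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

all_xmlrpc_rps = 57  # average requests per second, all XMLRPC
search_rps = 54      # average requests per second, XMLRPC search only

# Fraction of XMLRPC requests that are search calls.
search_share = search_rps / all_xmlrpc_rps

# Approximate number of search requests over the 7-day window.
weekly_search_requests = search_rps * SECONDS_PER_WEEK

print(f"search share of XMLRPC traffic: {search_share:.1%}")
print(f"search requests per week: {weekly_search_requests:,}")
```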

https://github.com/pypa/pip/issues/9030#issuecomment-716110759

With PyPI throttling XMLRPC calls, we're no longer going to have a good way to test our pip search calls (without hitting those rate limits). I'm strongly in favor of dropping pip search for technical reasons. I understand that we have users who use this command, but overall, I think this is more painful to keep functional and maintain than the value it's providing to our users.

An alternative would be to drop all of the tests for search, and deprecate search. That way we don't break users unnecessarily (after all, users running pip search won't hit the rate limits).

Agree on dropping the tests. The whole implementation is quite divorced from the rest of the code base, and we're likely not going to change anything meaningful in it anyway. It's great it works for some people, but probably not worthwhile to jump through hoops for.
