Warehouse: Advanced search

Created on 17 Oct 2015 · 20 Comments · Source: pypa/warehouse

Sorry if this has been asked before.

It would be really nice if I could make metadata searches like these on PyPI:

  • Give me all packages that depend on X
  • Give me all packages that don't depend on Y
  • Give me all packages that depend on X>1.2.3
  • Give me all packages that don't depend on Y<1.2.3
  • Give me all packages that have classifier A :: B :: C
  • Give me all packages that "work" on Python X.Y (but there are multiple ways to declare compatibility? maybe some compound index)
  • Give me all packages that have a release after 2015-01-02

And so on ...
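To make a few of these queries concrete, here is a minimal in-memory sketch. The records, field names, and helper functions are all hypothetical; real data would come from the index. It ignores version specifiers and does not normalize project names, so it covers only the "depends on X" and "released after" cases.

```python
import re
from datetime import date

# Hypothetical, hand-written metadata records (not real PyPI data).
PACKAGES = [
    {
        "name": "plotlite",
        "requires_dist": ["numpy>=1.10"],
        "last_release": date(2015, 3, 1),
    },
    {
        "name": "bigplot",
        "requires_dist": ["matplotlib>=1.4", "numpy"],
        "last_release": date(2014, 11, 20),
    },
]


def dependency_names(package):
    """Return the lowercased project names from a package's requirement strings."""
    names = set()
    for requirement in package["requires_dist"]:
        # The project name is the leading run of name characters.
        match = re.match(r"[A-Za-z0-9._-]+", requirement.strip())
        if match:
            names.add(match.group(0).lower())
    return names


def depends_on(package, name):
    """True if `name` is a direct dependency of `package`."""
    return name.lower() in dependency_names(package)


def search(packages, *predicates):
    """Return the names of packages for which every predicate is true."""
    return [p["name"] for p in packages if all(pred(p) for pred in predicates)]


# "Depends on numpy but not on matplotlib"
print(search(
    PACKAGES,
    lambda p: depends_on(p, "numpy"),
    lambda p: not depends_on(p, "matplotlib"),
))

# "Has a release after 2015-01-02"
print(search(PACKAGES, lambda p: p["last_release"] > date(2015, 1, 2)))
```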

feature request search

All 20 comments

Not really. Use-case: give me all packages that do X but don't depend on Y, cause Y is broken/whatever.

I frequently look for packages that don't have heavy dependencies. E.g., I want a plotting library that doesn't depend on matplotlib.

To make the banana analogy: you asked for a banana, you got it, but there's a monkey and the whole forest attached to that banana.

Other criteria that would be useful:

  • Give me all packages that are available under license L
  • Give me all packages that are available under licenses other than L

Another idea:

  • Give me all packages that don't have license L and don't have any dependencies with license L (the "can't use GPL" problem many people have)
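The transitive license check amounts to a walk over the dependency graph. A rough sketch, assuming a hypothetical mapping from project name to its license and direct dependencies (complete dependency metadata like this may not actually be available):

```python
# Hypothetical metadata: project name -> (license, direct dependencies).
PROJECTS = {
    "app": ("MIT", ["web", "utils"]),
    "web": ("BSD", ["utils"]),
    "utils": ("MIT", []),
    "report": ("MIT", ["charts"]),
    "charts": ("GPL", []),
}


def uses_license(name, license_name, projects, _seen=None):
    """True if `name` or any of its transitive dependencies has `license_name`."""
    seen = set() if _seen is None else _seen
    if name in seen:
        # Already visited during this walk; this also guards against cycles.
        return False
    seen.add(name)
    project_license, dependencies = projects[name]
    if project_license == license_name:
        return True
    return any(
        uses_license(dep, license_name, projects, seen) for dep in dependencies
    )


def free_of(license_name, projects):
    """Names of projects with no trace of `license_name` in their dependency tree."""
    return sorted(
        name for name in projects
        if not uses_license(name, license_name, projects)
    )


print(free_of("GPL", PROJECTS))
```

Here "report" is excluded even though it is MIT-licensed itself, because it depends on "charts".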

I think some of these would be easier if there was a boolean search that could be used to find packages that don't match a particular result. So rather than:

  • Give me all packages that are available under license L
  • Give me all packages that are available under licenses other than L

You could just have

  • Give me all packages that are available under license L

And use the boolean "not" operator to exclude results matching that.
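As a sketch of that idea, a single "has license L" predicate plus a generic negation is enough to express both queries. The sample records and function names here are made up for illustration:

```python
# Hypothetical sample records.
PACKAGES = [
    {"name": "alpha", "license": "MIT"},
    {"name": "beta", "license": "GPL"},
    {"name": "gamma", "license": "BSD"},
]


def has_license(license_name):
    """Build a predicate: the package is available under `license_name`."""
    return lambda package: package["license"] == license_name


def negate(predicate):
    """Boolean NOT: build a predicate that matches whatever `predicate` rejects."""
    return lambda package: not predicate(package)


def select(packages, predicate):
    """Return the names of packages matching `predicate`."""
    return [package["name"] for package in packages if predicate(package)]


is_gpl = has_license("GPL")
print(select(PACKAGES, is_gpl))          # packages under license L
print(select(PACKAGES, negate(is_gpl)))  # packages under any other license
```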

A related issue about exclusion in search: #1971.

@waseem18 is writing up a bit of a proposal on how to do this.

@waseem18 it would be great to get to see your work in progress! Feel free to share it in a GitHub gist and link to it here, or put it right into a comment. It's fine if it's rough.

@brainwane I'll put up a comment describing what I plan to do and how, and then start after getting feedback.

Below is a rough UI mockup of how Advanced Search might look.

[Image: rough mockup of the Advanced Search UI]

  • I tried placing the Advanced Search button/link below the Filter projects section. As this is a rough UI, this might not be the best place for the link, and we can discuss its placement.
  • The section on the right contains the Advanced Search title and, below that, some search options and their respective input elements. The UI for this section can also be improved.
  • #1677 can also be placed in the Advanced Search section.
  • We can start by implementing #1677 and later proceed with the other options.

@brainwane @nlhkabu I'd be happy to receive your feedback and suggestions on this.

I have to point out that information about a package's dependencies is not statically available for source distributions. Thus, this information is only partially available right now. There's an open issue on this repository about this.

IIRC, Warehouse stores the install_requires (I don't remember the name?) metadata for packages that upload a wheel first.

Thanks for the information, @pradyunsg. I was unaware of #474 and #2502 and was looking into the packages' JSON. Your comment put me on track.

I've gone through #474 and #2502 and found that, as of now, it's not trivial to implement Advanced Search.

And as mentioned on #474:

"it looks like out of ~120k packages in the PyPI index, only ~17k have a non null info->requires_dist field"
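A count like that could be computed along these lines. This is only a sketch: the documents below are invented stand-ins shaped like per-project JSON with an info.requires_dist field, and the function name is hypothetical.

```python
# Invented sample documents; only the info -> requires_dist field matters here.
DOCUMENTS = [
    {"info": {"name": "alpha", "requires_dist": ["numpy>=1.10"]}},
    {"info": {"name": "beta", "requires_dist": None}},
    {"info": {"name": "gamma"}},  # field missing entirely
]


def requires_dist_coverage(documents):
    """Return (number of documents with non-null info.requires_dist, total)."""
    with_deps = sum(
        1 for doc in documents
        if doc.get("info", {}).get("requires_dist") is not None
    )
    return with_deps, len(documents)


with_deps, total = requires_dist_coverage(DOCUMENTS)
print(f"{with_deps} of {total} documents declare requires_dist")
```

Note that an empty list counts as non-null here, so "declares no dependencies" and "dependencies unknown" stay distinguishable.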

Glad that PEP 566 has been accepted, which paves the way for having metadata for packages that upload a wheel first.

Thanks for the information @pradyunsg

Glad to be of help. :)

In today's Warehouse core developers' meeting we decided to pare down our near-future milestones on our development roadmap so they really only contain the essential bugfixes and features we need to launch, replace legacy PyPI, and shut down the old site.

So I'm moving this issue into a milestone further in the future; sorry for the wait. And I would love for @waseem18 to make further progress on it, if he would like to!

I would be happy to work on this, @brainwane.
I'll keep a close eye on the issues that this one depends on so that we can start once they are resolved.

The same applies to #1677.

Hi @waseem18, thanks for your work on this so far.

A couple of UX ideas:

  1. I think it could be better to have the advanced search appear below the main search bar - something like this:

[Image: screenshot from 2018-03-10 11-07-41]

  2. It would be awesome if we could develop some kind of advanced search syntax, similar to GitHub's:
    https://help.github.com/articles/searching-issues-and-pull-requests/

What do you think?

Thanks for the UX ideas, @nlhkabu. The suggested UX looks really great.

I'll implement the UI the way you suggested once work on Advanced Search has started.

https://github.com/pypa/warehouse/issues/3452#issuecomment-377096605 has a suggestion from @drunkwcodes:

Maybe introducing https://github.com/nepsilon/search-query-parser and letting users type search queries like "Framework:Django" in the search bar would help.

Because we are familiar with Google search and GitHub search.
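For illustration, splitting search-bar text of that shape into structured filters and free-text terms can be sketched with the standard library alone. This is a hand-rolled stand-in, not the linked search-query-parser library, and the function name and return shape are assumptions:

```python
import shlex


def split_query(text):
    """Split search-bar text into (filters, free_text_terms).

    Tokens of the form key:value become filters; everything else is
    treated as a plain search term. Quotes group words into one value.
    """
    filters = {}
    terms = []
    # shlex.split raises ValueError on unbalanced quotes; callers should handle it.
    for token in shlex.split(text):
        key, sep, value = token.partition(":")
        if sep and key and value:
            filters.setdefault(key.lower(), []).append(value)
        else:
            terms.append(token)
    return filters, terms


filters, terms = split_query('Framework:Django plotting license:"BSD License"')
print(filters)
print(terms)
```

Keys are lowercased so that "Framework:Django" and "framework:Django" are treated the same; values are left untouched.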


@HonzaKral I'd appreciate your assessment of what we need to configure, or what components/extensions we need to add to our Elasticsearch setup, to get more advanced search in Warehouse, if you have time to give your opinion!

No additional components are needed on the installation side, as long as all the fields you'd want to query exist on the documents. Then it's a matter of extracting those conditions in a structured way (either by parsing text input or by processing a more complex, broken-down form), validating them (by providing a whitelist of options), and adding the conditions to the search. Something like:

# get search object from current code
search = get_search()

# create Query objects from form data ...
for query_filter in parse_and_validate_filters(form_data):
    # and apply to search (named query_filter to avoid shadowing the built-in filter)
    search = search.filter(query_filter)

To create a filter, there would be logic somewhere (I'd assume in a Form object of some sort) to convert the input to a Query:

assert parse_input('version>=1') == Q('range', version={'gte': 1})
assert parse_input('version>=1,<3') == Q('range', version={'gte': 1, 'lt': 3})
assert parse_input('Framework:django') == Q('match', framework='django')
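One way parse_input could be sketched, with the whitelist validation folded in, is shown below. To stay dependency-free it returns plain query dicts rather than Q objects; these should correspond to what the Q(...) calls above serialize to. The allowed field set is an assumption, and only integer bounds are handled, as in the assertions above.

```python
import re

# Assumed whitelist of queryable fields; the real set depends on the index mapping.
ALLOWED_FIELDS = {"version", "framework"}

# Comparison operators mapped to Elasticsearch range-query keys.
RANGE_OPERATORS = {">=": "gte", ">": "gt", "<=": "lte", "<": "lt"}

_RANGE_QUERY = re.compile(r"(\w+)\s*([<>].*)$")
_RANGE_PART = re.compile(r"(>=|<=|>|<)\s*(\d+)$")


def parse_input(text):
    """Convert 'field:value' or 'field>=1,<3' into a query dict."""
    text = text.strip()
    range_match = _RANGE_QUERY.match(text)
    if range_match:
        field = range_match.group(1).lower()
        bounds = {}
        for part in range_match.group(2).split(","):
            part_match = _RANGE_PART.match(part.strip())
            if part_match is None:
                raise ValueError(f"invalid range condition: {part!r}")
            operator, number = part_match.groups()
            bounds[RANGE_OPERATORS[operator]] = int(number)
        query = {"range": {field: bounds}}
    else:
        field, sep, value = text.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep or not field or not value:
            raise ValueError(f"cannot parse filter: {text!r}")
        query = {"match": {field: value}}
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"field not allowed: {field!r}")
    return query


print(parse_input("version>=1,<3"))
print(parse_input("Framework:django"))
```

Rejecting unknown fields before the query reaches the search backend keeps user input from addressing arbitrary document fields.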

Alternatively, you could use the FacetedSearch abstraction, which is already part of elasticsearch-dsl (0). It can apply filters as well as calculate and display facets, which is always a nice addition to search.

I would be happy to talk more and provide any help with the Elasticsearch part of this.

0 - http://elasticsearch-dsl.readthedocs.io/en/latest/faceted_search.html#example

@HonzaKral If you're open to actually making the improvement in Warehouse yourself, that would be great! If not, I totally understand, and will ask @robhudson whether he has time. :)

I would love to work on it, but I'm not sure about the time. I will try to make time at the PyCon sprints and will update the issue then. Thanks for the ping.
