The search box does not appear to have a way to search for a specific exact phrase.
Based on how other search engines work, putting a phrase in quotes should require an exact phrase match.
However, currently searches with and without quotes produce the same result:
https://pypi.org/search/?q=%22Image+processing+routines+for+SciPy%22
and
https://pypi.org/search/?q=Image+processing+routines+for+SciPy
I would have expected the first one with quotes to only produce the result containing the exact phrase.
Good First Issue: This issue is good for first time contributors. If there is not a corresponding pull request for this issue, it is up for grabs. For directions for getting set up, see our Getting Started Guide. If you are working on this issue and have questions, please feel free to ask them here, #pypa-dev on Freenode, or the pypa-dev mailing list.
Thanks for your report, @pv, and sorry for the slow response!
The folks working on Warehouse have gotten funding to concentrate on improving and deploying Warehouse, and have kicked off work towards our development roadmap -- the most urgent task is to improve Warehouse to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable.
We discussed this issue in our meeting today to prioritize it. Since search in Warehouse is already much better than search on legacy PyPI, but users will probably expect search to work as you suggest, I've moved this issue to a future milestone that we'll work on in the next few months.
Thanks, @pv, and sorry again for the wait.
Note to people thinking about contributing to Warehouse: this would be a great first issue for a new contributor to tackle if the new contributor were already familiar with Elasticsearch.
I'm looking into this issue.
This doesn't look like a simple change from match to match_phrase.
@waseem18 Yeah, I'm not _totally_ sure that this issue is actually a "good first issue" -- elasticsearch tuning has generally been pretty tricky in my experience. But since you're not a new contributor anymore, should be a good issue for you. 🙂
Update:
After changing match to match_phrase at this line and searching Warehouse with a string containing spaces (example: cli github), we get a TransportError:
elasticsearch.exceptions.TransportError: TransportError(500, 'search_phase_execution_exception', 'field "normalized_name" was indexed without position data; cannot run PhraseQuery (phrase=normalized_name:"cli github")')
Since phrase queries require index_options: positions, I changed
normalized_name = Text(analyzer=NameAnalyzer, index_options="docs") to normalized_name = Text(analyzer=NameAnalyzer, index_options="positions") here and then reindexed.
At this point the search functionality works but I found the results are not efficient.
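For readers following along, here is a minimal sketch of the two pieces involved in the update above, written as plain Python dicts in the shape of the Elasticsearch JSON DSL so it runs without a cluster. The helper names `name_mapping` and `phrase_query` are hypothetical, and the mapping is simplified: the real field also uses the custom NameAnalyzer, which is left out here.

```python
def name_mapping(index_options):
    """Simplified mapping fragment for the normalized_name field.

    Hypothetical helper: the real Warehouse field also sets a custom analyzer.
    """
    return {
        "properties": {
            "normalized_name": {"type": "text", "index_options": index_options}
        }
    }


def phrase_query(field, text):
    """Request body for an exact-phrase match on a single field."""
    return {"query": {"match_phrase": {field: text}}}


# With "docs", only document ids are indexed, so a phrase query is rejected.
before = name_mapping("docs")
# With "positions", term positions are indexed and phrase queries can run.
after = name_mapping("positions")

body = phrase_query("normalized_name", "cli github")
```

Changing `index_options` on an existing field only takes effect for newly indexed documents, which is why the reindex step mentioned above is needed.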
@waseem18 what do you mean "not efficient"? I would be happy to help fine-tune the queries
@waseem18 Maybe you have done some profiling and you have some specific performance numbers to share?
@HonzaKral By not efficient I mean the search results are way better when index_options=docs.
For example, when we search for image processing routines, index_options=docs gives good results while index_options=positions doesn't return any results.
The same happens with other queries containing spaces.
@waseem18 it's going to be helpful if you give concrete numbers or results, I think. Saying "good results" might mean that there are a lot of them, or they're relevant, or they're well ordered... feel free to cut and paste or take screenshots to show what you mean. :)
Sure @brainwane
In this particular case there should be no difference in which documents match under the various index_options settings, as long as the query stays the same. Only the ordering (_score) of hits would differ, because docs doesn't take frequency (how many times the word occurs in the document) into account. positions is the default and should be used here.
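To illustrate why a phrase query needs position data at all, here is a toy inverted index in plain Python. It is a simplified model for intuition only, not how Elasticsearch or Lucene is implemented; all function names are made up for this sketch, and `match_all_terms` requires every term to be present, which is stricter than a default match query.

```python
from collections import defaultdict


def build_index(docs, with_positions):
    """Map each term to {doc_id: [positions]}, or {doc_id: None} without positions."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            if with_positions:
                index[term].setdefault(doc_id, []).append(pos)
            else:
                index[term][doc_id] = None
    return index


def match_all_terms(index, query):
    """Documents containing every query term, in any order."""
    terms = query.lower().split()
    doc_sets = [set(index.get(term, {})) for term in terms]
    return set.intersection(*doc_sets) if doc_sets else set()


def match_phrase(index, query):
    """Documents containing the query terms at consecutive positions."""
    terms = query.lower().split()
    hits = set()
    for doc_id in match_all_terms(index, query):
        postings = [index[term][doc_id] for term in terms]
        if any(p is None for p in postings):
            raise ValueError("index has no position data; cannot run a phrase query")
        # A hit needs some start position where term i sits at start + i.
        if any(
            all(start + i in postings[i] for i in range(len(terms)))
            for start in postings[0]
        ):
            hits.add(doc_id)
    return hits


docs = {
    "a": "image processing routines for scipy",
    "b": "processing of each image",
}
positional = build_index(docs, with_positions=True)
docs_only = build_index(docs, with_positions=False)
```

Both documents contain the two words, so `match_all_terms(positional, "image processing")` finds both, while `match_phrase` on the positional index keeps only document "a". Running `match_phrase` against `docs_only` raises an error, loosely mirroring the TransportError reported earlier in the thread.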
Thanks for the information @HonzaKral I'll get back with some examples and screenshots.
@HonzaKral Thanks for the help.
I changed index_options to positions for the Package index, made the corresponding match_phrase change, and raised the SEARCH_BOOST value for description from 5 to 10, which I believe improved phrase queries.
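As a rough sketch of how a per-field boost can be attached to a phrase query, the snippet below builds a `multi_match` body of type `phrase` using the `field^boost` syntax. Only the description boost of 10 comes from the comment above; the dict name `SEARCH_BOOSTS`, the helper names, and the idea that the boosts are applied this way are assumptions made for illustration.

```python
# Only the description value (raised from 5 to 10) is taken from the thread;
# a real configuration would list more fields.
SEARCH_BOOSTS = {"description": 10}


def boosted_fields(boosts):
    """Render {"field": weight} as Elasticsearch "field^weight" strings."""
    return [f"{field}^{weight}" for field, weight in boosts.items()]


def phrase_multi_match(text, boosts):
    """multi_match query that requires the words to appear as a phrase."""
    return {
        "multi_match": {
            "query": text,
            "type": "phrase",
            "fields": boosted_fields(boosts),
        }
    }


query = phrase_multi_match("image processing routines", SEARCH_BOOSTS)
```

A higher boost only changes how much a matching field contributes to `_score`; it does not change which documents match.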
Note that users expect that it's also possible to do queries containing
both exact and inexact matches (probably should be ANDed), along the
lines of:
"image processing" geo
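One way to support such mixed queries is to split the raw search string into quoted phrases and bare terms before building the Elasticsearch query. The sketch below is a minimal, assumed approach using a regular expression; `split_query` is a hypothetical helper, not existing Warehouse code.

```python
import re

# Either a double-quoted run (captured without the quotes) or a bare word.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_query(raw):
    """Split a search string into (exact phrases, loose terms)."""
    phrases, terms = [], []
    for quoted, bare in _TOKEN.findall(raw):
        if quoted.strip():
            phrases.append(quoted.strip())
        elif bare:
            terms.append(bare)
    return phrases, terms


phrases, terms = split_query('"image processing" geo')
```

With this pattern an unbalanced quote is not treated as a phrase: the words after it fall through as ordinary terms, with the stray quote character still attached to the first one.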
Good use case, @pv.
I'll keep that in mind.
That looks good @waseem18! What I would have expected! Note that for doing bool operators you can just use the Q object from elasticsearch-dsl: should.append(Q("match", ...) & Q("match_phrase", ...) & Q(...))
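The AND-combination suggested above can be sketched as raw query DSL dicts, which keeps the example runnable without elasticsearch-dsl installed. As far as I understand, combining Q objects with `&` produces a `bool` query with `must` clauses of roughly this shape, but treat that as an assumption. The helper name `build_query` and the choice of the description field are illustrative only.

```python
def build_query(phrases, terms, field="description"):
    """AND together exact phrases and a loose match on the remaining terms."""
    must = [{"match_phrase": {field: phrase}} for phrase in phrases]
    if terms:
        # One match clause covers all loose terms.
        must.append({"match": {field: " ".join(terms)}})
    if not must:
        # Nothing to search for: fall back to matching everything.
        return {"match_all": {}}
    return {"bool": {"must": must}}


query = build_query(["image processing"], ["geo"])
```

Every clause under `must` has to match, so a document is returned only if it contains the exact phrase and also satisfies the loose match.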
Thanks for the suggestion @HonzaKral I'll surely follow that.
In today's Warehouse developers' meeting we decided to pare down our near-future milestones on our development roadmap so they really only contain the essential bugfixes and features we need to launch, replace legacy PyPI, and shut down the old site. So I'm moving this issue into a milestone further in the future; sorry for the wait.