Elasticsearch: Support for Surround Query Parser

Created on 25 May 2015  ·  11 Comments  ·  Source: elastic/elasticsearch

Currently Elasticsearch supports Lucene's classic and simple query parsers via its query_string and simple_query_string queries.
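For readers less familiar with the two existing queries, here is a minimal sketch of the request bodies they take. The field name `body` and the query text are assumptions chosen for illustration; they do not come from this issue.

```python
import json

# Request body for the query_string query (Lucene classic parser syntax).
# The field name "body" and the query text are illustrative assumptions.
query_string_body = {
    "query": {
        "query_string": {
            "default_field": "body",
            "query": '"car accident"~3 AND john',
        }
    }
}

# Request body for the simple_query_string query (Lucene simple parser syntax).
# Here "+" acts as AND and "~3" after a phrase is the phrase slop.
simple_query_string_body = {
    "query": {
        "simple_query_string": {
            "fields": ["body"],
            "query": '"car accident"~3 + john',
        }
    }
}

print(json.dumps(query_string_body, indent=2))
print(json.dumps(simple_query_string_body, indent=2))
```

Both parsers express proximity only as slop on a quoted phrase, which is the limitation that motivates this request.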

I'd like to make use of Lucene's surround query parser and I'm willing to implement it.

Before I start anything:

  1. Is that something other people would like to have? I haven't found any relevant issue in the tracker.
  2. Is there any up-to-date documentation on how to properly implement a new query parser? My plan was to try to mimic what has been done in the two previous ones.

All 11 comments

This parser seems to have some advantages over the existing parsers when dealing with proximity (edit distance is not intuitive to all users). Are there plans to implement this?

I also need that in my company's product.

I'd be OK with this being added as a plugin. Same goes for the SpanQueryParser in https://issues.apache.org/jira/browse/LUCENE-5205

I wrote a test plugin for this. It implements only the basic functionality, just enough to pass my own tests, since I am a newcomer to ES and Lucene: test parser

@clintongormley we're currently working to implement LUCENE-5205 in 2.x, 1.x as well as 5.x after that.

1) We're mimicking org/apache/lucene/queryparser/classic/MapperQueryParser, but maybe the logic could be reimplemented to work with both the classic parser and SpanQueryParser?
2) Should we aim for this to be added as a core plugin, or should we maintain it as a third-party plugin?

@nicbaz Why not submit it as a WIP PR as a core plugin, and we can make a decision once we see the code?

I would love to get support for the SpanQueryParser built into the ES query API. Since Lucene has several query parsers which are not yet supported by the ES query API, why not provide a generalized ES query string API that accepts, as an option, which parser to use?

Other query parsers should also be supported by highlighting.

This would be a huge value-add in the legal space, where users regularly deal in _surprisingly_ complex full-text searches, especially ones involving recursive span queries (i.e. nested proximity queries). For example, [[john doe]~3 ["car or auto*" "accident or wreck"]~3]~5 represents a search for a mention of john doe and a car accident within close proximity.
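To make the nesting concrete, the example above can be written by hand with the span queries that the ES query DSL already exposes (`span_term`, `span_or`, `span_multi`, `span_near`). This is only a sketch under assumptions: the field name `body` is hypothetical, each `~N` is mapped directly to `slop`, and matching is treated as unordered. It does not claim to reproduce exactly what SpanQueryParser would generate.

```python
import json

FIELD = "body"  # hypothetical field name, not taken from the issue


def span_term(word):
    """Match a single term as a span."""
    return {"span_term": {FIELD: word}}


def span_prefix(prefix):
    """Wrap a prefix query in span_multi so it can be used inside span queries."""
    return {"span_multi": {"match": {"prefix": {FIELD: {"value": prefix}}}}}


def span_or(*clauses):
    """Match any one of the given span clauses."""
    return {"span_or": {"clauses": list(clauses)}}


def span_near(clauses, slop, in_order=False):
    """Match all clauses within `slop` positions of each other."""
    return {"span_near": {"clauses": list(clauses), "slop": slop, "in_order": in_order}}


# [john doe]~3
name = span_near([span_term("john"), span_term("doe")], slop=3)

# ["car or auto*" "accident or wreck"]~3
vehicle = span_or(span_term("car"), span_prefix("auto"))
event = span_or(span_term("accident"), span_term("wreck"))
incident = span_near([vehicle, event], slop=3)

# [ ... ]~5 : the two inner proximity groups within 5 positions of each other
nested = {"query": span_near([name, incident], slop=5)}

print(json.dumps(nested, indent=2))
```

The resulting JSON runs to dozens of lines for a one-line query string, which is the practical argument for exposing a span-aware parser rather than asking users to write the DSL directly.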

@gcampbell-epiq I agree this would be nice; we ended up implementing something that does exactly that (see example here). However, if this will largely be implemented for the legal space, it would make sense to use the query language that has largely been standardized across legal search engines rather than the Lucene style of brackets and tildes.

I am curious to know the status on support for SpanQueryParser in ES.
My company, Retriever, is doing media monitoring on behalf of our customers in the Nordic countries by means of a large number of predefined queries.
These can be very complex as they are set up by our professional staff.
We are currently using Fast ESP Alerting and plan to migrate to ES Percolator.
Many of the queries make extensive use of NEAR and ONEAR (Ordered NEAR), such as
“term* ONEAR/20 ‘some phrase’” and “(term1 or term2) NEAR (term*3)”.
Will the plugin support Percolator queries?
Is there a possibility to test the SpanQueryParser plugin somehow for this purpose?
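Until a parser is available, one way to approximate NEAR and ONEAR is to translate them into `span_near` queries, with `in_order` set to false and true respectively. The sketch below handles only the first example above. The helper names, the field name `body`, and the direct mapping of the `/20` distance to `slop` are assumptions; the distance semantics of the original engine may differ from Lucene's slop. Parenthesised groups and mid-term wildcards such as `term*3` are deliberately not handled, and the phrase is delimited with plain ASCII single quotes.

```python
import json

FIELD = "body"  # hypothetical field name


def operand_to_span(text):
    """Convert one operand: a 'quoted phrase', a prefix* term, or a plain term."""
    text = text.strip()
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        terms = [{"span_term": {FIELD: word}} for word in text[1:-1].split()]
        if len(terms) == 1:
            return terms[0]
        # A phrase is modelled as adjacent terms in order (slop 0).
        return {"span_near": {"clauses": terms, "slop": 0, "in_order": True}}
    if text.endswith("*"):
        # Trailing wildcard: wrap a prefix query so it can be used as a span.
        return {"span_multi": {"match": {"prefix": {FIELD: {"value": text[:-1]}}}}}
    return {"span_term": {FIELD: text}}


def near(left, right, distance, ordered):
    """NEAR (ordered=False) or ONEAR (ordered=True) between two operands."""
    return {
        "span_near": {
            "clauses": [operand_to_span(left), operand_to_span(right)],
            "slop": distance,  # assumption: distance maps directly to slop
            "in_order": ordered,
        }
    }


# term* ONEAR/20 'some phrase'
onear_query = {"query": near("term*", "'some phrase'", distance=20, ordered=True)}

print(json.dumps(onear_query, indent=2))
```

Whether queries of this shape behave as needed when registered with the Percolator is exactly the open question raised above; the sketch only shows the query body, not percolator registration.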

I am also very interested in this. My product is looking to migrate to Elasticsearch. We have implemented START, NEAR, ONEAR, and MULTIPLE with phrase support in Solr. We would also like to have this work with Percolator. For us, this is a must-have, and its absence currently counts against migrating.

/cc @elastic/es-search-aggs
