Elasticsearch: Return `matched_queries` for named queries in percolator

Created on 19 Mar 2015 · 8 Comments · Source: elastic/elasticsearch

I've got a bunch of multi-match percolator queries, which are named queries as well. It would be great if the percolate response could include `matched_queries` for each match, e.g.:

  {
     "took": 0,
     "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
     },
     "total": 2,
     "matches": [
        {
           "_index": "rethinkdb_ex",
           "_id": "user-1",
           "matched_queries": [
              "queryA"
           ]
        },
        {
           "_index": "rethinkdb_ex",
           "_id": "user-2",
           "matched_queries": [
              "queryA",
              "queryB"
           ]
        }
     ]
  }
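
For context, a registered percolator query that could lead to the `matched_queries` shown above might look roughly like the sketch below. It is written as a Python dict and serialized to JSON. The field names (`title`, `body`), the search terms, and the helper `named_multi_match` are hypothetical; only the `_name` parameter of named queries comes from Elasticsearch itself.

```python
import json


def named_multi_match(name, text, fields):
    """Build a multi_match clause tagged with a query name (hypothetical helper)."""
    return {"multi_match": {"query": text, "fields": fields, "_name": name}}


# One percolator query per user (e.g. "user-2"); fields and terms are made up.
percolator_query = {
    "query": {
        "bool": {
            "should": [
                named_multi_match("queryA", "rethinkdb", ["title", "body"]),
                named_multi_match("queryB", "elasticsearch", ["title", "body"]),
            ]
        }
    }
}

print(json.dumps(percolator_query, indent=2))
```

The request is that percolating a document against this single query reports which of `queryA` and `queryB` actually matched.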

Is this something that's planned?

:Search/Percolator · Search · help wanted

gebrits

Most helpful comment

We're re-opening this, as the low-level Lucene Matches API should make this considerably easier to implement. We don't have any plans to work on it currently, but we're happy to help if anybody from the community wants to pick it up.

romseygeek on 18 Mar 2020

All 8 comments

Hi @gebrits, why not store them as different queries in the percolator index? Can you elaborate on the use case?

javanna on 19 Mar 2015

Hi @javanna, each multi-match query is bound to a single user for notification purposes (e.g. sending out a mail). If I were to split the multi-match into separate queries, my notification client code would get a bit more complicated, i.e. it would have to track the results of all queries matching the same user, wrap up the results, construct a mail based on the aggregate results, dedupe, etc.

It just feels cleaner to have it all handled by 1 multi-match query. What do you think?
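
To make that client-side bookkeeping concrete, here is a minimal sketch of what the notification code would have to do if every named clause were registered as its own percolator query. It assumes a hypothetical ID scheme of `<user>:<query name>` (for example `user-2:queryA`) and a list of matches shaped like the `matches` array in the example response; neither the ID scheme nor the helper `group_matches_by_user` comes from Elasticsearch.

```python
def group_matches_by_user(matches):
    """Collapse per-query percolator matches into one entry per user.

    Assumes each match ID has the hypothetical form "<user>:<query name>".
    Keeps first-seen order and drops duplicate query names.
    """
    grouped = {}
    for match in matches:
        user, _, query_name = match["_id"].partition(":")
        names = grouped.setdefault(user, [])
        if query_name not in names:
            names.append(query_name)
    return grouped


matches = [
    {"_index": "rethinkdb_ex", "_id": "user-1:queryA"},
    {"_index": "rethinkdb_ex", "_id": "user-2:queryA"},
    {"_index": "rethinkdb_ex", "_id": "user-2:queryB"},
    {"_index": "rethinkdb_ex", "_id": "user-2:queryA"},  # duplicate, dropped below
]
per_user = group_matches_by_user(matches)
```

With `matched_queries` in the percolate response, this grouping step would not be needed.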

gebrits on 19 Mar 2015

I am on the fence, to be honest :) I always imagined this use case with multiple alerts/queries per user and assumed that the different matching queries would indeed need to be aggregated on the client side. It feels cleaner this way to me, but I don't have a super strong opinion. I marked this for discussion; we'll see what other folks think about it. Thanks for describing your use case!

javanna on 20 Mar 2015

No probs :) To fuel the discussion, I feel the need for client-side de-duping is the 'biggest' hurdle that would be solved with this. It's just messy (e.g. taking care of ordering, etc.).

gebrits on 24 Mar 2015

The trade-off between the cost of this feature (both in terms of dev cost and runtime) and the value it provides doesn't look good to me: closing.

jpountz on 28 Sep 2015

Was this ever re-addressed? My use case involves complex boolean queries that, if expanded into all individual queries, would increase the number of percolator queries very dramatically, exponentially actually. It would vastly complicate things in my app to do that. Whereas if Elasticsearch could just tell me which parts of the query it matched on, like it does with named queries, that information would be incredibly valuable. If it were optional, I would gladly pay the performance cost.
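
As a rough illustration of that growth, assume (this is one reading of the point, not something stated precisely) that "expanding" a boolean query with n optional clauses means registering one percolator query per non-empty combination of clauses. The count is then 2^n - 1. The clause names and the helper `expanded_queries` are hypothetical.

```python
from itertools import combinations


def expanded_queries(clause_names):
    """List every non-empty combination of clauses (one hypothetical query each)."""
    return [
        combo
        for size in range(1, len(clause_names) + 1)
        for combo in combinations(clause_names, size)
    ]


clauses = ["queryA", "queryB", "queryC"]
print(len(expanded_queries(clauses)))  # 2**3 - 1 = 7
```

Under the same assumption, 10 clauses would already need 1023 registered queries, versus a single named query per user.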

sdjw on 5 Oct 2018

I'd like for it to be re-addressed as well.
ES has improved tremendously over the years, and what could have been a huge performance wall then might not be now.
Our users are saving very complex queries, with lots of boolean queries. We can't possibly ask them to create one query per possible combination.
If it's optional, performance wouldn't be affected for the many users who don't need it.