Elasticsearch: collapse_sort to choose the top field-collapsing document independently from the main query score without inner_hits

Created on 16 Aug 2019 · 6 Comments · Source: elastic/elasticsearch

This is a feature request.

At the moment, field collapsing uses the same scoring as the main query to choose the document that appears in the main search hits for each collapsed group.

It would be good to keep one score for the main query but use a separate score to pick which document represents each field-collapsed set.

We can use inner_hits to sort the inner hits, but this does not change the top document that has been chosen for the main hits. We can then manually pick inner_hit..hits.hits[0] instead of the top hit.

I think the collapse feature would benefit from a collapse_sort option that defines which hit is chosen at the top level.
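To make the current workaround concrete, here is a minimal sketch of a request body that collapses on one field and sorts the inner hits by another, plus a helper that reads the first inner hit from a response hit. The `collapse`, `inner_hits`, `name`, `size` and `sort` keys are the existing search API; the function names, the `best_in_group` label and the placeholder `match_all` query are assumptions made only for illustration.

```python
from typing import Any, Dict, Optional


def build_collapse_request(collapse_field: str,
                           inner_sort_field: str,
                           inner_hits_name: str = "best_in_group") -> Dict[str, Any]:
    """Build a search body that collapses on one field and sorts inner hits by another."""
    return {
        # Placeholder main query (assumption); its scoring still picks the top-level hit.
        "query": {"match_all": {}},
        "collapse": {
            "field": collapse_field,
            "inner_hits": {
                "name": inner_hits_name,
                "size": 1,
                # This sort only orders the inner hits, not the top-level representative.
                "sort": [{inner_sort_field: "desc"}],
            },
        },
    }


def first_inner_hit(hit: Dict[str, Any],
                    inner_hits_name: str = "best_in_group") -> Optional[Dict[str, Any]]:
    """Return the first inner hit of a collapsed top-level hit, or None if there is none."""
    inner = hit.get("inner_hits", {}).get(inner_hits_name, {})
    hits = inner.get("hits", {}).get("hits", [])
    return hits[0] if hits else None
```

The caller still has to swap each top-level hit for its first inner hit, which is the manual step a collapse_sort option would remove.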

Thanks

:Search/Ranking, Search, help wanted, high hanging fruit

All 6 comments

Pinging @elastic/es-search

@casertap Thank you for filing the issue. From the description it is not clear to me what you are requesting. Maybe an example would help.

Some other clarifications:

  1. Are you interested in sorting collapsed docs by textual score from a query or sorting by some field? I see you mentioned both of these in your description.
  2. > We can use inner_hit to sort the inner hits but this does not change the top document that has been chosen in the main hits. ... I think the collapse feature would benefit from having a collapse_sort that will define the hit that will be chosen at the top level.

You can already do this. If you need to sort the main collapsed hits, use the sort parameter at the top level, like in this example

Hi @mayya-sharipova
Thanks for your answer and for your link. I do not think this is what I am talking about.

I created a gist to try to explain the feature I would like to see: https://gist.github.com/casertap/9a2e6b9b1eee02b2b2111e6b446fb1ed

Please do not hesitate to ask more questions if this is not clear.
Thank you

Basically, this would promote the top inner hit into the main search hits during collapsing.

I agree that it would be nice to be able to sort the group differently from the main sort, but it's not a low hanging fruit. Today collapsing works in a single pass over the data because the sort within the group is the same as the main sort. If we allow a different sort, we'll need two passes over all the documents that match the query: the first pass to retrieve the top doc per group and the second pass to rank the remaining top docs. I'll mark this issue as adoptme because I don't have time to work on this now, but I'd be happy to help if someone wants to tackle this.
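The difference between the two approaches can be sketched on an in-memory list of documents. This is only an illustration of the reasoning above, not how Elasticsearch or Lucene implement collapsing; the function names and the `name`, `score` and `ts` fields are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Doc = Dict[str, Any]
Key = Callable[[Doc], float]


def collapse_single_pass(docs: Iterable[Doc], group_field: str,
                         main_key: Key, size: int) -> List[Doc]:
    """Same key picks each group's representative and ranks the groups."""
    best: Dict[Any, Doc] = {}
    for doc in docs:
        group = doc[group_field]
        if group not in best or main_key(doc) > main_key(best[group]):
            best[group] = doc
    return sorted(best.values(), key=main_key, reverse=True)[:size]


def collapse_two_pass(docs: Iterable[Doc], group_field: str,
                      group_key: Key, main_key: Key, size: int) -> List[Doc]:
    """Pass 1 picks representatives by group_key; pass 2 ranks them by main_key."""
    representative: Dict[Any, Doc] = {}
    for doc in docs:  # pass 1: every matching document must be visited
        group = doc[group_field]
        if group not in representative or group_key(doc) > group_key(representative[group]):
            representative[group] = doc
    # pass 2: rank only the chosen representatives with the main sort
    return sorted(representative.values(), key=main_key, reverse=True)[:size]


docs = [
    {"id": 1, "name": "a", "score": 0.9, "ts": 1},
    {"id": 2, "name": "a", "score": 0.2, "ts": 5},
    {"id": 3, "name": "b", "score": 0.5, "ts": 3},
]
same_sort = collapse_single_pass(docs, "name", lambda d: d["score"], size=10)
split_sort = collapse_two_pass(docs, "name", lambda d: d["ts"], lambda d: d["score"], size=10)
```

With a separate group sort, group "a" is represented by its most recent document (id 2) even though another document in the group has a higher score, so the ranking of groups cannot be settled until every group's representative is known.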

Hi,

We would find this addition very useful as well. Our use case is that we want to sort on the most recent document of each collapsed bucket. Currently we reach a dead end as follows:

  1. Use collapse to bucket documents by their name
  2. Use inner_hits.sort with size: 1 to order each bucket by most recent first and grab the first entry
  3. Sort the results with an outer level sort.

Currently, Step 3 does not sort the results by the first item in the inner_hits, but by the top document chosen by ES for each collapse bucket, which leads to incorrect results if the document chosen by ES is different from the first document in the ordered inner hits.
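One client-side stopgap for the three steps above is to promote the first inner hit of each bucket and re-sort the returned page. The sketch below assumes a response shaped like a collapsed search response; the function name, the `latest` inner-hits name and the `timestamp` field are hypothetical. It only reorders the hits that were actually returned, so it cannot recover buckets that fell outside the requested page.

```python
from typing import Any, Dict, List


def resort_by_first_inner_hit(response: Dict[str, Any], inner_hits_name: str,
                              sort_field: str, descending: bool = True) -> List[Dict[str, Any]]:
    """Swap each collapsed hit for its first inner hit, then re-sort the page by sort_field."""
    promoted = []
    for hit in response["hits"]["hits"]:
        inner = (hit.get("inner_hits", {})
                    .get(inner_hits_name, {})
                    .get("hits", {})
                    .get("hits", []))
        # Fall back to the document ES chose if the bucket has no inner hits.
        promoted.append(inner[0] if inner else hit)
    return sorted(promoted, key=lambda h: h["_source"][sort_field], reverse=descending)


# Simulated response: for bucket "a" ES chose a1, but the most recent document is a2.
response = {"hits": {"hits": [
    {"_id": "b1", "_source": {"name": "b", "timestamp": 3},
     "inner_hits": {"latest": {"hits": {"hits": [
         {"_id": "b1", "_source": {"name": "b", "timestamp": 3}}]}}}},
    {"_id": "a1", "_source": {"name": "a", "timestamp": 1},
     "inner_hits": {"latest": {"hits": {"hits": [
         {"_id": "a2", "_source": {"name": "a", "timestamp": 5}}]}}}},
]}}
page = resort_by_first_inner_hit(response, "latest", "timestamp")
```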

@jimczi & @mayya-sharipova - Do you have any updates on the timeline for addressing this issue? If not, are you familiar with what areas of the code need to be changed and the level of effort to tackle this feature?

Thanks!
