This is a feature request.
At the moment, field collapsing uses the same scoring as the main query to choose the document that appears in the main search hits.
It would be good to have one score for the main query and a separate score to decide which document is picked within each field-collapsed set.
We can use inner_hits to sort the inner hits, but this does not change the top document that has been chosen in the main hits. We can then manually choose the inner_hit..hits.hits[0] instead of the top hits.
I think the collapse feature would benefit from having a collapse_sort that will define the hit that will be chosen at the top level.
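The manual workaround described above can be sketched roughly as follows. This is only an illustration in Python working on plain dicts: the helper names `build_collapse_request` and `promote_top_inner_hit`, and the field names in the usage note, are hypothetical. The request uses the existing `collapse` and `inner_hits` options; the proposed `collapse_sort` does not exist and is not used here.

```python
from typing import Any, Dict, List


def build_collapse_request(query: Dict[str, Any], collapse_field: str,
                           inner_name: str,
                           inner_sort: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a search body that collapses on one field and sorts the
    inner hits of each group independently of the main sort."""
    return {
        "query": query,
        "collapse": {
            "field": collapse_field,
            "inner_hits": {
                "name": inner_name,  # name chosen by the caller
                "size": 1,           # only the best inner hit per group
                "sort": inner_sort,
            },
        },
    }


def promote_top_inner_hit(response: Dict[str, Any],
                          inner_name: str) -> List[Dict[str, Any]]:
    """Replace each collapsed top-level hit with its first inner hit.

    The order of the groups is left untouched: it still follows the
    main sort, which is exactly the limitation discussed in this issue.
    """
    promoted = []
    for hit in response["hits"]["hits"]:
        inner = (hit.get("inner_hits", {})
                    .get(inner_name, {})
                    .get("hits", {})
                    .get("hits", []))
        # Fall back to the original hit if no inner hit was returned.
        promoted.append(inner[0] if inner else hit)
    return promoted
```

For example, `build_collapse_request({"match": {"title": "foo"}}, "user_id", "latest", [{"date": "desc"}])` would collapse on a hypothetical `user_id` field and keep the most recent document of each group as the inner hit.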
Thanks
Pinging @elastic/es-search
@casertap Thank you for filing the issue. From the issue description it is not clear to me what you are requesting. Providing an example would probably help.
Some other clarifications:
You can already do this. If you need to sort the main collapsed hits, just use the sort parameter at the top level, like in this example
Hi @mayya-sharipova
Thanks for your answer and for your link. I do not think this is what I am talking about.
I created a gist to try to explain the feature I would like to see: https://gist.github.com/casertap/9a2e6b9b1eee02b2b2111e6b446fb1ed
Please do not hesitate to ask more questions if this is not clear.
Thank you
Basically, this would be bringing the top inner_hit to the search main hits during collapsing.
I agree that it would be nice to be able to sort the group differently than the main sort, but it's not low-hanging fruit. Today collapsing works in a single pass over the data because the sort within the group is the same as the main sort. If we allow a different sort we'll need two passes over all the documents that match the query: the first pass to retrieve the top doc per group and the second pass to rank the remaining top docs. I'll mark this issue as adoptme because I don't have time to work on this now, but I'd be happy to help if someone wants to tackle this.
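To make the difference concrete, here is a toy in-memory model in Python. It is not how Elasticsearch or Lucene implement collapsing; the function names and document fields are made up for illustration. It only shows that picking the group representative with a different sort changes both which document represents a group and how the groups rank.

```python
from typing import Any, Callable, Dict, Iterable, List

Doc = Dict[str, Any]


def collapse_single_sort(docs: Iterable[Doc], group_field: str,
                         main_key: Callable[[Doc], Any],
                         size: int) -> List[Doc]:
    """Current behaviour in this toy model: the best doc of each group
    is chosen with the same key that ranks the groups."""
    best: Dict[Any, Doc] = {}
    for doc in docs:
        group = doc[group_field]
        if group not in best or main_key(doc) > main_key(best[group]):
            best[group] = doc
    return sorted(best.values(), key=main_key, reverse=True)[:size]


def collapse_two_sorts(docs: Iterable[Doc], group_field: str,
                       main_key: Callable[[Doc], Any],
                       group_key: Callable[[Doc], Any],
                       size: int) -> List[Doc]:
    """Requested behaviour in this toy model, done in two steps."""
    # Step 1: pick one representative per group using the group sort.
    reps: Dict[Any, Doc] = {}
    for doc in docs:
        group = doc[group_field]
        if group not in reps or group_key(doc) > group_key(reps[group]):
            reps[group] = doc
    # Step 2: rank the representatives using the main sort.
    return sorted(reps.values(), key=main_key, reverse=True)[:size]


docs = [
    {"id": 1, "user": "a", "score": 5.0, "date": 1},
    {"id": 2, "user": "a", "score": 1.0, "date": 9},
    {"id": 3, "user": "b", "score": 3.0, "date": 4},
]
same = collapse_single_sort(docs, "user", lambda d: d["score"], size=10)
diff = collapse_two_sorts(docs, "user", lambda d: d["score"],
                          lambda d: d["date"], size=10)
print([d["id"] for d in same])  # [1, 3]
print([d["id"] for d in diff])  # [3, 2]
```

With a separate group sort, group "a" is represented by its most recent document, whose lower score moves the group below group "b".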
Hi,
We would find this addition very useful as well. Our use case is that we want to sort on the most recent document of each collapsed bucket. Currently we reach a dead end as follows:
1. collapse to bucket documents by their name
2. inner_hits.sort with size: 1 to order the bucket by the most recent first and grab the first entry

Currently, Step 3 will not sort the results by the first item in the inner_hits, but by the top document chosen by ES for each collapse bucket, which leads to incorrect results if the document chosen by ES is different from the first document in the ordered inner hits.
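Until something like this is supported, one partial client-side workaround is to reorder the returned buckets by their first inner hit. The sketch below is only an assumption about how that could look: `resort_by_first_inner_hit` is a hypothetical helper, it expects every bucket to carry at least one inner hit, and it assumes the date field holds ISO 8601 strings in one consistent format so that they compare correctly as text. It only reorders the hits already returned, so it cannot fix which buckets make it into the page.

```python
from typing import Any, Dict, List


def resort_by_first_inner_hit(response: Dict[str, Any],
                              inner_hits_name: str,
                              date_field: str) -> List[Dict[str, Any]]:
    """Reorder collapsed hits so that the bucket whose first inner hit
    is most recent comes first."""

    def newest(hit: Dict[str, Any]) -> str:
        inner = hit["inner_hits"][inner_hits_name]["hits"]["hits"]
        # The inner hits are assumed to be sorted most recent first.
        return inner[0]["_source"][date_field]

    return sorted(response["hits"]["hits"], key=newest, reverse=True)
```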
@jimczi & @mayya-sharipova - Do you have any updates on the timeline for addressing this issue? If not, are you familiar with what areas of the code need to be changed and the level of effort to tackle this feature?
Thanks!