I am using the collapse feature (huge fan, by the way) and tried to use it with search_after, only to find that the combination isn't supported. In #22337 it was mentioned that this was achievable. Is there any plan to support collapse in conjunction with search_after in the future?
Pinging @elastic/es-search (:Search/Search)
This seems quite challenging, since we may return groups that were already returned in a previous round, so I am not sure this is realistic. Suppose you sort on timestamp ascending and group on hostname: we use the best value of each group for the sort fields, even if some documents within the group are beyond that value. This means that a search_after request may return those documents as the head of a group in subsequent requests.
The only way we could handle this efficiently is if you group and sort by the same field.
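To make the problem concrete, here is a small in-memory simulation of the hostname/timestamp scenario. It is only a sketch: the documents, the `host`/`ts` field names, and the `collapsed_page`/`paginate` helpers are made up for illustration and do not model how Elasticsearch actually collects hits.

```python
from typing import Any, Dict, List, Optional

# Hypothetical documents, not taken from any real index.
DOCS: List[Dict[str, Any]] = [
    {"host": "a", "ts": 1},
    {"host": "b", "ts": 2},
    {"host": "a", "ts": 3},
    {"host": "c", "ts": 4},
    {"host": "b", "ts": 5},
]


def collapsed_page(docs, sort_field, collapse_field, size, after: Optional[Any] = None):
    """Naive model of one page: keep docs strictly after the cursor, sort
    ascending, then keep only the first (best) document of each group."""
    candidates = [d for d in docs if after is None or d[sort_field] > after]
    candidates.sort(key=lambda d: d[sort_field])
    seen, page = set(), []
    for doc in candidates:
        if doc[collapse_field] in seen:
            continue
        seen.add(doc[collapse_field])
        page.append(doc)
        if len(page) == size:
            break
    return page


def paginate(docs, sort_field, collapse_field, size):
    """Walk all pages, using the last hit's sort value as the next cursor,
    and return the group keys in the order they were served."""
    groups, after = [], None
    while True:
        page = collapsed_page(docs, sort_field, collapse_field, size, after)
        if not page:
            return groups
        groups.extend(doc[collapse_field] for doc in page)
        after = page[-1][sort_field]


# Sorting on ts while collapsing on host: groups come back more than once.
print(paginate(DOCS, "ts", "host", 2))    # ['a', 'b', 'a', 'c', 'b']
# Sorting and collapsing on the same field: each group appears exactly once.
print(paginate(DOCS, "host", "host", 2))  # ['a', 'b', 'c']
```

In the first call the cursor only says "timestamps after 2", so the later documents of hosts a and b look like fresh group heads. In the second call the cursor is the group key itself, so everything belonging to an already returned group is skipped.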
Considering this limitation, I am leaning toward closing this as won't fix, unless you have a specific idea in mind for the implementation?
Yes, it would only work if you sort on the same field that you collapse on. That is exactly the case that I have which is why I was hoping it would work.
I haven't looked at how it could be implemented, but I totally understand if that use case is not common enough to get worked on. If I have time, maybe I'll look into what it would take to get it implemented.
I am in the same situation. Are there any feasible workarounds for the specific case where you're collapsing on the same field that you're sorting by?
You can use the composite aggregation as a workaround, since it allows you to paginate over the results.
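As a rough sketch of what that could look like, the snippet below builds the request body for one page of a composite aggregation. The `hostname` field (assumed to be a keyword field), the aggregation names `groups`, `group`, and `top_doc`, the page size, and the helper `composite_page_body` are all placeholders I picked for illustration; the `top_hits` sub-aggregation is one way to get a representative document per bucket, similar to what collapse returns.

```python
import json
from typing import Any, Dict, Optional


def composite_page_body(field: str, size: int,
                        after_key: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a search body that pages over distinct values of `field`."""
    composite: Dict[str, Any] = {
        "size": size,  # number of buckets (groups) per page
        "sources": [{"group": {"terms": {"field": field}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,  # no regular hits, only aggregation buckets
        "aggs": {
            "groups": {
                "composite": composite,
                # One representative document per group.
                "aggs": {"top_doc": {"top_hits": {"size": 1}}},
            }
        },
    }


first_page = composite_page_body("hostname", 100)
# The response to the first request carries an `after_key` under
# aggregations.groups; pass it back to fetch the next page.
# The value below is a made-up example of such a key.
next_page = composite_page_body("hostname", 100, after_key={"group": "host-0100"})
print(json.dumps(next_page, indent=2))
```

Pagination stops when a response comes back with no buckets.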
I am also open to supporting this feature in search_after if it solves a real use case. I don't plan to work on it in the near future, but I'd be happy to help if somebody wants to tackle it in a PR. I'll mark this issue as adoptme in the meantime, so don't hesitate to ask questions if this is something that you'd like to tackle.
Thank you! How does the performance of the composite aggregation compare to using collapse to deduplicate and from & size to paginate? I understand that from loads all the records up to that point into memory, and we're seeing the effects of that with large queries, which is why I was interested in search_after.
Hi @jimczi, I want to help. Could you give me some guidelines on what should be done here?
Thanks for your interest @tumile !
I guess that the first step would be to change the check in SearchService to allow search_after in conjunction with collapse, but only if the collapsing and sorting are done on the same field (see https://github.com/elastic/elasticsearch/issues/53115#issuecomment-594623087). Then you could write some simple tests to ensure that the feature works as expected (we throw an error if the sorting and collapsing fields differ, we don't return the same group twice, ...).
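The rule itself could be sketched roughly as follows. This is not the actual SearchService code (which is Java); it is a standalone Python illustration, and the function name, parameters, and error message are hypothetical. It also assumes that "sorting on the same field" means the collapse field is the only sort field, which is a stricter reading than the comment above spells out.

```python
from typing import Optional, Sequence


def check_collapse_search_after(collapse_field: Optional[str],
                                sort_fields: Sequence[str],
                                has_search_after: bool) -> None:
    """Reject search_after + collapse unless the request sorts on exactly
    the collapse field (assumption: no secondary sort fields allowed)."""
    if collapse_field is None or not has_search_after:
        return  # nothing to validate for this combination
    if list(sort_fields) != [collapse_field]:
        raise ValueError(
            "search_after with collapse requires sorting on the collapse "
            f"field [{collapse_field}], but the sort fields are {list(sort_fields)}"
        )


# Accepted: sort and collapse on the same field.
check_collapse_search_after("hostname", ["hostname"], has_search_after=True)

# Rejected: sorting on a different field than the collapse field.
try:
    check_collapse_search_after("hostname", ["timestamp"], has_search_after=True)
except ValueError as err:
    print(err)
```

The two calls at the bottom correspond to the first test case mentioned above; the "same group is never returned twice" case would need an integration test against real data.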
@jimczi correct me if I'm wrong, but it looks like CollapsingTopDocsCollector doesn't support paging. I wonder how this can be added?