Paginating search requests with search_after requires a tiebreaker that is unique per document. Sorted _scroll queries do this automatically by tie-breaking documents on the index/shardId/docID tuple. This tuple is not accessible to normal search requests, so the other option is to copy the _id of the document into a doc value field and use it as a tiebreaker.
This workaround is difficult to implement for solutions that are not in charge of indexing.
With the introduction of the search context for requests, we'll be able to paginate over a set of sorted results using search_after with the guarantee of seeing the same documents during the walk. Since the internal document id wouldn't change between requests, using the tuple that _scroll queries use becomes possible.
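For illustration, the manual workaround described above could look roughly like this. It is only a sketch of the request bodies: `id_copy` is a hypothetical doc value field assumed to hold a copy of `_id` (populated at index time), `my_date` is a placeholder sort field, and `build_page_request` is a made-up helper, not part of any client library.

```python
from typing import Any, Dict, List, Optional


def build_page_request(
    last_sort_values: Optional[List[Any]] = None, size: int = 100
) -> Dict[str, Any]:
    """Build a search body that paginates with search_after.

    Assumes a hypothetical `id_copy` doc value field that copies `_id`,
    so that the sort order is total (no two documents compare equal).
    """
    body: Dict[str, Any] = {
        "size": size,
        "sort": [
            {"my_date": "asc"},   # primary sort; may contain duplicates
            {"id_copy": "asc"},   # tiebreaker, unique per document
        ],
    }
    if last_sort_values is not None:
        # Sort values of the last hit from the previous page,
        # one value per sort clause, in the same order.
        body["search_after"] = last_sort_values
    return body
```

The first page is requested with `build_page_request()`; each following page passes the `sort` array of the last hit of the previous response as `last_sort_values`.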
This issue proposes to expose a virtual sort field called _tiebreak (or any name that suits better). The field would be accessible as a sort criterion that can be used with a search context to ensure consistent ordering. The field would be composed of:
The order of the composition should be discussed but the main goal is to allow consistent ordering using search_after without relying on manual operations at index-time.
Pinging @elastic/es-search (:Search/Search)
This is a great idea!
Is the idea that a user needs to explicitly provide this sort field in their request: "sort": ["my_date", "_tiebreak"]?
Or is it that when doing a sorted search with search_context, Elasticsearch will automatically rewrite the sort to add this field as a tiebreaker?
I wonder the same as Mayya, maybe we could have a good tiebreaker by default that wouldn't require exposing a virtual field? Index UUID and shard ID are the same on all documents of a shard, so Lucene's default tiebreaker (docID) would do the right thing, so maybe we would only have to change how hits are merged on the coordinating node and we could provide consistent ordering with negligible overhead?
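A rough sketch of what such a merge could look like, assuming each shard returns its hits already ordered by sort value and then docID. `ShardHit` and `merge_shard_hits` are hypothetical names used for illustration only; this is not how Elasticsearch implements the merge.

```python
import heapq
from itertools import islice
from typing import List, NamedTuple


class ShardHit(NamedTuple):
    sort_value: int  # value of the user-supplied sort field
    shard_id: int    # hypothetical stable identifier of the shard
    doc_id: int      # Lucene docID, only meaningful within its shard


def merge_shard_hits(per_shard: List[List[ShardHit]], size: int) -> List[ShardHit]:
    """Merge per-shard sorted hit lists into one globally ordered page.

    Ties on sort_value are broken by shard_id, then doc_id, so the
    result does not depend on the order in which shards responded.
    """
    merged = heapq.merge(
        *per_shard, key=lambda h: (h.sort_value, h.shard_id, h.doc_id)
    )
    return list(islice(merged, size))
```

Because the shard identifier is constant within a shard, each per-shard list is already sorted under the combined key, which is what makes a plain k-way merge sufficient here.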
I agree that it would be nice to add the tiebreaker automatically but it needs to be materialized in the sort values of the response. This is useful only for search_after queries so we rely on users to provide this value when they paginate.
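To illustrate why the tiebreaker has to show up in the sort values, here is a toy in-memory model of a search_after walk. Hits are plain `(my_date, tiebreak)` tuples and `search_page` / `walk` are hypothetical stand-ins for the real search call; nothing here talks to Elasticsearch.

```python
from typing import List, Optional, Tuple

Hit = Tuple[int, int]  # (my_date, tiebreak); tiebreak is unique per document


def search_page(hits: List[Hit], search_after: Optional[Hit], size: int) -> List[Hit]:
    """Return the next page of hits strictly after the given sort values."""
    ordered = sorted(hits)
    if search_after is not None:
        ordered = [h for h in ordered if h > search_after]
    return ordered[:size]


def walk(hits: List[Hit], size: int) -> List[Hit]:
    """Page through all hits, feeding the last hit's sort values back in."""
    seen: List[Hit] = []
    after: Optional[Hit] = None
    while True:
        page = search_page(hits, after, size)
        if not page:
            return seen
        seen.extend(page)
        # The client can only resume correctly if the response exposes
        # the full sort values, including the tiebreaker.
        after = page[-1]
```

If the client only got `my_date` back, documents sharing the date of the last hit on a page would be either skipped or repeated on the next request.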