Describe the feature:
Provide a tie_breaker_id by copying the _id field so we can have stable sort/export in batches.
Describe a specific use case for the feature:
For Saved objects and for SIEM rules we ran into issues with sorting and paging where we ended up with duplicates and in some cases missing data. We solved this by loading all records into memory at once and then exporting. For our UI tables we allow the user to view 300 records at a time to minimize the chances of duplicates or misses when doing paging but it is possible when they have above 300.
We have replicated this issue by paging through rules 10 at a time and looking at each rule and seeing we have found duplicates. We have replicated this issue also using CURL and eliminating any chances it was a UI issue.
There are docs such as this one which highlight the problem and a solution (In the important section)
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-search-after
If the alerting framework provide a copy of _id as a tiebreaker field called
as suggested there for us to sort on, export on, etc... that would be a small mapping change utilizing copy_to:
https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
@elastic/kibana-platform Is there any plans to have an alternative to sorting by _id now that Elasticsearch doesn't support this? some sort of tie breaker field?
FWIW, I created a tie_breaker_id keyword field for large list support API endpoints I am working on @mikecote, and then during document inserts I use a uuid for it. It's not going to be the fastest and I have an extra field consuming disk space compared to just being able to copy over the _id but that's what I'm doing now in my code.
Is there any plans to have an alternative to sorting by _id now that Elasticsearch doesn't support this? some sort of tie breaker field?
Not that I'm aware of. @rudolf?
I wasn't aware of this duplicating behaviour, but I think we should treat it as a bug in Core. @FrankHassanabad suggestion for using copy_to sounds like a good suggestion.
@FrankHassanabad What is the impact on users, could showing a duplicate mislead a user to take the wrong actions or is this mostly an annoyance? What's the priority for getting this fixed? It doesn't sound like a lot of work, but our roadmap is quite full.
~Theoretically, generating a saved object ndjson export with duplicates could lead to failure on import if overwrite=false which then leads to an id collision. I will have to go through the source code, but I think we use a Set which would remove duplicates and mitigate this problem.~ We don't use search_after directly for saved object queries, so this should only impact custom ES queries against the saved object index that uses search_after.
I read to quickly and didn't see that data could be either _missing_ or duplicated, we can probably sometimes live with duplicates but the missing data wouldn't be acceptable.
copy_to isn't possible on the _id field, so we will have to either use an ingest pipeline or do it inside the saved objects repository. I'm not sure if we can rely on ingest processing always being enabled on ES so doing it client-side might be safer, especially for a minor release.
There's no immediate requirement to implement this in Core/Saved Objects, we could only implement this in task manager for now since Saved Objects only uses from + to for paging. I'm not aware of any saved object types that currently expect to be able to page over more than 10k results, but there are plans to move larger indexes to saved objects in which case we might have to switch to search_after for paging find results.
@rudolf, I generate a UUID client side at the moment for the a tie breaker field before pushing the record into the index within Kibana/NodeJS
. I'm not aware of any saved object types that currently expect to be able to page over more than 10k results
For large lists since they can be above 10k we had to switch from SO to another data index so I could use search after for our SIEM large list support. Also for a few other Elastic Search abilities such as delete by query, etc...
If SO is adding support for above 10k and a few other features and everyone feels comfortable we can add large volumes of data into it I would be 馃憤 for moving back to it. I started with it, but quickly ran into issues with the 10k support and missing things such as delete by query where I had to move to a data index.
"Reviewed by Frank Hassanabad on 7/29/2020, still valid as of this date"
Just wanted to double back on this ticket. We're adding auto refresh to our all rules table and the inconsistencies in sorting without a tiebreaker_id are now very evident. Would be much appreciated if we could bump this up :)
Is there any chance you could use an existing alerting SO field like createdAt as the tie breaker? Of course, that's not guaranteed to be unique, but is maybe unique enough? Or maybe that's already your primary search term. Is there another unique-ish field that is stable for your uses, like name, that you could use as an additional sort to get the uniqueness (ie, is the name something a user can change ad-hoc, or are you creating these programmatically and they don't change over time)? How stable do these sorts need to be over time? Just during a user session, paging through UI's? Seems like something like sorting by [createdAt, name] would probably be good enough in that case.
It sounds like the only other option is for us to add a new field to the alert SO, which would presumably be a UUID, as the tie breaker.
I'm worried about the cardinality explosion here, as the internal index on that new field will literally be as large as the # of alerts, so reusing an existing field(s) would help out there (I think!). We could potentially make this field optional, and probably not available for editing in the UI, so that alertsClient users could specify a value if they want, but the default would be that we don't use the field.
Also wondering if this will bite us in the end, for things like the alerts list UI. No matter how we end up sorting the alerts in that list (eg, status, name, etc), seems like we may also be in the same situation - with an unstable sort, so missing/dup entries while paging - but perhaps that's a bit more survivable for customers than Frank's use case.
@FrankHassanabad , any thoughts on my comment ^^^
Thanks, I did a sampling of data from the prepackaged rules install and the createdAt field could help keep us unique or mostly unique to lessen the problems and impacts:
2020-11-05T19:46:41.552Z
2020-11-05T19:46:41.406Z
2020-11-05T19:46:41.753Z
2020-11-05T19:46:41.716Z
2020-11-05T19:46:42.533Z
It's not guaranteed as you can see some of those are _very_ close to being non-unique and we might end up with some that are maybe not unique but I think at the very least for a workaround we can use them to help keep things more stable or as close to stable as we can.
@FrankHassanabad thanks for checking that!
I think we do still need a way to pass multiple fields to sort by though right? So for our rules example, when the user selects to sort by enabled, we pass that through as the sort_field but can't specify any secondary field.
Yeah, we need to allow comma separated values to sort by within the find_rules_route and then we can send multiples down and sort in the order given to us.
It sounds like the only other option is for us to add a new field to the alert SO, which would presumably be a UUID, as the tie breaker.
I'm worried about the cardinality explosion here, as the internal index on that new field will literally be as large as the # of alerts, so reusing an existing field(s) would help out there (I think!).
Adding another field is the only way to guarantee no duplicates (ES team might add a virtual tie breaker field at some point in the future https://github.com/elastic/elasticsearch/issues/56828).
Although every field adds overhead, I don't think the "cost" of this is prohibitively high, instead of N it will now be 2N.
We might have to add a tie breaker field to all saved objects for https://github.com/elastic/kibana/issues/77961
Ah, adding a tie breaker to SO's themselves seems like the way to do this, as this is a generic SO issue IIRC, not specific to alerts. I'd certainly hate to add one to alerts in vX only to have it added to SO's in v>X, since then we'd have two tie breakers (and more wasted index space).
Most helpful comment
Adding another field is the only way to guarantee no duplicates (ES team might add a virtual tie breaker field at some point in the future https://github.com/elastic/elasticsearch/issues/56828).
Although every field adds overhead, I don't think the "cost" of this is prohibitively high, instead of N it will now be 2N.
We might have to add a tie breaker field to all saved objects for https://github.com/elastic/kibana/issues/77961