Affected versions: 7.7 - 7.9.1 (fixed >= 7.9.2)
A continuous transform minimizes the number of updates by querying only changed buckets; the logic combines several query features for this. For terms sources it relies on a terms query.
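Below is a minimal sketch of the terms part of change detection. It assumes the transform has already collected the terms that changed since the last checkpoint. The function name and the dict-based query shape are hypothetical illustrations in the Elasticsearch query DSL, not the actual (Java) implementation:

```python
from typing import Any, Dict, List


def build_changed_buckets_query(
    source_query: Dict[str, Any],
    changed_terms: Dict[str, List[Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: narrow the transform's source query to changed buckets.

    changed_terms maps a group_by field name to the terms that changed since the
    last checkpoint; only documents in those buckets are queried again.
    """
    filters: List[Dict[str, Any]] = [source_query]
    for field, terms in changed_terms.items():
        # Deduplicate while keeping order; a terms query matches any listed value.
        filters.append({"terms": {field: list(dict.fromkeys(terms))}})
    return {"bool": {"filter": filters}}
```

With several fields the filters are ANDed. This can select a superset of the changed buckets, but it never misses one, because every changed bucket's terms appear in each list.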
If a script is used in group_by instead of a field, the change detection logic causes the transform to fail, because it expects a field. The failure does not happen directly after start but after checkpoint 1, when the transform switches to continuous mode:
task encountered irrecoverable failure: field name cannot be null
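The following hypothetical Python model shows why a scripted source breaks this. It reproduces the pre-7.9.2 assumption that every terms source has a field; the real code is Java and structured differently. A terms source defined by a script has no field, so the lookup yields None and the sketch fails with the same message as the report:

```python
from typing import Any, Dict, List


def change_detection_fields(group_by: Dict[str, Dict[str, Any]]) -> List[str]:
    """Hypothetical model: collect the field of every terms source."""
    fields: List[str] = []
    for source in group_by.values():
        terms = source.get("terms")
        if terms is None:
            continue  # other source types are not modeled here
        field = terms.get("field")  # None for a scripted source
        if field is None:
            # Same message as the reported failure.
            raise ValueError("field name cannot be null")
        fields.append(field)
    return fields


# A scripted terms source, as in the report (field name is made up).
scripted_group_by = {
    "customer": {
        "terms": {"script": {"source": "doc['customer_id'].value", "lang": "painless"}}
    }
}
```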
Because a script can build a bucket key freely, we can't use it for change detection. We could construct a script query instead, but this would be very expensive.
Mitigation: don't use scripts in group_by together with a continuous transform. Scripts in queries are very expensive, so, independent of change detection, we highly recommend _not_ using scripts in production, only in the development/data exploration phase.
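A small sketch to help apply the mitigation: it lists the group_by sources of a transform config that use a script, so they can be replaced by a field. The config follows the transform API layout (source, dest, sync, pivot.group_by), but the index and field names are made up and both helper functions are hypothetical:

```python
from typing import Any, Dict, List

# Example continuous transform config (index and field names are made up).
config: Dict[str, Any] = {
    "source": {"index": "orders"},
    "dest": {"index": "orders-by-customer"},
    "sync": {"time": {"field": "order_date", "delay": "60s"}},
    "pivot": {
        "group_by": {
            # Scripted key: breaks change detection before 7.9.2 and is slow.
            "customer": {
                "terms": {"script": {"source": "doc['customer_id'].value", "lang": "painless"}}
            },
            # Field-based key: the recommended form.
            "region": {"terms": {"field": "region"}},
        },
        "aggregations": {"total": {"sum": {"field": "amount"}}},
    },
}


def is_continuous(transform_config: Dict[str, Any]) -> bool:
    # A transform runs continuously when it defines a "sync" section.
    return "sync" in transform_config


def scripted_group_by_names(transform_config: Dict[str, Any]) -> List[str]:
    """Return the names of group_by sources that use a script instead of a field."""
    group_by = transform_config.get("pivot", {}).get("group_by", {})
    names: List[str] = []
    for name, source in group_by.items():
        # Each source has one type key such as "terms" or "date_histogram".
        if any(isinstance(p, dict) and "script" in p for p in source.values()):
            names.append(name)
    return names
```

In this example the script only reads customer_id, so the source can simply become {"terms": {"field": "customer_id"}}.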
Update: We finally decided to go with option B, keeping the other ideas for documentation.
## A Disallow scripted group_by in continuous mode
This would be the easiest option; however, if you have only 1 scripted group_by but n non-scripted ones, it would limit functionality unnecessarily.
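Option A amounts to a validation step when the transform is created or updated. The following hypothetical sketch (the function name and error text are illustrative, not actual Elasticsearch messages) rejects a continuous config as soon as any group_by source is scripted. It does so even when other sources use fields, which is the unnecessary limitation described above.

```python
from typing import Any, Dict


def validate_no_scripts_in_continuous(transform_config: Dict[str, Any]) -> None:
    """Hypothetical option A check: reject scripted group_by in continuous mode."""
    if "sync" not in transform_config:
        return  # batch transforms are not affected
    group_by = transform_config.get("pivot", {}).get("group_by", {})
    for name, source in group_by.items():
        for params in source.values():
            if isinstance(params, dict) and "script" in params:
                raise ValueError(
                    f"group_by [{name}] uses a script, which is not allowed in continuous mode"
                )
```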
## B Disable change detection for scripted group_by
With this option, continuous mode does a full rerun if all group_by's use a script. At larger scale this leads to performance problems. If there are other, non-scripted group_by's, or the amount of data is small, this might still be an acceptable solution.
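Here is a sketch of option B under simplifying assumptions. It considers only terms sources (other source types are skipped), and changed_terms is keyed by group_by name. Scripted sources are excluded from change detection. If no field-based source remains, the function returns None, meaning the checkpoint recomputes all buckets. All names are hypothetical:

```python
from typing import Any, Dict, List, Optional


def change_detection_filter(
    group_by: Dict[str, Dict[str, Any]],
    changed_terms: Dict[str, List[Any]],
) -> Optional[Dict[str, Any]]:
    """Hypothetical option B: detect changes only for field-based terms sources.

    Returns None when no source supports change detection, which means the
    checkpoint has to recompute every bucket (a full rerun).
    """
    filters: List[Dict[str, Any]] = []
    for name, source in group_by.items():
        terms = source.get("terms")
        if terms is None or "field" not in terms:
            # Scripted terms source (or another type, not modeled): skip it.
            continue
        values = list(dict.fromkeys(changed_terms.get(name, [])))
        filters.append({"terms": {terms["field"]: values}})
    if not filters:
        return None
    return {"bool": {"filter": filters}}
```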
B.2: For this solution we could also consider using _update instead of index.
## C Implement change detection based on scripted query
I am not 100% sure this is possible. This solution would use a script query instead of a terms query, and it might not perform better than solution B (disabling change detection).
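For illustration, a script query for option C could look like the following. The sketch assumes the group_by key script is a single Painless expression (no statements or return) and that the changed keys are passed as a parameter. The outer structure is the standard script query of the query DSL; the helper and parameter names are hypothetical. The script runs for every candidate document, which is where the expense comes from:

```python
from typing import Any, Dict, List


def build_script_change_query(key_expression: str, changed_keys: List[Any]) -> Dict[str, Any]:
    """Hypothetical option C: match documents whose scripted key changed.

    key_expression must be a single Painless expression, e.g. the body of the
    group_by script; its result type must match the values in changed_keys.
    """
    return {
        "script": {
            "script": {
                "lang": "painless",
                # Evaluated per document, which makes this query expensive.
                "source": f"params.changed_keys.contains({key_expression})",
                "params": {"changed_keys": changed_keys},
            }
        }
    }
```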
Because the data might be small, A would limit functionality unnecessarily, so I tend towards B. It would be possible to disallow certain combinations, for example only when no group_by implements change detection, but again this sounds like too much of a restriction. Instead, we should warn the user about potential problems. Solution C might be good in the long run, but it takes significantly more time to verify, implement and test, so B seems to be the best short-term solution.
The user who reported the issue built a workaround for https://github.com/elastic/elasticsearch/issues/48243; therefore, supporting missing_bucket should be prioritized.
Update: if the transform cannot apply any change detection, it raises a performance warning (job message + log).
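The following is a minimal sketch of that warning, assuming the caller already knows which group_by sources support change detection. The logger name, the message text and the job_messages list (standing in for the transform's job messages) are hypothetical:

```python
import logging
from typing import List

logger = logging.getLogger("transform")


def warn_if_no_change_detection(
    transform_id: str, detectable_sources: List[str], job_messages: List[str]
) -> bool:
    """Hypothetical: warn (log + job message) when no change detection applies."""
    if detectable_sources:
        return False
    message = (
        f"[{transform_id}] no group_by supports change detection; every checkpoint "
        "recomputes all buckets, which may affect performance"
    )
    logger.warning(message)
    job_messages.append(message)
    return True
```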
FWIW: I tried backporting this change to 7.9, but unfortunately it caused a lot of merge conflicts because the PR depends on other changes.
Without leaking any information about release dates: the next patch release, 7.9.2, and the next minor, 7.10, are not far away, and there is not much time between them.
I therefore dropped the idea of a backport.
After revisiting the backport, I manually backported the important bits to 7.9.2.