We're attempting to perform a series of update_by_query requests in parallel, several of which will attempt to update the same document. As you would expect, we get a Conflict error when this happens.
These partial document updates we're attempting to perform do not conflict with or depend on each other in any way. So the issue is two-fold:
update_by_query is lacking in convenience features supported by update. Namely, there is no way to automatically retry updates on conflict (see #19632). Given the nature of update_by_query and its side effects, this is quite a bit more painful to implement in user code than retrying a single update would be.
If there are any existing workarounds to emulate the behavior we're describing, I'd love to hear them. As far as I can tell, our only option is to use ?conflicts=proceed on the first pass and then manually re-run any failed updates, this time ignoring any documents we've already successfully updated.
_update_by_query doesn't have a retry feature because it'd be fiddly to implement. I don't think it'd be impossible, just time-consuming to get right.
Skipping the version check on update by query would amount to always overwriting. I don't think that is what you want. I think better would be to catch the version conflict and refetch the document and recheck the condition and rerun the update. But that sequence is fairly complicated to implement. I don't really have the time to implement it.
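To make the "catch the conflict, refetch, recheck, rerun" sequence concrete, here is a minimal sketch for a single document. It does not talk to Elasticsearch: `InMemoryStore`, `VersionConflict`, and `update_with_retry` are hypothetical names invented for illustration, and the store only imitates per-document version checks.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale version (hypothetical)."""


class InMemoryStore:
    """Toy stand-in for an index with per-document versions (hypothetical)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source dict)

    def put(self, doc_id, source):
        # Unconditional write, as another client might do concurrently.
        version = self._docs.get(doc_id, (0, None))[0] + 1
        self._docs[doc_id] = (version, dict(source))
        return version

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)

    def write_if_version(self, doc_id, source, expected_version):
        # Versioned write: fails if someone else wrote in the meantime.
        current_version, _ = self._docs[doc_id]
        if current_version != expected_version:
            raise VersionConflict(doc_id)
        self._docs[doc_id] = (current_version + 1, dict(source))
        return current_version + 1


def update_with_retry(store, doc_id, matches, apply_update, max_retries=3):
    """Refetch, recheck the condition, and reapply on version conflict."""
    for _ in range(max_retries + 1):
        version, source = store.get(doc_id)      # refetch
        if not matches(source):                  # recheck the condition
            return "skipped"
        updated = apply_update(source)           # rerun the update
        try:
            store.write_if_version(doc_id, updated, version)
            return "updated"
        except VersionConflict:
            continue
    return "gave_up"
```

Rechecking the condition after every refetch is what separates this from simply skipping the version check: a document that no longer matches is left alone instead of being overwritten.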
Given all that, I've been suggesting folks write the query in such a way that it skips documents that have already been updated, use ?conflicts=proceed&refresh=wait_for, and rerun the process until it doesn't report any conflicts. That isn't always possible, I know.
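The rerun loop suggested above could be sketched roughly as follows. `rerun_until_no_conflicts` and `run_once` are hypothetical names: `run_once` is assumed to send one _update_by_query request with conflicts=proceed, using a query that excludes already-updated documents, and to return the parsed JSON response as a dict. The sketch reads the `updated` and `version_conflicts` counters from that response; how the request is sent and how the index is refreshed between passes is left to the caller.

```python
def rerun_until_no_conflicts(run_once, max_passes=10):
    """Call run_once() until a pass reports zero version conflicts.

    run_once is a caller-supplied (hypothetical) function that issues one
    _update_by_query request and returns the response body as a dict.
    """
    total_updated = 0
    for passes in range(1, max_passes + 1):
        response = run_once()
        total_updated += response.get("updated", 0)
        if response.get("version_conflicts", 0) == 0:
            # Nothing was skipped on this pass, so we are done.
            return {"passes": passes, "updated": total_updated}
    raise RuntimeError(
        f"still seeing version conflicts after {max_passes} passes"
    )
```

The `max_passes` cap is an assumption added here so that a query which never stops matching cannot loop forever.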
Skipping the version check on update by query would amount to always overwriting. I don't think that is what you want.
@nik9000 that makes sense. I wasn't sure if internally you guys were reindexing the entire document or just the fields that were modified in the update. In #20135 I suggested implementing partial static updates in update_by_query. It would be awesome if we could harness that to be smarter about what constitutes a conflicting update.
Nik's suggestion already solved the user's request.
Moreover, given comments from Nik, we prefer to close this issue as a clear indication that we are not going to work on this at this time. We are always open to reconsidering this in the future based on compelling feedback; despite this issue being closed please feel free to leave feedback on the proposal (including +1s).
I believe retry_on_conflict should be supported by _update_by_query; no one wants a partial update for a given request.
@sanjeev-kanabargi can you tell us more about your use case? A version conflict during update by query indicates that someone else is modifying the index. Running things in parallel always makes things much more complex, and we want to be very careful introducing features here.
_update_by_query doesn't have a retry feature because it'd be fiddly to implement. I don't think it'd be impossible just time consuming to get right.
Skipping the version check on update by query would amount to always overwriting. I don't think that is what you want. I think better would be to catch the version conflict and refetch the document and recheck the condition and rerun the update. But that sequence is fairly complicated to implement. I don't really have the time to implement it.
Given all that I've been suggesting folks write the query in such a way that it skips documents that have already been updated, use ?conflicts=proceed&refresh=wait_for, and rerun the process until it doesn't report any conflicts. That isn't always possible, I know.
@nik9000
Could you please update your answer, because the 'wait_for' option is no longer supported in the update_by_query API, and specify the version range in which this functionality is supported?
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html