Elasticsearch: VersionConflictEngineException with script update in cluster

Created on 16 Sep 2015 · 2 Comments · Source: elastic/elasticsearch

We're getting VersionConflictEngineExceptions all the time. Each update runs as a script that increments a numeric value (see the sample document below).

We're running a cluster of two Elasticsearch instances, and I can only imagine that synchronization between them is causing the version conflict on one node. I would expect an update not to throw this kind of exception in a cluster, since each update is atomic. We run four application servers that execute this code, and the exceptions are thrown randomly on all of them.

stacktrace:

Caused by: org.elasticsearch.index.engine.VersionConflictEngineException: [kpi][4] [opportunity][1442415600000]: version conflict, current [5933], provided [5932]
        at org.elasticsearch.index.engine.internal.InternalEngine.innerIndex(InternalEngine.java:582) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:522) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:425) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:193) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:512) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.doStart(TransportShardReplicationOperationAction.java:426) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.start(TransportShardReplicationOperationAction.java:342) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction.doExecute(TransportShardReplicationOperationAction.java:97) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.index.TransportIndexAction.innerExecute(TransportIndexAction.java:134) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:112) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:60) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:217) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:170) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$1.run(TransportInstanceSingleOperationAction.java:187) [elasticsearch-1.4.4.jar:]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_20]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_20]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_20]

sample document:

{
  "_index": "kpi",
  "_type": "opportunity",
  "_id": "1442412000000",
  "_version": 14742,
  "found": true,
  "_source": {
    "timestamp": "2015-09-16T14:00:00.249+0000",
    "own": 224,
    "shared": 2,
    "network": 3941,
    "unknown": 10575
  }
}

Each update script consists of exactly one of the following lines (only one increment per script):

ctx._source.own+=1;
ctx._source.shared+=1;
ctx._source.network+=1;
ctx._source.unknown+=1;

All 2 comments

Can you confirm that you are not setting the retry_on_conflict parameter? This parameter is zero by default and is designed exactly for your use case of updates where the ordering of updates (say incrementing a counter) isn't important.

If you do confirm this, this behavior is expected when you have multiple writers attempting to update the same document. You can address this issue by using the retry_on_conflict parameter to retry when a version conflict occurs. You can read more about this issue in the documentation on partial updates including the specific section on conflicts.
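
To make the conflict concrete, here is a minimal in-memory sketch of a version-checked update with a retry loop. It is a toy model, not Elasticsearch code: `Store`, `scripted_update`, the `interfere` hook and `competing_writer_once` are hypothetical names made up for illustration. It assumes only what the stack trace shows, namely that an update reads the document, applies the script, and then indexes the result with the version it read.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale version number."""


class Store:
    """Toy single-document store with optimistic concurrency control."""

    def __init__(self, source):
        self.version = 1
        self.source = dict(source)

    def get(self):
        # Return a snapshot: the version seen and a copy of the document.
        return self.version, dict(self.source)

    def index(self, source, provided_version):
        # Accept the write only if nobody else wrote since the snapshot.
        if provided_version != self.version:
            raise VersionConflict(
                f"version conflict, current [{self.version}], "
                f"provided [{provided_version}]"
            )
        self.source = dict(source)
        self.version += 1


def scripted_update(store, field, retry_on_conflict=0, interfere=None):
    """Read, increment, then write with the version that was read.

    `interfere` (hypothetical) runs between the read and the write to
    imitate a second application server hitting the same document.
    Returns the number of retries that were needed.
    """
    attempts = 0
    while True:
        version, doc = store.get()
        if interfere is not None:
            interfere(store)
        doc[field] += 1
        try:
            store.index(doc, version)
            return attempts
        except VersionConflict:
            if attempts >= retry_on_conflict:
                raise
            attempts += 1


def competing_writer_once(field="own"):
    """Build a hook that performs one competing increment, then goes quiet."""
    fired = []

    def interfere(store):
        if not fired:
            fired.append(True)
            version, doc = store.get()
            doc[field] += 1
            store.index(doc, version)

    return interfere
```

With `retry_on_conflict=0` the first competing write makes the update fail; with one retry allowed, the update re-reads the document and both increments are kept. Retrying is only safe here because the increments commute, so their order does not matter.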

I can confirm I'm not setting the retry_on_conflict parameter. But it sounds exactly like the parameter I want to use. I deployed with the parameter and the exceptions seem to be gone.
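
For readers using the REST API rather than the Java client, the request might look roughly like the sketch below. It only builds the path and body and sends nothing. The helper name `build_update_request` and the retry count of 5 are assumptions for illustration; the thread does not say which value was deployed. The index, type, ID and script are taken from the sample document and script lines in the question.

```python
import json
from urllib.parse import quote, urlencode


def build_update_request(index, doc_type, doc_id, field, retry_on_conflict=5):
    """Return (path, body) for a scripted increment of `field`.

    retry_on_conflict=5 is an arbitrary illustrative default.
    """
    path = "/{}/{}/{}/_update?{}".format(
        quote(index, safe=""),
        quote(doc_type, safe=""),
        quote(str(doc_id), safe=""),
        urlencode({"retry_on_conflict": retry_on_conflict}),
    )
    # One increment per script, as in the original question.
    body = json.dumps({"script": "ctx._source.{}+=1".format(field)})
    return path, body


path, body = build_update_request("kpi", "opportunity", 1442412000000, "own")
print("POST", path)
print(body)
```

How many retries are enough depends on how many writers contend for the same document, so the value is worth tuning against the observed conflict rate.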
