Elasticsearch: Failed to update expiration time for async-search

Created on 3 Oct 2020  路  10Comments  路  Source: elastic/elasticsearch

Elasticsearch version (bin/elasticsearch --version):
7.9.1 (Elastic Cloud)

Plugins installed: []
N/A

JVM version (java -version):

        "version" : "14.0.1",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "14.0.1+7",
        "vm_vendor" : "AdoptOpenJDK",
        "bundled_jdk" : true,

OS version (uname -a if on a Unix-like system):

        "name" : "Linux",
        "pretty_name" : "CentOS Linux 7 (Core)",
        "arch" : "amd64",
        "version" : "4.15.0-1027-aws",

Description of the problem including expected versus actual behavior:

In the ES Cloud console logs, I see the following ERROR log one or two times a day:

[instance-0000000043] failed to update expiration time for async-search [FnFMaHdUUmZQVG1tMWVuZy05bUo4aEEfUGY1ZThBaTRTeHlqQWNBcHBVdFlJUTozNDE4NjcxNw==] org.elasticsearch.transport.RemoteTransportException: [instance-0000000054][172.17.0.15:19354][indices:data/write/update[s]] Caused by: org.elasticsearch.index.engine.VersionConflictEngineException: [qLhwTRfPTmm1eng-9mJ8hA]: version conflict, required seqNo [144258], primary term [18]. but no document was found at org.elasticsearch.index.engine.InternalEngine.planIndexingAsPrimary(InternalEngine.java:1052) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1022) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:911) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:819) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:791) ~[elasticsearch-7.9.1.jar:7.9.1] at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:748) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:274) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:164) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:209) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:116) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:82) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:161) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710) ~[elasticsearch-7.9.1.jar:7.9.1] 
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.1.jar:7.9.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?] 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?] 
at java.lang.Thread.run(Thread.java:832) [?:?]

It's hard to know where that would come from (maybe from interactions in Kibana), but I wouldn't expect to see such errors. It seems to be due to a version conflict when updating a document in .async_search

:SearcSearch >bug Search

Most helpful comment

I opened https://github.com/elastic/elasticsearch/pull/63652 after reading the additional stack traces that you provided. The one in the description is about a deleted search so it's not really a bug but the subsequent ones are due to a race condition. The update of the expiration time competes with the update of the response when the search finishes so a version conflict can happen. Retrying the updates in #63652 in should be enough to fix this bug.

All 10 comments

Pinging @elastic/es-search (:Search/Search)

This error means that the search was cancelled or deleted so the document doesn't exist anymore. Are you seeing any error in Kibana when this is logged in ES ? In any case, this logging is misleading since we shouldn't log anything (at least not at the error level) if the document is "just" missing. I'll leave the bug label to fix the logging level.

Thanks for looking into this @jimczi !
I don't really see any error in Kibana.

I have the same problem since 7.9.1 its really annoying to see error every time i click search in kibana. I see error message in kibana frontend and in elasticsearch logs.
Now running 7.9.2 and the error is still happening.

Elastic logs:
failed to update expiration time for async-search [FmN6QVNOSENCUmVPNmV4bnRQYTRUUkEfdWVDUXJwbktSU09TQV80ZWZxUEhVQTo4NjUyNTEwMA==]

I have the same problem since 7.9.1 its really annoying to see error every time i click search in kibana

Can you share the error that you see in Kibana and the stack trace that is logged in ES ? Does it happen on every search ? It could be a bug in Kibana too so we'll need to understand how this error is triggered.

JVM version:

openjdk version "1.8.0_262" OpenJDK Runtime Environment (build 1.8.0_262-b10) OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)

The error is happening on any index - filebeat, winlogbeat etc.
Here is the response from clicking search in Kibana:

{"statusCode":404,"error":"Not Found","message":"[resource_not_found_exception] Fi15TVZfQnBZVElDZUtmQjlubmhyUncfdWVDUXJwbktSU09TQV80ZWZxUEhVQTo5MTU2MzI3Nw==","attributes":{"error":{"root_cause":[{"type":"resource_not_found_exception","reason":"Fi15TVZfQnBZVElDZUtmQjlubmhyUncfdWVDUXJwbktSU09TQV80ZWZxUEhVQTo5MTU2MzI3Nw=="}],"type":"resource_not_found_exception","reason":"Fi15TVZfQnBZVElDZUtmQjlubmhyUncfdWVDUXJwbktSU09TQV80ZWZxUEhVQTo5MTU2MzI3Nw=="}}}

Elasticsearch logs:
12:45:08.231 elasticsearch.server [elasticsearch.server][ERROR] failed to update expiration time for async-search [Fnc5QmJBY21nUzVpZk9yUGJNSzJyLVEfWGQzRndnSlFUQ2VJZk14aGM4U0tCQTo4NjE4NjA0NA==] 12:45:16.339 elasticsearch.server [elasticsearch.server][ERROR] failed to update expiration time for async-search [Fk54WFByNzJqUzlTMXZMNFlTMGRZR3cfWGQzRndnSlFUQ2VJZk14aGM4U0tCQTo4NjE4NjkyMQ==] org.elasticsearch.transport.RemoteTransportException: [node1][xxx.xxx.xxx.xxx:9300][indices:data/write/update[s]] Caused by: org.elasticsearch.index.engine.VersionConflictEngineException: [NxXPr72jS9S1vL4YS0dYGw]: version conflict, required seqNo [16838], primary term [19]. current document has seqNo [16839] and primary term [19] at org.elasticsearch.index.engine.InternalEngine.planIndexingAsPrimary(InternalEngine.java:1059) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1022) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:911) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:819) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:791) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:748) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:274) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:164) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:209) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:116) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:82) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:161) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.9.2.jar:7.9.2] at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.2.jar:7.9.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?] at java.lang.Thread.run(Thread.java:832) [?:?] 12:45:16.339 elasticsearch.server [elasticsearch.server][ERROR] failed to update expiration time for async-search [Fk54WFByNzJqUzlTMXZMNFlTMGRZR3cfWGQzRndnSlFUQ2VJZk14aGM4U0tCQTo4NjE4NjkyMQ==] 12:45:32.922 elasticsearch.server [elasticsearch.server][ERROR] failed to update expiration time for async-search [FnJDSXlUeS1ZUzh1UnF6RVloZVJwd2cfWGQzRndnSlFUQ2VJZk14aGM4U0tCQTo4NjE4NzgxMg==]

Ya we hit this one a lot across instances. Seems to make queries hang while it resolves. (7.9.2 / ECK).

I spent more time with this. It would be nice to control in some way what nodes this index goes on. My issue seems to be directly related to a primary being on a heavily utilized hot node, deleting it and forcing the system to recreate it on another set of nodes (I had to do this a few times), showed immediate and lasting benefit.

Guessing somewhat related to https://github.com/elastic/elasticsearch/issues/37867

JVM version:

OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)

The error is happening on any index - filebeat, winlogbeat etc.
Here is the response from clicking search in Kibana:

{"statusCode":404,"error":"Not Found","message":"[resource_not_found_exception] Fi15TVZfQnBZVElDZUtmQjlubmhyUncfdWVDUXJwbktSU09TQV80ZWZxUEhVQTo5MTU2MzI3Nw==","attributes":{"error":{"root_cause":[{"type":"resource_not_found_exception","reason":"Fi15TVZfQnBZVElDZUtmQjlubmhyUncfdWVDUXJwbktSU09TQV80ZWZxUEhVQTo5MTU2MzI3Nw=="}],"type":"resource_not_found_exception","reason":"Fi15TVZfQnBZVElDZUtmQjlubmhyUncfdWVDUXJwbktSU09TQV80ZWZxUEhVQTo5MTU2MzI3Nw=="}}}

Elasticsearch logs:

elasticsearch.server
[elasticsearch.server][ERROR] failed to update expiration time for async-search [Fnc5QmJBY21nUzVpZk9yUGJNSzJyLVEfWGQzRndnSlFUQ2VJZk14aGM4U0tCQTo4NjE4NjA0NA==]
12:45:16.339
elasticsearch.server
[elasticsearch.server][ERROR] failed to update expiration time for async-search [Fk54WFByNzJqUzlTMXZMNFlTMGRZR3cfWGQzRndnSlFUQ2VJZk14aGM4U0tCQTo4NjE4NjkyMQ==]
org.elasticsearch.transport.RemoteTransportException: [node1][xxx.xxx.xxx.xxx:9300][indices:data/write/update[s]]
Caused by: org.elasticsearch.index.engine.VersionConflictEngineException: [NxXPr72jS9S1vL4YS0dYGw]: version conflict, required seqNo [16838], primary term [19]. current document has seqNo [16839] and primary term [19]
  at org.elasticsearch.index.engine.InternalEngine.planIndexingAsPrimary(InternalEngine.java:1059) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1022) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:911) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:819) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:791) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:748) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:274) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:164) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:209) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:116) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:82) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:161) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.9.2.jar:7.9.2]
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.2.jar:7.9.2]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
  at java.lang.Thread.run(Thread.java:832) [?:?]
12:45:16.339
elasticsearch.server
[elasticsearch.server][ERROR] failed to update expiration time for async-search [Fk54WFByNzJqUzlTMXZMNFlTMGRZR3cfWGQzRndnSlFUQ2VJZk14aGM4U0tCQTo4NjE4NjkyMQ==]
12:45:32.922
elasticsearch.server
[elasticsearch.server][ERROR] failed to update expiration time for async-search [FnJDSXlUeS1ZUzh1UnF6RVloZVJwd2cfWGQzRndnSlFUQ2VJZk14aGM4U0tCQTo4NjE4NzgxMg==]```

Here is Kibana stack trace:
Error: Not Found at Fetch._callee3$ (https://kibana/33984/bundles/core/core.entry.js:34:109213) at l (https://kibana/33984/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:155323) at Generator._invoke (https://kibana/33984/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:155076) at Generator.forEach.e.<computed> [as next] (https://kibana/33984/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:155680) at fetch_asyncGeneratorStep (https://kibana/33984/bundles/core/core.entry.js:34:102354) at _next (https://kibana/33984/bundles/core/core.entry.js:34:102670)

I opened https://github.com/elastic/elasticsearch/pull/63652 after reading the additional stack traces that you provided. The one in the description is about a deleted search so it's not really a bug but the subsequent ones are due to a race condition. The update of the expiration time competes with the update of the response when the search finishes so a version conflict can happen. Retrying the updates in #63652 in should be enough to fix this bug.

Thank you @jimczi 馃憤
Also thanks to @tomsozolins for sharing stack traces 馃憤

Was this page helpful?
0 / 5 - 0 ratings