Rolling upgrades of an Elasticsearch 5.6.10 cluster to version 6.3.0 fail with java.lang.IllegalStateException: commit doesn't contain history uuid when a synced flush (_flush/synced) is performed, as described in the rolling upgrade documentation.
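The rolling upgrade procedure starts by disabling shard allocation and ends by re-enabling it. The reproduction script does this with HTTPie's `persistent:=` raw-JSON syntax. For anyone reproducing with curl only, here is a minimal sketch that builds the same request body. The helper name `allocation_settings_body` is hypothetical and not part of any tool. Passing `null` removes the persistent setting so that allocation falls back to its default, `all`, which matches the `from [none] to [all]` line in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for PUT _cluster/settings that the
# reproduction script sends via HTTPie's `persistent:='{...}'` syntax.
#   "none" -> disable shard allocation (rolling upgrade step 1)
#   "null" -> remove the persistent setting, so allocation reverts to "all"
allocation_settings_body() {
  local value
  if [ "$1" = "null" ]; then
    value=null           # JSON null, unquoted
  else
    value="\"$1\""       # JSON string, e.g. "none"
  fi
  printf '{"persistent":{"cluster.routing.allocation.enable":%s}}' "$value"
}
```

Usage, assuming the same host and port as the script:

    curl -s -XPUT -H 'Content-Type: application/json' '127.0.0.1:9200/_cluster/settings' -d "$(allocation_settings_body none)"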
Steps to reproduce:
I cannot reproduce the problem without performing the synced flush. I think this problem could have been introduced in #28245.
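A shard's history UUID is stored in its Lucene commit user data under the key history_uuid. The sync_id marker written by a synced flush is stored there too. The synced flush documentation shows that shard-level index stats expose this data as a commit.user_data section (GET <index>/_stats?filter_path=**.commit&level=shards). The sketch below uses a hypothetical helper that counts how many shard commits in such a response carry each marker. It counts key occurrences with grep instead of parsing the JSON, so treat the result as a rough heuristic. My understanding is that history_uuid is a 6.x addition, so commits written by the 5.6.10 nodes would be expected to show sync_id after the flush but no history_uuid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read shard-level stats JSON on stdin and print
# "<commits with sync_id> <commits with history_uuid>".
# Heuristic: counts key occurrences with grep -o rather than parsing JSON.
commit_marker_counts() {
  local body sync history
  body=$(cat)
  # `|| true` keeps a zero-match grep from failing under `set -e -o pipefail`
  sync=$(printf '%s' "$body" | { grep -o '"sync_id"' || true; } | wc -l | tr -d ' ')
  history=$(printf '%s' "$body" | { grep -o '"history_uuid"' || true; } | wc -l | tr -d ' ')
  echo "$sync $history"
}
```

Usage against the index from the reproduction script:

    curl -s '127.0.0.1:9200/shakespeare/_stats?filter_path=**.commit&level=shards' | commit_marker_counts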
Reproduction script (takes about a minute to reproduce the issue):
#!/bin/bash
set -ex
# Setup
docker rm -f es1 || true
docker rm -f es2 || true
docker network inspect es || docker network create es
rm -rf /tmp/esdata
mkdir -p /tmp/esdata/data1 /tmp/esdata/data2 /tmp/esdata/snapshot
sudo chown -R 1000:1000 /tmp/esdata
sudo sysctl -w vm.max_map_count=262144
# Start two-node Elasticsearch 5.6.10 cluster
docker run -d --name es1 --net es -v /tmp/esdata/data1:/usr/share/elasticsearch/data -v /tmp/esdata/snapshot:/snapshot -e path.repo=/snapshot -e xpack.security.enabled=false -e discovery.zen.ping.unicast.hosts=es2 -p 127.0.0.1:9200:9200 docker.elastic.co/elasticsearch/elasticsearch:5.6.10
docker run -d --name es2 --net es -v /tmp/esdata/data2:/usr/share/elasticsearch/data -v /tmp/esdata/snapshot:/snapshot -e path.repo=/snapshot -e xpack.security.enabled=false -e discovery.zen.ping.unicast.hosts=es1 -p 127.0.0.1:9201:9200 docker.elastic.co/elasticsearch/elasticsearch:5.6.10
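# Quote the URL so the shell does not treat "?" as a glob; --check-status makes
# http exit non-zero on the 408 returned while the cluster is not yet green
while ! http --check-status '127.0.0.1:9200/_cluster/health?wait_for_status=green'; do sleep 1; done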
# Index some sample data
curl https://download.elastic.co/demos/kibana/gettingstarted/shakespeare_6.0.json | curl -H 'Content-Type: application/x-ndjson' -XPOST '127.0.0.1:9200/shakespeare/doc/_bulk?pretty' --data-binary @-
# Perform a rolling upgrade to 6.3.0 according to the docs at
# https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html
# Step 1: disable shard allocation
http PUT 127.0.0.1:9200/_cluster/settings persistent:='{"cluster.routing.allocation.enable": "none"}'
# Step 2: stop non-essential indexing and perform a synced flush
# Without this step, the upgrade goes well!
http POST 127.0.0.1:9200/_flush/synced
# Step 4: shut down a single node
docker stop es2
docker rm es2
# Step 5, 7: upgrade and start that node
docker run -d --name es2 --net es -v /tmp/esdata/data2:/usr/share/elasticsearch/data -v /tmp/esdata/snapshot:/snapshot -e path.repo=/snapshot -e discovery.zen.ping.unicast.hosts=es1 -p 127.0.0.1:9201:9200 docker.elastic.co/elasticsearch/elasticsearch:6.3.0
while ! http 127.0.0.1:9201; do sleep 1; done
# Step 8: re-enable shard allocation
http --check-status PUT 127.0.0.1:9200/_cluster/settings persistent:='{"cluster.routing.allocation.enable": null}'
# Watch mayhem ensue
docker logs -f es2
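The script's wait loops (while ! http ...; do sleep 1; done) retry forever, so a node that never comes up hangs the reproduction instead of failing it. A bounded alternative is sketched below. The helper name `retry` and its argument order (max attempts, delay in seconds, then the command) are my own choices, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, giving up after
# MAX_ATTEMPTS tries with DELAY seconds between them. On failure it returns
# the command's last exit status, so `set -e` stops the calling script.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1 status=0
  while true; do
    "$@" && return 0
    status=$?            # exit status of the failed command
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return "$status"
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry 120 1 curl -s -o /dev/null 127.0.0.1:9201` waits up to roughly two minutes for the upgraded node to answer HTTP at all, like the original `http 127.0.0.1:9201` loop. `retry 120 1 curl -sf '127.0.0.1:9200/_cluster/health?wait_for_status=green&timeout=1s'` relies on cluster health answering with HTTP 408 when the wait times out, which `-f` turns into a non-zero exit.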
Log from the upgraded node, including stack traces:
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[2018-06-20T21:38:02,917][INFO ][o.e.n.Node ] [] initializing ...
[2018-06-20T21:38:02,958][INFO ][o.e.e.NodeEnvironment ] [uLAJsY1] using [1] data paths, mounts [[/usr/share/elasticsearch/data (tmpfs)]], net usable_space [15.6gb], net total_space [15.7gb], types [tmpfs]
[2018-06-20T21:38:02,959][INFO ][o.e.e.NodeEnvironment ] [uLAJsY1] heap size [989.8mb], compressed ordinary object pointers [true]
[2018-06-20T21:38:02,972][INFO ][o.e.n.Node ] [uLAJsY1] node name derived from node ID [uLAJsY1xT5yhCUzAvNa8ag]; set [node.name] to override
[2018-06-20T21:38:02,972][INFO ][o.e.n.Node ] [uLAJsY1] version[6.3.0], pid[1], build[default/tar/424e937/2018-06-11T23:38:03.357887Z], OS[Linux/4.17.2-1-ARCH/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/10.0.1/10.0.1+10]
[2018-06-20T21:38:02,972][INFO ][o.e.n.Node ] [uLAJsY1] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.jX5EEUqv, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -Des.cgroups.hierarchy.override=/, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
[2018-06-20T21:38:04,206][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [aggs-matrix-stats]
[2018-06-20T21:38:04,206][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [analysis-common]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [ingest-common]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [lang-expression]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [lang-mustache]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [lang-painless]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [mapper-extras]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [parent-join]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [percolator]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [rank-eval]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [reindex]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [repository-url]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [transport-netty4]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [tribe]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-core]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-deprecation]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-graph]
[2018-06-20T21:38:04,207][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-logstash]
[2018-06-20T21:38:04,208][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-ml]
[2018-06-20T21:38:04,208][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-monitoring]
[2018-06-20T21:38:04,208][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-rollup]
[2018-06-20T21:38:04,208][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-security]
[2018-06-20T21:38:04,208][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-sql]
[2018-06-20T21:38:04,208][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-upgrade]
[2018-06-20T21:38:04,208][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded module [x-pack-watcher]
[2018-06-20T21:38:04,208][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded plugin [ingest-geoip]
[2018-06-20T21:38:04,208][INFO ][o.e.p.PluginsService ] [uLAJsY1] loaded plugin [ingest-user-agent]
[2018-06-20T21:38:06,118][INFO ][o.e.x.s.a.s.FileRolesStore] [uLAJsY1] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
[2018-06-20T21:38:06,428][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/172] [Main.cc@109] controller (64 bit): Version 6.3.0 (Build 0f0a34c67965d7) Copyright (c) 2018 Elasticsearch BV
[2018-06-20T21:38:06,632][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,634][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,640][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,641][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,643][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,644][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,644][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,644][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,645][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,646][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,647][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,648][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,650][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2018-06-20T21:38:06,865][INFO ][o.e.d.DiscoveryModule ] [uLAJsY1] using discovery type [zen]
[2018-06-20T21:38:07,373][INFO ][o.e.n.Node ] [uLAJsY1] initialized
[2018-06-20T21:38:07,373][INFO ][o.e.n.Node ] [uLAJsY1] starting ...
[2018-06-20T21:38:07,481][INFO ][o.e.t.TransportService ] [uLAJsY1] publish_address {172.19.0.3:9300}, bound_addresses {0.0.0.0:9300}
[2018-06-20T21:38:07,497][INFO ][o.e.b.BootstrapChecks ] [uLAJsY1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-06-20T21:38:10,646][INFO ][o.e.c.s.ClusterApplierService] [uLAJsY1] detected_master {4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}, added {{4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true},}, reason: apply cluster state (from master [master {4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true} committed version [36]])
[2018-06-20T21:38:10,651][INFO ][o.e.c.s.ClusterSettings ] [uLAJsY1] updating [cluster.routing.allocation.enable] from [all] to [none]
[2018-06-20T21:38:10,827][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [uLAJsY1] Failed to clear cache for realms [[]]
[2018-06-20T21:38:10,837][INFO ][o.e.l.LicenseService ] [uLAJsY1] license [3d2953c0-7b27-4738-861b-091c92a4fd31] mode [trial] - valid
[2018-06-20T21:38:10,865][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [uLAJsY1] publish_address {172.19.0.3:9200}, bound_addresses {0.0.0.0:9200}
[2018-06-20T21:38:10,865][INFO ][o.e.n.Node ] [uLAJsY1] started
[2018-06-20T21:38:10,894][INFO ][o.e.x.m.e.l.LocalExporter] waiting for elected master node [{4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}] to setup local exporter [default_local] (does it have x-pack installed?)
[2018-06-20T21:38:10,925][INFO ][o.e.x.m.e.l.LocalExporter] waiting for elected master node [{4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}] to setup local exporter [default_local] (does it have x-pack installed?)
[2018-06-20T21:38:10,954][INFO ][o.e.x.m.e.l.LocalExporter] waiting for elected master node [{4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}] to setup local exporter [default_local] (does it have x-pack installed?)
[2018-06-20T21:38:11,381][INFO ][o.e.c.s.ClusterSettings ] [uLAJsY1] updating [cluster.routing.allocation.enable] from [none] to [all]
[2018-06-20T21:38:11,392][INFO ][o.e.x.m.e.l.LocalExporter] waiting for elected master node [{4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}] to setup local exporter [default_local] (does it have x-pack installed?)
[2018-06-20T21:38:11,529][INFO ][o.e.x.m.e.l.LocalExporter] waiting for elected master node [{4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}] to setup local exporter [default_local] (does it have x-pack installed?)
[2018-06-20T21:38:11,592][WARN ][o.e.i.c.IndicesClusterStateService] [uLAJsY1] [[shakespeare][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [shakespeare][0]: Recovery failed from {4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true} into {uLAJsY1}{uLAJsY1xT5yhCUzAvNa8ag}{J4vNZ9OETdeO8pxepzmRHw}{172.19.0.3}{172.19.0.3:9300}{ml.machine_memory=33728278528, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [4E_A_7z][172.19.0.2:9300][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:140) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:?]
at java.lang.Thread.run(Thread.java:748) ~[?:?]
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [0] files with total size of [0b]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:337) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:?]
at java.lang.Thread.run(Thread.java:748) ~[?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [uLAJsY1][172.19.0.3:9300][internal:index/shard/recovery/prepare_translog]
Caused by: java.lang.IllegalStateException: commit doesn't contain history uuid
at org.elasticsearch.index.engine.InternalEngine.loadHistoryUUID(InternalEngine.java:493) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:193) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:157) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:2152) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:2134) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1341) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.openEngineAndSkipTranslogRecovery(IndexShard.java:1305) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.RecoveryTarget.prepareForTranslogOperations(RecoveryTarget.java:366) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:403) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:397) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:246) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:304) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1592) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:844) ~[?:?]
[2018-06-20T21:38:11,602][WARN ][o.e.i.c.IndicesClusterStateService] [uLAJsY1] [[.monitoring-es-6-2018.06.20][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [.monitoring-es-6-2018.06.20][0]: Recovery failed from {4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true} into {uLAJsY1}{uLAJsY1xT5yhCUzAvNa8ag}{J4vNZ9OETdeO8pxepzmRHw}{172.19.0.3}{172.19.0.3:9300}{ml.machine_memory=33728278528, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [4E_A_7z][172.19.0.2:9300][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:140) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:?]
at java.lang.Thread.run(Thread.java:748) ~[?:?]
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [0] files with total size of [0b]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:337) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:?]
at java.lang.Thread.run(Thread.java:748) ~[?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [uLAJsY1][172.19.0.3:9300][internal:index/shard/recovery/prepare_translog]
Caused by: java.lang.IllegalStateException: commit doesn't contain history uuid
at org.elasticsearch.index.engine.InternalEngine.loadHistoryUUID(InternalEngine.java:493) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:193) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:157) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:2152) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:2134) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1341) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.openEngineAndSkipTranslogRecovery(IndexShard.java:1305) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.RecoveryTarget.prepareForTranslogOperations(RecoveryTarget.java:366) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:403) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:397) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:246) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:304) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1592) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:844) ~[?:?]
[2018-06-20T21:38:11,634][INFO ][o.e.x.m.e.l.LocalExporter] waiting for elected master node [{4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}] to setup local exporter [default_local] (does it have x-pack installed?)
[2018-06-20T21:38:11,657][WARN ][o.e.i.c.IndicesClusterStateService] [uLAJsY1] [[shakespeare][3]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [shakespeare][3]: Recovery failed from {4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true} into {uLAJsY1}{uLAJsY1xT5yhCUzAvNa8ag}{J4vNZ9OETdeO8pxepzmRHw}{172.19.0.3}{172.19.0.3:9300}{ml.machine_memory=33728278528, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [4E_A_7z][172.19.0.2:9300][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:140) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:?]
at java.lang.Thread.run(Thread.java:748) ~[?:?]
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [0] files with total size of [0b]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:337) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:?]
at java.lang.Thread.run(Thread.java:748) ~[?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [uLAJsY1][172.19.0.3:9300][internal:index/shard/recovery/prepare_translog]
Caused by: java.lang.IllegalStateException: commit doesn't contain history uuid
at org.elasticsearch.index.engine.InternalEngine.loadHistoryUUID(InternalEngine.java:493) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:193) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:157) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:2152) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:2134) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1341) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.openEngineAndSkipTranslogRecovery(IndexShard.java:1305) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.RecoveryTarget.prepareForTranslogOperations(RecoveryTarget.java:366) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:403) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:397) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:246) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:304) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1592) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:844) ~[?:?]
[2018-06-20T21:38:11,669][WARN ][o.e.i.c.IndicesClusterStateService] [uLAJsY1] [[.watches][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [.watches][0]: Recovery failed from {4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true} into {uLAJsY1}{uLAJsY1xT5yhCUzAvNa8ag}{J4vNZ9OETdeO8pxepzmRHw}{172.19.0.3}{172.19.0.3:9300}{ml.machine_memory=33728278528, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [4E_A_7z][172.19.0.2:9300][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:140) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:?]
at java.lang.Thread.run(Thread.java:748) ~[?:?]
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [0] files with total size of [0b]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:337) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:?]
at java.lang.Thread.run(Thread.java:748) ~[?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [uLAJsY1][172.19.0.3:9300][internal:index/shard/recovery/prepare_translog]
Caused by: java.lang.IllegalStateException: commit doesn't contain history uuid
at org.elasticsearch.index.engine.InternalEngine.loadHistoryUUID(InternalEngine.java:493) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:193) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:157) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:2152) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:2134) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1341) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.openEngineAndSkipTranslogRecovery(IndexShard.java:1305) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.RecoveryTarget.prepareForTranslogOperations(RecoveryTarget.java:366) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:403) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:397) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:246) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:304) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1592) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:844) ~[?:?]
[2018-06-20T21:38:11,681][INFO ][o.e.x.m.e.l.LocalExporter] waiting for elected master node [{4E_A_7z}{4E_A_7zATUu6ebxzJFhMrg}{JxDu4xcyTWKdshEZqUgKQw}{172.19.0.2}{172.19.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}] to setup local exporter [default_local] (does it have x-pack installed?)
--- cut, Elasticsearch never seems to recover from this ---
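To find which indices are stuck after this failure, one option is the `_cat/shards` API. The following is a minimal sketch, not an official tool. The function name `unassigned_indices` is hypothetical, and it assumes the output was requested with exactly the columns `h=index,shard,prirep,state`, in that order. It prints each index that has at least one `UNASSIGNED` shard once:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the output of
#   curl -s '127.0.0.1:9200/_cat/shards?h=index,shard,prirep,state'
# on stdin (assumes exactly these four columns, in this order) and prints
# each index that has at least one UNASSIGNED shard, once, sorted.
unassigned_indices() {
  # Column 4 is the shard state; column 1 is the index name.
  # LC_ALL=C keeps the sort order independent of the user's locale.
  awk '$4 == "UNASSIGNED" { print $1 }' | LC_ALL=C sort -u
}
```

Usage: `curl -s '127.0.0.1:9200/_cat/shards?h=index,shard,prirep,state' | unassigned_indices`. In the logs above, this would be expected to list indices such as `shakespeare` and `.watches`.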
Pinging @elastic/es-distributed
This bug can happen in the following scenario.
@praseodym Thanks for reporting this bug. We are working on the fix.
This is fixed by #31506. The fix will be included in 6.3.1.
Thank you! Considering that this is a blocker for upgrades, when will 6.3.1 be released?
That's still unknown at this point. Obviously this is a serious issue. Working on it.
@bleskes @dnhatn Is there a workaround to recover from this state? What is the recommended approach once an upgrade has been affected by this bug?
Is removing the replica shards an option after upgrading, or is upgrading to 6.3.1 the only option?
@gmoskovicz A direct rolling upgrade from 5.x to 6.3 just won't work. You can do a rolling upgrade to a 6.x version before 6.3, and then to 6.3. You can also move to 6.3 with a full cluster restart, then reduce the number of replicas and bring it back up (forcing the replica data to be cleaned). PLEASE try this first. I think it should work, but it should be clear by now that this is tricky, with many moving parts.
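The "reduce the number of replicas and bring it back up" step could look roughly like the sketch below. It assumes the cluster has already been moved to 6.3 with a full cluster restart, and that `ES_URL` points at a reachable node. `ES_URL`, `replicas_body` and `recycle_replicas` are hypothetical names. The requests themselves use the standard index settings API (`index.number_of_replicas`) and the cluster health API (`wait_for_status`, `timeout`). As the comment above says, try it on a non-critical cluster first.

```shell
#!/usr/bin/env bash
# Hypothetical default: the node exposed by the reproduction script.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the JSON body for the index settings API with the given replica count.
replicas_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Drop all replicas of an index, wait until its primaries are green,
# then restore the original replica count so replicas are rebuilt from the primary.
recycle_replicas() {
  local index="$1" replicas="$2"
  curl -s -XPUT -H 'Content-Type: application/json' \
    "$ES_URL/$index/_settings" -d "$(replicas_body 0)"
  curl -s "$ES_URL/_cluster/health/$index?wait_for_status=green&timeout=5m"
  curl -s -XPUT -H 'Content-Type: application/json' \
    "$ES_URL/$index/_settings" -d "$(replicas_body "$replicas")"
}
```

Usage: `recycle_replicas shakespeare 1` restores one replica after dropping it. Pass the replica count the index had before the upgrade.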
A cleaner workaround would be to force flush the offending index, then retry the failed shard allocations:
POST /offending-index/_flush?force=true
POST /_cluster/reroute?retry_failed
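If several indices are affected, the two requests above can be scripted. This is a minimal sketch that issues exactly those calls with `curl`: a forced flush per index, then a single reroute with `retry_failed=true`. `ES_URL` and `flush_and_retry` are hypothetical names, and it assumes the affected index names are already known.

```shell
#!/usr/bin/env bash
# Hypothetical default: the node exposed by the reproduction script.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Force a flush on every index given as an argument, then ask the master
# to retry shard allocations that previously failed.
flush_and_retry() {
  local index
  for index in "$@"; do
    curl -s -XPOST "$ES_URL/$index/_flush?force=true"
  done
  curl -s -XPOST "$ES_URL/_cluster/reroute?retry_failed=true"
}
```

Usage: `flush_and_retry shakespeare .watches`. Issuing a single reroute after all flushes, rather than one per index, avoids retrying allocations before every index has been flushed.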