Build scan:
https://gradle-enterprise.elastic.co/s/wchepwmxp4rn6/tests/:x-pack:plugin:ccr:internalClusterTest/org.elasticsearch.xpack.ccr.AutoFollowIT/testCleanFollowedLeaderIndexUUIDs?expanded-stacktrace=WyIwIl0#1
Repro line:
./gradlew ':x-pack:plugin:ccr:internalClusterTest' --tests "org.elasticsearch.xpack.ccr.AutoFollowIT.testCleanFollowedLeaderIndexUUIDs" -Dtests.seed=C1F2474A919348E8 -Dtests.security.manager=true -Dtests.locale=et-EE -Dtests.timezone=America/Indiana/Marengo
Reproduces locally?:
No
Applicable branches:
7.x, master
Failure history:
It failed twice according to the build scans:
https://gradle-enterprise.elastic.co/scans/tests?search.relativeStartTime=P7D&search.timeZoneId=Europe/Madrid&tests.container=org.elasticsearch.xpack.ccr.AutoFollowIT&tests.sortField=FAILED&tests.test=testCleanFollowedLeaderIndexUUIDs&tests.unstableOnly=true
But only once according to build-stats:
https://build-stats.elastic.co/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:now-30d,mode:quick,to:now))&_a=(columns:!(_source),index:b646ed00-7efc-11e8-bf69-63c8ef516157,interval:auto,query:(language:lucene,query:testCleanFollowedLeaderIndexUUIDs),sort:!(process.time-start,desc))
Failure excerpt:
org.elasticsearch.discovery.MasterNotDiscoveredException: (No message provided)
Pinging @elastic/es-distributed (Team:Distributed)
Here is the stack trace. The interesting bit is "Expected current thread ... to not be a transport thread. Reason: [Blocking operation]":
org.elasticsearch.xpack.ccr.AutoFollowIT > testCleanFollowedLeaderIndexUUIDs FAILED
MasterNotDiscoveredException[null]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:230)
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:335)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252)
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:601)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=878, name=elasticsearch[follower2][transport_worker][T#1], state=RUNNABLE, group=TGRP-AutoFollowIT]
Caused by:
java.lang.AssertionError: Expected current thread [Thread[elasticsearch[follower2][transport_worker][T#1],5,TGRP-AutoFollowIT]] to not be a transport thread. Reason: [Blocking operation]
at __randomizedtesting.SeedInfo.seed([C1F2474A919348E8]:0)
at org.elasticsearch.transport.Transports.assertNotTransportThread(Transports.java:60)
at org.elasticsearch.common.util.concurrent.BaseFuture.blockingAllowed(BaseFuture.java:92)
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:64)
at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:76)
at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:61)
at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:55)
at org.elasticsearch.xpack.ccr.repository.CcrRepository$RestoreSession.close(CcrRepository.java:628)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:74)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:116)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:99)
at org.elasticsearch.xpack.ccr.repository.CcrRepository.lambda$restoreShard$2(CcrRepository.java:330)
at org.elasticsearch.action.ActionListener$5.onFailure(ActionListener.java:303)
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:136)
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:136)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.onCompleted(MultiChunkTransfer.java:145)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.handleItems(MultiChunkTransfer.java:133)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.access$000(MultiChunkTransfer.java:59)
at org.elasticsearch.indices.recovery.MultiChunkTransfer$1.write(MultiChunkTransfer.java:78)
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:108)
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:96)
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:84)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.addItem(MultiChunkTransfer.java:89)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.lambda$handleItems$4(MultiChunkTransfer.java:125)
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:136)
at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:183)
at org.elasticsearch.action.support.ListenerTimeouts$TimeoutableListener.onFailure(ListenerTimeouts.java:96)
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59)
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1299)
at org.elasticsearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:328)
at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:224)
at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:326)
at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:318)
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:137)
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:95)
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:700)
at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142)
at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117)
at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82)
at org.elasticsearch.transport.nio.MockNioTransport$MockTcpReadWriteHandler.consumeReads(MockNioTransport.java:296)
at org.elasticsearch.nio.SocketChannelContext.handleReadBytes(SocketChannelContext.java:228)
at org.elasticsearch.nio.BytesChannelContext.read(BytesChannelContext.java:40)
at org.elasticsearch.nio.EventHandler.handleRead(EventHandler.java:139)
at org.elasticsearch.transport.nio.TestEventHandler.handleRead(TestEventHandler.java:151)
at org.elasticsearch.nio.NioSelector.handleRead(NioSelector.java:420)
at org.elasticsearch.nio.NioSelector.processKey(NioSelector.java:246)
at org.elasticsearch.nio.NioSelector.singleLoop(NioSelector.java:174)
at org.elasticsearch.nio.NioSelector.runLoop(NioSelector.java:131)
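To make the assertion concrete: the trace shows `BaseFuture.get` calling `Transports.assertNotTransportThread` before blocking, and the offending thread's name contains `[transport_worker]`. Below is a minimal, self-contained Java sketch of that kind of guard, assuming transport threads can be recognised by that name fragment. `TransportThreadGuardDemo`, `ensureNotTransportThread`, and `runOnNamedThread` are hypothetical names for illustration, not the Elasticsearch implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch of a "no blocking on transport threads" guard.
 * Names are illustrative only; this is not the Elasticsearch Transports API.
 */
public class TransportThreadGuardDemo {

    // Assumption: transport threads are recognisable by a name fragment,
    // as in "elasticsearch[follower2][transport_worker][T#1]".
    static final String TRANSPORT_WORKER_MARKER = "[transport_worker]";

    static boolean isTransportThread(Thread t) {
        return t.getName().contains(TRANSPORT_WORKER_MARKER);
    }

    // Throws if the calling thread is a transport thread, using a message
    // shaped like the one in the stack trace.
    static void ensureNotTransportThread(String reason) {
        Thread t = Thread.currentThread();
        if (isTransportThread(t)) {
            throw new AssertionError("Expected current thread [" + t
                + "] to not be a transport thread. Reason: [" + reason + "]");
        }
    }

    // Runs a guarded "blocking" step on a thread with the given name and
    // returns the assertion message, or null if the guard passed.
    static String runOnNamedThread(String threadName) throws InterruptedException {
        AtomicReference<String> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                ensureNotTransportThread("Blocking operation");
                // A real caller would block here, e.g. future.get().
            } catch (AssertionError e) {
                failure.set(e.getMessage());
            }
        }, threadName);
        worker.start();
        worker.join();
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnNamedThread("elasticsearch[follower2][transport_worker][T#1]"));
        System.out.println(runOnNamedThread("elasticsearch[follower2][generic][T#1]"));
    }
}
```

The first call reports the assertion message; the second returns `null` because the thread name does not look like a transport thread.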
The actual failure here is:
WARNING: Uncaught exception in thread: Thread[elasticsearch[follower2][transport_worker][T#1],5,TGRP-AutoFollowIT]
java.lang.AssertionError: Expected current thread [Thread[elasticsearch[follower2][transport_worker][T#1],5,TGRP-AutoFollowIT]] to not be a transport thread. Reason: [Blocking operation]
at __randomizedtesting.SeedInfo.seed([C1F2474A919348E8]:0)
at org.elasticsearch.transport.Transports.assertNotTransportThread(Transports.java:60)
at org.elasticsearch.common.util.concurrent.BaseFuture.blockingAllowed(BaseFuture.java:92)
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:64)
at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:76)
at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:61)
at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:55)
at org.elasticsearch.xpack.ccr.repository.CcrRepository$RestoreSession.close(CcrRepository.java:628)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:74)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:116)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:99)
at org.elasticsearch.xpack.ccr.repository.CcrRepository.lambda$restoreShard$2(CcrRepository.java:330)
at org.elasticsearch.action.ActionListener$5.onFailure(ActionListener.java:303)
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:136)
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:136)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.onCompleted(MultiChunkTransfer.java:145)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.handleItems(MultiChunkTransfer.java:133)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.access$000(MultiChunkTransfer.java:59)
at org.elasticsearch.indices.recovery.MultiChunkTransfer$1.write(MultiChunkTransfer.java:78)
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:108)
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:96)
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:84)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.addItem(MultiChunkTransfer.java:89)
at org.elasticsearch.indices.recovery.MultiChunkTransfer.lambda$handleItems$4(MultiChunkTransfer.java:125)
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:136)
at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:183)
at org.elasticsearch.action.support.ListenerTimeouts$TimeoutableListener.onFailure(ListenerTimeouts.java:96)
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59)
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1299)
at org.elasticsearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:328)
at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:224)
at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:326)
at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:318)
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:137)
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:95)
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:700)
at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142)
at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117)
at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82)
at org.elasticsearch.transport.nio.MockNioTransport$MockTcpReadWriteHandler.consumeReads(MockNioTransport.java:296)
at org.elasticsearch.nio.SocketChannelContext.handleReadBytes(SocketChannelContext.java:228)
at org.elasticsearch.nio.BytesChannelContext.read(BytesChannelContext.java:40)
at org.elasticsearch.nio.EventHandler.handleRead(EventHandler.java:139)
at org.elasticsearch.transport.nio.TestEventHandler.handleRead(TestEventHandler.java:151)
at org.elasticsearch.nio.NioSelector.handleRead(NioSelector.java:420)
at org.elasticsearch.nio.NioSelector.processKey(NioSelector.java:246)
at org.elasticsearch.nio.NioSelector.singleLoop(NioSelector.java:174)
at org.elasticsearch.nio.NioSelector.runLoop(NioSelector.java:131)
at java.lang.Thread.run(Thread.java:748)
Perhaps related to @original-brownbear's latest changes?
haha, everyone so fast here
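This is a result of https://github.com/elastic/elasticsearch/pull/65921, which causes a failure callback to execute on the transport thread instead of on the generic thread pool, where it ran previously. That callback ends up in the blocking `actionGet` in `CcrRepository$RestoreSession.close`, which trips the assertion. Will open a fix PR shortly.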
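A minimal sketch of why the executing thread matters, assuming a simplified model in which an error response is received on a transport thread and its failure callback is either run inline on that thread or forked to a separate pool standing in for the generic thread pool. `ForkFailureCallbackDemo`, `deliverFailure`, and the thread names are hypothetical; this does not reproduce the change in PR 65921 or the eventual fix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/**
 * Hypothetical illustration of which thread a failure callback runs on.
 * Names are illustrative, not Elasticsearch APIs.
 */
public class ForkFailureCallbackDemo {

    // Delivers the failure inline (on the calling "transport" thread) when
    // forkTo is null, otherwise forks it to the given executor.
    static void deliverFailure(Exception e, Consumer<Exception> onFailure, ExecutorService forkTo) {
        if (forkTo == null) {
            onFailure.accept(e);                        // runs on the current thread
        } else {
            forkTo.execute(() -> onFailure.accept(e));  // runs on a pool thread
        }
    }

    // Simulates a transport thread handling an error response and returns
    // the name of the thread on which onFailure actually ran.
    static String threadThatRanCallback(boolean fork) throws InterruptedException {
        ExecutorService generic =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "node[generic][T#1]"));
        AtomicReference<String> ranOn = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        Consumer<Exception> onFailure = e -> {
            // In the failing test, the blocking RestoreSession.close() ran at this point.
            ranOn.set(Thread.currentThread().getName());
            done.countDown();
        };
        Thread transport = new Thread(
            () -> deliverFailure(new RuntimeException("remote error"), onFailure, fork ? generic : null),
            "node[transport_worker][T#1]");
        transport.start();
        transport.join();
        done.await(10, TimeUnit.SECONDS);
        generic.shutdown();
        return ranOn.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("inline: " + threadThatRanCallback(false));
        System.out.println("forked: " + threadThatRanCallback(true));
    }
}
```

In the inline case the callback reports the transport thread's name, so a blocking call there would hit a guard like the one in the trace; in the forked case it runs on the pool thread, where blocking is allowed. Forking is one common way to keep blocking work off transport threads, at the cost of an extra thread hand-off.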