When I send a query that fails on the server side, I get a SearchPhaseExecutionException with the description "Failed to execute phase [query], all shards failed; shardFailures {[JuFndtmqRW2...".
Unfortunately, the thrown SearchPhaseExecutionException is built inside an async action and carries no stack trace information about where the failing query was issued. All we get is the following exception stack trace:
Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to execute phase [query], all shards failed; shardFailures {[JuFndtmqRW2C4kCzvLZ0SQ][xxx][0]: ....
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:237) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onFailure(TransportSearchTypeAction.java:183) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.search.action.SearchServiceTransportAction$6.handleException(SearchServiceTransportAction.java:249) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.transport.netty.MessageChannelHandler.handleException(MessageChannelHandler.java:190) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:180) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:130) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[elasticsearch-1.7.1.jar:na]
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[elasticsearch-1.7.1.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_51]
The above information is useless for debugging; I believe the call stack showing where the query request was created and invoked is the most important information for developers.
Additionally, our cluster runs version 1.6.2, and I first noticed this issue with the matching client library version; I then tried the newest client version, 1.7.1, with the same result.
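As a stop-gap, the call site can be recorded by wrapping each synchronous call in a small helper that rethrows on the calling thread; a rough sketch of such a helper (our own hypothetical code, not an Elasticsearch API):

    // Hypothetical workaround helper, not part of the Elasticsearch client API:
    // constructing the RuntimeException on the calling thread captures the
    // caller's stack frames, and the original server-side failure is preserved
    // as the cause.
    static SearchResponse searchWithCallSite(SearchRequestBuilder builder) {
        try {
            return builder.execute().actionGet();
        } catch (ElasticsearchException e) {
            throw new RuntimeException("search failed, see cause for server-side detail", e);
        }
    }

This at least makes the trace point at the code that issued the request, but it should not be necessary.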
Hi @xzer
From the above, it looks like the stack trace is missing the real reason for the failure, which presumably was reported either in the response to the request or somewhere else in the logs. 2.0 has made a number of changes to exception reporting and stack trace logging. Could you give 2.0.0-beta1 a go and let us know if it addresses your needs?
The exception message contains the response body describing what went wrong on the shard side; I replaced those error messages with dots in the paste. The problem is not what is wrong with the query; the point is that we cannot tell where the offending query was created in the client-side source.
I will try 2.0 later.
Sorry for replying to this issue so late. I am now migrating my program to version 2.0, and when I hit server-side query errors I can indeed get the real stack information showing where the offending query was created.
So I think this issue can be closed, since 2.0 no longer has this problem.
thanks for letting us know @xzer
I have to report that this issue is not completely fixed; there are still places that lack calling-site stack trace information.
The following is what I get when there is an error in my query:
Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onFirstPhaseResult(AbstractSearchAsyncAction.java:206) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:152) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:46) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:855) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:833) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.transport.TransportService$4.onFailure(TransportService.java:387) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) ~[elasticsearch-2.3.3.jar:2.3.3]
... 3 common frames omitted
Caused by: org.elasticsearch.search.query.QueryPhaseExecutionException: Result window is too large, from + size must be less than or equal to: [10000] but was [10118]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter.
at org.elasticsearch.search.internal.DefaultSearchContext.preProcess(DefaultSearchContext.java:212) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:103) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.search.SearchService.createContext(SearchService.java:676) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:620) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:371) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376) ~[elasticsearch-2.3.3.jar:2.3.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-2.3.3.jar:2.3.3]
... 3 common frames omitted
The calling-site information is completely lost, which makes it difficult to find out where the query was performed.
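For illustration, a client-side call of roughly this shape (the index name and paging values here are made up) triggers exactly the error above, yet none of its frames appear anywhere in the reported trace:

    // Illustrative fragment only: "client" would be an existing 2.x transport
    // Client and "my-index" a made-up index name. from + size here is
    // 10000 + 118 = 10118, exceeding the default index.max_result_window of
    // 10000, so every shard rejects the request; yet the resulting
    // SearchPhaseExecutionException contains no frame pointing back to this line.
    SearchResponse response = client.prepareSearch("my-index")
            .setQuery(QueryBuilders.matchAllQuery())
            .setFrom(10000)
            .setSize(118)
            .get();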
I dug into the source and found the following logic, which may not be ideal in this respect:
elasticsearch/core/src/main/java/org/elasticsearch/action/support/AdapterActionFuture.java:79
static RuntimeException rethrowExecutionException(ExecutionException e) {
    if (e.getCause() instanceof ElasticsearchException) {
        ElasticsearchException esEx = (ElasticsearchException) e.getCause();
        Throwable root = esEx.unwrapCause();
        if (root instanceof ElasticsearchException) {
            return (ElasticsearchException) root;
        } else if (root instanceof RuntimeException) {
            return (RuntimeException) root;
        }
        return new UncategorizedExecutionException("Failed execution", root);
    } else if (e.getCause() instanceof RuntimeException) {
        return (RuntimeException) e.getCause();
    } else {
        return new UncategorizedExecutionException("Failed execution", e);
    }
}
In the above method, the original exception is unwrapped and then rethrown, which cuts the current calling thread's stack trace out of the exception cause chain. I believe a better way is to simply wrap the exception in a new RuntimeException, as follows:
static RuntimeException rethrowExecutionException(ExecutionException e) {
    if (e.getCause() instanceof ElasticsearchException) {
        ElasticsearchException esEx = (ElasticsearchException) e.getCause();
        Throwable root = esEx.unwrapCause();
        if (root instanceof ElasticsearchException) {
            return new RuntimeException(root);
        } else if (root instanceof RuntimeException) {
            return new RuntimeException(root);
        }
        return new UncategorizedExecutionException("Failed execution", root);
    } else if (e.getCause() instanceof RuntimeException) {
        return new RuntimeException(e.getCause());
    } else {
        return new UncategorizedExecutionException("Failed execution", e);
    }
}
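For reference, the reason wrapping helps is that a Java exception records the stack trace of the thread that constructs it, so a RuntimeException created on the calling thread captures the caller's frames while keeping the server-side failure in the cause chain. A minimal standalone sketch (plain Java, not Elasticsearch code) showing the difference:

    // Minimal sketch, not Elasticsearch code: a Java exception captures the
    // stack of the thread that constructs it, so rethrowing an exception
    // built elsewhere shows none of the caller's frames, while wrapping it in
    // a new RuntimeException records the call site and keeps the original as
    // the cause.
    public class CallSiteDemo {
        public static void main(String[] args) {
            // Simulate an exception whose trace was filled in on a remote node.
            RuntimeException remote = new RuntimeException("server-side failure");
            remote.setStackTrace(new StackTraceElement[] {
                new StackTraceElement("ServerClass", "serverMethod", "Server.java", 42)
            });

            remote.printStackTrace();                       // no main() frame at all
            new RuntimeException(remote).printStackTrace(); // main() frame, cause attached
        }
    }

With the wrapping version, the printed trace includes the caller's frame and still carries the server-side exception as the cause.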
I am not sure whether you get notifications from a closed ticket; if there is no reply within a few days, I will open a new ticket to report this issue again.
What version is this on, @xzer?
the newest 2.3.3
I am having this problem all the time, and it really causes us a lot of headaches!
Here is an example of a stack trace (5.3.1) that does not show the origin of the failure.
[REMOVED] IndexNotFoundException[no such index]
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.infe(IndexNameExpressionResolver.java:660)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.innerResolve(IndexNameExpressionResolver.java:617)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:567)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:164)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:139)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:72)
at org.elasticsearch.action.admin.indices.mapping.get.TransportGetFieldMappingsAction.doExecute(TransportGetFieldMappingsAction.java:59)
at org.elasticsearch.action.admin.indices.mapping.get.TransportGetFieldMappingsAction.doExecute(TransportGetFieldMappingsAction.java:42)
at org.elasticsearch.action.support.TransportAction.doExecute(TransportAction.java:146)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:142)
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:64)
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:54)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1519)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:110)
at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1476)
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1360)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:624)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:524)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:478)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:438)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:748)
Yes, I also noticed that this issue still exists in the recent 5.x client library...
To make this actionable, I think we need specific requests together with the specific error messages that are not useful. I am closing this issue in favor of new issues along these lines, targeted at recent versions of Elasticsearch.