Elasticsearch: High Level Rest Client throws SocketTimeoutException after being idle for sometime

Created on 9 Jul 2020  ·  11Comments  ·  Source: elastic/elasticsearch

Elasticsearch version (bin/elasticsearch --version):
7.6

JVM version (java -version):
openjdk 11.0.6 2020-01-14 LTS

OS version (uname -a if on a Unix-like system):
Centos 7 Linux localhost.localdomain 3.10.0-957.21.3.el7.x86_64

Description of the problem including expected versus actual behavior:
Using latest SpringBoot 2.3 and Java HLRC,
after client being idle for some time, I get an exception with the first request. The exception returns after the (pre-configured) connection timeout.

See more details in https://jira.spring.io/browse/DATAES-815
There is a code sample I wrote there with a workaround, catching the exception and performing retries.
Per this issue, the problem relies on the lower level RestClient, not in Spring-data-elasticsearch code.

Steps to reproduce:

  1. Create a HLRC in SpringBoot. Set connection and socket timeout to 1000 ms (for example).
  2. Execute one query (doesn't matter which).
  3. Wait for relatively long time, maybe couple of hours I think. I don't know how much time exactly.
  4. Execute another query.

A SocketTimeoutException exception occur and returned after the configured timeout.

Provide logs (if relevant):
Here are my service logs after catching the exception and performing retries:
(I used 1 ms timeout in this case, but it's the same if I use 1000 or 5000)

2020-07-09 01:48:47.290 TRACE 20048 --- [nio-8090-exec-6] o.s.data.elasticsearch.client.WIRE : [6ae6c9ba] Sending request POST /books-11111111-c929-462e-ade4-074c81643191/_search?pre_filter_shard_size=128&typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=dfs_query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true with parameters:
Request body: {"from":0,"size":100,"query":{"match_all":{"boost":1.0}},"version":true,"sort":[{"title.keyword":{"order":"desc"}}]}
2020-07-09 01:48:48.294 WARN 20048 --- [nio-8090-exec-6] .i.a.r.q.c.ElasticSearchRestClientConfig : Elasticsearch client - performing retry 1 after caught DataAccessResourceFailureException: 1 milliseconds timeout on connection http-outgoing-0 [ACTIVE]; nested exception is java.lang.RuntimeException: 1 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
2020-07-09 01:48:48.295 TRACE 20048 --- [nio-8090-exec-6] o.s.data.elasticsearch.client.WIRE : [bddb861] Sending request POST /books-11111111-c929-462e-ade4-074c81643191
/_search?pre_filter_shard_size=128&typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=dfs_query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true with parameters:
Request body: {"from":0,"size":100,"query":{"match_all":{"boost":1.0}},"version":true,"sort":[{"title.keyword":{"order":"desc"}}]}
2020-07-09 01:48:49.757 WARN 20048 --- [nio-8090-exec-6] .i.a.r.q.c.ElasticSearchRestClientConfig : Elasticsearch client - performing retry 2 after caught DataAccessResourceFailureException: 1 milliseconds timeout on connection http-outgoing-1 [ACTIVE]; nested exception is java.lang.RuntimeException: 1 milliseconds timeout on connection http-outgoing-1 [ACTIVE]
2020-07-09 01:48:49.761 TRACE 20048 --- [nio-8090-exec-6] o.s.data.elasticsearch.client.WIRE : [d7952d9] Sending request POST /books-11111111-c929-462e-ade4-074c81643191*/_search?pre_filter_shard_size=128&typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=dfs_query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true with parameters:
Request body: {"from":0,"size":100,"query":{"match_all":{"boost":1.0}},"version":true,"sort":[{"title.keyword":{"order":"desc"}}]}
2020-07-09 01:48:50.146 TRACE 20048 --- [/O dispatcher 3] o.s.data.elasticsearch.client.WIRE : [d7952d9] Received raw response: 200 OK

:CorFeatureJava High Level REST Client >bug CorFeatures

All 11 comments

Pinging @elastic/es-core-features (:Core/Features/Java High Level REST Client)

I have meet this issues too. I found the solution on apache http client tutorial website. The tutorial suggest to create a connection eviction policy. Here are the reasons for this problem(form the website).

When a connection is released back to the manager, it can be kept alive however it is unable to monitor the status of the socket and react to any I/O events. If the connection gets closed on the server side, the client side connection is unable to detect the change in the connection state (and react appropriately by closing the socket on its end).

HttpClient tries to mitigate the problem by testing whether the connection is 'stale', that is no longer valid because it was closed on the server side, prior to using the connection for executing an HTTP request. The stale connection check is not 100% reliable. The only feasible solution that does not involve a one thread per socket model for idle connections is a dedicated monitor thread used to evict connections that are considered expired due to a long period of inactivity. The monitor thread can periodically call NHttpClientConnectionManager#closeExpiredConnections() method to close all expired connections and evict closed connections from the pool. It can also optionally call NHttpClientConnectionManager#closeIdleConnections() method to close all connections that have been idle over a given period of time.

I think we need to create a default NHttpClientConnectionManager and creat the monitor thread to fix this problem when consumer don't creat customizeHttpClient.

I have create the monitor thread in my project to fix this problem.

good job

@sunny-zip , thanks for the detailed explanation.
Are you using springboot?
If so, can you share your code and how you added the monitor when configuring the RestHighLevelClient bean ?

Regardless, I believe such a monitor is to be added in the underlying infra.

Yes ,I am using springboot, there is a org.apache.http.impl.client.IdleConnectionEvictor class already defined
in current httpclient:4.5, But the connectionManager param is not NHttpClientConnectionManager, so I have to make a IdleConnectionEvictor by myself just change HttpClientConnectionManager to NHttpClientConnectionManager.

public final class IdleConnectionEvictor {

    private final PoolingNHttpClientConnectionManager connectionManager;
    private final ThreadFactory threadFactory;
    private final Thread thread;
    private final long sleepTimeMs;
    private final long maxIdleTimeMs;

    private volatile Exception exception;

    public IdleConnectionEvictor(
            final PoolingNHttpClientConnectionManager connectionManager,
            final ThreadFactory threadFactory,
            final long sleepTime, final TimeUnit sleepTimeUnit,
            final long maxIdleTime, final TimeUnit maxIdleTimeUnit) {
        this.connectionManager = Args.notNull(connectionManager, "Connection manager");
        this.threadFactory = threadFactory != null ? threadFactory : new DefaultThreadFactory();
        this.sleepTimeMs = sleepTimeUnit != null ? sleepTimeUnit.toMillis(sleepTime) : sleepTime;
        this.maxIdleTimeMs = maxIdleTimeUnit != null ? maxIdleTimeUnit.toMillis(maxIdleTime) : maxIdleTime;
        this.thread = this.threadFactory.newThread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(sleepTimeMs);
                    connectionManager.closeExpiredConnections();
                    if (maxIdleTimeMs > 0) {
                        connectionManager.closeIdleConnections(maxIdleTimeMs, TimeUnit.MILLISECONDS);
                    }
                }
            } catch (final Exception ex) {
                exception = ex;
            }

        });
    }

    public IdleConnectionEvictor(
            final PoolingNHttpClientConnectionManager connectionManager,
            final long sleepTime, final TimeUnit sleepTimeUnit,
            final long maxIdleTime, final TimeUnit maxIdleTimeUnit) {
        this(connectionManager, null, sleepTime, sleepTimeUnit, maxIdleTime, maxIdleTimeUnit);
    }

    public IdleConnectionEvictor(
            final PoolingNHttpClientConnectionManager connectionManager,
            final long maxIdleTime, final TimeUnit maxIdleTimeUnit) {
        this(connectionManager, null,
                maxIdleTime > 0 ? maxIdleTime : 5, maxIdleTimeUnit != null ? maxIdleTimeUnit : TimeUnit.SECONDS,
                maxIdleTime, maxIdleTimeUnit);
    }

    public void start() {
        thread.start();
    }

    public void shutdown() {
        thread.interrupt();
    }

    public boolean isRunning() {
        return thread.isAlive();
    }

    public void awaitTermination(final long time, final TimeUnit tunit) throws InterruptedException {
        thread.join((tunit != null ? tunit : TimeUnit.MILLISECONDS).toMillis(time));
    }

    static class DefaultThreadFactory implements ThreadFactory {

        @Override
        public Thread newThread(final Runnable r) {
            final Thread t = new Thread(r, "Connection evictor");
            t.setDaemon(true);
            return t;
        }

    };
}

Next is added the monitor when configuring the RestHighLevelClient bean:

    RestClientBuilder builder = RestClient.builder(httpHosts);
    builder.setHttpClientConfigCallback((httpClientBuilder) -> {
                    ConnectingIOReactor ioReactor = null;
                    try {
                        ioReactor = new DefaultConnectingIOReactor();
                    } catch (IOReactorException e) {
                        e.printStackTrace();
                    }
                    connectionManager = new PoolingNHttpClientConnectionManager(ioReactor);
                    connectionManager.setMaxTotal(30);
                    connectionManager.setDefaultMaxPerRoute(5);
                    httpClientBuilder.setConnectionManager(connectionManager);

                    return httpClientBuilder;
                }
        );
        return new RestHighLevelClient(builder);

The last is start the evictor:

            new IdleConnectionEvictor(connectionManager,
                    1800, TimeUnit.SECONDS,
                    1800, TimeUnit.SECONDS).start();

@oridool I agree with you, I can make a PR after manager agree.

Hi @sunny-zip ,
I'm initializing my RHLC in a much simpler way, using the springboot-data-elasticsearch methodology, and I have no access to the lower level objects.
Which means that in order to implement something like suggested above, I will have to rewrite almost all my client initialization and bypass springboot way. I'd like to avoid that, and have a fix in the lower level infrastructure.

Here is how I currently initialize my client:

    @Bean(destroyMethod = "close")
    @Override
    public RestHighLevelClient elasticsearchClient() {
        ClientConfiguration clientConfiguration;
        ClientConfiguration.TerminalClientConfigurationBuilder terminalClientConfigurationBuilder;
        if (esSchema.equalsIgnoreCase("https")) {
            terminalClientConfigurationBuilder =
                    ClientConfiguration.builder()
                    .connectedTo(esHost + ":" + esPort)
                    .usingSsl();
        } else if (esSchema.equalsIgnoreCase("http")){
            terminalClientConfigurationBuilder = ClientConfiguration.builder()
                    .connectedTo(esHost + ":" + esPort);
        } else {
            throw new IllegalArgumentException("Illegal property value for spring.elasticsearch.schema, expected 'http' or 'https'");
        }

        if (esUser != null && esPassword !=null && !esUser.isEmpty() && !esPassword.isEmpty()) {
            terminalClientConfigurationBuilder.withBasicAuth(esUser, esPassword);
        }
        clientConfiguration = terminalClientConfigurationBuilder
                .withConnectTimeout(connectTimeout).withSocketTimeout(connectTimeout)
                .build();

        RestHighLevelClient client = RestClients.create(clientConfiguration).rest();
        log.warn("Elasticsearch client created for {}://{}", esSchema, clientConfiguration.getEndpoints().get(0).toString());
        return client;
    }

After some debugging, it seems that HLRC is using the HttpAsyncClientBuilder to build the underlying Apache http client, while the built-in IdleConnectionEvictor only exists in HttpClientBuilder and is missing in HttpAsyncClientBuilder :

https://github.com/apache/httpcomponents-client/blob/f268d192d484c81e933d228ce9e9b630de008d90/httpclient5/src/main/java/org/apache/hc/client5/http/impl/classic/HttpClientBuilder.java#L980

So I wonder if that's a bug or limitation in the underlying HttpAsyncClientBuilder and related CloseableHttpAsyncClient, or is there something missing in the RestHighLevelClient initialization.

This seems to be a known issue in the underlying HttpAsyncClient https://stackoverflow.com/questions/13032070/do-i-need-to-close-expired-connections-in-apache-httpasyncclient

The reason i believe these dropped connections don't show up in netstat is because the underlying NIO channel and socket will still get closed and released.

You can also try to limit the duration of keep alive to 1 hour, it works for me:

RestClientBuilder builder = RestClient.builder(httpHosts);
builder.setHttpClientConfigCallback(httpAsyncClientBuilder -> httpAsyncClientBuilder.setKeepAliveStrategy((response, context) -> 3600000/* 1hour */));

Was this page helpful?
0 / 5 - 0 ratings