Elasticsearch version: 5.4.3
Plugins installed: [x-pack]
JVM version (java -version):
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux ip-10-124-1-211 4.9.32-15.41.amzn1.x86_64 #1 SMP Thu Jun 22 06:20:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
After making roughly 200 calls to the remote reindex API:
POST _reindex
{
"dest": {
"index": "some-index",
"version_type": "external"
},
"source": {
"index": "some-index",
"remote": {
"host": "http://some-server.com:9200"
}
}
}
I started seeing the following response:
{"error":{"root_cause":[{"type":"connect_exception","reason":null}],"type":"connect_exception","reason":null},"status":500}
Looking in the log, I see:
[2017-07-12T15:22:16,214][WARN ][r.suppressed ] path: /_reindex, params: {}
java.net.ConnectException: null
at org.apache.http.nio.pool.RouteSpecificPool.timeout(RouteSpecificPool.java:168) ~[?:?]
at org.apache.http.nio.pool.AbstractNIOConnPool.requestTimeout(AbstractNIOConnPool.java:561) ~[?:?]
at org.apache.http.nio.pool.AbstractNIOConnPool$InternalSessionRequestCallback.timeout(AbstractNIOConnPool.java:822) ~[?:?]
at org.apache.http.impl.nio.reactor.SessionRequestImpl.timeout(SessionRequestImpl.java:183) ~[?:?]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processTimeouts(DefaultConnectingIOReactor.java:210) ~[?:?]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:155) ~[?:?]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) ~[?:?]
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) ~[?:?]
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[?:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
If I replace the hostname in the remote server URL with its IP, it starts working again. It appears that ES is caching the connection/session to the remote host (via apache.http) and that it has gotten into an invalid state. It's also worth noting that the server is behind a load balancer (ELB), and the IP of the ELB could change over time...
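One way to check whether the hostname really resolves to different IPs over time is to snapshot the resolved addresses and compare them later. The following is only a minimal diagnostic sketch, not part of Elasticsearch or its REST client: the class name `ResolveCheck` and its helper methods are hypothetical, and it uses only the standard `java.net.InetAddress` API. Because the JVM caches lookups, the comparison is most meaningful between separate runs (for example, a few minutes apart).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not an Elasticsearch class.
final class ResolveCheck {

    // Returns the sorted set of IP addresses the host currently resolves to.
    public static Set<String> resolve(String host) throws UnknownHostException {
        Set<String> ips = new TreeSet<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            ips.add(address.getHostAddress());
        }
        return ips;
    }

    // True when the two snapshots differ, i.e. the hostname now points elsewhere.
    public static boolean changed(Set<String> before, Set<String> after) {
        return !before.equals(after);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass the remote host name as the first argument; "localhost" is only a placeholder.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " resolves to " + resolve(host));
    }
}
```

Running it against the ELB hostname at intervals and comparing the printed sets would show whether the addresses have rotated while the node kept using an old one.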
Lovely. I'm initially classifying this as a bug in the low level rest client because that is the thing we use to make that connection. That is the bit that'd have to modify how it sets up the http client.
I see this same message if the Elasticsearch instance is stopped and restarted. The REST client will reconnect and even write to Elasticsearch, but these errors continue to be thrown. I am calling the bulk API.
I ran into this same error as well. The issue appears to have arisen when a node left the cluster (and was subsequently replaced) during the reindexing process. Now any subsequent calls to the reindex API return the same error.
I experience the same problem on 6.1.3 while reindexing from an AWS-managed cluster to a self-managed one.
After each reindex I have to restart the service in order to issue another reindex.
I'm seeing the same errors when trying to migrate from 2.x to 5.x. We're not using a hostname but an IP address.
I'm seeing this error on an AWS hosted ES 6.2 instance. But it's not specific to re-indexing. I'm encountering it while running integration tests, and in the process am deleting and re-creating my test indexes.
Before switching to the ES REST client API, I was just constructing the headers/body by hand and calling ES via HttpURLConnection, but that was just while we were experimenting. It was working fine! But of course, it's not scalable to do things that way, so I switched to the REST API.
It seems this bug is taking a very long time to get addressed? The first report on this was 10 months ago.
I think it's hindering the use of Amazon ELBs as the remote address for reindex operations, since the IPs behind an ELB keep changing.
We hit this issue trying to reindex a logging cluster (ES 5.6.2) sitting behind an ELB to a new ES 6.3.2 cluster, and we have had to restart ES on the target client node to get things sorted. Let's hope this gets fixed sometime soon, as tracking down this issue was a doozy.
I fixed this problem by setting networkaddress.cache.ttl=60 (was 0 - never expire) in $JAVA_HOME/jre/lib/security/java.security. My ES cluster was behind AWS ELB which changes IP often.
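For a standalone JVM application (for example, one that talks to the cluster through the REST client), the same security property can also be set in code instead of editing java.security. This is a minimal sketch under that assumption; the class name `DnsCacheTtl` and the method `setPositiveTtl` are hypothetical, while `java.security.Security.setProperty` is the standard JDK call. It does not apply to the Elasticsearch node itself, where editing the file as described above is the route the commenter took. For reference, the JDK networking documentation describes -1 as "cache forever" and 0 as "do not cache".

```java
import java.security.Security;

// Hypothetical helper for illustration; not an Elasticsearch class.
final class DnsCacheTtl {

    // Sets the JVM-wide TTL (in seconds) for successful DNS lookups
    // and returns the value the JVM now reports for the property.
    public static String setPositiveTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
        return Security.getProperty("networkaddress.cache.ttl");
    }

    public static void main(String[] args) {
        // 60 seconds mirrors the value from the comment above.
        // This generally needs to run before the JVM performs its first lookup,
        // since the cache policy is read when the networking classes initialize.
        System.out.println("networkaddress.cache.ttl = " + setPositiveTtl(60));
    }
}
```

Calling this at the very start of `main`, before any client is created, is the safest placement.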