Elasticsearch version: 2.2.0
JVM version: 1.8.0_45
OS version: Linux gentoo x86_64
Description of the problem including expected versus actual behavior:
I set up a single Elasticsearch node (not a cluster) in Docker and get warning logs like this: exception caught on transport layer [[id: 0x55b18076, /172.17.42.1:57474 => /172.17.0.127:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [1191646], action [cluster/nodes/info], readerIndex [39] vs expected [57]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:120)
172.17.0.127 is the IP address of the current Docker container:
b9ab5f0193cc elasticsearch # ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.127 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:42:ac:11:00:7f txqueuelen 0 (Ethernet)
RX packets 1604 bytes 115647 (112.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1582 bytes 108924 (106.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
172.17.42.1 is the IP address of the docker0 bridge:
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 12:2e:2a:40:37:c6 txqueuelen 0 (Ethernet)
RX packets 72955032 bytes 5384429681 (5.0 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 73722264 bytes 13089516894 (12.1 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Steps to reproduce:
Provide logs (if relevant):
b9ab5f0193cc elasticsearch # ./bin/elasticsearch -Des.insecure.allow.root=true
[2016-03-14 15:04:30,491][WARN ][bootstrap ] running as ROOT user. this is a bad idea!
[2016-03-14 15:04:30,660][INFO ][node ] [White Rabbit] version[2.2.0], pid[478], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-14 15:04:30,660][INFO ][node ] [White Rabbit] initializing ...
[2016-03-14 15:04:31,016][INFO ][plugins ] [White Rabbit] modules [lang-expression, lang-groovy], plugins [head, analysis-smartcn], sites [head]
[2016-03-14 15:04:31,027][INFO ][env ] [White Rabbit] using [1] data paths, mounts [[/opt/elasticsearch-2.2.0/data (/dev/sda2)]], net usable_space [24gb], net total_space [39.2gb], spins? [possibly], types [ext4]
[2016-03-14 15:04:31,027][INFO ][env ] [White Rabbit] heap size [989.8mb], compressed ordinary object pointers [true]
[2016-03-14 15:04:32,027][INFO ][node ] [White Rabbit] initialized
[2016-03-14 15:04:32,027][INFO ][node ] [White Rabbit] starting ...
[2016-03-14 15:04:32,070][INFO ][transport ] [White Rabbit] publish_address {172.17.0.127:9300}, bound_addresses {0.0.0.0:9300}
[2016-03-14 15:04:32,077][INFO ][discovery ] [White Rabbit] elasticsearch/sJFE90EHQPGp9xsFH39CyQ
[2016-03-14 15:04:34,985][WARN ][transport.netty ] [White Rabbit] exception caught on transport layer [[id: 0x55b18076, /172.17.42.1:57474 => /172.17.0.127:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [1191646], action [cluster/nodes/info], readerIndex [39] vs expected [57]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:120)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-03-14 15:04:35,119][INFO ][cluster.service ] [White Rabbit] new_master {White Rabbit}{sJFE90EHQPGp9xsFH39CyQ}{172.17.0.127}{172.17.0.127:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-03-14 15:04:35,130][INFO ][http ] [White Rabbit] publish_address {172.17.0.127:9200}, bound_addresses {0.0.0.0:9200}
[2016-03-14 15:04:35,130][INFO ][node ] [White Rabbit] started
[2016-03-14 15:04:35,208][INFO ][gateway ] [White Rabbit] recovered [0] indices into cluster_state
[2016-03-14 15:04:36,118][WARN ][transport.netty ] [White Rabbit] exception caught on transport layer [[id: 0x5b154af7, /172.17.42.1:57486 => /172.17.0.127:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [1191272], action [cluster/nodes/info], readerIndex [39] vs expected [57]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:120)
... (remaining stack trace identical to the one above)
Hi @ukouryou
It looks like you have mixed versions of Elasticsearch running. Perhaps a 1.x client is trying to talk to your 2.x cluster?
@clintongormley thanks for your reply. I get the log when the Elasticsearch server starts up; I didn't talk to the server with any client.
@ukouryou it looks like you DO have a client or node hanging around somewhere. That message indicates that some other node/client tried to connect to this node and sent a malformed request. Likely it is malformed because it has a different major version.
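One way to track down a stray client like that (a sketch, assuming standard Linux tools are available on the host) is to watch who opens connections to the transport port:

# Show established connections to the transport port (9300) and the owning process
ss -tnp | grep :9300

# Or capture traffic on the docker0 bridge to see where the requests come from
tcpdump -i docker0 -nn port 9300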
@clintongormley I have scanned the whole intranet; no other node exists.
I found that if I remove network.host: 0.0.0.0 from elasticsearch.yml, it works OK:
1be54772214b elasticsearch # ./bin/elasticsearch -Des.insecure.allow.root=true
[2016-03-16 10:34:51,492][WARN ][bootstrap ] running as ROOT user. this is a bad idea!
[2016-03-16 10:34:51,650][INFO ][node ] [Amphibian] version[2.2.0], pid[407], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-16 10:34:51,650][INFO ][node ] [Amphibian] initializing ...
[2016-03-16 10:34:52,001][INFO ][plugins ] [Amphibian] modules [lang-expression, lang-groovy], plugins [head, analysis-smartcn], sites [head]
[2016-03-16 10:34:52,012][INFO ][env ] [Amphibian] using [1] data paths, mounts [[/opt/elasticsearch-2.2.0/data (/dev/sda2)]], net usable_space [23.7gb], net total_space [39.2gb], spins? [possibly], types [ext4]
[2016-03-16 10:34:52,012][INFO ][env ] [Amphibian] heap size [989.8mb], compressed ordinary object pointers [true]
[2016-03-16 10:34:52,980][INFO ][node ] [Amphibian] initialized
[2016-03-16 10:34:52,980][INFO ][node ] [Amphibian] starting ...
[2016-03-16 10:34:53,027][INFO ][transport ] [Amphibian] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2016-03-16 10:34:53,032][INFO ][discovery ] [Amphibian] elasticsearch/ndsNzXynSQCfhbty1ePJag
[2016-03-16 10:34:56,047][INFO ][cluster.service ] [Amphibian] new_master {Amphibian}{ndsNzXynSQCfhbty1ePJag}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-03-16 10:34:56,057][INFO ][http ] [Amphibian] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2016-03-16 10:34:56,057][INFO ][node ] [Amphibian] started
[2016-03-16 10:34:56,134][INFO ][gateway ] [Amphibian] recovered [0] indices into cluster_state
but I want to access it from outside the Docker container. Any other suggestions?
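One option to stay reachable from outside the container without exposing the transport port is to bind HTTP to all interfaces while keeping transport on loopback. A sketch for elasticsearch.yml, assuming the setting names from the 2.x network docs (double-check them for your version):

# Append to the node's config (path taken from the logs above)
cat >> /opt/elasticsearch-2.2.0/config/elasticsearch.yml <<'EOF'
http.host: 0.0.0.0
transport.host: 127.0.0.1
EOF

For a single node that only needs the REST API (and the head site plugin), this keeps port 9200 reachable while 9300 stops accepting outside connections.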
@bleskes any idea what might be going on here?
I too strongly suspect there is an old 1.x client sending info requests to the cluster (we used those in 1.x to check that a node is alive).
Can you double-check the source IP, 172.17.42.1?
Note that when you don't bind to 0.0.0.0, the ES node does not accept any requests from the outside and thus can't be reached by that client.
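A quick way to confirm that, as a sketch assuming curl is available on the host and inside the container (named elasticsearch here):

# From the host: fails while the node is bound to loopback only
curl http://172.17.0.127:9200/

# From inside the container: succeeds, because 127.0.0.1:9200 is bound there
docker exec elasticsearch curl -s http://127.0.0.1:9200/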
@bleskes I have a physical server. I ran the ifconfig command; the output is as follows:
ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 0a:ea:37:dc:d9:29 txqueuelen 0 (Ethernet)
RX packets 75912870 bytes 5869903197 (5.4 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 76830962 bytes 13987928119 (13.0 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.5.201 netmask 255.255.255.0 broadcast 192.168.5.255
ether 34:17:eb:c2:3d:cc txqueuelen 1000 (Ethernet)
RX packets 16581382 bytes 9437015715 (8.7 GiB)
RX errors 0 dropped 3874435 overruns 0 frame 0
TX packets 9692374 bytes 5086146362 (4.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 20 memory 0xf7100000-f7120000
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 0 (Local Loopback)
RX packets 982 bytes 314975 (307.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 982 bytes 314975 (307.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
I have installed Docker on this physical server, and there is also a Docker container named elasticsearch on it. I got the issue inside that container. 172.17.42.1 is the IP address of docker0. Did I make it clear?
I also found that it works OK with the same configuration outside of the Docker container:
andy@andy:~/work/elasticsearch-2.2.0$ ./bin/elasticsearch -Dnetwork.host=0.0.0.0
[2016-03-17 10:52:12,591][INFO ][node ] [Maestro] version[2.2.0], pid[6771], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-17 10:52:12,592][INFO ][node ] [Maestro] initializing ...
[2016-03-17 10:52:12,903][INFO ][plugins ] [Maestro] modules [lang-expression, lang-groovy], plugins [head, analysis-smartcn], sites [head]
[2016-03-17 10:52:12,914][INFO ][env ] [Maestro] using [1] data paths, mounts [[/ (/dev/sda2)]], net usable_space [833.6gb], net total_space [908.5gb], spins? [possibly], types [ext4]
[2016-03-17 10:52:12,915][INFO ][env ] [Maestro] heap size [989.8mb], compressed ordinary object pointers [true]
[2016-03-17 10:52:13,931][INFO ][node ] [Maestro] initialized
[2016-03-17 10:52:13,931][INFO ][node ] [Maestro] starting ...
[2016-03-17 10:52:13,974][INFO ][transport ] [Maestro] publish_address {172.17.0.1:9300}, bound_addresses {[::]:9300}
[2016-03-17 10:52:13,980][INFO ][discovery ] [Maestro] elasticsearch/-3ILE2hWR3WkR3icKpYqVA
[2016-03-17 10:52:17,026][INFO ][cluster.service ] [Maestro] new_master {Maestro}{-3ILE2hWR3WkR3icKpYqVA}{172.17.0.1}{172.17.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-03-17 10:52:17,036][INFO ][http ] [Maestro] publish_address {172.17.0.1:9200}, bound_addresses {[::]:9200}
[2016-03-17 10:52:17,036][INFO ][node ] [Maestro] started
[2016-03-17 10:52:17,112][INFO ][gateway ] [Maestro] recovered [0] indices into cluster_state
Is this a clue?
I'm not a Docker user so I might be wrong here, but doesn't the -p 9300:9300 parameter mean that that port will accept connections from anywhere on your network (not only Docker containers in the same network)? The symptoms you describe match an old client trying to connect. I can't tell where it comes from, nor do I see anything that would point in another direction. Sorry, but my advice is to continue digging in that direction. Let us know if you have any new information.
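For reference, a sketch of how -p affects exposure (the image name my-es-image is a placeholder): by default a published port is bound on all host interfaces, so anything that can reach the host can reach it; dropping or restricting the mapping keeps transport traffic inside Docker:

# Publishes 9300 on every host interface, reachable from the whole network
docker run -d -p 9200:9200 -p 9300:9300 my-es-image

# Publish only HTTP; transport stays reachable only on the Docker bridge
docker run -d -p 9200:9200 my-es-image

# Or restrict the transport mapping to the host's loopback interface
docker run -d -p 9200:9200 -p 127.0.0.1:9300:9300 my-es-image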
@bleskes @clintongormley you are right. I found the old client in an unused Docker container; it works fine after stopping that container. Thanks for your help!
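For anyone hitting the same symptom, a quick sketch of how to hunt down such a leftover container (the container name below is a placeholder):

# List running containers to spot anything old still hanging around
docker ps

# Stop the container that hosts the stale 1.x client
docker stop old-client-container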