Elasticsearch 2.2 in Docker: exception caught on transport layer, "Message not fully read (request) for requestId"

Created on 14 Mar 2016 · 9 comments · Source: elastic/elasticsearch

Elasticsearch version: 2.2.0

JVM version: 1.8.0_45

OS version: Linux gentoo x86_64

Description of the problem including expected versus actual behavior:
I set up a single Elasticsearch node (not a cluster) in Docker and get warning logs like this: exception caught on transport layer [[id: 0x55b18076, /172.17.42.1:57474 => /172.17.0.127:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [1191646], action [cluster/nodes/info], readerIndex [39] vs expected [57]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:120)

172.17.0.127 is the IP address of the current Docker container:
b9ab5f0193cc elasticsearch # ifconfig
eth0: flags=4163 mtu 1500
inet 172.17.0.127 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:42:ac:11:00:7f txqueuelen 0 (Ethernet)
RX packets 1604 bytes 115647 (112.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1582 bytes 108924 (106.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
172.17.42.1 is the IP address of the docker0 bridge on the host:
docker0: flags=4163 mtu 1500
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 12:2e:2a:40:37:c6 txqueuelen 0 (Ethernet)
RX packets 72955032 bytes 5384429681 (5.0 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 73722264 bytes 13089516894 (12.1 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Steps to reproduce:

  1. Set up the Docker container with the command:
    docker run -it -d -e LANG=en_US.utf8 -p 9300:9300 -p 9200:9200 --name elastic xx:xx /bin/bash
    (the image is a clean Gentoo image containing only the JDK)
  2. Configure elasticsearch.yml:
    network.host: 0.0.0.0
    script.inline: on
    script.indexed: on
  3. Start Elasticsearch with the command below (a quick reachability check from the host follows this list):
    ./bin/elasticsearch -Des.insecure.allow.root=true
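
For reference, a minimal reachability check from the Docker host, assuming the -p 9200:9200 and -p 9300:9300 mappings from the docker run command above:

    # the root endpoint should answer and report version.number 2.2.0
    curl -s http://localhost:9200/
    # the nodes info API shows the transport publish_address bound on port 9300
    curl -s 'http://localhost:9200/_nodes/transport?pretty'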

Provide logs (if relevant):
b9ab5f0193cc elasticsearch # ./bin/elasticsearch -Des.insecure.allow.root=true
[2016-03-14 15:04:30,491][WARN ][bootstrap ] running as ROOT user. this is a bad idea!
[2016-03-14 15:04:30,660][INFO ][node ] [White Rabbit] version[2.2.0], pid[478], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-14 15:04:30,660][INFO ][node ] [White Rabbit] initializing ...
[2016-03-14 15:04:31,016][INFO ][plugins ] [White Rabbit] modules [lang-expression, lang-groovy], plugins [head, analysis-smartcn], sites [head]
[2016-03-14 15:04:31,027][INFO ][env ] [White Rabbit] using [1] data paths, mounts [[/opt/elasticsearch-2.2.0/data (/dev/sda2)]], net usable_space [24gb], net total_space [39.2gb], spins? [possibly], types [ext4]
[2016-03-14 15:04:31,027][INFO ][env ] [White Rabbit] heap size [989.8mb], compressed ordinary object pointers [true]
[2016-03-14 15:04:32,027][INFO ][node ] [White Rabbit] initialized
[2016-03-14 15:04:32,027][INFO ][node ] [White Rabbit] starting ...
[2016-03-14 15:04:32,070][INFO ][transport ] [White Rabbit] publish_address {172.17.0.127:9300}, bound_addresses {0.0.0.0:9300}
[2016-03-14 15:04:32,077][INFO ][discovery ] [White Rabbit] elasticsearch/sJFE90EHQPGp9xsFH39CyQ
[2016-03-14 15:04:34,985][WARN ][transport.netty ] [White Rabbit] exception caught on transport layer [[id: 0x55b18076, /172.17.42.1:57474 => /172.17.0.127:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [1191646], action [cluster/nodes/info], readerIndex [39] vs expected [57]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:120)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-03-14 15:04:35,119][INFO ][cluster.service ] [White Rabbit] new_master {White Rabbit}{sJFE90EHQPGp9xsFH39CyQ}{172.17.0.127}{172.17.0.127:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-03-14 15:04:35,130][INFO ][http ] [White Rabbit] publish_address {172.17.0.127:9200}, bound_addresses {0.0.0.0:9200}
[2016-03-14 15:04:35,130][INFO ][node ] [White Rabbit] started
[2016-03-14 15:04:35,208][INFO ][gateway ] [White Rabbit] recovered [0] indices into cluster_state
[2016-03-14 15:04:36,118][WARN ][transport.netty ] [White Rabbit] exception caught on transport layer [[id: 0x5b154af7, /172.17.42.1:57486 => /172.17.0.127:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [1191272], action [cluster/nodes/info], readerIndex [39] vs expected [57]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:120)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Labels: :Core/Infra/Transport API, discuss

All 9 comments

Hi @ukouryou

It looks like you have mixed versions of Elasticsearch running. Perhaps a 1.x client trying to talk to your 2.x cluster?
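
A minimal sketch of confirming the server-side version over the mapped HTTP port (assuming the -p 9200:9200 mapping from the report); a 1.x transport client is not wire-compatible with a 2.x node, which is exactly what produces these "Message not fully read" resets:

    # the root endpoint reports the running node's version
    curl -s http://localhost:9200/ | grep '"number"'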

@clintongormley thanks for your reply. I get the log when the Elasticsearch server starts up; I didn't talk to the server with any client.

@ukouryou it looks like you DO have a client or node hanging around somewhere. That message indicates that some other node/client tried to connect to this node and sent a malformed request. Likely it is malformed because the sender is running a different major version.
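
If more detail is needed on what the remote side is sending, a hedged sketch of raising transport logging on the node via the cluster settings API (dynamic logger settings should work on 2.x, but verify against your version):

    # temporarily raise the transport.netty logger seen in the warnings above
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": { "logger.transport.netty": "TRACE" }
    }'
    # reset it to INFO the same way once the offending connection has been captured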

@clintongormley I have scanned the whole intranet; no other node exists.
I found that if I remove network.host: 0.0.0.0 from elasticsearch.yml, it works fine:

1be54772214b elasticsearch # ./bin/elasticsearch -Des.insecure.allow.root=true
[2016-03-16 10:34:51,492][WARN ][bootstrap ] running as ROOT user. this is a bad idea!
[2016-03-16 10:34:51,650][INFO ][node ] [Amphibian] version[2.2.0], pid[407], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-16 10:34:51,650][INFO ][node ] [Amphibian] initializing ...
[2016-03-16 10:34:52,001][INFO ][plugins ] [Amphibian] modules [lang-expression, lang-groovy], plugins [head, analysis-smartcn], sites [head]
[2016-03-16 10:34:52,012][INFO ][env ] [Amphibian] using [1] data paths, mounts [[/opt/elasticsearch-2.2.0/data (/dev/sda2)]], net usable_space [23.7gb], net total_space [39.2gb], spins? [possibly], types [ext4]
[2016-03-16 10:34:52,012][INFO ][env ] [Amphibian] heap size [989.8mb], compressed ordinary object pointers [true]
[2016-03-16 10:34:52,980][INFO ][node ] [Amphibian] initialized
[2016-03-16 10:34:52,980][INFO ][node ] [Amphibian] starting ...
[2016-03-16 10:34:53,027][INFO ][transport ] [Amphibian] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2016-03-16 10:34:53,032][INFO ][discovery ] [Amphibian] elasticsearch/ndsNzXynSQCfhbty1ePJag
[2016-03-16 10:34:56,047][INFO ][cluster.service ] [Amphibian] new_master {Amphibian}{ndsNzXynSQCfhbty1ePJag}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-03-16 10:34:56,057][INFO ][http ] [Amphibian] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2016-03-16 10:34:56,057][INFO ][node ] [Amphibian] started
[2016-03-16 10:34:56,134][INFO ][gateway ] [Amphibian] recovered [0] indices into cluster_state

However, I want to access it from outside the Docker container. Any other suggestions?

@bleskes any idea what might be going on here?

I too strongly suspect there is an old 1.x client sending info requests to the cluster (we used those in 1.x to check that a node is alive).

Can you double-check the source IP, 172.17.42.1?

Note that when you don't bind to 0.0.0.0, the ES node does not accept any requests from the outside and thus can't be reached by that client.
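
A hedged sketch of tracking down that source, run on the Docker host (assuming ss from iproute2, or netstat, is installed; run as root so -p can show the owning process):

    # list established connections to the container's transport port and the process holding them
    ss -tnp | grep '172.17.0.127:9300'
    # older tooling equivalent
    netstat -tnp | grep ':9300'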

@bleskes I have a physical server. Running the ifconfig command on it gives the following output:

ifconfig
docker0: flags=4163 mtu 1500
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 0a:ea:37:dc:d9:29 txqueuelen 0 (Ethernet)
RX packets 75912870 bytes 5869903197 (5.4 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 76830962 bytes 13987928119 (13.0 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eno1: flags=4163 mtu 1500
inet 192.168.5.201 netmask 255.255.255.0 broadcast 192.168.5.255
ether 34:17:eb:c2:3d:cc txqueuelen 1000 (Ethernet)
RX packets 16581382 bytes 9437015715 (8.7 GiB)
RX errors 0 dropped 3874435 overruns 0 frame 0
TX packets 9692374 bytes 5086146362 (4.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 20 memory 0xf7100000-f7120000

lo: flags=73 mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 0 (Local Loopback)
RX packets 982 bytes 314975 (307.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 982 bytes 314975 (307.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

I have installed Docker on this physical server, and there is a Docker container named elasticsearch on it. I got the issue inside that container; 172.17.42.1 is the IP address of docker0. Did I make it clear?

I also found that it works fine with the same settings when run outside of the Docker container:

andy@andy:~/work/elasticsearch-2.2.0$ ./bin/elasticsearch -Dnetwork.host=0.0.0.0
[2016-03-17 10:52:12,591][INFO ][node ] [Maestro] version[2.2.0], pid[6771], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-17 10:52:12,592][INFO ][node ] [Maestro] initializing ...
[2016-03-17 10:52:12,903][INFO ][plugins ] [Maestro] modules [lang-expression, lang-groovy], plugins [head, analysis-smartcn], sites [head]
[2016-03-17 10:52:12,914][INFO ][env ] [Maestro] using [1] data paths, mounts [[/ (/dev/sda2)]], net usable_space [833.6gb], net total_space [908.5gb], spins? [possibly], types [ext4]
[2016-03-17 10:52:12,915][INFO ][env ] [Maestro] heap size [989.8mb], compressed ordinary object pointers [true]
[2016-03-17 10:52:13,931][INFO ][node ] [Maestro] initialized
[2016-03-17 10:52:13,931][INFO ][node ] [Maestro] starting ...
[2016-03-17 10:52:13,974][INFO ][transport ] [Maestro] publish_address {172.17.0.1:9300}, bound_addresses {[::]:9300}
[2016-03-17 10:52:13,980][INFO ][discovery ] [Maestro] elasticsearch/-3ILE2hWR3WkR3icKpYqVA
[2016-03-17 10:52:17,026][INFO ][cluster.service ] [Maestro] new_master {Maestro}{-3ILE2hWR3WkR3icKpYqVA}{172.17.0.1}{172.17.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-03-17 10:52:17,036][INFO ][http ] [Maestro] publish_address {172.17.0.1:9200}, bound_addresses {[::]:9200}
[2016-03-17 10:52:17,036][INFO ][node ] [Maestro] started
[2016-03-17 10:52:17,112][INFO ][gateway ] [Maestro] recovered [0] indices into cluster_state

Is this a clue?

I'm not a Docker user so I might be wrong here, but doesn't the -p 9300:9300 parameter mean that the port will accept connections from anywhere on your network (not only Docker containers on the same network)? The symptoms you describe match an old client trying to connect. I can't tell where it comes from, nor do I see anything pointing in another direction. Sorry, but my advice is to continue digging in that direction. Let us know if you have any new information.
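
If the transport port only needs to be reachable from the host itself, a hypothetical adaptation of the original docker run command narrows the exposure; the ip:hostPort:containerPort form of -p publishes a port on a single host interface:

    # HTTP stays reachable on the LAN, transport (9300) is published on loopback only
    docker run -it -d -e LANG=en_US.utf8 -p 192.168.5.201:9200:9200 -p 127.0.0.1:9300:9300 --name elastic xx:xx /bin/bash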

@bleskes @clintongormley you are right. I found the old client in an unused Docker container; it works fine after stopping that container. Thanks for your help.
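
For anyone hitting the same symptom, a quick way to hunt for such a stale container on the Docker host (the container name below is a placeholder):

    docker ps -a              # list all containers, including stopped ones
    docker stop <container>   # stop the one still running the old 1.x client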
