I'm using testcontainers-java while building on jenkins with a docker agent, i.e. the docker "wormhole" pattern.
I'm getting the dreaded "Can not connect to Ryuk" error. (gist with full log)
2019-08-13 01:53:08.861 WARN --- [containers-ryuk] o.testcontainers.utility.ResourceReaper : Can not connect to Ryuk at 192.168.0.1:32769
java.net.ConnectException: Connection refused (Connection refused)
Here's what I have found while debugging:
First, testcontainers determines the dockerHostIp by looking for the default route in a new temporary container. This returns 192.168.0.1 in this environment.
Next, testcontainers starts the ryuk container, retrieves the mapped port, and tries to connect to it through the gateway ip.
In this environment, the connection is refused as seen in the error message above.
In this environment, docker does _not_ allow connecting from one container to another container through ports on the gateway ip.
Docker only allows using these ports from _outside_ of a docker container (e.g. connections from the docker host vm)
On the other hand, docker _does_ allow connecting from one container to another using the ip address of the desired container, with its exposed port. (e.g. 192.168.0.2:8080)
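To make the gateway-detection step concrete, here's a rough sketch of pulling the gateway ip out of default-route output. This is just my illustration of the idea, not testcontainers' actual code:

```java
import java.util.Optional;

// Hypothetical sketch (not testcontainers' actual code): extract the gateway ip
// from the `ip route` output that a temporary container would produce.
public class GatewayDetector {

    // Expects lines like: "default via 192.168.0.1 dev eth0"
    public static Optional<String> defaultGateway(String ipRouteOutput) {
        for (String line : ipRouteOutput.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 3 && parts[0].equals("default") && parts[1].equals("via")) {
                return Optional.of(parts[2]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String routes = "default via 192.168.0.1 dev eth0\n"
                + "192.168.0.0/20 dev eth0 scope link  src 192.168.0.2";
        System.out.println(defaultGateway(routes).orElse("none")); // prints 192.168.0.1
    }
}
```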
I have found that the following iptables rules (managed by docker) explicitly prevent using ports on the gateway ip from within a docker container to communicate to another container:
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
// the next rule prevents anything coming into docker0 (e.g. from a container) from reaching the port mappings for other containers
3 180 RETURN all -- docker0 any anywhere anywhere
// the next rule also only allows non-docker0 traffic (e.g. only traffic from outside a container) to reach the port mappings
0 0 DNAT tcp -- !docker0 any anywhere anywhere tcp dpt:32751 to:192.168.0.2:8080
0 0 DNAT tcp -- !docker0 any anywhere anywhere tcp dpt:32750 to:192.168.0.3:8080
Note again that these are managed by docker. I have done nothing to them.
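For illustration, the effect of that RETURN rule can be checked mechanically against the listing. This is a hypothetical helper I sketched against the output above, not anything from docker or testcontainers:

```java
// Hypothetical helper (not docker's or testcontainers' code): scan an
// `iptables -t nat -L DOCKER -v` listing for a RETURN rule on docker0,
// which makes container-originated traffic skip the per-container DNAT rules.
public class DockerNatCheck {

    public static boolean containerTrafficSkipsDnat(String dockerChainListing) {
        for (String line : dockerChainListing.split("\n")) {
            // columns: pkts bytes target prot opt in out source destination
            String[] f = line.trim().split("\\s+");
            if (f.length >= 6 && f[2].equals("RETURN") && f[5].equals("docker0")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String listing =
                "3 180 RETURN all -- docker0 any anywhere anywhere\n"
                + "0 0 DNAT tcp -- !docker0 any anywhere anywhere tcp dpt:32751 to:192.168.0.2:8080";
        System.out.println(containerTrafficSkipsDnat(listing)); // prints true
    }
}
```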
I confirmed this docker behavior outside of testcontainers by starting up a simple netcat listener container...
# docker run --rm -it -p 32751:8080 alpine:3.6 nc -lkp 8080
and trying to connect to it from another container, through the gateway ip (which fails)
# docker run --rm -it alpine:3.6 telnet 192.168.0.1 32751
telnet: can't connect to remote host (192.168.0.1): Connection refused
and then trying to connect to it from another container via the container's specific ip/port
# docker run --rm -it alpine:3.6 telnet 192.168.0.2 8080
hello
^C
Console escape. Commands are:
l go to line mode
c go to character mode
z suspend telnet
e exit telnet
So, it seems to me that when using the "docker wormhole" pattern, testcontainers should connect to ryuk using the ryuk container's ip address and exposed port, rather than the gateway ip and mapped port.
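As a sketch of what I mean (hypothetical code, not testcontainers' actual API):

```java
// Hypothetical sketch of the proposal above (not testcontainers' actual API):
// when the process itself runs inside a container on the same bridge network,
// use the target container's own ip and exposed port; otherwise fall back to
// the gateway ip and the mapped port.
public class RyukEndpoint {

    public static String choose(boolean runningInsideContainer,
                                String gatewayIp, int mappedPort,
                                String containerIp, int exposedPort) {
        if (runningInsideContainer) {
            // container-to-container traffic never hits the gateway's DNAT rules
            return containerIp + ":" + exposedPort;
        }
        return gatewayIp + ":" + mappedPort;
    }

    public static void main(String[] args) {
        // inside a container -> 192.168.0.2:8080; from the host -> 192.168.0.1:32769
        System.out.println(choose(true, "192.168.0.1", 32769, "192.168.0.2", 8080));
    }
}
```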
Another observation (separate issue) is that all of this assumes that both the source docker container and the ryuk docker container are both running on the default bridge network. Using a user-defined bridge network is not supported.
Also note that connecting from one container to another through the mapped ports on the gateway ip works on Docker for Windows. I only encounter the connection refused error in the CI environment, which is running:
# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
# docker version
Client:
Version: 18.09.3
API version: 1.39
Go version: go1.10.8
Git commit: 774a1f4
Built: Thu Feb 28 06:33:21 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.3
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: 774a1f4
Built: Thu Feb 28 06:02:24 2019
OS/Arch: linux/amd64
Experimental: false
Hi @philsttr!
We indeed need to think about accessing the containers by their IPs in environments like yours.
Meanwhile, one way of solving this should be to use "--network=host" (unless your tests assign fixed ports already)
Some users have also reported success after adjusting the firewall rules:
https://github.com/testcontainers/testcontainers-java/issues/1277#issuecomment-468306052
Thanks for the quick response @bsideup ! (I swear I'm not stalking you)
Unfortunately, I don't think using --network=host is an option for me in this environment. I'm bending the rules by using the default bridge network as is. It would be ideal if testcontainers supported using a user-defined network (maybe another feature request), as that is what our jenkins jobs use.
I also looked into adjusting firewall rules. I had found #1277 before filing this issue, but I was unable to get that to work. Docker is manually adding individual nat rules when it spins up containers (see the second comment in my iptables output) that are preventing the connections from happening. So, it's not as simple as a one-time change.
I feel like this environment is a standard docker "wormhole" pattern, and I was surprised when testcontainers didn't work in it, since the docs explicitly state that this is a supported configuration. It's just docker on linux. We're not really doing anything fancy. I would imagine that docker would be adding these same iptables rules in every environment.
Do you have any other docker-on-linux installs using the wormhole pattern and the bridge network that work?
We even run our CI with this setup (see Travis' config)
I think you have two "unusual" parameters of your setup:
1) custom networks
2) hardened firewall in CentOS/RHEL as was already discovered by our RedHat friends
I am on vacation and will come back on Monday to ping other RHEL/CentOS users. Meanwhile, you can try searching for redhat-specific issues; it was reported a couple of weeks ago.
As a small addition: I've run and supported multiple different Gitlab-CI environments using the docker-wormhole pattern, and I have never run into this issue.
So in general the docker wormhole pattern is well supported by Testcontainers, but of course certain configurations are still troublesome.
Thanks for the updates guys. I've done some additional research here.
I was able to try --network host in this environment as a test, and it worked. So, to recap, in this environment, I had the following outcomes for container-to-container communication using the gateway ip and mapped port:
--network host - ✔️ works
--network bridge - ❌ does not work
--network userdefined - ❌ does not work
I also tried some experiments on some newly created VMs locally. I tried both a centos 7.6 VM and a ubuntu 19.04 VM.
On each VM, I just installed docker using yum/apt-get, and ran the netcat/telnet test I mentioned above.
On ubuntu 19.04, --network bridge - ✔️ works.
On centos 7.6, --network bridge - ❌ does not work (here's a gist of the terminal commands used)
So, this definitely seems specific to centos. And it affects the most generic centos VM that I could create (e.g. install OS->install docker->test->fail).
I was able to get --network bridge to work on centos by inserting the iptables rule as mentioned in https://github.com/testcontainers/testcontainers-java/issues/572#issuecomment-517831833
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this is a mistake, please reply to this comment to keep it open. If there isn't one already, a PR to fix or at least reproduce the problem in a test case will always help us get back on track to tackle this.
This issue has been automatically closed due to inactivity. We apologise if this is still an active problem for you, and would ask you to re-open the issue if this is the case.