This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at [email protected] or in our slack channel.
Installation details
Scylla version (or git commit hash): latest docker
Cluster size: 2
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu host, scylla docker image
Platform (physical/VM/cloud instance type/docker): docker
Hardware: sockets= cores=1* hyperthreading= memory=1GB*
Disks: (SSD/HDD, count)
*I've used small VMs for my test example below, but see the same issue on hosts with 16GB RAM and 8 cores.
TL;DR
I am unable to connect to a Scylla cluster on port 9042 (and others) when the nodes run as Docker containers on a Docker swarm in an overlay network and I connect from another computer that is not part of the swarm, i.e. by using the host IP addresses/names. Ports are forwarded to the hosts successfully for other containers attached to the same overlay network.
Hi,
I seem to be having an issue in being able to connect to Scylla in a Docker overlay network on a Docker swarm host. Here is the situation:
A Windows 10 machine running VirtualBox
Two Ubuntu Server 18.04.2 VMs, called ubuntu1 and ubuntu2 and with static IPs 192.168.0.70 and 192.168.0.71 respectively.
Both VMs have had docker installed using the instructions at https://docs.docker.com/install/linux/docker-ce/ubuntu/, ie:
sudo apt update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt update
sudo apt-get install docker-ce docker-ce-cli containerd.io
Set up so that Docker doesn't have to be run using sudo and starts on restart:
sudo usermod -aG docker $USER
sudo systemctl enable docker
Log out and back in again to pick up the new group, and repeat all the steps on the other VM.
Docker swarm manager installed on ubuntu1 by running:
docker swarm init
Join the second VM to the swarm by running, on ubuntu2, the "docker swarm join" command that "docker swarm init" prints on ubuntu1 (not written here as it varies, but it will start with "docker swarm join").
Create an overlay network called "testnet" that allows containers on different hosts to communicate:
docker network create -d overlay --attachable --subnet 10.0.0.0/16 testnet
If I then run a container using the "tomcat" docker image as a test:
docker run --name tomcat --hostname tomcat --network testnet -p 8080:8080 -d tomcat
I find that I can reach the Tomcat site root at http://192.168.0.70:8080 - i.e. via the VM address rather than the container's address on the overlay network. I can also successfully telnet to 192.168.0.70, port 8080 from the Windows machine, as expected.
I then tried running a scylla node on each VM, with the second node pointing at the first as a seed. The containers use the same network as the Tomcat container did, which allows connections across the two hosts:
(On ubuntu1)
docker run --name scylla-node1 --hostname scylla-node1 --network testnet --ip 10.0.0.250 -p 9042:9042 -p 9160:9160 -p 10000 -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9180:9180 -d scylladb/scylla
(On ubuntu2)
docker run --name scylla-node2 --hostname scylla-node2 --network testnet --ip 10.0.0.251 -p 9042:9042 -p 9160:9160 -p 10000 -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9180:9180 -d scylladb/scylla --seeds="scylla-node1"
With a Scylla node running on each host, I find the following:
Running cqlsh scylla-node1 from the second node connects to the first without any issues, so connections within the overlay network across the two hosts are working. No errors appear in the logs on either node.
I can connect via Telnet on the Windows host to port 9180 on each VM - i.e. I can run telnet 192.168.0.70 9180 and it connects successfully.
I CAN'T connect to port 9042 via telnet from the Windows machine using the VM addresses, i.e. "telnet 192.168.0.70 9042" does NOT work, and a Spark application that is set to connect to that IP address and port cannot connect to Scylla either.
I also cannot connect via telnet on the Windows host to the following ports on the ubuntu VMs: 7000, 7001, 7199, 9100, 9160. The only one that connects successfully is 9180.
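The manual telnet checks above can be repeated in one pass with a small helper. This is only a sketch: `probe_ports` is a hypothetical name, it relies on bash's `/dev/tcp` redirection and on `timeout` from coreutils, and it assumes it is run from a machine outside the swarm that has bash available. It reports "closed" both for refused connections and for connections that time out.

```shell
#!/usr/bin/env bash
# Probe TCP ports on a host and report which ones accept a connection.
# probe_ports is a hypothetical helper, not part of Docker or Scylla.
probe_ports() {
  local host="$1"
  shift
  local port
  for port in "$@"; do
    # The child bash opens the connection on fd 3 and closes it on exit;
    # timeout bounds the wait when packets are silently dropped.
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port open"
    else
      echo "$host:$port closed"
    fi
  done
}
```

For the setup described here it could be called as `probe_ports 192.168.0.70 7000 7001 7199 9042 9160 9180`, and again with 192.168.0.71 for the second VM.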
As a next step I realised that I might need to override the scylla.yaml file, in particular the listen_address, rpc_address, broadcast_address, and broadcast_rpc_address settings. I copied the file to "/home/matt" on each host and mounted it when running the containers via the following commands:
docker run --name scylla-node1 --hostname scylla-node1 --network testnet --ip 10.0.0.250 -p 9042:9042 -p 9160:9160 -p 10000 -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9180:9180 -v /home/matt/scylla.yaml:/etc/scylla/scylla.yaml -d scylladb/scylla
docker run --name scylla-node2 --hostname scylla-node2 --network testnet --ip 10.0.0.251 -p 9042:9042 -p 9160:9160 -p 10000 -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9180:9180 -v /home/matt/scylla.yaml:/etc/scylla/scylla.yaml -d scylladb/scylla --seeds="scylla-node1"
Obviously, each host (and therefore each container) has a different copy of the file, with the appropriate values put into the settings I've mentioned, depending on which container it is for.
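To make the per-node differences explicit, the four settings can be generated from the addresses chosen for each container. The helper below is a hypothetical sketch that only prints the four keys named above; the values passed to it are whatever combination is being tried, not a known-good configuration, and the printed lines would have to replace the matching entries in each host's copy of scylla.yaml rather than be appended to it.

```shell
# Print the four address settings for one node as scylla.yaml lines.
# Arguments, in order: listen, rpc, broadcast and broadcast-rpc address.
# render_address_settings is a hypothetical helper name.
render_address_settings() {
  printf 'listen_address: %s\n' "$1"
  printf 'rpc_address: %s\n' "$2"
  printf 'broadcast_address: %s\n' "$3"
  printf 'broadcast_rpc_address: %s\n' "$4"
}
```

For example, `render_address_settings 10.0.0.250 10.0.0.250 10.0.0.250 192.168.0.70` prints one candidate combination for scylla-node1.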
I've tried different combinations of the host IPs, hostnames, container IPs and container names for each of the settings, but still can't connect to Scylla on port 9042 (or the other ports I mentioned) via the host once the containers are running. I don't think the issue is down to the scylla.yaml settings, because in most cases the two containers CAN talk to each other; it's only when I try to connect via the host's IP that I can't connect. As mentioned at the beginning, I've tested a tomcat container on the same overlay network and can connect to port 8080 just fine using the same technique, so the network itself IS allowing forwarding from the host to the containers in the network. I can therefore only put it down to an issue with the Scylla image itself. I should also note that if I start a single Scylla node on one of the hosts using the default bridge network, I can connect to 9042 without any issues using the host IP.
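Since 9180 is reachable through the host and 9042 is not, one thing that may help narrow this down is comparing the local address each port is bound to inside the container: a socket bound to a single address only accepts connections that arrive for that address. The filter below is a sketch that reads `ss -tln` output on stdin; `listeners_on_port` is a hypothetical name, and whether `ss` is available inside the scylladb/scylla image is an assumption.

```shell
# Read `ss -tln` output on stdin and print the local address of every
# listening TCP socket on the given port.
# listeners_on_port is a hypothetical helper name.
listeners_on_port() {
  awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { print $4 }'
}
```

It could be used as `docker exec scylla-node1 ss -tln | listeners_on_port 9042`, then again with 9180 for comparison.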
Happy to provide any additional information as required. If it is a configuration problem, I'd be very happy to know what I'm doing wrong!
Thanks!
Is User Request really an appropriate label for this, since I should be able to connect to Scylla in this configuration? This looks to be a bug rather than a request.
@penberg can you please review this
@penberg if you can check this out and help (even for the next user that has the issue)
This is affecting us on https://github.com/bionicles/coronavirus.
If I understand the scenario correctly, connections are working fine from the Windows host to the Docker containers except to port 9042, which is the CQL protocol -- even when listen-address is configured in the YAML, right?
One possible explanation might be that if the --listen-address command line option is not configured, we default it to hostname -i, even overriding whatever is in YAML:
https://github.com/scylladb/scylla/blob/master/dist/docker/redhat/scyllasetup.py#L93
What's the output of the following?
$ docker logs some-scylla 2>&1 | grep "command used"
As a workaround, try configuring --listen-address, --rpc-address, --broadcast-address, and --broadcast-rpc-address from the Docker image command line.
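A sketch of what that could look like with the container names and overlay network from the report. `scylla_run_cmd` is a hypothetical helper that only prints the command so it can be reviewed before running; the address values are placeholders to be chosen per node, and only port 9042 is published here for brevity.

```shell
# Print a `docker run` command that passes the four address options to the
# Scylla image explicitly. Arguments: container name, overlay IP, then the
# listen, rpc, broadcast and broadcast-rpc addresses.
# scylla_run_cmd is a hypothetical helper name.
scylla_run_cmd() {
  local name="$1" ip="$2"
  echo docker run --name "$name" --hostname "$name" \
    --network testnet --ip "$ip" -p 9042:9042 -d scylladb/scylla \
    --listen-address "$3" --rpc-address "$4" \
    --broadcast-address "$5" --broadcast-rpc-address "$6"
}
```

For example, `scylla_run_cmd scylla-node1 10.0.0.250 10.0.0.250 10.0.0.250 10.0.0.250 192.168.0.70` prints one candidate command for the first node; the remaining `-p` mappings and `--seeds` from the original commands can be added in the same way.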
@penberg thanks for the response. I've had to recreate the environment, as this was logged a long time ago, but I can reproduce the problem in the new environment.
The output of the command that you suggested is the following for scylla-node1:
"/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --overprovisioned --listen-address 10.0.0.250 --rpc-address 10.0.0.250 --seed-provider-parameters seeds=10.0.0.250 --blocked-reactor-notify-ms 999999999"
I've tried running the containers with the parameters that you suggested, but unfortunately I still get the same problem.
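To see at a glance which address options a container was actually started with, the logged command line can be filtered. This is a sketch: `address_options` is a hypothetical helper, and it only handles options written as `--option value` separated by spaces, which is the form in the output quoted above. For that quoted line it lists only `--listen-address` and `--rpc-address`, both set to 10.0.0.250; neither broadcast option appears.

```shell
# Read a Scylla command line on stdin and print each address-related
# option together with its value, one per line.
# address_options is a hypothetical helper name.
address_options() {
  tr -s ' ' '\n' | awk '
    want { print opt, $0; want = 0; next }
    /^--(listen|rpc|broadcast|broadcast-rpc)-address$/ { opt = $0; want = 1 }
  '
}
```

It could be used as `docker logs scylla-node1 2>&1 | grep "command used" | address_options`, once per node and after each change to the run command.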