Hey there. I'm using docker-machine to build a Swarm cluster on DigitalOcean droplets. I want to configure it to use DigitalOcean's private interface (eth1) instead of the public one (eth0).
Here is a quick script to illustrate the problem:
#!/bin/bash
export DIGITALOCEAN_ACCESS_TOKEN=.......
export DIGITALOCEAN_PRIVATE_NETWORKING=true

# remove machines left over from previous runs
docker-machine ls -q | grep test- | xargs docker-machine rm

# create the key-value store host and grab its private (eth1) address
docker-machine create -d digitalocean test-keystore
ip=$(docker-machine ssh test-keystore 'ifconfig eth1 | grep "inet addr:" | cut -d: -f2 | cut -d" " -f1')
token="consul://${ip}:8500"

# run Consul on the keystore host, bound to the private address
docker $(docker-machine config test-keystore) run -d \
    -p "${ip}:8500:8500" \
    -h "consul" \
    progrium/consul -server -bootstrap

# Swarm master, told to advertise the private interface
docker-machine create \
    -d digitalocean \
    --swarm \
    --swarm-master \
    --swarm-discovery $token \
    --engine-opt="cluster-store=$token" \
    --engine-opt="cluster-advertise=eth1:2376" \
    test-swarm-master

# Swarm node
docker-machine create \
    -d digitalocean \
    --swarm \
    --swarm-discovery $token \
    --engine-opt="cluster-store=$token" \
    --engine-opt="cluster-advertise=eth1:2376" \
    test-swarm-1
As you can see, this is a simple two-node cluster. My thought was that this should be enough for my needs, but unfortunately it doesn't work as expected.
Let's check the eth1 interface on the master node:
bash-3.2$ docker-machine ssh test-swarm-master ifconfig eth1
eth1 Link encap:Ethernet HWaddr 04:01:8e:ce:2a:02
inet addr:10.132.164.169 Bcast:10.132.255.255 Mask:255.255.0.0
inet6 addr: fe80::601:8eff:fece:2a02/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1630 errors:0 dropped:0 overruns:0 frame:0
TX packets:1597 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:369220 (369.2 KB) TX bytes:245840 (245.8 KB)
Let's try docker info:
bash-3.2$ docker info
Containers: 4
Images: 3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 2
test-swarm-1: 104.236.235.73:2376
 └ Status: Healthy
 └ Containers: 2
 └ Reserved CPUs: 0 / 1
 └ Reserved Memory: 0 B / 514.5 MiB
 └ Labels: executiondriver=native-0.2, kernelversion=3.13.0-71-generic, operatingsystem=Ubuntu 14.04.3 LTS, provider=digitalocean, storagedriver=aufs
test-swarm-master: 104.131.77.93:2376
 └ Status: Healthy
 └ Containers: 2
 └ Reserved CPUs: 0 / 1
 └ Reserved Memory: 0 B / 514.5 MiB
 └ Labels: executiondriver=native-0.2, kernelversion=3.13.0-71-generic, operatingsystem=Ubuntu 14.04.3 LTS, provider=digitalocean, storagedriver=aufs, type=proxy
CPUs: 2
Total Memory: 1.005 GiB
Name: 0f99e736baf7
As you can probably see, the nodes are advertised with their external IPs (for example 104.236.235.73:2376). I want that to be 10.132.164.169:2376 instead. It looks like the --engine-opt="cluster-advertise=eth1:2376" option doesn't work. I hope I'm not missing anything.
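One way to double-check whether the --engine-opt values were actually applied is to look at the daemon command line on the host (just a sketch; in this Docker release the daemon runs as "docker daemon", but the process name may differ between versions):
# the applied daemon flags should include --cluster-store and --cluster-advertise
docker-machine ssh test-swarm-master 'ps aux | grep "docker daemon" | grep -v grep'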
Additional info:
bash-3.2$ docker-machine ssh test-swarm-master ps aux | grep swarm
root 3985 0.1 2.5 26028 12736 ? Ssl 13:43 0:01 /swarm manage --tlsverify --tlscacert=/etc/docker/ca.pem --tlscert=/etc/docker/server.pem --tlskey=/etc/docker/server-key.pem -H tcp://0.0.0.0:3376 --strategy spread consul://10.132.85.138:8500
root 4069 0.0 1.0 19688 5500 ? Ssl 13:43 0:00 /swarm join --advertise 104.131.77.93:2376 consul://10.132.85.138:8500
Hm, I don't know if it's solvable immediately, given that this line:
https://github.com/docker/machine/blob/master/libmachine/provision/configure_swarm.go#L42-L48
always uses the public IP address. This is necessary if, for instance, you're connecting across clouds.
I can certainly see the use case and we'll keep it in mind in the future. I've thought about adding some way to get and use private IPs at the driver level, but doing so would be a pretty large undertaking.
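For what it's worth, you can see the same thing from the client side: Machine resolves the droplet's public address, and that's the value that ends up being advertised.
# the digitalocean driver reports the public address as the machine IP,
# so this prints 104.131.77.93 for the master in the example above
docker-machine ip test-swarm-master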
Thanks, @nathanleclaire! Do you have any workarounds in mind? I tried removing the swarm-agent container and putting it back with --advertise ${internalIP}, but the node doesn't rejoin the cluster, unfortunately :(
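For reference, this is roughly what I tried (just a sketch; swarm-agent is the container name Machine creates by default, and $token is the Consul URL from my script above):
# point the docker client at the node and look up its private (eth1) address
eval "$(docker-machine env test-swarm-1)"
ip=$(docker-machine ssh test-swarm-1 'ifconfig eth1 | grep "inet addr:" | cut -d: -f2 | cut -d" " -f1')
# drop the join container Machine created and start one that advertises eth1
docker rm -f swarm-agent
docker run -d --restart=always --name swarm-agent \
    swarm join --advertise "${ip}:2376" "$token"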
@wildsurfer Try that same move (re-run Swarm container with different IP) using a discovery service other than the Hub token (e.g. Consul). I know that Hub tokens have had a lot of issues with Swarm nodes leaving and re-joining for me in the past.
@nathanleclaire in the example above I'm already using Consul... so in theory the node should rejoin, right?
This issue is related to #2687
@nathanleclaire I understand that it is required for nodes connecting across clouds, but that doesn't really justify taking the option away from users, does it?
I want to elaborate a bit on the issues I encountered in this context.
Ideally, I would want all the Swarm and engine traffic between the droplets to go over the private network, with the public interfaces used only for access from the outside.
Unfortunately, there is no option to use the private networking for the Swarm communication (which, by the way, makes the --digitalocean-private-networking command line parameter kind of pointless).
In order to address this, I created non-Swarm hosts, configured them manually, and then adjusted config.json so that Machine would still treat them as Swarm hosts. I then reconfigured each node by relaunching the manager/agent containers (roughly as sketched below). Unfortunately, they stopped connecting to each other because the TLS certificates are generated only for the public IP addresses, and are therefore invalid on the private network.
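To make that concrete, the manager relaunch looked roughly like this (a sketch only; the container name, port and TLS flags mirror the swarm-agent-master container Machine normally creates, and CONSUL_PRIVATE_IP stands in for my key-value store's private address):
eval "$(docker-machine env test-swarm-master)"
# replace the manager so it reaches the discovery store over the private network
docker rm -f swarm-agent-master
docker run -d --restart=always --name swarm-agent-master \
    -p 3376:3376 \
    -v /etc/docker:/etc/docker \
    swarm manage --tlsverify \
        --tlscacert=/etc/docker/ca.pem \
        --tlscert=/etc/docker/server.pem \
        --tlskey=/etc/docker/server-key.pem \
        -H tcp://0.0.0.0:3376 \
        --strategy spread \
        "consul://${CONSUL_PRIVATE_IP}:8500"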
I could now go and manually generate new certificates for all the hosts, but that kind of defeats the purpose: if I go that far, I can just set up the Docker Swarm from scratch without using Machine, which would be pretty much the same amount of work. Unfortunately, this is not very scalable or easy to maintain.
Keep in mind that DigitalOcean bills you for public bandwidth, while traffic on the private network is free, so there is a real economic incentive to get this working.
> Unfortunately, they stopped connecting to each other because the TLS certificates are generated only for the public IP addresses
Is it possible to create one certificate for two IP addresses? It seems to me that this could be a first step toward solving this issue.
@wildsurfer It is indeed possible to create a certificate that is valid for more than one name, including IP addresses. I'm not sure that's the best approach, though.
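For illustration, something along these lines with openssl produces a server certificate that covers both addresses (a sketch only: the file names are assumptions, ca.pem/ca-key.pem are the CA Machine already generated for the host, and the two IPs are the master's public and private addresses from the output above):
# key and certificate signing request for the host
openssl genrsa -out server-key.pem 2048
openssl req -new -key server-key.pem -subj "/CN=test-swarm-master" -out server.csr
# list both the public and the private address as subject alternative names
echo "subjectAltName = IP:104.131.77.93,IP:10.132.164.169" > extfile.cnf
# sign the request with the existing CA
openssl x509 -req -in server.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
    -days 365 -extfile extfile.cnf -out server.pem
Note that Machine would overwrite these the next time it reprovisions the host, so this is a stopgap at best.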
It seems to me like the DigitalOcean driver would need a bit of refactoring to implement this cleanly. Can we agree on a common approach before I potentially start working on anything? I think the cleanest way to do this would be the flags listed below. I don't like overly long command line options, though; does someone have good suggestions for the naming?
> one flag to enable private networking on the created node
Don't we have one already (DIGITALOCEAN_PRIVATE_NETWORKING)?
> one flag to advertise the private address on masters
> one flag to advertise the private address on nodes
What if we use one flag for both situations? PRIVATE_ADVERTISE=true (or --private-advertise)
> one flag to bind the master to the public/private/all interfaces
BIND_TO=all (see the sketch below for how usage might look)
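Just to make the naming discussion concrete, usage might end up looking something like this (purely hypothetical; none of these flags exist yet):
# hypothetical flags only -- not implemented in docker-machine
docker-machine create -d digitalocean \
    --digitalocean-private-networking \
    --swarm --swarm-master \
    --swarm-discovery "$token" \
    --private-advertise \
    --bind-to all \
    test-swarm-master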
Is there an update on this issue?