Hey there. I'm using docker-machine to build a Swarm cluster on DigitalOcean droplets. I want to configure it to use DigitalOcean's private interface (eth1) instead of the public one (eth0).
Here is a quick script to illustrate the problem:
#!/bin/bash
export DIGITALOCEAN_ACCESS_TOKEN=.......
export DIGITALOCEAN_PRIVATE_NETWORKING=true

# remove machines left over from previous runs
docker-machine ls -q | grep test- | xargs docker-machine rm

# create the key-value store host and grab its private (eth1) address
docker-machine create -d digitalocean test-keystore
ip=$(docker-machine ssh test-keystore 'ifconfig eth1 | grep "inet addr:" | cut -d: -f2 | cut -d" " -f1')
token="consul://${ip}:8500"

# run Consul on the keystore host, bound to the private address
docker $(docker-machine config test-keystore) run -d \
    -p "${ip}:8500:8500" \
    -h "consul" \
    progrium/consul -server -bootstrap

# Swarm master, told to advertise the private interface
docker-machine create \
    -d digitalocean \
    --swarm \
    --swarm-master \
    --swarm-discovery $token \
    --engine-opt="cluster-store=$token" \
    --engine-opt="cluster-advertise=eth1:2376" \
    test-swarm-master

# Swarm node
docker-machine create \
    -d digitalocean \
    --swarm \
    --swarm-discovery $token \
    --engine-opt="cluster-store=$token" \
    --engine-opt="cluster-advertise=eth1:2376" \
    test-swarm-1
As you can see, this is a simple two-node cluster. My thought was that this should be enough for my needs, but unfortunately it doesn't work as expected.
Let's check the eth1 interface on the master node:
bash-3.2$ docker-machine ssh test-swarm-master ifconfig eth1
eth1 Link encap:Ethernet HWaddr 04:01:8e:ce:2a:02
inet addr:10.132.164.169 Bcast:10.132.255.255 Mask:255.255.0.0
inet6 addr: fe80::601:8eff:fece:2a02/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1630 errors:0 dropped:0 overruns:0 frame:0
TX packets:1597 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:369220 (369.2 KB) TX bytes:245840 (245.8 KB)
Let's try docker info:
bash-3.2$ docker info
Containers: 4
Images: 3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 2
test-swarm-1: 104.236.235.73:2376
 └ Status: Healthy
 └ Containers: 2
 └ Reserved CPUs: 0 / 1
 └ Reserved Memory: 0 B / 514.5 MiB
 └ Labels: executiondriver=native-0.2, kernelversion=3.13.0-71-generic, operatingsystem=Ubuntu 14.04.3 LTS, provider=digitalocean, storagedriver=aufs
test-swarm-master: 104.131.77.93:2376
 └ Status: Healthy
 └ Containers: 2
 └ Reserved CPUs: 0 / 1
 └ Reserved Memory: 0 B / 514.5 MiB
 └ Labels: executiondriver=native-0.2, kernelversion=3.13.0-71-generic, operatingsystem=Ubuntu 14.04.3 LTS, provider=digitalocean, storagedriver=aufs, type=proxy
CPUs: 2
Total Memory: 1.005 GiB
Name: 0f99e736baf7
As you can probably see, the nodes are advertised with their external IPs (for example 104.236.235.73:2376). I want that to be 10.132.164.169:2376 instead. It looks like the --engine-opt="cluster-advertise=eth1:2376" option doesn't work. I hope I'm not missing anything.
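One way to double-check whether the --engine-opt values were actually applied is to look at the daemon command line on the host (just a sketch; in this Docker release the daemon runs as "docker daemon", but the process name may differ between versions):
# the applied daemon flags should include --cluster-store and --cluster-advertise
docker-machine ssh test-swarm-master 'ps aux | grep "docker daemon" | grep -v grep'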
Additional info:
bash-3.2$ docker-machine ssh test-swarm-master ps aux | grep swarm
root 3985 0.1 2.5 26028 12736 ? Ssl 13:43 0:01 /swarm manage --tlsverify --tlscacert=/etc/docker/ca.pem --tlscert=/etc/docker/server.pem --tlskey=/etc/docker/server-key.pem -H tcp://0.0.0.0:3376 --strategy spread consul://10.132.85.138:8500
root 4069 0.0 1.0 19688 5500 ? Ssl 13:43 0:00 /swarm join --advertise 104.131.77.93:2376 consul://10.132.85.138:8500
Hm, I don't know if it's solvable immediately, given that this line:
https://github.com/docker/machine/blob/master/libmachine/provision/configure_swarm.go#L42-L48
always uses the public IP address. This is necessary if, for instance, you're connecting across clouds.
I can certainly see the use case and we'll keep it in mind in the future. I've thought about adding some way to get and use private IPs at the driver level, but doing so would be a pretty large undertaking.
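For what it's worth, you can see the same thing from the client side: Machine resolves the droplet's public address, and that's the value that ends up being advertised.
# the digitalocean driver reports the public address as the machine IP,
# so this prints 104.131.77.93 for the master in the example above
docker-machine ip test-swarm-master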
Thanks, @nathanleclaire! Do you have any workarounds in mind? I tried removing the swarm-agent container and putting it back with --advertise ${internalIP}, but the node doesn't rejoin the cluster, unfortunately :(
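For reference, this is roughly what I tried (just a sketch; swarm-agent is the container name Machine creates by default, and $token is the Consul URL from my script above):
# point the docker client at the node and look up its private (eth1) address
eval "$(docker-machine env test-swarm-1)"
ip=$(docker-machine ssh test-swarm-1 'ifconfig eth1 | grep "inet addr:" | cut -d: -f2 | cut -d" " -f1')
# drop the join container Machine created and start one that advertises eth1
docker rm -f swarm-agent
docker run -d --restart=always --name swarm-agent \
    swarm join --advertise "${ip}:2376" "$token"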
@wildsurfer Try that same move (re-run Swarm container with different IP) using a discovery service other than the Hub token (e.g. Consul). I know that Hub tokens have had a lot of issues with Swarm nodes leaving and re-joining for me in the past.
@nathanleclaire in the example above I'm already using Consul... so in theory the node should rejoin, right?
This issue is related to #2687
@nathanleclaire I understand that it is required for nodes connecting across clouds, but that doesn't really justify taking the option away from users, does it?
I want to elaborate a bit on the issues I encountered in this context.
Ideally, I would want all the Swarm and engine traffic between the droplets to go over the private network, with the public interfaces used only for access from the outside.
Unfortunately, there is no option to use the private networking for the Swarm communication (which, by the way, makes the --digitalocean-private-networking command line parameter kind of pointless).
In order to address this, I created non-Swarm hosts, configured them manually, and then adjusted config.json so that Machine would still treat them as Swarm hosts. I then reconfigured each node by relaunching the manager/agent containers (roughly as sketched below). Unfortunately, they stopped connecting to each other because the TLS certificates are generated only for the public IP addresses, and are therefore invalid on the private network.
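To make that concrete, the manager relaunch looked roughly like this (a sketch only; the container name, port and TLS flags mirror the swarm-agent-master container Machine normally creates, and CONSUL_PRIVATE_IP stands in for my key-value store's private address):
eval "$(docker-machine env test-swarm-master)"
# replace the manager so it reaches the discovery store over the private network
docker rm -f swarm-agent-master
docker run -d --restart=always --name swarm-agent-master \
    -p 3376:3376 \
    -v /etc/docker:/etc/docker \
    swarm manage --tlsverify \
        --tlscacert=/etc/docker/ca.pem \
        --tlscert=/etc/docker/server.pem \
        --tlskey=/etc/docker/server-key.pem \
        -H tcp://0.0.0.0:3376 \
        --strategy spread \
        "consul://${CONSUL_PRIVATE_IP}:8500"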
I could now go and manually generate new certificates for all the hosts, but that kind of defeats the purpose: if I go that far, I can just set up the Docker Swarm from scratch without using Machine, which would be pretty much the same amount of work. Unfortunately, this is not very scalable or easy to maintain.
Keep in mind that DigitalOcean bills you for public bandwidth, while traffic on the private network is free, so there is a real economic incentive to get this working.
> Unfortunately, they stopped connecting to each other because the TLS certificates are generated only for the public IP addresses
Is it possible to create one certificate for two IP addresses? It seems to me that this could be a first step toward solving this issue.
@wildsurfer It is indeed possible to create a certificate that is valid for more than one name, including IP addresses. I'm not sure that's the best approach, though.
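For illustration, something along these lines with openssl produces a server certificate that covers both addresses (a sketch only: the file names are assumptions, ca.pem/ca-key.pem are the CA Machine already generated for the host, and the two IPs are the master's public and private addresses from the output above):
# key and certificate signing request for the host
openssl genrsa -out server-key.pem 2048
openssl req -new -key server-key.pem -subj "/CN=test-swarm-master" -out server.csr
# list both the public and the private address as subject alternative names
echo "subjectAltName = IP:104.131.77.93,IP:10.132.164.169" > extfile.cnf
# sign the request with the existing CA
openssl x509 -req -in server.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
    -days 365 -extfile extfile.cnf -out server.pem
Note that Machine would overwrite these the next time it reprovisions the host, so this is a stopgap at best.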
It seems to me like the DigitalOcean driver would need a bit of refactoring to implement this cleanly. Can we agree on a common approach before I potentially start working on anything? I think the cleanest way to do this would be the flags listed below. I don't like overly long command line options, though; does someone have good suggestions for the naming?
> one flag to enable private networking on the created node
Don't we have one already (DIGITALOCEAN_PRIVATE_NETWORKING)?
> one flag to advertise the private address on masters
> one flag to advertise the private address on nodes
What if we use one flag for both situations? PRIVATE_ADVERTISE=true (or --private-advertise)
> one flag to bind the master to the public/private/all interfaces
BIND_TO=all (see the sketch below for how usage might look)
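Just to make the naming discussion concrete, usage might end up looking something like this (purely hypothetical; none of these flags exist yet):
# hypothetical flags only -- not implemented in docker-machine
docker-machine create -d digitalocean \
    --digitalocean-private-networking \
    --swarm --swarm-master \
    --swarm-discovery "$token" \
    --private-advertise \
    --bind-to all \
    test-swarm-master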
Is there an update on this issue?