After more testing of PR #770, I found this:
I hit another changed-IP problem after restarting my Swarm EC2 cluster today.
The master still uses the old IPs of the swarm machines:
time="2015-03-18T18:23:54Z" level=error msg="Get https://54.69.29.90:2376/v1.15/info: dial tcp 54.69.29.90:2376: i/o timeout"
time="2015-03-18T18:23:54Z" level=error msg="Get https://54.69.230.35:2376/v1.15/info: dial tcp 54.69.230.35:2376: i/o timeout"
time="2015-03-18T18:23:54Z" level=error msg="Get https://54.69.255.39:2376/v1.15/info: dial tcp 54.69.255.39:2376: i/o timeout"
time="2015-03-18T18:23:54Z" level=error msg="Get https://52.10.167.59:2376/v1.15/info: dial tcp 52.10.167.59:2376: i/o timeout"
I analyzed the problem:
The swarm agent still joins with the old IP 52.10.167.59.
$ docker-machine ls
NAME               ACTIVE   DRIVER       STATE     URL                        SWARM
amazonec2-03                amazonec2    Stopped
dev                         virtualbox   Stopped
ec2-swarm-01                amazonec2    Running   tcp://54.149.27.239:2376   ec2-swarm-master
ec2-swarm-02                amazonec2    Running   tcp://52.10.108.31:2376    ec2-swarm-master
ec2-swarm-03       *        amazonec2    Running   tcp://54.148.5.178:2376    ec2-swarm-master
ec2-swarm-master            amazonec2    Running   tcp://52.11.98.189:2376    ec2-swarm-master (master)
$ $(docker-machine env ec2-swarm-master)
$ docker ps --no-trunc
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
13d27667155b3b1962b99b8d817c7a9865b47fe5b0d5d9c0af08735b26163efa swarm:latest "/swarm join --addr 52.10.167.59:2376 token://5a57a53a13470b1e680c6904ce5b34d1" 35 hours ago Up 11 minutes 2375/tcp swarm-agent
810f7ce04b6439c191470a2116197088ee2a3d2e5ed1cc7f4742aacef46317f9 swarm:latest "/swarm manage --tlsverify --tlscacert=/etc/docker/ca.pem --tlscert=/etc/docker/server.pem --tlskey=/etc/docker/server-key.pem -H tcp://0.0.0.0:3376 token://5a57a53a13470b1e680c6904ce5b34d1" 35 hours ago Up 11 minutes 2375/tcp, 0.0.0.0:3376->3376/tcp swarm-agent-master
$ docker-machine ip ec2-swarm-master
52.11.98.189
After the IP of a swarm machine changes, the implementation must reconfigure the swarm agent: remove the old container and start a new one that advertises the new address.
The only quick fix at the moment is to recreate the agent with this tiny script:
create-swarm-agent.sh
#!/bin/bash
# Recover the discovery token from the running agent's command line
# (the fourth element of its Cmd is the token:// URL).
TOKEN=$(docker inspect -f "{{ index .Config.Cmd 3 }}" swarm-agent)
# Ask the EC2 metadata service for the instance's current public IP.
IP=$(curl http://169.254.169.254/latest/meta-data/public-ipv4)
# Replace the stale agent with one that advertises the new IP.
docker stop swarm-agent
docker rm swarm-agent
docker run -d --name swarm-agent --restart=always swarm \
  join --addr ${IP}:2376 \
  ${TOKEN}
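Since the script queries the EC2 metadata service and talks to the local Docker daemon, it has to run on the node itself. Assuming docker-machine ssh shells out to the system ssh binary (so stdin is passed through), something along these lines should work from the workstation, with ec2-swarm-01 as an example node:
$ docker-machine ssh ec2-swarm-01 "bash -s" < create-swarm-agent.sh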
I think, longer term, we will have to support some kind of "sync" to the config store. I don't know whether the Docker Hub token discovery service would support modifying the cluster IPs, but I'm sure the KV backends would.
cc @aluzzardi @vieux @abronan How would you envision the workflow for this case (changing IPs in the swarm)?
@nathanleclaire Entries in the K/V store are deleted after TTL expiration (nodes are removed from the discovery). So if the IPs change, the store will reflect the state of the cluster correctly after a stop/restart (on EC2, for example). Still, you might see old entries listed for a while until their TTL expires (if you have 3 machines, expect 6 to be listed, though the old entries will be marked as unhealthy and cannot be used by Swarm).
As a workaround, if Machine is aware that an instance is restarting, it could delete the entry in the K/V store directly, so that machines with stale IPs are not listed after a restart.
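For example, here is a minimal sketch of that cleanup, assuming a Consul discovery backend reachable at $CONSUL_HOST and a cluster started with a discovery URL like consul://$CONSUL_HOST:8500/swarm. The exact key path under which Swarm registers nodes depends on the Swarm version, so list the prefix first; the node key below is hypothetical:
$ # list everything Swarm registered under the discovery prefix
$ curl -s "http://$CONSUL_HOST:8500/v1/kv/swarm?recurse"
$ # delete the entry that still advertises the old IP (hypothetical key path, taken from the listing above)
$ curl -s -X DELETE "http://$CONSUL_HOST:8500/v1/kv/swarm/docker/swarm/nodes/52.10.167.59:2376"
This only applies to the K/V backends; as far as I know the token:// discovery on Docker Hub doesn't expose such an endpoint.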
Here is my workaround after changing the IP address of a docker swarm node:
% docker-machine env docker-node
% docker-machine regenerate-certs docker-node
(I sometimes need to run this multiple times when an error occurs.)
% eval $(docker-machine env docker-node)
% export TOKEN=$(docker inspect -f "{{ index .Config.Cmd 3}}" swarm-agent)
% docker rm -f swarm-agent
% docker run -d --name=swarm-agent --restart=always swarm:latest join --advertise "${DOCKER_HOST##tcp://}" "${TOKEN}"
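To check that the agent re-registered with the new address, you can ask the discovery service which nodes it currently knows about (a sketch, reusing the $TOKEN value recovered above):
% docker run --rm swarm:latest list "${TOKEN}"
The new IP should show up once the agent's next heartbeat has been sent; stale entries linger until their TTL expires, as noted above.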