Machine: regenerate-certs doesn't work on swarm-master

Created on 21 Dec 2015 · 5 Comments · Source: docker/machine

Consider this scenario:

docker-machine create -d amazonec2 --swarm --swarm-master (etc)
everything works (single node swarm-master + swarm node)
change the IP of the Amazon instance (in my case, by assigning an Elastic IP)
docker-machine detects the IP change, via _magic_ I guess
docker-machine env for the new IP complains about a TLS cert IP mismatch
docker-machine regenerate-certs gets docker working again with docker-machine env
docker-machine env --swarm, however, acts as if everything is fine, but any docker or docker-compose command then does nothing: no errors in the CLI, just no output. docker images against the non-swarm endpoint prints a proper image list, but against the --swarm endpoint it prints only the column headers and no images.
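
The certificate mismatch in the steps above can be checked directly against the certificate's Subject Alternative Name. This is a minimal sketch, assuming the text output of openssl x509 -noout -text is piped in on stdin; the function name cert_covers_ip is made up for illustration.

```shell
# Hypothetical helper: succeed if the `openssl x509 -noout -text` output on
# stdin lists the given IP address as a Subject Alternative Name.
cert_covers_ip() {
  tr ',' '\n' |                                   # one SAN entry per line
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |  # trim surrounding whitespace
    grep -Fxq "IP Address:$1"                     # exact match only, no prefix matches
}
```

Something like docker-machine ssh NAME "sudo openssl x509 -in /etc/docker/server.pem -noout -text" | cert_covers_ip "$(docker-machine ip NAME)" should then report whether the current IP is covered. NAME is a placeholder and the certificate path is an assumption that may differ per driver.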

Is regenerate-certs supposed to work with an existing swarm?

area/swarm kind/bug

BretFisher

👍1

All 5 comments

When you run swarm, the manager keeps advertising the public IP the machine had when it was first initialized. docker inspect on the swarm manage container shows something like this:

{
  "Path": "/swarm",
  "Args": [
      "manage",
      "--tlsverify",
      "--tlscacert=/etc/docker/ca.pem",
      "--tlscert=/etc/docker/server.pem",
      "--tlskey=/etc/docker/server-key.pem",
      "-H",
      "tcp://0.0.0.0:3376",
      "--strategy",
      "spread",
      "--advertise",
      "PUBLICIP:2376",
      "--replication",
      "etcd://ectd.host:2379/swarm"
    ]
}
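
To see which address a running manager is still advertising, the value can be pulled out of an Args list like the one above. This is a minimal sketch, assuming the arguments arrive as a JSON array on stdin; the function name advertised_addr is made up for illustration.

```shell
# Hypothetical helper: print the value that follows "--advertise" in a
# JSON array of arguments read from stdin (single-line or pretty-printed).
advertised_addr() {
  tr ',' '\n' |        # one array element per line
    tr -d '[]" \t' |   # strip brackets, quotes, and whitespace
    awk 'NF == 0 { next }                      # skip blank lines
         prev == "--advertise" { print; exit } # the element right after the flag
         { prev = $0 }'
}
```

For example, docker inspect --format '{{json .Args}}' swarm-agent-master | advertised_addr, where swarm-agent-master is assumed to be the name of the manager container. If the printed address is the old IP, the manager was never reconfigured after the change.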

A quick (and kinda lazy) workaround I have found for this is to simply rerun the docker-machine create command, using the generic driver instead, to set up swarm again.

docker-machine --debug create NEWNAME -d generic \
--generic-ip-address SERVERIP \
--generic-ssh-key KEYPATH \
--generic-ssh-user core \
--engine-label public=false \
--swarm \
--swarm-master \
--swarm-opt replication \
--swarm-discovery=etcd://URL:PORT/swarm \
--engine-opt "cluster-store=etcd://URL:PORT/store" \
--engine-opt "cluster-advertise=eth0:2376"

dustinblackman on 5 Jan 2016

👍1

Thanks for this tip @dustinblackman. This workaround is helping me a lot!
Is there any way to remove one of these machines after regenerating the swarm master?
It looks a bit confusing when the same server is listed twice under different names.

rm-jamotion on 2 Feb 2016

@rm-jamotion Without using docker-machine rm? You can delete the machine's folder in ~/.docker/machine/machines.
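
A minimal sketch of that manual cleanup, assuming the default store location (or MACHINE_STORAGE_PATH if it is set). The function name forget_machine is made up for illustration. Note that the folder also holds that machine's SSH key and certificates, so they are deleted with it.

```shell
# Hypothetical helper: remove docker-machine's local record of a machine
# without contacting the provider. The remote host itself is left untouched.
forget_machine() {
  name=$1
  store=${MACHINE_STORAGE_PATH:-$HOME/.docker/machine}
  # Refuse empty names and anything path-like, so nothing outside the store is deleted.
  case $name in
    ''|*/*|.|..) echo "refusing to remove '$name'" >&2; return 1 ;;
  esac
  if [ ! -d "$store/machines/$name" ]; then
    echo "no such machine: $name" >&2
    return 1
  fi
  rm -rf -- "$store/machines/$name"
}
```

After forget_machine NAME, docker-machine ls should no longer list that entry.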

dustinblackman on 2 Feb 2016

@dustinblackman Yes, I know, but that way I have to remove the first machine, the one using the AWS driver. It would be better if I could remove the machine created with the generic driver and move its keys to the AWS machine, so that the AWS start/stop features stay available...

rm-jamotion on 7 Feb 2016

docker-machine version 0.7.0, build 783b3a8

It's not only a matter of the IP address. Even without an IP address change, with the VirtualBox driver I've noticed that regenerate-certs generates a certificate with the wrong key usage:

sudo openssl x509 -in /var/lib/boot2docker/server.pem -noout -text | grep -A8 "X509v3 extensions"
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment, Key Agreement
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name: 
                DNS:localhost, IP Address:10.10.0.148

In the docker daemon logs you can find:

2016-07-29 13:13:58.745094 I | http: TLS handshake error from 10.10.0.60:33214: tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage

In docker info, when connected to swarm, all nodes are Pending, and the swarm master logs show:

time="2016-07-29T13:22:58Z" level=debug msg="Failed to validate pending node: The server probably has client authentication (--tlsverify) enabled. Please check your TLS client certification settings: Get https://10.10.0.60:2376/info: remote error: bad certificate" Addr="10.10.0.60:2376"
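
The "incompatible key usage" error suggests that server.pem is being presented as a client certificate (the swarm manage arguments earlier in the thread pass it as --tlscert) while its Extended Key Usage lists only server authentication. A minimal sketch for checking this, assuming the openssl x509 -noout -text output is piped in on stdin; the function name has_client_auth is made up for illustration.

```shell
# Hypothetical helper: succeed if the certificate text on stdin lists
# "TLS Web Client Authentication" under X509v3 Extended Key Usage.
has_client_auth() {
  grep -A1 'X509v3 Extended Key Usage' |
    grep -q 'TLS Web Client Authentication'
}
```

For example: sudo openssl x509 -in /var/lib/boot2docker/server.pem -noout -text | has_client_auth || echo "server.pem is not usable as a client certificate". On the certificate shown above, the check fails.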