Consider this scenario:
docker-machine create -d amazonec2 --swarm --swarm-master (etc)
docker-machine env for new IP will complain about tls cert IP mismatch
docker-machine regenerate-certs gets docker working again with docker-machine env
docker-machine env --swarm, however, will act like it's fine, but any docker or docker-compose commands will do nothing. No errors in cli, just nothing. docker images when not using --swarm IP will generate proper image list, but with --swarm IP it'll just list headers and no images.
Is regenerate-certs supposed to work with an existing swarm?
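One way to confirm the cert IP mismatch described above is to compare the machine's current IP with the IP entries in the server certificate's Subject Alternative Name. This is a minimal sketch, assuming the certificate text comes from openssl x509 -noout -text. The function name cert_lists_ip and the MACHINE and CERT_PATH placeholders are made up for illustration, and the certificate path on the host depends on the provisioner.

```shell
# cert_lists_ip IP
# Reads `openssl x509 -noout -text` output on stdin and succeeds only if
# IP appears as an "IP Address:" entry (Subject Alternative Name).
cert_lists_ip() {
  # -F: fixed string, -w: whole-word match, so 10.10.0.14 does not
  # match a certificate issued for 10.10.0.148.
  grep -qwF "IP Address:$1"
}

# Hypothetical usage (MACHINE and CERT_PATH are placeholders, not real values):
#   docker-machine ssh "$MACHINE" "sudo openssl x509 -in $CERT_PATH -noout -text" \
#     | cert_lists_ip "$(docker-machine ip "$MACHINE")" \
#     || echo "server cert does not cover the machine's current IP"
```

If the check fails after the instance has been restarted with a new public IP, that matches the complaint from docker-machine env.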
When you run swarm, it is tied to the public IP the machine had when swarm was first initialized (see the --advertise argument). docker inspect on the swarm manage process looks something like this:
{
  "Path": "/swarm",
  "Args": [
    "manage",
    "--tlsverify",
    "--tlscacert=/etc/docker/ca.pem",
    "--tlscert=/etc/docker/server.pem",
    "--tlskey=/etc/docker/server-key.pem",
    "-H",
    "tcp://0.0.0.0:3376",
    "--strategy",
    "spread",
    "--advertise",
    "PUBLICIP:2376",
    "--replication",
    "etcd://ectd.host:2379/swarm"
  ]
}
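To see which address an existing manager was started with, the --advertise value can be pulled out of the container's arguments and compared with the machine's current IP. This is a rough sketch: the helper name advertised_addr is made up, and the container name swarm-agent-master in the usage comment is an assumption (check docker ps on the master for the actual name).

```shell
# advertised_addr
# Reads container arguments on stdin, one per line, and prints the value
# given to --advertise. Prints nothing if the flag is absent.
advertised_addr() {
  awk '
    want { print; exit }                       # line after a bare --advertise
    $0 == "--advertise" { want = 1; next }
    index($0, "--advertise=") == 1 { print substr($0, 13); exit }
  '
}

# Hypothetical usage on the swarm master (container name is an assumption):
#   docker inspect --format '{{range .Args}}{{println .}}{{end}}' swarm-agent-master \
#     | advertised_addr
```

If the printed address still carries the old public IP, the manager was not restarted with the new one, whatever the certificates say.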
A quick (and kinda lazy) workaround I have found for this is simply rerunning the docker-machine create command, but using the generic driver instead, to set up swarm again.
docker-machine --debug create NEWNAME -d generic \
  --generic-ip-address SERVERIP \
  --generic-ssh-key KEYPATH \
  --generic-ssh-user core \
  --engine-label public=false \
  --swarm \
  --swarm-master \
  --swarm-opt replication \
  --swarm-discovery=etcd://URL:PORT/swarm \
  --engine-opt "cluster-store=etcd://URL:PORT/store" \
  --engine-opt "cluster-advertise=eth0:2376"
Thanks for this tip @dustinblackman. This workaround is helping me a lot!
Is there any possibility to remove one of these machines after regenerating the swarm master?
It looks a bit confusing when the same server is listed twice with different names.
@rm-jamotion Without using docker-machine rm? You can delete the machine's folder in ~/.docker/machine/machines.
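As a small sketch of that suggestion: the helper below only prints the folder for a given machine name, so the path can be checked before anything is deleted. The name machine_dir is made up for illustration. It assumes the default storage location unless MACHINE_STORAGE_PATH is set.

```shell
# machine_dir NAME
# Prints the local docker-machine folder for NAME. Uses MACHINE_STORAGE_PATH
# if set, otherwise the default ~/.docker/machine location.
machine_dir() {
  printf '%s\n' "${MACHINE_STORAGE_PATH:-$HOME/.docker/machine}/machines/$1"
}

# Hypothetical usage (NEWNAME is a placeholder). Inspect first, then remove:
#   ls "$(machine_dir NEWNAME)"
#   rm -r -- "$(machine_dir NEWNAME)"
```

Removing the folder only drops the local entry, including the keys and certs stored in it. It does not touch the server itself.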
@dustinblackman Yes, I know, but then I would have to remove the first machine, the one using the aws driver. It would be better if it were possible to remove the machine created with the generic driver and move the keys to the aws machine, so that the start/stop features of aws stay available...
docker-machine version 0.7.0, build 783b3a8
It's not only a matter of the IP address. Even without an IP address change, with the VirtualBox driver I've noticed that regenerate-certs generates the wrong key usage:
sudo openssl x509 -in /var/lib/boot2docker/server.pem -noout -text | grep -A8 "X509v3 extensions"
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment, Key Agreement
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name:
                DNS:localhost, IP Address:10.10.0.148
In the logs of the docker daemon you can find:
2016-07-29 13:13:58.745094 I | http: TLS handshake error from 10.10.0.60:33214: tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage
In docker info, when connected to swarm, all nodes are Pending, and the swarm master logs show:
time="2016-07-29T13:22:58Z" level=debug msg="Failed to validate pending node: The server probably has client authentication (--tlsverify) enabled. Please check your TLS client certification settings: Get https://10.10.0.60:2376/info: remote error: bad certificate" Addr="10.10.0.60:2376"
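The "incompatible key usage" error appears consistent with the certificate dump above, where Extended Key Usage lists only TLS Web Server Authentication. A certificate presented as a client certificate would normally also need TLS Web Client Authentication. A minimal sketch for checking this on any certificate follows. The function name cert_allows_client_auth and the CERT_PATH placeholder are made up for illustration.

```shell
# cert_allows_client_auth
# Reads `openssl x509 -noout -text` output on stdin and succeeds only if the
# line following "X509v3 Extended Key Usage" lists TLS Web Client Authentication.
cert_allows_client_auth() {
  awk '
    found_header { ok = ($0 ~ /TLS Web Client Authentication/); exit }
    /X509v3 Extended Key Usage/ { found_header = 1 }
    END { exit (ok ? 0 : 1) }
  '
}

# Hypothetical usage (CERT_PATH is a placeholder):
#   sudo openssl x509 -in "$CERT_PATH" -noout -text | cert_allows_client_auth \
#     || echo "certificate cannot be used for TLS client authentication"
```

Run against the output quoted above, this check fails, which fits the bad certificate errors in both logs.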