Hello, guys.
I've faced a strange error after upgrading from Docker 17.03.0 to Docker 17.12.0.
Any request to a docker machine returns the following:
Unable to query docker version: Get https://10.0.47.10:2376/v1.15/version: x509: cannot validate certificate for 10.0.47.10 because it doesn't contain any IP SANs
Example: https://screencast.com/t/Kl1XNeQkRCpk
Here is what docker-machine returns with the debug flag enabled (this is the env command, not ls):
Docker Machine Version: 0.13.0, build 9ba6da9
Found binary path at /usr/local/bin/docker-machine
Launching plugin server for driver generic
Plugin server listening at address 127.0.0.1:54261
() Calling .GetVersion
Using API Version 1
() Calling .SetConfigRaw
() Calling .GetMachineName
(<machine name>) Calling .GetURL
Reading CA certificate from /<my home dir>/.docker/machine/certs/ca.pem
Reading client certificate from /<my home dir>/.docker/machine/certs/cert.pem
Reading client key from /<my home dir>/.docker/machine/certs/key.pem
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "10.0.47.10:2376": x509: cannot validate certificate for 10.0.47.10 because it doesn't contain any IP SANs
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.
Here is how it looks: https://screencast.com/t/4ncZTclnd
docker-machine regenerate-certs [name] does not help at all :)
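(For what it's worth, here is a rough way to double-check which certificate the daemon actually presents on 2376 and whether it carries an IP SAN. This is only a sketch using the IP and CA path from above, not an official debugging step:)
# Show the SANs of the certificate the remote daemon actually serves
openssl s_client -connect 10.0.47.10:2376 \
  -CAfile ~/.docker/machine/certs/ca.pem </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"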
I've already removed the ~/.docker/ folder, reset Docker data to factory defaults, and reinstalled Docker from scratch, so Docker regenerated that folder.
As I understand it, x509 fails to verify exactly that separate certificate which is stored separately from the machine certificates. I've checked it, and it really does not contain any SANs. But it is auto-generated and, as far as I can see, is used for all machines, so it should not need to contain any SANs.
On the other hand, every docker machine certificate contains SANs and looks correct. See the next listing...
There are two folders in ~/.docker/machine/:
The machines folder contains the per-machine certificates. Each machine certificate looks like this:
openssl x509 -in ~/.docker/machine/machines/<machine name>/server.pem -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
8e:d6:84:90:cf:46:a8:3c:17:56:d7:80:da:1f:9d:e2
Signature Algorithm: sha256WithRSAEncryption
Issuer: O=andrey
Validity
Not Before: Jan 24 10:16:00 2018 GMT
Not After : Jan 8 10:16:00 2021 GMT
Subject: O=andrey.<docker machine name>
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
...
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment, Key Agreement
X509v3 Extended Key Usage:
TLS Web Server Authentication
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Subject Alternative Name:
DNS:localhost, IP Address:10.0.47.10
Signature Algorithm: sha256WithRSAEncryption
....
So, there are SANs (DNS:localhost, IP Address:10.0.47.10), and it should be valid.
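As an extra sanity check, the docker client can be pointed directly at the daemon with that machine's own certificates, bypassing docker-machine. A minimal sketch, assuming the standard docker-machine layout and the IP from above:
# Talk to the daemon directly with the machine's certificates
docker --tlsverify \
  --tlscacert ~/.docker/machine/machines/<machine name>/ca.pem \
  --tlscert ~/.docker/machine/machines/<machine name>/cert.pem \
  --tlskey ~/.docker/machine/machines/<machine name>/key.pem \
  -H tcp://10.0.47.10:2376 version
If this call succeeds, the certificate the daemon serves is fine, and the problem is somewhere in how docker-machine reaches the daemon.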
The certs folder contains some common certificate; I'm not sure when the Docker engine uses it. It looks like this:
openssl x509 -in ~/.docker/machine/certs/cert.pem -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
3f:09:e3:71:8d:4b:26:14:f1:e4:bb:20:97:ee:cd:98
Signature Algorithm: sha256WithRSAEncryption
Issuer: O=andrey
Validity
Not Before: Jan 24 10:52:00 2018 GMT
Not After : Jan 8 10:52:00 2021 GMT
Subject: O=andrey.<bootstrap>
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
...
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature
X509v3 Extended Key Usage:
TLS Web Client Authentication
X509v3 Basic Constraints: critical
CA:FALSE
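To rule out a CA mismatch between those two folders, the fingerprints of certs/ca.pem and the per-machine ca.pem can be compared. Just a sketch, assuming the standard docker-machine layout:
# The two fingerprints should match if the same CA signed everything
openssl x509 -noout -fingerprint -sha256 -in ~/.docker/machine/certs/ca.pem
openssl x509 -noout -fingerprint -sha256 -in ~/.docker/machine/machines/<machine name>/ca.pem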
Environment:
_Local machine:_
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:03:51 2017
OS/Arch: darwin/amd64
Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:12:29 2017
OS/Arch: linux/amd64
Experimental: true
docker-machine version 0.13.0, build 9ba6da9
OpenSSL 1.0.2n 7 Dec 2017
_Remote Docker machines:_
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:11:19 2017
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:09:53 2017
OS/Arch: linux/amd64
Experimental: false
generic
Ubuntu 16.04.3 xenial
OpenSSL 1.0.2g 1 Mar 2016
So, the question is: Is this some kind of bug in the generic driver, and it should not be verifying that common certificate? Is it that the Docker upgrade procedure does not clean up the needed data? Or did I do something wrong?
Thank you in advance.
It seems almost no one has faced this.
I could "resolve" it on my end by switching back to 17.03.0 for a while.
While doing that, I discovered that Docker leaves a lot of generated files behind after the upgrade/uninstall procedures. And the thing is, it does not regenerate them afterwards.
That's why, to remove Docker fully, I had to follow these instructions:
Another big problem I faced after rolling back was certificates. If I import certificates for the existing machines from a backup (this tool was used: https://www.npmjs.com/package/machine-share), I get this:
Unable to query docker version: Get https://10.0.41.76:2376/v1.15/version: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "serial:..........")
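A quick check for that case is to verify whether the imported server certificate actually chains to the CA the client trusts; just a sketch, with paths following the standard docker-machine layout:
# Should print "OK" if the server cert was signed by the local CA
openssl verify -CAfile ~/.docker/machine/certs/ca.pem \
  ~/.docker/machine/machines/<machine name>/server.pem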
That error is understandable, but the next scenario looks strange. I have a few swarms, each containing a few nodes. The thing is, to resolve that certificate issue I had to regenerate the certificates on all docker machines. When I regenerate the certificate on a node, the swarm kicks it out (sometimes with strange connectivity issues afterwards). After the node is kicked from the swarm, all stacks/services on that server disappear in a moment.
So, I assume the correct scenario here should be:
It causes a lot of manual work (especially when the swarm contains many nodes), but it is stable.
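For illustration only, here is roughly the per-node sequence I have in mind; the exact commands and the placeholders (<node>, <machine>, <worker-token>, <manager-ip>) are my assumption, not a verified recipe:
# On a manager: drain the node so its tasks are rescheduled elsewhere first
docker node update --availability drain <node>

# From the workstation: regenerate the machine's TLS certificates
docker-machine regenerate-certs <machine>

# On the node, if the swarm kicked it out: leave and rejoin
docker swarm leave
docker swarm join --token <worker-token> <manager-ip>:2377

# Back on a manager: make the node schedulable again
docker node update --availability active <node>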
Unfortunately, the original issue still persists if I reinstall 17.12.0, even with a cleanup. I'm not sure what causes it on the local machine. Upgrading all remote nodes with a full cleanup every time is obviously not an option; but without it, a lot of generated files are left in /etc/... and other locations, which sometimes makes an automatic upgrade impossible.
Posted on wrong issue. Sorry.
@AndreyVoloshko I am getting the same error, but with docker version 18.03.0-ce. No idea how to fix it, and I cannot access my docker machines. Did you find any workaround besides going back to 17.03?
@manast
No, I did not find a workaround for that. Luckily, we've started refactoring all of our ops infrastructure and had to create all the machines from scratch. That's why I had already forgotten about this thread. But it seems the problem is quite rare.
Yeah, rare, but it happens... In my case it is a production environment. What happens if you have hundreds of machines and then suddenly you cannot access them? Out of business, I guess :).
I've been battling the same thing for a day or so. I found that my docker client was talking to the daemon through my proxy server. I added an exclusion and now everything works.
@jlanng Could you explain the exclusion that you added? I am also having this problem with docker version 18.06.0-ce.
@dadoherty I think @jlanng means setting up the NO_PROXY variable, with an entry for the address of the docker machine being accessed.
BTW, thanks @jlanng for the solution - in my case it was also a proxy issue.
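For anyone else hitting this, a minimal sketch of that exclusion, using the machine address from this thread (adjust to your own setup):
# Exclude the docker machine's address from the HTTP(S) proxy
export NO_PROXY=10.0.47.10
export no_proxy=$NO_PROXY   # some tools only honor the lowercase variant

# then retry the failing command
docker-machine env <machine name>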