Machine: Docker machine can't validate certificate because it doesn't contain any IP SANs

Created on 24 Jan 2018 · 9 Comments · Source: docker/machine

Hello, guys.

I ran into a strange error after upgrading from Docker 17.03.0 to Docker 17.12.0.

Any request to a Docker machine returns the following:

Unable to query docker version: Get https://10.0.47.10:2376/v1.15/version: x509: cannot validate certificate for 10.0.47.10 because it doesn't contain any IP SANs
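This wording comes from Go's crypto/x509 hostname check, which docker-machine relies on: when the client connects to https://10.0.47.10:2376 by IP, the certificate the server presents must list 10.0.47.10 as an IP address in its Subject Alternative Name (SAN) extension. The sketch below reproduces both cases offline with two throwaway self-signed certificates and `openssl x509 -checkip`. It assumes OpenSSL 1.1.1 or later (needed for `-addext`); the helper name `has_ip_san` and the temporary files are made up for illustration.

```shell
#!/bin/sh
# Sketch: show what "doesn't contain any IP SANs" means, without any network.
# Assumes OpenSSL 1.1.1+ (for -addext). has_ip_san is a hypothetical helper.

# Succeeds if the PEM certificate in $1 lists the IP address $2 as a SAN.
# openssl prints "IP <addr> does match certificate" or "... does NOT match ...".
has_ip_san() {
  openssl x509 -in "$1" -noout -checkip "$2" | grep -q "does match"
}

ip=10.0.47.10
dir=$(mktemp -d)

# Throwaway certificate WITHOUT a subjectAltName extension.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/O=example" \
  -keyout "$dir/nosan.key" -out "$dir/nosan.pem" 2>/dev/null

# Throwaway certificate with the same SANs as the machine's server.pem in this issue.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/O=example" \
  -addext "subjectAltName=DNS:localhost,IP:$ip" \
  -keyout "$dir/withsan.key" -out "$dir/withsan.pem" 2>/dev/null

for f in nosan withsan; do
  if has_ip_san "$dir/$f.pem" "$ip"; then
    echo "$f.pem: $ip is listed in the SANs"
  else
    echo "$f.pem: no IP SAN for $ip"
  fi
done
```

The same helper can be pointed at a real file, e.g. `has_ip_san ~/.docker/machine/machines/<machine name>/server.pem 10.0.47.10`. Note that it only checks the local copy; what matters for the error is the certificate actually received on port 2376.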

Example: https://screencast.com/t/Kl1XNeQkRCpk

Here is what docker-machine returns with the debug flag (this is the env command, not ls):

Docker Machine Version:  0.13.0, build 9ba6da9
Found binary path at /usr/local/bin/docker-machine
Launching plugin server for driver generic
Plugin server listening at address 127.0.0.1:54261
() Calling .GetVersion
Using API Version  1
() Calling .SetConfigRaw
() Calling .GetMachineName
(<machine name>) Calling .GetURL
Reading CA certificate from /<my home dir>/.docker/machine/certs/ca.pem
Reading client certificate from /<my home dir>/.docker/machine/certs/cert.pem
Reading client key from /<my home dir>/.docker/machine/certs/key.pem
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "10.0.47.10:2376": x509: cannot validate certificate for 10.0.47.10 because it doesn't contain any IP SANs
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.

Here is what it looks like: https://screencast.com/t/4ncZTclnd

docker-machine regenerate-certs [name] does not help at all :)

I've already:

  • Restarted the Docker daemon
  • Reset it to factory defaults and reinstalled it from scratch
  • Removed the ~/.docker/ folder, reset Docker data to factory defaults and reinstalled Docker from scratch, so Docker regenerated that folder.

As I understand it, x509 fails to verify the separate certificate that is stored apart from the machine certificates. I've checked it, and it really does not contain any SANs. But as far as I can see, it is auto-generated and shared by all machines, so it should not need any SANs.
On the other hand, every Docker machine certificate contains SANs and looks correct. See the next listing...

There are two folders in ~/.docker/machine/:

  • The machines folder, with the machine certificates in it. Each machine certificate looks like this:
openssl x509 -in ~/.docker/machine/machines/<machine name>/server.pem -text

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            8e:d6:84:90:cf:46:a8:3c:17:56:d7:80:da:1f:9d:e2
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: O=andrey
        Validity
            Not Before: Jan 24 10:16:00 2018 GMT
            Not After : Jan  8 10:16:00 2021 GMT
        Subject: O=andrey.<docker machine name>
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                   ...
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment, Key Agreement
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name: 
                DNS:localhost, IP Address:10.0.47.10
    Signature Algorithm: sha256WithRSAEncryption
    ....

So there are SANs (DNS:localhost, IP Address:10.0.47.10), and it should be valid.

  • The certs folder, with a common certificate in it. I'm not sure when the Docker engine uses it. It looks like this:
openssl x509 -in ~/.docker/machine/certs/cert.pem -text

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            3f:09:e3:71:8d:4b:26:14:f1:e4:bb:20:97:ee:cd:98
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: O=andrey
        Validity
            Not Before: Jan 24 10:52:00 2018 GMT
            Not After : Jan  8 10:52:00 2021 GMT
        Subject: O=andrey.<bootstrap>
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    ...
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage: 
                TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
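SANs are only checked on the certificate the server presents. certs/cert.pem is the client certificate (its Extended Key Usage is "TLS Web Client Authentication" and it has no server-auth usage), so having no SANs there is expected. A small helper makes it easier to compare the usage and SANs of every PEM file at once; the function name `cert_summary` is hypothetical, and it simply filters the same `openssl x509 -text` output shown in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: print the Extended Key Usage and SAN lines of a PEM cert.
cert_summary() {
  echo "== $1"
  # -A1 also prints the line after each match, which holds the actual values;
  # the second grep drops the "--" separators grep inserts between groups.
  openssl x509 -in "$1" -noout -text \
    | grep -A1 -E "Extended Key Usage|Subject Alternative Name" \
    | grep -v -- "^--$"
}

# Compare the shared client cert with every machine's server cert
# (paths follow the ~/.docker/machine layout described in this issue).
for f in "$HOME/.docker/machine/certs/cert.pem" "$HOME"/.docker/machine/machines/*/server.pem; do
  if [ -f "$f" ]; then
    cert_summary "$f"
  fi
done
```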

Environment:
_Local machine:_

  • OS: MacOS 10.13.2
  • Docker version:
Client:
 Version:   17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:03:51 2017
 OS/Arch:   darwin/amd64

Server:
 Engine:
  Version:  17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:    Wed Dec 27 20:12:29 2017
  OS/Arch:  linux/amd64
  Experimental: true
  • Docker machine version: docker-machine version 0.13.0, build 9ba6da9
  • OpenSSL version: OpenSSL 1.0.2n 7 Dec 2017

_Remote Docker machines:_

  • Docker version:
Client:
 Version:   17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:11:19 2017
 OS/Arch:   linux/amd64

Server:
 Engine:
  Version:  17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:    Wed Dec 27 20:09:53 2017
  OS/Arch:  linux/amd64
  Experimental: false
  • Docker machine driver: generic
  • Remote machine OS: Ubuntu 16.04.3 xenial
  • OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016

So, the questions are: Is this a bug in the generic driver, which should not verify that common certificate? Does the Docker upgrade procedure fail to clean up some data? Did I do something wrong?

Thank you in advance.


All 9 comments

It seems almost no one has run into this.

I could "resolve" it on my end by switching back to 17.03.0 for a while.

While doing that, I discovered that Docker leaves a lot of generated files behind after upgrading or uninstalling, and it does not regenerate them afterwards.
That's why, to remove Docker fully, I had to follow these instructions:

Another big problem I ran into after rolling back was certificates. If I import certificates for existing machines from the backup (made with this tool: https://www.npmjs.com/package/machine-share), I get this:

Unable to query docker version: Get https://10.0.41.76:2376/v1.15/version: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "serial:..........")
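"Certificate signed by unknown authority" is a different failure from the SAN one: the server certificate was not issued by the CA the client trusts (~/.docker/machine/certs/ca.pem), which is what you would expect when machine directories from a backup were signed by a different CA. `openssl verify` checks that relationship locally. The sketch below builds two throwaway CAs to show both outcomes; all file names are hypothetical, and it assumes OpenSSL's default config marks `req -x509` certificates as CAs.

```shell
#!/bin/sh
# Sketch: reproduce "signed by unknown authority" offline with two throwaway CAs.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Two independent CAs, e.g. the CA from the backup and the current one.
for ca in ca-old ca-new; do
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/O=$ca" \
    -keyout "$ca-key.pem" -out "$ca.pem" 2>/dev/null
done

# A server certificate signed by ca-old only.
openssl req -new -newkey rsa:2048 -nodes -subj "/O=server" \
  -keyout server-key.pem -out server.csr 2>/dev/null
openssl x509 -req -in server.csr -CA ca-old.pem -CAkey ca-old-key.pem \
  -CAcreateserial -days 1 -out server.pem 2>/dev/null

# Succeeds ("server.pem: OK") only against the CA that issued the certificate.
openssl verify -CAfile ca-old.pem server.pem
openssl verify -CAfile ca-new.pem server.pem || echo "server.pem: not issued by ca-new"
```

Against the real files, the equivalent check would be `openssl verify -CAfile ~/.docker/machine/certs/ca.pem ~/.docker/machine/machines/<machine name>/server.pem`, keeping in mind that the daemon on the remote host may be serving a different copy.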

That's understandable, but the following scenario looks strange. I have a few swarms, each containing a few nodes. To resolve that certificate issue, I had to regenerate certificates on all Docker machines. When I regenerate the certificate on a node, the swarm kicks it out (sometimes with strange connectivity issues afterwards). Once the node is kicked from the swarm, all stacks/services on that server disappear almost immediately.

So I assume the correct procedure here is:

  1. Take the node out of the swarm.
  2. Regenerate its certificate.
  3. Connect it back to the swarm.

It causes a lot of manual work (especially when a swarm contains many nodes), but it is stable.
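The three steps above could be scripted per node to cut down the manual work. This is only a sketch with a dry-run mode (`DRY_RUN=1` prints the commands instead of running them). It assumes the node is a worker (a manager would have to be demoted first with `docker node demote`), that both the node and a manager are docker-machine names, and that the swarm uses the default port 2377; the function names `run`, `capture` and `cycle_node` are hypothetical.

```shell
#!/bin/sh
# Sketch: leave swarm -> regenerate certs -> rejoin, for one worker node.
# With DRY_RUN=1 the commands are only printed, not executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Runs a command and prints its output, or prints a <placeholder> in dry-run mode.
capture() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "<$1>"
  else
    shift
    "$@"
  fi
}

cycle_node() {
  node=$1
  manager=$2
  token=$(capture worker-token docker-machine ssh "$manager" docker swarm join-token -q worker) || return 1
  manager_ip=$(capture manager-ip docker-machine ip "$manager") || return 1

  run docker-machine ssh "$node" docker swarm leave || return 1   # 1. leave the swarm
  run docker-machine regenerate-certs -f "$node" || return 1      # 2. new certs (restarts the daemon)
  run docker-machine ssh "$node" docker swarm join --token "$token" "$manager_ip:2377"  # 3. rejoin
}
```

For example, `DRY_RUN=1 cycle_node node-1 manager-1` prints the plan for a hypothetical node-1. After rejoining, the old entry for the node typically stays listed as Down on the manager and can be cleaned up with `docker node rm`.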

Unfortunately, the original issue persists if I reinstall 17.12.0, even after a cleanup. I'm not sure what causes it on the local machine. Doing a full cleanup on every remote node for each upgrade is obviously not an option, but without it a lot of generated files are left in /etc/... and other locations, which sometimes makes automatic upgrades impossible.

Posted on wrong issue. Sorry.

@AndreyVoloshko I am getting the same error, but with Docker version 18.03.0-ce. I have no idea how to fix it, and I cannot access my Docker machines. Did you find any workaround besides going back to 17.03?

@manast
No, I did not find a workaround. Luckily, we started refactoring our whole ops infrastructure and had to create all machines from scratch, so I had already forgotten about this thread. But the problem seems to be quite rare.

Yeah, rare, but it happens... In my case it is a production environment. What happens if you have hundreds of machines and suddenly cannot access them? Out of business, I guess :)

I've been battling the same thing for a day or so. I found that my Docker client was talking to the daemon through my proxy server. I added an exclusion, and now everything works.

@jlanng Could you explain the exclusion you added? I am also having this problem with Docker version 18.06.0-ce.

@dadoherty I think @jlanng means setting the NO_PROXY environment variable, with an entry for the address of the Docker machine being accessed.
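In practice that means exporting NO_PROXY (a comma-separated list) in the shell where docker-machine and docker run, so requests to the machine's IP skip the proxy. One plausible explanation for the SAN error, not confirmed in this thread, is that the proxy terminates TLS and presents its own certificate, which has no IP SAN for the machine's address. Below is a minimal sketch that appends an address only if it is not already listed; the function name `add_no_proxy` is hypothetical, and some tools read the lowercase `no_proxy` instead, so you may need to set both.

```shell
#!/bin/sh
# Hypothetical helper: add an address to NO_PROXY unless it is already there.
add_no_proxy() {
  case ",${NO_PROXY}," in
    *",$1,"*) ;;                                # already listed, nothing to do
    *) NO_PROXY="${NO_PROXY:+$NO_PROXY,}$1" ;;  # append, adding a comma if needed
  esac
  export NO_PROXY
}

# Usage: bypass the proxy for one machine, then retry the failing command.
#   add_no_proxy "$(docker-machine ip <machine name>)"
#   docker-machine env <machine name>
```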

BTW, thanks @jlanng for the solution: in my case it was also a proxy issue.
