I am using Docker Toolbox 1.8.2c with a local build of docker-machine using PR #1951. That PR fixes the ssh problems but now the generation/validation of certificates is broken. I do not know if the problem is due to the PR or is present on master.
After creating a machine, any attempt to use the certificates, e.g. running `env`, causes docker-machine to detect that the certs are invalid and regenerate them. The certs are never regenerated and copied successfully, so all attempts to connect to the machine and use docker fail. I attempted debugging a bit and the certificate validation is failing in cert.go, line 205: `_, err = tls.DialWithDialer(dialer, "tcp", addr, tlsConfig)`.
See https://gist.github.com/carolynvs/d98baf90172d386561e1 for the full output from calling `docker-machine create default --driver virtualbox` on Windows 10.
The machine can't ever get its certificates installed properly:
$ docker-machine env default
Invalid certs detected; regenerating for 192.168.99.100:2376
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.100:2376"
export DOCKER_CERT_PATH="C:\Users\caro8994\.docker\machine\certs"
export DOCKER_MACHINE_NAME="default"
# Run this command to configure your shell:
# eval "$(C:\Program Files\Docker Toolbox\docker-machine.exe env default)"
caro8994@CAROLYNVANS87E4 MINGW64 ~
$ docker-machine env default
Invalid certs detected; regenerating for 192.168.99.100:2376
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.100:2376"
export DOCKER_CERT_PATH="C:\Users\caro8994\.docker\machine\certs"
export DOCKER_MACHINE_NAME="default"
# Run this command to configure your shell:
# eval "$(C:\Program Files\Docker Toolbox\docker-machine.exe env default)"
Here is the output from running `docker-machine -D env default`: https://gist.github.com/carolynvs/778e4533a26fd612732d
Here is the output from running `docker-machine -D regenerate-certs default`: https://gist.github.com/carolynvs/ad82eb5fb9d7c42a3ed0
Thanks for the detailed summary. I've seen issues like this before as well and I'll look into it.
Are you on the latest VirtualBox, i.e. 5.0.6?
I was using 5.0.4, which ships with the latest version of Docker Toolbox (1.8.2c). I just removed that version, installed 5.0.6, and I am experiencing the same behavior.
OK thanks.
@carolynvs If you remove the host only network that you have (can do this in VirtualBox GUI) and try again, does it work?
I deleted the machine, removed the adapter and tried again with the same result.
OK thanks. Very peculiar behavior. I might make a test build which dumps more information about the certs and suggest that you try that if you're agreeable.
Of course! I'm happy to help out however I can.
If you want to just make a branch and point me to it, I can build it myself (:heart: containerized builds!). That way you don't need to throw multiple builds over the wall if this takes more than one attempt.
Another thing to possibly consider while fixing this: some folks like myself actually write out the contents of `docker-machine env` to a file, which I'll source for each new terminal session (as it's a little faster than running `docker-machine env`). If the output of this command contains anything that cannot be `eval`d, it's obviously going to cause problems.
So lines like the following will cause issues:
Invalid certs detected; regenerating for 192.168.99.100:2376
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
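One hedged way to guard a sourced env file against such status lines is to keep only the lines that look like variable exports. This is a minimal sketch: the function name `filter_env_output` and the target file `~/.docker-env` are hypothetical, and it assumes the status messages arrive on the same stream as the `export` lines, as the pasted output above suggests.

```shell
# Keep only lines of the form `export NAME=...`; status messages and
# comment lines are dropped. filter_env_output is a hypothetical helper,
# not part of docker-machine.
filter_env_output() {
    # `|| true` keeps the exit status zero even when nothing matches.
    grep -E '^export [A-Za-z_][A-Za-z0-9_]*=' || true
}

# Possible usage (not run here):
#   docker-machine env default | filter_env_output > ~/.docker-env
#   . ~/.docker-env
```

Filtering on the consumer side only hides the symptom; it does not address why the certs are being regenerated in the first place.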
I experienced this issue on `0.5.0-dev`, but haven't experienced it since downgrading to `0.4.1`.
I experienced exactly the same behavior today on the release candidate.
Hi @carolynvs @blaggacao, thanks a lot for your feedback.
I'm trying to reproduce/fix this bug. Could you try this PR (https://github.com/docker/machine/pull/2006) that I created to help investigate the bug?
Looks like I'm seeing this too. I'm using the latest `master` build on OS X using the `digitalocean` driver, so this definitely isn't anything to do with the environment. I think the `area/windows` and `area/driver-virtualbox` tags are irrelevant here :)
Hi @hairyhenderson, can you try building PR #2006 and tell me the output for `docker-machine -D env default`?
@dgageot - will do when I get a chance.
I'm also thinking a bit more about this and realizing that I've been doing a _local_ build (i.e. `make build` on OS X, without using a container). One of the areas where `go build` has behaved differently in the past is around certificates (esp. root CA certs), so this _might_ be related to that... I dunno.
But I'll rebuild with #2006 and try it out. Thanks!
@hairyhenderson That's a good point. I'll run my tests with a cross-compiled docker-machine.
@dgageot Here is the failed output https://gist.github.com/carolynvs/e2473d21c3376f1ebec2 from `docker-machine -D env default` for a brand new machine.
I built #2006 and copied docker-machine.exe and docker-machine-driver-virtualbox.exe to the Docker Toolbox installation directory. I am using Docker Toolbox 1.8.2c on Windows 10.
I'm not proficient enough to know how to build it; maybe I will have a look at it this evening, if I can figure it out.
@carolynvs Thanks a lot. I still don't understand what's going on but your logs will help me.
@carolynvs Can you provide the output of:
VBoxManage list hostonlyifs
VBoxManage list dhcpservers
C:\Program Files\Oracle\VirtualBox>VBoxManage list hostonlyifs
Name: VirtualBox Host-Only Ethernet Adapter
GUID: 3729f60a-d9c3-4daa-96ca-7ce7bae4ddcc
DHCP: Disabled
IPAddress: 192.168.56.1
NetworkMask: 255.255.255.0
IPV6Address: fe80:0000:0000:0000:9d6d:4449:fce1:e1cb
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: 0a:00:27:00:00:00
MediumType: Ethernet
Status: Up
VBoxNetworkName: HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter
Name: VirtualBox Host-Only Ethernet Adapter #2
GUID: 99076a32-c9e5-4930-895a-a35ee45c2542
DHCP: Disabled
IPAddress: 192.168.99.1
NetworkMask: 255.255.255.0
IPV6Address: fe80:0000:0000:0000:118b:39e1:36b9:a336
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: 0a:00:27:00:00:00
MediumType: Ethernet
Status: Up
VBoxNetworkName: HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter #2
C:\Program Files\Oracle\VirtualBox>VBoxManage list dhcpservers
NetworkName: HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter
IP: 192.168.56.100
NetworkMask: 255.255.255.0
lowerIPAddress: 192.168.56.101
upperIPAddress: 192.168.56.254
Enabled: Yes
NetworkName: HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter #2
IP: 192.168.99.6
NetworkMask: 255.255.255.0
lowerIPAddress: 192.168.99.100
upperIPAddress: 192.168.99.254
Enabled: Yes
I have found that I still occasionally get double host-only adapters. I just deleted them both and created a new machine. The certs are still regenerating when I run `docker-machine env default`.
Here is the output of the VBoxManage commands the second time around (with only 1 host adapter).
C:\Program Files\Oracle\VirtualBox>VBoxManage list hostonlyifs
Name: VirtualBox Host-Only Ethernet Adapter
GUID: 2883b47a-862d-454e-9db7-42c3789585eb
DHCP: Disabled
IPAddress: 192.168.99.1
NetworkMask: 255.255.255.0
IPV6Address: fe80:0000:0000:0000:90ff:fd25:e5f0:8c92
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: 0a:00:27:00:00:00
MediumType: Ethernet
Status: Up
VBoxNetworkName: HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter
C:\Program Files\Oracle\VirtualBox>VBoxManage list dhcpservers
NetworkName: HostInterfaceNetworking-VirtualBox Host-Only Ethernet Adapter
IP: 192.168.99.6
NetworkMask: 255.255.255.0
lowerIPAddress: 192.168.99.100
upperIPAddress: 192.168.99.254
Enabled: Yes
@carolynvs I have no idea so far.
I pushed a couple more commits on the PR to print more information and try things.
If you have time to update the output you get, that'd be just great.
ping @nathanleclaire @dmp42 any idea?
Here's the new output: https://gist.github.com/carolynvs/84cd140bcbf9b696e20f.
Let me know if there's another way to go about debugging the connection problem. I'm not quite sure what docker-machine is detecting that is causing it to regenerate the certs but am happy to poke around in /var/lib/boot2docker on the host or compare certs between windows and the host, etc if I knew what to look for.
@carolynvs That would be awesome. As you pointed out, the problem arises in `cert.go`:
Certs are not valid: read tcp 192.168.99.1:49755->192.168.99.100:2376: wsarecv: An established connection was aborted by the software in your host machine.
Either the certificates are not properly copied onto the VM,
or the VM is not reachable at 192.168.99.100:2376 (host network config? firewall? VPN? VM network config?),
or there's a problem in the way we check.
If you export the env variables given by `docker-machine env` and ignore the errors, are you able to connect to the docker daemon?
I can ping the docker host and ssh into it. When I ignore the messages about regenerating certs from `docker-machine env` and set the variables manually, I am still unable to connect with the docker client.
An error occurred trying to connect: Get https://192.168.99.101:2376/v1.20/containers/json: WSARecv tcp 192.168.99.1:50072: An established connection was aborted by the software in your host machine.
The certs on the host in `/var/lib/boot2docker/tls/` do not match those locally in `~/.docker/machine/machines/default/`. The certs in `/var/lib/boot2docker/` match what is on my local machine. Also, the certs in `~/.docker/machine/certs/` match what is in `~/.docker/machine/machines/default/`.
I'm guessing that the issue lies with the certs not matching, which prevents docker-machine from securely connecting to the docker daemon, thus triggering a cert regen?
I've verified that the docker daemon is running:
docker@default2:/var/log$ ps aux | grep docker
root 2439 0.1 1.9 122904 19872 ? Sl 13:23 0:00 /usr/local/bin/docker daemon -D -g /var/lib/docker -H unix:// -H tcp://0.0.0.0:2376 --label provider=virtualbox --tlsverify --tlscacert=/var/lib/boot2docker/ca.pem --tlscert=/var/lib/boot2docker/server.pem --tlskey=/var/lib/boot2docker/server-key.pem -s aufs
Also here are the logs from boot2docker and docker: https://gist.github.com/carolynvs/f7965455ebbceb85d4e6
:+1: Thanks! I feel we're getting closer :smile:
IIRC, the certs in `/var/lib/boot2docker/tls` are generated server side by a startup script in the boot2docker OS and not used for anything in the current Machine model (they are a relic of how boot2docker-cli historically expected the certificates to be set up).
@carolynvs @nathanleclaire I have no idea then. The only difference I have in my logs is that I'm using VBox version 5.0.6 and a more recent boot2docker.
@carolynvs Can you try connecting to the docker daemon using curl? We might get better feedback on what's going wrong. I think you are on Windows, so I don't really know how to achieve that, but here's how I did it on OS X:
$ openssl pkcs12 -export -in ~/.docker/machine/certs/cert.pem -inkey ~/.docker/machine/certs/key.pem -out ~/.docker/machine/certs/cert.pfx -password pass:supersecret
$ curl -v --cacert ~/.docker/machine/machines/default/ca.pem --cert ~/.docker/machine/certs/cert.pfx --pass supersecret https://192.168.99.100:2376/version
* Trying 192.168.99.100...
* Connected to 192.168.99.100 (192.168.99.100) port 2376 (#0)
* WARNING: SSL: Certificate type not set, assuming PKCS#12 format.
* Client certificate: dgageot
* WARNING: using IP address, SNI is being disabled by the OS.
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate: default
* Server certificate: dgageot
> GET /version HTTP/1.1
> Host: 192.168.99.100:2376
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Server: Docker/1.8.3 (linux)
< Date: Tue, 20 Oct 2015 14:47:14 GMT
< Content-Length: 192
<
{"Version":"1.8.3","ApiVersion":"1.20","GitCommit":"f4bf5c7","GoVersion":"go1.4.2","Os":"linux","Arch":"amd64","KernelVersion":"4.1.10-boot2docker","BuildTime":"Mon Oct 12 18:01:15 UTC 2015"}
* Connection #0 to host 192.168.99.100 left intact
FTR, here's the tutorial I used to make it work: http://opensolitude.com/2015/07/12/curl-docker-remote-api-os-x.html
@dgageot Ooh, yes this seems to be a problem on my machine (using curl/openssl from Git for Windows so all the commands are the same).
$ openssl pkcs12 -export -in ~/.docker/machine/certs/cert.pem -inkey ~/.docker/machine/certs/key.pem -out ~/.docker/machine/certs/cert.pfx -password pass:supersecret
Loading 'screen' into random state - done
caro8994@CAROLYNVANS87E4 MINGW64 ~
$ docker-machine ip default
192.168.99.100
caro8994@CAROLYNVANS87E4 MINGW64 ~
$ curl -v --cacert ~/.docker/machine/machines/default/ca.pem --cert ~/.docker/machine/certs/cert.pfx --pass supersecret https://192.168.99.100:2376/version
* timeout on name lookup is not supported
* Trying 192.168.99.100...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to 192.168.99.100 (192.168.99.100) port 2376 (#0)
* ALPN, offering http/1.1
* could not load PEM client certificate, OpenSSL error error:0906D06C:PEM routines:PEM_read_bio:no start line, (no key found, wrong pass phrase, or wrong file format?)
* Closing connection 0
curl: (58) could not load PEM client certificate, OpenSSL error error:0906D06C:PEM routines:PEM_read_bio:no start line, (no key found, wrong pass phrase, or wrong file format?)
I checked all the certs in `~/.docker/machine/certs` using `vi -b path/to/cert` and verified that they have unix line endings. I also used the following command to check whether openssl was able to read them.
$ openssl x509 -in .docker/machine/certs/cert.pem -inform PEM -text -noout
I'll keep poking around with certs, as this seems like the issue. Maybe try it out on another machine and see if it's just a Windows 10 thing.
@carolynvs Great job! I'll check that tomorrow morning (Paris time)
Hi @carolynvs, have you tried this command on `ca.pem` too?
openssl x509 -in ~/.docker/machine/machines/default/ca.pem -inform PEM -text -noout
Can you check that it properly starts with `-----BEGIN CERTIFICATE-----` and ends with `-----END CERTIFICATE-----`, with nothing before and after?
@carolynvs I must admit I don't know what's going on. Have you tried this PR, which seems vaguely related?
If you don't mind confirming this intermediate summary, so I can silently spend brain on this:
I'm sure you already checked: http://stackoverflow.com/questions/20837161/openssl-pem-routinespem-read-biono-start-linepem-lib-c703expecting-truste
I put it here for reference for others.
I just tried a different curl command using --cert and --key instead of the generated pfx file and it is able to connect.
$ curl --cacert ~/.docker/machine/machines/bugtest/ca.pem --cert ~/.docker/machine/machines/bugtest/cert.pem --key ~/.docker/machine/machines/bugtest/key.pem https://$(docker-machine ip bugtest):2376/version
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 192 100 192 0 0 1761 0 --:--:-- --:--:-- --:--:-- 1761{"Version":"1.8.3","ApiVersion":"1.20","GitCommit":"f4bf5c7","GoVersion":"go1.4.2","Os":"linux","Arch":"amd64","KernelVersion":"4.1.10-boot2docker","BuildTime":"Mon Oct 12 18:01:15 UTC 2015"}
Looking more closely at the output of `docker-machine env`, I see that it is exporting what I think is a bad cert path. On my mac this points to `.docker/machines/machine/`.
$ docker-machine env bugtest
Certs are not valid: remote error: bad certificate
Invalid certs detected; regenerating for 192.168.99.102:2376
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.102:2376"
export DOCKER_CERT_PATH="C:\Users\caro8994\.docker\machine\certs"
export DOCKER_MACHINE_NAME="bugtest"
# Run this command to configure your shell:
# eval "$(C:\Program Files\Docker Toolbox\docker-machine.exe env bugtest)"
After manually setting the environment variables, changing the cert path to what I think it should have been, I can connect with the docker client.
Perhaps when docker-machine is testing if it can connect, it is using the wrong certs?
I added some debug info when validating certs and then tried manually connecting using first what docker-machine is using then what I think should be used.
caro8994@CAROLYNVANS87E4 MINGW64 ~
$ docker-machine env bugtest
HOST URL=192.168.99.102:2376
CA CERT PATH=C:\Users\caro8994\.docker\machine\certs\ca.pem
SERVER CERT PATH=C:\Users\caro8994\.docker\machine\machines\bugtest\server.pem
SERVER KEY PATH=C:\Users\caro8994\.docker\machine\machines\bugtest\server-key.pem
Certs are not valid: read tcp 192.168.99.1:50658->192.168.99.102:2376: wsarecv: An established connection was aborted by the software in your host machine.
Invalid certs detected; regenerating for 192.168.99.102:2376
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.102:2376"
export DOCKER_CERT_PATH="C:\Users\caro8994\.docker\machine\certs"
export DOCKER_MACHINE_NAME="bugtest"
# Run this command to configure your shell:
# eval "$(C:\Program Files\Docker Toolbox\docker-machine.exe env bugtest)"
caro8994@CAROLYNVANS87E4 MINGW64 ~
$ curl --cacert ~/.docker/machine/certs/ca.pem --cert ~/.docker/machine/machines/bugtest/server.pem --key ~/.docker/machine/machines/bugtest/server-key.pem https://$(docker-machine ip bugtest):2376/version
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (35) error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate
caro8994@CAROLYNVANS87E4 MINGW64 ~
$ curl --cacert ~/.docker/machine/certs/ca.pem --cert ~/.docker/machine/machines/bugtest/cert.pem --key ~/.docker/machine/machines/bugtest/key.pem https://$(docker-machine ip bugtest):2376/version
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 192 100 192 0 0 472 0 --:--:-- --:--:-- --:--:-- 472{"Version":"1.8.3","ApiVersion":"1.20","GitCommit":"f4bf5c7","GoVersion":"go1.4.2","Os":"linux", "Arch":"amd64","KernelVersion":"4.1.10-boot2docker","BuildTime":"Mon Oct 12 18:01:15 UTC 2015"}
So I see 2 suspicious things:
Thanks a lot @carolynvs, that should really help. Before I digest everything you reported, can you try the latest version of https://github.com/docker/machine/pull/2006? It should print the certificates being used for validation. That should help.
Here are the certs it is using
HOST URL=192.168.99.102:2376
CA CERT PATH=C:\Users\caro8994\.docker\machine\certs\ca.pem
SERVER CERT PATH=C:\Users\caro8994\.docker\machine\machines\bugtest\server.pem
SERVER KEY PATH=C:\Users\caro8994\.docker\machine\machines\bugtest\server-key.pem
That's from my own debug info, not the PR which takes a long time to build now that it's building all the plugins. :smile:
OK, now I'm confused, so I'll just try a recap.
Can you confirm that:
- `~/.docker/machine/certs/ca.pem` is the same as `~/.docker/machine/machines/bugtest/ca.pem`
- `~/.docker/machine/certs/cert.pem` is the same as `~/.docker/machine/machines/bugtest/cert.pem`
- `~/.docker/machine/certs/key.pem` is the same as `~/.docker/machine/machines/bugtest/key.pem`
- the `docker` cli can reach the server. Which value for `DOCKER_CERT_PATH` did you use then?
- `docker-machine env bugtest` prints `export DOCKER_CERT_PATH="~/.docker/machine"` and not `DOCKER_CERT_PATH="~/.docker/machine/certs"`
Thanks again for the support!
@carolynvs FTR, cross-building only docker-machine, only for windows should be much faster: TARGET_ARCH=amd64 TARGET_OS=windows make build-x-machine
Sorry for the brain dump!
`~/.docker/machine/certs` and `~/.docker/machine/machines/bugtest`
`DOCKER_CERT_PATH` to `~/.docker/machine/machines/bugtest`
`docker-machine env` sets `DOCKER_CERT_PATH="~/.docker/machine/machines/bugtest"`. On Windows 10 (which doesn't), the same command is resulting in `DOCKER_CERT_PATH="~/.docker/machine/certs"`.
This was in my brain dump but may have gotten lost. When docker-machine is validating the certificates, it is attempting to connect to the docker daemon using server.pem and server-key.pem. This seems super fishy.
OK. Let's call @nathanleclaire and @ehazlett to the rescue. I think you nailed it but now, I'm too new to the project to understand why we have so many duplicated certificates and why we don't use the right ones.
Thanks for the build tip!
Below is the relevant output from the latest build of PR #2006 and here's the full output: https://gist.github.com/carolynvs/8b7034c26fe3a764c537
Reading CA certificate from C:\Users\caro8994\.docker\machine\certs\ca.pem
Reading server certificate from C:\Users\caro8994\.docker\machine\machines\bugtest\server.pem
Reading server key from C:\Users\caro8994\.docker\machine\machines\bugtest\server-key.pem
Sorry for the closed/reopened noise. I fumbled
Oi vey. @carolynvs @dgageot you all are champs for continuing to chase this one down. I think Carolyn's suspicion is correct: if the `DOCKER_CERT_PATH` is not getting set correctly, then communication with the daemon will not work properly. It sounds like it may be an issue with paths that I introduced inadvertently in the `libmachine` changes. I'll continue investigating this and poking around at your findings so far.
May the gateway to the culprit be this line, then?
https://github.com/docker/machine/blob/8aa1572e0dcd75762a7627e1056ef104317f44b9/libmachine/persist/filestore.go#L155
@blaggacao Definitely strongly in the realm of possibility - that code tends to be a little brittle and has been problematic in the past.
I don't get how this can be different on Windows and Mac OS, though, as @carolynvs confirmed.
To me it clearly constructs the `.docker\machine\certs` path.
diff .docker/machine/certs/ca.pem .docker/machine/machines/oca/ca.pem
diff .docker/machine/certs/cert.pem .docker/machine/machines/oca/cert.pem
diff .docker/machine/certs/key.pem .docker/machine/machines/oca/key.pem
remains silent.
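The three `diff` calls above can be folded into a small loop. This is only a sketch: `certs_match` is a made-up helper name, and the two directories are passed in as arguments rather than assumed.

```shell
# certs_match DIR_A DIR_B: succeed only if ca.pem, cert.pem and key.pem
# are byte-identical in both directories (hypothetical helper).
certs_match() {
    for f in ca.pem cert.pem key.pem; do
        # cmp -s is silent and fails if the files differ or one is missing.
        if ! cmp -s "$1/$f" "$2/$f"; then
            echo "differs or missing: $f"
            return 1
        fi
    done
    return 0
}

# Possible usage (not run here):
#   certs_match ~/.docker/machine/certs ~/.docker/machine/machines/oca && echo "all match"
```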
@blaggacao clearly, I don't have the same behaviour as @carolynvs on mac. So there is something fishy.
Yeah, the certs get copied over to that machine's directory during the provisioning bit.
@dgageot Apologies for the confusion. My mac is running docker-machine 0.4.1. I'm only running the PR build on my Windows machine as I've been testing out fixes there as they are merged into master.
I can do a build and run again on my mac right now.
To summarize:
`/machine/certs` and `/machine/machines/certs`
When manually setting `DOCKER_CERT_PATH` on Windows (in bash), you should use UNIX-style paths. For example, `export DOCKER_CERT_PATH="~/.docker/machine/machines/oca"`.
I can confirm that on my (wonky) machine the certs do match between /machine/certs and /machine/machines/certs.
I can confirm, by manual copying, as scp doesn't work:
diff ca.pem.local ca.pem.vm FALSE
diff server.pem.local server.pem.vm TRUE
diff key.pem.local key.pem.vm TRUE
The second and third ones differ between `/machines/oca` and `oca:~/.docker`.
Which paths in the VM are you using for the certs @blaggacao ?
I just realized, it was the wrong one...
I checked against ~/.docker
I'll check again against /var/lib/boot2docker
I can definitely confirm that `/machines/oca` and `oca:/var/lib/boot2docker/` are the same (`dos2unix` on all 3 files `ca.pem`, `server.pem`, `server-key.pem` on `oca`).
I additionally get this timeout error: https://github.com/docker/machine/blob/6a5219b879db52698ccb2b7e0aafd516b34df839/libmachine/provision/boot2docker.go#L193 every time I run `env`, either with the `--native-ssh` flag or not.
Yeah, @blaggacao it also looks like the host-only IP assigned to the VM is not reachable on your computer. Can you `ping $(docker-machine ip vmname)`?
nope, doesn't work either... "Request timed out"
`docker-machine ssh vmname` works though.
Yeah, `ssh` goes through `localhost`. But it seems that you cannot contact the assigned host-only VM IP, so I wouldn't expect `env` to function correctly. Are you using any VPN or proxy?
Not that I'm aware of; I just double-checked the task manager... UPDATE: detected one, closing.
Closing doesn't change anything, but this is another issue, I think...
leads me to
As I don't get any of: https://github.com/docker/machine/blob/56f457c2ef6e306fb1815b6b125f98c85a6e92ec/libmachine/cert/cert.go#L22
the only remaining candidates are:
https://github.com/docker/machine/blob/56f457c2ef6e306fb1815b6b125f98c85a6e92ec/libmachine/cert/cert.go#L198-L205
This smells like a connection between both problems. Can you interpret my line of thought?
I didn't trust my Windows environment anymore so I started over and rebuilt Windows then put on #2006.
In the docker.log file I see this error
2015/10/21 17:06:23 http: TLS handshake error from 192.168.99.1:50386: tls: failed to verify client's certificate: x509: certificate has expired or is not yet valid
so I checked the cert's dates
$ openssl x509 -in server.pem -noout -dates
notBefore=Oct 21 22:00:00 2015 GMT
notAfter=Oct 5 22:00:00 2018 GMT
Could the problem be that the certificate is future dated? That would explain why originally my curl commands didn't work but a few hours later they did.
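The future-dated hypothesis can be checked mechanically by comparing the `notBefore` value with the current time. The sketch below assumes GNU `date` (for `-d`) and takes the `notBefore=...` line as printed by `openssl x509 -noout -dates`; the function name and the optional second argument are illustrative only.

```shell
# not_before_in_future "notBefore=Oct 21 22:00:00 2015 GMT" [NOW_EPOCH]
# Succeeds (exit 0) when the certificate is not yet valid at NOW_EPOCH,
# which defaults to the current time. Requires GNU date for `-d`.
not_before_in_future() {
    # Strip the "notBefore=" prefix and convert the timestamp to epoch seconds.
    start=$(date -u -d "${1#notBefore=}" +%s) || return 2
    now=${2:-$(date -u +%s)}
    [ "$start" -gt "$now" ]
}

# Possible usage (not run here):
#   line=$(openssl x509 -in server.pem -noout -startdate)
#   not_before_in_future "$line" && echo "certificate is not valid yet"
```

Running the same check inside the VM and on the client would show whether the two sides disagree about the current time, since each side validates the other's certificate against its own clock.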
same here:
$ openssl x509 -in .docker/machine/machines/oca/server.pem -noout -dates
notBefore=Oct 21 22:00:00 2015 GMT
notAfter=Oct 5 22:00:00 2018 GMT
That's in roughly 5 hours in my timezone (Bogota/Americas). Well, but it says GMT (UTC). Bogota is UTC-5.
docker@oca:~$ time
BusyBox v1.23.1 (2015-02-22 15:53:49 UTC) multi-call binary.
Update: FIX
As stated here: https://github.com/docker/docker/issues/11534#issuecomment-89405874
docker-machine ssh vmname
sudo ntpclient -s -h pool.ntp.org
yielded me a different error (one step at a time :)
I think this is it, the rest is my virtualbox.
I am going to eat dinner and check back in 5 hours when I suspect my cert will be valid and everything will just work. :smile:
Bad news, I've to do this on every vm restart.
:smile: I guess you hit the root cause! Thanks!
:clap: :clap: :clap: :clap: :clap: :clap: :clap:
@carolynvs did the fix i posted work for you?
I just wanted to confirm that after waiting 5 hours until the cert was valid, docker-machine env works. No clue why I am getting certs that are future dated, but maybe we should update the issue to reflect the real root cause now that we know.
In my case the issue was not the certs but the time setting on boot2docker... As I can see on your GitHub profile, you're from Chicago, which is in a similar timezone to Bogota; maybe boot2docker gets set up wrongly in our timezones...
After syncing the time using your workaround, I still receive the same error (certificate has expired or is not yet valid) when using those certs to connect to my docker host.
On my mac, this is what I see after making a new box and checking its time.
docker@bugtest:~$ time
BusyBox v1.23.1 (2015-02-22 15:53:49 UTC) multi-call binary.
docker@bugtest:~$ hwclock
Thu Oct 22 15:54:29 2015 0.000000 seconds
docker@bugtest:~$ date
Thu Oct 22 15:54:06 UTC 2015
docker@bugtest:~$ openssl x509 -in /var/lib/boot2docker/server.pem -noout -dates
notBefore=Oct 22 15:48:00 2015 GMT
notAfter=Oct 6 15:48:00 2018 GMT
Here are the same commands on a new host on Windows:
docker@bugtest:~$ time
BusyBox v1.23.1 (2015-02-22 15:53:49 UTC) multi-call binary.
docker@bugtest:~$ hwclock
Thu Oct 22 15:58:56 2015 0.000000 seconds
docker@bugtest:~$ date
Thu Oct 22 10:58:58 UTC 2015
docker@bugtest:~$ openssl x509 -in /var/lib/boot2docker/server.pem -noout -dates
notBefore=Oct 22 15:45:00 2015 GMT
notAfter=Oct 6 15:45:00 2018 GMT
The date is showing my local time but thinks that it is UTC and I do not know how to update it to match the hwclock. I've tried manually changing date but there is something about either busybox or virtualbox that is immediately undoing any change.
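To put a number on the discrepancy, the VM's idea of the current time can be compared with the host's, both as seconds since the epoch. A minimal sketch, assuming `docker-machine ssh <name> <command>` is available to run `date +%s` in the VM; `clock_skew` and the 60-second threshold are arbitrary illustrative choices.

```shell
# clock_skew VM_EPOCH HOST_EPOCH: print the absolute difference in seconds.
# Hypothetical helper; both arguments are plain epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Possible usage (not run here):
#   vm=$(docker-machine ssh bugtest date +%s)
#   host=$(date +%s)
#   if [ "$(clock_skew "$vm" "$host")" -gt 60 ]; then echo "VM clock is off"; fi
```

A five-hour offset like the one between `hwclock` and `date` in the output above would show up as roughly 18000 seconds.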
This is working state as of yesterday after applying the workaround:
docker@oca:~$ time
BusyBox v1.23.1 (2015-02-22 15:53:49 UTC) multi-call binary.
docker@oca:~$ hwclock
Thu Oct 22 10:10:46 2015 0.000000 seconds
docker@oca:~$ date
Thu Oct 22 16:28:19 UTC 2015
docker@oca:~$
docker@oca:~$ openssl x509 -in /var/lib/boot2docker/server.pem -noout -dates
notBefore=Oct 21 22:32:00 2015 GMT
notAfter=Oct 5 22:32:00 2018 GMT
Here, `date` corresponds to my local time expressed in UTC.
Some hints for my symptoms: https://forums.virtualbox.org/viewtopic.php?f=3&t=60558#p281836
`time` is frozen; after 10 min:
docker@oca:~$ time
BusyBox v1.23.1 (2015-02-22 15:53:49 UTC) multi-call binary.
As `date` is showing the correct date in my case, I assume the workaround fixed `date` and therefore the issue.
cc @tianon @SvenDowideit PTAL at the above RE: boot2docker time/date issues ^^
Some code I've found might be contributing to the timestamp issue:
https://github.com/docker/machine/blob/master/libmachine/cert/cert.go#L53-L56
But it's always been working fine before.
@carolynvs @blaggacao Have you run into these issues still?
For me it is working after the referenced workaround. This, in turn, indicates that some boot2docker time parameters were not set correctly. Typically it would then occur only during a limited time frame right after the machine creation (probably only in some time zones).
This, again, would mean that the certs' timestamps would be correct.
I did stumble over this again just now after restarting the PC on my rc, but after updating to 5.0 everything seems to work. We could probably close this for now. Anyhow, as soon as I notice strange behavior I will reopen it.
https://gist.github.com/damontic/bd60b6a18cacf635dc9c
I have this problem too. Don't close it.
@damontic That looks like a different issue than the one being discussed here.
I'm trying to set up a swarm on DigitalOcean and I have the same error.
The init-do.sh file that creates a KV store, a swarm master and a node:
# KV Store
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
consul
eval "$(docker-machine env consul)"
docker run -d -p "8500:8500" -h "consul" progrium/consul -server -bootstrap
sleep 5
# Swarm master
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
--swarm --swarm-image="swarm" --swarm-master \
--swarm-discovery="consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
demo0
sleep 5
# Swarm node
docker-machine create \
--driver digitalocean \
--digitalocean-access-token ${TOKEN} \
--digitalocean-region "lon1" \
--digitalocean-size "1gb" \
--swarm --swarm-image="swarm:1.0.0-rc2" \
--swarm-discovery="consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
demo1
The log / error I get
$> ./init-do.sh
Running pre-create checks...
Creating machine...
(consul) OUT | Creating SSH key...
(consul) OUT | Creating Digital Ocean droplet...
(consul) OUT | Waiting for IP address to be assigned to the Droplet...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Detecting the provisioner...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
To see how to connect Docker to this machine, run: docker-machine env consul
Unable to find image 'progrium/consul:latest' locally
latest: Pulling from progrium/consul
3b4d28ce80e4: Pull complete
...
d9125e9e799b: Pull complete
Digest: sha256:8cc8023462905929df9a79ff67ee435a36848ce7a10f18d6d0faba9306b97274
Status: Downloaded newer image for progrium/consul:latest
ab964fd70394d34f8d1de5c76246490b5857adaffbc1c02235bdc53663c33b37
Running pre-create checks...
Creating machine...
(demo0) OUT | Creating SSH key...
(demo0) OUT | Creating Digital Ocean droplet...
(demo0) OUT | Waiting for IP address to be assigned to the Droplet...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Detecting the provisioner...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: Unable to verify the Docker daemon is listening: Maximum number of retries (5) exceeded
Running pre-create checks...
Creating machine...
(demo1) OUT | Creating SSH key...
(demo1) OUT | Creating Digital Ocean droplet...
(demo1) OUT | Waiting for IP address to be assigned to the Droplet...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Detecting the provisioner...
Provisioning created instance...
Error creating machine: Error running provisioning: Something went wrong running an SSH command!
command : sudo apt-get update
err : exit status 100
output : Ign http://mirrors.digitalocean.com trusty InRelease
Get:1 http://mirrors.digitalocean.com trusty-updates InRelease [64.4 kB]
Hit http://mirrors.digitalocean.com trusty Release.gpg
Hit http://mirrors.digitalocean.com trusty Release
Get:2 http://mirrors.digitalocean.com trusty-updates/main Sources [244 kB]
Get:3 http://mirrors.digitalocean.com trusty-updates/universe Sources [144 kB]
Get:4 http://mirrors.digitalocean.com trusty-updates/main amd64 Packages [652 kB]
Get:5 http://mirrors.digitalocean.com trusty-updates/universe amd64 Packages [331 kB]
Get:6 http://mirrors.digitalocean.com trusty-updates/main i386 Packages [631 kB]
Get:7 http://mirrors.digitalocean.com trusty-updates/universe i386 Packages [332 kB]
Get:8 http://mirrors.digitalocean.com trusty-updates/main Translation-en [319 kB]
Get:9 http://security.ubuntu.com trusty-security InRelease [64.4 kB]
Get:10 http://mirrors.digitalocean.com trusty-updates/universe Translation-en [173 kB]
Hit http://mirrors.digitalocean.com trusty/main Sources
Hit http://mirrors.digitalocean.com trusty/universe Sources
Hit http://mirrors.digitalocean.com trusty/main amd64 Packages
Hit http://mirrors.digitalocean.com trusty/universe amd64 Packages
Hit http://mirrors.digitalocean.com trusty/main i386 Packages
Hit http://mirrors.digitalocean.com trusty/universe i386 Packages
Hit http://mirrors.digitalocean.com trusty/main Translation-en
Hit http://mirrors.digitalocean.com trusty/universe Translation-en
Ign http://mirrors.digitalocean.com trusty/main Translation-en_US
Ign http://mirrors.digitalocean.com trusty/universe Translation-en_US
Get:11 http://security.ubuntu.com trusty-security/main Sources [99.2 kB]
Get:12 http://security.ubuntu.com trusty-security/universe Sources [32.5 kB]
Get:13 http://security.ubuntu.com trusty-security/main amd64 Packages [370 kB]
Get:14 http://security.ubuntu.com trusty-security/universe amd64 Packages [122 kB]
Get:15 http://security.ubuntu.com trusty-security/main i386 Packages [350 kB]
Get:16 http://security.ubuntu.com trusty-security/universe i386 Packages [123 kB]
Get:17 http://security.ubuntu.com trusty-security/main Translation-en [200 kB]
Get:18 http://security.ubuntu.com trusty-security/universe Translation-en [69.6 kB]
Fetched 4,323 kB in 4s (925 kB/s)
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/trusty-security/universe/i18n/Translation-en Hash Sum mismatch
E: Some index files failed to download. They have been ignored, or old ones used instead.
Before running this, I updated to Machine 0.5.1:
$ docker-machine -v
docker-machine version 0.5.1 (7e8e38e)
I can switch to the context of the "consul" machine, but not to "demo0" or "demo1":
$ docker-machine env consul
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://178.62.93.196:2376"
export DOCKER_CERT_PATH="/Users/luc/.docker/machine/machines/consul"
export DOCKER_MACHINE_NAME="consul"
# Run this command to configure your shell:
# eval "$(/usr/local/bin/docker-machine env consul)"
$ docker-machine env demo0
Error running connection boilerplate: Error checking and/or regenerating the certs: There was an error validating certificates for host "46.101.74.179:2376": dial tcp 46.101.74.179:2376: getsockopt: connection refused
You can attempt to regenerate them using 'docker-machine regenerate-certs name'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.
$ docker-machine env demo1
Error running connection boilerplate: Error checking and/or regenerating the certs: There was an error validating certificates for host "46.101.17.195:2376": open /Users/luc/.docker/machine/machines/demo1/server.pem: no such file or directory
You can attempt to regenerate them using 'docker-machine regenerate-certs name'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.
@lucj If provisioning fails, the created instances will be "invalid". Try removing them and starting again from scratch.
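A minimal sketch of that remove-and-retry advice, assuming a POSIX shell. The helper name create_or_cleanup and the DM variable are hypothetical and exist only for illustration; the only real commands involved are docker-machine create and docker-machine rm -f.

```shell
#!/bin/sh
# DM lets a wrapper or a test substitute another binary (hypothetical knob).
DM="${DM:-docker-machine}"

# create_or_cleanup NAME [create flags...]
# Runs "docker-machine create". If provisioning fails, the half-created
# machine is removed so that the next attempt starts from scratch.
create_or_cleanup() {
    name="$1"
    shift
    if "$DM" create "$@" "$name"; then
        return 0
    fi
    echo "provisioning of $name failed; removing it" >&2
    "$DM" rm -f "$name" >&2 || true
    return 1
}
```

In a script like init-do.sh, it could be called as `create_or_cleanup demo0 --driver digitalocean ... || exit 1`, so that a failed swarm master stops the run before any node is created against it.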
@nathanleclaire I've just deleted the machines (is 'docker-machine rm consul demo0 demo1' enough, or should I manually delete anything else?) and reran the setup file, and I got the same certs problem (when creating on DigitalOcean). The strange thing is that there is no problem with the 'consul' machine, only with the swarm ones (demo0, demo1).
When creating the swarm on VirtualBox (5.0.10) it's working fine though.
I'm seeing this when using the aws driver.
I've done several tests (a lot, actually): after deleting the VMs and recreating them (with a swarm), I still have the same problem.
I now have this issue after upgrading from version 1.8 to 1.9.1 using Docker Toolbox on Mac OS X 10.10.5:
Error running connection boilerplate: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": dial tcp 192.168.99.100:2376: getsockopt: connection refused
You can attempt to regenerate them using 'docker-machine regenerate-certs name'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.
command failed; 1
This is happening to me periodically too. Docker v1.9.1
Same problem here with the azure driver. Every time I create a new Azure machine, it fails with the error:
Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "testcargo2-prefapp-in.cloudapp.net:2376": tls: DialWithDialer timed out
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'
After running docker-machine regenerate-certs, the certificate validation works OK.
In docker-machine v0.5.5 there is no problem, and the creation of a docker host works ok:
Running pre-create checks...
Creating machine...
(testcargo3-prefapp-in) Creating Azure machine...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Detecting the provisioner...
Provisioning with ubuntu(upstart)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect Docker to this machine, run: docker-machine env
@alambike You're hitting this issue with 0.6.0?
Yep, from 0.5.5 onwards. I have tested this with 0.5.6 and 0.6.0.
Same for me on 0.6.0 with the aws driver (constantly) on Mac OS X 10.10.5. It does not happen with the virtualbox driver.
Fixed after changing --engine-opt="cluster-advertise=eth1:2376" to --engine-opt="cluster-advertise=eth0:2376", using docker-machine 0.6.0 (docker-machine 0.5.4 still fails).
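One possible reason the interface name matters (an assumption, not something confirmed in this thread) is that the interface named in cluster-advertise has no address on the droplet, so the daemon never comes up. A hedged sketch that checks for the interface before choosing the flag: has_iface is a hypothetical helper that parses the output of `ip -o -4 addr show`, and demo0 is the machine name from the script above.

```shell
#!/bin/sh
# has_iface NAME
# Reads `ip -o -4 addr show` output on stdin and succeeds if an interface
# called NAME is listed with an IPv4 address (second field of each line).
has_iface() {
    awk -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}

# Example against a real machine (not run here):
#   docker-machine ssh demo0 ip -o -4 addr show | has_iface eth1 \
#       || echo "eth1 has no IPv4 address; try cluster-advertise=eth0:2376"
```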
I think I'm battling the same issue on my machine. I'm using Ubuntu 14.04
docker-machine version 0.5.5, build 02c4254
Running host on RHEL 7.1
Server Version: 1.10.2-cs1-rc3
I tried everything suggested regarding the time on the machines; here is the output I get from curl:
curl -v --cacert ~/.docker/machine/certs/ca.pem --cert ~/.docker/machine/machines/$NODE_NAME/cert.pem --key ~/.docker/machine/machines/$NODE_NAME/key.pem https://$(docker-machine ip $NODE_NAME):2376/version
@nathanleclaire I have found the culprit! prltoolsd from boot2docker is constantly setting my date/timezone incorrectly.
$ date
<the current local time with the timezone set to UTC>
$ date -s '<the correct time in UTC>'
<prints the correct time>
$ date
<the date/time is now broken again>
$ /usr/local/etc/init.d/prltoolsd stop
$ date -s '<the correct time in UTC>'
<prints the correct time>
$ date
<prints the correct time and stays put>
After stopping prltoolsd and resetting the date, all my docker-machine commands work as expected and my certificates do not regenerate.
I still don't know why the timezone is set to UTC and the time to localtime after making a new machine, so this is just a workaround, not a fix.
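A hedged sketch for spotting this kind of clock drift from the client side, assuming a POSIX shell and a `date` that supports `+%s` on both ends. skew_seconds and clock_ok are hypothetical helpers, and the 60-second default threshold is an arbitrary choice for this sketch.

```shell
#!/bin/sh
# skew_seconds A B
# Prints the absolute difference between two Unix timestamps.
skew_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# clock_ok LOCAL REMOTE [MAX]
# Succeeds when the two clocks agree within MAX seconds (default 60).
clock_ok() {
    [ "$(skew_seconds "$1" "$2")" -le "${3:-60}" ]
}

# Example against a real machine (not run here):
#   local_now=$(date -u +%s)
#   remote_now=$(docker-machine ssh default date -u +%s)
#   clock_ok "$local_now" "$remote_now" || echo "clock skew detected"
```

A machine whose clock holds local time but labels it UTC would show a skew equal to the timezone offset, for example 21600 seconds for a UTC-6 zone.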
Nice @carolynvs ! We'll work on seeing if we can fix this in boot2docker.
@tianon @legal90 FYI ^^
@carolynvs Wow :fearful: . It looks really weird, because the prltoolsd process shouldn't start on any virtualization system other than Parallels Desktop. The daemon will start only if /usr/bin/prlvmcheck returns a 0 exit code, which means that we are in a Parallels VM.
Have you reproduced this issue on a VirtualBox VM? What Boot2Docker version are you using?
P.S. Also, if we assume that prltoolsd is the only cause, then the Docker Machine version should not matter. However, other comments above (link) say that the issue appears only in Machine 0.5.5+.
@legal90 That makes more sense. My environment is a bit wonky, but it did used to work just fine:
This explains why prltoolsd is attempting to manage my docker host clock. It must be picking up on being nested inside Parallels. Does that also explain why the system clock is set to local time but thinks it is UTC?
That is the root problem that caused me to open this bug. I create a new docker machine at 10 AM CST (-6). The system clock (date) on the new machine thinks that it is 10 AM UTC, so the timestamps on the certificates are "in the future". hwclock reports the correct time.
Looking at the boot2docker Dockerfile, I noticed that it is setting /etc/timezone to UTC and _should_ have set /etc/localtime to UTC as well.
see https://github.com/boot2docker/boot2docker/blob/master/Dockerfile#L311
RUN echo 'UTC' > $ROOTFS/etc/timezone \
&& cp -L /usr/share/zoneinfo/UTC $ROOTFS/etc/localtime
But on my docker machine host, the tzdata package is not installed, so /usr/share/zoneinfo doesn't exist and neither does /etc/localtime. I have built my own boot2docker from the latest Dockerfile just to verify that I'm not using an old iso. I wonder if the missing /etc/localtime file is contributing to the incorrect time problem?
@carolynvs Ah, now I got it.
This explains why prltoolsd is attempting to manage my docker host clock. It must be picking up on being nested inside Parallels.
Yeah, that's the root of the issue. prltoolsd runs in a VirtualBox VM nested inside a Parallels VM. I've reproduced this and reported it to the responsible people at Parallels. I'll let you know as soon as it's fixed.
Does that also explain why the system clock is set to local time but thinks it is UTC?
Well, it's hard to say for sure, but it is a known issue with Parallels Desktop (and its guest tools). It was originally reported here: https://github.com/Parallels/vagrant-parallels/issues/186.
It was worked around in PD 11 by an additional option for the prlctl utility, but that doesn't help in your rare case, because you are actually running a VirtualBox VM on Windows.
I'm sorry, but the only solution I can suggest at the moment is to prevent prltoolsd from running in your VM on boot. If you use a custom Boot2Docker ISO build, you can remove the Parallels-related lines from the Dockerfile and rebuild the ISO. Or comment out this line: https://github.com/boot2docker/boot2docker/blob/master/rootfs/rootfs/bootscript.sh#L101
Thanks for the extra info about how prltoolsd works! I'll do as you suggest and make a custom iso for my setup. :beer:
I would close this issue, as this fixes my problem, but I'll leave that up to you since other people seem to be hitting it (though probably for different reasons!).
I think we can treat it as effectively resolved; we can re-open if any new issues are discovered.
Thanks everyone for your contributions in reporting and triaging this epically long issue!
I am using Docker Toolbox 1.10.3 on Windows. It was working fine until I restarted, and I am now having this same issue. I am also not that familiar with Docker, so can someone tell me what the fix is?
@mtrtm Does docker-machine regenerate-certs -f not work?
Yes, docker-machine regenerate-certs -f does work. The problem also seems to come back every time I start up Docker Quickstart Terminal.
+1
I'm using docker mainly on a Red Hat server and everything works just fine. I'm not an expert but I know what I'm doing. On Windows with VirtualBox, however, every time the docker VM restarts I need to run regenerate-certs. I'm on Toolbox 1.11.1.
+1
Macbook late 2009
2,26 GHz Intel Core 2 Duo
Mac OS Sierra 10.12
Docker Toolbox 1.2.1
VirtualBox 5.0.26
$ docker-machine ls
NAME        ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER    ERRORS
vbox-test   -        virtualbox   Running   tcp://192.168.99.100:2376           Unknown   Unable to query docker version: Get https://192.168.99.100:2376/v1.15/version: x509: certificate has expired or is not yet valid
$ docker-machine env vbox-test
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": x509: certificate has expired or is not yet valid
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.
$ docker-machine regenerate-certs vbox-test
Regenerate TLS machine certs? Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Waiting for SSH to be available...
Detecting the provisioner...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
$ docker-machine env vbox-test
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": x509: certificate has expired or is not yet valid
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.
I had this on the default install of Docker Toolbox (installed on Windows 10 Home), downloaded 2016-10-30. The error went away after running:
docker-machine regenerate-certs
Having this issue on macOS. docker-machine env complains:
$ docker-machine env docker1
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": x509: certificate has expired or is not yet valid
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.
Regenerating the certificates (even with -f) does not help. docker-machine ssh docker1 date shows the correct date and time.
Any ideas?
@paddor Regenerating the certificates, including the client certificates (docker-machine regenerate-certs -f --client-certs), fixed it for me.
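As a small sketch of that fix, assuming a POSIX shell and a docker-machine release that supports the --client-certs flag: regen_and_check and the DM variable are hypothetical names used only for illustration, and docker1 is the machine name from the report above.

```shell
#!/bin/sh
# DM lets a wrapper or a test substitute another binary (hypothetical knob).
DM="${DM:-docker-machine}"

# regen_and_check NAME
# Regenerates server and client certificates without prompting, then
# confirms that "env" no longer fails for that machine.
regen_and_check() {
    "$DM" regenerate-certs -f --client-certs "$1" || return 1
    "$DM" env "$1" >/dev/null
}

# Example (not run here; this restarts the Docker daemon on the machine):
#   regen_and_check docker1 && eval "$(docker-machine env docker1)"
```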