Minikube: vmwarefusion: failed to start after stop: Error configuring auth on host: Too many retries waiting for SSH to be available

Created on 20 Apr 2017 · 18 Comments · Source: kubernetes/minikube

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Minikube version (use minikube version): 0.18.0

Environment:

  • OS (e.g. from /etc/os-release): MacOS 10.12.4
  • VM Driver (e.g. cat ~/.minikube/machines/minikube/config.json | grep DriverName): vmwarefusion
  • ISO version (e.g. cat ~/.minikube/machines/minikube/config.json | grep -i ISO or minikube ssh cat /etc/VERSION): boot2docker.iso
  • Install tools:
  • Others:

What happened:
Using VMware Fusion on macOS, the first time minikube is started it works flawlessly. However, after minikube stop, running minikube start --vm-driver=vmwarefusion again fails and minikube never comes up.

Starting local Kubernetes cluster...
Starting VM...
Waiting for SSH to be available...
E0419 23:27:50.099029    1781 start.go:116] Error starting host: Temporary Error: Error configuring auth on host: Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded.

What you expected to happen:
Be able to start the cluster after stopping it.

How to reproduce it (as minimally and precisely as possible):

minikube start --vm-driver=vmwarefusion
minikube stop
minikube start --vm-driver=vmwarefusion

Anything else we need to know:
The only solution I've found so far is to minikube delete and start over.

kind/bug

Most helpful comment

using latest v0.23.0 and still getting the same issue, is the fix included in that version?

is there any nightly build to test it?

The easiest way of fixing it is to run ssh-copy-id -i ~/.minikube/machines/minikube/id_rsa.pub docker@$(minikube ip) while minikube is starting; the password can be found with cat ~/.minikube/machines/minikube/config.json | grep -i pass.

All 18 comments

+1. Also seeing this behavior on three machines. Exact same environment.

Same for me too. Does not start once stopped.

Looks similar to https://github.com/kubernetes/minikube/issues/1107

Getting to WaitForSSH function...
(minikube) Calling .GetSSHHostname
(minikube) DBG | executing: /Applications/VMware Fusion.app/Contents/Library/vmrun list
(minikube) DBG | MAC address in VMX: 00:0c:29:e0:b3:62
(minikube) DBG | Trying to find IP address in configuration file: /Library/Preferences/VMware Fusion/vmnet1/dhcpd.conf
(minikube) DBG | Following IPs found map[00:50:56:c0:00:01:172.16.86.1]
(minikube) DBG | Trying to find IP address in configuration file: /Library/Preferences/VMware Fusion/vmnet8/dhcpd.conf
(minikube) DBG | Following IPs found map[00:50:56:c0:00:08:172.16.0.1 00:0c:29:59:7d:eb:172.16.0.106]
(minikube) DBG | Trying to find IP address in leases file: /var/db/vmware/vmnet-dhcpd-vmnet1.leases
(minikube) DBG | Trying to find IP address in leases file: /var/db/vmware/vmnet-dhcpd-vmnet8.leases
(minikube) DBG | IP found in DHCP lease table: 172.16.0.184
(minikube) Calling .GetSSHPort
(minikube) Calling .GetSSHKeyPath
(minikube) Calling .GetSSHKeyPath
(minikube) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /Users/arvtiwar/.minikube/machines/minikube/id_rsa (-rw-------)
&{[-F /dev/null -o PasswordAuthentication=no -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none docker@172.16.0.184 -o IdentitiesOnly=yes -i /Users/arvtiwar/.minikube/machines/minikube/id_rsa -p 22] /usr/bin/ssh }
About to run SSH command:
exit 0
SSH cmd err, output: exit status 255:
Error getting ssh command 'exit 0' : Something went wrong running an SSH command!
command : exit 0
err : exit status 255
output :
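
For reference, the failing check can be reproduced by hand outside of minikube. This is a minimal sketch using the options from the log above (the target address comes from minikube ip and will differ per machine):

ssh -F /dev/null \
    -o PasswordAuthentication=no \
    -o StrictHostKeyChecking=no \
    -o UserKnownHostsFile=/dev/null \
    -o IdentitiesOnly=yes \
    -i ~/.minikube/machines/minikube/id_rsa \
    docker@$(minikube ip) 'exit 0'
echo "exit status: $?"    # 255 reproduces the failure above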

Same issue here. I am seeing the errors below, and minikube keeps retrying.

Starting VM...
E0512 14:29:35.839657 62651 start.go:119] Error starting host: Temporary Error: Error configuring auth on host: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded.

Experiencing same issue here.

Did some digging with vmrun and found that the guest's /home/docker/.ssh dir is missing.
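
A quick way to confirm that from the host, assuming the default boot2docker guest credentials (docker/tcuser) used elsewhere in this thread:

"/Applications/VMware Fusion.app/Contents/Library/vmrun" -gu docker -gp tcuser \
    runScriptInGuest ~/.minikube/machines/minikube/minikube.vmx \
    /bin/bash "ls -la /home/docker/.ssh"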

As a workaround I found I could get the cluster running again by:

minikube start -v 10 (get it to start the VM for you, then [ctrl]+[c] once you start to see the 255 errors)

Then running this script on the host to restore the missing ssh keys in the guest:

#!/bin/bash
# Restore the missing SSH key in the guest via vmrun guest operations.

MINIKUBE="${HOME}/.minikube/machines/minikube"
VMX="$MINIKUBE/minikube.vmx"
DOCKER_PUB_KEY="$MINIKUBE/id_rsa.pub"

# Run a vmrun guest command against the minikube VM as the default
# boot2docker user (docker/tcuser).
function vmrun {
    GUESTCMD=$1; shift
    "/Applications/VMware Fusion.app/Contents/Library/vmrun" -gu docker -gp tcuser $GUESTCMD "$VMX" "$@"
}

vmrun runScriptInGuest /bin/bash "mkdir -p /home/docker/.ssh"
vmrun CopyFileFromHostToGuest "$DOCKER_PUB_KEY" /home/docker/.ssh/authorized_keys
vmrun runScriptInGuest /bin/bash "chown -R docker /home/docker/.ssh"
vmrun runScriptInGuest /bin/bash "chmod -R 700 /home/docker/.ssh"

Then run start again, now that SSH access is restored, to bring it up: minikube start -v 10

Did some quick digging for a cause and found this in the minikube-automount logs: minikube-automount extracts userdata.tar to populate the /home/docker/.ssh dir, so without it we get the 255 error from the ssh client.

May 14 11:50:05 minikube minikube-automount[4977]: + tar xf /var/lib/boot2docker/userdata.tar -C /home/docker/
May 14 11:50:05 minikube minikube-automount[4977]: tar: can't open '/var/lib/boot2docker/userdata.tar': No such file or directory
May 14 11:50:05 minikube minikube-automount[4977]: + chown -R docker:docker /home/docker/.ssh
May 14 11:50:05 minikube minikube-automount[4977]: chown: /home/docker/.ssh: No such file or directory

/var/lib/boot2docker points to persistent storage, so that is good:

$ ls -la /var/lib                             
total 0
drwxr-xr-x    7 root     root             0 May 14 11:50 .
drwxr-xr-x    4 root     root             0 May 14 11:50 ..
drwxr-xr-x    2 root     root             0 Feb  8 19:46 arpd
lrwxrwxrwx    1 root     root            29 May 14 11:50 boot2docker -> /mnt/sda1/var/lib/boot2docker
lrwxrwxrwx    1 root     root            21 May 14 11:50 cni -> /mnt/sda1/var/lib/cni
drwxr-xr-x    2 root     root             0 Feb  8 19:43 dbus
lrwxrwxrwx    1 root     root            24 May 14 11:50 docker -> /mnt/sda1/var/lib/docker
lrwxrwxrwx    1 root     root            25 May 14 11:50 kubelet -> /mnt/sda1/var/lib/kubelet
lrwxrwxrwx    1 root     root            27 May 14 11:50 localkube -> /mnt/sda1/var/lib/localkube
drwx------    2 root     root             0 May 14 11:50 machines
lrwxrwxrwx    1 root     root             9 Feb  8 19:23 misc -> ../../tmp
lrwxrwxrwx    1 root     root            21 May 14 11:50 rkt -> /mnt/sda1/var/lib/rkt
drwx--x--x    3 root     root             0 Feb  8 19:52 sudo
drwxr-xr-x    4 root     root             0 May 14 11:50 systemd

But there is no userdata.tar contained within.

$ find /mnt/sda1/var/lib/boot2docker -ls
  1835011      4 drwxr-xr-x   3  root     root         4096 May 12 21:46 /mnt/sda1/var/lib/boot2docker
  1835040      4 drwxr-xr-x   2  root     root         4096 May 12 21:46 /mnt/sda1/var/lib/boot2docker/etc

Yet to find out why userdata.tar is missing... but it looks to be handled here: https://github.com/kubernetes/minikube/blob/k8s-v1.7/deploy/iso/minikube-iso/package/automount/minikube-automount

So I'm thinking the logs from the guest on first boot (journalctl -t minikube-automount) might show us the problem... will try to grab them when I can.

Created a cluster from scratch:

The userdata.tar gets uploaded to the guest early in minikube create via vmrun:

(minikube) DBG | executing: /Applications/VMware Fusion.app/Contents/Library/vmrun -gu docker -gp tcuser CopyFileFromHostToGuest /Users/b/.minikube/machines/minikube/minikube.vmx /Users/b/.minikube/machines/minikube/userdata.tar /home/docker/userdata.tar
(minikube) DBG | executing: /Applications/VMware Fusion.app/Contents/Library/vmrun -gu docker -gp tcuser runScriptInGuest /Users/b/.minikube/machines/minikube/minikube.vmx /bin/sh sudo /bin/mv /home/docker/userdata.tar /var/lib/boot2docker/userdata.tar && sudo tar xf /var/lib/boot2docker/userdata.tar -C /home/docker/ > /var/log/userdata.log 2>&1 && sudo chown -R docker:staff /home/docker

So now it is here on the guest: /var/lib/boot2docker/userdata.tar

Later on, when minikube-automount is enabled and started, the tar gets wiped by rm -rf /var/lib/docker /var/lib/boot2docker before the data partition is symlinked in:

May 15 15:11:31 minikube minikube-automount[4936]: + '[' -n /dev/sda1 ']'
May 15 15:11:31 minikube minikube-automount[4936]: ++ echo /dev/sda1
May 15 15:11:31 minikube minikube-automount[4936]: ++ sed 's/.*\///'
May 15 15:11:31 minikube minikube-automount[4936]: + PARTNAME=sda1
May 15 15:11:31 minikube minikube-automount[4936]: + echo 'mount p:sda1 ...'
May 15 15:11:31 minikube minikube-automount[4936]: mount p:sda1 ...
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1
May 15 15:11:31 minikube minikube-automount[4936]: + mount /dev/sda1 /mnt/sda1
May 15 15:11:31 minikube minikube-automount[4936]: + umount -f /var/lib/docker
May 15 15:11:31 minikube minikube-automount[4936]: umount: /var/lib/docker: mountpoint not found
May 15 15:11:31 minikube minikube-automount[4936]: + true
May 15 15:11:31 minikube minikube-automount[4936]: + rm -rf /var/lib/docker /var/lib/boot2docker
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /var/lib
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/lib/boot2docker
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/lib/boot2docker /var/lib/boot2docker
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/lib/docker
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/lib/docker /var/lib/docker
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/log
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/log /var/log
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/lib/kubelet
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/lib/kubelet /var/lib/kubelet
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/lib/cni
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/lib/cni /var/lib/cni
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/data
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/data /data
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/hostpath_pv
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/hostpath_pv /tmp/hostpath_pv
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/hostpath-provisioner
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/hostpath-provisioner /tmp/hostpath-provisioner
May 15 15:11:31 minikube minikube-automount[4936]: + rm -rf /var/lib/rkt

Without knowledge of the other drivers, a possible fix might be to change minikube-automount to cp /var/lib/boot2docker/userdata.tar /mnt/sda1/var/lib/boot2docker/userdata.tar before the rm wipes it? A rough sketch of that idea is below.
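
This is untested and only a sketch against the sequence in the log above (sda1 here is the partition name this boot happened to get; the real script derives PARTNAME with sed):

# Sketch only: preserve userdata.tar across the wipe in minikube-automount.
mkdir -p /mnt/sda1/var/lib/boot2docker
# Copy the tar onto the persistent partition before the wipe removes it.
if [ -f /var/lib/boot2docker/userdata.tar ]; then
    cp /var/lib/boot2docker/userdata.tar /mnt/sda1/var/lib/boot2docker/userdata.tar
fi
rm -rf /var/lib/docker /var/lib/boot2docker
ln -s /mnt/sda1/var/lib/boot2docker /var/lib/boot2docker

With that in place, the tar xf /var/lib/boot2docker/userdata.tar step quoted earlier would find the file again on later boots.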

Thanks. After making a fresh cluster I put the tar file in by hand:
1) minikube ssh
2) sudo cp /Users/[mylogin]/.minikube/machines/minikube/userdata.tar /var/lib/boot2docker/
and it now starts after a stop.

I used the last piece of advice from @b333z and ran

  1. minikube ssh
  2. sudo cp /Users/[mylogin]/.minikube/machines/minikube/userdata.tar /mnt/sda1/var/lib/boot2docker/

as I wasn't able to get the /var/lib/boot2docker copy to work. I'm using 0.21. But now it works, so thanks ever so much for that investigation!

This commit seems to be a fix for the issue (minikube itself has no code dictating when userdata is copied). Can we pull it into minikube?

ping? I can try just blindly replacing the commit SHA1 in Godeps.json and seeing if tests pass...

@joshk0 should be fixed by #2060

using latest v0.23.0 and still getting the same issue, is the fix included in that version?

is there any nightly build to test it?

The easiest way of fixing it is to run ssh-copy-id -i ~/.minikube/machines/minikube/id_rsa.pub docker@$(minikube ip) while minikube is starting; the password can be found with cat ~/.minikube/machines/minikube/config.json | grep -i pass.
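
For anyone who has to repeat that on every start, a hypothetical wrapper (untested sketch, not part of minikube; it assumes minikube ip starts resolving once the VM has a DHCP lease, and ssh-copy-id will prompt for the password from config.json):

#!/bin/bash
# Hypothetical helper around the ssh-copy-id workaround above.
minikube start --vm-driver=vmwarefusion &
# Wait until the VM has an address we can reach.
until IP=$(minikube ip 2>/dev/null) && [ -n "$IP" ]; do sleep 2; done
# Re-install the minikube public key; enter the password from config.json when prompted.
ssh-copy-id -o StrictHostKeyChecking=no -i ~/.minikube/machines/minikube/id_rsa.pub "docker@$IP"
wait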

I couldn't get 0.23.0 to work on macOS at all, so thanks for the fix @urbaniak!

Thanks @urbaniak. I used it to fix #2126.

I used the script from @urbaniak to get minikube to come up in VMware Fusion 10.0.1 as well. I had the same error as #2126.

Oddly, I have to use it every single time I start minikube.

This issue seems resolved with minikube 0.25.0

I no longer have vmware running on my MBP, so I cannot verify it. If more people confirm it's working, I'll close it.

It is fixed for me on v0.25.0.
