Vagrant: Script: get_network_config.ps1 error "Failed to determine IP address" occurs often at random points of provisioning (suspected regression)

Created on 10 Jan 2018 · 22 Comments · Source: hashicorp/vagrant

Please note that the Vagrant issue tracker is reserved for bug reports and
enhancements. For general usage questions, please use the Vagrant mailing list:
https://groups.google.com/forum/#!forum/vagrant-up. Thank you!

Vagrant version

Vagrant 2.0.1

Host operating system

Windows 10 v1709 and Windows 2016

Guest operating system

Windows 2016 on Hyper-V

Vagrantfile

https://gist.github.com/sandersaares/c2946dbd47732c7e09bf1ef4d117a30d

Debug output

vagrant_stdout.log
vagrant_stderr.log

Expected behavior

VM is provisioned without problems.

Actual behavior

At random steps in provisioning, I get

An error occurred while executing a PowerShell script. This error
is shown below. Please read the error message and see if this is
a configuration error with your system. If it is not, then please
report a bug.

Script: get_network_config.ps1
Error:

Failed to determine IP address

Sometimes this comes immediately after "Waiting for machine to boot", sometimes at any other point after that. It does not appear to come before that.

I have tried to also monitor the data returned by Get-VmNetworkAdapter in parallel and all my queries have properly returned an IP address (though perhaps it only disappears for an instant, breaking it for Vagrant but not for my separate queries?).

Back when I was on Vagrant 1.x.x I was using the same Vagrantfile and did not experience this issue, so it seems a regression.

Steps to reproduce

  1. vagrant up
  2. If no error, vagrant destroy -f and go to step 1

All 22 comments

I have updated the gist to reduce the Vagrantfile to a smaller scope that still reproduces the issue. The attached log is from the first revision that had a bit more steps included.

Same problem here.

I have been seeing this issue too. This is on the snag list of issues for my project so if anyone has any thoughts I'm all ears! I'm running on Windows 10 v1709 and have been trying to narrow down any potential environmental issues.

My corporate network has IPv6 enabled (but not used, go figure), which I believed was triggering some oddness between IPv4 and IPv6 address detection in get_network_config.ps1. Disabling IPv6 on my local network interface made this issue less prevalent, but it does still happen from time to time. For what it's worth, I have noticed it's more likely to happen when running a "vagrant up" with a multi-machine configuration, particularly when passing between the end of one up stage and the beginning of the next.

I also noticed that the Hyper-V management console, when open, will occasionally fail to show an IP address in the UI. I ran out of time to investigate this further, but I was beginning to suspect that the actual API used by the PowerShell cmdlets can fail to return an address under some circumstances. I believe IP addresses can be retrieved via WMI, so I'll have a look at whether the get_network_config.ps1 script could talk directly to that.

What I see is that in the "boot" phase, this script is run in a loop until it reports an IP address. Before the address is available, it results in the same error but this error is ignored and the operation is retried.

I suspect some other part of the process also queries for the IP address but lacks this "retry when IP address not available" logic. Unfortunately, I was unable to locate the relevant logic during some quick code reading.

Considering that virtual machines are complex entities that can do all sorts of things at any given moment, any such logic that lacks robust retry mechanisms is very suspect in my view.
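To make the "retry when IP address not available" idea concrete, here is a minimal Ruby sketch of a bounded retry around an address lookup. It is not Vagrant's code: the `IPAddressUnavailable` class, the `with_ip_retry` helper, and the `attempts`/`delay` values are all hypothetical names chosen for illustration, and the lookup is simulated rather than calling get_network_config.ps1.

```ruby
# Hypothetical error standing in for the "Failed to determine IP address" failure.
class IPAddressUnavailable < StandardError; end

# Retry the given block until it returns an address or the attempts run out.
# `attempts` and `delay` are illustrative values, not Vagrant settings.
# `sleeper` is injectable so the sketch can run without real waiting.
def with_ip_retry(attempts: 5, delay: 1.0, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield
  rescue IPAddressUnavailable
    raise if tries >= attempts # give up and surface the original error
    sleeper.call(delay)
    retry
  end
end

# Simulated lookup: fails twice (as if the guest were rebooting), then succeeds.
responses = [nil, nil, "192.0.2.10"]
lookup = lambda do
  ip = responses.shift
  raise IPAddressUnavailable, "Failed to determine IP address" if ip.nil?
  ip
end

puts with_ip_retry(delay: 0, sleeper: ->(_s) {}) { lookup.call }
```

The attempt limit matters: an unbounded loop would hang forever if the guest never reports an address, so the sketch re-raises the original error once the budget is spent.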

I have also run into this issue. I have done some digging and I can see why the issue occurs; however, I am not deep enough into the Vagrant code to understand what the fix should be, hence just the explanation here and not a patch.

I have Vagrant 1.9.8 installed on a Windows 10 host using Hyper-V. The guest is running Windows 2012 R2, which is contained within a Vagrant box file created using Packer, whose final provisioning step is to sysprep the machine and shut it down (i.e. not a reboot).

The sysprep part may be important, as the guest will perform a number of steps on initial boot during a vagrant up, including a reboot.

The error I experience here is exactly the same. Vagrant seems to correctly determine the guest's IP address and then goes on to wait for the machine to boot; at this point Vagrant fails with exactly the same error (note that I am using kitchen test here, but vagrant up results in exactly the same message):

       ...Previous messages suppressed...
       ==> default: Waiting for the machine to report its IP address...
           default: Timeout: 120 seconds
           default: IP: x.x.x.x
       ==> default: Waiting for machine to boot. This may take a few minutes...
           default: WinRM address: x.x.x.x:xxxx
           default: WinRM username: vagrant
           default: WinRM execution_time_limit: PT2H
           default: WinRM transport: negotiate
       An error occurred while executing a PowerShell script. This error
       is shown below. Please read the error message and see if this is
       a configuration error with your system. If it is not, then please
       report a bug.

       Script: get_network_config.ps1
       Error:

       Failed to determine IP address

Like others, I tracked the error being reported down to the get_network_config.ps1 file located under the c:\HashiCorp\Vagrant\embedded\gems\gems\vagrant-1.9.8\plugins\providers\hyperv\scripts directory. Note that this script makes a single attempt to obtain the guest's IP address and exits with the result.

Before bringing up the guest, I ran the following script in a separate window to monitor the availability of the guest's IP address (note that I have only 1 guest running and 1 network adapter attached to it):

while ($true) {
    $vm = (Get-VM)[0]

    if ($vm) {
        $adapter = (Get-VMNetworkAdapter -VM $vm)[0]

        Write-Host "Addresses:", $adapter.IPAddresses
    } else {
        Write-Host "VM not found"
    }

    sleep 1
}

Then when bringing up the guest the output is reported like so (I have labelled the lines so that they can be referred to below):

1:  Addresses:
2:  Addresses:
3:  Addresses:
4:  Addresses: x.x.x.x xxxx::xxxx:xxx:xx:xx:xxxx
5:  Addresses: x.x.x.x xxxx::xxxx:xxx:xx:xx:xxxx
6:  Addresses: x.x.x.x xxxx::xxxx:xxx:xx:xx:xxxx
7:  Addresses: x.x.x.x xxxx::xxxx:xxx:xx:xx:xxxx
8:  Addresses: x.x.x.x xxxx::xxxx:xxx:xx:xx:xxxx
9:  Addresses:
10: Addresses:
11: ...Approx. 40 duplicate messages suppressed...
12: Addresses:
13: Addresses:
14: Addresses: x.x.x.x xxxx::xxxx:xxx:xx:xx:xxxx
15: Addresses: x.x.x.x xxxx::xxxx:xxx:xx:xx:xxxx
16: Addresses: x.x.x.x xxxx::xxxx:xxx:xx:xx:xxxx

Vagrant waits for the guest's IP address to become available by repeatedly running the get_network_config.ps1 script. At line 4 the address becomes available and Vagrant continues. Vagrant then moves on to waiting for the machine to finish booting, and at some point during this wait, from line 9 onwards, it runs the get_network_config.ps1 script again.

From line 9 onwards the guest is rebooting after making changes to the system (probably because it booted out of a shutdown following a sysprep). At this point the address disappears, and a few seconds later Vagrant errors, indicating it can't find the guest's address.

If I modify the get_network_config.ps1 file at line 10 and add a while loop (note the extra if statement at the end of the while loop, which exits the loop once an address has been found):

while ($true) {
  $vm = Get-VM -Id $VmId -ErrorAction "Stop"
  $networks = Get-VMNetworkAdapter -VM $vm
  foreach ($network in $networks) {
    if ($network.IpAddresses.Length -gt 0) {
      foreach ($ip_address in $network.IpAddresses) {
        if ($ip_address.Contains(".")) {
          $ip4_address = $ip_address
        } elseif ($ip_address.Contains(":")) {
          $ip6_address = $ip_address
        }
        if (-Not ([string]::IsNullOrEmpty($ip4_address)) -Or -Not ([string]::IsNullOrEmpty($ip6_address))) {
          # We found our IP address!
          break
        }
      }
    }
  }
  if (-Not ([string]::IsNullOrEmpty($ip4_address)) -Or -Not ([string]::IsNullOrEmpty($ip6_address))) {
    break
  }
}

The guest can then successfully be brought up using both vagrant up and kitchen test.

As I said previously, I am not deep enough into the Vagrant code to understand the impact of this, i.e. to the other operations performed by the Vagrant HyperV driver, so I have not suggested a patch in this case.

I cannot see any workaround for this, so we have updated our copy of this script, and so far we have not seen any adverse effects elsewhere.

What's been interesting is that I have seen the behavior reported, yet it is not consistent. Sometimes I will get a failure, while other times it will work without issue. I'm unsure why the address goes missing at times. To mitigate this I created this PR: #9737

If the regular lookup fails, it will check to see if the host has seen the VM's MAC and attempt to extract the IP from there as a fallback. With this in place, I have not experienced any failures.
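The shape of that fallback can be sketched in a few lines of Ruby. This is only an illustration of the idea described above, not the code in the PR: `resolve_ip`, `normalize_mac`, and the `neighbors` list of `{mac:, ip:}` entries are hypothetical, and preferring an IPv4 address when several are reported is an assumption.

```ruby
# Normalise a MAC address so "00-15-5D-01-02-03" and "00:15:5d:01:02:03" compare equal.
def normalize_mac(mac)
  mac.to_s.downcase.delete("^0-9a-f")
end

# adapter_ips: addresses reported for the VM's network adapter (may be empty).
# vm_mac:      the MAC address of that adapter.
# neighbors:   hypothetical list of {mac:, ip:} entries the host has seen.
def resolve_ip(adapter_ips, vm_mac, neighbors)
  # Regular lookup: assume an IPv4 address is preferred when one is present.
  primary = adapter_ips.find { |ip| ip.include?(".") } || adapter_ips.first
  return primary if primary

  # Fallback: look for the VM's MAC among the addresses the host has seen.
  wanted = normalize_mac(vm_mac)
  match = neighbors.find { |n| normalize_mac(n[:mac]) == wanted }
  match && match[:ip]
end

seen = [{ mac: "00:15:5d:01:02:03", ip: "192.0.2.20" }]
puts resolve_ip([], "00-15-5D-01-02-03", seen)
```

Returning `nil` when both paths fail leaves it to the caller to decide whether that is an error or a sign that the guest is rebooting.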

Issue remains on Vagrant 2.1.1.

I believe this is a race condition in the rename host procedure.
The rename operation schedules a reboot with a 5 second delay.

[10:28:48.628] DEBUG winrmshell: [WinRM] Command created for $computer = Get-WmiObject -Class Win32_ComputerSystem
$retval = $computer.rename("VM-Kiosk1").returnvalue
if ($retval -eq 0) {
shutdown /r /t 5 /f /d p:4:1 /c "Vagrant Rename Computer"
}
exit $retval
....
....
.....
[10:28:56.094] DEBUG subprocess: stdout: ===Begin-Error===
{
"error": "Failed to determine IP address"
}

detail

In instances where the provisioning is successful, the next get-ipaddress occurs within those 5 seconds. In instances where the provisioning fails, that time is longer (or, I suspect, Windows is taking longer to shut down). The IP address check seems to be a part of asking Windows for a shutdown event, so it's just late.
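The two timestamps in the log excerpt above make the gap concrete. The small Ruby sketch below computes it; `log_time_ms` and `gap_seconds` are hypothetical helpers, and it assumes both entries fall on the same day and use a three-digit millisecond field. Note that the first timestamp is when the WinRM command was created, which is only an approximation of when the reboot was actually scheduled.

```ruby
# Convert an "HH:MM:SS.mmm" log timestamp to milliseconds since midnight.
def log_time_ms(stamp)
  h, m, s, ms = stamp.match(/\A(\d+):(\d+):(\d+)\.(\d{3})\z/).captures.map(&:to_i)
  ((h * 60 + m) * 60 + s) * 1000 + ms
end

# Seconds elapsed between two same-day log timestamps.
def gap_seconds(from, to)
  (log_time_ms(to) - log_time_ms(from)) / 1000.0
end

gap = gap_seconds("10:28:48.628", "10:28:56.094")
puts format("%.3f s between the rename command and the failed lookup", gap)
```

The result is about 7.5 seconds, which is longer than the 5 second `shutdown /r /t 5` delay and so is consistent with the lookup arriving after the reboot had begun.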

I've tried increasing the delay to 15 seconds (in Vagrant\embedded\gems\2.1.2\gems\vagrant-2.1.2\plugins\guests\windows\cap\change_host_name.rb:20, see below). But I think that a better solution here would be to assume that a failed IP address lookup means that the system has already started the shutdown, and continue normally.

          script = <<-EOH
            $computer = Get-WmiObject -Class Win32_ComputerSystem
            $retval = $computer.rename("#{name}").returnvalue
            if ($retval -eq 0) {
              shutdown /r /t 15 /f /d p:4:1 /c "Vagrant Rename Computer"
            }
            exit $retval
          EOH
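The alternative suggested above, treating a failed lookup after the rename as "the reboot has already started" instead of as a fatal error, could look roughly like this in Ruby. This is a sketch under assumptions, not a patch against Vagrant: `IPAddressUnavailable`, `rename_then_probe`, and the two callables are hypothetical stand-ins for the real rename and lookup steps.

```ruby
# Hypothetical error standing in for the "Failed to determine IP address" failure.
class IPAddressUnavailable < StandardError; end

# Runs the rename step, then probes the guest address once.
# A failed probe is read as "the scheduled reboot has already begun"
# rather than as a fatal provisioning error.
def rename_then_probe(rename:, lookup_ip:)
  rename.call
  begin
    { state: :still_up, ip: lookup_ip.call }
  rescue IPAddressUnavailable
    { state: :reboot_in_progress, ip: nil }
  end
end

# Usage with stand-in callables (no real guest involved).
result = rename_then_probe(
  rename: -> { :renamed },
  lookup_ip: -> { raise IPAddressUnavailable, "Failed to determine IP address" }
)
puts result[:state]
```

Only the specific lookup error is rescued, so any other failure in the rename step still surfaces. Whether it is safe to swallow this error everywhere the lookup is used is exactly the open question in this thread.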

I also made this observation (race condition). I temporarily fixed this with a static sleep and retry command in the get_network_config.ps1 file.

Wouldn't that make it wait for the machine to finish rebooting before asking for the reboot event? I don't think a retry is what you want here. I think that just ensures that the script loses the race.

I get this issue with Vagrant 2.1.2 also

I'm getting this issue on 2.1.5 and 2.1.2. My colleague was using 2.0.4 and it was working fine, so I downgraded to 2.0.4 and the issue went away.

Also getting this issue on 2.1.5.

I'm on Windows 10 with Hyper-V and had a similar issue, except that it fails every time. I started digging into the .ps1 file.

As in the issue above, Hyper-V wasn't returning the IPAddresses of the virtual network adapter.

Running this in an (Administrator) PowerShell session:
$vm = get-vm -Name 'my_vagrant_machine_name'
Hyper-V\Get-VMNetworkAdapter -VM $vm | % { $_.IPAddresses }

Results in null.

In my vagrant image, I needed to install some hyperv-friendly stuff per this article:

https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-ubuntu-virtual-machines-on-hyper-v

In my vagrant image's distribution, these apt-get packages immediately fixed the empty IPAddress issue:

apt-get install linux-tools-virtual-lts-xenial linux-cloud-tools-virtual-lts-xenial

And now this command...

Hyper-V\Get-VMNetworkAdapter -VM $vm | % { $_.IPAddresses }

Started returning this...

172.19.45.234
fe80::225:5bef:fe17:1c2b

Vagrant was then happy on reboots...

While this isn't the same issue seen here, it's the same error message, so I hope this helps someone else.

@sinfloodmusic Our issue seems to relate to the Windows 2012 R2 guest rebooting during a vagrant up. It does this because it comes out of a sysprep and wants to set up the system; see my earlier comment: https://github.com/hashicorp/vagrant/issues/9356#issuecomment-366216652

We use WinRM, and before that is available the guest reboots one or two times. The Vagrant code doesn't seem to cope with the guest rebooting, because while it reboots the IP address of the guest is lost (from a Hyper-V PowerShell perspective).

Ahh got it. Hopefully my comment lets someone fix a different issue presenting the same error message. Happy to remove it if you'd rather keep this targeted. (I've since updated to mention it's related but not the same)

Oh no, please don't remove it, I am not an authority here!

Just wanted to point that out, that's all :)

So I started getting the same thing as you guys after a reboot or two! I found a bug in the script's fallback "mac to ip" mechanism. (I'm on vagrant 2.2.0)

In the file get_network_config.ps1

Line: 40: if ($ip_address) {

should read:

Line: 40: if ($addr) {

Now the mac fallback (which is pretty clever) is working for me.

@sinfloodmusic Thank you for pointing out the typo! I'll get it included in the next release.

Cheers!

Awesome! Thanks @chrisroberts

I believe the underlying causes of this issue are solved by #10410 and #10347. Since this issue is somewhat a mix of two issues, I'm going to close this as being resolved by these two PRs. After the next release (2.2.1), if anyone is still experiencing this problem, please open a new issue (and fill out the template) and I'll get it triaged into the next milestone.

Cheers!

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
