I'm running into issues trying to run the Chef provisioner after deploying a VM in vSphere. Essentially, it seems that vsphere_virtual_machine decides the VM has finished being (re)configured and starts Chef before the VM has actually finished (re)configuration.
In this case, I have a Windows 2012 R2 template in vSphere that was set up on a domain with a static IP of 10.208.140.29. I am trying to deploy a VM from the template that resides on the same domain, but is given a static IP address of 10.208.140.20.
When I watch the deployed VM get booted by Terraform in vSphere Client, I see the IP address of the VM is first 10.208.140.29 (the template's IP), which gets passed to Chef, and Chef tries to connect. Less than a minute later, the VM IP address becomes blank for a short time, before changing over to something like 10.208.153.123 (an IP address in our DHCP bank). Again, about a minute later, the VM IP address becomes blank for a short time, and then comes back as 10.208.140.20, the IP I configured in my Terraform file. So the VM eventually gets to the desired state, but Chef has already started and is trying to connect to the wrong IP address.
Sometimes Chef is quick enough to connect to the VM before it starts being reloaded due to configuration changes, but then seems to hang as its connection is lost when the VM reloads/reconfigures. Other times, Chef fails to ever connect and eventually times out.
Looking in vSphere at the deployed VM, I can see two reconfiguration tasks that complete quickly, but in the events, the reconfiguration events seem to take place much later. I've attached the reconfiguration event vSphere log file.
Terraform 0.7.0 dev 52fb286766df7d25cfe8c18bf4b82e030b7faf06+CHANGES
# Configure the VMWare vSphere provider
provider "vsphere" {
  user = "${var.vsphere_user}"
  password = "${var.vsphere_password}"
  vsphere_server = "${var.vsphere_server}"
}
# Create a folder
resource "vsphere_folder" "web" {
  path = "SCP Testing/terraform"
  datacenter = "DEVESX"
}
# Create a virtual machine within the folder
resource "vsphere_virtual_machine" "web" {
  name = "terraform-web"
  folder = "${vsphere_folder.web.path}"
  datacenter = "DEVESX"
  vcpu = 4
  memory = 8192
  cluster = "General"
  dns_servers = ["10.208.140.30", "10.208.130.20"]
  network_interface {
    label = "dvPortGroup_PrivateVlan"
    ipv4_gateway = "10.208.140.1"
    ipv4_address = "10.208.140.20"
    ipv4_prefix_length = "23"
  }
  windows_opt_config {
    product_key = ""
    domain = "perqalab.local"
    domain_user = "Administrator"
    domain_user_password = "${var.admin_password}"
  }
  disk {
    template = "SCP Testing/templates/CP - packer-virtualbox-iso-1462307446"
    datastore = "/DEVESX/datastore/VM Datastores/Nobackup"
  }
  provisioner "chef" {
    attributes_json = <<EOF
{
"tomcat-all": {
"version": "8.0.23"
}
}
EOF
    run_list = ["tomcat-all"]
    node_name = "terraform-vsphere"
    os_type = "windows"
    secret_key = "${file("H:/Chef/sprokopiak.pem")}"
    server_url = "https://chef01.com/"
    validation_client_name = "chef-validator"
    validation_key = "${file("H:/Chef/chef-validator.pem")}"
    version = "11.18.12"
    connection {
      type = "winrm"
      user = "Administrator"
      password = "${var.admin_password}"
    }
  }
}
N/A
The Chef provisioner should run after the VM has been fully reconfigured by vsphere_virtual_machine
The Chef provisioner is running the first time the VM comes online, but before the (re)configurations are able to take place and finish.
terraform apply
@thetuxkeeper, when you tried out deploying a Windows VM, did you happen to test provisioning with Chef as well?
@sprokopiak : I never deployed a Windows VM, but you could give 790115f (or later) a try. Perhaps that could fix your problem since it usually waits longer, perhaps it will wait until the customization is finished, but I'm not sure.
@stack72 @jen20 can you get us a vsphere label?
@thetuxkeeper, thanks for the heads up on that merged PR. I grabbed the latest from master and tried it out, but it doesn't seem to have changed any behavior. Chef still tries to run before the configurations are all complete.
The vSphere API seems to have a number of event objects. Maybe it's possible to figure out when all of the events have completed?
I will see if I can get a Linux VM on vSphere to try out.
Perhaps we need some kind of WaitForCustomization function/logic. I'm not sure how much you can ask the VMware Tools. I will be on vacation for 10 days starting tomorrow and I won't have time to look into it today.
In this case you need to wait until the VMware Tools executed the customization and the OS has applied those changes. I think because the template already has an IP the WaitForNetIP function returns before the customization was applied. It thinks all interfaces already have IPs, so it's done. But it's just a guess, I haven't tried to reproduce it yet.
@dougm any recommendations?
I was able to try this out on a Linux VM (RHEL 7.1 x64) in vSphere. It looks like the Linux VM is able to complete the customizations much more quickly (not necessarily surprising). For comparison, once the two reconfiguration tasks are complete, it takes the RHEL VM less than 1 minute for the two customization events to complete while it takes the Win 2012 R2 VM ~ 7 minutes for the two customization events to complete.
While I don't think the underlying problem is directly related to Windows, it looks like this problem may only show up for Windows machines, if they take so much longer to finish configuration and customization.
yah more Windows testing is needed ...
Now the question is do we want TF waiting that long ... Probably
Real question is can VMware API support it ;)
So we are waiting for the customization:
taskb, err := newVM.Customize(context.TODO(), customSpec)
if err != nil {
	return err
}
_, err = taskb.WaitForResult(context.TODO(), nil)
if err != nil {
	return err
}
And the log statement says that the task we are waiting for is done: "VM customization finished"
@dougm / @markpeek any recommendations?
I haven't used vm customizations myself, but looks like you could do a property.Wait on "config.tools.pendingCustomization"
It looks like the terraform code calls vm.Customize when the VM is powered off. So I assume config.tools.pendingCustomization would be set to something other than an empty string after the Customize task has finished.
You could then start the property.Wait() before powering on the VM and when config.tools.pendingCustomization changes to "", it should be done.
Thanks @dougm we need to try that!
I am having what seems like the same issue with Ubuntu Trusty hosts on vSphere. Downloading master now to see if that addresses it. Thanks @chrislovecnm for directing me to this issue.
@dweomer this is not fixed in master yet ;( Contributions are appreciated!
I am still a Go newb but perusing the code on master it looks like vm.WaitForIP() call in resourceVSphereVirtualMachineRead() is doing the trick as my modules no longer finish before the VMs have IPs. I am poking around with an eye towards assumptions that provisioners rely on as I think the first part of the problem has been solved and now it is up to provisioners to wait for a VM to be accessible and hence manipulated.
WaitForIP partially works. It does not work when Windows has to reboot three times. We need to test whether waiting on the pending customization flag works.
I believe I'm experiencing a similar issue:
Terraform 0.7.4
vSphere 6.0
VMWare Tools 10.0.0
The behavior I'm seeing after TF completes and I remote into the machine:
No customization has fired, even though I've specified it in my TF file. My next guess of what to do is to just try to set up the network and join the VM to the domain via a PowerShell (file) provisioner.
If there's any additional info I can provide to help resolve this (or testing I can do on a dev build), please let me know.
@sprokopiak Did you find that any of the OS customization was firing?
main.tf:
Turns out that my issue was due to us not having DHCP turned on and my ignorance of this fact, but it's worth noting that the feedback from Terraform/vSphere can be misleading if your network_interface is wrong. Not sure if it's possible or realistic to provide a better user experience for this issue.
It may be worth adding this little tidbit to a troubleshooting section if one is ever added to the docs:
Double-check your network config in Terraform vs. the actual network config on the VM if things are weird. OS customization and provisioners will not work as expected if, e.g. the IP address of the machine is not what you declared in Terraform.
Turns out only half of my issue was due to bad DHCP assumptions. My provisioners are still firing before OS customization has completed (and thus hitting the wrong IP address for the machine).
This issue is a blocker for me getting Terraform up and running. I'm probably out of my depth to do the actual PR, but I'm happy to test to see if it resolves my issue.
Can anyone make this change real quick?
@chrislovecnm @jen20 Would it be possible to try the fix suggested above? Even if you literally just make the edit in a branch and forward on to me, I'll be happy to test. I really, really want to bring Terraform to my organization and this is a blocker for me.
Did anyone ever figure this out? This is a huge blocker. I end up seeing the Chef provisioner trying to connect to a 169... address because it isn't waiting for the reboot/customization...
Note: setting "skip_customization = true" allows the right IP to be returned, but then the customizations don't run.
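As a small illustration of why a 169... address is a bad connection target, here is a Go sketch that skips link-local addresses when choosing a host. The function name `firstRoutableIP` is hypothetical and this is not how Terraform currently picks the address; it only uses the standard library's `net` package.

```go
package main

import (
	"fmt"
	"net"
)

// firstRoutableIP returns the first address that parses and is neither
// link-local (169.254.0.0/16, fe80::/10), loopback, nor unspecified.
func firstRoutableIP(addrs []string) (string, bool) {
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil || ip.IsLinkLocalUnicast() || ip.IsLoopback() || ip.IsUnspecified() {
			continue
		}
		return a, true
	}
	return "", false
}

func main() {
	// Example values: a link-local IPv4, a link-local IPv6, and a real address.
	addrs := []string{"169.254.12.7", "fe80::1", "10.208.140.20"}
	ip, ok := firstRoutableIP(addrs)
	fmt.Println(ip, ok)
}
```

A filter like this would avoid the 169... case, but it would not help when the guest still reports the template's old, routable address.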
The problem is that we are not actually waiting for the NIC and IP to come up. I have not looked into this yet.
Just want to add that customization also doesn't work for Windows images that were sysprepped with the /generalize option. A VM created from such an image reboots several times before it becomes ready, so the customization is not applied and even the network stays in a disconnected state.
For us this is a blocker, since we don't have DHCP and there is no other option to configure the IP address for a new VM.
So there definitely needs to be different logic for Windows machines and Terraform provisioners. However, I have a suitable workaround that I haven't seen anyone post here.
Assuming you know the hostname or IP address the machine will have once Guest Customization completes, you can specify it in the provisioner's connection block like so:
connection {
  type = "winrm"
  user = "Administrator"
  password = "${var.win_admin_pass}"
  host = "${var.ip_or_host}" # <-- this
  timeout = "10m"
  insecure = "true"
  https = "false"
  port = "5985"
}
When you don't specify this, it will go to the first IP address that vmware-tools reports to vsphere, which is often something that cannot be connected to (link-local) or some other unintended address.
You can also give it a longer timeout. In this case Terraform will continually try to connect until it reaches the timeout. My successful machines haven't exceeded this timeout.
Lastly, you can put the connection block inside or alongside the chef provisioner.
@chrislovecnm Waiting on the "Pending Customization" flag doesn't work. vSphere immediately clears the flag when the VM boots, not when the customizations have been applied.
God this is depressing. Is there anything else we can even try? Is it an issue with VMWare? We're a licensed customer, so maybe we can try to lean on them to fix the issue on their side.
Any suggestions would be appreciated.
I'm having this same issue, and I am a paying VMWare customer as well. Maybe if several of us reach out about the API not waiting for customizations as expected, they'll fix it?
Ran smack dab into this problem on my end, too (terraform + vsphere provider + chef provisioner for a Windows VM).
Attempted to inject a wait via local-exec but it looks like terraform passed along the 169 IP anyway, so that was a bust.
A little birdie told me at a HashiCorp event that HC is hiring someone to specifically handle the VMWare provider. So, @phinze, if you could have that new hire take a look at this bug once they start we'd be most grateful!
This works:
Wait.ps1
param(
    $myserver = $env:vsphere_vcenter,
    $vm = "victim1"
)
Import-Module VMware.VimAutomation.Cis.Core
Connect-VIServer -Server $myserver -User $env:TF_VAR_vsphere_username -Password $env:TF_VAR_vsphere_password
Write-Output "Waiting for customization to finish ..."
while ($true) {
    # Fetch the VM's events and look for a terminal customization event.
    $vmEvents = Get-VIEvent -Entity $vm
    $succeedEvent = $vmEvents | Where-Object { $_.GetType().Name -eq "CustomizationSucceeded" }
    $failEvent = $vmEvents | Where-Object { $_.GetType().Name -eq "CustomizationFailed" }
    if ($failEvent) {
        Write-Output "Customization Failed"
        # Exit non-zero so the local-exec provisioner reports the failure.
        exit 1
    }
    if ($succeedEvent) {
        Write-Host "Customization completed successfully!"
        break
    }
    Start-Sleep -Seconds 2
}
In my tf file:
provisioner "local-exec" {
  command = "powershell.exe -noprofile -ExecutionPolicy unrestricted -file scripts/Wait.ps1 -vm victim1"
}
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.