We start with a base image for Windows 2016 and Windows 2016 Core built directly from ISO. This Packer template clones that VM and applies Windows patches, reboots, applies Windows patches again (because Windows tends to require this multiple times), then reboots. The reboot following each round of Windows 2016 patches causes the VM to go into the "Getting Windows ready. Don't turn off your computer" state. The windows-restart provisioner completes before the machine even shuts down. We have confirmed this behavior by adding -debug, stepping through slowly, and watching the VM.
Install-WindowsUpdate -MicrosoftUpdate -AcceptAll -IgnoreReboot -Verbose
In the following screenshot you can see the machine is still on even though the windows-restart provisioner claims it successfully restarted. If we wait there for five or so minutes, the VM starts a progress counter; once it reaches 100%, it reboots.
@nwesoccer does this happen also on the first reboot? You might want to try setting the WinRM service to delayed start.
@gusztavvargadr It happens after Windows patches have been applied, no matter which reboot it is. It's not about WinRM coming up too fast; the problem is that the server takes a while to go down because it's in the "Getting Windows ready. Don't turn off your computer" state. For some reason this state allows Packer to pass the TryCheckReboot check
at https://github.com/hashicorp/packer/blob/master/provisioner/windows-restart/provisioner.go#L126 even though the machine has not even gone down yet. So the subsequent provisioners start executing, thinking the reboot is done, and then the reboot actually happens in the middle of those provisioning steps. Best case, the build fails; worst case, it "succeeds" without ever truly rebooting, and the patching is never fully applied.
As soon as I think I've finally nailed this provisioner to the wall, someone walks in with another bug 😭
I'll see if I can add whatever error code is generated by "Getting Windows ready. Don't turn off your computer" and make sure it doesn't pass the check.
I'm seeing this issue too. As a temporary/dirty hack, I've put a shell-local provisioner immediately after my windows-restart that merely sleeps for a couple of minutes.
{
  "type": "shell-local",
  "command": "echo \"Sleeping for 2 minutes...\"; sleep 120"
},
edit: The built-in pause_before provisioner configuration is a much better implementation of this hack. Inserting a pause_before on the next provisioner following the windows-restart has worked around this bug (at least in my case).
{
  "type": "windows-restart"
},
{
  "type": "powershell",
  "pause_before": "120s",
  "inline": ["dir c:\\"]
}
Not sure if it helps, but this blog post documents the four places you can check to see if there is a pending reboot in progress: http://ilovepowershell.com/2015/09/10/how-to-check-if-a-server-needs-a-reboot/
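For reference, the four indicators that post checks are the Component Based Servicing and Windows Update registry keys, the PendingFileRenameOperations value, and a pending computer rename. The sketch below only models the decision logic in Go; it does not touch a real registry (the indicator names are taken from the blog post, and the input map is mocked), so treat it as an illustration rather than a working Windows check:

```go
package main

import "fmt"

// rebootIndicators lists the four pending-reboot signals described in the
// linked blog post. These are documentation strings here, not live lookups.
var rebootIndicators = []string{
	`HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending`,
	`HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired`,
	`HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager -> PendingFileRenameOperations`,
	`HKLM:\SYSTEM\CurrentControlSet\Control\ComputerName (ActiveComputerName != ComputerName)`,
}

// rebootPending reports whether any of the indicators is set. The `found`
// map stands in for actual registry reads on the guest.
func rebootPending(found map[string]bool) bool {
	for _, key := range rebootIndicators {
		if found[key] {
			return true
		}
	}
	return false
}

func main() {
	// e.g. Windows Update has flagged that it wants a reboot
	state := map[string]bool{rebootIndicators[1]: true}
	fmt.Println(rebootPending(state)) // true
}
```

A restart provisioner could consult checks like these before declaring a reboot complete, instead of relying solely on the shutdown command's exit code.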
@SwampDragons - I was able to pare down my template that hits this same issue to a fairly simple repro: https://gist.github.cerner.com/ks8252/b75f146ab5a62e0337b9c4418c6aa015
In that gist I've included both the Packer template and my Packer log (obtained via PACKER_LOG=1) which shows the issue.
The relevant bit in the log is:
...
2018/09/20 12:40:14 [INFO] (telemetry) Starting provisioner windows-restart
==> googlecompute: Restarting Machine
2018/09/20 12:40:14 packer: 2018/09/20 12:40:14 [INFO] starting remote command: shutdown /r /f /t 0 /c "packer restart"
2018/09/20 12:40:15 packer: 2018/09/20 12:40:15 [INFO] command 'shutdown /r /f /t 0 /c "packer restart"' exited with code: 0
2018/09/20 12:40:15 packer: 2018/09/20 12:40:15 [INFO] RPC endpoint: Communicator ended with: 0
2018/09/20 12:40:15 [INFO] 0 bytes written for 'stdout'
2018/09/20 12:40:15 [INFO] 0 bytes written for 'stderr'
2018/09/20 12:40:15 [INFO] RPC client: Communicator ended with: 0
2018/09/20 12:40:15 [INFO] RPC endpoint: Communicator ended with: 0
2018/09/20 12:40:15 packer: 2018/09/20 12:40:15 [INFO] 0 bytes written for 'stdout'
2018/09/20 12:40:15 packer: 2018/09/20 12:40:15 [INFO] 0 bytes written for 'stderr'
2018/09/20 12:40:15 packer: 2018/09/20 12:40:15 [INFO] RPC client: Communicator ended with: 0
==> googlecompute: Waiting for machine to restart...
2018/09/20 12:40:16 packer: 2018/09/20 12:40:16 Check if machine is rebooting...
2018/09/20 12:40:16 packer: 2018/09/20 12:40:16 [INFO] starting remote command: shutdown /r /f /t 60 /c "packer restart test"
2018/09/20 12:40:16 packer: 2018/09/20 12:40:16 [INFO] command 'shutdown /r /f /t 60 /c "packer restart test"' exited with code: 1115
2018/09/20 12:40:16 packer: 2018/09/20 12:40:16 [INFO] RPC endpoint: Communicator ended with: 1115
2018/09/20 12:40:16 [INFO] 0 bytes written for 'stdout'
2018/09/20 12:40:16 [INFO] 40 bytes written for 'stderr'
2018/09/20 12:40:16 [INFO] RPC client: Communicator ended with: 1115
2018/09/20 12:40:16 [INFO] RPC endpoint: Communicator ended with: 1115
googlecompute: A system shutdown is in progress.(1115)
2018/09/20 12:40:16 packer: 2018/09/20 12:40:16 [INFO] 0 bytes written for 'stdout'
2018/09/20 12:40:16 packer: 2018/09/20 12:40:16 [INFO] 40 bytes written for 'stderr'
2018/09/20 12:40:16 packer: 2018/09/20 12:40:16 [INFO] RPC client: Communicator ended with: 1115
2018/09/20 12:40:16 packer: 2018/09/20 12:40:16 Reboot already in progress, waiting...
2018/09/20 12:40:26 packer: 2018/09/20 12:40:26 Waiting for machine to reboot with timeout: 5m0s
2018/09/20 12:40:26 packer: 2018/09/20 12:40:26 Waiting for machine to become available...
2018/09/20 12:40:26 packer: 2018/09/20 12:40:26 Checking that communicator is connected with: 'powershell.exe -EncodedCommand ZQBjAGgAbwAgACIAJAB7AGUAbgB2ADoAQwBPAE0AUABVAFQARQBSAE4AQQBNAEUAfQAgAHIAZQBzAHQAYQByAHQAZQBkAC4AIgA='
2018/09/20 12:40:31 packer: 2018/09/20 12:40:31 [INFO] starting remote command: powershell.exe -EncodedCommand ZQBjAGgAbwAgACIAJAB7AGUAbgB2ADoAQwBPAE0AUABVAFQARQBSAE4AQQBNAEUAfQAgAHIAZQBzAHQAYQByAHQAZQBkAC4AIgA=
2018/09/20 12:40:32 packer: 2018/09/20 12:40:32 [INFO] command 'powershell.exe -EncodedCommand ZQBjAGgAbwAgACIAJAB7AGUAbgB2ADoAQwBPAE0AUABVAFQARQBSAE4AQQBNAEUAfQAgAHIAZQBzAHQAYQByAHQAZQBkAC4AIgA=' exited with code: 0
2018/09/20 12:40:32 packer: 2018/09/20 12:40:32 [INFO] RPC endpoint: Communicator ended with: 0
2018/09/20 12:40:32 [INFO] 28 bytes written for 'stdout'
2018/09/20 12:40:32 [INFO] 392 bytes written for 'stderr'
2018/09/20 12:40:32 [INFO] RPC client: Communicator ended with: 0
2018/09/20 12:40:32 [INFO] RPC endpoint: Communicator ended with: 0
googlecompute: PACKER-5BA3DA90 restarted.
2018/09/20 12:40:32 packer: 2018/09/20 12:40:32 [INFO] 28 bytes written for 'stdout'
2018/09/20 12:40:32 packer: 2018/09/20 12:40:32 [INFO] 392 bytes written for 'stderr'
2018/09/20 12:40:32 packer: 2018/09/20 12:40:32 [INFO] RPC client: Communicator ended with: 0
googlecompute: #< CLIXML
googlecompute: <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04"><Obj S="progress" RefId="0"><TN RefId="0"><T>System.Management.Automation.PSCustomObject</T><T>System.Object</T></TN><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj></Objs>
==> googlecompute: Machine successfully restarted, moving on
2018/09/20 12:40:32 [INFO] (telemetry) ending windows-restart
...<snip uploading PS1 on the next provisioner>...
</SD></PR></MS></Obj></Objs>2018/09/20 12:40:40 [INFO] starting remote command: powershell -executionpolicy bypass -file "C:/Windows/Temp/packer-elevated-shell-5ba3db92-32c8-a706-d39b-a1220bbaacf7.ps1"
2018/09/20 12:40:51 packer: 2018/09/20 12:40:51 [INFO] command 'powershell -executionpolicy bypass -file "C:/Windows/Temp/packer-elevated-shell-5ba3db92-32c8-a706-d39b-a1220bbaacf7.ps1"' exited with code: 2147943515
2018/09/20 12:40:51 packer: 2018/09/20 12:40:51 [INFO] RPC endpoint: Communicator ended with: 2147943515
2018/09/20 12:40:51 [INFO] 0 bytes written for 'stdout'
2018/09/20 12:40:51 [INFO] 0 bytes written for 'stderr'
2018/09/20 12:40:51 [INFO] RPC client: Communicator ended with: 2147943515
2018/09/20 12:40:51 [INFO] RPC endpoint: Communicator ended with: 2147943515
2018/09/20 12:40:51 packer: 2018/09/20 12:40:51 [INFO] 0 bytes written for 'stdout'
2018/09/20 12:40:51 packer: 2018/09/20 12:40:51 [INFO] 0 bytes written for 'stderr'
2018/09/20 12:40:51 packer: 2018/09/20 12:40:51 [INFO] RPC client: Communicator ended with: 2147943515
2018/09/20 12:40:51 [INFO] (telemetry) ending powershell
...
If you put a "pause_before": "2m" on the second powershell provisioner, the image builds fine.
Thanks, Kevin!
I would love to use pause_before, but sometimes it takes 5-10 minutes to finally shut down after the patches are applied, so it's a pretty big waste of time. It's also error-prone: if we pause for 10 minutes and the shutdown takes 11, we've found that the Packer run can be "successful" even though the full reboot to finish applying patches never occurred. For now, stepping through with -debug is time consuming but ensures we get everything applied.
I'm also experiencing this problem when trying to use PSWindowsUpdate 2.0.0.4 to perform Windows updates on Windows Server 2016 (or even just to show the available updates). This problem ONLY happens on Hyper-V; the exact same steps work just fine under VirtualBox. This occurs with both Packer 1.3.1 and 1.2.5.
The symptoms are that soon after my script issues an Install-WindowsUpdate or Show-WindowsUpdate, the next provisioner step executes (which, of course, is a windows-restart!).
Windows Updates in our environment take anywhere from 3 minutes to 30 minutes, so adding a sleep statement is terribly unreliable.
Whatever PSWindowsUpdate is doing under Hyper-V seems to trick Packer into thinking that the step has completed (successfully, too!), when in fact it still has a long way to go.
@nwesoccer @brettjacobson what is your detailed setup if I might ask? I'm building using a quite similar setup and it works fine for me.
I have Windows 10 Pro latest for the Hyper-V and VirtualBox host, with Packer 1.2.5. Updates and windows-restart work fine with Windows 10 Enterprise, Windows 2016 Standard and Standard Core guests. For the guests I'm using the publicly available evaluation ISOs.
The only Windows Update customization I make is I turn off automatic Windows Updates as the first thing after the first boot to make sure it kicks off only when I request it.
@gusztavvargadr For my case, I'm stuck using Windows 10 1703 as my host (thanks, Corp IT), so I fear this may be a situation where Packer no longer supports Hyper-V as it was in 1703. It seems Vagrant now requires Hyper-V from 1709, so that would not surprise me. Tomorrow I'm spinning up a Server 1709 box to try it out on (which will probably answer my questions).
Slightly off-topic: how are you running both Hyper-V and VirtualBox on your host? I thought they were mutually exclusive?
It works just fine on packer 1.2.4:
==> virtualbox-iso: Restarting Machine
==> virtualbox-iso: Waiting for machine to restart...
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: #< CLIXML
virtualbox-iso: <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04"><Obj S="progress" Re
fId="0"><TN RefId="0"><T>System.Management.Automation.PSCustomObject</T><T>System.Object</T></TN><MS><I64 N="SourceId">1
</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR
>-1</SR><SD> </SD></PR></MS></Obj></Objs>
virtualbox-iso: WIN-KJJFDKIOGEB restarted.
The same when I upgrade to packer 1.2.5.
But then after I upgraded to 1.3.0, it no longer detects the ongoing installation properly:
==> virtualbox-iso: Restarting Machine
==> virtualbox-iso: Waiting for machine to restart...
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: WIN-SVJIF4QN5H7 restarted.
virtualbox-iso: #< CLIXML
virtualbox-iso: <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04"><Obj S="progress" Re
fId="0"><TN RefId="0"><T>System.Management.Automation.PSCustomObject</T><T>System.Object</T></TN><MS><I64 N="SourceId">1
</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR
>-1</SR><SD> </SD></PR></MS></Obj></Objs>
==> virtualbox-iso: Machine successfully restarted, moving on
And the same on 1.3.1
@brettjacobson yes, you can't run both Hyper-V and VirtualBox at the same time, but you can turn off Hyper-V and then VirtualBox works after a reboot: https://stackoverflow.com/questions/30496116/how-to-disable-hyper-v-in-command-line.
@azr I'm not certain this is a problem with the windows-restart provisioner per se. It appears to me that it's a problem where the regular powershell provisioner thinks a step is completed prematurely (and thus moves on to a windows-restart step, which is always next with Windows updates). It's like whatever the Windows Update processes are doing during the step tricks Packer + Hyper-V + PowerShell into thinking the step is complete while it is still active.
I personally am only seeing this behavior with Hyper-V; with VirtualBox it behaves as expected (i.e., it waits for the long Windows Update process to finish before moving on to the next provisioner step, which in these scenarios is always windows-restart).
I think this issue is getting a little confusing. OP, @nwesoccer (and others), can you confirm whether you believe this to be true:
The problem is that the Windows Update step is being declared finished prematurely, NOT that the windows-restart step is failing to wait properly.
We also need clarification on whether this is a Hyper-V-only situation or affects other hypervisors as well (since it works fine for me on VirtualBox, but NOT Hyper-V).
The tests I shared above are from VirtualBox, on a Windows 10 host machine.
@SwampDragons
Hey there. We're also experiencing this issue after upgrading to 1.3.1.
I took the liberty of having a look at the code. It seems to me that an unnecessary break statement found its way into the branch that invokes sleeping after checking the reboot condition. Thus, it never loops again as intended: https://github.com/hashicorp/packer/blob/4e14710a66f0e04daf21cb648a99670bc972e89c/provisioner/windows-restart/provisioner.go#L137-L142
Also having this problem. Using "pause_before" seems to be the best workaround, but it is impossible to know the right value to use: sometimes you only need a few minutes, other times it could be 30, depending on system specs and which updates are installing.
Also having this problem. Using "pause_before" seems to be the best workaround but, as mentioned above, it is impractical, since the time needed depends on the tasks done prior to the restart being needed.
@Occams Great find. I compared the code to the commit prior to the latest one, and it makes sense why this is not functioning correctly: the break escapes the for loop, so it moves on to the next step after a hard 10s, no matter what state the reboot is in.
For what it's worth, I commented out the break mentioned above and recompiled the binary, then tested against my previously failing build. The windows-restart provisioner now properly loops when receiving the 1115 return code, until the VM has actually restarted.
azure-arm: Waiting for machine to restart...
azure-arm: A system shutdown is in progress.(1115)
azure-arm: A system shutdown is in progress.(1115)
azure-arm: A system shutdown is in progress.(1115)
azure-arm: A system shutdown is in progress.(1115)
azure-arm: A system shutdown is in progress.(1115)
azure-arm: restarted.
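The broken loop described above can be simulated with a few lines of Go. This is a simplified sketch, not Packer's actual source: `makeShutdownProbe` is an assumed stand-in for the provisioner's test command (`shutdown /r /f /t 60`), which exits with 1115 (ERROR_SHUTDOWN_IN_PROGRESS) while the guest is still going down:

```go
package main

import "fmt"

const errShutdownInProgress = 1115 // Windows ERROR_SHUTDOWN_IN_PROGRESS

// makeShutdownProbe simulates the provisioner's test command: it returns
// 1115 for the first inProgressPolls calls (shutdown still in progress),
// then 0 once the machine has come back. Purely illustrative.
func makeShutdownProbe(inProgressPolls int) func() int {
	calls := 0
	return func() int {
		calls++
		if calls <= inProgressPolls {
			return errShutdownInProgress
		}
		return 0
	}
}

// buggyWait mirrors the 1.3.0/1.3.1 behavior: a stray break exits the
// loop on the first 1115, so the provisioner "moves on" far too early.
func buggyWait(probe func() int, maxPolls int) (polls int) {
	for polls = 1; polls <= maxPolls; polls++ {
		if probe() == errShutdownInProgress {
			break // BUG: gives up waiting after a single interval
		}
	}
	return polls
}

// fixedWait keeps polling while the guest reports 1115, which is what
// the pre-1.3.0 versions (and the merged fix) effectively do.
func fixedWait(probe func() int, maxPolls int) (polls int) {
	for polls = 1; polls <= maxPolls; polls++ {
		if probe() != errShutdownInProgress {
			return polls // shutdown finished; safe to move on
		}
	}
	return polls
}

func main() {
	fmt.Println(buggyWait(makeShutdownProbe(5), 10)) // 1: bails on first 1115
	fmt.Println(fixedWait(makeShutdownProbe(5), 10)) // 6: waits through them
}
```

This is why 1.2.x logs show a string of "A system shutdown is in progress.(1115)" lines before "restarted", while 1.3.0/1.3.1 show only one.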
Tested and working in my environment also.
Hey @Occams, @Poddo, and everyone, thanks a lot for finding/bringing a fix. Can anyone who had the issue please confirm that it fixed it for them? I will merge it then.
Here are some binaries for you to try out:
Sorry for not responding promptly.
@brettjacobson I'm not exactly sure how to verify which it is.
@azr I'm not familiar with how to compile packer for testing the fix. I'm running on a mac with macOS High Sierra. I'd be happy to test it out.
I linked binaries in my comment up there so you don't have to do this.
But if you want to build Packer manually, follow these steps: brew install go to install Go, as it's quicker/simpler, then add this git alias:
[alias]
    pr = "!f() { git fetch -fu ${2:-origin} refs/pull/$1/head:pr/$1 && git checkout pr/$1; }; f"
to run git pr 6792
@azr Whoops, my brain didn't put two and two together to map darwin to mac. Will give it a try.
No worries, I should have been clearer 🙂 ( I edited it in case )
1.3.2-dev works; no more forced pauses.
I've tested with a W2k16 STD image; 1803 is also in progress.
Great news, thanks everyone; merge incoming. 🎉
I think the problems I am having were due to my Host Windows 10 being build 1703 (with an old Hyper-V). Currently everything is working on another machine with 1803, and I'll be trying 1709 next week. This is with packer 1.3.1. (the problem is IT will only let me have 1703 at the moment)
https://github.com/hashicorp/packer/files/2441352/packer_windows_amd64.zip worked for me. It failed before on my 1703 host, but with this version everything works just fine. I also had the same issues on another host (1803), but I did not try the fix there. @brettjacobson, could it be some timing difference between slower and faster hosts, maybe? How exactly is it going on your 1803? Is Packer really waiting for the updates to install, with output something like this?
==> virtualbox-iso: Restarting Machine
==> virtualbox-iso: Waiting for machine to restart...
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: A system shutdown is in progress.(1115)
virtualbox-iso: #< CLIXML
@shurick81 With the dev version of Packer linked above, I'm seeing the output exactly as you describe on both 1803 and 1709 hosts. But at least it detects that the restart has finished and goes on to the next step. This fix seems to work for my scenarios.
I could add some code to clean up the output, but that's a cosmetic fix and can probably wait; thanks everyone for all of your input on this issue.
@shurick81 I've tried your release but I get this error:
packer version
Packer v1.3.2-dev
packer validate C:\sre\packer\master.json
Template validation failed. Errors are shown below.
Errors validating build 'amazon-ebs'. 1 error(s) occurred:
@fdaforno Interesting; the master branch currently contains some changes to the code that validates aws regions: b0774d155a4d2103339e4ce6917722cdf9eced2a
If you let me know your host architecture I can revert that and see if that makes things work for you.
I think I've nailed down my issue to something in my WinRM config, but I also noticed the VM does not get an IP from DHCP after the automatic reboot performed by Windows Update. Still have no idea how to fix that, though.
@fdaforno I can't reproduce your validation error. :-/
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.