Packer: [Azure] WinRM timeout with Windows 2016-Datacenter Marketplace Image

Created on 28 Jan 2020  ·  115 Comments  ·  Source: hashicorp/packer

Please refer to the end of this thread to see other users reporting that this is not working.
https://github.com/MicrosoftDocs/azure-docs/issues/31188

Issue:

Started: December 2019.
Packer cannot connect via WinRM to machines provisioned from the Windows Server 2016 (2016-Datacenter) Marketplace image in Azure.

Further details:

Increasing the WinRM timeout does not help. The last working image seems to be version "14393.3326.1911120150" (released 12 Nov). It stopped working with "14393.3384.1912042333" (released 10 Dec).

This issue only impacts 2016-Datacenter; 2019 works properly.

To get image details for a region:

az vm image list --location northeurope --offer WindowsServer --publisher MicrosoftWindowsServer --sku 2016-Datacenter --all
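
An equivalent query with the Az PowerShell module (assuming Az.Compute is installed) would be:

# List all published versions of the 2016-Datacenter SKU in a region
Get-AzVMImage -Location 'northeurope' -PublisherName 'MicrosoftWindowsServer' `
    -Offer 'WindowsServer' -Skus '2016-Datacenter' |
    Select-Object Version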

URL to the Last Working Image:

https://support.microsoft.com/en-us/help/4525236/windows-10-update-kb4525236

URL to the Image where something went wrong:

https://support.microsoft.com/en-us/help/4530689/windows-10-update-kb4530689

Notes:

This currently applies to North Europe. I have not had time to investigate other regions, but I believe the same images are distributed to every region.

I am opening a Microsoft case and plan to update this thread with progress.

bug builder/azure upstream-bug

Most helpful comment

Thanks for the reply. My issue was not related to this thread. After further troubleshooting I found that the registry setting 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\LocalAccountTokenFilterPolicy', when set to 0, prevents WinRM from opening a connection.

I wanted to provide an update in case it helps anyone else working with Packer and CIS Windows images.
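
For anyone who hits the same thing, a minimal sketch of checking and relaxing that setting (CIS hardening sets it to 0 deliberately, so only change it if your baseline allows it):

# Inspect the current value (0 blocks remote connections that authenticate with a local admin account)
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System' `
    -Name 'LocalAccountTokenFilterPolicy' -ErrorAction SilentlyContinue

# Relax it so the local Packer admin account gets a full token over WinRM
Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System' `
    -Name 'LocalAccountTokenFilterPolicy' -Value 1 -Type DWord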

All 115 comments

Interesting. It was definitely not working for quite some time, but now I cannot reproduce this issue anymore. Even with the latest image and with the images between November and today, it is working properly.

I will reopen in case I start to see this issue again.

I can still reproduce the issue.

Image used:

   "image_publisher": "MicrosoftWindowsServer",
    "image_offer": "WindowsServer",
    "image_sku": "2016-Datacenter"

From initial troubleshooting it looks to me like a certificate issue. Running winrm quickconfig on the machine while Packer is at azure-arm: Waiting for WinRM to become available... results in:

WinRM service is already running on this machine.
WSManFault
    Message
        ProviderFault
            WSManFault
                Message = Cannot create a WinRM listener on HTTPS because this machine does not have an appropriate certificate. To be used for SSL, a certificate must have a CN matching the hostname, be appropriate for Server Authentication, and not be expired, revoked, or self-signed. 

Error number:  -2144108267 0x80338115
Cannot create a WinRM listener on HTTPS because this machine does not have an appropriate certificate. To be used for SSL, a certificate must have a CN matching the hostname, be appropriate for Server Authentication, and not be expired, revoked, or self-signed. 

And when trying to connect using openssl to retrieve the certificate, I'm getting errno=54:

openssl s_client -connect 13.95.122.54:5986 -showcerts
CONNECTED(00000003)
write:errno=54
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 307 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : 0000
    Session-ID:
    Session-ID-ctx:
    Master-Key:
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1580229460
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---

Re-generating the self-signed certificate and reconfiguring WinRM causes Packer to immediately respond to the connection:

$Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "$env:COMPUTERNAME"
Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
Stop-Service winrm
Start-Service winrm

And from openssl -showcerts I'm getting a correct answer:

 openssl s_client -connect 13.95.122.54:5986 -showcerts
CONNECTED(00000003)
depth=0 CN = pkrvm39jkvjspuk
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = pkrvm39jkvjspuk
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
 0 s:/CN=pkrvm39jkvjspuk
   i:/CN=pkrvm39jkvjspuk
-----BEGIN CERTIFICATE-----
MIIDKjCCAhKgAwIBAgIQbI6Ll/YdLKZFm3XIDuCVEzANBgkqhkiG9w0BAQsFADAa
MRgwFgYDVQQDDA9wa3J2bTM5amt2anNwdWswHhcNMjAwMTI4MTYzNDI4WhcNMjEw
MTI4MTY1NDI4WjAaMRgwFgYDVQQDDA9wa3J2bTM5amt2anNwdWswggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDTaBPCr8ImXt+wyDEcNVK3lW5HOme7X8h0
gl+ZTAmwhlzyZwWI1S5fW0Gfc+VQtwmscZs7in1/Rg0EBnhCHKiXYdJdWgiNQjp8
hxNHQlPzFMxBNHJCncs3cUjl8TBvWFVof+mNmv20IcoDfhkBXo8PBMC1M08krfGd
KXxvJ/Km3dfGvY3HKyMAdwJK/r4rENnTMIr5KgOv2cL4usTNS0o4nQSDVbL8rXdN
0Pfwui0ItGiZ7auul/tioQAmKpcxle7y16b/XnX1olQp59T7WklKcfS4Rt+XloAM
dyam22dhXaPQ9/03MBEqguO/SXDV2m+7RFLPRzHDPWwrQjE6eClDAgMBAAGjbDBq
MA4GA1UdDwEB/wQEAwIFoDAdBgNVHSUEFjAUBggrBgEFBQcDAgYIKwYBBQUHAwEw
GgYDVR0RBBMwEYIPcGtydm0zOWprdmpzcHVrMB0GA1UdDgQWBBQYK0o8mxc3uUyn
9WAvpOzINrvkyzANBgkqhkiG9w0BAQsFAAOCAQEALIRGvoQONxX0RzdyOEX15dJm
tMChjVgU9y176UK03NcuNqfQqJXhnibZQO/+ApXT4C1YKUzZcmqkJpPkt2ufYmC1
sFLp3tGZ35zfjtU8Mm6xEHdQv4LGQzpCycVqlvFGrdWCMCB4EWZb0z7oqp+nsz2P
14HFaiPsHnfpJEMUF+jrMQkGb9bzMHTT4Y0q5TStVdc9q1cu3pWLnzJ6gaBlz0Iz
DG03HtTmwppmDLSE1RZYJBQ6UsgD/L/jbR2c08ko4t1uSMwRcANv5sGZ6TukyK95
JVnYbFrZWzcqWfE1uynTEdeb+l/aospY9g/Fjt4WKI0U0xnGuczsbx1KoO0ELg==
-----END CERTIFICATE-----
---
Server certificate
subject=/CN=pkrvm39jkvjspuk
issuer=/CN=pkrvm39jkvjspuk
---
No client certificate CA names sent
Peer signing digest: SHA256
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 1298 bytes and written 433 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 5E200000884A7231C92707E15CD2222B4BE94DD50A3B61E7B8763B3BC0A2F615
    Session-ID-ctx:
    Master-Key: 6CF4DA86AEBEB597F72DB9DC9E8C8B59D8B240C7FE6F8491B14314E86529A338F07E1B2C5BEB300C48DE4D490978D5D5
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1580229891
    Timeout   : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)
---


I see that Packer uses the Azure osProfile.windowsConfiguration.winRM value in the template to configure WinRM on the VM.

So I would assume there is either an issue with the certificate Packer creates before uploading it to the Azure key vault, or an issue on the Azure side that prevents the VM from configuring WinRM correctly from the template values. This may need more troubleshooting.

"osProfile": {
  "computerName": "[parameters('virtualMachines_pkrvm2nb5asnu2s_name')]",
  "adminUsername": "packer",
  "windowsConfiguration": {
    "provisionVMAgent": true,
    "enableAutomaticUpdates": true,
    "winRM": {
      "listeners": [
        {
          "protocol": "https",
          "certificateUrl": "https://pkrkv2nb5asnu2s.vault.azure.net/secrets/packerKeyVaultSecret/05113faa18ee40a2b5465910b2f3dda1"
        }
      ]
    }
  },
  "secrets": [
    {
      "sourceVault": {
        "id": "[parameters('vaults_pkrkv2nb5asnu2s_externalid')]"
      },
      "vaultCertificates": [
        {
          "certificateUrl": "https://pkrkv2nb5asnu2s.vault.azure.net/secrets/packerKeyVaultSecret/05113faa18ee40a2b5465910b2f3dda1",
          "certificateStore": "My"
        }
      ]
    }
  ]
},

@AliAllomani Okay... Which region are you deploying to? A few weeks ago I thought this was an image-related issue, but I had no time to investigate further. Today I tried to use an older image and it started to work, so I opened this issue; but then I tried with the latest as well and it was also working. I don't know what is going on.

Can you try with older versions as well? Also in WestUS2? Let's try to rule these out...

Reopened this for now, but you are on your own, because it is now working for me....

@Dilergore I'm deploying to EU West. I also faced the timeout issue with the latest Windows 2019-Datacenter image, but I'm not sure if it's the same issue; I will do more tests on my side with different images.

@AliAllomani It was not happening for me with 2019. It usually takes some time to configure WinRM by default. Using a bigger machine, an SSD, and increasing the timeout usually works around this problem.

My setup is:
Timeout: 20 min
Premium SSD for OS Disk
D4s_v3

In my experience even with this sometimes it takes longer than 5-6 minutes to configure it and connect to it.

@Dilergore It seems intermittent.

The common things I found out:

  • for 2016 it usually takes up to 10 minutes to initially configure the VM and be able to run PowerShell commands (even locally); for 2019 it is sometimes available immediately

  • in all cases the certificate common name is not correct: Packer always creates the common name in the format {hostname}.cloudapp.azure.com, whereas for EU West, for example, it should be {hostname}.westeurope.cloudapp.azure.com. This is not an issue for us, though, as we define "winrm_insecure": true

https://github.com/hashicorp/packer/blob/af2c4346f8454edb80fefd2fb28bc8b6a632eaa6/builder/azure/arm/config.go#L452

  • when it's not responding within 10 minutes, it is always a certificate issue: Encountered an internal error in the SSL library
    This appears whenever you try to connect to the instance:
> Test-WSMan -ComputerName 52.142.198.26 -UseSSL
Test-WSMan : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="12175"
Machine="Bastion-UAT.wdprocessing.pvt"><f:Message>The server certificate on the destination computer
(52.142.198.26:5986) has the following errors:
Encountered an internal error in the SSL library.   </f:Message></f:WSManFault>
At line:1 char:1
+ Test-WSMan -ComputerName 52.142.198.26 -UseSSL
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (52.142.198.26:String) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand
  • removing the WinRM listener and re-creating it manually fixes the issue (Packer responds immediately, and Test-WSMan shows the correct answer); there is no need to re-generate or use a different certificate. Test results below.

  • it seems to happen more often with the latest image than with "14393.3326.1911120150"

  • in the Windows event log I can see the event below when the issue occurs:

A fatal error occurred while creating a TLS client credential. The internal error state is 10013.

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> 
- <System> 
<Provider Name="Schannel" Guid="{1F678132-5938-4686-9FDC-C8FF68F15C85}" /> 
<EventID>36871</EventID> 
<Version>0</Version> 
<Level>2</Level> 
<Task>0</Task> 
<Opcode>0</Opcode> 
<Keywords>0x8000000000000000</Keywords> 
<TimeCreated SystemTime="2020-01-29T12:25:18.377000300Z" /> 
<EventRecordID>767</EventRecordID> 
<Correlation ActivityID="{80B997BA-F1CA-0000-01F5-7D5E9AD6D501}" /> 
<Execution ProcessID="632" ThreadID="2352" /> 
<Channel>System</Channel> 
<Computer>pkrvmudjx20x9lp</Computer> 
<Security UserID="S-1-5-18" /> 
</System> 
- <EventData> 
<Data Name="Type">client</Data> 
<Data Name="ErrorState">10013</Data> 
</EventData> 
</Event>

Occurrence tests done so far (all in EU West):

Standard_F8s_v2 -  SSD - win2019 - image version : latest

15:50:16  ==> azure-arm: Getting the VM's IP address ...
15:50:16  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-ibnzhmks0m'
15:50:16  ==> azure-arm:  -> PublicIPAddressName : 'pkripibnzhmks0m'
15:50:16  ==> azure-arm:  -> NicName             : 'pkrniibnzhmks0m'
15:50:16  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
15:50:16  ==> azure-arm:  -> IP Address          : '40.68.191.187'
15:50:16  ==> azure-arm: Waiting for WinRM to become available...
15:50:16  ==> azure-arm: #< CLIXML
15:50:16      azure-arm: WinRM connected.

=======

Standard_F8s_v2 -  SSD - win2016 - image version : 14393.3326.1911120150
14:11:19  ==> azure-arm: Getting the VM's IP address ...
14:11:19  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-zhyjvoeajl'
14:11:19  ==> azure-arm:  -> PublicIPAddressName : 'pkripzhyjvoeajl'
14:11:19  ==> azure-arm:  -> NicName             : 'pkrnizhyjvoeajl'
14:11:19  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
14:11:19  ==> azure-arm:  -> IP Address          : '52.174.178.101'
14:11:19  ==> azure-arm: Waiting for WinRM to become available...
14:20:40  ==> azure-arm: #< CLIXML
14:20:40      azure-arm: WinRM connected.

================
Standard_B2ms - HDD - win2016 - image version : latest
12:13:08  ==> azure-arm: Getting the VM's IP address ...
12:13:08  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-wt2ndevwlv'
12:13:08  ==> azure-arm:  -> PublicIPAddressName : 'pkripwt2ndevwlv'
12:13:08  ==> azure-arm:  -> NicName             : 'pkrniwt2ndevwlv'
12:13:08  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
12:13:08  ==> azure-arm:  -> IP Address          : '52.148.254.62'
12:13:08  ==> azure-arm: Waiting for WinRM to become available...
12:43:00  ==> azure-arm: Timeout waiting for WinRM.
12:43:00  ==> azure-arm: 
12:43:00  ==> azure-arm: Cleanup requested, deleting resource group ...
12:49:52  ==> azure-arm: Resource group has been deleted.
12:49:52  Build 'azure-arm' errored: Timeout waiting for WinRM.
==============
Standard_D8s_v3 - HDD - win2016 - image version : latest

20:57:27  ==> azure-arm: Waiting for WinRM to become available...
21:06:19  ==> azure-arm: #< CLIXML
21:06:19      azure-arm: WinRM connected.
21:06:19  ==> azure-arm: <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04"><Obj S="progress" RefId="0"><TN RefId="0"><T>System.Management.Automation.PSCustomObject</T><T>System.Object</T></TN><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj><Obj S="progress" RefId="1"><TNRef RefId="0" /><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj></Objs>
21:06:19  ==> azure-arm: Connected to WinRM!
21:06:19  ==> azure-arm: Provisioning with Powershell...
===========
Standard_D8s_v3 - SSD - win2016 - image version : latest

21:17:12  ==> azure-arm: Getting the VM's IP address ...
21:17:12  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-vi2l6na2zy'
21:17:12  ==> azure-arm:  -> PublicIPAddressName : 'pkripvi2l6na2zy'
21:17:12  ==> azure-arm:  -> NicName             : 'pkrnivi2l6na2zy'
21:17:12  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
21:17:12  ==> azure-arm:  -> IP Address          : '168.63.109.42'
21:17:12  ==> azure-arm: Waiting for WinRM to become available...
21:47:20  ==> azure-arm: Timeout waiting for WinRM.
21:47:20  ==> azure-arm: 
21:47:20  ==> azure-arm: Cleanup requested, deleting resource group ...
==============================
Standard_D8s_v3 - SSD - win2016 - image version : latest

11:51:06  ==> azure-arm: Getting the VM's IP address ...
11:51:06  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-ksei5ia6c6'
11:51:06  ==> azure-arm:  -> PublicIPAddressName : 'pkripksei5ia6c6'
11:51:06  ==> azure-arm:  -> NicName             : 'pkrniksei5ia6c6'
11:51:06  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
11:51:06  ==> azure-arm:  -> IP Address          : '13.95.64.201'
11:51:06  ==> azure-arm: Waiting for WinRM to become available...
11:59:58      azure-arm: WinRM connected.
11:59:58  ==> azure-arm: #< CLIXML
==============================
Standard_D8s_v3 - SSD - win2016 - image version : 14393.3326.1911120150

21:56:07  ==> azure-arm: Getting the VM's IP address ...
21:56:07  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-6bz6fqr3js'
21:56:07  ==> azure-arm:  -> PublicIPAddressName : 'pkrip6bz6fqr3js'
21:56:07  ==> azure-arm:  -> NicName             : 'pkrni6bz6fqr3js'
21:56:07  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
21:56:07  ==> azure-arm:  -> IP Address          : '104.46.40.255'
21:56:07  ==> azure-arm: Waiting for WinRM to become available...
22:03:43  ==> azure-arm: #< CLIXML
22:03:43      azure-arm: WinRM connected.
22:03:43  ==> azure-arm: <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04"><Obj S="progress" RefId="0"><TN RefId="0"><T>System.Management.Automation.PSCustomObject</T><T>System.Object</T></TN><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj><Obj S="progress" RefId="1"><TNRef RefId="0" /><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj></Objs>
22:03:43  ==> azure-arm: Connected to WinRM!
22:03:43  ==> azure-arm: Provisioning with Powershell...

=========

Standard_F8s_v2 -  HDD - win2019 - image version : latest

16:19:50  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-wwwgtctyip'
16:19:50  ==> azure-arm:  -> PublicIPAddressName : 'pkripwwwgtctyip'
16:19:50  ==> azure-arm:  -> NicName             : 'pkrniwwwgtctyip'
16:19:50  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
16:19:50  ==> azure-arm:  -> IP Address          : '52.157.111.197'
16:19:50  ==> azure-arm: Waiting for WinRM to become available...
16:19:56  ==> azure-arm: #< CLIXML
16:19:56      azure-arm: WinRM connected.

========

Standard_B4ms - HDD - win2019 - image version : latest

16:03:00  ==> azure-arm:  -> DeploymentName    : 'pkrdp3ko5xlkk4n'
16:05:07  ==> azure-arm: Getting the VM's IP address ...
16:05:07  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-3ko5xlkk4n'
16:05:07  ==> azure-arm:  -> PublicIPAddressName : 'pkrip3ko5xlkk4n'
16:05:07  ==> azure-arm:  -> NicName             : 'pkrni3ko5xlkk4n'
16:05:07  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
16:05:07  ==> azure-arm:  -> IP Address          : '52.166.196.146'
16:05:07  ==> azure-arm: Waiting for WinRM to become available...
16:34:59  ==> azure-arm: Timeout waiting for WinRM.
16:34:59  ==> azure-arm: 

Replacing the listener test:

Windows PowerShell
Copyright (C) 2016 Microsoft Corporation. All rights reserved.

PS C:\Users\packer> Test-WSMan -ComputerName localhost -UseSSL
Test-WSMan : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="12175"
Machine="pkrvmwawvo84vka"><f:Message>The server certificate on the destination computer (localhost:5986) has the
following errors:
Encountered an internal error in the SSL library.   </f:Message></f:WSManFault>
At line:1 char:1
+ Test-WSMan -ComputerName localhost -UseSSL
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (localhost:String) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand

PS C:\Users\packer> Get-ChildItem -path cert:\LocalMachine\My


   PSParentPath: Microsoft.PowerShell.Security\Certificate::LocalMachine\My

Thumbprint                                Subject
----------                                -------
8DDC5709AB990B6AC7F8D8CF1B97FC5FA136B9C0  CN=pkrvmwawvo84vka.cloudapp.net


PS C:\Users\packer> Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
PS C:\Users\packer> Test-WSMan -ComputerName localhost -UseSSL
Test-WSMan : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="2150858770"
Machine="pkrvmwawvo84vka"><f:Message>The client cannot connect to the destination specified in the request. Verify
that the service on the destination is running and is accepting requests. Consult the logs and documentation for the
WS-Management service running on the destination, most commonly IIS or WinRM. If the destination is the WinRM service,
run the following command on the destination to analyze and configure the WinRM service: "winrm quickconfig".
</f:Message></f:WSManFault>
At line:1 char:1
+ Test-WSMan -ComputerName localhost -UseSSL
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (localhost:String) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand


PS C:\Users\packer> New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint 8DDC5709AB990B6AC7F8D8CF1B97FC5FA136B9C0 -Force


   WSManConfig: Microsoft.WSMan.Management\WSMan::localhost\Listener

Type            Keys                                Name
----            ----                                ----
Container       {Transport=HTTPS, Address=*}        Listener_1305953032


PS C:\Users\packer> Test-WSMan -ComputerName localhost -UseSSL
Test-WSMan : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="12175"
Machine="pkrvmwawvo84vka"><f:Message>The server certificate on the destination computer (localhost:5986) has the
following errors:
The SSL certificate is signed by an unknown certificate authority.
The SSL certificate contains a common name (CN) that does not match the hostname.     </f:Message></f:WSManFault>
At line:1 char:1
+ Test-WSMan -ComputerName localhost -UseSSL
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (localhost:String) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand

PS C:\Users\packer>

Just wanted to add I also get intermittent WinRM timeouts using both 2012-R2-Datacenter and 2016-Datacenter in UK South. It seems worse on the 2012-R2-Datacenter builds.

I was using smalldisk image variants, but changed to using the standard ones with more disk available following previous advice.

I've also increased the WinRM timeout to 1 hour, and increased VM size to Standard_D4s_v3, to no avail.

I've been having the same issues in US West 2 for the last couple of days: 2019-Datacenter builds are fine, but 2016-Datacenter and 2012-R2-Datacenter ones intermittently fail to connect via WinRM, with 2012-R2 being the most problematic. Builds are done using smalldisk image, initially with D2sV3 vm_size and 20 minute winrm_timeout values. Increasing the VM size or timeout doesn't show any perceptible improvement.

I can fast-track this with Microsoft, but without the root cause... Also, it totally seems to be working for me (for now), so I cannot even continue testing on my own. If you guys can find out what the issue is, I am happy to engage support.

I just started running into this problem today. For the last two weeks I've been building images to test out an automated process using Packer and did not have any issues with WinRM. I'm running Packer on the Azure DevOps hosted agent windows-2019, targeting resource groups in the South Central US region using the 2016-Datacenter image. I ran three builds today without issue, and at 2 pm EST the builds started to fail with WinRM timeouts. I'm using a Standard_DS4_v2 size VM, so it is highly unlikely to be a resource constraint issue. The way it is behaving, I'm leaning towards a networking-related issue in the Azure data center. I'm running a few tests now to try to provide some more useful details.

From my test findings I'd assume that something is going wrong within the OS during the automatic WinRM SSL configuration done by the Azure VM template.

@Dilergore I think there is currently no way in Packer to configure the builder VM to use non-SSL WinRM?

From my test findings I'd assume that something is going wrong within the OS during the automatic WinRM SSL configuration done by the Azure VM template.

@Dilergore I think there is currently no way in Packer to configure the builder VM to use non-SSL WinRM?

https://www.packer.io/docs/communicators/winrm.html#winrm-communicator-options

Never tried it, though...

@Dilergore The available parameters define the method the communicator uses; however, on the builder side I see it's hardcoded:

https://github.com/hashicorp/packer/blob/df031db9daa3d9527a48fe3097d2d6003cb2ba57/builder/azure/common/template/template_builder.go#L90-L99

And today, just to muddy the water a bit...

Yesterday evening's (1800 GMT-8) pipeline failed due to WinRM timeout on all three builds - 2012 R2, 2016, and 2019. This morning's run (0400) ran correctly. This is the first WinRM timeout I've seen using the 2019-Datacenter source. All three builds use smalldisk, DS3v2, 60m WinRM timeout.

In addition, afternoon/evening builds have a much higher incidence of failure than early morning ones.

We have a similar issue, but IMHO it doesn't depend on a particular Windows image; we think this is an issue with the Azure platform itself. In our case a small workaround is to change the instance type from Standard_DS2_v2 to Standard_B2ms and vice versa.

Hi Folks, thanks for keeping this thread up to date with your latest findings. I am looking into this issue on my end to see if there is any information that can help isolate what might be happening here. I too have observed that connections via WinRM time out when using certain images; changing my image to 2012-R2-Datacenter seems to work all the time within the westus region.

We have a similar issue, but IMHO it doesn't depend on a particular Windows image; we think this is an issue with the Azure platform itself. In our case a small workaround is to change the instance type from Standard_DS2_v2 to Standard_B2ms and vice versa.

This is possible, but hard to tell with the information in the logs.

@Dilergore have you, or anyone on the thread, opened a support ticket with Azure around this particular issue?

@nywilken I will open it during the weekend. I will involve some people who can help us / can route the ticket inside Microsoft. If you want to contribute, please send me your mail address privately.

Thanks!

As noted in the Packer Documentation - Getting started/Build an image

A quick aside/warning:
Windows administrators in the know might be wondering why we haven't simply used a winrm quickconfig -q command in the script above, as this would automatically set up all of the required elements necessary for connecting over WinRM. Why all the extra effort to configure things manually?
Well, long and short, use of the winrm quickconfig -q command can sometimes cause the Packer build to fail shortly after the WinRM connection is established. How?

  1. Among other things, as well as setting up the listener for WinRM, the quickconfig command also configures the firewall to allow management messages to be sent over HTTP.
  2. This undoes the previous command in the script that configured the firewall to prevent this access.
  3. The upshot is that the system is configured and ready to accept WinRM connections earlier than intended.
  4. If Packer establishes its WinRM connection immediately after execution of the 'winrm quickconfig -q' command, the later commands within the script that restart the WinRM service will unceremoniously pull the rug out from under the connection.
  5. While Packer does a lot to ensure the stability of its connection in to your instance, this sort of abuse can prove to be too much and may cause your Packer build to stall irrecoverably or fail!

Unfortunately, while this is true on AWS using the userdata script, I'm not sure how the Azure builder configures WinRM and whether it runs winrm quickconfig -q while Packer attempts to connect on Azure. If it does, that might be the cause of this.

Also, take note that the Packer documentation - Communicator/WINRM still refers to winrm quickconfig -q, and many other repo files also mention winrm quickconfig -q, which could affect other builders and direct the community into this issue.

A workaround I'm using successfully on AWS is to use the SSH communicator on Windows 2016/2019, installing SSH from userdata following the installation instructions in the Microsoft documentation or using the Microsoft OpenSSH portable release. Not sure how this would translate to Azure.
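
For what it's worth, a minimal sketch of what that could look like in a provisioning script, assuming the in-box OpenSSH capability (available on Server 2019 / Windows 10 1809+; Server 2016 would need the portable OpenSSH release instead):

# Enable the built-in OpenSSH server and open the firewall for it
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
Set-Service -Name sshd -StartupType Automatic
Start-Service sshd
New-NetFirewallRule -Name 'OpenSSH-Server-In-TCP' -DisplayName 'OpenSSH Server (sshd)' `
    -Enabled True -Direction Inbound -Protocol TCP -Action Allow -LocalPort 22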

Hi All,

Are there any updates on this? I haven't been able to repro the issue for a couple of days; perhaps some fix was rolled out?

@AlexeyKarpushin - I definitely still have the issue.

It seems to have an Azure load component. I still get WinRM timeouts on all three platforms (2012R2, 2016, 2019). Failure rates on the three builds are 60% or more during weekday business hours, about 10-20% on weeknights and weekend days, and very rare on weekend overnights. In addition, the 2019 builds have a much lower failure rate than 2016 and 2012R2.

Getting this issue intermittently on Windows 2019 now as well. I suspect it may be, as others have said, something to do with the time of day. It seems to work in mornings/evenings but not in core hours.

"communicator": "winrm",
"winrm_use_ssl": true,
"winrm_insecure": true,
"winrm_timeout": "10m",
"winrm_username": "packer",

"location": "uksouth",
"vm_size": "Standard_DS2_v2"

Hi folks, sorry for the slow response here. I have not been able to reproduce this issue since Friday, although I do notice that 2016-Datacenter builds take longer than other OS versions to connect via WinRM. I don't know why that is.

@nywilken I will open it during the weekend. I will involve some people who can help us / can route the ticket inside Microsoft. If you want to contribute, please send me your mail address privately.

@Dilergore I don't have any new information to contribute so I'll refrain from reaching out privately. But thanks for offering to include me in the thread.

I definitely still have the issue.

For folks who are still able to reproduce the issue, when WinRM connectivity is timing out.

  • Can you telnet to the WinRM port(s) 5985 or 5986?
  • If you are able to connect via RDP, are there any relevant errors in the Event Viewer?
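
For the port check, something like this from the machine running Packer should do (Test-NetConnection as a stand-in for telnet; the IP is just an example):

$vmIp = '52.142.198.26'                              # public IP of the Packer temp VM
Test-NetConnection -ComputerName $vmIp -Port 5986    # WinRM over HTTPS (what the Azure builder uses)
Test-NetConnection -ComputerName $vmIp -Port 5985    # WinRM over HTTP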

same here, I was able to reproduce just now:
"location": "East US",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",
"communicator": "winrm",
"winrm_use_ssl": true,
"winrm_insecure": true,
"winrm_timeout": "10m",

telnet to 5986 works, telnet to 5985 does not work.

10:02:18 ==> azure-arm: Waiting for WinRM to become available...
10:12:25 ==> azure-arm: Timeout waiting for WinRM.
10:12:25 ==> azure-arm:
10:12:25 ==> azure-arm: Cleanup requested, deleting resource group ...
10:12:25 ==> azure-arm:
10:12:25 ==> azure-arm: Not waiting for Resource Group delete as requested by user. Resource Group Name is packer-Resource-Group-z27ecnv9bw
10:12:25 Build 'azure-arm' errored: Timeout waiting for WinRM.
10:12:25
10:12:25 ==> Some builds didn't complete successfully and had errors:
10:12:25 --> azure-arm: Timeout waiting for WinRM.

Hi All,

I've created a workaround which allows our Azure DevOps pipelines to run. It doesn't solve the problem, but it allows us to ignore it. I can't paste the whole code here, but I can give a short description; hopefully it will be useful. The main idea is to re-create the WinRM listener on the temp machine during the Packer build.
Here are the steps:

  1. Enable the Packer log: set $Env:PACKER_LOG=1 and $Env:PACKER_LOG_PATH='path to packer log'.
  2. Create a simple parser which analyzes the Packer log to find the resource group name and the temp VM name, and to detect the WinRM issue. The error message in the log which indicates the issue is: _"An existing connection was forcibly closed by the remote host"_
    2.1 If you find a better solution for locating the resource group, please share it!
  3. If the issue is detected, execute Invoke-AzVMRunCommand against the temp machine. This code reconfigures the WinRM listener:
$Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "$env:COMPUTERNAME"
Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
  4. Run the resulting script as an async PowerShell job before starting the Packer build (a rough sketch of steps 2-3 follows this list). Set some timeout before parsing the log to allow Packer to provision the temp machine; 10 minutes works fine for me. If you're doing it from a YAML Azure DevOps pipeline, start the async job in the same step where Packer starts; otherwise the async job will be terminated by the Azure DevOps client.
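
A rough sketch of steps 2-3 (the log-parsing regex and the 'pkrvm' + suffix naming are assumptions based on the default azure-arm resource names, not my exact pipeline code):

# Detect the WinRM failure in the Packer log and re-create the HTTPS listener via Run Command
$log = Get-Content -Path $Env:PACKER_LOG_PATH -Raw

if ($log -match 'An existing connection was forcibly closed by the remote host') {
    # Default azure-arm naming: RG 'packer-Resource-Group-<suffix>', temp VM 'pkrvm<suffix>'
    $rgName = [regex]::Match($log, "ResourceGroupName\s*:\s*'(packer-Resource-Group-[^']+)'").Groups[1].Value
    $vmName = 'pkrvm' + ($rgName -replace '^packer-Resource-Group-', '')

    # Same listener fix as in step 3, written to a temporary script file
    @'
$Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "$env:COMPUTERNAME"
Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
'@ | Set-Content -Path .\fix-winrm.ps1

    Invoke-AzVMRunCommand -ResourceGroupName $rgName -VMName $vmName `
        -CommandId 'RunPowerShellScript' -ScriptPath .\fix-winrm.ps1
}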

I hope the issue will be mitigated in the near future and this workaround will not be needed.

Kind regards,
Alexey

@nywilken You can take a look at my previous findings:

https://github.com/hashicorp/packer/issues/8658#issuecomment-579784076

I have engaged Microsoft, as it seems I am experiencing this problem again...

@Dilergore I think the problem is on Microsoft's side. For the last three days I've been able to generate images without issue in the South Central region using both 2016 and 2019 editions, while it seems like others are still not able to.

Are you guys all trying to do this from Azure DevOps, or are you using other tools?

Are you guys all trying to do this from Azure DevOps, or are you using other tools?

I'm using Jenkins and have the same issue...

Are you guys all trying to do this from Azure DevOps, or are you using other tools?

I'm using Azure DevOps with one of the hosted agents.

Are you guys all trying to do this from Azure DevOps, or are you using other tools?

I'm using Azure DevOps with self repo.

The overnight run was good for all three Windows target OSs. The first run at 0900 was fine; the second run at 10:30 failed with a WinRM timeout on WS2016 and WS2012R2. WS2019 ran without error.

This could be a total coincidence, but it has _appeared_ to work the last three times I've tried it. Yesterday we lowered the VM size we are using for our Win2016 and Win2019 Packer builds. Win2019 continues to run without issue, but Win2016 starts to incur a WinRM timeout. I repeat the build, wait about 15 to 20 minutes (the WinRM timeout is set to 30 minutes), and notice that WinRM is still not responding. I run the Test-WSMan PowerShell cmdlet against the public IP of the machine and it fails, as I expect it to. The odd part is that within minutes WinRM is suddenly responding on the VM and the Packer build finishes without issue.

Hi All, I'm a Program Manager for the Azure VM Image Builder (which uses Packer under the hood). We have seen this too, and we have engaged the Windows PG to investigate. Initially there was a low-memory condition which could cause problems when using Standard D1_v2; this was due to Windows Update and has been mitigated. However, there is still an issue, the Windows PG is investigating, and I will report back when I hear. In the meantime one really kind member reached out with this workaround: https://github.com/danielsollondon/azvmimagebuilder/issues/14#issuecomment-577856888

Wonderful!! Thanks so much for the update and the workaround, @danielsollondon

Hi All, I'm a Program Manager for the Azure VM Image Builder (which uses Packer under the hood). We have seen this too, and we have engaged the Windows PG to investigate. Initially there was a low-memory condition which could cause problems when using Standard D1_v2; this was due to Windows Update and has been mitigated. However, there is still an issue, the Windows PG is investigating, and I will report back when I hear. In the meantime one really kind member reached out with this workaround: danielsollondon/azvmimagebuilder#14 (comment)

Thanks Daniel! Also thanks to Corey C. who helped me to bring this to your attention! :-)

I also encountered a similar WinRM timeout issue when attempting to deploy the following skus via Packer:
MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.127.20190603 MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.127.20190521 MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.127.20190416
I was able to get positive results by switching to the new hyper-v-generation V2 VM type. The WinRM timeouts didn't appear with the latest gensecond image. (Not a solution obviously - just an observation.)
2012-r2-datacenter-gensecond
Pipeline: Azure Devops
Packer Version: 1.5.1

The following Works reliably:
"os_type": "Windows",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2012-r2-datacenter-gensecond",
"communicator": "winrm",
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"location": "West US2",
"vm_size": "Standard_DS4_v2",
"winrm_timeout": "40m",

The following config results in the WinRM timeout and it never recovers:
"os_type": "Windows",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2012-R2-Datacenter",
"communicator": "winrm",
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"location": "West US2",
"vm_size": "Standard_DS4_v2",
"winrm_timeout": "40m",

Hi Folks, thanks for your help and time in working to figure out what might be happening here. @Dilergore @danielsollondon thanks for pushing this forward and for the workaround. Looking forward to hearing back.

Also, it seems that the workaround involves a new cert generation step; please let us know if there is anything we need to change on our end.

I just hit the winrm timeout on 2019-Datacenter
The last time I did a build was on Jan 10, 2020, and at that time it worked ok.

==> azure-arm: Waiting for WinRM to become available...

==> azure-arm: Timeout waiting for WinRM.

==> azure-arm: 

==> azure-arm: Cleanup requested, deleting resource group ...

==> azure-arm: Resource group has been deleted.

Build 'azure-arm' errored: Timeout waiting for WinRM.


==> Some builds didn't complete successfully and had errors:

--> azure-arm: Timeout waiting for WinRM.


==> Builds finished but no artifacts were created.

I'm using

"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",

A query of versions of this image turns up these, and it looks like nothing has been published in 2020, so I'd think that I'm getting the same version as before.

Version              FilterExpression Skus
-------              ---------------- ----
17763.557.1907191810                  2019-Datacenter
17763.557.20190604                    2019-Datacenter
17763.615.1907121548                  2019-Datacenter
17763.678.1908092216                  2019-Datacenter
17763.737.1909062324                  2019-Datacenter
17763.805.1910061628                  2019-Datacenter
17763.864.1911120152                  2019-Datacenter
17763.914.1912042330                  2019-Datacenter
17763.973.2001110547                  2019-Datacenter
2019.0.20181107                       2019-Datacenter
2019.0.20181122                       2019-Datacenter
2019.0.20181218                       2019-Datacenter
2019.0.20190115                       2019-Datacenter
2019.0.20190214                       2019-Datacenter
2019.0.20190314                       2019-Datacenter
2019.0.20190410                       2019-Datacenter
2019.0.20190603                       2019-Datacenter

I changed the VM size from Standard_DS2_v2 to Standard_DS3_v2 and the build ran OK. Not sure if this proves that memory could be an issue, or if I just got lucky.

@pmozbert - I just tried several 2016-Datacenter builds using Standard_DS3_v2, and it hung on "Waiting for WinRM to become available..." as usual. The workaround brought up by @Dilergore yesterday works very well, but unfortunately isn't really practical for an automated process.

Setting the following will solve the issue. WinRM with SSL currently only works well (and stably) in a domain environment (either Kerberos or NTLMv2).

"winrm_use_ssl": false,

This is not a best practice but that is what I've experienced when working with WinRM. Microsoft should really improve this protocol.

I attempted the fix posted by @azsec on 2016-Datacenter builds; it doesn't appear to do any good in our environment: I still got interminable WinRM timeouts on 8 consecutive attempts. The overnight automated build ran fine with SSL enabled, and the first manual build this morning was successful, but three subsequent ones failed, as did all the ones with SSL disabled.

Quick update: I spoke to the Windows team, and they have identified an issue with the Windows Server 2016 image (November onwards) that impacts the time to initiate a WinRM connection with Packer. They are still working on this and will update again mid next week. In the meantime please try increasing the Packer timeout to 30mins, and try a larger VM size.

It seems that the WinRM connection is available about 5 minutes after instance creation; in my case I can clearly see that the connection is available:

nc -z -w1 x.x.x.x 5986;echo $?
Connection to x.x.x.x port 5986 [tcp/wsmans] succeeded!

However, Packer shows:

==> azure-arm: Getting the VM's IP address ...
==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-xyz'
==> azure-arm:  -> PublicIPAddressName : 'xyz'
==> azure-arm:  -> NicName             : 'xyz'
==> azure-arm:  -> Network Connection  : 'PublicEndpoint'
==> azure-arm:  -> IP Address          : 'x.x.x.x '
==> azure-arm: Waiting for WinRM to become available...

I am using Windows 2016 in this case and the size is Standard_B2ms.

And after approximately 11 minutes I see it gets connected:

==> azure-arm: #< CLIXML
    azure-arm: WinRM connected.
==> azure-arm: <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04"><Obj S="progress" RefId="0"><TN RefId="0"><T>System.Management.Automation.PSCustomObject</T><T>System.Object</T></TN><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj><Obj S="progress" RefId="1"><TNRef RefId="0" /><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj></Objs>
==> azure-arm: Connected to WinRM!

Not sure why Packer doesn't detect the connection availability a bit earlier. Anyway, 2016-Datacenter works fine without an issue; there just seems to be a lag between the time WinRM is available and the time Packer detects it.

I have the timeout set to 30 mins anyway as advised by @danielsollondon.

Do I have a related issue, or is it a separate one, given what I run and get below?

{
    "builders": [
        {
            "type": "azure-arm",

            "client_id": "{{user `client_id`}}",
            "client_secret": "{{user `client_secret`}}",
            "subscription_id": "{{user `subscription_id`}}",
            "tenant_id": "{{user `tenant_id`}}",

            "managed_image_resource_group_name": "contoso-sharepoint-prod-common-rg",
            "managed_image_name": "contoso-sharepoint-prod-common-{{user `box_name`}}",

            "os_type": "Windows",
            "image_publisher": "MicrosoftWindowsServer",
            "image_offer": "WindowsServer",
            "image_sku": "2019-Datacenter",
            "image_version": "latest",

            "communicator": "winrm",
            "winrm_use_ssl": "false",
            "winrm_insecure": "true",
            "winrm_timeout": "30m",
            "winrm_username": "packer",

            "vm_size": "Standard_F2s",
            "managed_image_storage_account_type": "Premium_LRS",

            "build_resource_group_name": "contoso-sharepoint-prod-common-rg",
            "temp_compute_name": "shrpt{{timestamp}}"
        }
    ],
    "provisioners": [
        { "type": "windows-restart" }
    ],
    "variables": {
        "box_name": "win-oos{{env `vm_image_name_suffix`}}",
        "client_id": "{{env `ARM_CLIENT_ID`}}",
        "client_secret": "{{env `ARM_CLIENT_SECRET`}}",
        "subscription_id": "{{env `ARM_SUBSCRIPTION_ID`}}",
        "tenant_id": "{{env `ARM_TENANT_ID`}}"
    }
}
PS C:\Users\01sodfin\Documents> packer build -only azure-arm "win-sp.json"
azure-arm: output will be in this color.

==> azure-arm: Running builder ...
==> azure-arm: Getting tokens using client secret
==> azure-arm: Getting tokens using client secret
    azure-arm: Creating Azure Resource Manager (ARM) client ...
==> azure-arm: Using existing resource group ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> Location          : 'westeurope'
==> azure-arm: Validating deployment template ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> DeploymentName    : 'pkrdpe380nb7t2s'
==> azure-arm: Deploying deployment template ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> DeploymentName    : 'kvpkrdpe380nb7t2s'
==> azure-arm: Getting the certificate's URL ...
==> azure-arm:  -> Key Vault Name        : 'pkrkve380nb7t2s'
==> azure-arm:  -> Key Vault Secret Name : 'packerKeyVaultSecret'
==> azure-arm:  -> Certificate URL       : 'https://pkrkve380nb7t2s.vault.azure.net/secrets/packerKeyVaultSecret/f395e80640be4dbcaf47b508c4ef864b'
==> azure-arm: Setting the certificate's URL ...
==> azure-arm: Validating deployment template ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> DeploymentName    : 'pkrdpe380nb7t2s'
==> azure-arm: Deploying deployment template ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> DeploymentName    : 'pkrdpe380nb7t2s'
==> azure-arm: Getting the VM's IP address ...
==> azure-arm:  -> ResourceGroupName   : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> PublicIPAddressName : 'pkripe380nb7t2s'
==> azure-arm:  -> NicName             : 'pkrnie380nb7t2s'
==> azure-arm:  -> Network Connection  : 'PublicEndpoint'
==> azure-arm:  -> IP Address          : '104.40.139.200'
==> azure-arm: Waiting for WinRM to become available...
==> azure-arm: Timeout waiting for WinRM.
==> azure-arm:
==> azure-arm: The resource group was not created by Packer, deleting individual resources ...
==> azure-arm:  -> Deployment: pkrdpe380nb7t2s
==> azure-arm:  -> Microsoft.Compute/virtualMachines : 'shrpt1581359966'
==> azure-arm:  -> Microsoft.Network/networkInterfaces : 'pkrnie380nb7t2s'
==> azure-arm:  -> Microsoft.Network/virtualNetworks : 'pkrvne380nb7t2s'
==> azure-arm:  -> Microsoft.Network/publicIPAddresses : 'pkripe380nb7t2s'
==> azure-arm:  -> Microsoft.Compute/disks : '/subscriptions/58baf6a1-d140-4b25-8ed1-b3195bbf2c7c/resourceGroups/CONTOSO-SHAREPOINT-PROD-COMMON-RG/providers/Microsoft.Compute/disks/pkrose380nb7t2s'
==> azure-arm:
==> azure-arm: The resource group was not created by Packer, deleting individual resources ...
==> azure-arm: Could not retrieve OS Image details
==> azure-arm:  -> Deployment: kvpkrdpe380nb7t2s
==> azure-arm:  -> Microsoft.KeyVault/vaults/secrets : 'pkrkve380nb7t2s/packerKeyVaultSecret'
==> azure-arm:  -> Microsoft.KeyVault/vaults : 'pkrkve380nb7t2s'
==> azure-arm:  ->  : ''
==> azure-arm: Error deleting resource.  Please delete manually.
==> azure-arm:
==> azure-arm: Name:
==> azure-arm: Error: Unable to parse path of image
==> azure-arm:
==> azure-arm: The resource group was not created by Packer, not deleting ...
Build 'azure-arm' errored: Timeout waiting for WinRM.

==> Some builds didn't complete successfully and had errors:
--> azure-arm: Timeout waiting for WinRM.

==> Builds finished but no artifacts were created.

It works if I use the previous image version:

"image_sku": "2019-Datacenter",
"image_version": "17763.914.1912042330"

In the meantime please try increasing the Packer timeout to 30mins, and try a larger VM size.

2016-Datacenter using VM size of Standard_D2s_v3 and a timeout of 30 minutes still results in WinRM timeout failure in West US 2 today - I'm seeing about 30% failure rate in the morning/early afternoon (UTC-0700) and 60% or more later in the afternoon. Overnight (0300) build seems to be fine. Using a larger VM or increasing the winrm_timeout value to 60m doesn't seem to have any helpful effect.

I've been concentrating on 2016-Datacenter today. The few 2019 and 2012R2 runs have been fine today, but were problematic last Friday. 2016 had a 100% failure rate during the day on Friday unless I used the manual workaround @Dilergore posted last week.

Our 2019-Datacenter builds have 100% failure rate regardless of setting timeouts or adjusting VM sizes. These failures were experienced consistently during "business hours" in the East US region.

For reference, these are our settings...

""vm_size": "Standard_DS2_v2",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",
"image_version": "17763.973.2001110547" <-- January 2020 build
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"winrm_port": "5986",
"winrm_timeout": "30m",

Running the same builds during after hours in the same region resulted in 100% success using a variety of different versions as provided via the following MS release notes page...

https://support.microsoft.com/en-us/help/4537134

We tested the following versions June 2019, November 2019, December 2019, January 2020, every build worked.

How can the timing of day affect this? 🤷‍♂️

We will run the same tests with the versions mentioned above during "business hours" and report back here in the AM.

Our 2019-Datacenter builds have 100% failure rate regardless of setting timeouts or adjusting VM sizes. These failures were experienced consistently during "business hours" in the East US region.

For reference, these are our settings...

""vm_size": "Standard_DS2_v2",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",
"image_version": "17763.973.2001110547" <-- January 2020 build
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"winrm_port": "5986",
"winrm_timeout": "30m",

Running the same builds during after hours in the same region resulted in 100% success using a variety of different versions as provided via the following MS release notes page...

https://support.microsoft.com/en-us/help/4537134

We tested the following versions June 2019, November 2019, December 2019, January 2020, every build worked.

How can the timing of day affect this? 🤷‍♂️

We will run the same tests with the versions mentioned above during "business hours" and report back here in the AM.

Today's builds are completing without issue. For reference, these are our image settings...

IMAGE_PUBLISHER: "MicrosoftWindowsServer"
IMAGE_OFFER: "WindowsServer"
IMAGE_SKU: "2019-Datacenter"
IMAGE_VERSION: "17763.973.2001110547"
VM_SIZE: "Standard_DS2_v2"

IMAGE_PUBLISHER: "MicrosoftWindowsDesktop"
IMAGE_OFFER: "Windows-10"
IMAGE_SKU: "19h1-pro"
IMAGE_VERSION: "18362.592.2001092016"
VM_SIZE: "Standard_DS2_v2"

I hope an explanation comes down from MS so we don't have to accept the "it works on my machine" response. 🤞

Hello Everyone,

Here is the latest information I've got from Microsoft:
“Windows Server 2016 images since November 2019 can have a post first boot performance issue related to an OS code integrity operation. This issue is more pronounced on small Azure VM sizes (with lower throughput and IO) rendering the VM not immediately usable after first boot. The performance issue is mitigated in February 2020 images and forward. Please use the latest February Windows Server 2016 image once it is available from the Marketplace (ETA 2/17).”

Hello everyone,

Looking at the logs, I see that Packer tries only once to access the WinRM service.
Is this expected? AFAIK, we should see more lines like [INFO] Attempting WinRM connection... until the connection succeeds, right?

Regards,

2020/02/13 17:19:31 packer-builder-azure-arm plugin: Waiting for WinRM, up to timeout: 30m0s
2020/02/13 17:19:31 ui: ==> azure-arm: Waiting for WinRM to become available...
2020/02/13 17:19:31 packer-builder-azure-arm plugin: [INFO] Attempting WinRM connection...
2020/02/13 17:19:31 packer-builder-azure-arm plugin: [DEBUG] connecting to remote shell using WinRM
2020/02/13 17:19:44 packer-builder-azure-arm plugin: Checking that WinRM is connected with: 'powershell.exe -EncodedCommand [...]'
2020/02/13 17:19:44 packer-builder-azure-arm plugin: [INFO] starting remote command: powershell.exe -EncodedCommand [...]
2020/02/13 17:49:31 ui error: ==> azure-arm: Timeout waiting for WinRM.
2020/02/13 17:49:31 packer-builder-azure-arm plugin: Communication connection err: context canceled
2020/02/13 17:49:31 packer-builder-azure-arm plugin: WinRM wait canceled, exiting loop
2020/02/13 17:49:31 ui: ==> azure-arm: 

Hi All, I was going to update you, but @Dilergore beat me to it :-) If there are any changes with ETA, we will let you know.

@danielsollondon - So this is an issue with Server 2016 images only? I get this issue with Server 2012 R2 and Server 2019, as well (particularly late in the business day,) but Server 2016 is the usual problem child. Although I did have two Server 2019 builds fail early this morning with WinRM timeout errors...

I've also seen the same issues with Server 2012 R2. @Dilergore will the fix also be in the latest February Windows Server 2012 image, once available from the Marketplace?

New images:

2019-Datacenter - 17763.1039.2002091844
2016-Datacenter - 14393.3504.2002070914
2012-R2-Datacenter - 9600.19629.2002070917

Same Timeout waiting for WinRM. error with this configuration:

"os_type": "Windows",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",
"image_version": "17763.1039.2002091844",

"communicator": "winrm",
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"winrm_timeout": "30m",
"winrm_username": "packer",

"vm_size": "Standard_DS2_v2",
"managed_image_storage_account_type": "Premium_LRS",

Building 2012 R2, 2016, and 2019 images in West US 2, using Standard_D2s_v3 and latest marketplace image available.

Morning and early afternoon builds were all fine: did 5 sets of them. The last two sets have hung waiting for WinRM - on all three targets. Sigh.

2019 still timing out for me. Anyone else?

2012R2, 2016, 2019 builds are fine overnight and morning, but they all get WinRM timeouts starting around 13:30 (UTC-0800). Timeouts continue until sometime in the late evening. If I use the run command tool in the portal to use PowerShell to create a new self-signed cert & WinRM listener the process completes successfully.

Yep same, none are working.

Region US East.
Sizes: all
WinRM timeout amount: a lot...

The MSFT guy closed their issue when the image was updated. Obviously that wasn't the fix.

I wish not-so-great things on the people who decided we would use Packer... AWS and GCP work without an issue - I'm talking end to end, including pipelines for all Windows versions, in a matter of hours. Azure, on the other hand, is throwing a temper tantrum. Week 4... and this isn't even my main issue. The main issue I have is PowerShell and Puppet "completing" before they are actually complete.

Funny that AWS and GCP have better and a lot faster Windows builds than Microsoft. Not sure how that happened.

Hi, sorry for the delay in getting back to you. I have spoken to the Windows PG team, and it looks like there could be another issue. @BruceShipman and @ffalorjr - I see you are using 2012R2, 2016, 2019 in WestUS2 and EastUS; what version of Packer are you using? Once I have this, I will try to repro.

Sorry, I forgot to address this question, @BruceShipman - the fix in the Windows 2016 image was for an issue that resulted in WinRM timeouts, but the cause of that issue does not exist in 2019, which is why we think there could be another issue here.

@danielsollondon This is my Packer version: 1.5.2. Thanks!

Thanks @ffalorjr - I will try to repro this; if I cannot, I will come back to you.

@danielsollondon - I'm currently using Packer 1.5.1. If you think it might help I can update that to 1.5.4

@BruceShipman I have not reviewed the release, so I don't know if it will help. If I can repro the issue, I will try this, I will come back within 24hrs.

FYI: Having the same issue in CanadaCentral

Settings:
Packer Version v1.5.4

winrm timeout is set to 15m

"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",
"location": "canadacentral",
"vm_size": "Standard_DS2_v2",
"base_image_version": "17763.973.2001110547",

This has been happening to me for the past week or so.

Same issue in East US 2 on all Server 2019 based images. Server 2016 images are working fine for me.

Packer version 1.3.2, 1.5.1, and 1.5.4 tested

winrm timeout set to 60 minutes

      "os_type": "Windows",
      "image_publisher": "MicrosoftSQLServer",
      "image_offer": "sql2019-ws2019",
      "image_sku": "Enterprise",
      "image_version": "latest",
      "os_type": "Windows",
      "image_publisher": "MicrosoftWindowsServer",
      "image_offer": "WindowsServer",
      "image_sku": "2019-Datacenter",
      "image_version": "latest",

@BruceShipman, @ffalorjr, @urbanchaosmnky, @nickmhankins - I have reproduced the timeout issue using the source below and locating the build VM in West US2, with Packer 1.5.4.

One question: where are you running Packer? Is that on a VM in the same data center, a different DC, from on-premise, or in a build pipeline?

"os_type": "Windows",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",
"image_version": "latest",

@danielsollondon

I'm running packer off my laptop, both on our company network and from my home internet connection. I don't know if that helps you. I haven't tried to run packer on a VM in the cloud.

@danielsollondon

I've run it from my laptop, I've run it in a pipeline with build servers on-prem, and I've run it from a pipeline with build servers in Azure (in the same sub and network).

Thanks

@danielsollondon

I'm running these on cloud hosted AzDO pipelines.

@danielsollondon while I have you, have you come across the PowerShell or Puppet provisioners exiting early and moving on to the next provisioner as if the first one completed fully without issue, even though it did not complete all the way?

This is only happening to my azure builds. GCP and AWS use the exact same code, and they don't have that issue.

Hi @ffalorjr - sorry, I have not seen that issue with the PowerShell or Puppet provisioners; is it easily reproducible?

All - The Windows team have a repro of this and are investigating, we will update you on Monday.

@danielsollondon Thanks for the feedback, just thought I'd ask. I don't want to derail this thread since the issue is not related to WinRM.

I found out some in-guest policies are being pushed at the subscription level, so I'm getting those disabled to see if that fixes my issue. The repro steps for me have been to simply add any provisioner to the build and run it a few times; at least once it will exit in the middle of the run as "complete" when it is not.

Thanks for the latest Marketplace images guys. Since using the latest, the build success rate is >90% for both Win2012 and Win2016 :)

Hi All, the Windows Team have provided this feedback: can you test with the build properties for Windows below? For the vm_size, please use an alternative to a DSv2 size, such as 'Standard_D2_v2'.

"communicator": "winrm",
"winrm_username": "packer",
"winrm_insecure": true,
"winrm_use_ssl": true,
"vm_size": "Standard_D2_v2"

Can you let us know if this improves build success rates? Thanks.

@danielsollondon, so far I haven't had any success with Standard_D2_v2 and the Windows 2019 image in CanadaCentral.

Thanks @urbanchaosmnky.

@BruceShipman , @ffalorjr, @nickmhankins - can you let us know if you are still broken with my previous post config? thanks,

@danielsollondon , Overnight builds all failed due to the Terraform azurerm provider automatically updating to v2.0 (annoyed; that'll teach me to pin my version), which caused the upstream pipeline that manages the gallery and image definitions to fail. I'm testing the fix to the upstream pipeline, and will shortly start testing runs with Standard_D2s_v3 changed to Standard_D2_v2. It may take a while, as runs are mostly fine until about 2PM or so. (The rest of the config you had was the same as I already had.)

Thanks @BruceShipman for letting me know.

@danielsollondon, I've been able to build with Packer and a Win2019 image this morning, so far with no issues.

@urbanchaosmnky - thanks for letting me know, are you still deploying into CanadaCentral? Please keep me informed here if you see any further failures, sorry I didn't investigate your issue further yesterday, we were doing more testing, and I was waiting for additional feedback from the other folks here to see if they are hitting further failures.

@BruceShipman , @ffalorjr, @nickmhankins - can you let me know if you are still seeing failures? thanks!

@danielsollondon thanks for the help. I've been doing tests, and so far I have not seen a WinRM error after changing the size. Even before changing the size, the success rate was greatly improved compared to previous days of running.

I've been using Standard_DS2_v2 since the latest marketplace images, and have only had one build failure in the past week. A massive improvement! 👍

@danielsollondon Yes, I'm still deploying in CanadaCentral; I've deployed 10 times now with no issues. Thanks again.

@danielsollondon - I had the 3 Windows pipelines in a loop yesterday until about 7 PM without a single failure, and the overnight build was fine, as well. So while I'd call this a work-around instead of a fix, it definitely allows us to build our images without babysitting the automation. YAY!

My builds are finally working using Standard_D2_v2. Thanks for the efforts to resolve this, even though it seems to be a temporary solution. When will all VM sizes be available?

Thanks @danielsollondon, builds are finally working using Standard_D2_v2, but I'm still getting random timeouts.

@danielsollondon I'm deploying in US Central and W2K16 Datacenter builds are timing out on me. I'm using a large size, Standard_D16s_v3, and I've set my timeout to 10 minutes. This was working a week ago, which is really discouraging. :(

I've been using Standard_D2_v2 and haven't had a single timeout. I am getting WinRM issues when starting DSC, but I'm sure that's a different issue.

This was working fine over the weekend. Unfortunately I haven't been able to get past the timeout issue all day today (3/9/2020). I was trying to use Standard_D2_v2 as well as more powerful sources. Increasing the timeout to 30 minutes didn't help either.

I too am seeing repeated winrm timeouts when creating a Datacenter 2019 image:

"winrm_insecure": true,
"winrm_use_ssl": true,
"winrm_timeout": "30m",
"vm_size": "Standard_D3_v2",
"os_type": "Windows",
"image_offer": "MicrosoftWindowsServer",
"image_publisher": "WindowsServer",
"image_sku": "2019-Datacenter"
"image_version": "latest"

Creating in North Central US region from a self-hosted DevOps Agent also running in North Central US.

We too have experienced the WinRM timeout issue over several months. Per the above recommendation, we changed "vm_size": "Standard_DS3_v2" -->> "vm_size": "Standard_D2_v2"

On Packer v1.5.4, we've consecutively run 3 successful builds of 3 VMs (one 2012, two 2016) in US East. Fingers crossed for this workaround working tomorrow!

I had more success when I reverted back to using v1.5.1. As soon as I go back to 1.5.3 or 1.5.4, it starts to time out again. Unfortunately "elevated_password" is broken in 1.5.1 and I need that to work.

Running packer 1.5.4 to provision an Azure image. Details:

    "os_type": "Windows",
    "image_publisher": "MicrosoftWindowsServer",
    "image_offer": "WindowsServer",
    "image_sku": "2019-Datacenter-smalldisk-g2",
    "location": "East US",
    "vm_size": "Standard_B2ms"

When I run Test-WSMan -ComputerName 40.76.44.11 -usessl to check the WinRM connection, I get this error. I have also tried with the machine DNS name "pkrvm7ekd2nbewu.eastus.cloudapp.azure.com" and have tried with different images and different sizes; same issue. This was working last Monday and has been giving issues since then.

Error:
The server certificate on the destination computer (40.76.44.11:5986) has
the following errors:
The SSL certificate is signed by an unknown certificate authority.
The SSL certificate contains a common name (CN) that does not match the hostname.

Also I have tried the commands suggested to reset the SSL certificate and I still get the error after they succeed. I ran them using the Azure Run Command on the packer VM I am creating.

$Cert = New-SelfSignedCertificate -CertStoreLocation Cert:\LocalMachine\My -DnsName "$env:COMPUTERNAME"
Remove-Item -Path WSMan:\Localhost\Listener\Listener* -Recurse
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
Stop-Service winrm
Start-Service winrm

This is pretty frustrating, really like Packer but it doesn't seem to like Azure right now.
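
If you would rather push those same listener-reset commands without clicking through the portal, one option is sketched below; it assumes an authenticated Azure CLI session, the resource group name is a placeholder for whatever Packer created, and the VM name is just the one from the example above.

# Save the listener-reset commands above as reset-winrm.ps1, then run them on the build VM.
az vm run-command invoke `
    --resource-group 'my-packer-rg' `
    --name 'pkrvm7ekd2nbewu' `
    --command-id RunPowerShellScript `
    --scripts '@reset-winrm.ps1'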

Hi All - Sorry for the delay. One of the causes of the timeout issue shows up when premium VM sizes are used (VM sizes which contain an 'S' in the name), although it is not directly to do with the premium offerings themselves. This is being resolved, and I will come back when that has completed. If you are still seeing WinRM timeouts, try using a VM size that does not contain an 'S', such as changing from Standard_DS2_v2 to Standard_D2_v2. These were the settings I used during testing:

"communicator": "winrm",
"winrm_username": "packer",
"winrm_insecure": true,
"winrm_use_ssl": true,
"vm_size": "Standard_D2_v2"

For those of you who still have issues with that config (or a vm_size with no 'S'), please let me know.
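
If it is not obvious which non-'S' sizes exist in your region, one way to list candidates is sketched below; the region and the D-series filter are only examples, and it assumes the Azure CLI is available from PowerShell.

# Pull the size list for the region and keep D-series sizes without the 's'
# (premium-storage) designation that was implicated above.
$sizes = az vm list-sizes --location northeurope --output json | Out-String | ConvertFrom-Json
$sizes |
    Where-Object { $_.name -clike 'Standard_D*_v*' -and $_.name -cnotmatch '[0-9]s_|_DS' } |
    Select-Object name, numberOfCores, memoryInMb |
    Format-Table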

I still have issues with building Windows Server 2019 (not 2016).

"os_type": "Windows",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",
"image_version": "latest",

"communicator": "winrm",
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"winrm_timeout": "30m",
"winrm_username": "packer",

"vm_size": "Standard_DS2_v2",
"managed_image_storage_account_type": "Premium_LRS",

region: west europe

With the size you recommend, Packer has no issues creating VMs:

"os_type": "Windows",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter-smalldisk",
"image_version": "latest",

"communicator": "winrm",
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"winrm_timeout": "30m",
"winrm_username": "packer",

"vm_size": "Standard_D2_v2",
"managed_image_storage_account_type": "Standard_LRS",

However, builds became much slower with standard disks...

Hi, I have this timeout issue which is quite sporadic: sometimes I can't create an image for a whole day, and then it works the next day. Lately I have started to see an "ssh timeout" issue for an Ubuntu image creation as well :( This is so frustrating.

@danielsollondon isn't this issue simply that the certificate name is wrong, as pointed out by @AliAllomani above? The certificate name expected and configured by Packer is machineName.cloudapp.net vs. the actual machine name, which has the format machineName.region.cloudapp.azure.com. Thoughts?
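
If you want to check that theory on a hung build VM, a minimal sketch (run over RDP or via the portal's run command; none of it is specific to Packer) is to list the WinRM HTTPS listener and the machine certificates, then compare the CN with the name you are connecting to:

# Show the configured WinRM listeners and the certificate thumbprint bound to the HTTPS one
winrm enumerate winrm/config/listener

# List machine certificates so the thumbprint can be matched to a subject (CN) and expiry
Get-ChildItem Cert:\LocalMachine\My |
    Select-Object Thumbprint, Subject, NotAfter

Note that a CN mismatch alone should not normally block Packer, because winrm_insecure: true skips certificate validation; the later testing in this thread points at VM size instead.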

When creating images using Marketplace images, everything works great:

"os_type": "Windows",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",
"image_version": "latest",

"communicator": "winrm",
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"winrm_timeout": "5m",
"winrm_username": "packer",

"vm_size": "Standard_D2_v2",

When trying to use an image from a Shared Gallery with the same VM size I am getting timeouts.

"os_type": "Windows",

      "shared_image_gallery": {
        "subscription": "{{user `subscription`}}",
        "resource_group": "RG",
        "gallery_name": "gallery",
        "image_name": "win2019",
        "image_version": "latest"
         },

"communicator": "winrm",
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"winrm_timeout": "5m",
"winrm_username": "packer",

"vm_size": "Standard_D2_v2",

During the build I opened an RDP session and confirmed that the listener, the certificate, and the firewall rule are present. I have confirmed that port 5986 does respond. The image from the SIG is configured for CIS compliance, so the following settings are configured:

Allow Basic authentication is disabled
Allow unencrypted traffic is disabled
Disallow WinRM from storing RunAs credentials is Enabled

All other winrm service settings are set to Not configured.

As far as I can tell WinRM should be working on this SIG image, and I'm not sure if it's possibly related to this issue or if there is a seemingly unrelated policy setting that is causing problems.

Any advice on this would be appreciated.

Hi folks, thanks for keeping this thread up to date with the latest finds and test results. The Packer team has been monitoring this issue closely to see if there is anything on the Packer side that can be changed to resolve it. In looking at the thread I see a possible cert domain change for self-signed certs and the change in vm_size (more of a user configuration change). I have since started a round of testing 10 managed disk builds (per test), across West US and East US, in 15-minute intervals to test out the proposed solutions. My findings are as follows:

  1. v1.5.5 using Standard_D2_v2 resulted in a successful build every time.
  2. v1.5.5 patched to use <machinename>.<location>.cloudapp.azure.com using Standard_D2_v2 resulted in a successful build every time.
  3. v1.5.5 using Standard_DS2_v2 timed out 60% of the time.
  4. v1.5.5 patched to use <machinename>.<location>.cloudapp.azure.com using Standard_DS2_v2 timed out 80% of the time (seems like a lot, so maybe something else is going on; will retest)

What I found is that by using Standard_D2_v2 I was always able to get a successful build regardless of the domain name used for the self-signed cert. I did find some Azure examples where the domain used for self-signed certs is <machinename-randomnumbers>.<location>.cloudapp.azure.com which is an easy change to make, but that doesn't seem to be the issue. Please let me know if you are seeing otherwise. I can push up a WIP PR with the change if that makes it easier for folks to test.

With that being said, if you're still running into issues here, please make sure you are trying to create builds with the recommended vm_sizes https://github.com/hashicorp/packer/issues/8658#issuecomment-600857201. If builds are still failing with the new VM sizes, please attach your build configuration and debug logs via a Gist in a new comment. If someone is using the same configuration as you, please just thumbs up their config to help us determine the best set of configs to test against. Thanks!

During the build I opened an RDP session and confirmed that the listener, the certificate, and the firewall rule are present. I have confirmed that port 5986 does respond. The image from the SIG is configured for CIS compliance, so the following settings are configured:

Hello @nfischer2, I suspect this may be another issue, possibly related to the custom image with some CIS settings enabled. We've seen WinRM issues related to CIS in the past https://github.com/hashicorp/packer/issues/6951. Have you had a chance to reach out to our mailing list or community forums for your issue? We have a bigger community that may have people who can answer your questions better than me. After reaching out to the community forums, if you suspect that you are running into a bug, please open a new issue and attach the related Packer debug logs (PACKER_LOG=1 packer build template.json) to see if there is any way we can help out.
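
If you are collecting those debug logs from PowerShell rather than a POSIX shell, the environment-variable form of that command looks roughly like this (the template file name is just an example):

# Turn on Packer's verbose logging and capture it to a file that can be attached to a Gist
$env:PACKER_LOG = '1'
$env:PACKER_LOG_PATH = 'packer-debug.log'
packer build template.json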

Thanks for the reply. My issue was not related to this thread. After further troubleshooting I found that the registry setting 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\LocalAccountTokenFilterPolicy' when enabled (set to 0) prevents winrm from opening a connection.

I wanted to provide an update in case it helps anyone else who may be working with Packer and CIS Windows images.
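
For anyone else working from CIS-hardened images, a small sketch for inspecting (and, only if your security baseline allows it, relaxing) that registry value is below; setting it to 1 is shown purely as an example of re-enabling remote local-account connections, not as a recommendation.

$key = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System'

# Read the current value; if the property is missing, the OS default applies
Get-ItemProperty -Path $key -Name LocalAccountTokenFilterPolicy -ErrorAction SilentlyContinue

# Example only: set it to 1 so connections with local accounts are not filtered
Set-ItemProperty -Path $key -Name LocalAccountTokenFilterPolicy -Value 1 -Type DWord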

@danielsollondon The solution to use Standard_D2_v2 worked for us. If it is useful for resolving the bug, I would like to point out the things I have noted when it works vs. when it does not. When the WinRM connection works, the certificate available in the LocalMachine store (imported from Key Vault) has a private key associated with it, and when it does not work there is no private key associated. So something is breaking while getting/importing this certificate from Key Vault. This is not an issue with Packer, as the certificate secret passed into the ARM template via Key Vault has both the private key and the public key.
Output from a VM that works

PS C:\windows\system32>  $hostname = 'xxxxxxxxx.cloudapp.net'
PS C:\windows\system32>  $cert = (Get-ChildItem cert:\LocalMachine\My | Where-Object { $_.Subject -eq "CN=" + $hostname } | Select-Object -Last 1)
PS C:\windows\system32> echo $cert.Thumbprint
7E1C9BXXXXXXXXXXXXXXXXXXXXXXD988FEB3
PS C:\windows\system32> echo $cert.PrivateKey

PublicOnly           : False
CspKeyContainerInfo  : System.Security.Cryptography.CspKeyContainerInfo
KeySize              : 2048
KeyExchangeAlgorithm : RSA-PKCS1-KeyEx
SignatureAlgorithm   : http://www.w3.org/2000/09/xmldsig#rsa-sha1
PersistKeyInCsp      : True
LegalKeySizes        : {System.Security.Cryptography.KeySizes}



Output from a VM that does not work

PS C:\Users\testadmin> $hostname = 'yyyyyyyy.cloudapp.net'
PS C:\Users\testadmin> $cert = (Get-ChildItem cert:\LocalMachine\My | Where-Object { $_.Subject -eq "CN=" + $hostname } | Select-Object -Last 1)
PS C:\Users\testadmin> echo $cert.Thumbprint
89706YYYYYYYYYyYYYYYYYYYYYYYYYY9FF8DE
PS C:\Users\testadmin> echo $cert.PrivateKey
PS C:\Users\testadmin> !!!NO OUTPUT HERE!!!

Thanks for the reply. My issue was not related to this thread. After further troubleshooting I found that the registry setting 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\LocalAccountTokenFilterPolicy' when enabled (set to 0) prevents winrm from opening a connection.

I wanted to provide an update in case it helps anyone else who may be working with Packer and CIS Windows images.
@nfischer2 How did you resolve this? Did you manually update the image and then deploy from that going forward or did you implement a fix in your pipeline?

The WinRM timeout with Windows Server 2016 is still happening. Is there an open ticket with Azure for this?

@pmozbert what VM size are you using?

Since I moved to Standard_D2_v2, I've not had a single WinRM timeout.

@amarkulis We had to move away from Packer and now use Azure custom script extension to configure Windows CIS images for Azure.
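
For anyone weighing the same move, a rough sketch of what that can look like with the Azure CLI is below; the resource group, VM name, and script URL are placeholders, and writing the settings to a file is just one way to avoid shell-specific JSON quoting problems.

# Hypothetical hardening script already uploaded to blob storage.
@{
    fileUris         = @('https://example.blob.core.windows.net/scripts/apply-cis.ps1')
    commandToExecute = 'powershell -ExecutionPolicy Unrestricted -File apply-cis.ps1'
} | ConvertTo-Json | Set-Content -Path .\cse-settings.json

# Attach the Custom Script Extension to the VM; --settings accepts inline JSON or a JSON file path.
az vm extension set `
    --resource-group 'my-image-rg' `
    --vm-name 'cis-build-vm' `
    --publisher Microsoft.Compute `
    --name CustomScriptExtension `
    --settings .\cse-settings.json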

Agreed, the CIS benchmarks can definitely complicate matters.

I increased the VM size to Standard_DS_v2 and that worked for a while several months ago, along with a 30m timeout, but now the timeouts are back.

Having the same level of build flakiness with Standard_B2ms vm size

I have built quite a few Windows 2016 images with the Standard_D4s_v3 size and it usually works just fine, but the last 4 times in a row I got this error again.

            "image_publisher": "MicrosoftWindowsServer",
            "image_offer": "WindowsServer",
            "image_sku": "2016-Datacenter-smalldisk",
            "image_version": "latest",

            "communicator": "winrm",
            "winrm_use_ssl": "true",
            "winrm_insecure": "true",
            "winrm_timeout": "30m",
            "winrm_username": "packer",
            "temp_compute_name": "swazpkr00",

            "location": "WestEurope",
            "vm_size": "Standard_D4s_v3",
            "managed_image_storage_account_type": "Standard_LRS"

I am getting this issue as well with Standard_D4s_v3 at the moment. It has troubled our team on and off for months even with a timeout of 30m...

            "os_type": "Windows",
            "image_publisher": "MicrosoftWindowsServer",
            "image_offer": "WindowsServer",
            "image_sku": "2016-Datacenter",
            "os_disk_size_gb": 256,
            "communicator": "winrm",
            "winrm_use_ssl": true,
            "winrm_insecure": true,
            "winrm_timeout": "30m",

Also seeing it with Windows 10 Pro RS4:

            "os_type": "Windows",
            "image_publisher": "MicrosoftWindowsDesktop",
            "image_offer": "Windows-10",
            "image_sku": "rs4-pron",

Hi, same issue with one of our client's WVD Win 10 multi-session deployments.
When I deploy the same Packer template to my MSDN test subscription, it goes just fine.
Weird.

        "os_type": "Windows",
        "image_publisher": "MicrosoftWindowsDesktop",
        "image_offer": "office-365",
        "image_sku": "20h1-evd-o365pp",
        "communicator": "winrm",
        "winrm_use_ssl": "true",
        "winrm_insecure": "true",
        "winrm_timeout": "15m",
        "winrm_username": "packer",
        "location": "EastUS",
        "vm_size": "Standard_DS4_v2",
        "async_resourcegroup_delete":false,
        "managed_image_storage_account_type": "Standard_LRS",