I have a 4-host deployment to prove the concept of this solution. Scaling up when peak hours begin seems to work fine, but on scale-down the process only manages the first VM in the pool and then appears to time out/fail. This is the output from my automation job, where it just hangs until the next job starts; then both jobs fail within a few minutes.
Starting WVD tenant hosts scale optimization: Current Date Time is: 03/18/2020 15:45:56
It is Off-peak hours
Starting to scale down WVD session hosts ...
Processing hostpool FullDesktop
Checking session host: wvd-1.domain.local
# of sessions: 0 and status: Available
Checking session host: wvd-2.domain.local
# of sessions: 0 and status: Available
Checking session host: wvd-3.domain.local
# of sessions: 0 and status: Available
Checking session host: wvd-4.domain.local
# of sessions: 0 and status: Available
Stopping Azure VM: wvd-1 and waiting for it to complete ...
Azure VM has been stopped: wvd-1 ...
When I check the portal, wvd-1 has successfully de-allocated but the other three are still running. I've deployed the Win10 1909 w/ Office Pro Plus image from the gallery and have performed no further config (I deployed the host pool from the gallery, which configures the hosts via DSC for use in a WVD pool). Are there any other configurations I need to carry out on the hosts for this to work?
There are no errors or warnings logged against the failed automation task, but this is listed as an exception:
The running command stopped because the preference variable "ErrorActionPreference" or common parameter is set to Stop: ActivityId: Powershell commands to diagnose the failure: Get-RdsDiagnosticActivities -ActivityId
Experiencing the exact same issue.
I will add that I was able to change $ErrorActionPreference in the runbook to "Continue", which surfaced the following error.
Get-RdsSessionHost : ActivityId: Powershell commands to diagnose the failure: Get-RdsDiagnosticActivities -ActivityId At line:679 char:27 + ... nHostInfo = Get-RdsSessionHost -TenantName $TenantName -HostPoolName ... + ~~~~~~~~~~~~~ + CategoryInfo : FromStdErr: (Microsoft.RDInf...tRdsSessionHost:GetRdsSessionHost) [Get-RdsSessionHost], RdsPowerShellException + FullyQualifiedErrorId : DefaultNoRdsError,Microsoft.RDInfra.RDPowershell.SessionHost.GetRdsSessionHost
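For anyone hitting the same wall, a minimal way to surface the underlying error is to relax the preference variable in the runbook and feed the returned ActivityId to the diagnostics cmdlet. A sketch using the classic WVD cmdlets; the tenant and host pool names are placeholders, and the ActivityId must come from your own error output:

```powershell
# Let non-terminating errors print instead of stopping the runbook
$ErrorActionPreference = 'Continue'

# Placeholder names - replace with your own tenant and host pool
$TenantName   = 'MyTenant'
$HostPoolName = 'FullDesktop'

# Re-run the failing call and capture any error records
$SessionHosts = Get-RdsSessionHost -TenantName $TenantName -HostPoolName $HostPoolName -ErrorVariable RdsErr

# If the service returned an ActivityId, query the diagnostics for it
if ($RdsErr) {
    Get-RdsDiagnosticActivities -ActivityId '<activity-id-from-error>' -Detailed
}
```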
@townendk Thank you for the detailed feedback. We are actively investigating and will get back to you soon.
@townendk Please find a similar issue here; you can also find a few suggestions/troubleshooting steps from the product team.
Thanks All.
VikasPullagura-MSFT - this is not the same issue, I've read the one described in your post but in that scenario the session hosts do not heartbeat. Mine seem to have an appropriate heartbeat. Thank you though!
@viswanadham-k can you please check and add your comments on this.
I've done some further testing this morning.
1- Overnight, all VMs in the pool were powered off (manually) and de-allocated in the portal. The overnight tasks all ran successfully, but they didn't power anything on. My "MinimumNumberOfRDSH" variable is set to '1', so I expected the script to boot one of them back up for me, but perhaps I'm misunderstanding the behaviour in off-peak time(?)
2- Upon peak time starting, the script correctly booted up the first VM in my pool.
3- I logged in two concurrent sessions to this (my current threshold is for one user per vCPU, and the VM had 2x vCPU)
4- The next script run successfully identified that my VM was at capacity with two users on it, so the second VM in the pool was booted up.
5- I logged off one of my users, but both VMs stayed up and running during peak time (I believe this is by design, scaling down only happens out of hours(?))
6- Peak time ended, and the job fired and correctly identified that my second VM host should be powered off. The VM has de-allocated in the portal but the script again is hanging at this stage:
Starting WVD tenant hosts scale optimization: Current Date Time is: 03/19/2020 09:34:44
It is Off-peak hours
Starting to scale down WVD session hosts ...
Processing hostpool FullDesktop
Checking session host: wvd-2.domain.local
# of sessions: 0 and status: Available
Checking session host: wvd-1.domain.local
# of sessions: 1 and status: Available
Stopping Azure VM: wvd-2 and waiting for it to complete ...
Azure VM has been stopped: wvd-2 ...
The task seems to hang at this point until the next scheduled occurrence fires. Whether that's 15 minutes as per the default, or 2 hours, the result is the same: the script hangs, then as soon as the next job fires they both immediately fail within a few seconds of each other.
@ChristianMontoya Can you check and add your comments.
I am experiencing the exact same output as above as well.
Adding @RoopChevuri as well for comment.
Just wondering if there is an update on this. Thank you.
Hi @townendk & @Jimbos10
Please let me know your availability today. I will send a meeting invitation, then connect and assist you to resolve the error. Let me know your email addresses so I can send the meeting invitation.
Thanks
I have time now if you like, up until 3 PM EST.
Did you guys manage to talk about this? I'm tied up this week (with WVD deployments funnily enough) but I should hopefully have time for a meeting later in the week
Not yet, we did work on it, but I am sure that he is very busy given the current situation.
Hi, I worked with Microsoft and this appears to be an issue with the automation account. My automation account was created in East US, which doesn't appear to work. When we created the resource group and the automation account in West US 2, it started working. They are going to continue working on the issue with the EUS DC.
I wanted to add that I worked with Viswanadham, who was super knowledgeable and very easy to work with.
Hi @Jimbos10, thank you for the update.
@Viswanadham-k - I've been experiencing the same in West Europe, so not sure if that helps the troubleshooting.
Just a quick follow-up: Viswanadham found another issue with a variable that was reporting different values in different datacenter zones. The updated code should now be available, according to Viswanadham. Thanks again for your help, Viswanadham!
@viswanadham-k is there a plan for this code update to be reflected on the github repository referenced in this document?
Hi @thefonz3h
We have already pushed the latest code to the GitHub repository. If you have already deployed the WVD scaling tool and want the latest code changes, please run the script "createazureautomationaccount.ps1"; it will pull in the latest code.
Thanks
Hi @viswanadham-k I did a fresh deployment today getting the latest version of the script and it still failed with the same issue. I edited the script and could still see a variable set to WestUS. I tried switching this to WestEurope but it had no impact.
Do you have a direct download link for the latest version of this script so I can check it against my version?
Many thanks
@viswanadham-k Can you please check on this.
@RoopChevuri Please add your comments.
Hi @thefonz3h
We have fixed the issues you listed here.
Please follow the steps below to apply the latest changes in your environment:
1. Open the link below and copy the basicScale.ps1 code:
https://github.com/Azure/RDS-Templates/blob/ptg-wvdautoscaling-automation/wvd-templates/wvd-scaling-script/basicScale.ps1
2. Open your Automation account resource in the Azure portal.
3. Click on Runbooks.
a. Select the "WVDAutoScaleRunbook".
b. Paste the code into the runbook editor.
c. Click the Save button.
d. Click the Publish button.
If you are still facing the issue, please let me know.
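The same update can also be scripted rather than pasted through the portal, using the Az.Automation cmdlets. A hedged sketch; the resource group, account, and runbook names are assumptions based on this thread and must match your own deployment:

```powershell
# Assumed names - adjust to your deployment
$ResourceGroup = 'WVDAutoScaleRG'
$AccountName   = 'WVDAutoScaleAccount'
$RunbookName   = 'WVDAutoScaleRunbook'

# Download the latest basicScale.ps1 from the branch referenced above
$Url = 'https://raw.githubusercontent.com/Azure/RDS-Templates/ptg-wvdautoscaling-automation/wvd-templates/wvd-scaling-script/basicScale.ps1'
Invoke-WebRequest -Uri $Url -OutFile .\basicScale.ps1

# Overwrite the existing runbook and publish it in one step
Import-AzAutomationRunbook -ResourceGroupName $ResourceGroup -AutomationAccountName $AccountName `
    -Name $RunbookName -Type PowerShell -Path .\basicScale.ps1 -Force -Published
```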
Thanks,
Hi @viswanadham-k, thank you for the update. I have updated my automation account script and can now see that the scale-down job no longer fails.
This may warrant a separate thread, but now that this seems to be working OK, could I please get clarification on what the script SHOULD be doing? I've found it does not scale down hosts that have users logged into them. I was expecting those users to be ejected after a grace period, but the scaling script just seems to ignore hosts with active user sessions.
Many thanks for the support so far!
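If a grace period is what you want, rather than simply skipping occupied hosts, the classic WVD module does have cmdlets for messaging users and forcing a logoff that a custom scale-down step could use. This is only a sketch of that idea, not what the shipped script does; the tenant, host pool, and host names are placeholders:

```powershell
# Placeholder names - replace with your own values
$TenantName   = 'MyTenant'
$HostPoolName = 'FullDesktop'
$HostName     = 'wvd-1.domain.local'

# Find every active session on the host being drained
$Sessions = Get-RdsUserSession -TenantName $TenantName -HostPoolName $HostPoolName |
    Where-Object { $_.SessionHostName -eq $HostName }

# Warn each user before the shutdown
foreach ($Session in $Sessions) {
    Send-RdsUserSessionMessage -TenantName $TenantName -HostPoolName $HostPoolName `
        -SessionHostName $HostName -SessionId $Session.SessionId `
        -MessageTitle 'Maintenance' `
        -MessageBody 'This host shuts down in 15 minutes. Please save your work and log off.'
}

Start-Sleep -Seconds 900   # the grace period

# Log off anyone still connected, then the VM can be deallocated safely
foreach ($Session in $Sessions) {
    Invoke-RdsUserSessionLogoff -TenantName $TenantName -HostPoolName $HostPoolName `
        -SessionHostName $HostName -SessionId $Session.SessionId -NoUserPrompt
}
```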
Hi @townendk
Thank you for giving feedback.
Can you please close this issue.
Thanks
Hi,
I was experiencing the same issue. Apparently the hosts' status when they are deallocated has changed from "NoHeartBeat" to "Unavailable". I changed to the new script suggested at https://github.com/Azure/RDS-Templates/blob/ptg-wvdautoscaling-automation/wvd-templates/wvd-scaling-script/basicScale.ps1.
Now some of the hosts were shut down, but not all of those that should have been. For the hosts left online, the script identified that it was off-peak hours and started the scale-down process, but it kept trying to shut down the wrong host.
Also, the hosts that stayed on during off-peak hours (and were supposed to be down) were apparently left with "Allow new connections" disabled, so no users were able to connect in the morning.
See sample output below.
It is Off-peak hours
Starting to scale down WVD session hosts ...
Processing hostpool AZC-WVD-Prod
Checking session host: AZCWVD-9.hl.local
# of sessions: 0 and status: Available
Checking session host: AZCWVD-2.hl.local
# of sessions: 1 and status: Available
Checking session host: AZCWVD-7.hl.local
# of sessions: 1 and status: Available
Stopping Azure VM: AZCWVD-0 and waiting for it to complete ...
Azure VM has been stopped: AZCWVD-0 ...
HostpoolName: AZC-WVD-Prod, TotalRunningCores: 8 NumberOfRunningHosts: 2
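On the "no users were able to connect in the morning" symptom: if hosts end up stuck in drain mode after off-peak hours, new sessions can be re-enabled per host with the classic cmdlet. A sketch; the tenant name is a placeholder, the pool and host names are taken from the output above:

```powershell
# Re-allow new sessions on a host the script left in drain mode
Set-RdsSessionHost -TenantName 'MyTenant' -HostPoolName 'AZC-WVD-Prod' `
    -Name 'AZCWVD-2.hl.local' -AllowNewSession $true
```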