Something looks strange here: "However, let's increase the HealthCheckStableDuration to 60 seconds (so that the services are healthy for at least 20 seconds before the upgrade proceeds to the next update domain)."
Isn't it supposed to wait for 60 seconds instead of 20? If not, what happens to the remaining 40 seconds?
Thanks for the feedback! We are currently investigating and will update you shortly.
@mani-ramaswamy could you clarify on this one?
@thiago-vivas
This document is incorrect.
Specifically, the default (recommended) value of HealthCheckStableDuration is 120 seconds, so setting it to 60 decreases it rather than increasing it.
Ref:
https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-application-upgrade-parameters
Further, the UpgradeHealthCheckInterval is "The frequency of health status checks during a monitored application upgrades", with a default (recommended) value of 60 seconds.
Also, when validating that health stays stable for some period during an upgrade, the user needs to take into account how frequently health is checked (the value of UpgradeHealthCheckInterval).
Thus this:
"However, let's increase the HealthCheckStableDuration to 60 seconds (so that the services are healthy for at least 20 seconds before the upgrade proceeds to the next update domain)."
Will change to:
"However, let's increase the HealthCheckStableDuration to 180 seconds (so that the services are validated to be healthy for at least 120 seconds before the upgrade proceeds to the next update domain)."
Change Comment:
The first passing health check doesn't validate health; it starts the HealthCheckStableDuration timer. By default, UpgradeHealthCheckInterval causes health to be checked again every 60 seconds, so health is first validated at the second check, 60 seconds later. It is then re-validated a second time to reduce false-positive health results and, in turn, live-site issues.
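The arithmetic behind the proposed numbers can be sketched in a few lines of Python. This is only a rough model of the reasoning in the change comment, not Service Fabric's actual implementation: the function names are made up for illustration, and it assumes the stable-duration timer starts at the first passing check and that re-checks land exactly on multiples of UpgradeHealthCheckInterval.

```python
def recheck_times(stable_duration_s: int, check_interval_s: int) -> list[int]:
    """Seconds after the first passing check at which health is checked again.

    Assumption: the first passing check (t = 0) only starts the stable-duration
    timer, and later checks fall on exact multiples of the check interval.
    """
    if check_interval_s <= 0:
        raise ValueError("check_interval_s must be positive")
    return list(range(check_interval_s, stable_duration_s + 1, check_interval_s))


def validated_window(stable_duration_s: int, check_interval_s: int) -> int:
    """Seconds between the first and last re-check inside the stable duration.

    Under this model, that span is how long health has actually been
    validated before the upgrade moves to the next update domain.
    """
    times = recheck_times(stable_duration_s, check_interval_s)
    return times[-1] - times[0] if times else 0


if __name__ == "__main__":
    # Proposed values: HealthCheckStableDuration = 180, UpgradeHealthCheckInterval = 60
    print(recheck_times(180, 60))
    print(validated_window(180, 60))
```

With a stable duration of 180 seconds and a 60-second interval, this model puts re-checks at 60, 120, and 180 seconds, giving a validated window of 120 seconds, which matches the proposed wording. With a stable duration of 60 seconds there is only a single re-check, so the validated window is 0.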
@MicahMcKittrick-MSFT
If you and the user both feel the proposed change resolves the issue, please close this and submit the PR (or let me know it has resolved the issue and I'll submit the PR).
@aljo-microsoft thanks for clarifying! I will make the change and submit the PR.
@thiago-vivas I submitted the fix :)
Once the PR merges this issue will close and the changes will go live in a few hours.