Azure-pipelines-agent: Support for Reboot scenarios in Release Management

Created on 13 Apr 2018 · 17 Comments · Source: microsoft/azure-pipelines-agent

Workflow

During a release you reach a point where a reboot is required. A typical workflow:

Install MSI
Reboot
Run Integration Tests

Current solution

Currently we use agent phases, but this is not reliable:

Execute the installation and the reboot on the target machine
Jump to another agent (phase) and ping the target machine to check whether the reboot has finished
Jump back to the target machine and execute the tests

The reboot can take some time, so we sometimes do not wait long enough. Additionally, the agent on the "target machine" can be taken by some other build/release while we wait. As a workaround, the "target machine" is used exclusively by one release.
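The polling step of this workaround can be sketched roughly as follows. This is only an illustration, not the original setup: the names `tcp_probe` and `wait_for_reboot`, the timeouts, and the use of port 3389 are all assumptions, and a TCP connection attempt stands in for the ICMP ping mentioned above. The sketch waits for the machine to go down first, so a machine that has not started rebooting yet is not mistaken for one that is already back.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int = 3389, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Port 3389 (RDP) is an assumed default; use whatever port the target exposes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_reboot(
    probe: Callable[[], bool],
    down_timeout: float = 120.0,
    up_timeout: float = 900.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the machine has gone down and become reachable again."""
    # Phase 1: wait until the machine stops answering (the reboot has started).
    deadline = clock() + down_timeout
    while probe():
        if clock() >= deadline:
            return False  # the machine never went down
        sleep(interval)

    # Phase 2: wait until the machine answers again (the reboot has finished).
    deadline = clock() + up_timeout
    while not probe():
        if clock() >= deadline:
            return False  # the machine did not come back in time
        sleep(interval)
    return True
```

A coordinating phase could call something like `wait_for_reboot(lambda: tcp_probe("target-machine"))`, where `target-machine` is a placeholder host name. Note that this only addresses the "not waiting long enough" part; it does nothing to keep the target agent reserved for the release.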

Preferred solution

Native support for reboots. The release executes on the target machine. During the reboot, the agent on the target machine remains reserved for the current release. The agent starts automatically after the reboot and continues the execution.

No more remote control or polling from outside.
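To make the "continue after the reboot" idea concrete, here is a minimal sketch of a runner that records its progress on disk and resumes from the next step when it is started again. It is not how the Azure Pipelines agent works; `run_steps`, the `REBOOT` marker, and the JSON checkpoint file are all hypothetical names chosen for illustration.

```python
import json
import os
from typing import Callable, List, Optional

REBOOT = "reboot"  # hypothetical marker a step returns to request a reboot


def load_next_step(checkpoint_path: str) -> int:
    """Return the index of the next step to run, or 0 if there is no checkpoint."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as handle:
        return json.load(handle)["next_step"]


def save_next_step(checkpoint_path: str, index: int) -> None:
    """Persist progress; write to a temp file first so a crash cannot truncate it."""
    temp_path = checkpoint_path + ".tmp"
    with open(temp_path, "w") as handle:
        json.dump({"next_step": index}, handle)
    os.replace(temp_path, checkpoint_path)


def run_steps(steps: List[Callable[[], Optional[str]]], checkpoint_path: str) -> str:
    """Run the remaining steps; stop early when a step requests a reboot."""
    start = load_next_step(checkpoint_path)
    for index in range(start, len(steps)):
        result = steps[index]()
        # Record that this step is done before anything else can interrupt us.
        save_next_step(checkpoint_path, index + 1)
        if result == REBOOT:
            return "reboot-requested"
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return "finished"
```

In this sketch the caller is expected to trigger the actual reboot when `run_steps` returns `"reboot-requested"`, and something on the machine (for example a service set to start automatically) would call `run_steps` again with the same checkpoint path after startup.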

enhancement stale

Source

thomasdgx

👍14

All 17 comments

Yes. This is on our backlog. We may still require multiple phases but the coordination should be built in as you outlined. It would need to coordinate with the server with a status of rebooting and the next phase would return to that machine.

I'll leave this open and track as an enhancement.

bryanmacfarlane on 13 Apr 2018

Specifically we are looking at reboot support for deployment groups not for generic build and release phases.

chrispat on 13 Apr 2018

I'd like to be able to reboot agent hosts from the Agent Pool page. That would be much more convenient than having to RDP to the specific machine.

flcdrg on 14 Oct 2018

@flcdrg - that wouldn't be too hard to do if we did the other work above, especially since there are no state issues (the service would be cooperative and not assign it another job until it came back up).

bryanmacfarlane on 7 Feb 2019

"Specifically we are looking at reboot support for deployment groups not for generic build and release phases."

@chrisrpatterson, is this to say that we shouldn't expect to get reboot support for build/release pipeline agents at all, or just that reboot support for deployment groups is being prioritized ahead of build/release?

AtOMiCNebula on 13 Feb 2019

👍2

To pile on here, I'm also hoping for generic reboot support as part of a release agent job (not deployment group). We run some tests that mess up the machine state, and would really like to reboot afterwards so the next pipeline execution can get a clean machine to run on.

Deployment groups are also not an option for us, as we are only allowed to create custom agent pools.

kevpar on 8 May 2019

I totally support this request. We face the same workflow that is outlined here, and struggle with the same problems:

install msi on a set of machines (via deployment group and tags)
reboot
wait for machines to come back
finish work (e.g. check if installation was successful, run additional tests)

As we might deploy to many machines (VMs and physical ones) which might also be located in different subnets, we cannot use the concept of a "coordination agent" to check if all machines/agents are back online (this agent would need to have access to all machines, which is difficult due to firewall restrictions, etc).
Instead, it would be great if the server could remember which machines were executing the first part of the job, wait for each one to come back, and then allow them to complete the second part of the job. Maybe combined with a timeout, in case the machines do not come back.
It would be awesome, if some of those machines could already run phase 2, while others are still in phase 1 or rebooting (no need to wait for all machines to complete phase 1 and rebooting before a single machine can start with phase 2).
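The server-side bookkeeping described in this comment could look roughly like the sketch below: one state per machine, a timeout for machines that never return, and no barrier between machines. Everything here (`RebootCoordinator`, the state names, the method names) is hypothetical and only illustrates the idea; it is not an existing Azure DevOps API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class State(Enum):
    PHASE1 = "phase1"
    REBOOTING = "rebooting"
    PHASE2 = "phase2"
    DONE = "done"
    TIMED_OUT = "timed_out"


@dataclass
class RebootCoordinator:
    reboot_timeout: float  # seconds a machine may stay in REBOOTING
    states: Dict[str, State] = field(default_factory=dict)
    reboot_started: Dict[str, float] = field(default_factory=dict)

    def register(self, machine: str) -> None:
        self.states[machine] = State.PHASE1

    def phase1_finished(self, machine: str, now: float) -> None:
        """The machine finished part one and is about to reboot."""
        self._require(machine, State.PHASE1)
        self.states[machine] = State.REBOOTING
        self.reboot_started[machine] = now

    def agent_online(self, machine: str, now: float) -> bool:
        """The agent reconnected; return True if it may start part two."""
        self.expire(now)
        if self.states[machine] is State.REBOOTING:
            self.states[machine] = State.PHASE2
            return True
        return False

    def phase2_finished(self, machine: str) -> None:
        self._require(machine, State.PHASE2)
        self.states[machine] = State.DONE

    def expire(self, now: float) -> None:
        """Mark machines that have been rebooting for too long as timed out."""
        for machine, state in self.states.items():
            started = self.reboot_started.get(machine, now)
            if state is State.REBOOTING and now - started > self.reboot_timeout:
                self.states[machine] = State.TIMED_OUT

    def _require(self, machine: str, expected: State) -> None:
        if self.states.get(machine) is not expected:
            raise ValueError(f"{machine} is not in state {expected.name}")
```

Because each machine has its own state, one machine can already be in part two while another is still in part one or rebooting, and the agents report in themselves, so no coordination agent needs network access to every target.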

bgoeppner on 29 May 2019

👍1

My team needs this too.

ajklotz on 15 Aug 2019

Yes, this would be a very welcome feature. Is there any update on this issue? I use TFS Server 2018, still no support :(

petermisovic on 19 Sep 2019

Another vote for the ability to reboot a hosted pipeline and resume the pipeline after the restart.

We must install an MSI (https://package.chocolatey.org/packages/sqlserver-cmdlineutils) to get sqlcmd added to the path, but this now has KB dependencies requiring a reboot.

jimhess on 22 Oct 2019

Same here, it would be nice to be able to do a complete new OS installation with an agent "surviving" this. I have implemented my own mechanism now, but this "homebrew" solution is not visible in the agent status to the end user, resulting in incorrect expectations.

Is there anything I can do by voting or something to make this higher priority?

kittenpoop on 9 Dec 2019

Any update on this enhancement? My team also needs this feature after some installation.

ruicao93 on 26 Feb 2020

This issue has had no activity in 180 days. Please comment if it is not actually stale.

github-actions[bot] on 24 Aug 2020

By "stale", does your bot mean that nobody wants the feature anymore, or that you're not going to do any work on it anyway? Because I can assure you that the need hasn't gone away.

nzbart on 25 Aug 2020

😄2

Sorry this was closed, since I also need your solution for this issue.

kittenpoop on 1 Sep 2020

The bot closed it because we're trying to centralize feedback (issues, suggestions) through Developer Community. We don't currently have plans to implement reboot support, but please feel free to file a ticket on DevComm. As more people pile onto it, it becomes easier to justify the investment.