Also posted on Azure DevOps Developer Community
I want to self-host my build agents in containers, and have each build create new containers at the beginning of the pipeline and destroy them at the end of each job in the pipeline. To ensure that I don't get into a race where a new build might end up running on containers created by another build before that build can clean up, I want a build job to be able to tell its build agent to gracefully exit after the current job completes.
I've tried using /agent/config.sh remove, but regardless of attempts to sleep, background the process, etc., it always fails with an error that the agent is busy with a job. It would be nice if I could run config.sh remove --afterCurrentJob or something along those lines.
Example:

phases:
- phase: Phase_1
  displayName: Create container instance
  condition: succeeded()
  server: true
  # Task group has not been exported; task groups are not supported yet

- phase: Phase_2
  displayName: Build Phase
  condition: succeeded()
  queue:
    name: test-johnst
  steps:
  - task: Gulp@0
    displayName: 'gulp build -p'
    inputs:
      targets: build
      arguments: '-p'
    enabled: true
  - script: |
      ./config.sh remove --after-current-task --unattended --auth pat --token $(my_pat)
    displayName: 'Command Line Script'
    condition: always()
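In the meantime, one stopgap (a hypothetical sketch, not an official mechanism) is to run a script on the agent host that waits for the agent to go idle before unregistering it. The agent spawns an Agent.Worker process per running job, so polling for that process is one way to detect "idle"; AGENT_DIR and MY_PAT below are placeholder names.

```shell
#!/usr/bin/env bash
# Hypothetical workaround: poll for the per-job Agent.Worker process and
# only call "config.sh remove" once the agent is idle. AGENT_DIR and
# MY_PAT are placeholders, not real agent settings.
AGENT_DIR="${AGENT_DIR:-/agent}"

wait_for_idle() {
  # The agent spawns an Agent.Worker process for each running job; when
  # none is left, "config.sh remove" should no longer report the agent
  # as busy. The [A] bracket keeps pgrep from matching this script's
  # own command line.
  while pgrep -f "[A]gent.Worker" >/dev/null 2>&1; do
    sleep 5
  done
}

wait_for_idle
# Guard so the sketch is a no-op on hosts without an installed agent.
if [ -x "$AGENT_DIR/config.sh" ]; then
  "$AGENT_DIR/config.sh" remove --unattended --auth pat --token "$MY_PAT"
fi
```

This has to run outside the job itself (e.g. from the host or a sidecar), since a script launched by the job is still part of the job the agent is busy with.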
It would make it easier to run agents as jobs (for example, Kubernetes Jobs).
This is a must-have. When shutting down agents, you want them to stop fetching new jobs and finish the current job!
I am adding a new flag to run.cmd/run.sh: --once.
When the agent is launched in this mode, it will run only one job and exit after that job finishes.
I hope this helps with these agent container scenarios.
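With --once, a disposable container agent becomes a short register/run/unregister cycle. Below is a minimal entrypoint sketch; the AZP_URL / AZP_TOKEN / AZP_POOL environment variables and the /agent install location are assumptions for illustration, while the config.sh flags (--unattended, --url, --auth, --token, --pool, --replace, --acceptTeeEula) are the standard agent configuration options.

```shell
#!/usr/bin/env bash
# Sketch of a disposable-container entrypoint built on "run.sh --once":
# register, take exactly one job, then unregister so the pool does not
# accumulate dead agents. AZP_URL/AZP_TOKEN/AZP_POOL are hypothetical
# environment variables supplied by the container runtime.
set -euo pipefail

AGENT_DIR="${AGENT_DIR:-/agent}"

run_one_job() {
  cd "$AGENT_DIR"
  ./config.sh --unattended \
    --url "$AZP_URL" \
    --auth pat --token "$AZP_TOKEN" \
    --pool "$AZP_POOL" \
    --replace --acceptTeeEula
  ./run.sh --once   # listener exits after a single job completes
  ./config.sh remove --unattended --auth pat --token "$AZP_TOKEN"
}

# Only start the cycle where an agent is actually unpacked.
if [ -x "$AGENT_DIR/config.sh" ]; then
  run_one_job
fi
```

This maps naturally onto a Kubernetes Job per pipeline job: the pod completes when run.sh --once returns, and the controller schedules a fresh replacement.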
Thanks, @TingluoHuang! This is definitely an improvement. It'd be great if there were also a way to signal to an already-running Agent.Listener that it should stop accepting new jobs, and wind down after the current job.
I've actually achieved this with some bash magic. Essentially, I've written a script that watches the number of running containers and spawns new ones whenever there are fewer than N. It runs as a systemd service on Linux.
Then I've got another script that watches the VSTS agent log for job completion, removes the container from the pool, and then deletes it, at which point the script above spawns a new container in its place. I've been using this for months now, and it's as robust as it comes.
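The watcher half of that setup could look roughly like the following sketch. All names here (the role=vsts-agent label, the image name, DESIRED) are hypothetical placeholders, and it assumes agent containers exit on their own once a job finishes.

```shell
#!/usr/bin/env bash
# Rough sketch of the container-pool watcher described above. The label,
# image name, and DESIRED count are hypothetical placeholders.
DESIRED="${DESIRED:-3}"
IMAGE="${IMAGE:-myregistry/vsts-agent:latest}"

count_agents() {
  # Count running containers labelled as agents.
  docker ps --filter "label=role=vsts-agent" --format '{{.ID}}' | wc -l
}

watch_pool() {
  while true; do
    running="$(count_agents)"
    if [ "$running" -lt "$DESIRED" ]; then
      # Top the pool back up to the desired size.
      docker run -d --label role=vsts-agent "$IMAGE"
    fi
    sleep 10
  done
}
```

The log-watching companion script then only has to delete exited containers and unregister them from the pool; this loop repopulates the pool on its own.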
I will open source in the next week or so if anyone is interested.
I did think about going down the route of #2119; however, multi-stage jobs were then just picking a dead container and causing more issues than it solved. I also didn't like having to put that container-killing step everywhere.
I think this would be a useful enhancement. Being able to safely shut down an agent without interrupting a build is crucial for server administration.
Now that we have proper --once handling, it would be great to make the agent accept a signal that it should finish the current job and then go away.
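Until the agent supports that natively, a hedged approximation from the outside is a shell wrapper that combines the --once loop with a signal trap. Because bash only runs a trap after the current foreground command returns, SIGTERM takes effect after the in-flight job rather than mid-job; run.sh here stands in for the real agent invocation.

```shell
#!/usr/bin/env bash
# Hypothetical drain wrapper: run the agent in a --once loop and let
# SIGTERM/SIGINT end the loop after the in-flight job instead of
# killing the listener mid-job. bash defers the trap until the current
# foreground command returns, which gives the desired
# "finish the current job, then go away" behaviour.
stop=0
trap 'stop=1' TERM INT

run_agent_once() {
  ./run.sh --once   # placeholder for the real agent invocation
}

drain_loop() {
  while [ "$stop" -eq 0 ]; do
    run_agent_once   # returns when the single job completes
  done
}
```

One caveat: if the signal is delivered to the whole process group, run.sh itself also receives it, so in practice the wrapper should be the direct target of the shutdown signal.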
Agreed, @vtbassmatt 🙂 It'd be nice if this could take the form of either/both of:
- a .wind-down file in the agent diagnostic directory