Also posted on Azure DevOps Developer Community
I want to self-host my build agents in containers, and have each build create new containers at the beginning of the pipeline and destroy them at the end of each job in the pipeline. To ensure that I don't get into a race where a new build might end up running on containers created by another build before that build can clean up, I want a build job to be able to tell its build agent to gracefully exit after the current job completes.
I've tried using /agent/config.sh remove, but regardless of attempts to sleep, background the process, etc., it always fails with an error that the agent is busy with a job. It would be nice if I could run config.sh remove --afterCurrentJob or something along those lines.
Example:

phases:
- phase: Phase_1
  displayName: Create container instance
  condition: succeeded()
  server: true
  # Task group has not been exported; task groups are not supported yet

- phase: Phase_2
  displayName: Build Phase
  condition: succeeded()
  queue:
    name: test-johnst
  steps:
  - task: Gulp@0
    displayName: 'gulp build -p'
    inputs:
      targets: build
      arguments: '-p'
    enabled: true
  - script: |
      ./config.sh remove --after-current-task --unattended --auth pat --token $(my_pat)
    displayName: 'Command Line Script'
    condition: always()
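In the meantime, one stopgap (a hypothetical sketch, not an official mechanism) is to run a script on the agent host that waits for the agent to go idle before unregistering it. The agent spawns an Agent.Worker process per running job, so polling for that process is one way to detect "idle"; AGENT_DIR and MY_PAT below are placeholder names.

```shell
#!/usr/bin/env bash
# Hypothetical workaround: poll for the per-job Agent.Worker process and
# only call "config.sh remove" once the agent is idle. AGENT_DIR and
# MY_PAT are placeholders, not real agent settings.
AGENT_DIR="${AGENT_DIR:-/agent}"

wait_for_idle() {
  # The agent spawns an Agent.Worker process for each running job; when
  # none is left, "config.sh remove" should no longer report the agent
  # as busy. The [A] bracket keeps pgrep from matching this script's
  # own command line.
  while pgrep -f "[A]gent.Worker" >/dev/null 2>&1; do
    sleep 5
  done
}

wait_for_idle
# Guard so the sketch is a no-op on hosts without an installed agent.
if [ -x "$AGENT_DIR/config.sh" ]; then
  "$AGENT_DIR/config.sh" remove --unattended --auth pat --token "$MY_PAT"
fi
```

This has to run outside the job itself (e.g. from the host or a sidecar), since a script launched by the job is still part of the job the agent is busy with.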
It would make it easier to run agents as jobs (for example, Kubernetes Jobs).
This is a must-have. When shutting down agents, you want them to stop fetching new jobs and finish the current job!
I am adding a new flag to run.cmd/run.sh: --once.
When the agent is launched in this mode, it will run only one job and exit after that job finishes.
I hope this helps with these agent container scenarios.
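With --once, a disposable container agent becomes a short register/run/unregister cycle. Below is a minimal entrypoint sketch; the AZP_URL / AZP_TOKEN / AZP_POOL environment variables and the /agent install location are assumptions for illustration, while the config.sh flags (--unattended, --url, --auth, --token, --pool, --replace, --acceptTeeEula) are the standard agent configuration options.

```shell
#!/usr/bin/env bash
# Sketch of a disposable-container entrypoint built on "run.sh --once":
# register, take exactly one job, then unregister so the pool does not
# accumulate dead agents. AZP_URL/AZP_TOKEN/AZP_POOL are hypothetical
# environment variables supplied by the container runtime.
set -euo pipefail

AGENT_DIR="${AGENT_DIR:-/agent}"

run_one_job() {
  cd "$AGENT_DIR"
  ./config.sh --unattended \
    --url "$AZP_URL" \
    --auth pat --token "$AZP_TOKEN" \
    --pool "$AZP_POOL" \
    --replace --acceptTeeEula
  ./run.sh --once   # listener exits after a single job completes
  ./config.sh remove --unattended --auth pat --token "$AZP_TOKEN"
}

# Only start the cycle where an agent is actually unpacked.
if [ -x "$AGENT_DIR/config.sh" ]; then
  run_one_job
fi
```

This maps naturally onto a Kubernetes Job per pipeline job: the pod completes when run.sh --once returns, and the controller schedules a fresh replacement.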
Thanks, @TingluoHuang! This is definitely an improvement. It'd be great if there were also a way to signal to an already-running Agent.Listener that it should stop accepting new jobs, and wind down after the current job.
I've actually achieved this with some bash magic. Essentially, I've written a script that watches the number of running containers and spawns new ones whenever there are fewer than N. It runs as a systemd service on Linux.
Then I've got another script that watches the VSTS agent log for job completion, removes the container from the pool, and then deletes it, at which point the script above spawns a new container in its place. I've been using this for months now, and it's as robust as it comes.
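The watcher half of that setup could look roughly like the following sketch. All names here (the role=vsts-agent label, the image name, DESIRED) are hypothetical placeholders, and it assumes agent containers exit on their own once a job finishes.

```shell
#!/usr/bin/env bash
# Rough sketch of the container-pool watcher described above. The label,
# image name, and DESIRED count are hypothetical placeholders.
DESIRED="${DESIRED:-3}"
IMAGE="${IMAGE:-myregistry/vsts-agent:latest}"

count_agents() {
  # Count running containers labelled as agents.
  docker ps --filter "label=role=vsts-agent" --format '{{.ID}}' | wc -l
}

watch_pool() {
  while true; do
    running="$(count_agents)"
    if [ "$running" -lt "$DESIRED" ]; then
      # Top the pool back up to the desired size.
      docker run -d --label role=vsts-agent "$IMAGE"
    fi
    sleep 10
  done
}
```

The log-watching companion script then only has to delete exited containers and unregister them from the pool; this loop repopulates the pool on its own.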
I will open source in the next week or so if anyone is interested.
I did think about going down the route of #2119; however, multi-stage jobs were then just picking a dead container and causing more issues than it solved. I also didn't like having to put that container-killing step everywhere.
I think this would be a useful enhancement. Being able to safely shut down an agent without interrupting a build is crucial for server administration.
Now that we have proper --once handling, it would be great to make the agent accept a signal that it should finish the current job and then go away.
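Until the agent supports that natively, a hedged approximation from the outside is a shell wrapper that combines the --once loop with a signal trap. Because bash only runs a trap after the current foreground command returns, SIGTERM takes effect after the in-flight job rather than mid-job; run.sh here stands in for the real agent invocation.

```shell
#!/usr/bin/env bash
# Hypothetical drain wrapper: run the agent in a --once loop and let
# SIGTERM/SIGINT end the loop after the in-flight job instead of
# killing the listener mid-job. bash defers the trap until the current
# foreground command returns, which gives the desired
# "finish the current job, then go away" behaviour.
stop=0
trap 'stop=1' TERM INT

run_agent_once() {
  ./run.sh --once   # placeholder for the real agent invocation
}

drain_loop() {
  while [ "$stop" -eq 0 ]; do
    run_agent_once   # returns when the single job completes
  done
}
```

One caveat: if the signal is delivered to the whole process group, run.sh itself also receives it, so in practice the wrapper should be the direct target of the shutdown signal.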
Agreed, @vtbassmatt 🙂 It'd be nice if this could take the form of either/both of:
- a .wind-down file in the agent diagnostic directory