Azure-pipelines-agent: RFC: Dockerize vsts-agent

Created on 3 Oct 2016 · 12 Comments · Source: microsoft/azure-pipelines-agent

The current PR #600 has the initial details. This discussion is to consider future enhancements.

@stepro Yes, you are right that having access to the Docker daemon (on the same machine) effectively makes you root, and all those little security tricks are basically for nothing. With this in mind, perhaps we should not be obsessing about this here when trying to dockerise the agent.

The "more serious subject" was meant to be some use case and design ideas, both for the agent and the dockerised version of it. Some of these may make life easier and better for both MSFT and customers.

These are just ideas, not specifically directed at this PR.

In two steps:

Step 1. With existing agent.

_It would be valuable for the main use-case of the dockerised agent to kick off build tasks in customer-provided images._

This is what we are doing at the moment for some builds, and we are going to move all our builds to this concept (at least those not requiring Windows).

All our agent does is run docker run -i OUR_BUILD_IMAGE our_script_in_the_container. The agent maps the source directory into our build container as a volume.
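To make that concrete, here is a minimal sketch of how such a command line could be assembled. The function name, the `/src` mount point, and the `--rm` and `-w` flags are assumptions added for illustration; only `docker run -i IMAGE script` and the volume mapping come from the description above.

```python
import shlex
from pathlib import Path


def build_docker_run(image: str, script: str, source_dir: str,
                     mount_point: str = "/src") -> list[str]:
    """Return the argv for running `script` inside `image` with the sources mounted.

    `mount_point` is a hypothetical convention, not something the agent defines.
    """
    host_path = Path(source_dir).resolve()  # docker -v expects an absolute host path
    return [
        "docker", "run", "--rm", "-i",
        "-v", f"{host_path}:{mount_point}",  # map the checkout into the container
        "-w", mount_point,                   # start the script in the source tree
        image,
        *shlex.split(script),                # the build script and its arguments
    ]


if __name__ == "__main__":
    argv = build_docker_run("OUR_BUILD_IMAGE", "our_script_in_the_container", ".")
    print(shlex.join(argv))
    # To actually run it (needs a Docker CLI and daemon on the agent machine):
    #   subprocess.run(argv, check=True)
```

OUR_BUILD_IMAGE is the placeholder from the text; a real image name would need to be lowercase.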

No need for multiple OS images of the agent -- since the build runs in a customer-provided image, customers are free to use whatever OS they want, and it does not need to be the same as the Agent's OS.
Agent-agnostic builds. No need to have your build work in Agent-specific ways. Have one build which works everywhere.
Agent that always works -- It would be easy(ier) for MSFT to build and test one single image that is known to work, and customers would no longer have to do this on their own. No more issues like "the agent does not work with my version of Ubuntu/CentOS etc."

Security is not something which is solved by this :(

I don't know if this would be in scope for the "dockerise the agent" piece of work. It does not need to be, as long as the result is "friendly" to this kind of scenario. It would require more than a bare Docker image anyway -- you'd need a bunch of scripts etc. I'm not sure if it's possible to make it completely generic so that it caters for all customers' needs.

Also note that the pre-built VSTS tasks as they currently stand would not run in this scenario. For that -- see the Grand Design below.

Step 2. Grand design.

Not really to do with this PR but nevertheless if anyone is interested.

Many use cases are hard with existing agent because of these three:

1. Customers run build tasks on the same "machine" as the Worker.
2. Agent Listener and Worker are rolled into one.
3. The Worker is allowed to execute anything using the same UID as the Listener on the same "machine".

§1 raises the question: "do we need all combinations of different workers for all combinations of build tasks and target OSes that we have?" This can be addressed now with the current agent by running the actual builds in customer-provided Docker images. The agent needs to support this, and customers would need to create their own Docker images.

§2 makes it very hard to scale, make things elastic, respond to load, and run on clusters; it also causes issues with updates, de-registration, etc.

§3 causes all sorts of security issues, like "how do we hide the secret VSTS tokens?".

Assume we run everything on a cluster which spans our whole data centre (say Mesos, ACS, Docker Swarm, whatever), and we use Docker images for everything.

1. The Listener is standalone.

It is provided as a Docker image by MSFT.

It's the only thing which needs to register with VSTS using secret tokens etc.
Its only job is to listen for requests from VSTS and kick off workers.

I understand there was an intention to move to this model with the VSTS agent.

2. The Worker is standalone.

Workers are provided as Docker images by MSFT.

When the Listener gets a request to run a job, it does docker pull Microsoft/vsts-worker:required_version and starts it on the cluster _somewhere_.

The Worker container is given one-off keys to pull down source code and communicate with VSTS on build progress. This is pretty much what is happening in the current implementation.

The only job of the Worker is to kick off build tasks and report progress to VSTS. Note that the Worker does not run the tasks in its own container.
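As a rough sketch of the Listener side of this, the snippet below builds the pull and run commands for one job. Everything named here is hypothetical: the `JobRequest` fields, the `JOB_TOKEN` and `JOB_ID` variable names, and the `microsoft/vsts-worker` image (taken from the text above and lowercased, since Docker repository names must be lowercase). It assumes a plain Docker CLI rather than a real cluster scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical shape of a job message from VSTS; field names are assumptions."""
    job_id: str
    worker_version: str
    one_off_token: str


def worker_commands(job: JobRequest,
                    image_repo: str = "microsoft/vsts-worker") -> list[list[str]]:
    """Return the `docker pull` and `docker run` argv lists for one job."""
    image = f"{image_repo}:{job.worker_version}"
    pull = ["docker", "pull", image]
    run = [
        "docker", "run", "--detach",
        "--name", f"worker-{job.job_id}",
        # Name-only --env copies the value from the caller's environment,
        # so the one-off token never appears on the command line.
        "--env", "JOB_TOKEN",
        "--env", f"JOB_ID={job.job_id}",
        image,
    ]
    return [pull, run]


def worker_environment(job: JobRequest) -> dict[str, str]:
    """Environment to merge into the process that executes the run command."""
    return {"JOB_TOKEN": job.one_off_token}
```

The Listener would execute both commands with something like `subprocess.run(cmd, env={**os.environ, **worker_environment(job)}, check=True)`; on a cluster, the scheduler's own API would take the place of `docker run`.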

3. The customer-provided Task containers.

These days many people (us included) run (all) their builds in Docker containers: we build an image with all required dependencies and tools on the required OS distro, map the source code into the running container as a volume, and kick off the build. Simple!

So the Task containers are the same _customer-provided Docker images_. The images to run are specified in the build definition in VSTS. The Worker uses the build definition to kick off the containers as required.

The Worker maps the source code into the task container (just like we would do manually) and starts whatever script or program in that container is specified in the build definition.

Again, each Task container is executed _somewhere in the cluster_.

The Worker captures the stdout and stderr of the task container to report progress to VSTS, and uses its exit status to report success/failure of the task.
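The capture-and-report step could look roughly like the sketch below. It is not how the existing agent does it; `run_task` and the `report` callback are hypothetical names, and the demo uses a small Python child process as a stand-in for a `docker run ...` command so it runs without Docker.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_task(argv: Sequence[str], report: Callable[[str], None]) -> bool:
    """Run a task command, stream each output line to `report`,
    and return True only if the command exits with status 0."""
    with subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            report(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set.
    return proc.returncode == 0


if __name__ == "__main__":
    # Stand-in for ["docker", "run", ...]: prints a line, then fails.
    demo = [sys.executable, "-c", "print('compiling'); raise SystemExit(3)"]
    ok = run_task(demo, report=lambda line: print(f"[task] {line}"))
    print("succeeded" if ok else "failed")
```

Merging stderr into stdout keeps the sketch short; a Worker that needs to tell the two apart would read them from separate pipes.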

If the task containers are not given access to the Docker daemon, they can be completely sealed off and all tokens would be safe from the arbitrary build tasks.
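One way a Worker might enforce this is to vet each task's container settings before starting it. The sketch below is only a partial check under stated assumptions: `TaskSpec` is a hypothetical structure, `/var/run/docker.sock` is the usual default socket path on Linux, and `DOCKER_HOST` is the environment variable the Docker CLI reads for a remote daemon address. A real policy would need to cover more than these three cases.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

DOCKER_SOCKET = PurePosixPath("/var/run/docker.sock")  # default Linux socket path


@dataclass
class TaskSpec:
    """Hypothetical description of how a task container would be started."""
    image: str
    volumes: dict[str, str] = field(default_factory=dict)  # host path -> container path
    env: dict[str, str] = field(default_factory=dict)
    privileged: bool = False


def daemon_access_reasons(spec: TaskSpec) -> list[str]:
    """Return reasons why the task could reach the Docker daemon (empty if none found)."""
    reasons = []
    for host_path in spec.volumes:
        host = PurePosixPath(host_path)
        # Mounting the socket itself, or any directory above it, exposes the daemon.
        if host == DOCKER_SOCKET or host in DOCKER_SOCKET.parents:
            reasons.append(f"volume {host_path} exposes {DOCKER_SOCKET}")
    if "DOCKER_HOST" in spec.env:
        reasons.append("DOCKER_HOST points the task at a Docker daemon")
    if spec.privileged:
        reasons.append("privileged containers are not isolated from the host")
    return reasons
```

A Worker would refuse to start any task for which the returned list is non-empty.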

Many existing tasks in VSTS (like kicking off MSBuild) can probably be pre-canned and provided by MSFT as Docker images too. So often you wouldn't need to build any images of your own if all you use is pre-built tasks.

What this design solves: everything! :)

OK, there are _many, many_ things which this approach would address/solve/make possible. Some are:

MSFT and customers can have one generic agent which can build _anything_.
VSTS-agnostic builds. If your build worked in your containers on your machine -- they will work in the agent world.
VSTS agent that always works. MSFT develops and tests just one image, customers no longer need to build and troubleshoot their own.
Security. Build tasks are segregated, and can be made secure if they are not given access to the Docker daemon.
Updates / Agent versions. The problem is gone. The Listener pulls the correct version of the worker image at every build. The Listener itself can be restarted with newer versions much more easily now.
New feature: Elastic/Scalable. Now it's possible to have just _one_ listener, and it will kick off as many builds as there is capacity in the cluster when required. No longer any need to have many agents sitting there idle on expensive hardware just in case there are bursts in build activity.
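The elastic behaviour in the last point can be pictured with a toy model: one listener queues incoming jobs and starts them whenever the cluster has room. The `Dispatcher` class and its fixed `capacity` number are hypothetical simplifications; a real cluster scheduler (Mesos, Swarm, etc.) would report capacity itself.

```python
from collections import deque


class Dispatcher:
    """Toy model of a single listener feeding a cluster with limited capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity            # concurrent workers the cluster can host
        self.pending: deque[str] = deque()  # jobs waiting for room
        self.running: set[str] = set()      # jobs currently on the cluster

    def submit(self, job_id: str) -> None:
        """Called when VSTS asks for a new build."""
        self.pending.append(job_id)
        self._fill()

    def finish(self, job_id: str) -> None:
        """Called when a worker container exits."""
        self.running.discard(job_id)
        self._fill()

    def _fill(self) -> None:
        # Start queued jobs, oldest first, while the cluster still has room.
        while self.pending and len(self.running) < self.capacity:
            self.running.add(self.pending.popleft())
```

With `capacity=2`, submitting three jobs starts the first two and leaves the third queued until one of them finishes.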

This was meant to be a brief comment... Thanks for reading! All this is very different from the current design and implementation -- but with containers and clusters everywhere these days, might be of interest as a longer-term direction.

enhancement

chrispat

All 12 comments

For reference, we are running a docker-based scheme in one of our clusters with this image as a base: developertown/vsts-agent, and then with technology-specific descendants, such as developertown/vsts-agent-nodejs.

This has worked OK for us, although it's been rocky across agent updates -- the auto-update feature in the vsts-agent causes some issues for the current design of those images. Otherwise, this is working fine for us. I'd love to see first-class support for working this way.

jasonvasquez on 7 Oct 2016

Also note that in the most recent 2.107.x+ agents, we don't auto update on restart. Only if you invoke from the web UI. So, shouldn't have a problem going forward.

bryanmacfarlane on 7 Oct 2016

@jasonvasquez, just yesterday we published an official set of images to Docker Hub at https://hub.docker.com/r/microsoft/vsts-agent, backed by a new GitHub repo at https://github.com/microsoft/vsts-agent-docker. We would love to hear any comments you have on the approaches we have taken there. Here's a brief summary.

There are three dimensions to the images we are providing:

Base OS: currently only Ubuntu 14.04 and 16.04, but CentOS support will be added soon.
Agent version: by default, this is determined and installed automatically at container startup by asking the target VSTS account for the latest version of the agent that it supports, but there are explicit version images as well. The automatic determination at container startup effectively deals with the auto-update problem when used in combination with an automated restart of the container if it gets auto-updated (if it is auto-updated, the container dies, and is restarted, at which point it determines that the newer agent version should be installed).
Specific tools: currently only the docker and docker-compose CLIs, which won't run a majority of the existing tasks, but it does mean you can run tasks that spin up arbitrary workloads inside Docker containers, likely volume mapping in the source repository or other artifacts. This hints at support for the "grand design" described above, although existing tasks don't play into such a model currently. The other planned specific tool images include a "standard" one that would provide a wide range of tools similar to what you might get from an agent in the hosted agent pool.
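The startup-time version check plus automated restart described under "Agent version" can be modelled as a small loop. This is a simulation under stated assumptions, not the logic of the published images: the three callables are hypothetical stand-ins for "ask the VSTS account for the latest supported version", "install that version", and "run the agent until it exits", and in practice the restart would be done by the container runtime rather than by a Python loop.

```python
from typing import Callable


def run_until_stable(
    latest_version: Callable[[], str],
    install: Callable[[str], None],
    run_agent: Callable[[str], bool],
    max_starts: int = 5,
) -> list[str]:
    """Simulate container starts; return the agent version used on each start.

    `run_agent` returns True if the agent exited because it was auto-updated
    (so the container should be restarted), False on a normal shutdown.
    """
    versions = []
    for _ in range(max_starts):
        version = latest_version()  # ask the account what it supports right now
        install(version)            # install at container startup
        versions.append(version)
        if not run_agent(version):  # normal shutdown: stop restarting
            break
    return versions
```

If the account moves from one agent version to the next while the agent is running, the first start uses the old version, the agent exits on auto-update, and the second start picks up the new one.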

stepro on 7 Oct 2016

👍3

@stepro This is awesome. I will give this a try for our VS Code Linux builds.

joaomoreno on 12 May 2017

@stepro This is working beautifully for us. We now base our Dockerfile on yours and run the agents with the help of a few bash scripts.

joaomoreno on 15 May 2017

Cool, glad it worked out for you!

stepro on 15 May 2017

A dockerized vsts-agent on Windows would be handy too.

compulim on 21 Jul 2017

👍3

Guys, it seems that you have removed the ubuntu-14.04-standard based images. We heavily depend on this for our VSCode build agents. Any chance you can keep publishing them?

joaomoreno on 27 Sep 2017

Never mind, I can pull it now; it seems to have been an auth problem.

joaomoreno on 27 Sep 2017

Interesting. I specifically removed those tags so there's no guarantee they will continue to work. Any chance you can upgrade to Ubuntu-16.04? We removed these because there were simply too many combinations of images and it would take an extremely long time to build them all each time any update was made to the repo.

stepro on 27 Sep 2017

I'm happy to help you upgrade if that would be useful.

stepro on 27 Sep 2017

We still depend on it... last time we tried to move to 16.04, users who had 14.04 installed as their OS couldn't run Code anymore. That might have something to do with the libraries Code is linked against at build time of its native modules, but I'm not an expert...

joaomoreno on 27 Sep 2017
