Vault: AWS ECS authentication to Vault

Created on 6 Apr 2016 · 10 comments · Source: hashicorp/vault

Intro

After reviewing the various ideas brought up here (#948, #828, #805), we came up with a slightly different approach for a slightly different problem (ECS containers) and would love feedback on this idea.

Our goals

  1. Containers started in ECS are able to get secrets without storing any logins/passwords/IDs/etc. anywhere: not in the Dockerfile and not in the task definition.
  2. Keep our containers Vault-unaware, as they may be used in environments where Vault isn't set up, while still allowing secrets to get into the containers in those non-Vault environments.
  3. Keep the system as flexible as possible. Ideally, it would be 'data driven' so that very little is hard-coded.

The high-level approach

When a container is started on a host, some process talks to Vault, validates against AWS that the container was just started by ECS, and then feeds the secrets it needs into the container. All of this happens over the wire, so nothing is stored.

The new tools we built

Secret Agent Man

Every ECS host has two containers that always run: ecs-agent and SAM. SAM is linked to the Docker socket and reads the Docker events stream. Whenever a container starts, it jumps into action to help manage secrets.
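The tools themselves are Ruby, but a minimal Python sketch shows the shape of the event loop (docker-py is assumed, and handle_new_container() is a hypothetical stand-in for the flow described below):

    import docker

    client = docker.from_env()  # SAM is linked to /var/run/docker.sock

    # Stream Docker events, reacting only to container starts.
    for event in client.events(decode=True,
                               filters={"type": "container", "event": "start"}):
        handle_new_container(event["id"])  # hypothetical: kicks off the Vault dance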

Web Init Script

WIS is the entry point for all containers. When a container starts, WIS starts a webserver waiting for exactly one HTTP POST. If it doesn't get this POST within 30 seconds, it exits (and the container goes with it). If it gets any other HTTP request, it exits. When it receives the HTTP POST it's waiting for, whose payload is a JSON object of key/value pairs, it turns the key/value pairs into environment variables and then runs the CMD directive with that environment set.
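Roughly, that behavior looks like this (a Python sketch using only the standard library; the port and names are illustrative, not from the actual tool):

    import json, os, sys
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class OneShotHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers["Content-Length"])
            self.server.secrets = json.loads(self.rfile.read(length))
            self.send_response(200)
            self.end_headers()

        def do_GET(self):
            # Any request other than the expected POST means we bail out.
            self.send_response(400)
            self.end_headers()
            sys.exit(1)

    server = HTTPServer(("0.0.0.0", 8999), OneShotHandler)
    server.timeout = 30       # give up if the POST never arrives
    server.secrets = None
    server.handle_request()   # serve exactly one request
    if server.secrets is None:
        sys.exit(1)           # timed out; the container goes down with us

    # Merge the secrets into the environment and exec the real CMD.
    env = dict(os.environ, **{k: str(v) for k, v in server.secrets.items()})
    os.execvpe(sys.argv[1], sys.argv[1:], env)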

VaultBuddy

VB validates that a given ECS task ARN has launched in the last X seconds and is reported as state=="RUNNING". VB should really be a Vault backend, but since we're a bunch of Ruby guys it's currently a simple Ruby script that acts as an HTTP proxy to Vault.
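The validation boils down to a single ECS API call. A Python/boto3 sketch (again, the real tool is Ruby; the 30-second default stands in for the configurable X):

    from datetime import datetime, timedelta, timezone
    import boto3

    ecs = boto3.client("ecs")

    def task_is_fresh(cluster, task_arn, max_age_seconds=30):
        resp = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
        if not resp["tasks"]:
            return False  # unknown task ARN: fail closed
        task = resp["tasks"][0]
        age = datetime.now(timezone.utc) - task["createdAt"]
        return (task["lastStatus"] == "RUNNING"
                and age < timedelta(seconds=max_age_seconds))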

Tying it all together

  1. ECS starts a container.
  2. Secret Agent Man detects this start. SAM gets the cluster name from the ecs-agent metadata and gathers the task ARN by running docker inspect on the new container (it's in the Docker labels).
  3. SAM submits the cluster name and task ARN to VaultBuddy.
  4. VB holds the connection to SAM and in the background talks to ECS. It validates that said task ARN has been launched in the last X seconds (this is configurable) and the state is RUNNING.
  5. If the validation is good, VB talks to Vault, gets a one-time-use, X-second-TTL token, and returns it to SAM.
  6. SAM takes this single-use, short-TTL token and talks to Vault, gathering the secrets from a path or paths. It knows these paths from other Docker labels we've added, or it could gather them from ENV vars in the container/task definition.
  7. SAM then compiles the secrets into a JSON object and submits them to the container via an HTTP POST (see the sketch after this list).
  8. Web Init Script accepts the HTTP POST, assembles the ENV vars and runs the CMD directive.
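For concreteness, steps 6 and 7 look roughly like this (a Python sketch assuming the requests library, a single secret path, and the illustrative WIS port from earlier):

    import requests

    def inject_secrets(vault_addr, token, secret_path, container_ip, wis_port=8999):
        # Read the secrets with the single-use, short-TTL token.
        resp = requests.get(f"{vault_addr}/v1/{secret_path}",
                            headers={"X-Vault-Token": token})
        resp.raise_for_status()
        secrets = resp.json()["data"]  # key/value pairs for the container

        # Hand them to the Web Init Script inside the container.
        requests.post(f"http://{container_ip}:{wis_port}/",
                      json=secrets).raise_for_status()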

Drawbacks

We don't expect this to be the most secure possible method of using Vault, but it gets the job done for now. A couple of problem areas:

  1. VaultBuddy basically needs a root token - or at least one with enough privileges to read any/all secrets and generate tokens for them. This would be mitigated if it were written as a Vault backend.
  2. VaultBuddy needs an IAM role that allows ecs:DescribeTasks.
  3. VaultBuddy doesn't have the auditing that Vault has. Again, mitigated if it were a backend.
  4. Time is the attack vector. If you can get in, watch the Docker event stream, and implement your own SAM instead, you can steal secrets. That being said, if you have that ability you have basically rooted the host node anyway, and only a Vault-aware application will help you now.

Each tool is under 150 lines of code (with heavy comments) as we tried to keep it very maintainable.
We'd love feedback from Hashicorp and others about this approach.

All 10 comments

Hi @natefox ,

You have some very interesting timing, because we were just preparing to post our initial PR for our AWS auth backend; it's feature complete and we're just working on testing (and in fact it's now up in #1300). There are dissimilarities between this and #1300, since that's basically designed to address the asks in #805, #828, and #948, and the methodology wouldn't work for ECS -- mostly because Amazon can't be relied upon as a trusted third party, since it doesn't sign the ECS metadata. It may be interesting for you to look it over though, keeping in mind that it's not yet fully tested and documented, and we have integration testing to do as well.

I noticed you linked to the cubbyhole document and while you don't use cubbyhole I can see similar thinking here -- limited use/limited-ttl tokens to fetch secrets, and using coprocesses to manage the secure introduction. What you came up with actually rather closely matches something I designed at my previous job with our security architect, with an important difference (detailed below).

I do know that [some company with major AWS experience] is planning on posting a blog about the method they designed for performing secure introduction with ECS and Vault. I don't know when that will land but when I see it I will try to ensure that I link to it from here. It's very different in that it's much more AWS-specific using more AWS-specific technologies, for better or for worse. I unfortunately can't divulge details ahead of time.

Regarding your specific implementation, I have an overall large comment:

As you noted, you have a trust-on-first-use problem. In #1300 this is mitigated by the fact that you need to actually be able to fetch the instance metadata from that instance, but here you can get the information on the new container from a different container -- meaning, _any_ container can get that information. My suggestion is to flip the logic a bit: rather than have SAM act as the intermediary, retrieve the token, and then submit it to WIS, I think you'd be better off having SAM simply inform VB that a new container has been started and give it the relevant info. Then, have VB connect to a service (a listening port in SAM, a modified WIS, or something else) and give the token to a service _in the container_.

The really nice thing about doing it this way is that rather than trusting that SAM is really SAM and giving it a token, it doesn't really matter if SAM isn't who it says it is -- if the info you're given checks out (by verifying the container is running and was started < 30 seconds ago) then that container should be given credentials anyways. The worst thing a fake SAM can do is poke you to send a Vault token to a valid container. You can also keep a whitelist of containers that you've already sent a token to, to make sure that it only happens once, no matter how many times SAM (or a fake SAM) pokes you.

You can still have SAM be the service that actually grabs the token and then use it to grab secrets and inject via WIS (or combine the two), but with the key being that VB is what actually initiates this connection. Bonus points for signing the message carrying the token so that the service in the container can validate it -- although if it connects to the right Vault server and the token is invalid it'll figure that part out soon enough.
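To make the flip concrete, here's a rough Python sketch of the VB-side logic (task_is_fresh() is the ECS check sketched earlier; fetch_one_time_token() and the rest are illustrative):

    import requests

    already_served = set()  # task ARNs we've already handed a token to

    def on_container_report(cluster, task_arn, container_ip):
        if task_arn in already_served:
            return  # deliver at most once, however many times we're poked
        if not task_is_fresh(cluster, task_arn):
            return  # report doesn't check out against ECS; ignore it
        already_served.add(task_arn)
        token = fetch_one_time_token()  # hypothetical: mints a single-use, short-TTL token
        # VB initiates this connection, so it doesn't matter who the reporter was.
        requests.post(f"http://{container_ip}:8999/", json={"token": token})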

If you wanted, this service that receives the connection (SAM or something else) could in fact use cubbyhole, or something similar, to keep a permanent token in memory to inject new credentials into applications as needed.
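For reference, the cubbyhole hand-off alluded to above looks roughly like this against the HTTP API (a Python sketch with requests; the cubbyhole path is illustrative):

    import requests

    def stash_in_cubbyhole(vault_addr, temp_token, perm_token):
        # Write the long-lived token into the temp token's private cubbyhole.
        requests.post(f"{vault_addr}/v1/cubbyhole/token",
                      headers={"X-Vault-Token": temp_token},
                      json={"token": perm_token}).raise_for_status()

    def read_from_cubbyhole(vault_addr, temp_token):
        # Inside the container: trade the temp token for the real one.
        resp = requests.get(f"{vault_addr}/v1/cubbyhole/token",
                            headers={"X-Vault-Token": temp_token})
        resp.raise_for_status()
        return resp.json()["data"]["token"]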

Another comment is regarding one of your drawbacks:

    VaultBuddy basically needs a root token - or at least one with enough privileges to read any/all secrets and generate tokens for them. This would be mitigated if it were written as a Vault backend.

You may want to look into token roles, which are new in 0.5.2 (docs at https://www.vaultproject.io/docs/auth/token.html). These let VaultBuddy create tokens with policies that are _not_ subsets of its own token's policies. They're designed for exactly these types of situations.
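Roughly, via the HTTP API (a Python sketch with requests; role and policy names are illustrative):

    import requests

    VAULT = "https://vault.example.com:8200"     # illustrative address
    HEADERS = {"X-Vault-Token": "VB_OWN_TOKEN"}  # VaultBuddy's own (non-root) token

    # One-time setup: a role allowed to mint tokens with the ecs-secrets
    # policy, even though VaultBuddy's own token doesn't hold that policy.
    requests.post(f"{VAULT}/v1/auth/token/roles/ecs-intro",
                  headers=HEADERS,
                  json={"allowed_policies": "ecs-secrets"}).raise_for_status()

    # Per container: mint a single-use, short-TTL token against the role.
    resp = requests.post(f"{VAULT}/v1/auth/token/create/ecs-intro",
                         headers=HEADERS,
                         json={"policies": ["ecs-secrets"],
                               "num_uses": 1, "ttl": "30s"})
    token = resp.json()["auth"]["client_token"]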

As far as a built-in Vault backend goes, we've had some internal discussions around ECS but I don't think we're quite there yet. We've seen some various approaches to ECS auth workflows (including this and very similar ones like the one I worked on at my previous job and made suggestions about above) but they all require more coordination than would be necessary if ECS supported signed metadata and have associated other drawbacks.

At the moment we and some customers are talking to AWS about possibilities to enhance the ECS metadata and API to hopefully overcome some of these issues. Based on how those go and the likelihood/timeline of any changes we'll continually be reevaluating our plans, so you never know what the future holds!

hey folks,

I'm looking at this and wondering what it would take to make the leap to making it generic for docker, rather than tied to ECS.

I'm spitballing here a bit:

  • use #1300 as the way to get the base token
  • use docker content trust to sign images

    • use a meta-data label to put a uuid in each image

  • in a separate off-line process, map image uuids to roles (e.g. this uuid can run with the service.etl role); this provides more flexibility than using the image_id, though you can use that instead for tighter control over which image is authorized to retrieve a token
  • when listening to the docker event stream (SAM), if a launched image contains the meta-data label, is signed and valid (which is key!), and the label maps to a vault role, use that vault role to generate a token (similar to VB). The token can be retrieved once and expires after a few seconds.

at that point you can use a WIS like process to retrieve the token, call vault to retrieve secrets, and do whatever you want.
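To make the lookup concrete, a rough Python sketch (docker-py assumed; the label key and mapping are illustrative, and the content-trust verification, which is the crucial part, is elided):

    import docker

    UUID_TO_ROLE = {"4f7c...": "service.etl"}  # illustrative off-line mapping

    client = docker.from_env()

    def role_for_container(container_id):
        container = client.containers.get(container_id)
        # In the real flow, first verify the image via content trust here.
        image_uuid = container.image.labels.get("com.example.vault-uuid")
        return UUID_TO_ROLE.get(image_uuid)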

Taking into account @jefferai's comments, the main differences in this flow are:

  • use #1300 to generate the initial token
  • use vault roles mapped to image_ids (or a meta uuid) to give more granular permissions
  • use (and heavily lean on) signed docker containers rather than using the AWS ECS meta-data service to verify launched images

Notable issues:

  • content trust: this is the branded name of the docker OSS notary service, but I'm not sure notary is being used by any of the other registry providers such as quay.io, google, aws, etc. There are workarounds (e.g. manually creating and managing image signatures), but they all require a bit of work and aren't generic
  • and many others I'm sure :)

While how we use the token probably depends on our particular needs, I think providing each container instance with its own scoped token, in as secure a manner as possible, is a common need.

Hi @skippy ,

Indeed, something like #1300 that works for Docker (generically) is on our roadmap!

thanks @jefferai

@natefox
I've created a Docker-Vault bridge that I'm running on AWS ECS. It requires starting a docker-vault container with a (wrapped) token on each ECS host.
Hope it helps:
https://hub.docker.com/r/eskey/dockervault/

@jefferai, sounds like you guys are working on this; do you have a timeframe for when we can expect it?

We'd really like docker level authentication as well (unfortunately, the dockervault solution above has some security concerns - see https://raesene.github.io/blog/2016/03/06/The-Dangers-Of-Docker.sock/ and other resources - mounting the docker socket is just not a viable option).

Currently, we need to give all of our containers access to the union of all the secrets that any of our containers need, which is also pretty sub-optimal.

@jefferai Hi! Do you have any updates regarding this issue?

Maybe someone knows of a guide that describes ECS & Vault integration?

@StyleT This will end up being handled by the IAM support being merged into the aws (aws-ec2) auth backend.

@jefferai Am I right that I can track PR #2441?

Yes.

Closing since that's the right place to watch.
