Vault: AWS auth backend client unable to use IAM credentials from ECS task metadata

Created on 25 Apr 2020 · 5 Comments · Source: hashicorp/vault

_This is a regression_

Worked in version: 1.3.0
Broken in version: 1.4.0

Bug Description
I have Vault deployed as an ECS service, using an ECS task definition with an associated task role. I have an AWS auth backend configured with a client that uses the IAM credentials from ECS task metadata. This configuration was working without issues with Vault 1.3.0. After upgrading Vault to version 1.4.0, I am unable to create AWS auth backend roles. Vault is unable to resolve the ARN and produces the following output (IDs and URLs redacted or modified):

$ vault write auth/aws/role/example auth_type=iam bound_iam_principal_arn=arn:aws:iam::3xxxxxxxxxx6:role/example policies=default
Error writing data to auth/aws/role/example: Error making API request.

URL: PUT https://vault.internal:8200/v1/auth/aws/role/example
Code: 400. Errors:

* unable to resolve ARN "arn:aws:iam::3xxxxxxxxxx6:role/example" to internal ID: unable to fetch current caller: InvalidClientTokenId: The security token included in the request is invalid
    status code: 403, request id: 5xxxxxxx-5xxx-4xxx-9xxx-5xxxxxxxxxx

The backend client is apparently able to use the AWS access key ID and secret access key from ECS metadata, but not the session token, which is also required to authenticate.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy Vault 1.4.0 in AWS ECS using a task role that grants ec2:DescribeInstances, iam:GetInstanceProfile, iam:GetRole, and iam:GetUser
  2. Configure an AWS auth backend, omitting credentials from the backend client
  3. Attempt to create an AWS auth role as indicated in the bug description

Expected behavior
I expect Vault to use AWS credentials (access key, secret key, and token) from ECS metadata, successfully resolve the IAM role, and create the auth role.
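
For context, here is a minimal Python sketch of where those three values come from inside an ECS task. It assumes the standard ECS behaviour of exposing task role credentials at 169.254.170.2 under the path given in the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable. The function names are hypothetical and this is not Vault's or the AWS SDK's actual code.

```python
import json
import os
import urllib.request

# Link-local address of the ECS task credentials endpoint.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def ecs_credentials_url(environ=os.environ):
    """Return the task credentials URL, or None when no task role is exposed."""
    relative_uri = environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative_uri:
        return None
    return ECS_CREDENTIALS_HOST + relative_uri


def missing_credential_fields(payload):
    """List the required credential fields that are absent or empty."""
    required = ("AccessKeyId", "SecretAccessKey", "Token")
    return [name for name in required if not payload.get(name)]


def fetch_task_credentials(environ=os.environ, timeout=2.0):
    """Fetch the task role credentials; only works inside an ECS task."""
    url = ecs_credentials_url(environ)
    if url is None:
        raise RuntimeError("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is not set")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    missing = missing_credential_fields(payload)
    if missing:
        raise RuntimeError("credentials response is missing: " + ", ".join(missing))
    return payload
```

Running fetch_task_credentials() from inside the Vault container is one way to confirm that the task role credentials, including the Token field, are being served before looking at Vault itself.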

Environment:

  • Vault Server Version: 1.4.0
  • Vault CLI Version: 1.4.0
  • Server Operating System/Architecture: vault docker image deployed to AWS ECS

Vault server configuration file template:

storage "consul" {
  address = "${consul_http_address}"
  path    = "${vault_consul_path}"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/ssl/certs/server.pem"
  tls_key_file  = "/ssl/private/server.pem"
}

ui = true

All 5 comments

There is additional configuration via environment variables which I realize I neglected to include. I have VAULT_SEAL_TYPE set to awskms with a corresponding VAULT_AWSKMS_SEAL_KEY_ID, and I am also setting VAULT_API_ADDR and VAULT_CLUSTER_ADDR to appropriate values for each of two nodes.

Hi. Vault 1.4.0 updated the AWS SDK, and along with that began using IMDSv2. In researching this issue, I've come across various issues raised against the SDK and in other projects concerning IMDSv2 within container services. A common recommendation is to increase the response hop limit for the underlying instances. Have you tried that? It is not a desirable requirement, and it looks like there are requests to AWS to improve it. Nonetheless, it would be very useful to know whether this is at least a workaround for now.
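
To make the IMDSv2 exchange concrete, here is a rough Python sketch of the two requests involved, using only the standard library. It assumes the documented IMDSv2 flow: a PUT to /latest/api/token with a TTL header, then a GET that carries the returned token. The helper names are hypothetical. The hop limit matters because the response to the token PUT is sent with a limited IP hop count (1 by default), so a container one network hop away from the instance may never receive it. The EC2 instance metadata option that controls this is HttpPutResponseHopLimit.

```python
import urllib.request

# Link-local address of the EC2 instance metadata service (IMDS).
IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds=21600):
    """Build the IMDSv2 session token request (a PUT with a TTL header)."""
    return urllib.request.Request(
        IMDS_BASE + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path, token):
    """Build a metadata GET request that presents the session token."""
    return urllib.request.Request(
        IMDS_BASE + path,
        headers={"X-aws-ec2-metadata-token": token},
    )


def imdsv2_token_reachable(timeout=1.0):
    """Return True if the token endpoint answers from the current network position."""
    try:
        with urllib.request.urlopen(build_token_request(), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers timeouts, connection failures, and HTTP error responses.
        return False
```

Calling imdsv2_token_reachable() from inside the task container, before and after changing the hop limit, would show whether the token response is reaching the container at all.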

References:

I was able to fix this, but not with the hop limit.

On my ECS hosts, I have a rule in the FORWARDING table that redirects 169.254.169.254:80 to a container running nginx, which is configured as a WAF for the EC2 metadata service. This IMDS WAF is one hop from the task container at L3, where the hop limit is imposed. The WAF configuration blackholes credentials and user data but leaves other instance metadata open, such as networking and the AMI ID. I added /latest/api/token to the blackhole list, and now Vault running as an ECS task correctly uses its ECS metadata credentials (from 169.254.170.2 rather than 169.254.169.254) and can resolve the IAM role when I create a new AWS auth backend role.
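
The nginx configuration itself is not shown in this thread, so the following is only a Python model of the filtering decision described above, not the actual WAF rules. The prefix list is an assumption based on the comment: credentials, user data, and the newly added token endpoint are dropped, and everything else is passed through.

```python
# Hypothetical blackhole list modelled on the comment above; the real
# nginx rules are not shown in the thread.
BLACKHOLED_PREFIXES = (
    "/latest/api/token",  # IMDSv2 session token endpoint (the new entry)
    "/latest/meta-data/iam/security-credentials",  # instance role credentials
    "/latest/user-data",  # instance user data
)


def is_blackholed(path):
    """Return True if a request for this IMDS path should be dropped."""
    return any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in BLACKHOLED_PREFIXES
    )
```

Under this model, a request for /latest/meta-data/ami-id still passes, while the IMDSv2 token request is dropped along with the credential paths.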

Closing since the original issue has been resolved.
