_This is a regression_
Worked in version: 1.3.0
Broken in version: 1.4.0
Bug Description
I have Vault deployed as an ECS service, using an ECS task definition with an associated task role. I have an AWS auth backend configured with a client that uses the IAM credentials from ECS task metadata. This configuration was working without issues with Vault 1.3.0. After upgrading Vault to version 1.4.0, I am unable to create AWS auth backend roles. Vault is unable to resolve the ARN and produces the following output (IDs and URLs redacted or modified):
$ vault write auth/aws/role/example auth_type=iam bound_iam_principal_arn=arn:aws:iam::3xxxxxxxxxx6:role/example policies=default
Error writing data to auth/aws/role/example: Error making API request.
URL: PUT https://vault.internal:8200/v1/auth/aws/role/example
Code: 400. Errors:
* unable to resolve ARN "arn:aws:iam::3xxxxxxxxxx6:role/example" to internal ID: unable to fetch current caller: InvalidClientTokenId: The security token included in the request is invalid
status code: 403, request id: 5xxxxxxx-5xxx-4xxx-9xxx-5xxxxxxxxxx
The backend client is apparently able to use the AWS access key ID and secret access key from ECS metadata, but not the session token, which is also required to authenticate.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect Vault to use AWS credentials (access key, secret key, and token) from ECS metadata, successfully resolve the IAM role, and create the auth role.
Environment:
Vault server configuration file template:
storage "consul" {
  address = "${consul_http_address}"
  path    = "${vault_consul_path}"
}
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/ssl/certs/server.pem"
  tls_key_file  = "/ssl/private/server.pem"
}
ui = true
There is additional configuration via environment variables which I realize I neglected to include. I have VAULT_SEAL_TYPE set to awskms with a corresponding VAULT_AWSKMS_SEAL_KEY_ID, and I am also setting VAULT_API_ADDR and VAULT_CLUSTER_ADDR to appropriate values for each of two nodes.
Hi. Vault 1.4.0 updated the AWS SDK, and along with that began using IMDSv2. In researching this issue, I've come across various issues raised against the SDK and in other projects concerning IMDSv2 within container services. A common recommendation is to increase the response hop limit for the underlying instances. I was wondering if you'd tried that? This is not a desirable requirement, and it looks like there are requests to AWS to improve it. Nonetheless, it would be very useful to know whether this is at least a workaround for now.
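For anyone who wants to try the hop-limit workaround, it can be applied per container instance with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and the caller is allowed to modify instance metadata options. The helper name `set_imds_hop_limit`, the default of 2, and the instance ID in the usage line are illustrative assumptions, not something taken from this thread.

```shell
# Raise the IMDSv2 PUT response hop limit on one EC2 container instance so
# that session-token responses can travel one extra network hop to a container.
# set_imds_hop_limit is a hypothetical helper, not part of Vault or the AWS CLI.
set_imds_hop_limit() {
  instance_id="$1"
  hop_limit="${2:-2}"   # assumed default: one hop beyond the instance itself
  aws ec2 modify-instance-metadata-options \
    --instance-id "$instance_id" \
    --http-put-response-hop-limit "$hop_limit" \
    --http-endpoint enabled
}
```

Example call, with a placeholder instance ID: `set_imds_hop_limit i-0123456789abcdef0`.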
References:
I was able to fix this, but not with the hop limit.
On my ECS hosts, I have a rule in the FORWARDING table that redirects 169.254.169.254:80 to a container running nginx that is configured as a WAF for the EC2 metadata service. This IMDS WAF is one hop from the task container at L3 where the hop limit is imposed. The WAF configuration blackholes credentials and user data, but leaves open other instance metadata like networking, AMI ID, etc. I added /latest/api/token to the blackhole list and now vault running as an ECS task is correctly using its ECS metadata credentials (169.254.170.2 rather than 169.254.169.254) and can resolve the IAM role when I create a new AWS auth backend role.
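To see which credential source a task container can actually reach, both endpoints can be probed from inside the container. This is a rough sketch, assuming `curl` is available in the image and that the ECS agent has set `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` for the task. The function names and timeouts are hypothetical choices for illustration.

```shell
# Build the ECS task credentials URL from the variable the ECS agent injects.
ecs_credentials_url() {
  if [ -z "${AWS_CONTAINER_CREDENTIALS_RELATIVE_URI:-}" ]; then
    echo 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is not set' >&2
    return 1
  fi
  printf 'http://169.254.170.2%s\n' "$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"
}

# Fetch the task role credentials. The JSON response should contain
# AccessKeyId, SecretAccessKey, and Token.
fetch_ecs_credentials() {
  url="$(ecs_credentials_url)" || return 1
  curl --silent --fail --max-time 2 "$url"
}

# Ask the EC2 metadata service for an IMDSv2 session token. A non-zero exit
# status means the token endpoint is blocked or unreachable from here.
probe_imdsv2_token() {
  curl --silent --fail --max-time 2 -X PUT \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' \
    'http://169.254.169.254/latest/api/token'
}
```

In a setup like the one described here, with `/latest/api/token` blackholed, one would expect `probe_imdsv2_token` to fail while `fetch_ecs_credentials` still returns the task role credentials.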
I believe https://github.com/hashicorp/vault/pull/7738 should fix this
Closing since the original issue has been resolved.