Amazon-ecs-agent: CredentialsV2Request: ID not found. Request IP Address: ---------

Created on 8 Jan 2019 · 25 Comments · Source: aws/amazon-ecs-agent

Summary

Containers within the server are unable to access credentials from the ECS Agent

Description

Containers within the server are unable to access credentials from the ECS Agent, so Boto (among other things) cannot authenticate from within the container.

2019-01-08T12:26:40Z [INFO] CredentialsV2Request: ID not found. Request IP Address: 172.17.0.3:22252
2019-01-08T12:26:40Z [WARN] Unknown eventType: GetCredentialsInvalidRoleType

Environment Details

Amazon ECS Agent - v1.21.0 (3d368554)
Docker Version - 18.06.1-ce
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 104K 3.9G 1% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/xvda1 20G 1.1G 19G 6% /

Supporting Log Snippets

Please provide a way to provide the supporting log files in a private manner

Labels: kind/bug, kind/tracking, scope/ECS Service, workaround available

Source

superprat

👍4

Most helpful comment

Hey guys, by chance has someone come up with a solution to this? I've just run into this problem. I'm deploying 2 services with the exact same deployment script but it's only the second one that runs into this problem

carlosmiguel84 on 9 Jul 2019

👍3

All 25 comments

> Please provide a way to provide the supporting log files in a private manner

feel free to send logs to adnkha at amazon dot com with a reference to this issue. to help with debugging - do you have steps for a minimal repro? are you able to repro this in a constrained way?

adnxn on 16 Jan 2019

I don't have exact steps to reproduce this. We faced this issue on only one ECS service; it was working for our other ECS services. It affected our ability to access S3 from within the container because the container could not fetch the IAM role. The HTTP request to fetch the IAM role returned the error above.

Things we tried:
We tried recreating the ECS service with different images and tasks, but that did not help.
We also tried upgrading the agent to v1.23, but that did not help either.

Eventually the issue resolved itself in a few hours with the original ECS definition and ECR image.

I've shared the logs, hope that can provide some insight.

superprat on 18 Jan 2019

@adnxn What does CredentialsV2Request: ID not found mean?

I'm getting the same error message and this issue is the first result Google gives me.

The underlying issue might be different, but I really want to know what the error message means so that I can have some clue.

Domon on 23 Jan 2019

I realized that the error message CredentialsV2Request: ID not found means the ID in AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is problematic. However, I have no idea how to find out why. It also happens in v1.24.
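For anyone who wants to check that ID from inside the affected container, here is a minimal sketch. It assumes the credentials host is the 169.254.170.2 address that appears in the error messages elsewhere in this thread; the function names are made up for illustration, and only non-secret fields are printed.

```python
import json
import os
import urllib.error
import urllib.request

# Host taken from the error messages quoted in this thread.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def build_credentials_url(env=os.environ):
    """Return the full task-credentials URL, or None if the variable is unset."""
    relative_uri = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative_uri:
        return None
    return ECS_CREDENTIALS_HOST + relative_uri


def probe_credentials(url, timeout=2.0):
    """Fetch the URL and return (http_status, parsed_json_or_None)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx answers (such as the 400 in this issue) carry a JSON body.
        body = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(body)
        except ValueError:
            return err.code, None


if __name__ == "__main__":
    url = build_credentials_url()
    if url is None:
        print("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is not set in this container")
    else:
        try:
            status, payload = probe_credentials(url)
        except OSError as err:  # connection refused, timeout, etc.
            print("could not reach the credentials endpoint:", err)
        else:
            payload = payload or {}
            # Deliberately avoid printing AccessKeyId / SecretAccessKey / Token.
            print("status:", status)
            print("code:", payload.get("code"))
            print("message:", payload.get("message"))
            print("expiration:", payload.get("Expiration"))
```

If the variable is unset, the container was most likely not given task-role credentials at all; if it is set and the endpoint answers 400, the agent does not recognise the ID, which matches the log line in this issue.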

Domon on 23 Jan 2019

Hi @adnxn

Adding to @superprat 's comments:

Faced the issue again today. Our production systems partially went down since they could not talk to S3 and Kinesis.

ERROR 2019-01-25 13:46:35,298 kinesis put_record_to_stream exception: Error when retrieving credentials from container-role: Error retrieving metadata: Received error when attempting to retrieve ECS metadata: Connect timeout on endpoint URL: "http://169.254.170.2/v2/credentials/XXXXXXXXX",

CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received error when attempting to retrieve ECS metadata: Connect timeout on endpoint URL: "http://169.254.170.2/v2/credentials/XXXXXXXXX"

After 10 minutes this exception automatically went away and the systems started working normally.

Please help us here.

ranvijayj on 25 Jan 2019

@ranvijayj: are you seeing the same errors in the agent logs that @superprat referenced?

specifically, see below:

2019-01-08T12:26:40Z [INFO] CredentialsV2Request: ID not found. Request IP Address: 172.17.0.3:22252
2019-01-08T12:26:40Z [WARN] Unknown eventType: GetCredentialsInvalidRoleType

also, the logs that you referenced - where are these logs originating from? they don't look like agent logs.

adnxn on 5 Feb 2019

👍1

> Eventually the issue resolved itself in a few hours with the original ECS definition and ECR image
>
> I've shared the logs, hope that can provide some insight.

@superprat: so the set of logs you've sent are not at the debug level so our visibility is limited. i suspect the agent for some reason hadn't been relayed the container credentials from our backend by the time your application went looking for them. i think this is the case since you mention the issue is transient.

the [WARN] Unknown eventType: GetCredentialsInvalidRoleType entry is interesting, though I realised we can't see what role type was actually received. we should add more detailed logging for this failure mode.

are you still running into this regularly? i've tried to reproduce this but haven't had any luck.

adnxn on 6 Feb 2019

@adnxn We are also seeing this issue regularly.

on one of our EC2 instances in the cluster, if we run docker logs ecs-agent we see:

2019-02-05T23:37:54Z [INFO] CredentialsV2Request: ID not found. Request IP Address: 10.0.100.189:33724
2019-02-05T23:37:54Z [WARN] Unknown eventType: GetCredentialsInvalidRoleType
2019-02-05T23:37:54Z 400 10.0.100.189:33724 "/v2/credentials" "aws-sdk-go/1.12.66 (go1.10.3; linux; amd64)" -

docker ps for the ecs agent looks like fwiw:

433399f72a24       amazon/amazon-ecs-agent:latest        "/agent"         20 minutes ago        Up 20 minutes        ecs-agent

it appears the ID not found error happens when the API response on /v2/credentials is not successful which leads to services failing. It's only happening on a particular task for us as well. Other tasks are running just fine for us on the same instance in the same cluster.
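To find out which clients are hitting this, a rough sketch like the following can pull the failed /v2/credentials requests out of the agent output. The regular expression is inferred only from the access-log lines pasted in this thread, so treat the line format as an assumption; the function name is hypothetical.

```python
import re

# Matches lines such as:
# 2019-02-05T23:37:54Z 400 10.0.100.189:33724 "/v2/credentials" "aws-sdk-go/..." -
ACCESS_LINE = re.compile(
    r'^(?P<time>\S+)\s+(?P<status>\d{3})\s+(?P<client>\S+)\s+'
    r'"(?P<path>[^"]*)"\s+"(?P<agent>[^"]*)"'
)


def failed_credential_requests(lines):
    """Return (time, status, client) for every non-200 /v2/credentials request."""
    hits = []
    for line in lines:
        match = ACCESS_LINE.match(line.strip())
        if not match:
            continue  # [INFO]/[WARN] lines and anything else are skipped
        if not match.group("path").startswith("/v2/credentials"):
            continue
        if match.group("status") == "200":
            continue
        hits.append(
            (match.group("time"), int(match.group("status")), match.group("client"))
        )
    return hits


if __name__ == "__main__":
    import sys

    # Example: docker logs ecs-agent 2>&1 | python this_script.py
    for when, status, client in failed_credential_requests(sys.stdin):
        print(when, status, client)
```

The client address can then be matched against container IPs to identify the task that is asking for credentials the agent does not have.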

Let me know if there is more information I can provide for you.

jrichard0725 on 6 Feb 2019

Did you notice any Docker timeout or other Docker errors in agent logs when this issue happened?

My theory is that a Docker operation such as inspect failed on the task's container, so the agent moved the task to STOPPED and cleaned up the task's credentials as well. But the container is actually still running and is now requesting creds, which fails because the ID is not found.

sharanyad on 6 Feb 2019

Not really:

2019-02-05T23:35:35Z [INFO] Managed task [arn:aws:ecs:us-west-2:390...:task/45b5e47...]: redundant container state change. style-survey to RUNNING, but already RUNNING
2019-02-05T23:37:54Z [INFO] Handling http requestmethodGETfrom10.0.100.189:33724
2019-02-05T23:37:54Z [INFO] CredentialsV2Request: ID not found. Request IP Address: 10.0.100.189:33724
2019-02-05T23:37:54Z [WARN] Unknown eventType: GetCredentialsInvalidRoleType
2019-02-05T23:37:54Z 400 10.0.100.189:33724 "/v2/credentials" "aws-sdk-go/1.12.66 (go1.10.3; linux; amd64)" - 
...
...
2019-02-06T00:07:12Z [INFO] TCS Websocket connection closed for a valid reason
2019-02-06T00:07:12Z [INFO] Establishing a Websocket connection to https://ecs-t-3.us-west-2.amazonaws.com/ws?cluster=production&containerInstance=arn%3Aaws%3Aecs%3Aus-west-2%3A3909...56%3Acontainer-instance%2Fproduction%2Fbefa...2f5
2019-02-06T00:07:12Z [INFO] Connected to TCS endpoint

If I view the /var/log/docker log on the same instance there is

time="2019-02-05T23:36:13.125150996Z" level=error msg="Error setting up exec command in container ecs-laravel-production-12-...01: Container 5c566...4bbcab15 is not running"
time="2019-02-05T23:36:15.106024184Z" level=error msg="Error setting up exec command in container ecs-laravel-production-12...01: Container 5c566...4bbcab15 is not running"
time="2019-02-05T23:37:54.524648464Z" level=error msg="Failed to create log stream" errorCode=CredentialsEndpointError logGroupName=/ecs/laravel-production logStreamName=ecs-laravel-fpm/laravel-fpm/771f8....05 message="failed to load credentials" origError="InvalidIdInRequest: CredentialsV2Request: ID not found"
time="2019-02-05T23:37:54.524758929Z" level=error msg="Handler for GET /v1.38/containers/ecs-......./logs returned error: failed to create Cloudwatch log stream: CredentialsEndpointError: failed to load credentials\ncaused by: InvalidIdInRequest: CredentialsV2Request: ID not found"
time="2019-02-05T23:48:27.334867542Z" level=error msg="stream copy error: reading from a closed fifo"

Not sure if that's helpful or not. If there are other log files I can poke at please let me know.

jrichard0725 on 6 Feb 2019

@jrichard0725 please feel free to send the full set of logs to sharanyd at amazon.com
If you can reproduce this with log level as DEBUG and obtain those, it would be really helpful.

sharanyad on 6 Feb 2019

The task role, the task execution role, and the EC2 instance role all have the CloudWatchLogsFullAccess policy.

Error response from daemon: failed to create Cloudwatch log stream: CredentialsEndpointError: failed to load credentials caused by: InvalidIdInRequest: CredentialsV2Request: ID not found

Anyone have any ideas?

adampblack on 21 May 2019

I think I know why this error message is coming up now.

Read "Enabling the awslogs Log Driver for Your Containers":
https://docs.docker.com/config/containers/logging/awslogs/

Read "Credentials":
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html

No idea what "AWS_SESSION_TOKEN" is but I went to IAM, added a user with the policy awslogs and got the access and secret keys. I added them as environment variables to container but still got the same error message.

From my point of view, I think this is a BUG. The AWS ECS agent should be providing the EC2 instance role permissions to the Docker containers so they can do logging.

I am going to stop using awslogs - too much hassle.

adampblack on 21 May 2019

Tried this, still not working
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html

adampblack on 22 May 2019

carlosmiguel84 on 9 Jul 2019

👍3

I'm seeing this as well, only happening to one out of the 5 services I'm currently running in ECS.

matthewcummings on 30 Jul 2019

I've just hit this issue on one of my services when I activated "Auto configure Cloudwatch logs" on my container definition. Going back to the container definition, I could see that the options were still there, but the auto-configure checkbox was now unticked. Re-ticking it doesn't fix the issue, but I was at least able to disable logging temporarily and get my service back up. Weirdly the logs were still getting through to CloudWatch =/

FredDeschenes on 6 Sep 2019

I'm unable to reproduce the issue. If anyone still has this issue, please send the following information to ecs-agent-external AT amazon.com:

Logs on the instance (ideally on debug level and collected by https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-logs-collector.html which gives us more information)
Affected task ID
Task definition

Thanks.

fenxiong on 7 Oct 2019

Hi,
Sorry you’re facing this issue. Currently, the ECS Agent does not persist credentials information, for security reasons.
So, when the agent restarts, the credentials for its tasks are streamed to the agent by the ECS service. It is possible for the message containing the credentials to get lost in transit. If the task’s container requests credentials during that window, the agent does not hold them and returns an ID not found response, hence the HTTP 400 error.
We will make this error message clearer and work on a server-side fix to detect this state sooner.
For now, as a workaround, if this error occurs we suggest restarting the agent manually to see if that helps.
We also suggest sending the instance debug logs as mentioned in the above comment by @fenxiong.
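On the application side, the transient window described above can sometimes be ridden out with a retry. The sketch below is not the workaround suggested in this comment (that is restarting the agent) and would not help with the hours-long outages reported earlier in the thread; it only illustrates exponential backoff around a caller-supplied fetch function. `CredentialsNotReady` and `fetch_with_retry` are hypothetical names.

```python
import time


class CredentialsNotReady(Exception):
    """Raised by the caller's fetch function when the agent answers 400 / ID not found."""


def fetch_with_retry(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off 1x, 2x, 4x ... base_delay between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except CredentialsNotReady as err:
            last_error = err
            # No point sleeping after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

The `sleep` parameter is injectable so the backoff schedule can be checked without waiting; any other exception raised by `fetch` is passed through untouched.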

Thanks,
Sharanya

sharanyad on 10 Oct 2019

Closing. Please re-open if you face this issue. As a prerequisite to reopening, please send instance debug logs as mentioned in the comment above by @fenxiong

fierlion on 27 Apr 2020

Hello,
Lately we have started facing a similar issue. According to the logs in CloudWatch, this is the error we're running into. Any help would be appreciated. Thanks
```
botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received non 200 response (400) from ECS metadata:
{
"code": "InvalidIdInRequest",
"message": "CredentialsV2Request: Credentials not found",
"HTTPErrorCode": 400
}
```

PratMis on 7 May 2020

Hi,
Sorry you’re facing this issue. For the current workaround, please refer to @sharanyad's comment above.
We also suggest sending the instance debug logs as mentioned in the above comment by @fenxiong.

Thanks,
Meghna

mssrivas on 21 May 2020

Hi all,
We have deployed a service-side change to handle this, so it should be fixed now. I'm closing this issue. If you face it again, please feel free to re-open and send task info and agent-level debug logs to ecs-agent-external AT amazon.com.

Thanks,
Sharanya

sharanyad on 17 Jun 2020

I am running into the same issue. I am using the AWS SDK inside my container to get some sensitive data from Secrets Manager, where I grant read access only to the container's IAM role.

Because of this error, the container cannot connect to Secrets Manager or even to plain STS.

What I do is

I have an IAM role for the instance. I have another IAM role for the container.
From the EC2 Instance I can easily call aws sts get-caller-identity and get the response back.
From the container, when I call the same aws sts get-caller-identity it fails.

Error inside the container:

Error when retrieving credentials from container-role: Error retrieving metadata: Received non 200 response (400) from ECS metadata: {"code":"InvalidIdInRequest","message":"CredentialsV2Request: Credentials not found","HTTPErrorCode":400}
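The JSON body in that error is the same one reported earlier in the thread, so code that wants to tell this specific failure apart from other credential errors could check for it. A minimal sketch, using only the three fields visible in the message above; the function name is hypothetical.

```python
import json


def is_missing_task_credentials(body):
    """True if body is the agent's 400 / InvalidIdInRequest error payload."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return False  # not JSON at all
    if not isinstance(payload, dict):
        return False
    return (
        payload.get("HTTPErrorCode") == 400
        and payload.get("code") == "InvalidIdInRequest"
    )
```

The check intentionally ignores the "message" text, since the thread shows it changing between agent versions ("ID not found" versus "Credentials not found").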

Same log lines from docker logs ecs-agent of the instance:

level=warn time=2021-01-14T23:09:48Z msg="Unknown eventType: GetCredentialsInvalidRoleType" module=entry_types.go
level=error time=2021-01-14T23:09:48Z msg="HTTP response status code is '400', request type is: credentials, and response in JSON is {\"code\":\"InvalidIdInRequest\",\"message\":\"CredentialsV2Request: Credentials not found\",\"HTTPErrorCode\":400}" module=helpers.go
2021-01-14T23:09:48Z 400 172.17.0.2:45020 "/v2/credentials" "" -

Things that do work

From the container I can do curl http://169.254.169.254/latest/meta-data/identity-credentials/ec2/info and it properly returns AccountId. So does curl http://169.254.169.254/latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance and I see AccessKeyId and SecretAccessKey values for the temporary token that expires in 6 hours. So, there is a valid token actually created.
From the instance when I call aws secretsmanager get-secret-value --secret-id <my-secret-id-name>, it properly returns AccessDeniedException for the "Instance Role", as expected. I can lift that via giving the read access to the instance role. So there is nothing wrong between the Instance and secretsmanager in terms of connection.

Also worth mentioning

I am not doing anything special -- just created an ECS Cluster (default AMI from the wizard that is ecs-optimized) and tried to run a task on the EC2 Instances it created.
I tried changing the clusters and instances (from t3.micro to t3.medium) and it didn't help.
I tried playing with ECS Agent parameters (e.g. ECS_ENABLE_TASK_IAM_ROLE, ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST), and none of the ones I tried helped. In fact, they caused connection timeouts between the container and the instance, so I dropped the instance and started fresh again, with still the same problem. I assume the defaults should work when I go through the wizard and just pick the instance type to be t3.medium.
I tried changing the docker network type to default/host/bridge in the Task Definition, but again none works.
I have collected the ECS logs via ecs-logs-collector.sh (per https://github.com/aws/amazon-ecs-agent/issues/1146#issuecomment-353195570) and I can send them over if you are interested.

eyedean on 15 Jan 2021

After a full day of investigation, I found the root cause of my case, and I'm left with a big "how-to" question.

What was happening in my case

I had 3 containers that I wanted to run together as a single task (3 containers, one instance). I had marked all of them as essential. Apparently one of them (container A) was dying on start-up (an error on my side), which was causing B and C to stop as well. I was looking solely at C and didn't know why it was dying right after starting up, with no error from the container or in the UI.

So, to debug further, I was manually SSHing into the EC2 instance and running docker start <container-id> on container C to see what was going on. That's why I was getting this error: it appeared after the manual start!
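A situation like this (one essential container exiting and taking the others with it) can be easier to spot from the task description than from a single container's logs. The sketch below flattens a boto3 `describe_tasks` response into one row per container. It assumes the response fields `tasks[].stoppedReason` and `tasks[].containers[]` with `name`, `lastStatus`, `exitCode`, and `reason`; the cluster and task identifiers in the usage part are placeholders.

```python
def summarize_containers(describe_tasks_response):
    """Flatten a describe_tasks response into one dict per container."""
    rows = []
    for task in describe_tasks_response.get("tasks", []):
        for container in task.get("containers", []):
            rows.append(
                {
                    "task": task.get("taskArn"),
                    "stopped_reason": task.get("stoppedReason"),
                    "container": container.get("name"),
                    "status": container.get("lastStatus"),
                    "exit_code": container.get("exitCode"),
                    "reason": container.get("reason"),
                }
            )
    return rows


if __name__ == "__main__":
    import boto3  # third-party; needs AWS credentials with ecs:DescribeTasks

    ecs = boto3.client("ecs")
    # "my-cluster" and "my-task-id" are placeholders, not values from this thread.
    response = ecs.describe_tasks(cluster="my-cluster", tasks=["my-task-id"])
    for row in summarize_containers(response):
        print(row)
```

Comparing exit codes and reasons across all containers of the task shows whether the container being debugged failed by itself or was stopped because a sibling exited.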

Kinda makes sense

My guess is that the IAM roles and some configs/credentials are passed to the container only when the container is started (not just created, but also started) by the ECS Agent.
That should be why manually running docker start <container-id> on the dead container left it in this weird state: after going into the container via docker exec -u 0 -it <container-id> bash, I would see the above errors when trying to make any call to AWS services (e.g. aws sts get-caller-identity).

So, What to do then?

I understand that starting/restarting docker containers manually is not how things should be handled via ECS. But in case someone needs that (e.g. you are time-crunched to make a minor change to the container and restart it), how would they be able to do this? What's the equivalent of docker start/restart <container-id> in an ecs-agent-controlled instance?

PS. Cross-reference to Stack Overflow: https://stackoverflow.com/a/65743014

eyedean on 15 Jan 2021
