Amazon-ecs-agent: Aws::Errors::MissingCredentialsError: unable to sign request without credentials set

Created on 13 Feb 2018 · 6 Comments · Source: aws/amazon-ecs-agent

There is an issue with the latest ECS Agent 1.17.0 AMI which is breaking authentication for one of our Ruby applications in an inconsistent manner.

This fault only occurred for us after upgrading to the ECS agent 1.17.0 using AWS AMI amzn-ami-2017.09.h-amazon-ecs-optimized (ami-5e414e24) but does not occur on earlier images, such as amzn-ami-2017.09.e-amazon-ecs-optimized (ami-13401669) with ECS agent 1.16.1.

Note that we're not sure yet whether this is the agent's fault (it could be something else in the AMI, e.g. Docker, the kernel, or some bug in the Ruby SDK that is only triggered with the newer ECS Agent).

We have AWS Business Support, so we have logged a more formal support ticket, but I wanted to log an issue here in case anyone else is experiencing this and searching online for this error string, so that we can pool our knowledge.

Demonstration of the issue:

1 Uploading s3://example/530b1815-01dc-4add-b89a-4db0cf3da660...
2 Uploading s3://example/251ea219-c480-4857-94cb-0736a2a66ac8...
3 Uploading s3://example/0949cec1-e6fa-4a3a-b4bc-2d2da5a6a024...
4 Uploading s3://example/31efdb30-1a60-4243-b4c6-0d46ea09fe57...
5 Uploading s3://example/1c78dbbc-aa01-42c7-994a-8b06ece260c5...
6 Uploading s3://example/e0a3c434-25e7-4a36-85e0-ca193c90b546...
7 Uploading s3://example/238a685d-325b-4925-8b52-751847534681...
8 Uploading s3://example/e1561427-16d0-444a-ad78-4e884fc5ab28...
9 Uploading s3://example/b83e9cd4-f54c-499b-a038-445d8a30d115...
10 Uploading s3://example/de1fdd18-1b92-46e9-8961-f6fd670148c5...
11 Uploading s3://example/d5183b91-3b21-412f-b370-7d53408e7479...
12 Uploading s3://example/18ea17df-8ed5-4572-bb32-ecfcf92eb06d...
13 Uploading s3://example/e23bcdc7-5105-4ae0-8e16-74737b2fc093...
14 Uploading s3://example/cdfe7b07-0e7e-489a-b560-1299734f25bb...
15 Uploading s3://example/57a6e6eb-3858-4e25-a5d0-371198cf79d3...
16 Uploading s3://example/e9c88ff3-1077-4aea-b120-27068a3be9c6...
17 Uploading s3://example/66d1d4fc-1a46-4266-9936-0920d5d82117...
18 Uploading s3://example/7a787d9b-5034-453f-a6ad-645f654bf4db...
19 Uploading s3://example/d6d1b7b0-c503-4ead-911b-c9dda4225715...
bundler: failed to load command: demo.rb (demo.rb)
Aws::Errors::MissingCredentialsError: unable to sign request without credentials set
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/request_signer.rb:104:in `require_credentials'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_request_signer.rb:14:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_host_id.rb:14:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/xml/error_handler.rb:8:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:10:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_request_signer.rb:65:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_redirects.rb:15:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/retry_errors.rb:89:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_dualstack.rb:32:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_accelerate.rb:49:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_md5s.rb:31:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_expect_100_continue.rb:21:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_bucket_name_restrictions.rb:12:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_bucket_dns.rb:31:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/rest/handler.rb:7:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/user_agent.rb:12:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/seahorse/client/plugins/endpoint.rb:41:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/param_validator.rb:21:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/seahorse/client/plugins/raise_response_errors.rb:14:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_sse_cpk.rb:19:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_dualstack.rb:24:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/s3_accelerate.rb:34:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:20:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/idempotency_token.rb:18:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/aws-sdk-core/plugins/param_converter.rb:20:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/seahorse/client/plugins/response_target.rb:21:in `call'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/seahorse/client/request.rb:70:in `send_request'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.10.129/lib/seahorse/client/base.rb:207:in `block (2 levels) in define_operation_methods'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-resources-2.10.129/lib/aws-sdk-resources/services/s3/file_uploader.rb:42:in `block in put_object'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-resources-2.10.129/lib/aws-sdk-resources/services/s3/file_uploader.rb:49:in `open_file'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-resources-2.10.129/lib/aws-sdk-resources/services/s3/file_uploader.rb:41:in `put_object'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-resources-2.10.129/lib/aws-sdk-resources/services/s3/file_uploader.rb:34:in `upload'
/demo/vendor/bundle/ruby/2.3.0/gems/aws-sdk-resources-2.10.129/lib/aws-sdk-resources/services/s3/object.rb:252:in `upload_file'
demo.rb:20:in `<top (required)>'

We wrote a container to demonstrate/POC this bug, which you can get from:
https://github.com/carnivalmobile/demo-breaking-ecs-iam


All 6 comments

Hi @jcarr-sailthru, we're sorry that you're running into this issue. In the v1.17.0 agent, a rate limiter was added to the credentials endpoint, which allows clients to invoke the endpoint at a steady-state rate of 10 per second and a burst of 15 per second: https://github.com/aws/amazon-ecs-agent/blob/761937f7fb98a06bea00c4f55f9c594560e73725/agent/handlers/taskmetadata/handler.go#L112
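To make the "10 per second, burst of 15" behaviour concrete, here is a minimal token bucket sketch in Ruby. It is not the agent's implementation (that lives in the linked handler.go); the `TokenBucket` class and the simulated request timings are assumptions chosen purely for illustration.

```ruby
# Generic token bucket: `rate` tokens are added per second, capped at `burst`.
# This is an illustrative model, not the ECS agent's actual code.
class TokenBucket
  def initialize(rate:, burst:, now: 0.0)
    @rate = rate.to_f
    @burst = burst.to_f
    @tokens = @burst # start full, so an initial burst is allowed
    @last = now
  end

  # Returns true if a request arriving at time `now` (in seconds) is allowed.
  def allow?(now)
    elapsed = now - @last
    @last = now
    # Refill for the time that has passed, never exceeding the burst size.
    @tokens = [@burst, @tokens + elapsed * @rate].min
    return false if @tokens < 1.0

    @tokens -= 1.0
    true
  end
end

# Simulate a client asking for credentials 100 times, one request per millisecond.
bucket = TokenBucket.new(rate: 10, burst: 15)
results = (0...100).map { |n| bucket.allow?(n * 0.001) }
allowed = results.count(true)
throttled = results.count(false)
puts "allowed=#{allowed} throttled=#{throttled}"
```

In this model the first 15 requests drain the bucket and every later request in the same 100 ms window is rejected, which is the kind of pattern a tight loop of new clients would hit.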

The issue that you're running into is easily reproducible since the code is creating S3 clients as fast as possible. Each new S3 client sends a credentials query to the ECS agent every time it's invoked. However, most of the SDKs have built-in logic to cache credentials for some period of time. Since the code at https://github.com/carnivalmobile/demo-breaking-ecs-iam/blob/1cd5dcac208977c35c190de41aa762804696b911/demo.rb#L19 is creating thousands of clients every second, it'll end up getting throttled at some point in time. However, the following modification makes this container exit with an error code of 0 (1000 runs without failure - probably an OK AMI).

$ git diff
diff --git a/demo.rb b/demo.rb
index 835a4e8..87d0d84 100755
--- a/demo.rb
+++ b/demo.rb
@@ -7,6 +7,7 @@ require 'tempfile'
bucket = ENV['S3_BUCKET_NAME']

i=0
+s3 = Aws::S3::Resource.new
while true
i += 1
filename = SecureRandom.uuid
@@ -16,7 +17,6 @@ while true

puts "#{i} Uploading s3://#{bucket}/#{filename}..."

- s3 = Aws::S3::Resource.new
s3.bucket(bucket).object(filename).upload_file(tmp_file.path)

tmp_file.unlink

The throttling logic was added to ensure that containers on an instance are protected from each other so that an overzealous container doesn't cause an outage for other containers on the instance. It seems like current limits are probably too conservative for your use-case. I'm not sure if the modified code sample above is helpful to you. We'll also look at increasing these limits and also make them configurable.

Thanks,
Anirudh

hi Anirudh,

Thanks for the quick and detailed reply, appreciate that.

I agree that the code example isn't ideal and your suggestion of caching the S3 object makes a lot of sense, although sometimes a specific framework makes this harder than expected, e.g. we found the issue with a Sidekiq worker which executes from a clean state for every single message it picks up off the queue. That being said, we can fix it. :-)
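For the per-message worker case, one possible approach is to memoize the client at process level so that each job reuses it. The sketch below assumes the worker process is long-lived even though a fresh worker object is created per job; `S3Handle`, `UploadWorker` and `FakeResource` are hypothetical names, and `FakeResource` stands in for `Aws::S3::Resource` so the example runs without the SDK or network access.

```ruby
# Hypothetical helper: builds the client once per process and reuses it.
module S3Handle
  @mutex = Mutex.new
  @resource = nil

  class << self
    # The block is only called on first use; later calls return the cached object.
    def fetch
      @mutex.synchronize { @resource ||= yield }
    end
  end
end

# Stand-in for Aws::S3::Resource that counts how many times it is built.
class FakeResource
  @built = 0
  class << self
    attr_accessor :built
  end

  def initialize
    self.class.built += 1
  end
end

# Hypothetical worker: a new instance handles each job, but all instances
# share the single client held by S3Handle.
class UploadWorker
  def perform(_key)
    # In real code the block would be: { Aws::S3::Resource.new }
    S3Handle.fetch { FakeResource.new }
  end
end

handles = 1000.times.map { |i| UploadWorker.new.perform("file-#{i}") }
puts "clients built: #{FakeResource.built}"
```

Because only one client is ever constructed, only that client's own credential caching decides how often the credentials endpoint is queried.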

I have a couple other items of feedback for consideration:

  1. The SDK really needs to be amended to distinguish auth that failed because of rate limiting from auth that failed because credentials are missing. E.g. we should have an Aws::Errors::AuthRateLimitExceeded error that makes it clear(er) what the fault is.

  2. I'd consider this a potentially breaking change, yet it was not identified in the ECS Agent release notes. It might be worth adding retrospectively for those who have not yet updated, and keeping this kind of change in mind for clearer disclosure in future.
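The first point can be sketched as a small classification step. Neither error class below exists in the AWS SDK for Ruby; both names are hypothetical, and the sketch assumes that a throttled credentials request is answered with HTTP 429, which this thread does not confirm.

```ruby
# Hypothetical error classes; neither is part of the AWS SDK for Ruby.
class CredentialsRateLimited < StandardError; end
class CredentialsUnavailable < StandardError; end

# Maps the HTTP status of a credentials-endpoint response to an outcome.
# Assumption: throttled requests are answered with HTTP 429.
def check_credentials_response(status)
  case status
  when 200..299
    :ok
  when 429
    raise CredentialsRateLimited, "credentials endpoint throttled the request (HTTP 429)"
  else
    raise CredentialsUnavailable, "credentials endpoint returned HTTP #{status}"
  end
end

# Shows how a caller could react differently to the two failure modes.
def describe(status)
  check_credentials_response(status)
  "ok"
rescue CredentialsRateLimited => e
  "retry later: #{e.message}"
rescue CredentialsUnavailable => e
  "give up: #{e.message}"
end

[200, 429, 404].each { |status| puts "#{status} -> #{describe(status)}" }
```

With separate error types, a caller can back off and retry on throttling while still failing fast when credentials are genuinely absent.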

regards,
Jethro

hi Anirudh,

One further question - you mentioned that the throttling was added to prevent one container from causing major disruption to another container.

Am I correct in thinking that on a typical EC2 instance with IAM, there is no rate limit on the credentials endpoint, so this protection is purely to stop the HTTP server in the ECS agent itself from being overloaded, rather than the combination of multiple containers overloading the underlying EC2 host's credentials endpoint?

The plan to increase the limits and make them configurable sounds like a good course of action to mitigate apps that are tricky to change - although we will make a change to our app since any excess HTTP calls are wasted effort and we like regaining those precious milliseconds :-)

regards,
Jethro

> I'd consider this a potentially breaking change, yet it was not identified in the ECS Agent release notes.

I get your point. We'll see if we can retroactively capture this in our release notes.

> Am I correct in thinking that on a typical EC2 instance with IAM, there is no rate limit on the credentials endpoint, so this protection is purely to stop the HTTP server in the ECS agent itself from being overloaded, rather than the combination of multiple containers overloading the underlying EC2 host's credentials endpoint?

That is correct. In the worst case, one of your containers might cause one of your other containers to have a lag/failure while retrieving credentials if it's hogging all the bandwidth. The lack of protection for EC2 instances' credentials endpoint (EC2 instance metadata service) is also a gap.

> The plan to increase the limits and make them configurable sounds like a good course of action to mitigate apps that are tricky to change

Thanks for confirming that. We'll definitely try to address this in the next release.

> although we will make a change to our app since any excess HTTP calls are wasted effort and we like regaining those precious milliseconds

Sounds great!

Thanks for all the details Anirudh!

@jcarr-sailthru thank you so much for posting this bug report. I've also been affected by this throttling issue. I suspected it was some kind of new limitation introduced with ecs-agent 1.17.0, but I could not confirm that from the changelogs or the lists of new features. Also, thank you @aaithal for confirming this information.
