Aws-sdk-net: Dotnet Core 2.1 - AmazonSQSClient AmazonServiceException when call ReceiveMessageAsync

Created on 17 Jul 2018  路  25Comments  路  Source: aws/aws-sdk-net

Hi.
When I updated dotnet core (2.0.6 -> 2.1) and deployed on server, we met some error about credentials in AmazonSQSClient.
I think it is from dotnet core 2.1 HttpClientHandler update.
So I rollback dotnet core version (2.1 -> 2.0.6), then there is no exception anymore.

Expected Behavior


Whenever we call Amazon.SQS.AmazonSQSClient.ReceiveMessageAsync(ReceiveMessageRequest request, CancellationToken cancellationToken = default(CancellationToken)), there should be no exception.

Current Behavior





The function throws exception very frequently(3 minutes in 6 hour average) like below.

Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)

Context



I want to get the network performance provided by dotnet core 2.1, with AWS features.

Your Environment

  • AWSSDK.Core version used: 3.3.24.2
  • AWSSDK.SQS version used: 3.3.3.10
  • Operating System and version: Amazon Linux AMI 2017.09.k x86_64 ECS HVM GP2
  • Dotnet Build Docker Image: microsoft/dotnet:2.1-sdk
  • Dotnet Runtime Docker Image: microsoft/dotnet:2.1-runtime

.NET Core Info

  • .NET Core version used for development: 2.1
  • .NET Core version installed in the environment where application runs: microsoft/dotnet:2.1-runtime (docker image)
  • Output of dotnet --info:
Host (useful for support):
  Version: 2.1.2
  Commit:  811c3ce6c0

.NET Core SDKs installed:
  No SDKs were found.

.NET Core runtimes installed:
  Microsoft.NETCore.App 2.1.2 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
guidance

Most helpful comment

We have a web api on .NET core 2.1 and publishing messages to SNS. We have seen that same error call-stack in our logs:
Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service. at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials() at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()

We have the problem seen at the same time that sometimes a machine's CPU goes up to 100%, requests fail, and about half the time it recovers, and half the time the resulting healthcheck failure causes it to be terminated. This severe issue means that we are not rolling out .NET core into production any further.

Since something is going wrong in the class DefaultInstanceProfileAWSCredentials, we therefore raised this PR for discussion: #1010

My questions related to this class are:

  • Why do you use Instance?.logger.Error() on line 155? Instance?. - with a null check - is not done anywhere else. if Instance is null, it will fail in other places, including the line that immediately follows. I don't think that Instance can be null anywhere in RenewCredentials().

  • The Instance getter uses a lock to make sure that the _instance is not null. But the lock is acquired every time the property is accessed. This appears to be an ideal candidate for double-checked locking to reduce the amount of locking here drastically.

  • I would expect Instance.lastRetrievedCredentials = null on line 159 to come before the logging that preceeds it, since getting into the desired state is more important and faster than the logging. If logging takes a while or blocks, then other threads have more chance to access the state that you have decided is unwanted.

  • As pointed out by @arishlabroo, is it necessary to set the creds to null at all? Is keeping the previous creds not a better option?

  • Are you aware of the issues with spinlocks under load? This looks a bit like the CPU spikes that we see. 5 seconds (line 42) is a very long time to wait on a cpu-intensive spinlock.

  • Where the Spinlock is used, Is any lock necessary at all? It is around assignments and reads. .NET reference assignment is an atomic operation anyway. There is no halfway state that other threads need to be locked away from, inside a statement of the form a = b;. I _think_ this is also true of statements like a.b.c = d;.

If you're worried about things changing during your slightly more complex read expression, then you can do
var mine = lastRetrievedCredentials; // atomic assign credentials = mine?.Copy();

But if the IL for that is the same as before, then there is no need for even a temp var.

  • Have you considered not using Instance in RenewCredentials() since the Instance getter can potentially call RenewCredentials() again. There may be no actual potential for recursion, but it gives me a bad feeling to call the property that called you, to essentially get this with repeated checks and locks. You can use _instance, or even just use this implicitly, which should be the same object as Instance. If they're not the same not, you have other issues.

The way that EC2InstanceMetadata.IAMSecurityCredentials is sync but actually also async inside AsyncHelpers is a deeper issue that requires a more comprehensive fix, out of scope today.

All 25 comments

Interesting, could you try running on .NET Core 2.1 again and set the environment variable DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER equal to 0. This is a .NET Core environment variable that you can use to toggle back to the libcurl implementation of HttpClient. I'm really curious if you see a change in behavior.

These are what I found on AWS SDK Internal Log when AmazonSQSClient threw Exception without the env DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER setting

DefaultInstanceProfileAWSCredentials 5390|2018-07-19T03:33:08.174Z|ERROR|Failed to retrieve credentials from EC2 Instance Metadata Service. --> System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException: Operation canceled
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
at System.Net.Http.HttpConnection.FillAsync()
at System.Net.Http.HttpConnection.ReadNextResponseHeaderLineAsync(Boolean foldedHeadersAllowed)
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
at System.Net.Http.HttpClient.GetStringAsyncCore(Task`1 getTask)
at Amazon.Runtime.Internal.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Amazon.Runtime.Internal.Util.AsyncHelpers.ExclusiveSynchronizationContext.BeginMessageLoop()
at Amazon.Runtime.Internal.Util.AsyncHelpers.RunSync[T](Func`1 task)
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
at Amazon.Util.EC2InstanceMetadata.get_IAMSecurityCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)


AmazonSQSClient 5393|2018-07-19T03:33:10.306Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
DefaultInstanceProfileAWSCredentials 11507|2018-07-19T04:41:17.431Z|ERROR|Failed to retrieve credentials from EC2 Instance Metadata Service. --> System.OperationCanceledException: The operation was canceled.
at System.Net.Http.HttpClient.HandleFinishSendAsyncError(Exception e, CancellationTokenSource cts)
at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
at System.Net.Http.HttpClient.GetStringAsyncCore(Task`1 getTask)
at Amazon.Runtime.Internal.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Amazon.Runtime.Internal.Util.AsyncHelpers.ExclusiveSynchronizationContext.BeginMessageLoop()
at Amazon.Runtime.Internal.Util.AsyncHelpers.RunSync[T](Func`1 task)
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
at Amazon.Util.EC2InstanceMetadata.get_IAMSecurityCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)


AmazonSQSClient 11510|2018-07-19T04:41:18.111Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)

As we can see, the exception come from here: https://github.com/aws/aws-sdk-net/blob/3cacef5654bc17169f7b2e8f5fd86b92fbbd2243/sdk/src/Core/Amazon.Util/AWSSDKUtils.cs#L941-L953

And after set the environment variable as you suggest, there is still exception (but rarely than previous).
I attached logs containing request information.

efaultInstanceProfileAWSCredentials 13559|2018-07-19T07:25:59.131Z|ERROR|Failed to retrieve credentials from EC2 Instance Metadata Service. --> System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.IO.Stream.CopyToAsyncInternal(Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.GetStringAsyncCore(Task`1 getTask)
at Amazon.Runtime.Internal.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Amazon.Runtime.Internal.Util.AsyncHelpers.ExclusiveSynchronizationContext.BeginMessageLoop()
at Amazon.Runtime.Internal.Util.AsyncHelpers.RunSync[T](Func`1 task)
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
at Amazon.Util.EC2InstanceMetadata.get_IAMSecurityCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)
AmazonSQSClient 13560|2018-07-19T07:26:14.118Z|DEBUG|Received response (truncated to 1024 bytes): [<?xml version="1.0"?><ReceiveMessageResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><ReceiveMessageResult/><ResponseMetadata><RequestId>21d0c202-d6a1-5bd2-b89b-91c8e9e36a52</RequestId></ResponseMetadata></ReceiveMessageResponse>]
AmazonSQSClient 13561|2018-07-19T07:26:14.119Z|INFO|Request metrics:
{
    "properties": {
        "AsyncCall": "True",
        "CanonicalRequest": "{secret}",
        "StringToSign": "{secret}",
        "ServiceName": "Amazon.SQS",
        "ServiceEndpoint": "https://sqs.ap-northeast-1.amazonaws.com/",
        "MethodName": "ReceiveMessageRequest",
        "RequestSize": "102",
        "StatusCode": "OK",
        "BytesProcessed": "240",
        "AWSRequestID": "21d0c202-d6a1-5bd2-b89b-91c8e9e36a52"
    },
    "timings": {
        "CredentialsRequestTime": 0.0088,
        "RequestSigningTime": 0.5013000000000001,
        "HttpRequestTime": 20007.033900000002,
        "ResponseUnmarshallTime": 0.33140000000000003,
        "ResponseProcessingTime": 0.5935,
        "ClientExecuteTime": 20008.325500000003
    },
    "counters": {}
}
AmazonSQSClient 13562|2018-07-19T07:26:14.136Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
AmazonSQSClient 13563|2018-07-19T07:26:14.137Z|INFO|Request metrics:
{
    "properties": {
        "AsyncCall": "True"
    },
    "timings": {
        "CredentialsRequestTime": 0.7505000000000001,
        "ClientExecuteTime": 13.6751
    },
    "counters": {}
}
AmazonSQSClient 13564|2018-07-19T07:26:14.150Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
AmazonSQSClient 13565|2018-07-19T07:26:14.151Z|INFO|Request metrics:
{
    "properties": {
        "AsyncCall": "True"
    },
    "timings": {
        "CredentialsRequestTime": 0.0601,
        "ClientExecuteTime": 0.8111
    },
    "counters": {}
}

also, looking at the implementation here

https://github.com/aws/aws-sdk-net/blob/master/sdk/src/Core/Amazon.Runtime/Credentials/_bcl+coreclr/DefaultInstanceProfileAWSCredentials.cs

Looks like even if there is a failure to get the credentials, null credentials are cached for 2 minutes. All subsequent calls fail with default FailedToGetCredentialsMessage. This is a major issue we are running into production without any seeming workaround.

Is there a workaround 馃 you can recommend till you look into this more?

setting the DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER didnt get rid of it for us.

With @arishlabroo saying, I think it should not set null but remain previous Instance.lastRetrievedCredentials when thrown TaskCanceledException.
Like it :

catch (TaskCanceledException e)
{
    Instance?.logger.Error(e, "RenewCredentials task canceled");
}
catch (Exception e)
{
    // we want to suppress any exceptions from this timer task.
    Instance?.logger.Error(e, FailedToGetCredentialsMessage);

    Instance.refreshLock.TryEnterWriteLock(lockWaitTimeOut);
    {
        Instance.lastRetrievedCredentials = null;
    }
    Instance.refreshLock.ExitWriteLock();
}

We have a web api on .NET core 2.1 and publishing messages to SNS. We have seen that same error call-stack in our logs:
Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service. at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials() at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()

We have the problem seen at the same time that sometimes a machine's CPU goes up to 100%, requests fail, and about half the time it recovers, and half the time the resulting healthcheck failure causes it to be terminated. This severe issue means that we are not rolling out .NET core into production any further.

Since something is going wrong in the class DefaultInstanceProfileAWSCredentials, we therefore raised this PR for discussion: #1010

My questions related to this class are:

  • Why do you use Instance?.logger.Error() on line 155? Instance?. - with a null check - is not done anywhere else. if Instance is null, it will fail in other places, including the line that immediately follows. I don't think that Instance can be null anywhere in RenewCredentials().

  • The Instance getter uses a lock to make sure that the _instance is not null. But the lock is acquired every time the property is accessed. This appears to be an ideal candidate for double-checked locking to reduce the amount of locking here drastically.

  • I would expect Instance.lastRetrievedCredentials = null on line 159 to come before the logging that preceeds it, since getting into the desired state is more important and faster than the logging. If logging takes a while or blocks, then other threads have more chance to access the state that you have decided is unwanted.

  • As pointed out by @arishlabroo, is it necessary to set the creds to null at all? Is keeping the previous creds not a better option?

  • Are you aware of the issues with spinlocks under load? This looks a bit like the CPU spikes that we see. 5 seconds (line 42) is a very long time to wait on a cpu-intensive spinlock.

  • Where the Spinlock is used, Is any lock necessary at all? It is around assignments and reads. .NET reference assignment is an atomic operation anyway. There is no halfway state that other threads need to be locked away from, inside a statement of the form a = b;. I _think_ this is also true of statements like a.b.c = d;.

If you're worried about things changing during your slightly more complex read expression, then you can do
var mine = lastRetrievedCredentials; // atomic assign credentials = mine?.Copy();

But if the IL for that is the same as before, then there is no need for even a temp var.

  • Have you considered not using Instance in RenewCredentials() since the Instance getter can potentially call RenewCredentials() again. There may be no actual potential for recursion, but it gives me a bad feeling to call the property that called you, to essentially get this with repeated checks and locks. You can use _instance, or even just use this implicitly, which should be the same object as Instance. If they're not the same not, you have other issues.

The way that EC2InstanceMetadata.IAMSecurityCredentials is sync but actually also async inside AsyncHelpers is a deeper issue that requires a more comprehensive fix, out of scope today.

Not to hijack this issue, but I'm getting a really similar exception though it sounds like in a different environment (.NET 4.7.1, ECS and SSM not SQS).

We're running a Windows IIS docker image in ECS, running our ASP.NET 4.7.1 web application. AWSSDK is latest available in nuget. We're getting this with the SimpleSystemsManagement Client and not SQS, but this error seems to be happening in AWSSDK.Core so I thought it may be related.

I've verified that we have all of the ECS Task IAM Role stuff configured correctly, I see the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable set, if I docker exec to the container I'm able to do an Invoke-WebRequest "http://169.254.170.2$($env:AWS_CONTAINER_CREDENTIALS_RELATIVE_URI)" -UseBasicParsing and see what looks like a valid metadata document. I've restarted the task a couple times to try different ECS hosts. Nothing seems to work here.

Sorry if this turns out to be a different/unrelated issue, but I thought it might be helpful for tracking this issue down since they seem so similar.

Stack Trace:

2018-07-31 03:32:18.8118 ERROR [R= C= I=] An exception occurred that was not caught by the controller
EXCEPTION: System.InvalidOperationException: Failed to retrieve credentials from EC2 Instance Metadata Service. (See inner exception for details.) ---> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.MetricsHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RuntimePipeline.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.AmazonServiceClient.Invoke[TRequest,TResponse](TRequest request, IMarshaller`2 marshaller, ResponseUnmarshaller unmarshaller)
   at Amazon.SimpleSystemsManagement.AmazonSimpleSystemsManagementClient.GetParameter(GetParameterRequest request)
   at Name.Changed.To.Protect.The.Innocent.AmazonParameterStoreConfigurationProvider.RetrieveValue(String name) in C:\w\Source\Path\Changed\Too\AmazonParameterStoreConfigurationProvider.cs:line 27
   at lambda_method(Closure , Object[] )
   at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
   --- End of inner exception stack trace ---
   at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
   at Autofac.Core.Activators.Reflection.ReflectionActivator.ActivateInstance(IComponentContext context, IEnumerable`1 parameters)
   at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
   --- End of inner exception stack trace ---
   at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
   at Autofac.Core.Lifetime.LifetimeScope.GetOrCreateAndShare(Guid id, Func`1 creator)
   at Autofac.Core.Resolving.InstanceLookup.Execute()
   at Autofac.Core.Resolving.ResolveOperation.GetOrCreateInstance(ISharingLifetimeScope currentOperationScope, IComponentRegistration registration, IEnumerable`1 parameters)
   at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
   at Autofac.Core.Activators.Reflection.ReflectionActivator.ActivateInstance(IComponentContext context, IEnumerable`1 parameters)
   at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
   --- End of inner exception stack trace ---
   at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
   at Autofac.Core.Resolving.InstanceLookup.Execute()
   at Autofac.Core.Resolving.ResolveOperation.GetOrCreateInstance(ISharingLifetimeScope currentOperationScope, IComponentRegistration registration, IEnumerable`1 parameters)
   at Autofac.Core.Resolving.ResolveOperation.Execute(IComponentRegistration registration, IEnumerable`1 parameters)
   at Autofac.ResolutionExtensions.TryResolveService(IComponentContext context, Service service, IEnumerable`1 parameters, Object& instance)
   at Autofac.ResolutionExtensions.ResolveOptionalService(IComponentContext context, Service service, IEnumerable`1 parameters)
   at Autofac.Integration.Mvc.AutofacDependencyResolver.GetService(Type serviceType)
   at System.Web.Mvc.DefaultControllerFactory.DefaultControllerActivator.Create(RequestContext requestContext, Type controllerType)
   --- End of inner exception stack trace ---
   at System.Web.Mvc.DefaultControllerFactory.DefaultControllerActivator.Create(RequestContext requestContext, Type controllerType)
   at System.Web.Mvc.MvcHandler.ProcessRequestInit(HttpContextBase httpContext, IController& controller, IControllerFactory& factory)
   at System.Web.Mvc.MvcHandler.BeginProcessRequest(HttpContextBase httpContext, AsyncCallback callback, Object state)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

AWS SDK Log:

Amazon Error: 0 : AmazonServiceException making request GetParameterRequest to https://ssm.us-east-1.amazonaws.com/. Attempt 1., Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
Amazon Error: 1 : AmazonServiceException making request GetParameterRequest to https://ssm.us-east-1.amazonaws.com/. Attempt 1., Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
Amazon Error: 2 : AmazonServiceException making request GetParameterRequest to https://ssm.us-east-1.amazonaws.com/. Attempt 1., Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
Amazon Error: 1 : Unable to contact EC2 Metadata service., System.Net.WebException: The operation has timed out
   at System.Net.HttpWebRequest.GetResponse()
   at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
   at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
Amazon Error: 1 : Failed to retrieve credentials from EC2 Instance Metadata Service., System.NullReferenceException: Object reference not set to an instance of an object.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)

In order to characterise these GetCredentials() issues as related or not, I would ask:

  • Do they occur at startup or later?
  • Every time a certain circumstance happens, or intermittently?
  • Does it happen more at peak request volume, or under light load?
  • What does the machine CPU level do at the same time? Was it low before, did it increase?
  • We see it on all requests we make for a parameter. We've got some caching (5m) for values but we never cache the value since we get the exception
  • every time
  • The EC2 instances are t2.large, very light load, plenty of memory free
  • always
  • CPU usage hits 30-40% while our app initializes, then sits at ~3% usage

I think I'm going to spin up a sample web app that does nothing but get an SSM parameter and spit out some diagnostic info... I may even build against a one-off build of the AWS SDK with some additional logging thrown in there.

If you don't think the issues are related I can open a new issue for tracking them independently.

That sounds quite different to what I am seeing. What i am seeing is

  • intermittent, after the machines have been running for hours / days
  • more likely to happen under load, but not predictable
  • CPU spikes from the baseline around 3%, up to 100%

We found our issue, it's a quirky behavior of the IIS docker image:

  • Docker container environment variables are set as process variables for the entrypoint (not machine environment variables)
  • servicemonitor.exe sets/propagates the process-level environment variables to IIS but only for the default app pool
  • we were using custom app-pools so the docker-injected environment variables weren't set
  • AWS SDK looks for the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable, uses the ECS Task IAM Role/Metadata behavior if its set
  • Since it wasn't set, the AWS SDK was using the default 169.254.169.254 address to try and retrieve metadata

I'm clearing out of this thread but wanted to document this a bit here. Thanks for the help/discussion.

@actumn how are you resolving the credentials in your container? Have you tried setting AWS_EC2_METADATA_DISABLED to true if you are not relying on IMDS?

@sstevenkang @actumn is my colleague. We only retrieve credential from IMDS. So if I set AWS_EC2_METADATA_DISABLED true, sqs client can't find credential, then throwing exception.

I am a little confused because it looks according to the report, you are running this application in a container not in an EC2 instance. Is it hitting IMDS on the host EC2 instance that's running the docker daemon?

@sstevenkang Yes. I didn't set ECS Task Role and AWS_CONTAINER_CREDENTIALS_RELATIVE_URI.

Below code is working for me.
AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false)

Please refer to below document:
https://docs.microsoft.com/en-us/dotnet/core/whats-new/dotnet-core-2-1

The .NetCore used is v2.1.2.

@LOB-JimD That code is same with setting DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER false.
After #1010 merged and aws core sdk version updated, the problem has been solved.

Thanks @Reasonably for the update. Look forward to use it.

@Reasonably would this fix have specifically applied to the SQS Client? I am seeing the same issue when updating from core 2.0 to core 2.1 with the PutObjectAsync call. Setting DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER to 0 solves this issue.

@Reasonably also tested #1010, it does not fix the issue. Using the IWebHostBuilder.UseLibuv() to work around this issue.
https://github.com/aspnet/Announcements/issues/296

@LOB-JimD #1010 is for that the credential don't set to null when retrieving credential has failed.
Before #1010 , when retrieving credentials failed, the client set credentials to null and so many error occurred and cpu spiked during caching time.
I think it(http request fail) can happened sometimes. But it is not critical because of some retry logic.

P.S. I'm so sorry about my terrible english skill

@Reasonably
Have you tried updating .NETCore to version 2.1.402 and Microsoft.AspNetCore.All to version 2.1.4 ?
It fixes this issue: https://github.com/aspnet/KestrelHttpServer/issues/2694
And in our experience so far, we're not seeing CPU spikes afterwards.

The issue has gotten to confusing and I'm sure what is the state of things. I'm going to close this issue. If anybody is still experience the feel free to create a new issue.

For anyone encountering this error on a windows EC2 instance, try navigate to http://169.254.169.254/latest/meta-data/ from the server. If the page is unreachable then the SSM agent has not been initialized, this can happen if you customize the base image

I had this issue on asp.net core 2.1 targeted to framework net462 via the IIS hosting module, and it turned out to be some proxy configuration that I couldn't easily see, though presented itself as a nullreferenceexception initially.

After getting AWS SDK logging working, I was clearly able to see it was trying to access the EC2 Instance Metadata Service at 127.0.0.1:8888, which made no sense given IIS configuration.

The workaround was to disable proxies globally in code by setting this to null: https://docs.microsoft.com/en-us/dotnet/api/system.net.webrequest.defaultwebproxy?view=netframework-4.7.2#System_Net_WebRequest_DefaultWebProxy

Was this page helpful?
0 / 5 - 0 ratings