Hi.
When I updated dotnet core (2.0.6 -> 2.1) and deployed on server, we met some error about credentials in AmazonSQSClient.
I think it is from dotnet core 2.1 HttpClientHandler update.
So I rollback dotnet core version (2.1 -> 2.0.6), then there is no exception anymore.
Whenever we call Amazon.SQS.AmazonSQSClient.ReceiveMessageAsync(ReceiveMessageRequest request, CancellationToken cancellationToken = default(CancellationToken))
, there should be no exception.
The function throws exception very frequently(3 minutes in 6 hour average) like below.
Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)
I want to get the network performance provided by dotnet core 2.1, with AWS features.
dotnet --info
: Host (useful for support):
Version: 2.1.2
Commit: 811c3ce6c0
.NET Core SDKs installed:
No SDKs were found.
.NET Core runtimes installed:
Microsoft.NETCore.App 2.1.2 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Interesting, could you try running on .NET Core 2.1 again and set the environment variable DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER
equal to 0
. This is a .NET Core environment variable that you can use to toggle back to the libcurl implementation of HttpClient. I'm really curious if you see a change in behavior.
These are what I found on AWS SDK Internal Log when AmazonSQSClient threw Exception without the env DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER
setting
DefaultInstanceProfileAWSCredentials 5390|2018-07-19T03:33:08.174Z|ERROR|Failed to retrieve credentials from EC2 Instance Metadata Service. --> System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException: Operation canceled
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
at System.Net.Http.HttpConnection.FillAsync()
at System.Net.Http.HttpConnection.ReadNextResponseHeaderLineAsync(Boolean foldedHeadersAllowed)
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
at System.Net.Http.HttpClient.GetStringAsyncCore(Task`1 getTask)
at Amazon.Runtime.Internal.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Amazon.Runtime.Internal.Util.AsyncHelpers.ExclusiveSynchronizationContext.BeginMessageLoop()
at Amazon.Runtime.Internal.Util.AsyncHelpers.RunSync[T](Func`1 task)
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
at Amazon.Util.EC2InstanceMetadata.get_IAMSecurityCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)
AmazonSQSClient 5393|2018-07-19T03:33:10.306Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
DefaultInstanceProfileAWSCredentials 11507|2018-07-19T04:41:17.431Z|ERROR|Failed to retrieve credentials from EC2 Instance Metadata Service. --> System.OperationCanceledException: The operation was canceled.
at System.Net.Http.HttpClient.HandleFinishSendAsyncError(Exception e, CancellationTokenSource cts)
at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
at System.Net.Http.HttpClient.GetStringAsyncCore(Task`1 getTask)
at Amazon.Runtime.Internal.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Amazon.Runtime.Internal.Util.AsyncHelpers.ExclusiveSynchronizationContext.BeginMessageLoop()
at Amazon.Runtime.Internal.Util.AsyncHelpers.RunSync[T](Func`1 task)
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
at Amazon.Util.EC2InstanceMetadata.get_IAMSecurityCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)
AmazonSQSClient 11510|2018-07-19T04:41:18.111Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
As we can see, the exception come from here: https://github.com/aws/aws-sdk-net/blob/3cacef5654bc17169f7b2e8f5fd86b92fbbd2243/sdk/src/Core/Amazon.Util/AWSSDKUtils.cs#L941-L953
And after set the environment variable as you suggest, there is still exception (but rarely than previous).
I attached logs containing request information.
efaultInstanceProfileAWSCredentials 13559|2018-07-19T07:25:59.131Z|ERROR|Failed to retrieve credentials from EC2 Instance Metadata Service. --> System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.IO.Stream.CopyToAsyncInternal(Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.GetStringAsyncCore(Task`1 getTask)
at Amazon.Runtime.Internal.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Amazon.Runtime.Internal.Util.AsyncHelpers.ExclusiveSynchronizationContext.BeginMessageLoop()
at Amazon.Runtime.Internal.Util.AsyncHelpers.RunSync[T](Func`1 task)
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
at Amazon.Util.EC2InstanceMetadata.get_IAMSecurityCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)
AmazonSQSClient 13560|2018-07-19T07:26:14.118Z|DEBUG|Received response (truncated to 1024 bytes): [<?xml version="1.0"?><ReceiveMessageResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><ReceiveMessageResult/><ResponseMetadata><RequestId>21d0c202-d6a1-5bd2-b89b-91c8e9e36a52</RequestId></ResponseMetadata></ReceiveMessageResponse>]
AmazonSQSClient 13561|2018-07-19T07:26:14.119Z|INFO|Request metrics:
{
"properties": {
"AsyncCall": "True",
"CanonicalRequest": "{secret}",
"StringToSign": "{secret}",
"ServiceName": "Amazon.SQS",
"ServiceEndpoint": "https://sqs.ap-northeast-1.amazonaws.com/",
"MethodName": "ReceiveMessageRequest",
"RequestSize": "102",
"StatusCode": "OK",
"BytesProcessed": "240",
"AWSRequestID": "21d0c202-d6a1-5bd2-b89b-91c8e9e36a52"
},
"timings": {
"CredentialsRequestTime": 0.0088,
"RequestSigningTime": 0.5013000000000001,
"HttpRequestTime": 20007.033900000002,
"ResponseUnmarshallTime": 0.33140000000000003,
"ResponseProcessingTime": 0.5935,
"ClientExecuteTime": 20008.325500000003
},
"counters": {}
}
AmazonSQSClient 13562|2018-07-19T07:26:14.136Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
AmazonSQSClient 13563|2018-07-19T07:26:14.137Z|INFO|Request metrics:
{
"properties": {
"AsyncCall": "True"
},
"timings": {
"CredentialsRequestTime": 0.7505000000000001,
"ClientExecuteTime": 13.6751
},
"counters": {}
}
AmazonSQSClient 13564|2018-07-19T07:26:14.150Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
AmazonSQSClient 13565|2018-07-19T07:26:14.151Z|INFO|Request metrics:
{
"properties": {
"AsyncCall": "True"
},
"timings": {
"CredentialsRequestTime": 0.0601,
"ClientExecuteTime": 0.8111
},
"counters": {}
}
also, looking at the implementation here
Looks like even if there is a failure to get the credentials, null credentials are cached for 2 minutes. All subsequent calls fail with default FailedToGetCredentialsMessage
. This is a major issue we are running into production without any seeming workaround.
Is there a workaround 馃 you can recommend till you look into this more?
setting the DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER
didnt get rid of it for us.
With @arishlabroo saying, I think it should not set null but remain previous Instance.lastRetrievedCredentials
when thrown TaskCanceledException
.
Like it :
catch (TaskCanceledException e)
{
Instance?.logger.Error(e, "RenewCredentials task canceled");
}
catch (Exception e)
{
// we want to suppress any exceptions from this timer task.
Instance?.logger.Error(e, FailedToGetCredentialsMessage);
Instance.refreshLock.TryEnterWriteLock(lockWaitTimeOut);
{
Instance.lastRetrievedCredentials = null;
}
Instance.refreshLock.ExitWriteLock();
}
We have a web api on .NET core 2.1 and publishing messages to SNS. We have seen that same error call-stack in our logs:
Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
We have the problem seen at the same time that sometimes a machine's CPU goes up to 100%, requests fail, and about half the time it recovers, and half the time the resulting healthcheck failure causes it to be terminated. This severe issue means that we are not rolling out .NET core into production any further.
Since something is going wrong in the class DefaultInstanceProfileAWSCredentials
, we therefore raised this PR for discussion: #1010
My questions related to this class are:
Why do you use Instance?.logger.Error()
on line 155? Instance?.
- with a null check - is not done anywhere else. if Instance
is null, it will fail in other places, including the line that immediately follows. I don't think that Instance can be null anywhere in RenewCredentials()
.
The Instance
getter uses a lock to make sure that the _instance
is not null. But the lock is acquired every time the property is accessed. This appears to be an ideal candidate for double-checked locking to reduce the amount of locking here drastically.
I would expect Instance.lastRetrievedCredentials = null
on line 159 to come before the logging that preceeds it, since getting into the desired state is more important and faster than the logging. If logging takes a while or blocks, then other threads have more chance to access the state that you have decided is unwanted.
As pointed out by @arishlabroo, is it necessary to set the creds to null
at all? Is keeping the previous creds not a better option?
Are you aware of the issues with spinlocks under load? This looks a bit like the CPU spikes that we see. 5 seconds (line 42) is a very long time to wait on a cpu-intensive spinlock.
Where the Spinlock is used, Is any lock necessary at all? It is around assignments and reads. .NET reference assignment is an atomic operation anyway. There is no halfway state that other threads need to be locked away from, inside a statement of the form a = b;
. I _think_ this is also true of statements like a.b.c = d;
.
If you're worried about things changing during your slightly more complex read expression, then you can do
var mine = lastRetrievedCredentials; // atomic assign
credentials = mine?.Copy();
But if the IL for that is the same as before, then there is no need for even a temp var.
Instance
in RenewCredentials()
since the Instance
getter can potentially call RenewCredentials()
again. There may be no actual potential for recursion, but it gives me a bad feeling to call the property that called you, to essentially get this
with repeated checks and locks. You can use _instance
, or even just use this
implicitly, which should be the same object as Instance
. If they're not the same not, you have other issues.The way that EC2InstanceMetadata.IAMSecurityCredentials
is sync but actually also async inside AsyncHelpers
is a deeper issue that requires a more comprehensive fix, out of scope today.
Not to hijack this issue, but I'm getting a really similar exception though it sounds like in a different environment (.NET 4.7.1, ECS and SSM not SQS).
We're running a Windows IIS docker image in ECS, running our ASP.NET 4.7.1 web application. AWSSDK is latest available in nuget. We're getting this with the SimpleSystemsManagement Client and not SQS, but this error seems to be happening in AWSSDK.Core so I thought it may be related.
I've verified that we have all of the ECS Task IAM Role stuff configured correctly, I see the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
environment variable set, if I docker exec to the container I'm able to do an Invoke-WebRequest "http://169.254.170.2$($env:AWS_CONTAINER_CREDENTIALS_RELATIVE_URI)" -UseBasicParsing
and see what looks like a valid metadata document. I've restarted the task a couple times to try different ECS hosts. Nothing seems to work here.
Sorry if this turns out to be a different/unrelated issue, but I thought it might be helpful for tracking this issue down since they seem so similar.
Stack Trace:
2018-07-31 03:32:18.8118 ERROR [R= C= I=] An exception occurred that was not caught by the controller
EXCEPTION: System.InvalidOperationException: Failed to retrieve credentials from EC2 Instance Metadata Service. (See inner exception for details.) ---> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.MetricsHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.RuntimePipeline.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.AmazonServiceClient.Invoke[TRequest,TResponse](TRequest request, IMarshaller`2 marshaller, ResponseUnmarshaller unmarshaller)
at Amazon.SimpleSystemsManagement.AmazonSimpleSystemsManagementClient.GetParameter(GetParameterRequest request)
at Name.Changed.To.Protect.The.Innocent.AmazonParameterStoreConfigurationProvider.RetrieveValue(String name) in C:\w\Source\Path\Changed\Too\AmazonParameterStoreConfigurationProvider.cs:line 27
at lambda_method(Closure , Object[] )
at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
--- End of inner exception stack trace ---
at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
at Autofac.Core.Activators.Reflection.ReflectionActivator.ActivateInstance(IComponentContext context, IEnumerable`1 parameters)
at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
--- End of inner exception stack trace ---
at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
at Autofac.Core.Lifetime.LifetimeScope.GetOrCreateAndShare(Guid id, Func`1 creator)
at Autofac.Core.Resolving.InstanceLookup.Execute()
at Autofac.Core.Resolving.ResolveOperation.GetOrCreateInstance(ISharingLifetimeScope currentOperationScope, IComponentRegistration registration, IEnumerable`1 parameters)
at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
at Autofac.Core.Activators.Reflection.ReflectionActivator.ActivateInstance(IComponentContext context, IEnumerable`1 parameters)
at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
--- End of inner exception stack trace ---
at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
at Autofac.Core.Resolving.InstanceLookup.Execute()
at Autofac.Core.Resolving.ResolveOperation.GetOrCreateInstance(ISharingLifetimeScope currentOperationScope, IComponentRegistration registration, IEnumerable`1 parameters)
at Autofac.Core.Resolving.ResolveOperation.Execute(IComponentRegistration registration, IEnumerable`1 parameters)
at Autofac.ResolutionExtensions.TryResolveService(IComponentContext context, Service service, IEnumerable`1 parameters, Object& instance)
at Autofac.ResolutionExtensions.ResolveOptionalService(IComponentContext context, Service service, IEnumerable`1 parameters)
at Autofac.Integration.Mvc.AutofacDependencyResolver.GetService(Type serviceType)
at System.Web.Mvc.DefaultControllerFactory.DefaultControllerActivator.Create(RequestContext requestContext, Type controllerType)
--- End of inner exception stack trace ---
at System.Web.Mvc.DefaultControllerFactory.DefaultControllerActivator.Create(RequestContext requestContext, Type controllerType)
at System.Web.Mvc.MvcHandler.ProcessRequestInit(HttpContextBase httpContext, IController& controller, IControllerFactory& factory)
at System.Web.Mvc.MvcHandler.BeginProcessRequest(HttpContextBase httpContext, AsyncCallback callback, Object state)
at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
AWS SDK Log:
Amazon Error: 0 : AmazonServiceException making request GetParameterRequest to https://ssm.us-east-1.amazonaws.com/. Attempt 1., Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
Amazon Error: 1 : AmazonServiceException making request GetParameterRequest to https://ssm.us-east-1.amazonaws.com/. Attempt 1., Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
Amazon Error: 2 : AmazonServiceException making request GetParameterRequest to https://ssm.us-east-1.amazonaws.com/. Attempt 1., Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
Amazon Error: 1 : Unable to contact EC2 Metadata service., System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
Amazon Error: 1 : Failed to retrieve credentials from EC2 Instance Metadata Service., System.NullReferenceException: Object reference not set to an instance of an object.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)
In order to characterise these GetCredentials()
issues as related or not, I would ask:
I think I'm going to spin up a sample web app that does nothing but get an SSM parameter and spit out some diagnostic info... I may even build against a one-off build of the AWS SDK with some additional logging thrown in there.
If you don't think the issues are related I can open a new issue for tracking them independently.
That sounds quite different to what I am seeing. What i am seeing is
We found our issue, it's a quirky behavior of the IIS docker image:
AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
environment variable, uses the ECS Task IAM Role/Metadata behavior if its setI'm clearing out of this thread but wanted to document this a bit here. Thanks for the help/discussion.
@actumn how are you resolving the credentials in your container? Have you tried setting AWS_EC2_METADATA_DISABLED
to true
if you are not relying on IMDS?
@sstevenkang @actumn is my colleague. We only retrieve credential from IMDS. So if I set AWS_EC2_METADATA_DISABLED true, sqs client can't find credential, then throwing exception.
I am a little confused because it looks according to the report, you are running this application in a container not in an EC2 instance. Is it hitting IMDS on the host EC2 instance that's running the docker daemon?
@sstevenkang Yes. I didn't set ECS Task Role and AWS_CONTAINER_CREDENTIALS_RELATIVE_URI.
Below code is working for me.
AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false)
Please refer to below document:
https://docs.microsoft.com/en-us/dotnet/core/whats-new/dotnet-core-2-1
The .NetCore used is v2.1.2.
@LOB-JimD That code is same with setting DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER false.
After #1010 merged and aws core sdk version updated, the problem has been solved.
Thanks @Reasonably for the update. Look forward to use it.
@Reasonably would this fix have specifically applied to the SQS Client? I am seeing the same issue when updating from core 2.0 to core 2.1 with the PutObjectAsync
call. Setting DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER
to 0 solves this issue.
@Reasonably also tested #1010, it does not fix the issue. Using the IWebHostBuilder.UseLibuv() to work around this issue.
https://github.com/aspnet/Announcements/issues/296
@LOB-JimD #1010 is for that the credential don't set to null when retrieving credential has failed.
Before #1010 , when retrieving credentials failed, the client set credentials to null and so many error occurred and cpu spiked during caching time.
I think it(http request fail) can happened sometimes. But it is not critical because of some retry logic.
P.S. I'm so sorry about my terrible english skill
@Reasonably
Have you tried updating .NETCore to version 2.1.402
and Microsoft.AspNetCore.All
to version 2.1.4
?
It fixes this issue: https://github.com/aspnet/KestrelHttpServer/issues/2694
And in our experience so far, we're not seeing CPU spikes afterwards.
The issue has gotten to confusing and I'm sure what is the state of things. I'm going to close this issue. If anybody is still experience the feel free to create a new issue.
For anyone encountering this error on a windows EC2 instance, try navigate to http://169.254.169.254/latest/meta-data/ from the server. If the page is unreachable then the SSM agent has not been initialized, this can happen if you customize the base image
I had this issue on asp.net core 2.1 targeted to framework net462 via the IIS hosting module, and it turned out to be some proxy configuration that I couldn't easily see, though presented itself as a nullreferenceexception initially.
After getting AWS SDK logging working, I was clearly able to see it was trying to access the EC2 Instance Metadata Service at 127.0.0.1:8888, which made no sense given IIS configuration.
The workaround was to disable proxies globally in code by setting this to null: https://docs.microsoft.com/en-us/dotnet/api/system.net.webrequest.defaultwebproxy?view=netframework-4.7.2#System_Net_WebRequest_DefaultWebProxy
Most helpful comment
We have a web api on .NET core 2.1 and publishing messages to SNS. We have seen that same error call-stack in our logs:
Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service. at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials() at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
We have the problem seen at the same time that sometimes a machine's CPU goes up to 100%, requests fail, and about half the time it recovers, and half the time the resulting healthcheck failure causes it to be terminated. This severe issue means that we are not rolling out .NET core into production any further.
Since something is going wrong in the class
DefaultInstanceProfileAWSCredentials
, we therefore raised this PR for discussion: #1010My questions related to this class are:
Why do you use
Instance?.logger.Error()
on line 155?Instance?.
- with a null check - is not done anywhere else. ifInstance
is null, it will fail in other places, including the line that immediately follows. I don't think that Instance can be null anywhere inRenewCredentials()
.The
Instance
getter uses a lock to make sure that the_instance
is not null. But the lock is acquired every time the property is accessed. This appears to be an ideal candidate for double-checked locking to reduce the amount of locking here drastically.I would expect
Instance.lastRetrievedCredentials = null
on line 159 to come before the logging that preceeds it, since getting into the desired state is more important and faster than the logging. If logging takes a while or blocks, then other threads have more chance to access the state that you have decided is unwanted.As pointed out by @arishlabroo, is it necessary to set the creds to
null
at all? Is keeping the previous creds not a better option?Are you aware of the issues with spinlocks under load? This looks a bit like the CPU spikes that we see. 5 seconds (line 42) is a very long time to wait on a cpu-intensive spinlock.
Where the Spinlock is used, Is any lock necessary at all? It is around assignments and reads. .NET reference assignment is an atomic operation anyway. There is no halfway state that other threads need to be locked away from, inside a statement of the form
a = b;
. I _think_ this is also true of statements likea.b.c = d;
.If you're worried about things changing during your slightly more complex read expression, then you can do
var mine = lastRetrievedCredentials; // atomic assign credentials = mine?.Copy();
But if the IL for that is the same as before, then there is no need for even a temp var.
Instance
inRenewCredentials()
since theInstance
getter can potentially callRenewCredentials()
again. There may be no actual potential for recursion, but it gives me a bad feeling to call the property that called you, to essentially getthis
with repeated checks and locks. You can use_instance
, or even just usethis
implicitly, which should be the same object asInstance
. If they're not the same not, you have other issues.The way that
EC2InstanceMetadata.IAMSecurityCredentials
is sync but actually also async insideAsyncHelpers
is a deeper issue that requires a more comprehensive fix, out of scope today.