Given the sequencing of actions that occur during the "RuntimePipeline" for the AmazonServiceClient, it is possible to use expired credentials when making an AWS API call over HTTP that requires retries.
For us, this issue was uncovered by an outbound firewall problem that caused the first few requests to return a 500 (triggering SDK retries). The cumulative duration of those retries was longer than the AssumeRoleAWSCredentials PreemptExpiryTime, so the token held by the AssumeRoleAWSCredentials had expired by the time the call to send a message to our SQS queue went out (which bubbles up as a 403).
When a call to SendMessageAsync occurs on the AmazonSQSClient, GetCredentials should be invoked as part of each retry of the actual HTTP request (i.e. each POST to https://sqs.us-west-2.amazonaws.com/accountId/queueName should at the very least verify that the token has not expired before making the call).
When a call to SendMessageAsync occurs on the AmazonSQSClient, GetCredentials is invoked only once, before the API call and its retries (e.g. the POST to https://sqs.us-west-2.amazonaws.com/accountId/queueName).
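The difference between the two behaviors above can be sketched without the SDK. This is a minimal illustration, not the SDK's actual pipeline code: `Token`, `RetrySketch`, and both method names are hypothetical stand-ins, and `getCredentials` / `trySend` are placeholders for the credential lookup and the HTTP call.

```csharp
using System;

// Hypothetical stand-in for a set of temporary credentials; NOT an AWS SDK type.
public sealed class Token
{
    public Token(string value, DateTime expiresAtUtc)
    {
        Value = value;
        ExpiresAtUtc = expiresAtUtc;
    }

    public string Value { get; }
    public DateTime ExpiresAtUtc { get; }
}

public static class RetrySketch
{
    // Resolves credentials once and reuses them for every attempt,
    // which is the behavior described in this report.
    public static bool SendResolvingOnce(
        Func<Token> getCredentials, Func<Token, bool> trySend, int maxAttempts)
    {
        Token token = getCredentials();
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            if (trySend(token)) return true;
        }
        return false;
    }

    // Resolves credentials again before each attempt, so a token that
    // expired while earlier attempts were failing is replaced.
    public static bool SendResolvingPerAttempt(
        Func<Token> getCredentials, Func<Token, bool> trySend, int maxAttempts)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            Token token = getCredentials();
            if (trySend(token)) return true;
        }
        return false;
    }
}
```

With a `trySend` that fails for the first few attempts and a token that expires in the meantime, the first variant keeps presenting the stale token while the second picks up a fresh one.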
One solution would be to periodically check the current time against the expiry time asynchronously and update the credentials if they have expired. This wouldn't be bulletproof because there'd still be a chance that the asynchronous check wouldn't occur at the right time.
Another solution would be to just call GetCredentials() on the AWSCredentials closer to the time when the API request is made. This would be a little more involved and perhaps still problematic (there will still potentially be a period of a few milliseconds or microseconds where the token is expired).
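Whichever option is chosen, the remaining window of a few milliseconds can be narrowed by treating a token as stale slightly before its real expiry. The sketch below shows only that comparison; `ExpirySketch`, `NeedsRefresh`, and the `safetyMargin` parameter are hypothetical names for illustration and are not part of the SDK.

```csharp
using System;

public static class ExpirySketch
{
    // Returns true when the token should be refreshed before it is used:
    // either it has already expired, or it will expire within safetyMargin.
    // The margin is meant to cover the time the request spends in flight.
    public static bool NeedsRefresh(
        DateTime expiresAtUtc, DateTime nowUtc, TimeSpan safetyMargin)
    {
        return nowUtc >= expiresAtUtc - safetyMargin;
    }
}
```

A caller would run this check immediately before each HTTP attempt and fetch new credentials when it returns true. The margin has to be longer than the worst-case request latency for the check to help.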
Here's the code which reproduces the issue. You'll need to use Fiddler to force a 502 for the first 4 calls (the initial request plus 3 retries) to the send-message HTTP request, whose URL looks like https://sqs.us-west-2.amazonaws.com/accountId/queueName, and then disable it before the 5th call (the final retry) occurs.
using System;
using System.Threading;
using Amazon;
using Amazon.Runtime;
using Amazon.Runtime.CredentialManagement;
using Amazon.SQS;
using Amazon.SQS.Model;

namespace ConsoleApp2
{
    class Program
    {
        static void Main(string[] args)
        {
            string awsProfileName = "aws profile name";
            var chain = new CredentialProfileStoreChain();
            if (!chain.TryGetAWSCredentials(awsProfileName, out AWSCredentials sourceCredentials))
            {
                throw new InvalidOperationException($"AWS credentials profile '{awsProfileName}' not found.");
            }

            var apiCreds = new AssumeRoleAWSCredentials(sourceCredentials, "role arn", "role session name")
            {
                // this says, "expire the token 59 minutes before the default of one hour (so ~1 minute after the current time)"
                PreemptExpiryTime = TimeSpan.FromMinutes(59)
            };

            // wait for expiration time to get close
            Thread.Sleep(TimeSpan.FromSeconds(45));

            using (var client = new AmazonSQSClient(apiCreds, new AmazonSQSConfig
            {
                RegionEndpoint = RegionEndpoint.USWest2,
            }))
            {
                var request = new SendMessageRequest
                {
                    MessageBody = "foo",
                    QueueUrl = "queue url"
                };
                client.SendMessageAsync(request, CancellationToken.None).Wait(); // discard response
            }
        }
    }
}
It results in a failure to perform business logic that bubbles back to the user (sure, we could build in some retries for the actual send message call, but that seems clunky and risky). For now, we have tweaked our outbound firewall so that it whitelists all Amazon IPs. Really, we shouldn't need to do that (and we won't, once all the application hosts are running in AWS), but this issue could resurface when we are fully running in AWS.
I believe a fix was just made for this in the SDK. I'm told the retries are now getting credentials. We're still working on fixing it for PowerShell.
Please let us know if you have further issues.
Hi @klaytaybai can you tell me what version of the AWSSDK fixes this? Specifically your mention of "retries are now getting credentials".
Hi @spadfield, thank you for following up on this. I should have verified the change. I think I was either misinformed or there was a miscommunication. I'll do some more follow-up on this tomorrow and probably re-open it then.
@klaytaybai any updates on this? you mentioned following up a week ago and we still haven't heard back.
Hi @ardove and @spadfield, it does appear to still be an unresolved issue. Due to the nature of the changes needed to improve this issue, I think approvals may take some time. I or someone else from the team will let you know if a fix comes out.
Any ETA on this issue? It seems this has been pending for a long time. We are also facing the same issue.
Any updates on this? We are experiencing this issue regularly.
+1 would love to see a fix soon
This bug is over 9 months old and we haven't heard from you in 5 months. @klaytaybai any updates on whether or not this is on someone's radar? Are there any plans to include it in the .NET AWS SDK at any point in the future?
I believe we have identified the problem. Credentials are being fetched for every retry in the "RuntimePipeline", but new credentials were only being retrieved from STS (AWS Security Token Service) once the current credentials had expired. I suspect that if the credentials were a few milliseconds from expiring and did expire by the time the request got to the service, you would get the error reported here.
I have made a commit https://github.com/aws/aws-sdk-net/commit/34eaad45f7b481e124e6c1f77469e79322ac8df6 to the fix-assume-role-credentials-refresh branch. We are working on testing the fix now.
Awesome. Thanks @normj, please let us know what version to find the fix in so that we can update our package references!
Quick update. I have been running the change through some endurance testing which has been successful. I'm getting others on the team to approve the change and then we will get the fix out soon.
Version 3.3.102.4 of AWSSDK.Core has been released which contains the fix. All other service packages like AWSSDK.SQS have also been released bumping up their required version of Core to 3.3.102.4. So if you get the latest version of AWSSDK.SQS you should pull in the latest Core.
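For projects that reference Core directly, the minimum version can also be pinned explicitly. A sketch of the project-file entry, assuming an SDK-style .csproj that uses PackageReference (older projects using packages.config would update that file instead):

```xml
<ItemGroup>
  <!-- 3.3.102.4 is the AWSSDK.Core release named above as containing the fix -->
  <PackageReference Include="AWSSDK.Core" Version="3.3.102.4" />
</ItemGroup>
```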