I believe I've discovered a race condition in how IdentityServer4 handles refresh_token authentication requests. I'm able to consistently reproduce this at my end.
http://localhost:4000Client with offline_access (refresh_token permissions). This website is running at http://localhost:4001access_token, refresh_token, and id_token) in the website visitors' cookie store using the HttpContext.SignInAsync extension method)access_token expiry remains the default at 1 hour (but I reduced this to 60 seconds to aid reproducibility). The RefreshTokenUsage == TokenUsage.OneTimeOnly so whenever a refresh-token is used it cannot be used again - otherwise you get an invalid_grant error response.PersisteredGrant objects - which increases the time required to complete a refresh-token authentication request.refresh_token, id_token, access_token). The access_token will expire 1 hour from now by default (the parent ASP.NET cookie expiration is longer than 1 hour, FWIW).access_token expires.access_token.access_token and refresh_token from the incoming request's cookies, both would determine that the access_token had expired, and both would then fire-off new authentication requests to the IS4 service with the same refresh_token.refresh_token) to have a single Task<TokenResponse> for multiple refresh_token refreshes, but this is only possible in single-process/single-instance client applications. In a web-farm scenario there is no synchronization between web-farm instances and this scenario is unavoidable.IPersistedGrantStore.GetAsync( String key) and RemoveAsync(String key) are called (from the first of the two requests). When the second request (often within milliseconds) reaches GetAsync its key (derived from the refresh_token value) will already have been removed, causing an invalid_grant error response to be sent back to the MVC application for the second request, even though the first request succeeded.SignOutAasync, causing the browser's cookies to be dropped when the second request completes - thus breaking the user-experience.The only mitigation I'm aware of that reliably works is to use Reusable refresh_token values - but I feel it should be possible to handle this situation using single-use refresh_token values by allowing a grace-period of at least a few seconds for a refresh_token to be used (but only in a way that multiple refresh-token authentication requests for the exact same refresh_token (and scope, client_id, etc) receive identical responses).
7\. This error response would cause an MVC application to typically trigger a `SignOutAasync`, causing the browser's cookies to be dropped when the second request completes - thus breaking the user-experience.
I this this assumption is why you think there's a problem. I don't think it has to work like that.
We have a sample for this:
I don't know if there are updates pending for that sample from @leastprivilege.
But the short of it is that if you have a one-time use refresh token, then it's the client's responsibility to deal with it.
Ah, the MvcHybridAutomaticRefresh example was added after I last updated my IdentityServer4.Samples repo which is why I didn't see it. I now see how you solved that problem in this example.
class AutomaticTokenManagementCookieEvents : CookieAuthenticationEvents
{
// [...]
public override async Task ValidatePrincipal(CookieValidatePrincipalContext context)
{
// [...]
if (dtRefresh < _clock.UtcNow)
{
var shouldRefresh = _pendingRefreshTokenRequests.TryAdd(refreshToken.Value, true);
if (shouldRefresh)
{
try
{
var response = await _service.RefreshTokenAsync(refreshToken.Value);
if (response.IsError)
{
_logger.LogWarning("Error refreshing token: {error}", response.Error);
return;
}
context.Properties.UpdateTokenValue("access_token", response.AccessToken);
context.Properties.UpdateTokenValue("refresh_token", response.RefreshToken);
var newExpiresAt = DateTime.UtcNow + TimeSpan.FromSeconds(response.ExpiresIn);
context.Properties.UpdateTokenValue("expires_at", newExpiresAt.ToString("o", CultureInfo.InvariantCulture));
await context.HttpContext.SignInAsync(context.Principal, context.Properties);
}
finally
{
_pendingRefreshTokenRequests.TryRemove(refreshToken.Value, out _);
}
}
}
}
If two requests enter this section of code simultaneously, then the second request will fail to add it to _pendingRefreshTokenRequests and shouldRefresh == false and control will leave the method and the HttpContext's AuthenticationProperties will still have the old value, so when the second request eventually proceeds to use a protected API it will use the old refresh_token (which would now be revoked and cause an error due to the first request already).
There might also be problems using this approach with AccesTokenDelegatingHandler - for example, if the MVC application were consuming an external API by using a .NET web-API client library that used AccesTokenDelegatingHandler to automatically refresh tokens too.
Secondarily, if the RefreshTokenAsync operation fails, how can the MVC website return an appropriate error message to the human user? Right now the code simply logs the error and leaves the function. The code could call context.RejectPrincipal which AFAIK would result in the user redirected to the login page without any explanation (and the user would retain their cookies, including ASP.NET's own authentication cookie, so wouldn't they get stuck in a redirect loop?).
Yes - that's why we start to refresh early - IIRC 60 seconds before the actual expiration (by default . but is configurable). The assumption is that the 2nd request can still use the current access token while the 1st one is refreshing.
If refreshing fails - this will ultimately result in an expired access token. The application layer has to deal with this problem. Which might be a new login.
Ahhhh. I was operating on the assumption that access_tokens were counter-signed by their parent refresh_token, so when a refresh_token is revoked all associated access_tokens were also revoked (assuming the access_token consumer checked for revocation).
Closing. If you still have issues, feel free to reopen.
I have a question for which this thread context is perfect, so please forgive me for commenting on a closed question (I don't want to reopen it just for this).
What happens to two requests when they come (in short succession) into ValidatePrincipal above _after_ the access token expired? The first one to arrive will trigger a refresh, but the second one will continue with the invalid access token (shouldRefresh will be false for that one because of a pending refresh).
Is the solution to have a check-session poller of some kind to ensure that we always refresh before access token expires, or there's a better approach?
@tymarats I added logging to see if that happens - and indeed, it does happen - albeit rarely (a few times per week in my web-application that gets a few thousand daily users).
It doesn't happen for _most people_ because most people's browsers requests a page/document first and nothing else - if they request a page with an expired token then it's renewed before the browser gets the document and then requests other URLs (like images, other pages, etc) - I assume it happens because people left their browsers open overnight or something).
I solved it by using a ConcurrentDictionary<String,Task<String>> where TKey = String is the raw token and Task<String> is a task that represents the concurrent (immediately prior) request for a new token - which are then awaited, so the secondary requests are blocked until the first request succeeds or fails.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Most helpful comment
@tymarats I added logging to see if that happens - and indeed, it does happen - albeit rarely (a few times per week in my web-application that gets a few thousand daily users).
It doesn't happen for _most people_ because most people's browsers requests a page/document first and nothing else - if they request a page with an expired token then it's renewed before the browser gets the document and then requests other URLs (like images, other pages, etc) - I assume it happens because people left their browsers open overnight or something).
I solved it by using a
ConcurrentDictionary<String,Task<String>>whereTKey = Stringis the raw token andTask<String>is a task that represents the concurrent (immediately prior) request for a new token - which are thenawaited, so the secondary requests are blocked until the first request succeeds or fails.