IdentityServer4 doesn't mitigate race-conditions in refresh_token handling

Created on 10 Feb 2019 · 8Comments · Source: IdentityServer/IdentityServer4

I believe I've discovered a race condition in how IdentityServer4 handles refresh_token authentication requests. I'm able to consistently reproduce this at my end.

Issue / Steps to reproduce the problem

Have this environent running, as follows:
- An ASP.NET Core web-application hosting IdentityServer4 instance running at http://localhost:4000
- Have another ASP.NET MVC web-application registered as a Client with offline_access (refresh_token permissions). This website is running at http://localhost:4001
- The ASP.NET MVC web-application is configured to store any granted tokens (access_token, refresh_token, and id_token) in the website visitors' cookie store using the HttpContext.SignInAsync extension method)
- The ASP.NET MVC web-application's access_token expiry remains the default at 1 hour (but I reduced this to 60 seconds to aid reproducibility). The RefreshTokenUsage == TokenUsage.OneTimeOnly so whenever a refresh-token is used it cannot be used again - otherwise you get an invalid_grant error response.
- The IdentityServer4 instance is configured to use a SQL Server table for storing PersisteredGrant objects - which increases the time required to complete a refresh-token authentication request.
Access the MVC web-application in a new Incognito browser window and be signed-in. Your browser's cookies will store (an encrypted version of) the three tokens (refresh_token, id_token, access_token). The access_token will expire 1 hour from now by default (the parent ASP.NET cookie expiration is longer than 1 hour, FWIW).
Wait 1 hour - or until the access_token expires.
In the same browser window as step 2, quickly open two new links in quick succession (i.e. by middle-clicking two links on the same page). This will fire-off two simultaneous requests to the MVC application using the same cookie state as immediately after step 2, so both requests will contain the expired access_token.
Code inside the MVC application will handle these two requests simultaneously and concurrently by taking the access_token and refresh_token from the incoming request's cookies, both would determine that the access_token had expired, and both would then fire-off new authentication requests to the IS4 service with the same refresh_token.
- I appreciate that the MVC application could try to prevent this double-request happening by using some kind of concurrent-dictionary (keyed on the refresh_token) to have a single Task<TokenResponse> for multiple refresh_token refreshes, but this is only possible in single-process/single-instance client applications. In a web-farm scenario there is no synchronization between web-farm instances and this scenario is unavoidable.
The IS4 service receives both requests and eventually one request will reach the point where IPersistedGrantStore.GetAsync( String key) and RemoveAsync(String key) are called (from the first of the two requests). When the second request (often within milliseconds) reaches GetAsync its key (derived from the refresh_token value) will already have been removed, causing an invalid_grant error response to be sent back to the MVC application for the second request, even though the first request succeeded.
This error response would cause an MVC application to typically trigger a SignOutAasync, causing the browser's cookies to be dropped when the second request completes - thus breaking the user-experience.

The only mitigation I'm aware of that reliably works is to use Reusable refresh_token values - but I feel it should be possible to handle this situation using single-use refresh_token values by allowing a grace-period of at least a few seconds for a refresh_token to be used (but only in a way that multiple refresh-token authentication requests for the exact same refresh_token (and scope, client_id, etc) receive identical responses).

question

Source

Jehoel

Most helpful comment

@tymarats I added logging to see if that happens - and indeed, it does happen - albeit rarely (a few times per week in my web-application that gets a few thousand daily users).

It doesn't happen for _most people_ because most people's browsers requests a page/document first and nothing else - if they request a page with an expired token then it's renewed before the browser gets the document and then requests other URLs (like images, other pages, etc) - I assume it happens because people left their browsers open overnight or something).

I solved it by using a ConcurrentDictionary<String,Task<String>> where TKey = String is the raw token and Task<String> is a task that represents the concurrent (immediately prior) request for a new token - which are then awaited, so the secondary requests are blocked until the first request succeeds or fails.

Jehoel on 23 Oct 2019

🎉1 👍1

All 8 comments

7\. This error response would cause an MVC application to typically trigger a `SignOutAasync`, causing the browser's cookies to be dropped when the second request completes - thus breaking the user-experience.

I this this assumption is why you think there's a problem. I don't think it has to work like that.

We have a sample for this:

https://github.com/IdentityServer/IdentityServer4.Samples/tree/master/Clients/src/MvcHybridAutomaticRefresh

I don't know if there are updates pending for that sample from @leastprivilege.

But the short of it is that if you have a one-time use refresh token, then it's the client's responsibility to deal with it.

brockallen on 10 Feb 2019

Ah, the MvcHybridAutomaticRefresh example was added after I last updated my IdentityServer4.Samples repo which is why I didn't see it. I now see how you solved that problem in this example.

class AutomaticTokenManagementCookieEvents : CookieAuthenticationEvents
{

    // [...]

    public override async Task ValidatePrincipal(CookieValidatePrincipalContext context)
    {
        // [...]

        if (dtRefresh < _clock.UtcNow)
        {
            var shouldRefresh = _pendingRefreshTokenRequests.TryAdd(refreshToken.Value, true);
            if (shouldRefresh)
            {
                try
                {
                    var response = await _service.RefreshTokenAsync(refreshToken.Value);

                    if (response.IsError)
                    {
                        _logger.LogWarning("Error refreshing token: {error}", response.Error);
                        return;
                    }

                    context.Properties.UpdateTokenValue("access_token", response.AccessToken);
                    context.Properties.UpdateTokenValue("refresh_token", response.RefreshToken);

                    var newExpiresAt = DateTime.UtcNow + TimeSpan.FromSeconds(response.ExpiresIn);
                    context.Properties.UpdateTokenValue("expires_at", newExpiresAt.ToString("o", CultureInfo.InvariantCulture));

                    await context.HttpContext.SignInAsync(context.Principal, context.Properties);
                }
                finally
                {
                    _pendingRefreshTokenRequests.TryRemove(refreshToken.Value, out _);
                }
          }
    }
}

If two requests enter this section of code simultaneously, then the second request will fail to add it to _pendingRefreshTokenRequests and shouldRefresh == false and control will leave the method and the HttpContext's AuthenticationProperties will still have the old value, so when the second request eventually proceeds to use a protected API it will use the old refresh_token (which would now be revoked and cause an error due to the first request already).

There might also be problems using this approach with AccesTokenDelegatingHandler - for example, if the MVC application were consuming an external API by using a .NET web-API client library that used AccesTokenDelegatingHandler to automatically refresh tokens too.

Secondarily, if the RefreshTokenAsync operation fails, how can the MVC website return an appropriate error message to the human user? Right now the code simply logs the error and leaves the function. The code could call context.RejectPrincipal which AFAIK would result in the user redirected to the login page without any explanation (and the user would retain their cookies, including ASP.NET's own authentication cookie, so wouldn't they get stuck in a redirect loop?).

Jehoel on 11 Feb 2019

Yes - that's why we start to refresh early - IIRC 60 seconds before the actual expiration (by default . but is configurable). The assumption is that the 2nd request can still use the current access token while the 1st one is refreshing.

If refreshing fails - this will ultimately result in an expired access token. The application layer has to deal with this problem. Which might be a new login.

leastprivilege on 11 Feb 2019

Ahhhh. I was operating on the assumption that access_tokens were counter-signed by their parent refresh_token, so when a refresh_token is revoked all associated access_tokens were also revoked (assuming the access_token consumer checked for revocation).

Jehoel on 11 Feb 2019

Closing. If you still have issues, feel free to reopen.

brockallen on 11 Feb 2019

I have a question for which this thread context is perfect, so please forgive me for commenting on a closed question (I don't want to reopen it just for this).

What happens to two requests when they come (in short succession) into ValidatePrincipal above _after_ the access token expired? The first one to arrive will trigger a refresh, but the second one will continue with the invalid access token (shouldRefresh will be false for that one because of a pending refresh).

Is the solution to have a check-session poller of some kind to ensure that we always refresh before access token expires, or there's a better approach?

tymarats on 22 Oct 2019

@tymarats I added logging to see if that happens - and indeed, it does happen - albeit rarely (a few times per week in my web-application that gets a few thousand daily users).

Jehoel on 23 Oct 2019

🎉1 👍1

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.