Microsoft-authentication-library-for-dotnet: [Enhancement] When IConfidentialClientApplication is used as a singleton service, the cache size grows

Created on 6 Aug 2020  路  12Comments  路  Source: AzureAD/microsoft-authentication-library-for-dotnet

Which Version of MSAL are you using ?
4.17.1

What authentication flow has the issue?

  • Desktop / Mobile

    • [ ] Interactive

    • [ ] Integrated Windows Auth

    • [ ] Username Password

    • [ ] Device code flow (browserless)

  • Web App

    • [ ] Authorization code

    • [ ] OBO

  • Web API

    • [x ] OBO

Repro
See the repro with Ramjot (offline)
See Product Backlog Item 1103111: Performance investigation of token cache in confidential client applications when singleton which has all the details and a repro to a solution

Expected behavior

  • One cache per token hash should not grow.
  • In services, avoid calling the cache accessors twice in the case of a new token if the cache is not shared ...

Actual behavior
o(users)

@bgavrilMS @jennyf19 @henrik-me

Fixed P2 enhancement performance

Most helpful comment

Given the issue originally reported was a usability issue, I don't think we should do anything immediately to address this. My larger question is though if we should invest in making a better caching api for confidential client applications. the current one has a lot of overhead and complexities.

All 12 comments

Investigation for the token cache accessors:

The 4 access in the case of a new token are:
1) Try to look into the cache based on the token from the Web API (from its hash really).
2) it's a new token so does not find it.
-- gets the token using On behalf of flow
-- lock the cache for modification
3) read the cache the cache key is still the hash, but now we know the identity of the user as we got it with the token for the downstream API. We re-read the cache as it could have been modified (by another app) since 1)
4) write the cache with the new token
-- unlock the cache for modification

Could it be possible, in case we know a service does not share the cache, to avoid 3) re-reading the cache @bgavrilMS ?

It should possible to have a cache strategy where an app has exclusive access over the cache. I believe MSAL.py used to have this strategy by default. It

  • loads the cache when the app starts
  • persists the cache when the app stops or crashes

Problem is, if you scale your web app to use multiple servers, how can you guarantee one server has exclusive control over a set of users? You cannot.

I think a better approach is to use a cache with 2 layers, smth fast and small (in memory) and smth slower but with bigger capacity (Redis).

@RamjotSingh prioritizing this one.
cc: @bgavrilMS @henrik-me @trwalke @neha-bhargava
cc: @aiwangmicrosoft FYI

@bgavrilMS ADAL has that sweet spot though right? IIRC It assumes it has exclusive access to cache and that cache overwrites are fine since 2 tokens from AzureAD in the same duration are going to largely be identical. Hence my recommendation, remove the cache locking and unlocking. The steps become

  1. Check if cache has value
  2. If cache has value, return, otherwise go to AzureAD and get a new token
  3. Store the token in cache

Can we have similar behavior in MSAL?

@bgavrilMS @jmprieur : not really sure what we should do here. Are we only thinking about locking? or ware we also thinking about L1/L2 caching? Additionally are we considering options that prevents writing the entire cache (imo: that seems to be a better option especially for services)? Perhaps it should be an opt. in.

Also marking this as an enhancement rather than a bug. It's by current design, even though it's not a great design.

Let's explore a better design for the cache.

Don't we want to try to honor the "don't reload" flag?

Needs more detail on how we want to handle this.

I synced w/ @jmprieur on this, and I tried on the desktop app, making sure just one PCA was used, I signed in w/three users and was not able to repro it. I also tried w/the customer provided code and could not repro it there either.
image
so, i would say this has either been fixed or is a result of not using the cache correctly.

The issue talks about CCA and not PCA, not sure if that makes a real difference? Also only code changes we have had around cache afaik would be the latest performance improvement.

@henrik-me. I synced with Jenny and she tried with CCA too (on the customer code).

What was the issue

Looking more at it, what happened when I created this issue is:

  • in the customer sample is a new token was generated for users each time OBO was called,
  • and therefore the number of tokens in the cache was growing because the customer had used a cache key which was the user-id not the hash of the token.

The number of users was not growing (5). This behavior is the expected one (but I didn't know at that time) as the key used by MSAL for OBO is the token hash, so new incoming tokens produce new OBO tokens, and at that time we did not have the SuggestedCacheKey property in the TokenCacheNotificationArgs, nor guidance on what to do and even MIcrosoft.Identity.Web was not doing the right thing.

It does not repro any longer and here is why

Now that the SuggestedCacheKey is right for each case (that is is the token hash for OBO), the size of each token cache does not grow any longer. It's one user+AT+RT. Now of course the number caches grows.

What we could improve

Now there might be something that we could do: help deleting the caches (one cache per incoming token in the case of OBO), when the OBO token expires (as we cannot know what's the expiry of the incoming token to the Web API). I'm not sure we have the logic for this in MSAL today, we rely on the serialization to manage the expiry, but it does not know the optimum value. Therefore if we passed-in in the TokenCacheNotificationArgs a cache expiry, this could help the serialization layers removing the caches.

Follow-up

proposing to close this long due issue as non-repro (any longer), and to raise a different issue to investigate what we could do to have a better synergy between MSAL expiry and the serialization. BTW this is a topic which interest our customers a lot: we had several issues in Microsoft.Identity.Web about that. See for instance Question - Recommended expiration setting #786

Given the issue originally reported was a usability issue, I don't think we should do anything immediately to address this. My larger question is though if we should invest in making a better caching api for confidential client applications. the current one has a lot of overhead and complexities.

Was this page helpful?
0 / 5 - 0 ratings