Sp-dev-docs: Increasing REST API Throttling errors in SharePoint Online

Created on 22 Jul 2019 · 38 Comments · Source: SharePoint/sp-dev-docs

Category

  • [ ] Question
  • [ ] Typo
  • [X] Bug
  • [ ] Additional article idea

Expected or Desired Behavior

Before July 17 2019 we got 429 errors here and there, but their ratio was not significant and allowed our production servers to work as expected.

Observed Behavior

Starting July 17 2019, we see 10 times more 429 exceptions on our SharePoint Online REST API calls, which slows our production operations down by a factor of 10.

Steps to Reproduce

  • Which tenant has the issue?

All tenants that we access (hundreds, across different regions around the world).

  • When did the issue happen, so that we can check right log entries?

Since around July 17 2019.

  • What was your code performing when this happened?

GET /sites/SlavaTest/Restore-20190615140814/_api/web?$expand=webs,lists,AllProperties,ThemeInfo&$select=*,webs/Url,lists/Id,lists/Title,lists/Description,lists/BaseType,lists/BaseTemplate,lists/Hidden,lists/Language,lists/ItemCount,lists/Created,lists/TemplateFeatureId,AllProperties/DesignPreviewThemedCssFolderUrl HTTP/1.1
Authorization: Bearer xxx
Accept: application/json; odata=verbose
Content-Type: application/json; odata=verbose
X-FORMS_BASED_AUTH_ACCEPTED: f
User-Agent: ISV|Cloud|Cloud/5.0
Connection: Keep-Alive
Accept-Encoding: gzip,deflate

Thanks

csorest throttling answered to-be-reviewed question

All 38 comments

Thank you for reporting this issue. We will be triaging your incoming issue as soon as possible.

We are seeing similar issues using certain Graph endpoints, just the SharePoint related ones. Interestingly enough we don't get a Graph 429 error (we'll report this separately, it seems an odd response), but instead we get a 304 with a redirect to a /layouts/15/throttle.html. We thought at first it was something we did, as we did an update on the 17th, but even after changing some calls the issue persists. Perhaps this is related to the above issue?

image

@justdevelopment I think this is the same issue, just wrapped by a different API; I think the issue is deeper. Interesting that they respond with a redirect to a web page, as in the REST API we do see 429 TOO MANY REQUESTS with a Retry-After header of 120. We're trying to work with different MS support teams to see if this is indeed a bug, but so far the answer we got is: "No changes on our side, check your application". It's a pretty standard MS response at the beginning of each ticket.

@slavag or anyone else; are you still seeing this issue? It seems to have decreased some, but we are still getting throttle errors on SPO objects. We're wondering if it's on our end, or on the SPO side.

@justdevelopment Yes, it decreased, but not significantly; we still see a lot of 429s across many tenants in different regions. The issue is global, but support says there were no changes in policy, and they don't confirm any bug.

@patmill @VesaJuvonen Any chance you could shine a light on this? :) We don't see any increase in usage for these tenants, so we're unsure why we are suddenly hitting a throttle where we weren't before. I can provide tenant info via email.

I've reported the graph httpcode issue to stackoverflow for the graph team btw; https://stackoverflow.com/questions/57198452/microsoft-graph-returns-unexpected-http-code-on-sharepoint-throttling

@slavag could you provide request IDs and the timestamps for calls you're seeing being throttled? We'll need that information to look anything up internally.
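For anyone who needs to collect the request IDs and timestamps asked for here, a minimal sketch of a helper that records them for throttled responses follows. The header names `request-id` and `SPRequestGuid` are assumptions about what the service returns; check them against your own responses. The function name and record layout are hypothetical.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def throttle_record(status: int, headers: Mapping[str, str],
                    now: Optional[datetime] = None) -> Optional[dict]:
    """Return a small log record for a 429 response, or None otherwise."""
    if status != 429:
        return None
    # Header names are case-insensitive in HTTP, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    moment = now or datetime.now(timezone.utc)
    return {
        "timestamp": moment.isoformat(),
        # Assumed header names; adjust to what your responses actually carry.
        "request_id": lowered.get("request-id") or lowered.get("sprequestguid"),
        "retry_after": lowered.get("retry-after"),
    }
```

Writing these records to a log makes it easy to hand over a handful of IDs with UTC timestamps when support asks for them.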

@justdevelopment it looks like your screenshot includes IDs, are you making the calls from a SharePoint Framework web part or some other application?

Hi @JeremyKelley it's an SPFx webpart. Now that you mention it, I didn't hear anyone complain about our other application, which does the same calls but through our own app registration. We will look into this to verify.

@JeremyKelley could it be that the issue is resolved? We see a real decrease in the number of throttling responses. As for IDs and timestamps, I'll monitor production servers and will post some once I have them. Thanks

Appears the issue is resolved & no longer happening?

BTW, those two "comment deletes" were for the incorrect person tagged in this duped comment & the person who was tagged responding.

Yes, seems that issue is fixed.
Thanks

The issue seemed to be gone Friday but it's back in full force today. I'm going to take a guess here and say it was due to a lot fewer people on a Friday in the summer ;)

I was too fast to close; I checked the logs and the issue is back in full force today.

@andrewconnell @justdevelopment it seems that it's even worse than it was last week.

@JeremyKelley
Request IDs (all from the same domain):
fd0af59e-70e0-0000-3fcf-7f9b96fb662c
b00bf59e-60da-0000-41fa-40ed0ece3c29
300cf59e-a0f8-0000-4704-adfaac9941e7
570cf59e-50ba-0000-41fa-48c1d550bc7a

Also, it seems that not only OneDrive and SharePoint are throttled, but 365 Exchange as well.
Thanks

@JeremyKelley @andrewconnell is there any update? This is hitting us hard. Thanks.

@justdevelopment do you still see increased throttling? Thanks

I don't (_& won't_) have anything to add... throttling is 100% in MSFT's hands so I need to let them comment. ¯_(ツ)_/¯

It seems to stay the same on our end, not a big increase but not a big decrease either.

We've been testing with the other applications we have, which do the same calls to the same tenant but on a different app registration (SharePoint app reg vs our own). The other app, outside of SharePoint, never hits a throttle even when the SharePoint one is full on throttled. Obviously not a perfect test, but it might help narrow it down.

Thanks Andrew :)

@justdevelopment Well, we're using REST API calls, not the Graph API, against many different tenants, but we see very high throttling (I know that the Graph API counts requests differently than the REST API). We're using a few admin accounts to access many SharePoint sites within the same tenant, but for the last 2 weeks we have seen 10 times more throttling than before.

@slavag what's "rest API"? Both the SharePoint & Graph API's are REST endpoints, so I'm not sure what you mean.

@justdevelopment Not sure if this helps, but the requests you're making to any endpoint (SharePoint / Graph endpoints) include an access token that is tied to the AzureAD app that's been granted access to the endpoint. If other apps work but this SPFx component doesn't (as you mention here), maybe MSFT is using the AppID+TenantID as the unique comparison for the throttling.

The throttling limits are pretty high... a client-side app shouldn't be hitting them, unless you're doing something unusual (IMHO at least). Keep in mind, in SPFx, anything that requests an auth token to an AzureAD secured resource (_including SPO's REST API & Graph's REST API_) shares the same Azure AD app across the tenant. A way around this is to deploy the component as an isolated web part, in which case it will get its own Azure AD app.

Thinking more about it, how did you deploy this? Is it an isolated web part? If not, a good test would be to dupe the project & deploy as an isolated web part, grant the AzureAD app the same permission grant & test. If it works, then I bet this is the root of the issue. In that case, I'd argue this is "by design"

I guess the question is:

@JeremyKelley Does the throttling factor in just the tenant, or tenant+azureadapp as well?

@andrewconnell We're using the SharePoint REST API (not the Graph API). As far as I know, the main difference between them is how throttling is calculated. In SharePoint REST it's counted against the account doing the access (for example, a customer's admin account that we access with his credentials / OAuth), so on a large tenant where we access a number of SharePoint sites / OneDrive accounts in parallel, we get throttled. When we ask the customer for multiple admin accounts to access with, for example 5 more, we see the number of throttling responses for that tenant decrease by 5 times.
In the Graph API, requests are counted against both the SharePoint site / OneDrive account being accessed and the account used to access it, so there is less chance of throttling.
Nevertheless, in the last 2 weeks the situation has somehow changed and we see 10 times more throttling.
Thanks

@slavag It very well could be that they (MSFT) have modified the formula lately for the threshold when throttling kicks in. But what you're running into isn't too different from what the migration vendors run into. You may just have to build around it, follow the guidance provided by MSFT, and leverage the backoff value when you get 429's back.
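A minimal sketch of what "leverage the backoff value" could look like: retry on 429 and wait for the number of seconds given in the Retry-After header (the thread reports a value of 120). The `send` callable, the fallback delay, and the attempt limit are assumptions for illustration, not anything prescribed by Microsoft.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def call_with_backoff(send: Callable[[], Response],
                      max_attempts: int = 5,
                      default_wait: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it stops returning 429, honoring Retry-After."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        try:
            # Retry-After is expected as a number of seconds.
            # The lookup assumes this exact header casing.
            wait = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not numeric: fall back to exponential backoff.
            wait = default_wait * (2 ** attempt)
        sleep(wait)
    return status, headers
```

`send` is a hypothetical zero-argument function that performs one HTTP request and returns the status code and headers; passing `sleep` in makes the loop easy to test without real delays.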

Also, looking back at your query, you may try to optimize it a bit, which may help. I see you're selecting a lot of fields with * and there's also an $expand... both of which are expensive operations. You could also look to leverage the $batch operation.
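As a rough illustration of the suggestion to drop the `*` wildcard, the sketch below builds an `_api/web` URL that names only the fields it needs. The helper name, the site URL, and the field list in the usage note are hypothetical; which fields an application really needs is for the caller to decide.

```python
from typing import Sequence


def build_web_query(site_url: str, select: Sequence[str],
                    expand: Sequence[str] = ()) -> str:
    """Build an _api/web URL with an explicit $select (no wildcard)."""
    if not select:
        raise ValueError("list the fields explicitly")
    if any(field.strip() == "*" for field in select):
        raise ValueError("name the fields instead of using the * wildcard")
    url = f"{site_url.rstrip('/')}/_api/web?$select={','.join(select)}"
    if expand:
        url += f"&$expand={','.join(expand)}"
    return url
```

For example, `build_web_query("https://contoso.sharepoint.com/sites/demo", ["Title", "Url", "lists/Id", "lists/Title"], ["lists"])` requests two web properties and two list properties. Whether a narrower query lowers the overall cost depends on how many extra calls it forces elsewhere.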

Marking this issue as answered as it seems like things are working as they should...

@andrewconnell Maybe they changed the formula, or maybe there's a bug. We did everything that MS suggested to avoid being throttled, including backoff and even more. As for the * fields and $expand, if I don't use them I need to make more API calls, and that will lead to additional throttling responses.
So, what I'm asking is for someone to say whether this is a policy change or, the opposite, a bug that is being fixed. So the status of this issue is not answered; Microsoft needs to review it and it needs attention. Also, we got confirmation from MS Support that the throttling formula and policy didn't change.
Thanks

We are doing a grand total of 8 read calls to these endpoints. We've been told in the past our solution should NOT hit a throttle limit. Now I understand this is something for MSFT to answer, but it was working fine before and now it's not, so something changed. If they did change the limits, and 8 calls is too many, I would like an official guideline on what would be acceptable regarding SPO calls.

Notice none of our other Graph calls get throttled; we do a bunch of calls to other endpoints (definitely more than 8) which are working perfectly well. I understand the number of users impacts the throttle limits, and this is a big tenant, but we do not reach the only numbers we have been able to find. This blog post mentions 'The limit is 10000 requests per 10-minute period, per user (or group), per app ID.'. Obviously this is about Exchange and not SPO, but I'm making the assumption SharePoint limits would not be this much lower.

We can't test using isolated webparts; we are talking about a large production tenant with almost 2000 active users on this specific site collection. This is not a place I can use as a playground to verify throttle limits. Our test and acceptance environments are not hitting the throttle, but they have significantly fewer users. I suppose I could try to emulate the load there, but that will get dangerously close to performance testing, which is not allowed according to the documentation.

I guess my point is; we would like some official word on throttle limits regarding specifically SPO API / Graph SPO endpoints.

I do really appreciate your help Andrew. We'll look into if we can replace some expensive calls, do you have any links to docs on which are more expensive and what alternatives might be?

@JeremyKelley is there any update on this issue?

I have to say the lack of any response here from Microsoft's side is disappointing.
There's a real issue and someone needs to provide answers: is it a change in throttling policy or a bug?
When 5 concurrent requests start getting throttled, this doesn't look good.
@JeremyKelley any news / answers / anything related to this issue?

thx @slavag for being persistent here and sorry for the delay on our side in processing the message. We are currently looking into this to see if there is anything specific we can help with, and whether the issue is not tenant specific but rather a potentially more widely impacting change.

Thanks @VesaJuvonen , the issue is not tenant specific, as we see this across thousands of tenants, across all our customer base.
Thanks

Not sure why I wasn't getting notifications on this thread, they probably got filtered somewhere deep in my inbox (I get LOTS of GitHub e-mail so it's difficult to keep them organized).

There are a couple different things mentioned at different points and I'll try to address them each...

1) In terms of how throttling is calculated there isn't really a difference between classic SharePoint REST and Microsoft Graph endpoints. Ultimately we get a unique identifier for an app and that gets a quota of resources (today based on CPU usage). When that quota is used up over a period of time then we send 429 responses indicating you need to wait before calling further.

2) In classic SharePoint it is possible to also be using a user identity which creates its own bucket of quota-able resources. It is absolutely NOT recommended that you use multiple user accounts to try and get more quota. If you're doing this and the server has load issues it is likely your app will be manually throttled more while the issue is going on to help the server recover.

3) As Andrew mentioned above, in SPFX all calls made using the provided contexts will share an App ID. This means that you're sharing one pool of resources for all WebParts in use across your enterprise. It's possible that the particular WebPart getting throttled isn't actually the one that initially used up the resources available for the pool. As also mentioned using an isolated WebPart will give your WebPart its own pool to pull from rather than the common one.

4) There is no "calls per X period of time" throttle in place, throttles are resource consumption based so you could make 2 calls and get throttled if the first one is expensive, or you could make a hundred over the same period of time if the calls are trivial. It's difficult to know exactly what the cost of any given API call is going to be but some things to avoid:
a) Large expands on large lists of data in a single call
b) Requesting permissions for multiple items at once

I'll also look again to see if there's anything obvious changing on our side but the last few times I've looked there haven't been any changes yet to how we throttle. Overall what % of traffic are you seeing get 429 responses?
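To make point 4 concrete, here is a toy model of a resource-based quota. It is not Microsoft's actual algorithm: the budget, the window, and the per-call costs are invented numbers, and real costs are not known to the caller. It only shows why two calls can be throttled while a hundred cheap ones are not.

```python
from dataclasses import dataclass


@dataclass
class ResourceQuota:
    """Toy quota: a budget of hypothetical resource units per time window."""
    budget: float               # units available per window (made-up scale)
    window: float               # window length in seconds
    used: float = 0.0
    window_start: float = 0.0

    def request(self, cost: float, now: float) -> int:
        """Return 200 if the call is served, 429 if the budget is spent."""
        if now - self.window_start >= self.window:
            # A new window begins: the quota is replenished.
            self.window_start = now
            self.used = 0.0
        if self.used >= self.budget:
            return 429
        self.used += cost
        return 200
```

With `ResourceQuota(budget=100, window=60)`, one call costing 100 units is served and the very next call gets 429 even if it costs 1 unit, while a hundred calls costing 0.5 units each all succeed within the same window.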

@JeremyKelley Thanks for your answer, so throttling is actually counted against our app ID, good to know. Now the question: we serve thousands of domains with hundreds of thousands of SharePoint sites, OneDrive sites, Teams sites and so on. We are definitely making a lot of API calls per second (for different tenants), but all with the same application. It's much more than just one customer running his app on his own entity. So, what can we do to reduce throttling? As the number of customers increases, so does the number of API calls; does that mean we'll soon reach a situation where we can't work at all?

It's hard to say what % of our traffic is throttled, but we see about 200k throttling responses daily. And this somehow changed last month; before it was not that bad at all. It's now 10 times more throttling, but our number of API calls didn't grow 10 times over the last month.

Thanks

@slavag When I re-wrote one of my comments above a few times I dropped an important detail. The throttling quota is at the app + tenant level so you get a bucket per-tenant. So you could have one tenant that is totally fine based on what you're doing but another busier tenant could be getting throttled. Conversely being high in tenant A won't affect tenant B.

200k per day across all tenants or in a single tenant?
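Since the quota is per app and per tenant, a per-tenant breakdown answers both questions above (the % of traffic throttled, and whether the 429s concentrate in a few tenants). A minimal sketch, assuming the application can produce a log of `(tenant, status)` pairs; the function name and log shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def throttle_rate_by_tenant(log: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the percentage of 429 responses for each tenant in the log."""
    total: Counter = Counter()
    throttled: Counter = Counter()
    for tenant, status in log:
        total[tenant] += 1
        if status == 429:
            throttled[tenant] += 1
    return {tenant: 100.0 * throttled[tenant] / total[tenant] for tenant in total}
```

Sorting the result by rate would show whether a handful of busy tenants account for most of the throttling or whether it is spread evenly.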

If you could e-mail me your app ID that will let me look up some more data. You can send it to jeremyke AT microsoft.

@JeremyKelley Yes, this is an important detail :).
200k is the total for all tenants, not a single one.
I'll send the app ID soon, thanks for the assistance.

@JeremyKelley Is the app ID the client ID, or is it something else? If it's something else, do you know where I can find it?

@slavag yes, app id is also referred to as the client id

@JeremyKelley Sent to your private email. Thanks

Closing issue as "answered". If you encounter similar issue(s), please open up a NEW issue. Thank you.

Issues that have been closed & had no follow-up activity for at least 7 days are automatically locked. Please refer to our wiki for more details, including how to remediate this action if you feel this was done prematurely or in error: Issue List: Our approach to locked issues
