Sp-dev-docs: Throttling challenges in Office 365 (429)

Created on 10 Oct 2019 · 31Comments · Source: SharePoint/sp-dev-docs

Expected or Desired Behavior

Improved Throttling. I obviously won't expect throttling to go away, but it _feels_ like something has changed in the past couple of weeks.

Observed Behavior

I'm working in an Office 365 tenant of about 30k users.

It seems there may have been some changes to the throttling behaviors in Office 365. For the past few weeks, I have been working on a migration into Office 365. A few weeks ago, I migrated close to 1m files with just a few issues.

This past week, I've been running reports on some of the document libraries in the environment in Office 365... and I get immediately throttled. I can't even run a report on a library with 1700 items in it, without being throttled...

I have more files that I need to migrate into this environment, which will be nearly impossible due to the current throttling issues.

Here is a Twitter thread with lots of comments from people who are also seeing increased throttling in the last couple of weeks. Seems to be a common issue.

https://twitter.com/Beau__Cameron/status/1181992491018403840?s=20

Have there been any changes to the throttling behaviors in Office365?

csorest to-be-reviewed question

bcameron1231

👍2

Most helpful comment

This seems to be getting worse. I just copied 191 items with Sharegate from an on premises 2010 environment into a list in SPO. I've been locked out with a throttling message in the UI for over 5 minutes.

Note that I am not running any custom code here, just Sharegate and then trying to use the UI.

https://[tenant].sharepoint.com/_layouts/15/Throttle.htm#1033

Something's not right
The page you requested is temporarily unavailable. We apologize for the inconvenience, please check back in a few minutes.

sympmarc on 8 Dec 2019

👍3

All 31 comments

Thank you for reporting this issue. We will be triaging your incoming issue as soon as possible.

msft-github-bot on 10 Oct 2019

We are seeing the same thing and have been throttled. After opening a ticket with MS, we found that the account being throttled was making only 4-5% of overall calls, which the support engineer considered a low value. We had previously tested similar throughput a couple of months ago and did not get throttled. This is a bit of a moving target, and better communication would be ideal.

hittoday on 10 Oct 2019

We are seeing increased throttling on Graph calls. We added some comments on an open ticket for this one already, but it is getting a little quiet there: https://github.com/SharePoint/sp-dev-docs/issues/4573

MRuimerman on 11 Oct 2019

We've had the same since the end of August. We had hoped it was a temporary issue, but it seems to have got worse, so we have raised an issue with Microsoft. However, we've been getting '503 Server Unavailable' responses rather than 429 throttling responses.

MikeBatchelor365 on 11 Oct 2019

Same here. Some LOB apps (CSOM/SharePoint REST) that have been running for a few years are being throttled. Office 365 tenant with 35k users.

Our scenario is mostly a background job that runs every 5 minutes. It was implemented years ago and does not use decorated traffic. I joined this thread because the throttling showed up all of a sudden. The job predates my time, and I would have suggested other patterns if it were introduced today; in fact, I will ask them to make a change.

fthorild on 11 Oct 2019

Can all of you provide additional details on the exact scenario? What kind of operation was performed? Provisioning, migration, automation, Azure, etc.? This helps us understand things on our side, so all details would be highly appreciated.

VesaJuvonen on 11 Oct 2019

Provisioning new sites

hittoday on 11 Oct 2019

Loading a page with webparts that do graph calls (only fetching data). We have a ticket open with Microsoft support for this.

MRuimerman on 14 Oct 2019

There are two resources everyone should have handy when talking about throttling, just to make sure you're prepared for the case where throttling kicks in.

The first is https://aka.ms/spo429. This page will tell you everything you need to know about the SharePoint Online throttling mechanism.

The second resource is https://aka.ms/scanguidance. The scanning guidance will give you the best practices for creating high volume apps. In particular this gives the best guidance on how to process document libraries using the Delta query API.
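As a rough illustration of the Delta query pattern mentioned above, the sketch below pages through a Microsoft Graph delta response by following `@odata.nextLink` until an `@odata.deltaLink` is returned; that link can be stored and used for the next incremental pass. The `fetch_page` callable and the function name are hypothetical. A real caller would supply an authenticated HTTP GET against an endpoint such as `https://graph.microsoft.com/v1.0/drives/{drive-id}/root/delta`.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: takes a URL and returns the parsed JSON body of that page.
FetchPage = Callable[[str], Dict]


def collect_delta(fetch_page: FetchPage, start_url: str) -> Tuple[List[Dict], Optional[str]]:
    """Follow @odata.nextLink pages until the service returns @odata.deltaLink.

    Returns every item seen plus the delta link to persist for the next run.
    """
    items: List[Dict] = []
    delta_link: Optional[str] = None
    url: Optional[str] = start_url
    while url:
        page = fetch_page(url)
        items.extend(page.get("value", []))
        # The final page carries a deltaLink instead of a nextLink.
        delta_link = page.get("@odata.deltaLink", delta_link)
        url = page.get("@odata.nextLink")
    return items, delta_link
```

On the next run, passing the stored delta link as `start_url` should return only what changed since the previous pass, which is what keeps the call volume low.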

Some other general points to remember:
1) There are two different throttle responses. 429 means your app has used up its available quota for calling the service and needs to wait before more calls can be made. If you're processing libraries and getting 429s, definitely check out the scan guidance for instructions. 503 means the server is under pressure and needs to return to normal before apps can call it. Generally, if one app is getting 503s, other apps will be as well. A 503 can clear on its own over time, or it may be an indication of broader issues. If you're getting continued 503s, the tenant owner should call support for help.

2) If you're working with permissions definitely read the scan guidance for how to use Delta query to retrieve permission hierarchies. Other legacy methods are not going to be as optimized for this scenario and can easily result in throttling.
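A minimal sketch of how a client might tell the two responses apart and decide how long to pause, assuming the service sends a `Retry-After` header with a number of seconds. The function name and the 30-second fallback are illustrative assumptions, not values from the guidance.

```python
from typing import Mapping, Optional


def throttle_wait_seconds(
    status: int,
    headers: Mapping[str, str],
    default_wait: float = 30.0,  # assumed fallback, not a documented value
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not throttled."""
    if status not in (429, 503):
        return None
    # Header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return float(value)
            except ValueError:
                break  # HTTP-date form of Retry-After is not handled here
    return default_wait
```

Logging the status code alongside the wait time would also show whether an app is hitting its own quota (429) or the service is under pressure (503).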

JeremyKelley on 14 Oct 2019

Thank you @JeremyKelley I have read those docs and they are helpful in understanding in how throttling works. I've played around with decorating traffic, and that seems okay for my use case on this ticket (generating a report from a list via csom).
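For reference, "decorating traffic" in the throttling guidance means sending a descriptive User-Agent string, in the form `NONISV|CompanyName|AppName/Version` for in-house apps or `ISV|CompanyName|AppName/Version` for vendor apps. A small sketch of building such a string follows; the helper name and the example values are hypothetical.

```python
def build_user_agent(company: str, app: str, version: str, is_isv: bool = False) -> str:
    """Build a decorated User-Agent value such as 'NONISV|Contoso|ListReport/1.0'."""
    for part in (company, app, version):
        # Reject empty parts and the separator characters used by the format.
        if not part or "|" in part or "/" in part:
            raise ValueError(f"invalid User-Agent component: {part!r}")
    prefix = "ISV" if is_isv else "NONISV"
    return f"{prefix}|{company}|{app}/{version}"


# Example values are placeholders, not taken from the thread.
print(build_user_agent("Contoso", "ListReport", "1.0"))
```

The resulting string would then be set as the `User-Agent` header on each CSOM or REST request.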

What do we do in the case of migrations? Most migrations involve third-party software (ShareGate, Metalogix, AvePoint, etc.), and we don't have much control over how it interacts with SharePoint. We repeatedly get 429s when migrating content from on-premises systems to SharePoint Online.

Because of this, migrations are taking much, much longer. Even small migrations that once took a couple of weeks are now taking a couple of months, because we have to split them up and migrate much smaller pieces at a time.

Do you have any recommendations?

bcameron1231 on 17 Oct 2019

I'm working on two major migration tasks currently and have to provision a lot of sites in that process. During the July/August timeframe, throttling suddenly became a big issue. After being in contact with premium support, we learned that the CSOM API is being throttled heavily on purpose because MS wants us to shift to the Graph.
This is to some extent understandable, but without notice and time to react (at least as far as I know!) this could potentially give us huge problems. We have been working on moving towards Graph as much as possible, but the feature parity is not the best, hence some things still remain in CSOM.
Following the guidelines when dealing with 429s is crucial, but at the end of the day it will only get worse!
At the start of August we also had trouble with 503 errors, which MS acknowledged were related to a bug that should be fixed by now.

nordtorp on 21 Oct 2019

We have throttling on the Graph API so I am not sure if moving to Graph will be a good solution.

MRuimerman on 21 Oct 2019

Hi,

We are also concerned about the approach and how certain design patterns are impacted by throttling on O365 and Office Graph.

Our scenario is that we are running an Azure App Service that hosts multiple facets of our solution, from an Office Add-in for Word/Outlook to an Azure Function that takes a template from a SharePoint site collection, merges data from Dynamics, and stores the result back into SharePoint.

Each of the above processes is designed to run as a multi-tenant solution: as an ISV, we serve multiple SharePoint/Office 365 tenants and Dynamics 365 tenants from our single service application. Our concern is that we are not sure where and how throttling is managed. Here are some examples.

As an example our application is managing 5 different customer tenants with one being 4000 users, the others being an average of say 100 users per tenant.

Would our single application service be given different throttling rates per tenant connection?
Does the size of the tenant give different throttling for service based connections as our work throughput on a 4000 user tenant would be dramatically more than a 100 user tenant?
Our application service will also be working with MSGraph to manage Exchange Mailboxes on multiple tenants and the questions are the same.

We are managing processes and optimising calls to the service, and we appreciate the need to protect the service. However, our application design only moves work that would otherwise run in a client-based solution onto a server (like a proxy), reducing desktop client dependencies and improving service. Clarity on how this affects throttling, and on how we can decorate traffic or get these limits changed for a specific client workload, would be appreciated.

Look forward to your response.

mikewalker74 on 24 Oct 2019

👍1

I have also experienced these new throttling challenges when migrating even small sites into SharePoint Online in the last month (October 2019). Various sites with less than 1 GB of content will repeatedly stall (using Sharegate and various performance settings). Sites similar to ones that migrated successfully before are now failing due to timeouts. This is unfortunately purely anecdotal, since nothing has been published regarding throttling changes in the Message Center. I absolutely feel something is different and have to rethink how I go about moving thousands of sites to SharePoint Online.

strausy on 31 Oct 2019

👍1

We have also started experiencing "(503) Server Unavailable" errors randomly over the last 2 months for a CSOM / PowerShell script that has been running successfully for the last few years. We are currently working with MS Premier Support, but they have no clue so far.

nilang-shah on 19 Nov 2019

Another update.

I just tried to run an even smaller ShareGate report on a list with 700 items... and I got throttled with 429s.

bcameron1231 on 2 Dec 2019

We are also trying to provision sites using PnP. We have added the Application Name to the context (trying to follow the advice in the MS article on avoiding getting throttled), but I don't see any other way to decorate the request or figure out whether we are being throttled. We can't create more than one or two sites without it going off into the ether. Is there at least some way to know if we are being throttled?

jmachale on 20 Dec 2019

This seems to be getting worse. I just copied 191 items with Sharegate from an on premises 2010 environment into a list in SPO. I've been locked out with a throttling message in the UI for over 5 minutes.

Note that I am not running any custom code here, just Sharegate and then trying to use the UI.

https://[tenant].sharepoint.com/_layouts/15/Throttle.htm#1033

Something's not right
The page you requested is temporarily unavailable. We apologize for the inconvenience, please check back in a few minutes.

It is also random. I've been migrating libraries all morning and then suddenly I get throttled using the exact same settings as before.

strausy on 30 Dec 2019

^^ Same issue with @strausykumc this morning. A simple ShareGate migration, moving just under 2000 documents from an on-premises environment into SharePoint Online. Throttled almost instantly.

This was the first time I logged into the site today and performed any actions.

bcameron1231 on 31 Dec 2019

We have a customer that has seen what appears to be throttling from ~3 people uploading 3-6 documents at a time, through the interface, all day long. When they called us about the problem I went to their site and was met with "The server is busy now. Try again later"

This is with no migration tools, no automated uploads, just a few people putting documents in a library.

narrowhouse on 16 Jan 2020

Hi All, Is there any resolution for the issue so far? I am experiencing the same issue when I am trying to sort a list view by modified date. The list has about 27K records and the view is defined to make sure the list view threshold issue does not occur.

Earlier this week I had a very similar experience when I tried to use an Azure Function (CSOM) to create subfolders under the same library and then set permissions on the newly created folders. Since I was expecting throttling issues, I started the update with just a small batch of records (100 folders). Is there any way to raise the throttling limit for identified administrator accounts on the tenant? The way it's working, it will take me forever just to provision folders and set their permissions.

I'd appreciate any help in fixing this behaviour, as there are several other Power Automate (MS Flow) processes running in coordination with Azure Functions that might be impacted if this issue continues.

vivekbspgurus on 29 Jan 2020

These issues are really hard to investigate and fix because they happen at a specific moment in time, and we simply cannot scale to jumping on them one by one to determine whether the issue is in the external code or on the service side.

So the situation around these is really complex. If you have the option, please do report the issue through Premier Support, as that provides us a way to allocate more resources for the investigation.

Each and every throttling issue is highly dependent on what is being performed and on which tenant, which makes it basically impossible for us to give an exact answer on when an issue is resolved and when it is not.

In general, we are working on constant improvements in this area, but we also understand the frustration here, as throttling always causes additional headaches for developers. If you gracefully back off or delay the calls based on our guidance, that should resolve the issue, though it does also mean that the processing can take much longer.

Here's our throttling guidance - https://docs.microsoft.com/en-us/sharepoint/dev/general-development/how-to-avoid-getting-throttled-or-blocked-in-sharepoint-online. If you are using PnP CSOM extension, that will automatically take care of the throttling handling when ExecuteQueryRetry is being used. We are also working on similar automation for .NET Core code, but no ETA yet.
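The back-off behaviour described above can be sketched roughly as follows. This is not the PnP `ExecuteQueryRetry` implementation, only an illustration of the general pattern: retry on 429/503, honour a numeric `Retry-After` when present, and otherwise double the delay each time. The `send` callable, the `(status, headers)` tuple shape, and the default values are assumptions.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (HTTP status code, response headers).
Response = Tuple[int, Mapping[str, str]]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,      # assumed default
    base_delay: float = 1.0,    # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it is not throttled or the attempts run out."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status not in (429, 503):
            return status, headers
        if attempt == max_attempts:
            break
        # Prefer the server's hint; this lookup assumes the exact header casing.
        retry_after = headers.get("Retry-After", "")
        wait = float(retry_after) if retry_after.isdigit() else delay
        sleep(wait)
        delay *= 2  # exponential back-off for the next attempt
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function testable and lets an application substitute a non-blocking wait if it needs one.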

VesaJuvonen on 30 Jan 2020

Hi Vesa,
I appreciate that the point-in-time nature of these problems makes them challenging to support, but there is a clear lack of instrumentation to help developers understand why throttling occurs, and in most cases the reasoning is not explainable. If another tenant is saturating the platform, we have code to try to mitigate and manage that as per PnP best practices. Our concern is where we have user experience issues. Furthermore, a synchronous flow like creating a folder and uploading a document so that it can then be edited is a human workflow, and a 429 error stopping a human from continuing work is not acceptable :(

The ask
1) Provide clearer statements/logs so that we developers can investigate and understand what triggered an issue in a solution, and why.
2) If we are CPU throttled or resource throttled, understanding the historical usage on a tenant and how it is scoped would allow informed decisions.
3) Add more resources to reduce the need to throttle. Alongside that, we will optimise our code once we know which code impacts the resources. :)

Thanks

FWIW: If you are a developer, it is important to update the SDK and review it, as there were some issues in the retry logic that created CPU-heavy processes on the client side due to the thread.sleep approach. Just a heads up in case you're out of date.

mikewalker74 on 30 Jan 2020

It's not just a developer problem, either. Either this thread or one other one has examples of throttling when using Sharegate to copy just a few items. It then impacts the UI for the end user. No dev involved.

sympmarc on 30 Jan 2020

I'm involved in a programme of file migrations (file share to SPO) and have been seeing average migration speeds of 1.5 GB/h peak and 2.5 GB/h off-peak. These speeds are observed using both ShareGate and SPMT and when migrating to tenants in completely different countries.

I spoke to an MS engineer via a service request and his response was “the tenant looks fine/too many variables/we don’t guarantee anything/we can’t monitor it”. He was also quick to blame our internet speeds, but it's clear from observing a migration that the bottleneck lies at the import stage. Several gigabytes can be uploaded to the Azure storage area in under 10 minutes and can then spend over an hour being imported into SPO via the API.

I have conducted a number of migration projects in the last few years and have never seen performance like this. It may put the current project in jeopardy.
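To put the quoted rates in perspective, here is a back-of-the-envelope calculation, assuming a hypothetical 1 TB (1024 GB) of content and continuous running at a single rate. The total volume is an assumption for illustration only; the two rates are the ones quoted above.

```python
def migration_days(total_gb: float, gb_per_hour: float) -> float:
    """Days of continuous running needed to move total_gb at gb_per_hour."""
    if gb_per_hour <= 0:
        raise ValueError("gb_per_hour must be positive")
    return total_gb / gb_per_hour / 24


HYPOTHETICAL_TOTAL_GB = 1024  # assumed volume, not a figure from the thread

peak_days = migration_days(HYPOTHETICAL_TOTAL_GB, 1.5)
off_peak_days = migration_days(HYPOTHETICAL_TOTAL_GB, 2.5)
print(f"{peak_days:.1f} days at 1.5 GB/h, {off_peak_days:.1f} days at 2.5 GB/h")
# prints "28.4 days at 1.5 GB/h, 17.1 days at 2.5 GB/h"
```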

CloudyWill on 13 Feb 2020

We've also been receiving a large proportion (close to 80% of our requests) of "503 Server Unavailable" errors during work hours (nights and weekends are fine) since the start of February.

nozhT on 25 Feb 2020

Hi, major issue here too whilst migrating our on-prem farms to SP Online. Whilst you guys are fixing this issue, can you make it so that if I am forwarded to the throttling page (_layouts/15/Throttle.htm), you do NOT overwrite the current URL and remove it from the history? I quite like the 'back' button going back to the page I was just on. Thanks

JJandJ on 26 Feb 2020

👍1

Does the global throttle policy provide any alert mechanism? First, it seems to be undocumented, even though the customer is on "pay-as-you-go". Here the case seems to be that although you pay, you will not go. This way of working creates huge problems in migrations with cut-off dates. At the least, MSFT should design a process through support so that migrations receive proper attention.

khintrtust on 22 Mar 2020

👍1

I can confirm we've hit this issue with a couple of clients and our own SharePoint. We have eased this by implementing improved caching and retry mechanisms but we still experience a large amount of 503 errors for pretty reasonable request rates. We've probably had 50+ instances of this over the last couple of months. There doesn't seem to be a specific request that causes the issue, 95% of the requests are just 'get' or 'update' requests to list resources.
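As an illustration of the kind of caching that can cut down repeated 'get' requests, here is a minimal time-to-live cache sketch. It is not the implementation referred to above; the class name, the `loader` callable, and the TTL value are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TtlCache:
    """Cache loader results per key for ttl_seconds to avoid repeat requests."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, no call to the service
        value = loader()     # expired or missing, fetch again
        self._entries[key] = (now, value)
        return value
```

A caller would wrap each list read in `get_or_load`, keyed by something like the list URL, so that identical reads inside the TTL window never reach SharePoint.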

CPritch on 4 Aug 2020

Category

[x] Question

[ ] Typo

[ ] Bug

[ ] Additional article idea
