Arcade: Failures to restore from Azure artifacts feeds due to throttling

Created on 23 Oct 2019 · 19 Comments · Source: dotnet/arcade

We have started seeing some throttling errors when attempting to restore NuGet packages from Azure Artifacts that manifest like this:

/root/coresetup/.dotnet/sdk/3.0.100/NuGet.targets(123,5): error : Failed to download package 'transport.runtime.linux-musl-x64.Microsoft.NETCore.Jit.3.1.0-preview1.19504.3' from 'https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/9c2ea29a-00e0-4bae-b470-161fdab1f360/nuget/v3/flat2/transport.runtime.linux-musl-x64.microsoft.netcore.jit/3.1.0-preview1.19504.3/transport.runtime.linux-musl-x64.microsoft.netcore.jit.3.1.0-preview1.19504.3.nupkg'.
/root/coresetup/.dotnet/sdk/3.0.100/NuGet.targets(123,5): error : Response status code does not indicate success: 429 (Request was blocked due to exceeding usage of resource 'Concurrency' in namespace 'IPAddress'. For more information on why your request was blocked, see the topic "Rate limits" on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=823950). (DevOps Activity ID: 5B41D91F-6ED5-41D1-814B-0328F8821422)).
##[error]/root/coresetup/.dotnet/sdk/3.0.100/NuGet.targets(123,5): error : Failed to download package 'transport.runtime.linux-musl-x64.Microsoft.NETCore.Jit.3.1.0-preview1.19504.3' from 'https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/9c2ea29a-00e0-4bae-b470-161fdab1f360/nuget/v3/flat2/transport.runtime.linux-musl-x64.microsoft.netcore.jit/3.1.0-preview1.19504.3/transport.runtime.linux-musl-x64.microsoft.netcore.jit.3.1.0-preview1.19504.3.nupkg'.

This does not appear to be causing widespread failures so far, but as we increase our reliance on these feeds, we're starting to get more reports.

Example builds where this has been seen:
https://dev.azure.com/dnceng/public/_build/results?buildId=398862
https://dnceng.visualstudio.com/internal/_build/results?buildId=384184

AzDO puts all unauthenticated NuGet requests into the same throttling bucket. Because NuGet does a multi-feed lookup for each package, the problem gets worse the more feeds you have in your NuGet config and the more packages you need to restore.
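To get a feel for how quickly this adds up, here is a rough back-of-the-envelope sketch. It assumes the worst case, where every package is probed on every configured feed, and it ignores retries, metadata requests, and caching. The feed and package counts are made-up illustration values, not measurements from our builds.

```python
def worst_case_lookups(num_feeds: int, num_packages: int) -> int:
    """Upper bound on package lookups for one restore.

    Assumption (not confirmed NuGet behavior in every case): each package
    is looked up on every configured feed.
    """
    return num_feeds * num_packages


# Hypothetical numbers, purely for illustration.
many_feeds = worst_case_lookups(num_feeds=10, num_packages=300)
single_feed = worst_case_lookups(num_feeds=1, num_packages=300)
print(many_feeds, single_feed)
```

Under these assumptions, the request count scales linearly with the number of feeds, which is why cutting the feed list is one of the suggested mitigations.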

The short term suggestions from the AzDO team are:

  • Reduce the number of feeds used (their docs mention that we should only be using a single feed for restores)
  • Use private feeds instead of public ones, so that NuGet won't make unauthenticated requests
  • Change NuGet.config to limit the maxHttpRequestsPerSource:
    <config>
        <add key='maxHttpRequestsPerSource' value='2' />
    </config>
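If the setting has to be rolled out across many repos, something like the following could be used to patch a NuGet.config programmatically. This is only a sketch: the helper name `set_max_http_requests` and the sample feed URL are hypothetical, and `xml.etree.ElementTree` drops XML comments and may reformat the file, so review the diff before committing.

```python
import xml.etree.ElementTree as ET


def set_max_http_requests(config_xml: str, value: int) -> str:
    """Return NuGet.config XML with maxHttpRequestsPerSource set to `value`.

    Hypothetical helper for illustration; it adds the <config> section
    if it is missing and updates the key if it already exists.
    """
    root = ET.fromstring(config_xml)  # expects the <configuration> root
    config = root.find("config")
    if config is None:
        config = ET.SubElement(root, "config")
    for entry in config.findall("add"):
        if entry.get("key") == "maxHttpRequestsPerSource":
            entry.set("value", str(value))
            break
    else:
        # No existing entry was found, so append a new one.
        ET.SubElement(
            config, "add",
            {"key": "maxHttpRequestsPerSource", "value": str(value)},
        )
    return ET.tostring(root, encoding="unicode")


# Sample input with a placeholder feed (not a real feed URL).
sample = (
    "<configuration><packageSources>"
    '<add key="example-feed" value="https://example.invalid/index.json" />'
    "</packageSources></configuration>"
)
patched = set_max_http_requests(sample, 2)
print(patched)
```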

We are still in talks with the AzDO team, as these workarounds would require a lot of changes to our infrastructure.

All 19 comments

Took me a while to track down the exact configuration for NuGet.config (it's not documented yet). Updated the issue description with it:

    <config>
        <add key='maxHttpRequestsPerSource' value='2' />
    </config>

According to AzDO telemetry, a large number of our AzDO build pool machines are being detected as the same machine, which puts them all into the same throttling bucket and makes things a lot worse for our BYOC pools than for hosted ones.
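A toy model can show why being lumped together hurts so much. This is not AzDO's actual implementation, just a minimal per-key concurrency limiter; the class name, the limit of 4, and the machine counts are all assumptions chosen for illustration.

```python
from collections import Counter


class ConcurrencyBucket:
    """Toy limiter: at most `limit` in-flight requests per key."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = Counter()

    def try_acquire(self, key: str) -> bool:
        if self.in_flight[key] >= self.limit:
            return False  # in the real service this would surface as a 429
        self.in_flight[key] += 1
        return True


def count_rejected(limit: int, keys: list) -> int:
    """Issue one concurrent request per entry in `keys`; count rejections."""
    bucket = ConcurrencyBucket(limit)
    return sum(0 if bucket.try_acquire(key) else 1 for key in keys)


# Four machines, three concurrent requests each (hypothetical numbers).
distinct = [f"machine-{i}" for i in range(4) for _ in range(3)]
shared = ["same-machine"] * 12

rejected_distinct = count_rejected(4, distinct)
rejected_shared = count_rejected(4, shared)
print(rejected_distinct, rejected_shared)
```

In this model, the same total load is accepted in full when each machine has its own bucket, but most of it is rejected once all machines are identified as one.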

As a short-term solution, the concurrency limits for the dnceng instance have been increased, and it seems to have helped somewhat.

The Azure DevOps team has found that we are now hitting some other throttling limits, and are in the process of investigating.

The AzDO team is also considering some long-term solutions like filling the gaps that are blocking us from adopting upstream feeds, which would reduce the number of feeds we need to specify in our repos' NuGet.config, and looking at improvements with the NuGet team.

@riarenas - Is this still a thing?

Yes. We have had some quota increases to help with this in the short term, but we haven't heard back with a long term solution.

We've received additional reports of 429s during restore operations in attempt 1 of these builds:

https://dev.azure.com/dnceng/internal/_build/results?buildId=587047&_a=summary
https://dev.azure.com/dnceng/internal/_build/results?buildId=587046&_a=summary
https://dev.azure.com/dnceng/internal/_build%2Fresults?buildId=587045&_a=summary

I reached out in the thread we had with the Azure Artifacts group about throttling.

CC @wtgodbe

The Azure Artifacts team said the problems on 4/3 were due to dnceng using 60% of the traffic for their scale units when they were completely scaled down. Additionally, yesterday we saw IP throttling come back in a lot more cases:

Using runfo I was able to find these from runtime, but we have additional reports from Roslyn, where the error was reported as a timeout instead of a build failure error.

|Build|Kind|Timeline Record|
|---|---|---|
|592529|PR https://github.com/dotnet/runtime/pull/34666|Build System.Private.CoreLib|
|592529|PR https://github.com/dotnet/runtime/pull/34666|Build System.Private.CoreLib|
|592488|PR https://github.com/dotnet/runtime/pull/34519|Build product|
|592488|PR https://github.com/dotnet/runtime/pull/34519|Build managed product components and packages|
|592488|PR https://github.com/dotnet/runtime/pull/34519|Build managed product components and packages|
|592488|PR https://github.com/dotnet/runtime/pull/34519|Build managed product components and packages|
|592482|PR https://github.com/dotnet/runtime/pull/34054|Restore and Build Product|
|592482|PR https://github.com/dotnet/runtime/pull/34054|Restore and Build Product|
|592437|PR https://github.com/dotnet/runtime/pull/34522|Restore and Build Product|
|592437|PR https://github.com/dotnet/runtime/pull/34522|Build CoreCLR Runtime|
|592437|PR https://github.com/dotnet/runtime/pull/34522|Restore and Build Product|
|592437|PR https://github.com/dotnet/runtime/pull/34522|Restore and Build Product|
|592437|PR https://github.com/dotnet/runtime/pull/34522|Build CoreCLR Runtime|
|592437|PR https://github.com/dotnet/runtime/pull/34522|Restore and Build Product|
|592404|Rolling|Prepare TestHost with runtime CoreCLR|
|592404|Rolling|Build System.Private.CoreLib|
|592404|Rolling|Build product|
|592404|Rolling|Build System.Private.CoreLib|
|592404|Rolling|Build managed product components and packages|
|592417|PR https://github.com/dotnet/runtime/pull/34663|Restore and Build Product|
|592417|PR https://github.com/dotnet/runtime/pull/34663|Build managed product components and packages|
|592417|PR https://github.com/dotnet/runtime/pull/34663|Restore and Build Product|
|592415|PR https://github.com/dotnet/runtime/pull/34665|Build managed product components and packages|
|592415|PR https://github.com/dotnet/runtime/pull/34665|Build System.Private.CoreLib|
|592415|PR https://github.com/dotnet/runtime/pull/34665|Build managed product components and packages|
|592105|PR https://github.com/dotnet/runtime/pull/34654|Build product|
|592105|PR https://github.com/dotnet/runtime/pull/34654|Build product|
|592105|PR https://github.com/dotnet/runtime/pull/34654|Restore and Build|
|592105|PR https://github.com/dotnet/runtime/pull/34654|Build System.Private.CoreLib|
|592105|PR https://github.com/dotnet/runtime/pull/34654|Build System.Private.CoreLib|
|592105|PR https://github.com/dotnet/runtime/pull/34654|Build System.Private.CoreLib|
|592194|PR https://github.com/dotnet/runtime/pull/34658|Restore and Build Product|
|592194|PR https://github.com/dotnet/runtime/pull/34658|Restore blob feed tasks|
|592194|PR https://github.com/dotnet/runtime/pull/34658|Build managed product components and packages|
|592187|PR https://github.com/dotnet/runtime/pull/34518|Build System.Private.CoreLib|
|592187|PR https://github.com/dotnet/runtime/pull/34518|Restore and Build Product|
|592313|PR https://github.com/dotnet/runtime/pull/34662|Restore and Build Product|
|592313|PR https://github.com/dotnet/runtime/pull/34662|Restore and Build Product|
|592295|PR https://github.com/dotnet/runtime/pull/34661|Restore and Build Product|
|592295|PR https://github.com/dotnet/runtime/pull/34661|Restore and Build Product|
|592380|PR https://github.com/dotnet/runtime/pull/34664|Build product|
|592380|PR https://github.com/dotnet/runtime/pull/34664|Build System.Private.CoreLib|
|592380|PR https://github.com/dotnet/runtime/pull/34664|Build System.Private.CoreLib|
|592252|PR https://github.com/dotnet/runtime/pull/34659|Build managed product components and packages|
|592272|PR https://github.com/dotnet/runtime/pull/34521|Build product|
|592272|PR https://github.com/dotnet/runtime/pull/34521|Build product|
|592272|PR https://github.com/dotnet/runtime/pull/34521|Restore and Build Product|
|592272|PR https://github.com/dotnet/runtime/pull/34521|Build product|
|592272|PR https://github.com/dotnet/runtime/pull/34521|Restore and Build Product|
|592192|PR https://github.com/dotnet/runtime/pull/34274|Build|
|592192|PR https://github.com/dotnet/runtime/pull/34274|Build System.Private.CoreLib|
|592192|PR https://github.com/dotnet/runtime/pull/34274|Build|
|592052|PR https://github.com/dotnet/runtime/pull/34432|Restore and Build Product|
|592052|PR https://github.com/dotnet/runtime/pull/34432|Build managed product components and packages|
|592040|PR https://github.com/dotnet/runtime/pull/33902|Restore and Build Product|
|592040|PR https://github.com/dotnet/runtime/pull/33902|Restore and Build Product|
|592080|Rolling|Build product|
|592080|Rolling|Build product|
|592080|Rolling|Build product|
|592080|Rolling|Restore and Build Product|
|592037|PR https://github.com/dotnet/runtime/pull/34652|Build|
|592037|PR https://github.com/dotnet/runtime/pull/34652|Restore and Build Product|
|592037|PR https://github.com/dotnet/runtime/pull/34652|Restore and Build Product|
|592032|PR https://github.com/dotnet/runtime/pull/34651|Restore and Build Product|
|591984|PR https://github.com/dotnet/runtime/pull/33733|Build managed test components|
|591984|PR https://github.com/dotnet/runtime/pull/33733|Build|
|591984|PR https://github.com/dotnet/runtime/pull/33733|Build|
|591984|PR https://github.com/dotnet/runtime/pull/33733|Build|
|592074|PR https://github.com/dotnet/runtime/pull/34211|Build managed product components and packages|
|592074|PR https://github.com/dotnet/runtime/pull/34211|Restore and Build Product|
|592074|PR https://github.com/dotnet/runtime/pull/34211|Restore and Build Product|
|592074|PR https://github.com/dotnet/runtime/pull/34211|Restore and Build Product|
|592074|PR https://github.com/dotnet/runtime/pull/34211|Build System.Private.CoreLib|
|592061|PR https://github.com/dotnet/runtime/pull/34650|Restore and Build Product|
|592061|PR https://github.com/dotnet/runtime/pull/34650|Restore and Build Product|
|592061|PR https://github.com/dotnet/runtime/pull/34650|Restore and Build Product|
|592061|PR https://github.com/dotnet/runtime/pull/34650|Build managed product components and packages|
|592061|PR https://github.com/dotnet/runtime/pull/34650|Build managed product components and packages|
|592061|PR https://github.com/dotnet/runtime/pull/34650|Restore and Build Product|
|592402|PR https://github.com/dotnet/roslyn/pull/43152|Build|

I asked the artifacts team for increased quota as we're ramping up usage of the feeds.

@riarenas thanks for looking into this, do we have an ETA?

Would it make sense to bring the dotnet blob feed back as a restore source in the meantime?

No ETA.

I'll create PRs to re-add dotnet-core as a backup if we don't hear from them soon.

Thanks @riarenas

AzDO folks have increased our limits. I'll keep this in FR for a bit to see if this gives us some relief, and move it back to general tracking afterwards, as our feed usage is only going to increase in the near term (we haven't moved ASPNet or Installer to relying entirely on AzDO feeds yet).

The new limits seem to have stuck. Haven't seen any more throttling during restore since the limits were raised. I'll remove this from FR.

The AzDO team said they are evaluating more sustainable options to handle our load. I'll keep this open because I think if we onboarded another big repo to only using AzDO feeds, we'd start seeing this again.

We have reached out to the Azure Artifacts team again for options: https://github.com/dotnet/core-eng/issues/9681

Ok - AzDO is saying they've fixed it (for real) now.

After the recent AzDO changes it doesn't look like we'll be easily throttled again, so I don't think there's much value in keeping this long-standing issue open anymore. We can open new issues for any sporadic throttling we see.
