There's a report of "A task was canceled." error on NuGet 5.2.x. The root cause is unknown.
(We are investigating potential fixes in NuGet client code and the interaction with https://github.com/Microsoft/artifacts-credprovider. This is impacting several customers badly.)
Thanks for your work here - @jmyersmsft and I are happy to test any possible fixes on @AArnott's consistent repro of this issue whenever you think you have something.
I've got CancellationToken tracking code in a branch in my fork to help further investigation if necessary too
Moving to 5.4, once we have a fix, we can consider servicing if necessary.
Since raising the timeouts seems to have solved the problem for me, can we not simply deduce that the problem is that there are timeouts that cancel tokens and cause non-deterministic failures? Perhaps the default timeouts are far too aggressive.
We believe there are multiple cancellation issues at hand.
Some are related to unwanted cancellations that are happening due to the way NuGet manages the lifetime of the plugins.
Thread starvation due to NuGet spamming the threadpool.
In your case logs and repros suggest the 2nd scenario.
Ideally we'd like to get to the bottom of why the threadpool becomes exhausted in some situations.
Increasing the timeout feels like a band aid that delays rather than fixes the root cause.
There's also been some reports that simply increasing the timeouts without clearing the http cache does not get rid of the problem for some customers.
Whether or not we fix the root causes, I think the default timeouts should be increased to at least 30s. There's almost no downside (they only matter if the plugin is badly misbehaving), and 5s isn't really a very long time to wait for a .net process to start up when the system is under heavy load (e.g. when restoring a bunch of packages from a bunch of feeds)
Whether or not we fix the root causes, I think the default timeouts should be increased to at least 30s.
I agree with this. We have a crazy high timeout for http request, the plugins should have a similar timeout.
I just didn't want us to do a reactionary fix in 3.0.
Updating the status here:
We have 5 fixes on the NuGet side so far, 5.3.x – 3.0.100, 5.4.x – 3.1.100, 5.5.x -3.1.200
We have 2 fixes on the plugin side in 0.19 and 0.20 versions
NuGet internally cancelling prematurely. (5.3.1) #8648
Logging failures triggering plugin shut down. (5.3.1) #8688
Propagated exception at dispose time fails the whole auth with a task cancellation error. (5.4) #8732
Plugin logging accuracy improvement (5.4) #8747
Threadpool starvation issues (5.5) #8528
Plugin side:
Unhandled exception crashes the restore operation (0.19.0) - microsoft/artifacts-credprovider#108
Change exit timeout to start ticking on a shutdown request, not at startup(0.20.0) microsoft/artifacts-credprovider#143
Final fix in:
https://github.com/NuGet/NuGet.Client/pull/3141
What version of .NET Core SDK or nuget.exe do we need to get this?
Checked into dev in plenty of time for 5.5. Which will ship with 16.5.
3.1.200 will be the matching sdk version, I think.
NuGet.exe might get patched in a 5.4.1 version if there is big need.
Dotnet.exe may have a servicing possibility too...but we don’t control timing.
Would love to understand where you (and others) most need fixes and how urgent a patch is.
5.4.0 already has 4 fixes helping this and related issues. Plus the cred provide added two fixes.
This fifth fix should polish off the problem.
Thanks @nkolev92 for continuing to attack until we killed this problem!
Thanks for the version numbers, @rrelyea.
For my own purposes there isn't an urgent need because I've already applied the workaround pretty much everywhere. But I'll be happy to remove it when I can.
Most helpful comment
Updating the status here:
We have 5 fixes on the NuGet side so far, 5.3.x – 3.0.100, 5.4.x – 3.1.100, 5.5.x -3.1.200
We have 2 fixes on the plugin side in 0.19 and 0.20 versions
NuGet internally cancelling prematurely. (5.3.1) #8648
Logging failures triggering plugin shut down. (5.3.1) #8688
Propagated exception at dispose time fails the whole auth with a task cancellation error. (5.4) #8732
Plugin logging accuracy improvement (5.4) #8747
Threadpool starvation issues (5.5) #8528
Plugin side:
Unhandled exception crashes the restore operation (0.19.0) - microsoft/artifacts-credprovider#108
Change exit timeout to start ticking on a shutdown request, not at startup(0.20.0) microsoft/artifacts-credprovider#143