We see restore failures when we update a project to consume a package that was just published. We can use `--no-cache` to work around it, but that shouldn't be necessary.
It seems like NuGet should realize it is running from a cache, and attempt to refresh that cache, before failing the restore because it cannot find the package.
This would still let the common case be fast while not failing in the case that the cache was stale.
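A minimal Python sketch of the refresh-on-miss behavior proposed above. It assumes the cache holds one version list per package and that `fetch` stands in for the HTTP request to the feed; `VersionListCache` and `has_version` are hypothetical names, not NuGet's API. The common case is still answered from the cache, and the feed is only re-queried when the cached list lacks the requested version.

```python
from typing import Callable, Dict, List


class VersionListCache:
    """Hypothetical read-through cache of each package's published versions."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch  # stands in for the HTTP request to the feed
        self._entries: Dict[str, List[str]] = {}
        self.fetch_count = 0  # how often the feed was actually queried

    def _refresh(self, package: str) -> List[str]:
        self.fetch_count += 1
        versions = self._fetch(package)
        self._entries[package] = versions
        return versions

    def has_version(self, package: str, version: str) -> bool:
        cached = self._entries.get(package)
        if cached is not None and version in cached:
            return True  # common case: answered from the cache
        # Nothing cached yet, or the cached list may be stale:
        # ask the feed once before reporting the version as missing.
        return version in self._refresh(package)


# Simulated feed where a version is published after the first restore.
feed = {"y": ["1.0.0"]}
cache = VersionListCache(lambda pkg: list(feed.get(pkg, [])))
print(cache.has_version("y", "1.0.0"))  # True (feed queried once)
feed["y"].append("1.1.0")               # "just published"
print(cache.has_version("y", "1.1.0"))  # True (stale list refreshed)
print(cache.fetch_count)                # 2
```

The cost of this design is one extra feed request for every version that is genuinely absent.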
Are you using floating versions, or are you specifying the version exactly in your project.json?
Version is specified exactly, not floating.
Workflow we see is:
Today the best solutions for this are to:
1) clear the http cache before build; or
2) call restore with --no-cache
We've fixed this kind of thing in the UI scenario for install.
Not committing to doing this now...but keeping it around to consider.
Thanks for raising the issue.
We're still hitting this often. When we have geo-distributed NuGet feeds and one doesn't have the package, we need to go clear `%localappdata%\NuGet\v3-cache` manually on the build server. The same goes for any user hitting this issue. Caching a 404 indefinitely is just against all reason when dealing with HTTP. Why are we caching 404s? I'm not saying there's no merit to it, but caching it for more than a minute or so only serves to cause this maddening behavior.
If you click Restore Packages over and over in VS by right-clicking the solution, a user is just shown that the package isn't in the feed. But it is. It's lying. There's an invalid cache in play. I've had to explain this to several of our developers lately and none of them find this intuitive.
Can we please not cache 404s indefinitely? Fixing a build server isn't something every dev should have to learn how to do, nor do they necessarily have access to do so.
I've had this issue myself with NuGet packages just published to a ProGet feed from a TeamCity job, which aren't quite indexed fast enough when I try to reference them elsewhere.
As annoying as it is, another workaround I've found is to add or remove a trailing slash on the feed URL, as appropriate (this works in my case because the path doesn't end in `.json`), which cache-busts the 404.
Still hitting this several times a week. Then I have to go clear the cache folder on dozens of build servers. Please change this behavior. It's not right. It doesn't follow HTTP standards. It's an invalid cache, straight up.
I'm also hitting this bug frequently. Please fix this; caching 404s really doesn't seem like a good idea.
I just spent an hour clearing out the HTTP "cache" folder on a whole series of build agents, instructing developers how to do it locally, and getting builds back in order due to a cached 404 yet again.
PLEASE stop caching a 404 forever. This behavior is nuts. I cannot find a single other product or platform that does this...and I looked, really hard.
After a few days of troubleshooting I found this GitHub issue. This started with our TeamCity build server and all builds failing after a lock. Here are the issues I consider related:
https://github.com/NuGet/Home/issues/7060
https://github.com/NuGet/Home/issues/7020
Is this behavior going to change in 5.0? We're still hitting it several times a month. Cleaning out build servers due to race conditions is super fun, but I'd rather spend my time on things that matter.
Please stop caching 404s. No sane platform does this. And they certainly don't do it indefinitely.
The last interaction with this from the team was https://github.com/NuGet/Home/issues/3116#event-1364654236, which happens to have been in November 2017.
This is poor handling of an incident report, and whilst this does not currently affect me, I can sympathise with the others on this thread whom it does affect.
This behaviour is undesirable at best and should therefore be prioritised for change in a future release.
The fact that this has been hitting @NickCraver and team often enough that he has come back and replied 3 times since his original comment in April 2018 should show that this needs higher priority than it is currently getting.
Why hasn't this been fixed already?
I don't see any technical difficulties and it is driving people crazy.
Our project (Xenko) relies on nuget and I'm pretty sure we hit that exact issue all the time.
Appreciate the feedback. Will try to get this fixed early in 5.x
@rrelyea, setting `RestoreNoCache=true` prior to `RestoreTask` doesn't seem to work reliably for me. Is this not quite the same as `--no-cache`?
@rrelyea Fixing it only in a new version is not acceptable; it should be fixed in a minor release of every version currently deployed. Caching HTTP requests (especially 404s) is bad practice and a design flaw. Fixing it ASAP should be considered CRITICAL.
@nkolev92 what's the best solution here? IMO, not caching 404s seems like the right option. It shouldn't have adverse side effects: if a package is not found, the build breaks, so it's not something folks run repeatedly unless they fix the availability of the package or change the dependency graph to an available version.
Or we could expire the 404 cache after a minute or some similarly short period.
I would still vote for first option.
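For comparison, the second option (expiring cached 404s after a short period) could be sketched as a negative cache with a TTL. This is illustrative Python only: `NegativeCache` is a hypothetical class, and the 60-second default simply mirrors the "a min" suggestion. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Callable, Dict


class NegativeCache:
    """Hypothetical store of "not found" results that expire after a TTL."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._misses: Dict[str, float] = {}  # key -> time the 404 was seen

    def record_not_found(self, key: str) -> None:
        self._misses[key] = self._clock()

    def is_known_missing(self, key: str) -> bool:
        seen_at = self._misses.get(key)
        if seen_at is None:
            return False  # never recorded: go to the feed
        if self._clock() - seen_at >= self._ttl:
            del self._misses[key]  # expired: forget it and re-query the feed
            return False
        return True  # still fresh: safe to short-circuit the request


# Usage with a fake clock (hypothetical package/version key).
now = [0.0]
cache = NegativeCache(ttl_seconds=60, clock=lambda: now[0])
cache.record_not_found("y/1.1.0")
print(cache.is_known_missing("y/1.1.0"))  # True
now[0] = 61.0
print(cache.is_known_missing("y/1.1.0"))  # False
```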
I guess the misconception is that we cache 404s.
We do not do that.
We cache the versions list, which is not a 404 if the package exists on said server.
We could consider retrying if we get a cache response that doesn't contain the requested version, as suggested by @ericstj , but that could have an effect on performance, which we should consider.
Of course that'd be driven by the distribution of the packages/versions across the different sources that a project/solution has, so it might be barely noticeable. Unfortunately there's no low hanging fruit here :)
You could do a version list retry if the requested version is higher than any in the cached list, right?
@NinoFloris Yeah, the special refresh logic could trigger when the requested version is higher than the highest one in the cached list, or when the version is missing altogether, as the OP suggested.
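The two refresh triggers discussed here can be expressed as one predicate. A hedged sketch: `should_refresh` and `parse_version` are hypothetical, and versions are simplified to plain numeric `major.minor.patch` strings; real NuGet versions also carry prerelease labels with their own ordering rules.

```python
from typing import Iterable, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    # Simplification: plain numeric versions like "1.2.3"; no prerelease labels.
    return tuple(int(part) for part in text.split("."))


def should_refresh(requested: str, cached: Iterable[str],
                   only_if_newer: bool = False) -> bool:
    """Decide whether a cached version list should be re-fetched from the feed.

    only_if_newer=False: refresh whenever the requested version is missing.
    only_if_newer=True:  refresh only if it is higher than every cached version.
    """
    known = [parse_version(v) for v in cached]
    wanted = parse_version(requested)
    if wanted in known:
        return False  # the cache already satisfies the request
    if not only_if_newer:
        return True
    return not known or wanted > max(known)


cached = ["1.0.0", "1.2.0"]
print(should_refresh("1.1.0", cached))                      # True
print(should_refresh("1.1.0", cached, only_if_newer=True))  # False
print(should_refresh("1.3.0", cached, only_if_newer=True))  # True
```

Refreshing only when the requested version is newer than everything cached skips re-fetches for gaps in the middle of the list, which is cheaper but would miss a version that was published out of order.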
Can we suggest `dotnet restore --no-cache` in the warning message? Something like:
C:\Code\x\x.csproj : warning NU1603: x depends on y (>= 0.0.1-a) but y 0.0.1-a was not found. An approximate best match of y 0.0.1-b was resolved. If y 0.0.1-a was recently published, please try 'dotnet restore --no-cache' [D:\Code\x\x.sln]
This seems like an easy "fix" to help users unblock themselves until we figure out a real solution. I'm on the NuGet team and it took me a solid 5 minutes to work around this problem. I can't imagine the pain the community must be going through when they run into this issue...
I fail to see how it is a hard problem to fix. The behavior of a cache is a 101 programming problem:
That is how ALL cache systems in the world work, except NuGet's, for some non-obvious reason.
Hi there, reporting in that we're still hitting this. Often. NuGet is using an old cache and failing to go see if the package is there. So I'm here (again) manually issuing directory clears across our entire build fleet due to this bug.
Please, please fix this. Looking at cache and saying "oh, well that package version wasn't here yesterday...it must not exist" is not reasonable behavior, and the cost keeps adding up over time.
@NickCraver - can you confirm if you have tried calling restore with --no-cache as suggested by several folks above?
@karann-msft Yes, sure, locally. But that's not the issue.
How would we do this on a build server? Unless we always ignore cache completely, which would lengthen and increase the cost of every build. The build: a) doesn't run as me, so I have to execute this command as a privileged account I only have access to because I'm a sysadmin, and b) runs across many machines.
It seems like we're not really thinking about context with solutions here. We have many dozens of agents, and I routinely have to purge the cache from them all due to this bug. I've had to do it 8 times so far this year, all from having a cached feed used which didn't have the latest version of a package. Pushing a new package _and then using it_ is pretty common (and I imagine it would be for most people).
I am not sure what build system you are using. What approach have you considered/tried so far to leverage the no-cache option? If you provide more details, I am sure we can help you incorporate it into your builds.
Also, can you try that approach manually on a build server as a sysadmin and tell us how long it would normally take vs. with the no-cache option?
@NickCraver `--no-cache` only skips the http-cache and not the GlobalPackagesFolder, so using it may not be so bad until we fix this issue. Since the impact also depends on the specific usage scenario (i.e. how often the http-cache is used for your use case), it may be a good idea to baseline the time taken with and without the `--no-cache` option.
I am also listing the scenarios where not refreshing the http-cache is a problem (this issue), as I understand them. Please comment, correct me where I am wrong, or add to these scenarios:
Common scenario: Publishing a package version and trying to consume it immediately from the following types of feeds:
| Feed type | Is it an issue? | Reason/Comments |
|:--- |:--- |:--- |
| Local folder based feed | Not an issue | There is no http-cache in the picture. |
| NuGet.org | Not an issue in most cases | NuGet.org takes ~22 minutes on average to publish a package version. While that is an issue in itself, it more or less makes the http-cache a non-issue, because the http-cache is invalidated after 30 minutes. |
| Private http feed | Yes | Most internal private feeds make a package version available immediately, so unlike nuget.org, this is an issue for these feeds, which are used in most CI/CD scenarios. |
@nkolev92 can you also scan the above to see if I missed anything?
> `--no-cache` only skips the http-cache and not the GlobalPackagesFolder

This should be renamed `--no-http-cache`, because the current name is confusing. I would never guess that it affects only that layer. There might also be a need for a `--no-package-cache` to skip the GlobalPackagesFolder.
Sure, we can run some builds with both and post numbers here. FWIW I am aware of `--no-cache` but agree it's a confusing name.
The reason this doesn't make sense on a build server is that _it's a build server_. Most of them let you cache the git repo locally (e.g. a local clone where updates are pulls, for faster builds). So everything you need is on that server. You may be building 1 thing or 20 things in various branches from the same clone, across many directories, or builds based on any given tree. Or the library may be a company library used in 10 projects that don't even share a repo; that's just one illustration of the very common "this is wider than a build" case.
We have all of the above combinations. Chances are, if you have a private feed in the first place, this shared library being cached as "missing" at the system level from a stale feed cache is a system issue. I want that to be super clear here, because the fixes we're talking about are per project and per build. This is a system-level issue (most build agents run as a specific user, so 1:1 with the system) and needs a system-level fix. That's why we clear the HTTP cache via PowerShell across the fleet each time. Otherwise we're chasing it n times.
Again, happy to post numbers when I'm back in the office next week, but I wanted to better explain why `--no-cache` isn't the _right_ fix, even if it can work per build. For instance, if I could disable the NuGet HTTP cache server-wide with an environment variable, we'd do it in a heartbeat. Fixing dozens of builds and ensuring they all use a workaround forever isn't a good solution to a systemic issue. (And I totally see we're trying to figure out _what_ that solution should look like here; I'm not discounting that, and will help with numbers.)
Or, forget all of the above and just have it actually bypass the cache when it fails. I cannot understand why it _doesn't_ do this today. Its default attitude towards the project is "your reference must be wrong"...and so developers chase it, thinking they're wrong. That's a bad experience and I hear about it from at least one of our teams once a week. Can someone please explain why it doesn't retry _without_ the cache once the cache fails to deliver what was requested? That's how most developers expect a cache to work.
@NickCraver - focusing on the no-cache option, just to make sure I am reading that correctly, let me paraphrase you:
Also, I want to throw `RestoreNoCache` into the mix:

```xml
<PropertyGroup>
  <RestoreNoCache>true</RestoreNoCache>
</PropertyGroup>
```
Apologies to @NickCraver.
We also clean the http cache on CI builds via PowerShell/build script...
@karann-msft That's a good summary, yep! Overall, all of those solutions are still n solutions, whether on the command line or in the project file. So the fundamental principle of the problem/solution set remains the same: we need a global fix for global behavior.
@llehn I agree there's bad behavior here, but bad human behavior on top of it doesn't help. I'm no saint in this issue either; reading back, I see my posts showing increasing frustration at dealing with this same issue almost weekly for 3.5 years. I don't mean to sound flippant here, and it's not a good example for others; for that I apologize.
The frustration is real for many, but we should try to discuss this calmly, including me doing a better job of not letting frustration bleed in. I hope we can resolve it for everyone, and appreciate the team actively discussing how we do that here. It's progress. I'll take progress any day.
@NickCraver - how about setting `RestoreNoCache` in a `Directory.Build.props` file at the root?
@karann-msft We're already using that pretty pervasively, so one at the root of the drive wouldn't be picked up by most projects, and where it was, it'd be hit or miss (e.g. easy to break/unintuitive). `Directory.Build.props` isn't recursive: only the first one found is used, unless you manually import the one higher up.
If the package in question is an MSBuild SDK, then `RestoreNoCache` has no effect. The _MSBuild SDK Resolver_ simply ignores this property AFAIK.
@vatsan-madhavan - what do you mean by "If the package in question is an MSBuild SDK"? Did you mean if the consuming project was an SDK-style project?
@NickCraver - how about setting it as an env variable?
@karann-msft I'd definitely take that as a workaround (we can deploy it via Puppet), but are we talking about a temporary workaround here or the only change? I'd like to see the base behavior of "actually check on a miss" changed here so it just works for everyone, rather than only for those who find this issue or troubleshoot through it a few times.
@karann-msft Here are some examples: https://github.com/microsoft/MSBuildSdks/blob/master/README.md.
https://github.com/novotnyllc/MSBuildSdkExtras is another popular one.
https://github.com/dotnet/arcade/tree/master/src/Microsoft.DotNet.Arcade.Sdk is an SDK used extensively by .NET Core repos.
@NickCraver - I am trying to validate whether `RestoreNoCache` is a decent workaround (sounds like it is?) while we identify a solution, including the "actually check on a miss" one. As a matter of fact, @nkolev92 is actively investigating the technical challenges and will be posting his findings shortly.
Not sure I came across right: `RestoreNoCache` is still a "go modify every repository" solution. It's synonymous with `--no-cache` on the n-changes front (and every new project has to account for it). So, not a good workaround.
Good to hear on investigating a better long-term :)
Let me clarify: I meant the `RestoreNoCache` env variable, so that it applies machine-wide.
@karann-msft Ah gotcha - yes, that'd work as a workaround for our case. Is this a thing today?
@NickCraver - I believe it is! I just tried and msbuild recognizes it
@karann-msft - Would you expect the environment variable workaround to also work within Visual Studio?
I could make it work when running msbuild from the developer command line, but within Visual Studio a Rebuild All did not fetch transitive packages right after they became available on the internal NuGet server, as I expected it to.
I also tried setting the properties via a `Directory.Build.props` file, which also didn't work in VS :-(
```xml
<Project>
  <PropertyGroup>
    <RestoreForce>true</RestoreForce>
    <RestoreNoCache>true</RestoreNoCache>
  </PropertyGroup>
</Project>
```
@nkolev92 What's the status of a 'real' fix here? It's pretty clear that there is a workaround, but I'd like a fix that doesn't involve knowledge of the underpinnings of feed infrastructure (e.g. what's the package publish time of my input feeds?).
.NET has been hitting this pretty regularly lately (we removed our http cache clear a bit back).
I understand that there are some complexities here. In .NET's case, we look for exact version matches in all cases, so the error path (package version not found in the http cache, fall back to re-requesting) is correct and should not affect performance. On the other hand, when * versioning or version ranges are in use, things get complicated. I think in those cases you either need to get more info from the package feeds (e.g. a simple bit that indicates whether a new version of package X has been published), or you cannot use the http cache at all from a correctness standpoint. I lean more towards the second option in those cases.
@mmitche Are the problems in the .NET build still related to MSBuild SDKs? If so, does MSBuild's SDK resolver honor `RestoreNoCache` yet? (I don't believe it did when I last checked over a year ago, and that made this workaround unreliable.)
Not sure, @riarenas is investigating. We had a cache clearing step in the builds before (I think it was an explicit initial clear), but it was removed at some point.
@mmitche - Can you use the RestoreNoCache env variable mentioned in https://github.com/NuGet/Home/issues/3116#issuecomment-586426496?
Yes, we believe we can and we're adding it back into our builds. But I'm more interested in a fix for the underlying issue.
The old NuGet cache clearing in Arcade's build targets/scripts wasn't helpful for MSBuild SDKs even in the past, unfortunately. The MSBuild SDK resolver made its own NuGet calls, I believe (it probably still does), and didn't respect the `RestoreNoCache` env variable etc., which means that these workarounds, however helpful in general, weren't helpful enough for .NET builds.
@vatsan-madhavan I don't think that we're seeing those problems in these cases. All of these are post msbuild SDK-install restore issues.