go version
)?13.6
yes
go env
)?Window 10, intel
go env
Output
$ go env
I had a mod file which is 1.13
When I run go build, it download the dependency, however, it end with error like this. e.g.
cannot load golang.org/x/net/html: rename C:\Users\502380\go\pkg\mod\golang.org\x\[email protected] C:\Users\502380\go\pkg\mod\golang.org\x\[email protected]:
build should successful without any error.
go: extracting golang.org/x/net v0.0.0-20190620200207-3b0461eec859
build command-line-arguments: cannot load golang.org/x/net/html: rename C:\Users\502380\go\pkg\mod\golang.org\x\[email protected] C:\Users\502380\go\pkg\mod\golang.org\x\[email protected]: Access is denied.
@kiwionly Could you provide an example of the code that produced the error, preferably as a Go playground link?
Seems like the same issue as microsoft/vscode-go#2897. We've never been able to reproduce this.
Could you post a listing of the directory C:\Users\502380\go\pkg\mod\golang.org\x\
?
Are you able to reproduce this or did it just happen once? How about after clearing the module cache with go clean -modcache
?
@cagedmantis @jayconrod I am doing go build with module using Jetbrain Goland, some how command prompt also able to produce the error.
Note that my application contain a lot of dependency, and i need to rerun > go build , sometime it success, sometime it will fail with error message below ( depend on the package, some package will have issue, then I run go build again, no more issue/)
Here is the error when running in command prompt using go get, but actually the error is same ( except go build is loading a lot of dependency compare to go get below, but github.com/go-playground/locales is the one that always fail )
C:\Users\502380\GolandProjects\myproject>go get github.com/go-playground/universal-translator
go: finding github.com/go-playground/universal-translator v0.17.0
go: downloading github.com/go-playground/universal-translator v0.17.0
go: extracting github.com/go-playground/universal-translator v0.17.0
go: downloading github.com/go-playground/locales v0.13.0
go: extracting github.com/go-playground/locales v0.13.0
..\..\go\pkg\mod\github.com\go-playground\[email protected]\errors.go:7:2: rename C:\Users\502380\go\pkg\mod\github.com\go-playground\[email protected]
C:\Users\502380\go\pkg\mod\github.com\go-playground\[email protected]: Access is denied.
For the rename
of that directory, we use a separate lockfile:
https://github.com/golang/go/blob/6718cb617f3d09878088779fe9d18b08949a9bc0/src/cmd/go/internal/modfetch/fetch.go#L76
So the only possibilities I can see for this failure are:
lockedfile.Mutex
or os.Rename
implementation is causing the file-locks to not actually synchronize the other I/O operations.Of course, it is also possible that I have missed something.
For cause (1), we should fix our implementation.
For cause (2) or (3), we should ensure that a bug is filed against the appropriate other project, and then we could perhaps use robustio.Rename
instead of os.Rename
if we believe that will help. However, I would want some way to empirically reproduce and measure the problem and the mitigation to verify that it actually does help.
CC @jstarks @alexbrainman @zx2c4 @networkimprov in case they have any insights into how to debug this.
@bcmills is this the relevant WinAPI call for lockedfile.Mutex? How is it used here?
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-lockfileex
Since the error is intermittent, it seems like retrying os.Rename() might be the solution. Unrelated directory updates could have saturated the NTFS journal (@jstarks is that a reasonable guess?)
A way to test that might be a set of goroutines, each repeatedly renaming a separate file between two unique names in the same directory; there's no possibility of name collisions, only concurrent directory updates.
PS: If that's the problem, I don't think it's a bug in NTFS per se...
Yes, lockedfile.Mutex
does use LockFileEx
internally. (The implementation is in src/cmd/go/internal/lockedfile/internal/filelock/filelock_windows.go
.)
What would it mean to saturate the NTFS journal? If the program is outpacing the buffering capacity of the filesystem, I would expect windows.MoveFileEx
to block until the buffer clears far enough to enqueue the operation...
My guess about NTFS may be wrong. It's the only reason I could imagine for unexplained rename failures. My test concept above might prove it...
Some info on NTFS journaling is here, tho not linked to a particular Windows version. It describes "Log Full", but isn't clear about the end result for filesystem ops which encounter it. Some are retried, but not all? I can't tell.
CC @jstarks @alexbrainman @zx2c4 @networkimprov in case they have any insights into how to debug this.
I don't have any good ideas.
I am not familiar with that code. And I don't have free time to learn the code to debug it.
Alex
@bcmills here's a test app which repeatedly renames a set of files in a single directory. It defaults to 10 concurrent renaming loops, and a command line argument lets you pick the quantity.
I tried this using Go 1.13.3 on Win7 with 10, 33, 100, and 250 loops, but saw no errors. Maybe it behaves differently on Win10, or a higher loop count is required? Maybe it should also write files, not just change directory entries? Or maybe it disproves my hypothesis :-)
I watched the Task Manager while it ran, and noticed that SearchIndexer.exe also consumes a lot of cpu time.
package main
import ( "fmt"; "os"; "strconv" )
const kDir = "zwinrnd/"
func main() {
var err error
aGr := 10
if len(os.Args) > 1 {
aGr, err = strconv.Atoi(os.Args[1])
if err != nil { panic(err) }
}
err = os.MkdirAll(kDir, 0700)
if err != nil { panic(err) }
for a := 0; a < aGr; a++ {
go func(c int) {
cA, cB := 'a', 'b'
cFd, err := os.Create(kDir + fmt.Sprintf("%d-%c", c, cA))
if err != nil && !os.IsExist(err) { panic(err) }
cFd.Close()
for {
err = os.Rename(kDir + fmt.Sprintf("%d-%c", c, cA),
kDir + fmt.Sprintf("%d-%c", c, cB))
if err != nil {
fmt.Println(err)
} else {
if cA == 'a' { cA, cB = 'b', 'a'
} else { cA, cB = 'a', 'b' }
}
}
}(a)
}
select {}
}
I see this error many times in my Windows 10 workstation when downloading dependencies that are not cached locally, and it also happen to all my colleagues. In our case it seems to be related to the on-access scanner enterprise antivirus detecting as a threat the go modules cache renames. It is happening since the adoption of modules on Go 1.11.
The only solution I've found is to create an scanning exception for the go/pkg/mod
folder that I activate when this error occurs.
I don’t know if this can be useful. I posted on vscode-go a way to reproduce the issue.
https://github.com/microsoft/vscode-go/issues/2897
@alfmos, do you have repro steps that do not involve indirection through VSCode? Ideally we should have a regression test that we can check in to the main go
repository and run on the Go project's builders, which means that it cannot depend on going through a specific third-party IDE.
@bcmills I've just tried from cmd and I get the error.
go version: 1.13.7 windows/amd64
go env:
C:\Users\alfonso moscato>go env
set GO111MODULE=on
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\alfonso moscato\AppData\Localgo-build
set GOENV=C:\Users\alfonso moscato\AppData\Roaminggo\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=d:go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=NUL
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=d:\tempgo-build604923562=/tmp/go-build -gno-record-gcc-switches
Steps to reproduce the issue:
C:\Users\alfonso moscato>go get -v github.com/mdempsky/gocode
go: finding github.com/mdempsky/gocode latest
go: finding golang.org/x/tools latest
go: extracting golang.org/x/tools v0.0.0-20200216192241-b320d3a0f5a2
build github.com/mdempsky/gocode: cannot load golang.org/x/tools/go/gcexportdata: rename d:go\pkg\modgolang.org\x\[email protected] d:go\pkg\modgolang.org\x\[email protected]: Access is denied.
Tell me what data you need and I'll collect them.
I could let you inspect my pc from remote if needed
I spent some more timing looking at this today. Based on the MoveFileEx
documentation and various threads on Windows Q&A sites and mailing list, I think the most likely explanation for this issue is that another process (probably antivirus or search indexer) is opening files in the module directory before it's completely extracted.
Here's our normal sequence of operations at a high level:
os.Stat
the module version directory, for example, %GOPATH%\pkg\mod\golang.org\pkg\[email protected]
. If it exists, we assume it's complete.os.OpenFile
on the module version's lock file, possibly creating it.LockFileEx
to lock the lock file.os.Stat
the module version directory again. If it exists, another process got the lock and extracted the module zip first, so we assume the directory is complete.os.RemoveAll
on temporary module version directories with a ".tmp-"
suffix. These were created by other processes which failed. We hold the lock for this version, so these directories should not be actively used. Errors are ignored here.".tmp-"
suffix. Extract the module zip file there.os.Rename
the temporary module version directory to its final location. This calls MoveFileEx
, which is where we're encountering the error.UnlockFileEx
on the lock file.Close
on the lock file.We're getting ERROR_ACCESS_DENIED
on the os.Rename
call. I've verified this error is reported when os.Rename
is called on a directory when another process has a file open in that directory. So while I can't confirm this is what's happening, it seems likely. Note that by default, the module cache is not in a hidden directory: %USERPROFILE%\pkg\mod
will probably be read by search indexers, backup tools, and anything else that watches the user's home directory.
I thought about an alternate flow where we extract a module version to its final location, then create a special file indicating everything has been extracted. However, if we encounter an error extracting the module, we would need to call os.RemoveAll
on the partial directory the next time, and that would fail if other processes have files open. So it's just a different kind of flaky.
I don't think we can fix this, but some mitigations are possible.
First, we can check if the error is ERROR_ACCESS_DENIED
and add some information to the error message for this specific case.
Second, we can retry os.Rename
a few times before reporting an error. It looks like cmd/go/internal/robustio.Rename
can already do this.
Does each file extracted from the module zip to the temp directory get an os.File.Sync() after it's written and before close?
If not, that might be worth a try. I have a similar process in my Windows app (make temp-dir, populate, rename) with f.Sync() for each file, and have not seen "Access Denied" errors.
Also on a system that reorders writes, a directory rename without prior file sync could yield corrupted files in an apparently complete directory after a crash.
EDIT: to answer my own Q, extracted files are not sync'd:
https://github.com/golang/mod/blob/master/zip/zip.go#L465
The current solution leaves the environment in an unstable state. Could it be possible to change os.Rename to os.Copy followed by os.Remove? In this way if there is an Access Denied error, the only side effect would be a dirty temporary directory, and the environment would be updated and well working.
@alfmos I think you're suggesting to use os.Link() to make the directory appear with the correct name, then os.Remove() to clear the temporary name. That method would work for a file, but not a directory.
Sorry, I assumed there was os.Copy, that actually doesn't exist.
My point is, apart from the function to use, to copy of all files from temp directory to the final one, instead of renaming the directory.
This would eliminate the access denied problem, as the access would be read-only.
Of course this woud waste some space because there would be two copies of the files before the temp directory is deleted, but wouldn't stop the installation if there is another process reading the file.
If the copy failed for any reason, you'd be left with a corrupt module. Using os.Rename() prevents that.
If calling os.File.Sync() doesn't fix it, the solution is to retry os.Rename() until it works.
If the copy failed for any reason, you'd be left with a corrupt module. Using os.Rename() prevents that.
You're right, I didn't think of that.
@networkimprov Could you explain more about why os.File.Sync
would help with the ERROR_ACCESS_DENIED
problem? I'm not that familiar with the pitfalls of NTFS and the Windows API, so I feel like I'm missing something. Coming from Linux / Darwin, I'd expect that to help avoid corruption in the case of power loss, but it doesn't seem like it would guard against other processes opening files.
I wrote a couple small test programs: one extracts a small module zip, then renames and removes the directory. The other repeatedly walks the directory and reads files. I was able to get an ERROR_ACCESS_DENIED
pretty quickly.
I tried adding os.File.Sync
in golang.org/x/mod/zip
, but I still saw ERROR_ACCESS_DENIED
in this case.
A quick benchmark showed that adding the os.File.Sync
made a combined unzip-and-remove operation take 2.78x as long as before. So I'd rather not add it unless it's absolutely necessary.
Which op returned access-denied, remove or rename? I assume you're calling os.RemoveAll()? I'm not sure why you're testing that.
I suggested os.File.Sync() because I use it and haven't seen such errors, tho maybe I haven't tested my app widely enough.
Which op returned access-denied, remove or rename? I assume you're calling os.RemoveAll()? I'm not sure why you're testing that.
os.Rename
fails. os.RemoveAll
is just there to clean up the temporary directory after a successful rename.
I suggested os.File.Sync() because I use it and haven't seen such errors, tho maybe I haven't tested my app widely enough.
I don't understand why os.File.Sync
would guard against ERROR_ACCESS_DENIED
though. Have you seen ERROR_ACCESS_DENIED
without using os.File.Sync
?
We haven't been able to reproduce this problem with the go
command, so it seems fairly rare, or at least dependent on other software running on the system we don't know about. I've only been able to reproduce the error with synthetic examples.
Based on the MoveFileEx documentation and various threads on Windows Q&A sites and mailing list, I think the most likely explanation for this issue is that another process (probably antivirus or search indexer) is opening files in the module directory before it's completely extracted.
That sounds reasonable. Especially given that you have small example program that reproduces the problem.
I googled for MoveFileEx ERROR_ACCESS_DENIED
and I found
https://habr.com/ru/post/450700/
While it is not exactly the same issue, but similar. It talks about
The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED.
But the article gave me an idea. The author used Process Monitor
https://docs.microsoft.com/en-us/sysinternals/downloads/procmon
to monitor all syscalls that are happening to his file. We could do the same.
@alfmos you could use Process Monitor to log all accesses to files with path starting with d:go\pkg\modgolang.org\x\[email protected] or similar. This will tell you which processes (Go tools and other programs) access yor file in question and how. Especially the bit where the failure occurs. Please try it.
I don't think we can fix this,
Why not? The problem is that you use MoveFileEx, and it does not work for your use case. Surely there are different ways to design your cache storage process - the way that does not require moving files.
Alex
Logfile_access_denied.xlsx
I used Process Monitor and I can confirm that files are opened for scan by KAspersky antivirus and Cybereason ransomfree.
As a countercheck, I disabled both and the installation went fine.
I have attached a process monitor printscreen and an Excel file with all accesses to files.
Regarding the solution, maybe it's only a matter of waiting.
Both tools open new files for scan when they are created, but just once.
After the scan is ended files can be moved without issues.
@alfmos Thanks! It's really good to confirm this is the root cause.
@alexbrainman You're right, there's probably a way to do this that doesn't involve renaming.
In an earlier comment, I was thinking about extracting the zip file in place (while holding a lock, which we already do), then creating an empty file (let's call it a .extracted file) to indicate the directory is complete and the zip has been fully extracted.
If we later find that the module was only partially extracted by an earlier run (that is, the directory is present but the .extracted file is not), it's prudent to remove the directory before unzipping the module again. zip.Unzip
requires this at the moment, though it could be changed if needed. I was worried that os.RemoveAll
would fail if any files were open in that directory, and that seems to be the case, but robustio.RemoveAll
works well enough. I didn't see any errors using that in a synthetic test.
The “extract in place” approach unfortunately has an awkward migration path: folks' existing module caches will not include .extracted
files — so older versions of the go
command will not know to wait for them, and at the version transition every archive will be re-extracted.
(Maybe that's worth the fix, though.)
Yeah, I'm not wild about the migration.
Maybe we could compare a directory to the zip file and check that it's fully extracted instead of deleting and re-extracting it. It's not clear that's actually less file I/O though.
Yeah, I'm not wild about the migration.
Maybe we could compare a directory to the zip file and check that it's fully extracted instead of deleting and re-extracting it. It's not clear that's actually less file I/O though.
Why not a hash function? This would guarantee also against data corruption.
Hashing is very expensive compared to stat
.
We do want to ensure that the go
command doesn't _introduce_ data corruption in the course of normal operation, but we intentionally do not make it go out of its way to _detect_ corruption introduced by other sources.
Before starting the unzip, you could create a marker file, then delete it after the unzip. You might need to loop the delete tho. The easiest fix is probably to loop os.Rename.
I've been struggling with the same issue for a couple of days. First I failed to build a go project on WSL (Ubuntu) but after that the same occurred on the Windows itself.
I use another antivirus software than the mentioned above but still notice that its process occupies some files under /pkg/mod directory while go mod command is in place.
Unfortunately I can't disable it, so I am looking forward to the solution, or at least another workaround.
In an earlier comment, I was thinking about extracting the zip file in place (while holding a lock, which we already do), then creating an empty file (let's call it a .extracted file) to indicate the directory is complete and the zip has been fully extracted.
I suspect the problem here is that we re-create file / directory with the same path after it has been removed. For example
If external antivirus program opens c:\a.txt between 1 and 2 and keeps the file open until after 3, then 2 still succeeds. But the file is not deleted immediately - delete operation is delayed until all programs (including antivirus) close the file. And the file will still be opened by anti-virus at 3. That is why 3 fails - because this file is meant to be deleted, but now it has been created again - Windows is confused, so it just fails creation.
I am pretty sure, this is exactly what is happening in #25965. #25965 happens while we re-creating executable file. Same exe file is deleted, and then re-created with the same name. If any anti-virus opens the file, we will have the same problem.
But at least here, we can avoid the problem by using different design. Maybe we could download and extract zip file into a randomly named directory. And rename it to proper name only at the very end. Would this work?
Or maybe we could download into directories with random names, and use some index file to refer to this directory.
Or something different.
I was worried that
os.RemoveAll
would fail if any files were open in that directory, and that seems to be the case, butrobustio.RemoveAll
works well enough. I didn't see any errors using that in a synthetic test
I suspect, if you avoid recreating files / directories, you won't even need robustio.RemoveAll
.
Hashing is very expensive compared to
stat
.
Windows implementation of stat
is far from trivial. And syscalls it uses might be quite expensive to run.
Alex
But at least here, we can avoid the problem by using different design. Maybe we could download and extract zip file into a randomly named directory. And rename it to proper name only at the very end. Would this work?
That's what cmd/go currently does for module cache directories.
I think the problem is that antivirus and other tools are walking these directories and opening files as they are created, before they are renamed to their final locations.
Or maybe we could download into directories with random names, and use some index file to refer to this directory.
That would work. It makes the module cache a bit more opaque for users though. I'd like to try creating a marker file (.incomplete
or .extracted
) first.
First we should retry the rename a good number of times; that's proven to work in other cases.
Change https://golang.org/cl/220978 mentions this issue: cmd/go/internal/modfetch: retry rename for unzipped directories
Ok, in the interest of trying a simple solution before a complicated solution, https://golang.org/cl/220978 retries the os.Rename
a few times with a short timeout.
Anyone who's affected by this, please try out this fix (maybe using gotip
) and let me know if you still see the issue.
In my case (using MCAfee), gotip
does retries on failure but still fails after one second.
The command I've executed is gotip get golang.org/x/tools/gopls@latest
.
@qmuntal Too bad. Thanks for confirming though.
Time to try the more complicated solution.
Change https://golang.org/cl/221157 mentions this issue: cmd/go: extract module zip files in place
The rename is retried only 6 times? If retrying longer would delay progress on other modules, could you spin up a goroutine to retry os.Rename() and then wait on them before exiting?
It's retried at random intervals for up to 500 ms. The timeout is arbitrary, but we don't want to block indefinitely.
Downloads are already done in parallel using a worker pool, so this shouldn't delay progress on other modules.
500ms probably isn't enough time for an anti-malware tool to vet every file in a large directory.
EDIT: I'd probably keep trying for 30s.
Hi, where do you suggest to change interval setting ?
@networkimprov, 500ms of extra latency is enough as it is. 30s is unacceptably slow. We'll figure out some alternative.
@bcmills pointed out a problem with extracting module zip files in place: older versions of Go just stat the module directory to check whether it exists. They don't lock the lock file, and they won't notice .partial
files. So with https://golang.org/cl/221157 as it is, if an old version (1.14) of Go accesses a module directory while a new version (1.15) is extracting it, it will see incomplete contents. The old version also won't know if the directory is partially extracted due to an error.
So we need a migration plan. Here's what I'm thinking.
Increase the robustio.Rename
timeout from 500 ms to 2000 s. This makes the failure less likely, and I think it's the longest reasonable time to wait.
Continue extracting zips into random temporary directories before renaming them. When checking whether a directory exists, stat both the directory and the .partial
file. If the .partial
file exists, that indicates a future version of Go partially extracted the directory in place. Take the lock and check again. If the .partial file is still present, remove the module directory and re-extract the zip (into a random temporary directory first, as we do now). Delete the .partial
file before releasing the lock. This change would land in Go 1.15, but it could potentially be backported to 1.13 and 1.14 to speed up migration.
Allow users to opt into the unzip-in-place behavior drafted in https://golang.org/cl/221157 by setting GODEBUG=modcacheunzipinplace=1
in their environment.
After time has passed and old versions are less likely to be used concurrently, we will make the new extract-in-place behavior the default and remove the current renaming behavior. This would be Go 1.16 at the earliest.
Re supporting previous versions if the module cache "schema" changes, lots of sites won't update to the latest 1.14.x release when 1.15 appears.
Since this bug doesn't bite many users, you could simply create a symlink with the final directory name if rename sees access-denied too many times. You could also tidy up that state in certain other _go mod_ operations.
Thinking about this some more: I don't think we can use the _presence_ of a file to indicate incompleteness. That would result in the following race:
A
notices that the module is not yet present, downloads its .zip
file, and begins extracting it.B
checks for the presence of the module. The directory exists and does not contain a .partial
file.A
creates the .partial
file and extracts a few of the source files into the directory.In order to mitigate that race, we would need process B
to verify that the directory is non-empty, although today it is technically possible for a valid module to be completely empty (see #31866).
Moreover, we would need to be very careful about filesystem synchronization: if process B
checks for non-empty directory contents, then it must be guaranteed to see the .partial
file if it _may_ have obtained an incomplete directory listing. Otherwise the race could become:
A
creates the .partial
file and extracts the first N source files into the directory.B
reads the first N source files (in system-dependent order per readdir
).A
finishes extracting source files, some of which are now in “directory order” before those read by B
, and deletes the .partial
file.B
finishes reading the directory contents, which omit the .partial
file.Now process B
has read an incomplete list, but is unaware of that fact because of filesystem-dependent directory ordering behavior.
It is _possible_ that all of the filesystems we care about produce a “directory order” that is consistent with the order of creation. But I'm not comfortable betting on that.
So I think we need to create a file that indicates completeness, rather than one that indicates incompleteness.
Re supporting previous versions if the module cache "schema" changes, lots of sites won't update to the latest 1.14.x release when 1.15 appears.
@networkimprov Agreed, but the earliest this would be on by default is 1.16. It seems reasonable to me that people would update to 1.14.1 at least before 1.16 is released (a year from now). And this would only affect people running multiple versions of Go concurrently.
Since this bug doesn't bite many users, you could simply create a symlink with the final directory name if rename sees access-denied too many times.
My understanding is that creating symbolic links on Windows requires admin privileges or an explicit switch to developer mode on Windows 10. Is that now longer the case?
@bcmills The patch I'm working on creates the .partial
file in $GOPATH/pkg/mod/cache/download/$module/@v/$version.partial
, next to the .lock
file. The module directory is separate, $GOPATH/pkg/mod/$module@$version
. The .partial
file is created before the module directory.
I think that prevents the race. If process B
observes the module directory but no .partial
file, that implies process A
finished unzipping the module and deleted the .partial
file.
Ah, yep — that should work! (But we should be careful to Sync
the .partial
file before we create anything else.)
Change https://golang.org/cl/221820 mentions this issue: cmd/go: make module zip extraction more robust on Windows
creating symbolic links on Windows requires admin privileges or an explicit switch to developer mode on Windows 10.
Oh yes. I forgot about that :-(
we should be careful to sync the
.partial
file
You can't sync a directory in Windows, but don't need to because directory changes go through the NTFS journal. If this change isn't Win-specific, it should open/sync/close the .partial
parent directory on Unix. If .partial
is empty, there's no need to sync its contents.
@jayconrod if possible allow users to opt into the new behaviour via GODEBUG flag so we can avoid this issue without having to deal with corporate antivirus meanwhile the fix is not fully implemented.
@qmuntal That's the plan. With https://golang.org/cl/221157, you would be able to opt into the new behavior by setting GODEBUG=modcacheunzipinplace=1
in your environment.
I think the problem is that antivirus and other tools are walking these directories and opening files as they are created, before they are renamed to their final locations.
I run this test on my computer https://play.golang.org/p/ldwnbko393L and this is what I see:
C:\Users\Alex>u:\test -test.v
=== RUN TestRenameAfterOpenFile
TestRenameAfterOpenFile: main_test.go:63: rename C:\Users\Alex\AppData\Local\Temp\TestRename993273147\from C:\Users\Alex\AppData\Local\Temp\TestRename993273147\to: Access is denied.
--- FAIL: TestRenameAfterOpenFile (1.05s)
=== RUN TestRenameAfterChdir
TestRenameAfterChdir: main_test.go:63: rename C:\Users\Alex\AppData\Local\Temp\TestRename352688990\from C:\Users\Alex\AppData\Local\Temp\TestRename352688990\to: Access is denied.
--- FAIL: TestRenameAfterChdir (1.02s)
FAIL
C:\Users\Alex>
So, I agree, os.Rename will not work properly in general. I don't think we should use os.Rename to move directories.
But that does not mean we cannot implement this process in the way that works on Windows.
For example, we could store cache directory locations in a little table of import_path
-> random_path_on_my_disk
pairs (text file or something). Then you download your package zip into random directory (it can starts with some recognizable name to help debugging), and then you update your text file / table to note where package files are stored.
My example is very simplistic. I am sure you guys can come up with something better.
Alex
@gopherbot Please open backport issue for 1.14.
Backport issue(s) opened: #37800 (for 1.14).
Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
@gopherbot Please open backport issue for 1.13.
1.13 backport issue is #37802.
In https://github.com/golang/go/commit/576fa692, you're doing:
ioutil.WriteFile(partialPath...)
modzip.Unzip(...)
os.Remove(partialPath)
That may not work if there's a crash and the OS does out-of-order disk writes. To fix that:
a) After WriteFile()
the partialPath
parent directory needs os.Open/File.Sync/File.Close()
b) After Unzip()
the module directory needs the same.
That's a unix-only technique; the NTFS journal protects you on Windows. The directory-rename approach doesn't have this problem.
EDIT: You could consider a system sync
command after Unzip(), as that would ensure all the file contents made it to disk. That's more efficient than per-file sync by Unzip.
The directory-rename approach doesn't have this problem.
Actually, it does. Directory renaming is atomic with respect to normal operation, not crashes.
(For a good read, see https://danluu.com/deconstruct-files/.)
Given that the module cache is _explicitly_ a cache, I don't think we should spend too much effort (or performance) on crash-robustness in the download path.
We probably do need better tools for pruning individual corrupted modules out of the cache, though. That's #32976.
Renaming of files and directories are different issues (that post is about files). With files you can end up with both names on disk -- these are just hard links to the same object. Directories can only have one location in the tree, so fsck
at boot would fix a corrupted directory rename.
I've tested gotip
with GODEBUG=modcacheunzipinplace=1
and I can no longer reproduce this issue.
On the other hand, I can still reproduce it when that flag is disabled, but it now works around 80% of the time, probably thanks to having increased to retry timeout to 2000ms.
@qmuntal Glad to hear it's working for you. Thanks for verifying.
@networkimprov I'll concede that some files may be out of place after a system crash on an un-journaled file system. I'm mostly assuming metadata writes are committed in order. I mainly want to ensure that 1) if the go
command crashes, nothing is broken in a way that can't be automatically cleaned up, and 2) the go
command can access the cache concurrently with another instance of itself (possibly at another version).
The module cache is just a cache, so it shouldn't contain any irreplaceable user data. I think this is the right place on the performance / safety spectrum to make the tradeoff.
Dear all, I experienced the same issue inside an Ubuntu WSL on Windows. Luckily @alexbrainman pointed me to this issue here so I tried gotip and surprisingly went into the same issue. But only because I used the wrong GODEBUG=
gomodcacheunzipinplace=1
and not the correct GODEBUG=modcacheunzipinplace=1
. The latter is working fine with gotip. @jayconrod and @qmuntal please consider adjusting your comments above.
@cb80, thanks for the heads-up! I've updated the comments above, but we should also add a draft release note and update documentation for the new GODEBUG=modcacheunzipinplace
value to make things more discoverable.
Mentioning this in the release notes is a good idea, but I'll hold off writing anything until I know whether the backport issues are approved.
Just tried the new flag in 1.14.2 on Windows 10 1803 with McAfee Virus scanner and finally I can work again! Great work!
Like @cb80 I am seeing this problem in WSL. I also have 1.14.2 with Windows 1803 and McAfee antivirus like @Hades32. It did not go away for me, however. Still seeing this problem unless I am applying the suggested solution wrong? For example this simple example fails when run in a new go directory:
$ export GODEBUG=modcacheunzipinplace=1
$ GO111MODULE=on go get -v google.golang.org/[email protected]
@S1erraQ Would you mind posting the full go version
and go env
output, plus the complete error text from the commands you ran? Just want to confirm this is the exact same issue.
The fix for this isn't platform-specific, but WSL is not one of the platforms we test on, and it's likely there are file-system idiosyncrasies in WSL I'm not aware of.
@s1erraQ if you had the problem before, ensure cleaning your complete mod cache before retrying
@jayconrod I too am not sure this is the same issue so I would appreciate any insight you can give. I have run into a number of problems due to Windows 1803 (but I cannot update that at the moment), which uses WSL v1--so it could be related to that. Here is the output you asked for:
$ env
HOSTTYPE=x86_64
LESSCLOSE=/usr/bin/lesspipe %s %s
LANG=C.UTF-8
USER=cnelson
GOPATH=/d/go
PWD=/d/go
HOME=/home/cnelson
XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
SHELL=/bin/bash
TERM=xterm-256color
DOCKER_HOST=tcp://localhost:2375
SHLVL=1
LOGNAME=cnelson
PATH=/home/cnelson/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/c/Program Files/WindowsApps/CanonicalGroupLimited.UbuntuonWindows_2004.2020.424.0_x64__79rhkp1fndgsc:/c/Program Files (x86)/Common Files/Oracle/Java/javapath_target_12506218:/c/ProgramData/Oracle/Java/javapath_target_12438546:/c/Python27:/c/Windows/System32:/c/Windows:/c/Windows/System32/wbem:/c/Windows/System32/WindowsPowerShell/v1.0:/c/Program Files (x86)/Vagrant/bin:/c/Python27/Scripts:/c/Program Files/TortoiseSVN/bin:/c/Program Files (x86)/EaseUS/Todo Backup/bin:/c/Windows/System32/OpenSSH:/c/Program Files/Intel/WiFi/bin:/c/Program Files/Common Files/Intel/WirelessCommon:/c/Go/bin:/c/Program Files/Docker/Docker/resources/bin:/c/ProgramData/DockerDesktop/version-bin:/c/Users/cnelson/AppData/Local/Microsoft/WindowsApps:/c/Users/cnelson/AppData/Local/GitHubDesktop/bin:/c/Users/cnelson/AppData/Local/Microsoft/WindowsApps:/c/Program Files/protoc/bin:/c/Users/cnelson/go/bin:/c/Users/cnelson/AppData/Local/Programs/Microsoft VS Code/bin:/snap/bin:/usr/local/go/bin
PS1=$
LESSOPEN=| /usr/bin/lesspipe %s
_=/usr/bin/env
$ go version
go version go1.14.2 linux/amd64
$ pwd
/d/go
$ ls
$ GO111MODULE=on go get -v google.golang.org/[email protected]
go: downloading google.golang.org/grpc v1.28.1
go get google.golang.org/[email protected]: rename /d/go/pkg/mod/google.golang.org/[email protected] /d/go/pkg/mod/google.golang.org/[email protected]: permission denied
$ find .
.
./pkg
./pkg/mod
./pkg/mod/cache
./pkg/mod/cache/download
./pkg/mod/cache/download/google.golang.org
./pkg/mod/cache/download/google.golang.org/grpc
./pkg/mod/cache/download/google.golang.org/grpc/@v
./pkg/mod/cache/download/google.golang.org/grpc/@v/v1.28.1.info
./pkg/mod/cache/download/google.golang.org/grpc/@v/v1.28.1.lock
./pkg/mod/cache/download/google.golang.org/grpc/@v/v1.28.1.zip
./pkg/mod/cache/download/google.golang.org/grpc/@v/v1.28.1.ziphash
./pkg/mod/cache/download/sumdb
./pkg/mod/cache/download/sumdb/sum.golang.org
./pkg/mod/cache/download/sumdb/sum.golang.org/lookup
./pkg/mod/cache/download/sumdb/sum.golang.org/lookup/google.golang.org
./pkg/mod/cache/download/sumdb/sum.golang.org/lookup/google.golang.org/[email protected]
./pkg/mod/cache/download/sumdb/sum.golang.org/tile
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/0
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/0/x004
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/0/x004/092
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/0/x004/458.p
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/0/x004/458.p/232
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/1
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/1/015
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/1/017.p
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/1/017.p/106
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/2
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/2/000.p
./pkg/mod/cache/download/sumdb/sum.golang.org/tile/8/2/000.p/17
./pkg/mod/google.golang.org
./pkg/sumdb
./pkg/sumdb/sum.golang.org
./pkg/sumdb/sum.golang.org/latest
$ export GODEBUG=modcacheunzipinplace=1
$ GO111MODULE=on go get -v google.golang.org/[email protected]
go get google.golang.org/[email protected]: rename /d/go/pkg/mod/google.golang.org/[email protected] /d/go/pkg/mod/google.golang.org/[email protected]: permission denied
$ go get google.golang.org/grpc
(this works)
@S1erraQ if you had the problem before, ensure cleaning your complete mod cache before retrying
I did try that before posting. No effect. Also the output above is in a fresh go folder as shown.
@S1erraQ Thanks, that confirms this is the same issue.
I missed something though: Go 1.14.2 is compatible with the new behavior triggered by modcacheunzipinplace=1
, but it doesn't actually implement that new behavior yet.
If you're comfortable living on the bleeding edge, gotip can build a Go toolchain from the master
branch which has the new behavior. This will also be in the 1.15 beta, which should be out in June.
I missed something though: Go 1.14.2 is compatible with the new behavior triggered by
modcacheunzipinplace=1
, but it doesn't actually implement that new behavior yet.Really? I specifically searched for the commit and traced it to release 1.14.2 and could confirm a change in behavior ONLY after setting the env. Seemed like a big coincident...
@jayconrod thanks for the explanation. I guess then the 2s interval is what truly helped me...
Change https://golang.org/cl/230537 mentions this issue: doc/go1.15: add notes for GOMODCACHE, modcacheunzipinplace
@kiwionly do u use Ubuntu 18.04 on windows 10 ?
do u use Mcafee Endpoint Security ?
if so just search and find ubuntu1804.exe in your system
then right click ubuntu1804.exe-> properties -> compatibility(second tab) -> Setting(select run as administration ) ,then click Application (A) , all is ok ;
if it still build fail ,please click security tab , and click Advaced(V)
make sure your account access has Complete control ,default all allow is TrustedInstaller user ,this will prevented by Mcafee ;
wish it helped all who met this err ,I had troubled by it for months ;
@kiwionly do u use Ubuntu 18.04 on windows 10 ?
do u use Mcafee Endpoint Security ?
if so just search and find ubuntu1804.exe in your system
then right click ubuntu1804.exe-> properties -> compatibility(second tab) -> Setting(select run as administration ) ,then click Application (A) , all is ok ;if it still build fail ,please click security tab , and click Advaced(V)
make sure your account access has Complete control ,default all allow is TrustedInstaller user ,this will prevented by Mcafee ;wish it helped all who met this err ,I had troubled by it for months ;
@herohenu I build GO on window 10 with McAfee, unfortunately I no admin right to change any settings.
Anyway, thx for your remind.
Also hit this issue:
go version go1.15 linux/amd64
rm -rf ~/go/*
but that didn't helpGODEBUG=modcacheunzipinplace=1 go mod tidy
For fellow McAfee sufferers, this vague page may be related: https://kc.mcafee.com/corporate/index?page=content&id=KB91411
Change https://golang.org/cl/258798 mentions this issue: cmd/go/internal/modfetch: always extract module directories in place
Most helpful comment
Seems like the same issue as microsoft/vscode-go#2897. We've never been able to reproduce this.
Could you post a listing of the directory
C:\Users\502380\go\pkg\mod\golang.org\x\
?Are you able to reproduce this or did it just happen once? How about after clearing the module cache with
go clean -modcache
?