Go: path/filepath: Walk is slow compared to 'find' due to extra Stat calls

Created on 17 Jul 2016  路  48Comments  路  Source: golang/go

On my Mac laptop (SSD, warm caches),

bradfitz@laptop ~$ time (find src/ | wc -l)
   42408

real    0m0.177s
user    0m0.035s
sys     0m0.144s
bradfitz@laptop ~$ time (find src/ | wc -l)
   42408

real    0m0.178s
user    0m0.036s
sys     0m0.144s
bradfitz@laptop ~$ time (find src/ | wc -l)
   42408

real    0m0.177s
user    0m0.035s
sys     0m0.144s

But with a basic use of filepath.Walk instead of find:

$ cat walk.go 
package main

import (
        "fmt"
        "log"
        "os"
        "path/filepath"
)

func main() {
        err := filepath.Walk("src", func(path string, fi os.FileInfo, err error) error {
                if err != nil {
                        return err
                }
                fmt.Println(path)
                return nil
        })
        if err != nil {
                log.Fatal(err)
        }
}

It's much slower:

bradfitz@laptop ~$ time (./walk  | wc -l)
   42408

real    0m0.447s
user    0m0.123s
sys     0m0.406s
bradfitz@laptop ~$ time (./walk  | wc -l)
   42408

real    0m0.434s
user    0m0.120s
sys     0m0.399s
bradfitz@laptop ~$ time (./walk  | wc -l)
   42408

real    0m0.435s
user    0m0.119s
sys     0m0.398s

This is the bulk of the goimports execution time. goimports actually does a slightly parallelized walk with goroutines (which helps on NFS filesystems), but it doesn't seem to matter. I'm just trying to get any Go program to be closer in performance to find.

Any clues?

Speed tracking bug for goimports is #16367

/cc @josharian @ianlancetaylor

Go2 Go2Cleanup NeedsFix Performance help wanted

Most helpful comment

For people looking for reusing the code into their own projecst, there is now this third-party library that is basically a clone of fastwalk: https://github.com/karrick/godirwalk

All 48 comments

The same on Linux. This isn't just about OS X.

Actually on Linux it's even more pronounced:

bradfitz@dev-bradfitz-debian2:~$ time (find src/ | wc )
  97909   98025 5108045

real    0m0.146s
user    0m0.128s
sys     0m0.092s
bradfitz@dev-bradfitz-debian2:~$ time (find src/ | wc )
  97909   98025 5108045

real    0m0.147s
user    0m0.136s
sys     0m0.084s
bradfitz@dev-bradfitz-debian2:~$ time (find src/ | wc )
  97909   98025 5108045

real    0m0.150s
user    0m0.148s
sys     0m0.076s

Versus:

bradfitz@dev-bradfitz-debian2:~$ time (./walk | wc )
  97909   98025 5108044

real    0m0.530s
user    0m0.412s
sys     0m0.388s
bradfitz@dev-bradfitz-debian2:~$ time (./walk | wc )
  97909   98025 5108044

real    0m0.515s
user    0m0.344s
sys     0m0.432s
bradfitz@dev-bradfitz-debian2:~$ time (./walk | wc )
  97909   98025 5108044

real    0m0.528s
user    0m0.312s
sys     0m0.484s

Walk sorts the file entries, but find doesn't. Not sure how much that affects the profiles though.

@dgryski, that's a bit of it, but it's looking like the actual problem is that filepath.Walk does a Readdirnames followed by a bunch of Stat calls on each to figure out what's a directory, when the underlying kernel interface supports telling you which are directories in the same call where you read the names, but Go doesn't take advantage of that on Unix platforms. (And path/filepath could do that on Windows, which Go's syscall and os package already supports, but it hard-codes the use of Readdirnames)

Actually, I realize now that the filepath.Walk (at least with our definition of os.FileInfo) is inherently slow, as its API guarantees you get a complete FileInfo with each call, which requires us to do a stat per file, even if the user doesn't want it.

I think I'll move away from using filepath.Walk for goimports.

@bradfitz out of curiosity, is your idea here to rewrite readdirnames for unix/linux to make use of the Type in Dirent (https://golang.org/src/syscall/syscall_linux.go?h=ParseDirent#L773) as you were mentioning above?

It looks like some OSs may have that information but Go is discarding in favor of lstat.

@vinceprignano, yes, that's what I'm doing now.

CL https://golang.org/cl/25001 mentions this issue.

In my observations there were 2 problems:
1) GC-unfriendly API for reading directory (no ability to do just readdir and get file types without all other stat structures, a lot of slice and struct pointer allocations)
2) That's it

I wrote an article (in russian, sorry for that) that describes how to achieve performance that is higher than find(1) has: https://habrahabr.ru/post/281382/ (google translate: https://translate.google.ru/translate?sl=ru&tl=en&js=y&prev=_t&hl=ru&ie=UTF-8&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F281382%2F&edit-text=&act=url)

Actually, I realize now that the filepath.Walk (at least with our definition of os.FileInfo) is inherently slow, as its API guarantees you get a complete FileInfo with each call, which requires us to do a stat per file, even if the user doesn't want it.

I think I'll move away from using filepath.Walk for goimports.

I've run into a similar situation.

I created filepath.Walk-like functionality for http.FileSystem in https://godoc.org/github.com/shurcooL/httpfs/vfsutil#Walk. In my use cases, I often worked with virtual filesystems where files did not exist on disk, but rather virtually after doing some computation. So the act of "opening" a file was an a relatively expensive action. I ran into the same problem, the filepath.Walk behavior was suboptimal. It would open a virtual file (potentially somewhat expensive), do a stat on it to pass its os.FileInfo to walk func, then close the file. Then, inside the walk func, I would often need to access the file contents, so I'd have to open it again (somewhat expensive).

For those specific needs, I ended up creating https://godoc.org/github.com/shurcooL/httpfs/vfsutil#WalkFiles which would pass both the os.FileInfo but also the file contents as an io.ReadSeeker:

// WalkFiles walks the filesystem rooted at root, calling walkFn for each file or
// directory in the filesystem, including root. In addition to FileInfo, it passes an
// ReadSeeker to walkFn for each file it visits.
func WalkFiles(fs http.FileSystem, root string, walkFn WalkFilesFunc) error { ... }

// WalkFilesFunc is the type of the function called for each file or directory visited by Walk.
// It's like filepath.WalkFunc, except it provides an additional ReadSeeker parameter for file being visited.
type WalkFilesFunc func(path string, info os.FileInfo, rs io.ReadSeeker, err error) error

That worked well for my needs, but I like your idea of just not passing os.FileInfo at all, since computing even that can be expensive and sometimes not neccessary for caller. So it'd be simpler, more general, and more performant to just give the file path to walk func and let it do what it needs (stat, read file, etc.).

@hirochachacha It sounds like you are describing a bug in the syscall package, a bug that should be fixed in any case. The syscall package provides ReadDirent and ParseDirent functions on all GNU/Linux targets; do those functions work on mips64?

@cherrymui, can you run go test golang.org/x/tools/imports on mips64?

@hirochachacha, please file a separate issue for that.

x/sys/unix 's Getdents uses SYS_GETDENTS64.
So what I am saying is completely wrong.
I'm really sorry about confusing people.

I'll remove my comments above and close the opened issue.

Again, I'm sorry.

ref https://github.com/golang/go/issues/16435

@bradfitz, on MIPS64

$ go test -v golang.org/x/tools/imports 
=== RUN   TestFastWalk_Basic
--- PASS: TestFastWalk_Basic (0.01s)
=== RUN   TestFastWalk_Symlink
--- PASS: TestFastWalk_Symlink (0.01s)
=== RUN   TestFastWalk_SkipDir
--- PASS: TestFastWalk_SkipDir (0.01s)
=== RUN   TestFastWalk_TraverseSymlink
--- PASS: TestFastWalk_TraverseSymlink (0.01s)
=== RUN   TestFixImports
--- PASS: TestFixImports (0.06s)
=== RUN   TestImportSymlinks
--- PASS: TestImportSymlinks (0.04s)
=== RUN   TestFixImportsVendorPackage
--- PASS: TestFixImportsVendorPackage (0.01s)
=== RUN   TestFindImportGoPath
--- PASS: TestFindImportGoPath (0.01s)
=== RUN   TestFindImportInternal
--- FAIL: TestFindImportInternal (0.03s)
    fix_test.go:1034: findImportGoPath("race", Acquire ...) = "", false; want "internal/race", false
=== RUN   TestFindImportRandRead
--- PASS: TestFindImportRandRead (0.00s)
=== RUN   TestFindImportVendor
--- PASS: TestFindImportVendor (0.01s)
=== RUN   TestProcessVendor
--- FAIL: TestProcessVendor (0.03s)
    fix_test.go:1127: Process("/home/a/src/git/go-official/src/math/x.go") did not add expected hpack import "golang_org/x/net/http2/hpack"; got:
        package http

        import "strings"

        func f() { strings.NewReader(); hpack.HuffmanDecode() }
=== RUN   TestFindImportStdlib
--- PASS: TestFindImportStdlib (0.00s)
=== RUN   TestRenameWhenPackageNameMismatch
--- PASS: TestRenameWhenPackageNameMismatch (0.03s)
=== RUN   TestOptimizationWhenInGoroot
--- PASS: TestOptimizationWhenInGoroot (0.03s)
=== RUN   TestIgnoreDocumentationPackage
--- PASS: TestIgnoreDocumentationPackage (0.03s)
=== RUN   TestImportPathToNameGoPathParse
--- PASS: TestImportPathToNameGoPathParse (0.03s)
=== RUN   TestIgnoreConfiguration
--- PASS: TestIgnoreConfiguration (0.03s)
=== RUN   TestSkipNodeModules
--- PASS: TestSkipNodeModules (0.03s)
=== RUN   TestPkgIsCandidate
--- PASS: TestPkgIsCandidate (0.00s)
FAIL
exit status 1
FAIL    golang.org/x/tools/imports  0.499s

$ go version
go version devel +1d2ca9e Mon Jul 18 21:07:34 2016 +0000 linux/mips64le

Is the failure related? Should I go look into it?

@cherrymui, those are old tests. I changed them to not rely on real GOROOT contents. Update & try again?

$ go get -u golang.org/x/tools/imports
$ go test -v golang.org/x/tools/imports
=== RUN   TestFastWalk_Basic
--- PASS: TestFastWalk_Basic (0.01s)
=== RUN   TestFastWalk_Symlink
--- PASS: TestFastWalk_Symlink (0.00s)
=== RUN   TestFastWalk_SkipDir
--- PASS: TestFastWalk_SkipDir (0.00s)
=== RUN   TestFastWalk_TraverseSymlink
--- PASS: TestFastWalk_TraverseSymlink (0.01s)
=== RUN   TestFixImports
--- PASS: TestFixImports (0.06s)
=== RUN   TestImportSymlinks
--- PASS: TestImportSymlinks (0.04s)
=== RUN   TestFixImportsVendorPackage
--- PASS: TestFixImportsVendorPackage (0.01s)
=== RUN   TestFindImportGoPath
--- PASS: TestFindImportGoPath (0.01s)
=== RUN   TestFindImportInternal
--- FAIL: TestFindImportInternal (0.03s)
    fix_test.go:1034: findImportGoPath("race", Acquire ...) = "", false; want "internal/race", false
=== RUN   TestFindImportRandRead
--- PASS: TestFindImportRandRead (0.00s)
=== RUN   TestFindImportVendor
--- PASS: TestFindImportVendor (0.01s)
=== RUN   TestProcessVendor
--- FAIL: TestProcessVendor (0.03s)
    fix_test.go:1127: Process("/home/a/src/git/go-official/src/math/x.go") did not add expected hpack import "golang_org/x/net/http2/hpack"; got:
        package http

        import "strings"

        func f() { strings.NewReader(); hpack.HuffmanDecode() }
=== RUN   TestFindImportStdlib
--- PASS: TestFindImportStdlib (0.00s)
=== RUN   TestRenameWhenPackageNameMismatch
--- PASS: TestRenameWhenPackageNameMismatch (0.03s)
=== RUN   TestOptimizationWhenInGoroot
--- PASS: TestOptimizationWhenInGoroot (0.03s)
=== RUN   TestIgnoreDocumentationPackage
--- PASS: TestIgnoreDocumentationPackage (0.03s)
=== RUN   TestImportPathToNameGoPathParse
--- PASS: TestImportPathToNameGoPathParse (0.03s)
=== RUN   TestIgnoreConfiguration
--- PASS: TestIgnoreConfiguration (0.03s)
=== RUN   TestSkipNodeModules
--- PASS: TestSkipNodeModules (0.03s)
=== RUN   TestPkgIsCandidate
--- PASS: TestPkgIsCandidate (0.00s)
FAIL
exit status 1
FAIL    golang.org/x/tools/imports  0.489s

I did a go get -u but see no difference...

@cherrymui, oh, sorry... there are still those two tests which still use the machine's real $GOROOT. I didn't convert those yet I guess. In any case, you can ignore those errors. They should go away when you sync your $GOROOT. I'll fix those tests.

But the real question was whether the TestFastWalk_ tests passed, which they did.

Thanks!

Ok, great. Thanks!

Not sure what the current plan is to fix this, but given that os.FileInfo is an interface, wouldn't make sense to return an object that does a lazy Lstat only when required? It could answer Name() and IsDir() using only information returned by ParseDirent, and anything else would lazily call Lstat.

@rasky, that doesn't work, as the Size() int64 method (or IsDir() bool) on the os.FileInfo has no means of returning an error if the lazy lstat fails.

It seems cool. Such a significant improvement is very impressive. When will it be ready to replace the current filepath.Walk or add into the filepath pkg?

@dongweigogo, it can't replace the current filepath.Walk because the API is too different. We would have to add something, which I don't think anybody's very excited about.

@bradfitz should we copy your fastwalk in the mean time? or this is something that would be merged or put on some of the experimental repos?

The license permits copying if you want to copy it. I suppose I could put it on github or something.

CL https://golang.org/cl/40092 mentions this issue.

On macOS 10.12.4 with go version go1.8.1 darwin/amd64 goimports panics with a deadlock on this change.

After "go get -u golang.org/x/tools/cmd/goimports" this is the panic:
```fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
golang.org/x/tools/imports.fixImports(0xc420015380, 0xc420094e80, 0x7fff5fbff0e2, 0x12, 0x329, 0x600, 0x132c770, 0xc420094e80, 0x0)
/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:237 +0x6ba
golang.org/x/tools/imports.Process(0x7fff5fbff0e2, 0x12, 0xc42016e000, 0x329, 0x600, 0x132c770, 0x0, 0x0, 0x0, 0xc420049da8, ...)
/Users/johans/Go/src/golang.org/x/tools/imports/imports.go:58 +0x719
main.processFile(0x7fff5fbff0e2, 0x12, 0x0, 0x0, 0x132d620, 0xc42000c018, 0x1, 0x0, 0x0)
/Users/johans/Go/src/golang.org/x/tools/cmd/goimports/goimports.go:136 +0x139
main.gofmtMain()
/Users/johans/Go/src/golang.org/x/tools/cmd/goimports/goimports.go:273 +0x21c
main.main()
/Users/johans/Go/src/golang.org/x/tools/cmd/goimports/goimports.go:189 +0x32

goroutine 5 [semacquire]:
sync.runtime_Semacquire(0xc4201bf26c)
/usr/local/go/src/runtime/sema.go:47 +0x34
sync.(WaitGroup).Wait(0xc4201bf260)
/usr/local/go/src/sync/waitgroup.go:131 +0x7a
golang.org/x/tools/imports.fastWalk(0xc4201c6860, 0x14, 0xc4201bf250, 0x132daa0, 0xc4202ab608)
/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:92 +0x7c7
golang.org/x/tools/imports.scanGoDirs(0xc420014200)
/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:568 +0x354
golang.org/x/tools/imports.scanGoPath()
/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:490 +0x26
sync.(
Once).Do(0x135f600, 0x1223060)
/usr/local/go/src/sync/once.go:44 +0xbe
golang.org/x/tools/imports.findImportGoPath(0xc420011810, 0x8, 0xc4201bd380, 0x7fff5fbff0e2, 0x12, 0x0, 0x0, 0x0, 0x0, 0x0)
/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:723 +0x1cd
golang.org/x/tools/imports.fixImports.func2(0x7fff5fbff0e2, 0x12, 0xc420077320, 0xc420011810, 0x8, 0xc4201bd380)
/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:227 +0x7b
created by golang.org/x/tools/imports.fixImports
/Users/johans/Go/src/golang.org/x/tools/imports/fix.go:233 +0x614

goroutine 7 [chan send]:
golang.org/x/tools/imports.(*walker).doWork(0xc4201bd4a0, 0xc4201bf260)
/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:121 +0x190
created by golang.org/x/tools/imports.fastWalk
/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:71 +0x273

goroutine 8 [chan send]:
golang.org/x/tools/imports.(*walker).doWork(0xc4201bd4a0, 0xc4201bf260)
/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:121 +0x190
created by golang.org/x/tools/imports.fastWalk
/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:71 +0x273

goroutine 10 [chan send]:
golang.org/x/tools/imports.(*walker).doWork(0xc4201bd4a0, 0xc4201bf260)
/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:121 +0x190
created by golang.org/x/tools/imports.fastWalk
/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:71 +0x273

goroutine 11 [chan send]:
golang.org/x/tools/imports.(*walker).doWork(0xc4201bd4a0, 0xc4201bf260)
/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:121 +0x190
created by golang.org/x/tools/imports.fastWalk
/Users/johans/Go/src/golang.org/x/tools/imports/fastwalk.go:71 +0x273


After doing:

cd ~/Go/src/golang.org/x/tools
git revert 4436e5475416d77a9352558d118d0b585b962ef1
go get golang.org/x/tools/cmd/goimports
```

goimports works as intended again and executes successfully.

Of course if there is any extra information needed, please let me know.

Dug a little further, all works as expected using only one worker, the problem is that walk is blocking on writing to the error channel. Emptying the work and error channels on return resolved this.

This patch works for me:

diff --git a/imports/fastwalk.go b/imports/fastwalk.go
index c8a79490..0a15d898 100644
--- a/imports/fastwalk.go
+++ b/imports/fastwalk.go
@@ -64,7 +64,21 @@ func fastWalk(root string, walkFn func(path string, typ os.FileMode) error) erro
                // buffered for correctness & not leaking goroutines:
                resc: make(chan error, numWorkers),
        }
-       defer close(w.donec)
+       defer func() {
+               close(w.donec)
+
+       L:
+               for {
+                       select {
+                       case <-w.workc:
+                       case <-w.resc:
+                       default:
+                               break L
+                       }
+               }
+
+       }()
+
        // TODO(bradfitz): start the workers as needed? maybe not worth it.
        for i := 0; i < numWorkers; i++ {
                wg.Add(1)

CL https://golang.org/cl/40351 mentions this issue.

That's interesting, I hadn't considered that. It think this happens when a worker received more input but then gets stuck on sending the result to w.resc here.

Another possible way of dealing with this that doesn't involve draining could be:

func (w *walker) doWork(wg *sync.WaitGroup) {
    defer wg.Done()
    for {
        select {
        case <-w.donec:
            return
        case it := <-w.workc:
            err := w.walk(it.dir, !it.callbackDone)
            select {
            case <-w.donec:
                return
            case w.resc <- err:
            }
        }
    }
}

I wonder if there's a better way of dealing with this.

Confirmed that your approach works as well. I'm not happy with either solution either, but don't see any better way, so either way works from my perspective ;)

@johansglock, thanks, and sorry. I've reverted the change in https://go-review.googlesource.com/c/40296/

CL https://golang.org/cl/41681 mentions this issue.

Any updates on this?

@Tim15, there can't be any update on this due to the fundamental API design. See comment https://github.com/golang/go/issues/16399#issuecomment-233190667 above.

I'll label this bug Go 2. Maybe we can change the walk interface to be more efficient later.

I've written a few benchmarks:

  • a naive recursive using os.File.Readdir()
  • parallelized recursive version using goroutines and wait groups
  • and also included the one by @bradfitz up above using filepath.Walk()

Here they are: https://github.com/Tim15/golang-parallel-io.

They're all slower than find, probably due to the fact that os.File.Readdir() returns a complete os.FileInfo. I was surprised though that my parallel version was faster than filepath.Walk().

Hopefully this helps us figure this out!

@Tim15, we already figured it out over a year ago. See the comment I linked you to 3 days ago. The solution for goimports ended up being https://github.com/golang/tools/blob/master/imports/fastwalk.go but that's not applicable to the standard library. This isn't fixable with the current API.

@bradfitz Cool, thanks for catching me up!

@bradfitz Sorry to ask here but is it possible to use the fastwalk API in one of my applications? I couldn't find any information on that.
Looks like no method is exported to the outside (all are lower-cased).

@SirWindfield, no, I did not expose its implementation publicly. You can copy/paste it into your own package if you'd like, but it's not something I want to support at this time.

For people looking for reusing the code into their own projecst, there is now this third-party library that is basically a clone of fastwalk: https://github.com/karrick/godirwalk

forked the repo and made an update script to clean things up with tags: https://github.com/s12chung/fastwalk

@bradfitz Sorry for revisiting this old issue, but I have a few questions I was hoping you'd be able to answer.

Your fastwalk implementation only provides Filemode and states file stat calls must be done by the user.. However, included in the implementation is a fallback when Dirent.Type == DT_UNKNOWN (https://github.com/golang/tools/blob/master/internal/fastwalk/fastwalk_unix.go#L54), where an lstat is performed. Does this not mean we're guaranteed a complete FileInfo, or are there still edge cases?

The fastwalk callback gives you an os.FileMode, not an os.FileInfo.

@bradfitz I guess my question is, is there any reason it couldn't provide an os.FileInfo with no additional overhead? Unless I'm mistaken, it providing os.FileMode was intended to remove the overhead filepath.Walk has with using lstat on each call, guaranteeing a full os.FileInfo, but it looks like your implementation already achieves this but throws the information away and provides only os.FileMode.

I'm not suggesting that you change your implementation, but I guess I'm just confused as to why file.ReadDir and filepath.Walk both rely on additional lstat calls, rather than doing something similar to your implementation where a fallback is done if the full information isn't available in a single readdir call.

@saracen, Size. See https://github.com/golang/go/issues/16399#issuecomment-236624924 above.

@bradfitz Thank you for the clarification, and sorry for adding to the noise there.

For people stumbling across this issue needing a fast version that returns a full os.FileInfo, I've just created this repository: https://github.com/saracen/walker - this performs the lstat call, and I'm hoping it's just as fast, or faster, than the implementations out there that don't.

There's a few people here that are really familiar with the issue, so I'd really appreciate if there's anybody willing to check my solution.

Years ago I started importing https://github.com/karrick/godirwalk into filepath.Walk, but stopped because it required creating another public data structure in the standard library. Rather than discuss it I just stopped working, and went back to my day job.

I am still open to a port of the godirwalk library as an extension to filepath if it's something folks would like to see.

(For the record, godirwalk was not a fork of FastWalk, but was started by me on 2017-08-21 while working on dep to account for dissimilarities when walking file system on Windows or UNIX.)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ianlancetaylor picture ianlancetaylor  路  138Comments

rsc picture rsc  路  242Comments

ianlancetaylor picture ianlancetaylor  路  519Comments

chai2010 picture chai2010  路  216Comments

derekperkins picture derekperkins  路  180Comments