go/build: Import does not find source directory for code in modules

Created on 20 Jul 2018 · 9 comments · Source: golang/go

go/build's Import and Context.Import can't find import paths in modules. We've been assuming this was unfixable and that people should move to golang.org/x/tools/go/packages. But I think we can and should fix this for Go 1.11.

I've been thinking about the problem wrong. I assumed people were using the export data from GOPATH/pkg, because that would be the only way to handle large programs efficiently. And I assumed that we really needed to get all the information out of the go command in one shot, instead of asking for every package in sequence. But it sounds like most tools actually just use the source importer, slow as that may be in general, since they are working with relatively small code bases.

It occurred to me this morning that we can keep the source importer working as long as we make package go/build's Context.Import invoke the go command to ask what directory to look in, for each package.

Using the go command this way will have a pseudo-quadratic behavior where you ask the go command about the top-level package, it loads everything it needs to know about that package (including info about transitive dependencies), and then reports only the directory of the top-level package. Then this repeats for each dependency, which in turn reloads all the information about the dependency's dependencies, hence the pseudo-quadratic (depending on the shape of the import graph). But if people are OK with the source importer, then that may be a signal that the package graph is small enough not to worry about this. And in any event, "a little slow" is certainly better than "not working at all", so it's a better stopgap for tools that aren't ready to convert to golang.org/x/tools/go/packages.

We can limit the go command invocation to happen only when:

  • the AllowBinary and IgnoreVendor flags are not set,
  • Context describes the local file system (all the helper functions are nil),
  • srcDir is outside GOPATH/src (or GO111MODULE=on),
  • there's a go.mod in srcDir or above,
  • the release tags are unmodified from Default.Context,
  • and path does not name a standard library package.

Those limitations should mean that the go command is invoked only when it is really needed, so that existing uses will be completely unaffected and still take the non-go-command code paths. The standard-library constraint is not necessary for correctness, but it avoids unnecessary invocations for standard-library packages, which the old code can still find.

In the long term we still want to move people to golang.org/x/tools/go/packages, but this should provide a much smoother transition.

Would fix most of #24661, magefile/mage#124, and probably many other issues, at least provisionally.

Labels: NeedsFix, modules, release-blocker

All 9 comments

Maybe this is a silly question, but why do we need to shell out to the go command at all? (Can we move modload and its dependencies someplace where go/build can import them?)

Are tools invoking Import multiple times from the same binary? If so, it seems like we could avoid at least some of the quadratic behavior by using the -deps flag and caching all of the packages it returns in an in-process (and per-srcDir) cache.

Change https://golang.org/cl/125296 mentions this issue: go/build: invoke go command to find modules during Import, Context.Import

Maybe this is a silly question, but why do we need to shell out to the go command at all? (Can we move modload and its dependencies someplace where go/build can import them?)

A very real problem we've had with go/build is that tools built with one Go version are used with another Go version. Concretely, if I go get github.com/magefile/mage, we want that one binary to work no matter which version of Go I am using. I should be able to flip back and forth between, say, Go 1.10 and Go 1.11beta2, without having to maintain "mage for Go 1.10" vs "mage for Go 1.11beta2". Same for golint, etc. The fundamental mistake here is to allow tools to have a copy of the logic for where you get source code from. The last source layout change was the introduction of vendor directories and we ran into the same problem. There should be one canonical implementation - inside the go command. Other tools should invoke the go command to access this implementation. If you update to a new Go distribution, you get a new go command, and all the tools that invoke it automatically adapt to whatever has changed.

It matters a lot that there's one source of truth.

Are tools invoking Import multiple times from the same binary? If so, it seems like we could avoid at least some of the quadratic behavior by using the -deps flag and caching all of the packages it returns in an in-process (and per-srcDir) cache.

If we must, we could think about using -deps and adding a cache. But I don't think we must - I think most things will run plenty fast enough. And there are two significant drawbacks to using -deps and adding a cache: -deps and the cache.

First, -deps: Go 1.10 and earlier did not have -deps, so a Go 1.11-compiled mage or golint will fail if used with a Go 1.10 or earlier go command. We really do want tools that work with multiple releases. We may issue a point release for Go 1.10 with more flags in go list, for use by golang.org/x/tools/go/packages, but if we can avoid making go/build depend on that hypothetical point release, it's better not to.

Second, the cache: for a one-shot program analysis, yes, the cache would make sense. But in general go/build.Import returns information about the current file system, so there are likely to be long-running programs that occasionally load packages, do things, and throw them away. Those need to not see stale results. The API has no cache invalidation because it has never had a cache. It does not seem prudent to add one at this point.

Programs that are too slow with the modified go/build should almost certainly update to golang.org/x/tools/go/packages instead of us adding a cache to go/build.

without having to maintain "mage for Go 1.10" vs "mage for Go 1.11beta2"

Thank you! I was really worried about that.

If so, it seems like we could avoid at least some of the quadratic behavior by using the -deps flag and caching all of the packages it returns in an in-process (and per-srcDir) cache.

As another data point, some time ago I created the very much temporary https://github.com/myitcv/hybridimporter:

_myitcv.io/hybridimporter is an implementation of go/types.ImporterFrom that uses non-stale package dependency targets where they exist, else falls back to a source-file based importer._

My experience with hybridimporter is that having even moderately complex import graphs pushes you rapidly towards wanting to cache the result of go list -deps. I take Russ' point about long running programs and stale data, but I'm not aware that go/packages has a "solution" for that either (https://go-review.googlesource.com/c/tools/+/116359/1/go/packages/doc.go#51) as yet.

This isn't per se an argument in favour of caching; just a note of what I observed. I don't know what the "best" answer for go/build is.

But it sounds like most tools actually just use the source importer, slow as that may be in general, since they are working with relatively small code bases.

If you mean go/importer.For("source", nil), I think it has had more to do with correctness than with the size of the input. For example, go/importer.For("gc", nil) has historically been a source of confusion for users who forgot to go install the packages first.

How about modifying the directory layout of Go modules?

For example, with $GOMOD_ROOT=$GOPATH/src/mod, go from

$GOMOD_ROOT/
    golang.org/x/tools@<version>/
        xxx.go

to

$GOMOD_ROOT/
    golang.org/x/tools@<version>/
        src/golang.org/x/tools/go/loader/
            xxx.go
        src/golang.org/x/tools/go/loader/v1/   # or a version alias
            xxx.go

Then we could just append to GOPATH the entries listed in go.mod when GO111MODULE=on,
like GOPATH=$GOPATH:$GOMOD_ROOT/golang.org/x/tools@<version>.

Change https://golang.org/cl/183991 mentions this issue: cmd/doc: provide working directory to build.Import calls

Change https://golang.org/cl/218817 mentions this issue: go/build: populate partial package information in importGo
