Go: x/tools/go/loader: improve and stabilize API

Created on 27 Jan 2016 · 21 comments · Source: golang/go

We should simplify and stabilize the go/loader package's API. Before that happens, there are a number of features to add and bugs to fix:

Features:

1) support ... patterns just like "go build" and "go list" do. (Ideally the logic would be shared. Russ suggested fork/execing "go list", but I don't like this approach: it uses the actual file system instead of the abstraction defined by build.Context.)

2) Allow packages specified on the command line to use cgo. Currently only imported packages are subject to cgo preprocessing.

3) Support pkg-config.

API simplifications:

4) The public fields of loader.Program are sometimes annoying to use. Study existing clients and come up with something more convenient.

5) Update the go/types tutorial at https://github.com/golang/example/tree/master/gotypes.

Once this is done, we should use go/loader in tools such as golint, govet and stringer.

cc: @griesemer

FrozenDueToAge


All 21 comments

Alan, any chance of revisiting this early in 1.9? I'd be happy to be a sounding board.

I just found that even golang.org/x/tools/cmd/gotype doesn't use go/loader, and it pains me to invoke slow, heavy 'go install' just to get some code typechecked.

Hi Josh, thanks for the offer. I would like to finish off the loader but none of these changes are on my plate for Google project work in the near future so I can't promise any schedule.

Do you have a sense of how much churn loader clients outside x/tools can tolerate as we finalize the API? Since Google vendors everything we're insulated from it, but I got a panel question at GopherCon that was effectively "when will you stop breaking things?". Perhaps we need a mailing list of interested parties.

I think that churn during a dev cycle is expected. Once it gets frozen, then the Go 1 compatibility promise kicks in.

This issue could be the mailing list, for a first pass. Off the top of my head, the non-Google people that come to mind who work a lot with the go/* packages are: @dominikh @fatih @valyala

Coming back to this in light of the 1.10 build cache (and vgo), I still think there is a use case for loading from source, more specifically partially loading from source.

The use case is code that does not (yet) fully compile. In particular I have code generators in mind, but the same could apply to tools like @mdempsky's go/types-based gocode etc where partial results are desirable.

My thinking is that x/tools/go/loader could be changed such that either:

  • x/tools/go/loader uses go/importer if a package is present in the build cache (determined by the changes that will land as part of the https://go-review.googlesource.com/c/go/+/107776 sequence), i.e. is current and compiles, and falls back to using source code loading if not
  • x/tools/go/loader remains as is but uses the supplied Config.TypeChecker.Importer if it implements types.ImporterFrom (with that implementation doing the build cache check), falling back to the current behaviour if nil, nil is returned (or some such defined behaviour)

Thoughts?

Loading from export data and loading from source give drastically different results. Export data doesn't provide enough information about function bodies, unexported objects, associations with the AST, and so on. People who use go/loader usually depend on it being a source importer, and changing that would break a lot of existing tools such as errcheck, staticcheck and many more. That renders your first option untenable.

We are (once again) actively working on this and should have something to
share in a week or two. In brief, we are thinking along the following
lines:

The go/build.Package structure encodes too much of the 'go build' way of
organizing projects, so we need a data type that describes a package of Go
source code independent of the underlying build system. It should work
equally well with go build and vgo, and also other build systems such as
bazel & blaze, making it easy to construct analysis tools that work in all
these environments. (Currently, for example, tools such as errcheck and
staticcheck are essentially unavailable to the Go community at Google, and
some of Google's internal tools for Go are unavailable externally.) We
will provide a uniform way to obtain package metadata by querying each of
these build systems, optionally supporting their preferred command-line
notations for packages, so that tools integrate neatly with users' build
environments. The metadata query function will shell out to an external
query tool appropriate to the current workspace.

The go/loader package currently assumes "source all the way down": it
recursively parses, loads and type-checks all dependencies of the initial
packages. Much more efficient is to load source code only for the
requested packages, and to use export data (compiler-generated .a files)
for their dependencies. This may require a build to materialize the .a
files, but now that builds are incremental this cost is amortized. Building
.a files avoids the historical problem of using them: that they often don't
exist, as few people use "go install", and when they do exist they are
often stale.

The new x/tools/go/loader API, which we plan to call x/tools/go/packages,
and eventually contribute to the standard library, will provide a single
Package data type that describes a package's metadata (analogous to
go/build.Package), and optionally its syntax trees and type information, if
type-checking and/or parsing were requested for that package.


Thanks @alandonovan, great to hear this is being worked on.

Much more efficient is to load source code only for the
requested packages, and to use export data (compiler-generated .a files)
for their dependencies.

This was exactly the thinking behind my second option. Sounds good.

The new x/tools/go/loader API, which we plan to call x/tools/go/packages

Is the plan still to have @rsc's go list -deps work-chain and preliminary module support land in Go 1.11 and have your changes build on top of that?

The metadata query function will shell out to an external
query tool appropriate to the current workspace.

This might not be the appropriate place, but I'm going to ask the question in any case and we can move discussion elsewhere as required.

I use godef extensively, as do many others. With the arrival of vgo I think editors are going to have to develop some concept of context for source code navigation. At the moment they can use godef context-free. With vgo this ceases to be the case; import resolution happens in the context of a go.mod file. I guess this point may also be true for other build systems or go.mod equivalents.

I make this point because I assume external tooling is going to have to supply this context to x/tools/go/packages and the like in order for resolution to properly function.

I might have totally missed something obvious here...

In the (unlikely) event I haven't, none of this is, I suspect, news to you, but I'm guessing/hoping what you share in a couple of weeks will cover this point?

Thanks

@alandonovan I hope you can consider the following points for the design of the go/packages API:

For staticcheck I've been working on my own go/loader and my own export data format.

Much like your plans, my loader uses export data if available and up to date, and source code otherwise. My export data, however, is different from that of any existing Go toolchain in that it stores a full representation of the AST and all type information, so that the end result is identical to parsing from source. I would hope that your new API would allow me to use my own export data instead of being forced to use that produced by the build system (go, vgo, bazel, …)

I've been working on my entirely own loader because the go/loader API lacks certain associations that I need in my tools. Specifically, I need to map between all the different domains of go/token, go/ast, go/build, go/types and go/ssa. I need to be able to map from go/types.Package to loader.Package, from loader.Package to build.Package, from go/ast.File to loader.Package and even from token.File to ast.File.

go/loader doesn't maintain some of these mappings (for example it discards go/build.Packages), and relies on expensive operations like PathEnclosingInterval for others.

Excited to hear this is being worked on. I look forward to reviewing the API, for little things like being able to (optionally?) discover the size of a file prior to reading it.

Paul:

Is the plan still to have @rsc's go list -deps work-chain and preliminary module support land in Go 1.11 and have your changes build on top of that?

Exactly. The -deps flag was added primarily so that we could obtain package metadata in a single invocation. (Currently it requires no fewer than three.)

With the arrival of vgo I think editors are going to have to develop some concept of context for source code navigation. At the moment they can use godef context-free. With vgo this ceases to be the case; import resolution happens in the context of a go.mod file.

Well observed. Future analysis tools will require the working directory as an input, as the name of the file or directory on which to operate, even if absolute, will no longer be sufficient.

Dominik:

My export data, however, is different from that of any existing Go toolchain in that it stores a full representation of the AST and all type information, so that the end result is identical to parsing from source. I would hope that your new API would allow me to use my own export data instead of being forced to use that produced by the build system (go, vgo, bazel, …)

Interesting. We already have serial formats for ASTs (Go syntax) and for types (exportdata). Is it significantly more efficient to reconstruct typed ASTs by deserializing a third, combined format, rather than simply re-parsing and re-typechecking? Or is your goal to avoid the dichotomy between packages loaded from export data (which don't have typed syntax trees) and those produced by the type checker, which do?

I need to map between all the different domains... go/loader doesn't maintain some of these mappings and relies on expensive operations ... for others.

Yes, the plethora of different entities that are 1:1 with, say, a package, can be confusing, but I'm surprised that performance (not complexity or mere inconvenience) of the operations for converting from one representation to another was a significant problem. Can you point me at any examples where these conversions were so burdensome that they motivated your current work?

Josh:

I look forward to reviewing the API, for little things like being able to (optionally?) discover the size of a file prior to reading it.

Thanks. That requirement was not on my radar.

Interesting. We already have serial formats for ASTs (Go syntax) and for types (exportdata). Is it significantly more efficient to reconstruct typed ASTs by deserializing a third, combined format, rather than simply re-parsing and re-typechecking? Or is your goal to avoid the dichotomy between packages loaded from export data (which don't have typed syntax trees) and those produced by the type checker, which do?

For my tools, I always need the full AST (including comments), as well as the full type information (not just the export data), to compute various information (such as "is this function pure?", "is this function deprecated?" and soon taint analysis). Existing export data doesn't record this information, and even if it did, I continuously need new information computed, which makes it simpler to just serialize the full type information + AST than to add more and more aggregate information.

To draw a comparison, imagine guru's peers query. Imagine this: one package, which is loaded from source, passes a channel to a function in another package, which is loaded from export data. How would you implement the peers query so that it can point out individual channel operations in the function?

Yes, the plethora of different entities that are 1:1 with, say, a package, can be confusing, but I'm surprised that performance (not complexity or mere inconvenience) of the operations for converting from one representation to another was a significant problem. Can you point me at any examples where these conversions were so burdensome that they motivated your current work?

Convenience is certainly one factor. With go/loader, staticcheck computes a lot of the mappings based on the results of go/loader, which is certainly doable. Some of that, however, involves duplicating work, such as going back to disk to find and partially parse go/build packages.

I'd like to add to the point about custom export formats. I believe that what Dominik wants is to be able to cache extra data on top of the existing export data that gc produces. That may be the detailed syntax tree, or it might be mappings between go/ast and go/types, or it might be other static analysis computations that a linter would like to cache to speed up consecutive runs (such as "is this func pure?").

Would it not be possible to add this to the interface, as a way to attach extra information to each package's cached data? I imagine said extra data could be invalidated with the existing data such as type information. The extra interface could be similar to context.Context, mapping interface{} keys to interface{} values.

I'm not sure if that would remove the need to support arbitrary export formats, but at least it seems closer to what most Go tool authors would need. It also seems like it would be easier to do, rather than designing an entirely separate export format.

@mvdan Maintaining the go/ast <-> go/types mapping is one important aspect of my work, yes. The other, however, is retaining all information about unexported types.

@alandonovan

With the arrival of vgo I think editors are going to have to develop some concept of context for source code navigation. At the moment they can use godef context-free. With vgo this ceases to be the case; import resolution happens in the context of a go.mod file.

Well observed. Future analysis tools will require the working directory as an input, as the name of the file or directory on which to operate, even if absolute, will no longer be sufficient.

I pondered for a bit why you said "require the _working directory_" (emphasis my own) instead of go.mod file as input here. My guess would be that your thinking is that one can find the go.mod file from the working directory?

But I'm now struggling to see how the go/types.ImporterFrom interface holds up here; does that now not need to be extended?

type ImporterFrom interface {
    Importer
    ImportFrom(path, dir string, mode ImportMode) (*Package, error)
    ImportFrom2(path, dir, ctxtDir string, mode ImportMode) (*Package, error)
}

The suitably-poorly-named ImportFrom2 having three parameters:

  • path - the (possibly relative) import path of the target package (because we're stuck with relative imports paths until Go 2)
  • dir - the directory containing the package file that contains an import of path
  • ctxtDir - the "working directory" to use your terminology of the current module (i.e. a means of resolving the current go.mod)

All of this might well be pre-empting what you're going to share... in which case please just say and I'll hold my tongue for now!

@alandonovan that sounds very much like what I've always wanted :heart:.

However, a recurring pain point for me with the current system has been build tags. I doubt a new system will be able to (or even should) cover all of these concerns but I'll lay them out and hope that at least some will be covered, here or elsewhere:

The current tags used are all in build.Context, but not necessarily all in BuildTags so serializing them to pass to a -tags flag when invoking an external command has to be done by hand. It would be nice if there were an easy way to get all the build tags used to load a program as a []string.

Conversely, it would be nice to take a []string of tags and the analog of build.Context in the new system and return a version with the appropriate fields set while inheriting everything else appropriately. (Similar in purpose to buildutil.TagsFlag but it sets stuff like GOARCH so that the context can be easily introspected without having to know what all the tags mean).

It would also be nice to know, file by file, what build tags from the Context, if any, triggered that file's inclusion. (I want this primarily to list files in a godoc type interface, annotated with why they were included).

I like the general idea, but I'm concerned that almost none of my projects that currently use go/build will be able to use this new package, because almost all of them require access to source code. For example, one extracts doc comments; another prints file names of dependencies.

I'm also concerned about tags - some tools require a coherent source tree that can build; others care more about seeing all source files regardless of tag combination. In the latter group, I see most tools that do package dependency analysis - I generally want to know about all possible dependencies, not just the ones for one particular combination of build tags.

FWIW, the heuristic I used in my godeps (dependency locking) tool for tags is to treat any build tag expression as true for known OS and architecture tags even when negated, and fall back to the default tag matching for other tags. (code here)

Sorry for radio silence; this project is inching along.

Please take a look at the API outlined in https://go-review.googlesource.com/c/tools/+/116359. Feedback welcome.

dominikh: you should find the new loader's WholeProgram mode gives you a complete source program, just as the current loader does today.

But you hint at an interesting problem, that of building modular analyses that pass information derived from inspecting a lower-level package to the analysis of a higher-level one. I've been working on an "Analysis API" for package-at-a-time analyses that allows vet-like checkers to be expressed independent of the driver tool (such as vet, 'go test', 'go build', web-based CI or code review tools, etc). I hope it will enable us to write, share, and re-use checkers across tools more easily. Currently I'm working on a way for each checker to get at facts deduced during checking of lower-level packages such as "function p.F delegates to printf" or "function p.F never returns". Checkers may depend on other checkers both within a unit and across units. Facts must be serializable so they may be passed between units, allowing "separate analysis" analogous to separate compilation. I don't plan to specify the binary encoding; each fact can choose its own. I would love your comments on it (will share link soon).

myitcv: the ImporterFrom interface doesn't have a "working directory" parameter but it's trivial to implement it using (in effect) a closure over the loader's working directory (os.Getwd by default).

jimmyfrasche, rogpeppe: re: build tags, I have not thought through the implications of build tags yet, but it's on the list. I don't expect we'll provide a mechanism to explain which tags were significant, or which files were excluded from compilation based on tags (or other aspects of configuration), as it appears infeasible to implement on all build systems. A recurring challenge has been the range of tools that use go/build, which defies easy generalization. A tool that runs the type-checker must necessarily follow the compiler's definition of a package (a concrete list of files that make up one compilation unit under a specific configuration), whereas a tool that, say, enumerates the source files of a project needs to generalize across all possible configurations. The new go/packages API biases towards the former, not the latter. (This is problematic even for type-based tools: a refactoring tool must modify all the files of a package, not just the ones for the specific configuration that was analyzed by the type checker.)

the ImporterFrom interface doesn't have a "working directory" parameter but it's trivial to implement it using (in effect) a closure over the loader's working directory (os.Getwd by default).

Thanks. Given the details you've shared in the CL, I think this becomes a moot question in any case.

To add two related questions here:

go generate

Currently the interface go generate presents to the generators it runs is (taken from go help generate):

  • The following environment variables:
$GOARCH
        The execution architecture (arm, amd64, etc.)
$GOOS
        The execution operating system (linux, windows, etc.)
$GOFILE
        The base name of the file.
$GOLINE
        The line number of the directive in the source file.
$GOPACKAGE
        The name of the package of the file containing the directive.
$DOLLAR
        A dollar sign.
  • A working directory that corresponds to the go list .Dir of the package being go generate-d

So I think we're missing the working directory of go generate itself as part of this interface (possibly another environment variable, GOGENERATEDIR?). A generator needs this value to correctly use a Loader.

Editors

At least as far as I've understood, it seems editors are going to need to retain some sort of state identifying the current vgo module (or at least the directory containing it; the module itself is probably more intuitive to the user). Reason being, this directory will then be required for all calls to external tools that use the Loader: gocode, goimports etc.

Is it worth fleshing this out as we go? That way we can communicate something to editor folks/plugin writers etc as soon as possible? Because it seems the tool changes by themselves (to gocode, goimports etc) won’t be sufficient.

Thoughts?

@alandonovan

The new go/packages API biases towards the former, not the latter. (This is problematic even for type-based tools: a refactoring tool must modify all the files of a package, not just the ones for the specific configuration that was analyzed by the type checker.)

I was hoping for go/packages to be a generalized loader that didn't need to pay attention to a specific GOOS, GOARCH, or other tags, but I discovered that this is impossible with the current API. I also discovered that there is a field called "Flags" through which tags can be passed to the underlying tools. (The issue I filed was #26697.)

This seems like a mistake to me (on both counts): it will force tools being ported over to go/packages, such as guru and gorename, to ignore files that are guarded by OS/ARCH/other tags, which is not what I want those tools to do at all, ever.

Perhaps it is hard for some underlying tools to discover a maximal graph (as might be the case with bazel?). Those could return E_NOT_SUPPORTED, and callers could fall back to a "build mode" that sets a specific GOOS/GOARCH/tags.

Anyway, I have a specific need that requires the more general mode, and I could work on enabling this if that is desired. If not, I'll just solve my case in my own repo. Thanks.

I think we should close this issue, as discussion on go/packages has been happening elsewhere for almost a year. go/loader will be deprecated any day now, so I don't think the original issue is worth keeping open. /cc @matloob @ianthehat
