Go: x/tools/go/generated: parser for https://golang.org/s/generatedcode format

Created on 9 Oct 2018  Â·  11Comments  Â·  Source: golang/go

Given that #13560 has been accepted, resolved, and by now, widely accepted by the Go community, I think it can be helpful to have a Go parser for it that tools written in Go could use (if desired).

It's relatively easy to write an ad hoc parser using the regexp package, but it's also possible to write a more specialized one that has less overhead.

I already wrote one a while ago, and it currently lives at github.com/shurcooL/go/generated.

I want to move it out of the repository it's currently in, which contains many miscellaneous Go packages of lower utility and quality. I was originally planning to move it out into a standalone repository on my personal site, but then I thought it might be a good fit under x/tools subrepo, specifically, in the x/tools/go directory, since it deals with Go code. The proposed import path would be:

import "golang.org/x/tools/go/generated"

Hence this proposal. If accepted, I'm happy to maintain it/be the owner. The scope is very narrow, so it should be very low volume of work.

Not sure how this intersects with #17244.

If not accepted, I would likely move it here instead:

import "dmitri.shuralyov.com/go/generated"

(The code is currently MIT licensed, but in either case, I'd relicense it under the Go license.)

/cc @andybons @bradfitz @alandonovan @matloob @ianthehat

NeedsDecision Proposal

Most helpful comment

Regular expressions are great for experimenting with complicated patterns, but you don't need the regexp package for an expression this trivial:

strings.HasPrefix(line, "// Code generated ") && strings.HasSuffix(line, " DO NOT EDIT.")

All 11 comments

Is the only concern about the regular expression its speed? If so, I'd rather improve the speed of regexp instead. There are some performance issues for the package already, so one of them may already cover this particular expression: #11646 #26623 #21463

I realise that the regex package may take a while to be fast even for this one case, but I don't think there's a sense of urgency to have a faster implementation of it in x/tools.

Can the predicate be expressed as a function of the AST? Very often the tools that need this predicate have already parsed the file.

package astutil

// Generated reports whether the file was generated by a program,
// not handwritten, following the conventions described in #13560.
// The syntax tree must have been invoked with the ParseComment flag.
// Example:
//   f, err := parser.ParseFile(fset, filename, parser.ParseComment|parser.ImportOnly)
//   if err != nil { ... }
//   g := Generated(f)
func Generated(f *ast.File) bool { ... }

It seems like a good fit for the x/tools/go/astutil package since it doesn't add any dependencies.

Regular expressions are great for experimenting with complicated patterns, but you don't need the regexp package for an expression this trivial:

strings.HasPrefix(line, "// Code generated ") && strings.HasSuffix(line, " DO NOT EDIT.")

This POSIX-compliant grep command has worked fine for my needs in shell scripts:

grep -Exq '^// Code generated .* DO NOT EDIT\.$' "$file"
# Exits 0 if $file matches.

In my case, I'm not concerned about a false positive of that line occurring in a block comment or in a raw string literal.

Is the only concern about the regular expression its speed?

No. I wanted to avoid the regexp dependency because it was easy to implement the matching behavior myself.

The regexp package can still be optimized independently of this code.

Can the predicate be expressed as a function of the AST?

Good question. I'd be curious to try. I expect it should be possible, but I'm not sure if it'd be faster.

Note that not only comments would have to be checked in the AST, but raw string literals as well. A .go file with the following code needs to be reported as a positive match for the pattern:

package p

import "fmt"

func Foo() {
    const s = `this is a raw string literal. it happens to contain
some text in column 1, including a string that matches
the "Code generated ... DO NOT EDIT." comment, like so:
// Code generated  DO NOT EDIT.
to product a correct result, it needs to be detected`
    fmt.Println(len(s))
}

Since it does in fact contain a line of text that matches the regular expression ^// Code generated .* DO NOT EDIT\.$.

This POSIX-compliant grep command has worked fine for my needs in shell scripts:

Indeed. It's very quick and easy to implement a parser for https://golang.org/s/generatedcode format in any language that has support for regular expressions. This package is meant to be available for use by tools that are written in Go, and prefer to avoid incurring the cost of spawning a process or importing the regexp package.

In my case, I'm not concerned about a false positive of that line occurring in a block comment or in a raw string literal.

Those are not false positives. According to the spec, a file is considered generated if a matching line of text appears anywhere in the file, which can include block comments and raw string literals.

Those are not false positives.

Good to know that grepping this way shouldn't hit any false positives then :)

This package is meant to be available for use by tools that are written in Go

Anecdotally: my most frequent case for needing to check whether a file is auto-generated has been to manually filter through go list output in a shell script, to decide whether a file should be run through go fmt. So for me, it would be more helpful if I could use {{if .Generated}} as part of a template passed to go list -f. But changing go list is not this issue.

This would be a 1-declaration 1-line package. We try to avoid those. Please propose a new method or top-level function in an existing package, like probably go/ast.

Ping @dmitshur; see previous comment.

I agree with your comment @rsc that this would be a small package, and it's better to avoid that in x/tools.

I've been thinking about where this functionality could fit into existing packages, and I have some initial thoughts I can share. Please note this isn't a final proposal; just an iteration I want to share. I'm not completely happy with it and wouldn't want it implemented as is.

WIP Counter-Proposal Revision 1

Out of the existing packages, this functionality seemed to belong most in go/parser. The API surface of the change would be somewhat large. It would involve a new parser.Mode:

// A Mode value is a set of flags (or 0).
// They control the amount of source code parsed and other optional
// parser functionality.
//
type Mode uint

const (
    PackageClauseOnly Mode             = 1 << iota // stop parsing after package clause
    ImportsOnly                                    // stop parsing after import declarations
    ParseComments                                  // parse comments and add them to AST
+   GeneratedComment                               // determine whether the file contains a "// Code generated … DO NOT EDIT." line comment
    Trace                                          // print a trace of parsed productions
    DeclarationErrors                              // report declaration errors
    SpuriousErrors                                 // same as AllErrors, for backward-compatibility
    AllErrors         = SpuriousErrors             // report all errors (not just the first 10 on different lines)
)

When the mode is turned on, parser.ParseFile would perform additional work to determine if the .go file has a "// Code generated … DO NOT EDIT." line comment.

(There is some precedent to this in the go/build package, specifically the ImportComment import mode.)

In order for parser.ParseFile to report the result, ast.File in go/ast package would also need to be modified:

// A File node represents a Go source file.
//
// [...]
type File struct {
    Doc        *CommentGroup   // associated documentation; or nil
    Package    token.Pos       // position of "package" keyword
    Name       *Ident          // package name
    Decls      []Decl          // top-level declarations; or nil
    Scope      *Scope          // package scope (this file only)
    Imports    []*ImportSpec   // imports in this file
    Unresolved []*Ident        // unresolved identifiers in this file
    Comments   []*CommentGroup // list of all comments in the source file
+   Generated  *Comment        // a "// Code generated … DO NOT EDIT." line comment; or nil
}

At first, this seemed like the most appropriate place in existing packages to add this functionality. However, after considering the details more closely, it doesn't seem like a good fit.

The "// Code generated … DO NOT EDIT." comment lives in parallel to the actual AST structure of a file. According to the spec, such a comment can appear inside a block comment, and it would still count. It can also appear within a raw string literal, and match. The work that must be done to determine if a .go file has such a comment has very little to do with parsing the actual AST of a .go file.

WIP Counter-Proposal Revision 2

An alternative proposal to consider is to add this functionality to go/build package in the following way. Add a GeneratedComment import mode, similar to the existing build.ImportComment:

 // If ImportComment is set, parse import comments on package statements.
 // Import returns an error if it finds a comment it cannot understand
 // or finds conflicting comments in multiple source files.
 // See golang.org/s/go14customimport for more information.
 ImportComment
+
+// If GeneratedComment is set, determine which .go files contain
+// a "// Code generated … DO NOT EDIT." line comment,
+// and populate GeneratedGoFiles with the results.
+// See golang.org/s/generatedcode for more information.
+GeneratedComment

The result would be populated in a new build.Package field:

type Package struct {
    // ...

    // Source files
    GoFiles          []string // .go source files (excluding CgoFiles, TestGoFiles, XTestGoFiles)
    CgoFiles         []string // .go source files that import "C"
    IgnoredGoFiles   []string // .go source files ignored for this build
    InvalidGoFiles   []string // .go source files with detected problems (parse error, wrong package name, and so on)
+   GeneratedGoFiles []string // .go source files containing a contains a "// Code generated … DO NOT EDIT." line comment; populated only if GeneratedComment mode is set
    CFiles           []string // .c source files
    // ...

This approach seems to be cleaner than adding something to go/parser and ast.File, but I'm not sure if it's good either. The information about whether or not a file is generated feels like it should belong at the file level, not the package level.

There's also the factor that golang.org/x/tools/go/packages is meant to be used instead of go/build by now, so adding something new to go/build doesn't seem like a good idea.

A "WIP Counter-Proposal Revision 3" section can/should be added where a similar approach as in revision 2 can be evaluated for the golang.org/x/tools/go/packages package.

Conclusion

I would lean towards either putting this proposal on hold (or waiting to iterate on it further), or outright declining it. I made the proposal because I was planning to move the existing package, and wanted to check if there'd be a more fitting place for it in x/tools. It seems the existing package doesn't meet the criteria for inclusion.

Additionally, given how easy it is to parse the generated comment with just regexp, there isn't a high demand for this functionality as far as I can tell (e.g., github.com/shurcooL/go/generated currently has only 1 known public importer), so it's unclear if the additional work of integrating it into an existing package is justified at this time.

@dmitshur Let's discuss this in person. Per @golang/proposal-review we're ok with adding a utility function iff we can decide on a good place for it and a good API.

Change https://golang.org/cl/158817 mentions this issue: cmd/go: copy missing bit of documentation about code generated comment

Was this page helpful?
0 / 5 - 0 ratings