We already have go/ast.IsExported, which is helpful; it makes the operation simple and expressive, and it avoids the simple mistake of only checking ASCII and ignoring Unicode.
Checking if a string is a valid identifier as per the spec (https://golang.org/ref/spec#Identifiers) falls under the same category, I believe. It seems intuitive, so developers tend to write a simple implementation by hand without much effort. However, there are many gotchas which are easy to get wrong:
_ is allowed, even as a first characterFor example, take this implementation I just came across this morning:
for _, c := range str {
switch {
case c >= '0' && c <= '9':
case c >= 'a' && c <= 'z':
case c >= 'A' && c <= 'Z':
case c == '_':
default:
return false
}
}
return true
It gets three of the gotchas wrong; it would return true on "" and "123", and return false on "伪尾". I imagine that a correct implementation would be like:
if len(str) == 0 {
return false
}
for i, c := range str {
if unicode.IsLetter(c) || c == '_' || (i > 0 && unicode.IsDigit(c)) {
continue
}
return false
}
return true
However, I'm still not 100% confident that this is correct. I'd probably compare it to implementations in go/* and x/tools before using the code, just in case.
The standard library implements variants of this in multiple places; go/scanner does it on a per-rune basis while scanning, go/build implements it in its importReader, and cmd/doc has an isIdentifier func. The same applies to other official repos like x/tools, where I've found godoc.isIdentifier, semver.isIdentChar, analysis.validIdent, and rename.isValidIdentifier.
I think we should have a canonical implementation with a proper godoc and tests. My proposal is to add this to go/ast, as it would already be useful to other packages within the standard library. I also think this is a fairly low-level API, and it's commonly needed whenever one is modifying Go syntax trees, creating new Go files, or otherwise tinkering with Go syntax.
If adding API to go/ast is a problem, perhaps golang.org/x/tools/go/ast/astutil. However, in my mind that package contains much higher-level functions, so it feels a bit out of place.
I realise this could have been a proposal, but I thought the proposed change wouldn't be controversial enough to warrant one. Feel free to retitle it as a proposal if preferred.
/cc @griesemer @josharian @alandonovan @dominikh
I think this is a fine idea, but the function belongs in go/token, not go/ast.
go/token sounds fine; I just suggested go/ast as that's where IsExported is.
@ianthehat pointed out keywords, which I was forgetting about. The spec mentions:
The following keywords are reserved and may not be used as identifiers.
So I presume that IsIdentifier should return false for all Go keywords.
Ian also mentions that go/scanner could be used for this by the user, checking that the entire string scans as a single identifier. I've implemented what I think is a correct way to do it below:
func isIdentifier(str string) bool {
src := []byte(str)
var s scanner.Scanner
fset := token.NewFileSet()
file := fset.AddFile("", fset.Base(), len(src))
s.Init(file, src, nil, scanner.ScanComments)
_, tok, lit := s.Scan()
return tok == token.IDENT && lit == str
}
However, I still think that an IsIdentifier func in the standard library would be better. Compared to using go/scanner directly, it would be a single line and require zero allocations.
We might consider IsKeyword in go/token as well. In addition to being convenient, the implementation could use a perfect hash for faster checks (as does cmd/compile/internal/syntax).
I'd be happy for us to add IsIdentifier, IsKeyword, and IsExported to go/token.
IsKeyword sounds like a great idea; I imagine that IsIdentifier would still return false for inputs like for, though. That is, IsIdentifier would use IsKeyword as part of its implementation.
and IsExported to go/token
Not sure if this is strictly necessary, since it's already part of go/ast. I assume you're suggesting it for the sake of consistency.
I assume you're suggesting it for the sake of consistency.
Exactly. The go/ast function would delegate to the go/token implementation.
I'm ok with @adonovan said.
Please send CLs to me. Thanks.
Hi @mvdan @adonovan @griesemer. Can I work on this one? I don't see a CL attached.
I鈥檓 pretty sure @mvdan is working on this; he assigned it to himself.
@josharian Cool, thanks. :)
I will get to it soon - I just have a moderately sized backlog of stuff to go through, now that the tree is open again :)
Change https://golang.org/cl/169018 mentions this issue: go/token: add IsIdentifier, IsKeyword, and IsExported
Most helpful comment
I'd be happy for us to add IsIdentifier, IsKeyword, and IsExported to go/token.