I propose adding two functions to the package's API:
// Export returns name as an exported identifier.
func Export(name string) string {
// TODO: possibly avoid the alloc+copy if the first rune is already upper
ch, w := utf8.DecodeRuneInString(name)
return string(unicode.ToUpper(ch)) + name[w:]
}
// Unexport returns name as an unexported identifier.
func Unexport(name string) string {
// TODO: possibly avoid the alloc+copy if the first rune is already lower
ch, w := utf8.DecodeRuneInString(name)
return string(unicode.ToLower(ch)) + name[w:]
}
With #30064 merged, we already have some operations to deal with identifiers and names in go/token, such as IsIdentifier and IsExported.
Another common need is to export or unexport a certain name. I've personally implemented either of those functions a handful of times over the years, and I can easily find implementations in popuar projects like this one.
However, I've also seen plenty of bad implementations. For example:
exported = strings.ToUpper(s[:1]) + s[1:] // What if the first rune isn't ASCII?
exported = strings.Title(s) // Almost! unicode.ToTitle isn't the same as unicode.ToUpper
A very quick google search yielded an example of Title in a google-owned repository. And the last time I've seen the first mistake is here today, which made me think filing this proposal was worthwhile.
The correct implementation of each func is just two lines, but it's easy to forget about unicode, or to fall for the "false friend" that is strings.Title. Moreover, I think the API names would make code expressive and self-explanatory.
One detail to keep in mind is that some implementations, like google/wire's I linked first, handle prefixes like acronyms separately. For example, I believe they would unexport URLFoo as urlFoo.
However, I don't think this should be within the scope of token.Export; there isn't a way to impement this easily, covering all cases. A user can always implement this on top of token.Export by replacing certain well-known prefixes with their unexported counterparts.
If this proposal is accepted, I'm happy to upload a CL for Go 1.14.
I forgot to mention - I'm of course open to suggestions when it comes to the package for this. I thought of go/token because that's where IsExported was added, but perhaps another is more appropriate.
The implementation above does not work for unexported identifiers that have no upper-case equivalents, like CJK languages. We've encouraged a "soft standard" of using X for those if needed, but then Export(涓浗) = X涓浗 but Unexport(X涓浗) = x涓浗, so they are not inverses.
(The same problem would occur if we changed the rules to make those identifiers exported by default.)
I'm also bothered by your example Unexport("URLFoo") => "urlFoo" (hard to figure out how to know to do this) or "uRLFoo".
This doesn't seem like it is a well-formed operation. When you have a name you either generate an exported or unexported identifier to begin with.
What is the use case for this operation? Is it well-enough defined to merit being in the standard library?
In contrast, #30064 was very well-defined inspection of these, not modification.
/cc @zombiezen for wire's use case
The test cases for the export/unexport function may be helpful for understanding.
Wire's generated code requires a bunch of temporary variables of distinct types. In early versions of the tool, the temporaries were very unhelpful (e.g. v0, arg42, etc.) The linked algorithm is used to create friendlier names for these temporaries based on the type of the variable, although in some cases it gives up and falls back to the undescriptive name. The symbols returned from the export and unexport functions are never intended to be used by anything other than Wire. They are observable in the generated code, but are intended as a human inspection aid, and may still be weirdly capitalized for initialisms.
so they are not inverses
Ah, I hadn't thought about that. It certainly makes the problem even trickier.
In a way, it will would make the API a bit more confusing. On the other hand, it can be used as an argument that even more (all?) of the implementations out there are wrong.
I'm also bothered by your example Unexport("URLFoo") => "urlFoo" (hard to figure out how to know to do this) or "uRLFoo".
I only brought up this point to clarify that the API wouldn't try to do anything fancy here. So it would simply convert URLFoo into uRLFoo.
What is the use case for this operation? Is it well-enough defined to merit being in the standard library?
Code generation, similar to Wire's use case. For example, in the chromedp link I gave above, we code generate a list of exported devices via their names. The original names aren't exported, so we must do that operation.
I agree with you that the two operations here aren't well understood generally. Perhaps that means the APIs shouldn't exist. That still worries me, though, because of all the misconceptions and wrong implementations out in the wild.
@mvdan, it's not that the operations aren't well understood. It's that they are fundamentally not well-defined. They depend on knowledge of the underlying human language, which is quite complex.
I looked more carefully at the three examples given earlier.
Wire. I don't think wire supports the argument that a general library is needed. The unexport function is unused. The export function has exactly one use:
name := typeVariableName(t, "", func(name string) string { return "_wire" + export(name) + "Value" }, g.nameInFileScope)
The name on the right-hand side is the name of the function being called. So for names foo, Bar, you get _wireFooValue and _wireBarValue. It's kind of cute but I think ultimately not worth the complexity involved in defining export. It seems like it would have been completely fine to just use a form that did not change the names at all, like _wire_foo and _wire_Bar. The latter would actually be _easier_ to match back up with the functions being called, since no case changes would be required. If anything, I think export should not have been defined or used, and therefore that having token.Export would have been an attractive nuisance in this case more than a help.
Chromedp. Another motivating example was a use in chromedp, which says:
name := cleanRE.ReplaceAllString(d.Name, "")
name = strings.ToUpper(name[0:1]) + name[1:]
The claim was that the second line is wrong. But the first line strips out any non-ASCII-alphanumeric characters, so the second line is correct as written. The domain-specific knowledge here (in particular, that the input is not general text but is a specific JSON structure](https://raw.githubusercontent.com/GoogleChrome/puppeteer/master/lib/DeviceDescriptors.js)) makes this code fine. A general solution token.Export would not have this knowledge.
Go-endpoints. The use of strings.Title may be wrong, but since the input is multiple words, using token.Export would not be any better, unless token.Export handled conversions like foo_bar -> FooBar, which would be surprising. In this case, the code should be using the same conversion that the protocol buffer support for Go does, which is more involved. Maybe that should be exported; maybe it should be copied. But it wouldn't be token.Export.
What I see from these three examples is that conversion between naming styles is sometimes unnecessary and often requires other adjustments throughout the name.
It looks like token.Export is difficult to define in general, and on top of that would rarely be sufficient on its own. And we have zero examples of needing token.Unexport.
Fair enough, I agree that this isn't clearly a good idea. I've become less sure about this idea over the past few months.
Most helpful comment
@mvdan, it's not that the operations aren't well understood. It's that they are fundamentally not well-defined. They depend on knowledge of the underlying human language, which is quite complex.
I looked more carefully at the three examples given earlier.
Wire. I don't think wire supports the argument that a general library is needed. The
unexportfunction is unused. Theexportfunction has exactly one use:The
nameon the right-hand side is the name of the function being called. So for namesfoo,Bar, you get_wireFooValueand_wireBarValue. It's kind of cute but I think ultimately not worth the complexity involved in definingexport. It seems like it would have been completely fine to just use a form that did not change the names at all, like_wire_fooand_wire_Bar. The latter would actually be _easier_ to match back up with the functions being called, since no case changes would be required. If anything, I thinkexportshould not have been defined or used, and therefore that havingtoken.Exportwould have been an attractive nuisance in this case more than a help.Chromedp. Another motivating example was a use in chromedp, which says:
The claim was that the second line is wrong. But the first line strips out any non-ASCII-alphanumeric characters, so the second line is correct as written. The domain-specific knowledge here (in particular, that the input is not general text but is a specific JSON structure](https://raw.githubusercontent.com/GoogleChrome/puppeteer/master/lib/DeviceDescriptors.js)) makes this code fine. A general solution token.Export would not have this knowledge.
Go-endpoints. The use of strings.Title may be wrong, but since the input is multiple words, using token.Export would not be any better, unless token.Export handled conversions like
foo_bar->FooBar, which would be surprising. In this case, the code should be using the same conversion that the protocol buffer support for Go does, which is more involved. Maybe that should be exported; maybe it should be copied. But it wouldn't be token.Export.What I see from these three examples is that conversion between naming styles is sometimes unnecessary and often requires other adjustments throughout the name.
It looks like token.Export is difficult to define in general, and on top of that would rarely be sufficient on its own. And we have zero examples of needing token.Unexport.