The Hangul filler codepoints (U+115F, U+1160, U+3164) are rendered as zero-width white space as specified by the Unicode standard. And they are allowed in Go identifiers.
https://twitter.com/omengue/status/1293476660268998656
They are only useful for obfuscation. Despites that would be a breaking change I propose to forbid them in Go 1 spec as I only see nasty uses.
go version)?$ go version 1.4.6
yes
var á…Ÿ, á… , ã…¤ = 0, 1, 2
Go Playground: https://play.golang.org/p/bOmUvRfC3Co
Compile time failure.
Compiles fine.
Apart from code golf, can these been used maliciously?
@davecheney I see much potential for confusion in code review by hijacking names of basic types:
Go playground: https://play.golang.org/p/EYIrCh9XtI_u
func main() {
var v interface{}
fmt.Println(v)
v = interfaceá…Ÿ{}
fmt.Println(v)
}
type interfaceá…Ÿ struct{}
Output:
<nil>
{}
I'd hope that widespread use of this style of identifier would be self correcting via natural selection.
Said another way, rather than changing the language, can this problem be solved by telling people that this isn't a good way to write code?
Relevant issue about homographs attacks with go/vet: https://github.com/golang/go/issues/20115 Seems this would be a similar use case.
Referencing rsc in https://github.com/golang/go/issues/20209#issuecomment-335273273
I think we agree that the first step is to add a vet check and only after building experience with it think about actual language restrictions (or not). Marking this proposal-hold until there is a proposal (a short list would be fine) of what a vet "unicode" check would check.
The Hangul filler codepoints (U+115F, U+1160, U+3164) are rendered as zero-width white space as specified by the Unicode standard.
Looks like you missed U+FFA0.
But where in the Unicode standard does it say these are zero-width spaces?
They are classified as Lo (Letter, other).
I do see that they (including U+FFA0) have the Default_Ignorable_Code_Point property, and they are the only Letter-classified code points that do. So we should probably include them in the vet checks being contemplated in #20115.
Those vet checks are on hold for higher priority work.
Also, at least in the font I'm using, there's a clear indication that dodgy Unicode is happening:

I suggest closing this as a duplicate of #20115.
Most helpful comment
I suggest closing this as a duplicate of #20115.