Go: proposal: spec: disallow Hangul filler codepoints in Go identifiers

Created on 12 Aug 2020 · 6Comments · Source: golang/go

The Hangul filler codepoints (U+115F, U+1160, U+3164) are rendered as zero-width white space as specified by the Unicode standard. And they are allowed in Go identifiers.
https://twitter.com/omengue/status/1293476660268998656
They are only useful for obfuscation. Despites that would be a breaking change I propose to forbid them in Go 1 spec as I only see nasty uses.

What version of Go are you using (`go version`)?

$ go version
1.4.6

Does this issue reproduce with the latest release?

yes

What did you do?

var ᅟ, ᅠ, ㅤ = 0, 1, 2

Go Playground: https://play.golang.org/p/bOmUvRfC3Co

What did you expect to see?

Compile time failure.

What did you see instead?

Compiles fine.

Proposal

Source

dolmen

👍1

Most helpful comment

I suggest closing this as a duplicate of #20115.

rsc on 12 Aug 2020

👍3

All 6 comments

Apart from code golf, can these been used maliciously?

davecheney on 12 Aug 2020

@davecheney I see much potential for confusion in code review by hijacking names of basic types:

Go playground: https://play.golang.org/p/EYIrCh9XtI_u

func main() {
        var v interface{}
    fmt.Println(v)
    v = interfaceᅟ{}
    fmt.Println(v)
}

type interfaceᅟ struct{}

Output:

<nil>
{}

dolmen on 12 Aug 2020

I'd hope that widespread use of this style of identifier would be self correcting via natural selection.

Said another way, rather than changing the language, can this problem be solved by telling people that this isn't a good way to write code?

davecheney on 12 Aug 2020

👍2

Relevant issue about homographs attacks with go/vet: https://github.com/golang/go/issues/20115 Seems this would be a similar use case.

Referencing rsc in https://github.com/golang/go/issues/20209#issuecomment-335273273

I think we agree that the first step is to add a vet check and only after building experience with it think about actual language restrictions (or not). Marking this proposal-hold until there is a proposal (a short list would be fine) of what a vet "unicode" check would check.

martisch on 12 Aug 2020

The Hangul filler codepoints (U+115F, U+1160, U+3164) are rendered as zero-width white space as specified by the Unicode standard.

Looks like you missed U+FFA0.

But where in the Unicode standard does it say these are zero-width spaces?
They are classified as Lo (Letter, other).

I do see that they (including U+FFA0) have the Default_Ignorable_Code_Point property, and they are the only Letter-classified code points that do. So we should probably include them in the vet checks being contemplated in #20115.
Those vet checks are on hold for higher priority work.

Also, at least in the font I'm using, there's a clear indication that dodgy Unicode is happening:

Screen Shot 2020-08-12 at 10 35 52 AM

rsc on 12 Aug 2020

I suggest closing this as a duplicate of #20115.

rsc on 12 Aug 2020

👍3

Was this page helpful?

0 / 5 - 0 ratings

Related issues

x/build/cmd/coordinator: ssh proxy should support scp

bradfitz · 3Comments

x/tools/cmd/godoc: not work with symbolic links

gopherbot · 3Comments

cmd/compile: testing/quick misbehaves on Nexus 9 linux/arm64

rsc · 3Comments

cmd/compile: regression in 69a7c15

OneOfOne · 3Comments

x/net/webdav: fails on broken links

enoodle · 3Comments

Go: proposal: spec: disallow Hangul filler codepoints in Go identifiers

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What did you do?

What did you expect to see?

What did you see instead?

Most helpful comment

All 6 comments

Related issues

What version of Go are you using (`go version`)?