Dotty: Dotty's name encoding scheme does not match Scala 2 in some situations

Created on 16 Feb 2019 · 10 comments · Source: lampepfl/dotty

Given:

class A {
  def a_+(): Unit = {}
  def `+_a`(): Unit = {}
}

Dotty generates methods:

   public void a_$plus()
   public void +_a()

Whereas Scala 2 generates:

  public void a_$plus()
  public void $plus_a()
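To make the difference concrete, here is a minimal Scala sketch of the two rules. It assumes that Scala 2 encodes every operator character wherever it appears, while Dotty (at the time of this issue) only encodes a trailing run of operator characters that is either the whole name or directly follows a _. The object and method names are hypothetical, and the operator table is a trimmed subset rather than the full table used by either compiler.

```scala
object NameEncodingSketch {
  // Trimmed, hypothetical subset of the operator-to-name table.
  private val opCodes: Map[Char, String] = Map(
    '+' -> "$plus",
    '-' -> "$minus",
    '*' -> "$times",
    '/' -> "$div",
    '=' -> "$eq"
  )

  // Replace each operator character by its encoding; keep other characters.
  private def encodeChars(s: String): String =
    s.map(ch => opCodes.getOrElse(ch, ch.toString)).mkString

  // Scala 2-style rule (assumed): encode operator characters anywhere in the name.
  def encodeAll(name: String): String = encodeChars(name)

  // Dotty-style rule at the time of the issue (assumed): only encode a trailing
  // run of operator characters that is the whole name or directly follows '_'.
  def encodeTrailingOps(name: String): String = {
    val prefixLen = name.lastIndexWhere(ch => !opCodes.contains(ch)) + 1
    val suffix = name.substring(prefixLen)
    val encodable = prefixLen == 0 || name.charAt(prefixLen - 1) == '_'
    if (suffix.nonEmpty && encodable) name.substring(0, prefixLen) + encodeChars(suffix)
    else name
  }
}
```

Both rules agree on a_+, but only the Scala 2-style rule rewrites +_a, which matches the method names shown above.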

I was about to fix this, but the comment above NameTransformer#encode written by @odersky suggests that this is intentional: https://github.com/lampepfl/dotty/blob/12eef6d8e51dcd8a9a73387659b0f860d7b9ca80/compiler/src/dotty/tools/dotc/util/NameTransformer.scala#L87-L88

@odersky Is the divergence from Scala 2 actually intended here? It makes it impossible to call some Scala 2 methods from Dotty. (We may want to change the name encoding scheme, but only if Scala 2 changes it in lockstep with us.)

Marked as a blocker for the release because upgrading to ASM 7 led to a name-encoding-related issue (https://github.com/lampepfl/dotty/pull/5917#issuecomment-464368458). I know how to fix it, but the fix interacts with the current semantics of encode, and I'd like to get some clarity here.

Labels: scala2, bug, blocker

All 10 comments

The commit where the current encoding scheme was implemented: https://github.com/lampepfl/dotty/commit/debdafacb2a7047b5d11973dc3b6374cbc60a4dd#diff-e19ba84e185a59cba334b987a1cadc6a

Subsidiary question: why does the new scheme have both avoidIllegalChars and encode, instead of doing everything in encode? avoidIllegalChars does its mangling early, which recently forced @nicolasstucki to introduce decodeIllegalChars so that the name printer would not output mangled names. Furthermore, / at least is both a character that avoidIllegalChars replaces with $u002F and an operator that encode replaces with $div, so its encoding differs depending on whether the identifier is backquoted or not 😱

class A {
  def a_/(): Unit = {} // encoded as: public void a_$div()
  def `b_/`(): Unit = {} // encoded as: public void b_$u002F()
}
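A simplified sketch of how two separate passes can give / two different encodings. It assumes that backquoted names are first mangled by an avoidIllegalChars-like pass, which replaces the characters the JVM forbids in method names (. ; [ / < >, per JVMS §4.2.2) with $u followed by four hex digits, while plain operator identifiers go straight to an encode-like pass that rewrites a trailing operator run. All names below are hypothetical stand-ins, not Dotty's NameTransformer code.

```scala
object TwoPassSketch {
  // Characters the JVM forbids in method names (JVMS 4.2.2), except in <init>/<clinit>.
  private val illegalInMethodNames: Set[Char] = Set('.', ';', '[', '/', '<', '>')

  // Trimmed, hypothetical operator table.
  private val opCodes: Map[Char, String] = Map('+' -> "$plus", '/' -> "$div", '=' -> "$eq")

  // Stand-in for avoidIllegalChars: mangle each illegal character as $uXXXX (upper-case hex).
  def avoidIllegalChars(name: String): String =
    name.map { ch =>
      if (illegalInMethodNames.contains(ch)) "$u%04X".format(ch.toInt) else ch.toString
    }.mkString

  // Stand-in for encode: rewrite a trailing operator run that is the whole name or follows '_'.
  def encode(name: String): String = {
    val prefixLen = name.lastIndexWhere(ch => !opCodes.contains(ch)) + 1
    val suffix = name.substring(prefixLen)
    val encodable = prefixLen == 0 || name.charAt(prefixLen - 1) == '_'
    if (suffix.nonEmpty && encodable) name.substring(0, prefixLen) + suffix.map(opCodes).mkString
    else name
  }

  // Assumed pipelines: plain identifiers go straight to encode,
  // backquoted identifiers are mangled by avoidIllegalChars first.
  def encodePlain(name: String): String = encode(name)
  def encodeBackquoted(name: String): String = encode(avoidIllegalChars(name))
}
```

Once / has been turned into $u002F, the name no longer ends in an operator character, so encode leaves it alone; the same character thus ends up with two different bytecode spellings.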

@smarter The reason to do it this way is that we cannot simply translate all operator occurrences, since some of them might be unintentional: an internal name might end up containing a nested $eq that does not come from an =. Say I write, in Scala 2:

def foo$equals

With blind encode/decode this would be mapped to foo=uals, which is definitely not what we want.
avoidIllegalChars could probably be rolled into encode. It existed before the change to semantic names.
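The foo$equals problem is easy to reproduce with a deliberately naive decoder that replaces every occurrence of an operator encoding, wherever it appears in the name. This is a hypothetical sketch for illustration, not the decoder of either compiler, and its table is trimmed to three entries.

```scala
object NaiveDecodeSketch {
  // Trimmed, hypothetical table mapping encodings back to operator characters.
  private val codeToOp: Seq[(String, String)] =
    Seq("$plus" -> "+", "$div" -> "/", "$eq" -> "=")

  // Blindly replaces every encoding, even when it is only part of a longer word.
  def naiveDecode(name: String): String =
    codeToOp.foldLeft(name) { case (acc, (code, op)) => acc.replace(code, op) }
}
```

An intentional encoding such as a_$plus decodes correctly to a_+, but foo$equals is also rewritten, to foo=uals, because the decoder cannot tell a real $eq from the first three letters of $equals.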

Could we simplify the problem by disallowing $ in user-written Scala identifiers? (We could keep it accessible behind a compiler flag for people who really need it.)

Please don't. $ is regularly used in the identifiers of JavaScript libraries, so disallowing $ in Scala would worsen the experience of interacting with those libraries.

(also $ is even valid in user-written Java identifiers, IIRC)

Maybe we should use some other character than $ in our encodings then :).

paulp had a lot to say on this subject... I believe he said that a lot of bugs in scalac could have been prevented by a more principled set of encoding rules.

I think there are plenty of Unicode characters available that users wouldn't use (and that could be outlawed). Using more characters would also make the encodings more robust, since no character would need to carry an overloaded meaning.


Name encoding in Dotty is already much more principled than in Scala 2. We could replace $ with a Unicode character, but that would make Java interop impossible in some cases: for example, an especially common pattern is writing a Java class with static methods that forward to a Scala object through its MODULE$ field.
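The MODULE$ pattern can be illustrated from the Scala side with reflection: a top-level object compiles to a class whose name ends in $ and which exposes its singleton through a public static field named MODULE$, a name that Java code references literally (for example scala.None$.MODULE$). The sketch below uses the standard library's None as the example object; singletonOf is a hypothetical helper.

```scala
object ModuleFieldSketch {
  // Look up the singleton instance of a compiled Scala object the way Java code
  // does: through the $-suffixed class and its public static MODULE$ field.
  def singletonOf(className: String): AnyRef =
    Class.forName(className).getField("MODULE$").get(null)
}
```

Because Java sources spell these $-based names out verbatim, changing the character used by the encoding would change the names that such forwarders have to reference.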

We could use € instead of $. It's a valid character in Java identifiers.

🤫
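The claim about € can be checked against the JDK: like $, the euro sign is a Unicode currency symbol (category Sc), and java.lang.Character accepts currency symbols as Java identifier start and part characters. A minimal check, with a hypothetical helper name:

```scala
object EuroIdentifierCheck {
  // True if the character may both start and continue a Java identifier.
  // '$' (U+0024) and '€' (U+20AC) are both currency symbols, which the JDK accepts.
  def isJavaIdentifierChar(ch: Char): Boolean =
    Character.isJavaIdentifierStart(ch) && Character.isJavaIdentifierPart(ch)
}
```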

I like that, actually: since € isn't on non-European keyboards, it's harder to misuse. Let's abandon the dollar peg and join the Eurozone!
