Dotty: Unsound use of ClassTag.unapply

Created on 14 Nov 2019 · 16Comments · Source: lampepfl/dotty

Issue

  /** A ClassTag[T] can serve as an extractor that matches only objects of type T.
   *
   * The compiler tries to turn unchecked type tests in pattern matches into checked ones
   * by wrapping a `(_: T)` type pattern as `ct(_: T)`, where `ct` is the `ClassTag[T]` instance.
   * Type tests necessary before calling other extractors are treated similarly.
   * `SomeExtractor(...)` is turned into `ct(SomeExtractor(...))` if `T` in `SomeExtractor.unapply(x: T)`
   * is uncheckable, but we have an instance of `ClassTag[T]`.
   */
  def unapply(x: Any): Option[T] = 
    ...

This unfortunately is known to break when used with path dependent type.

Example

The following example shows a couple of cases where it fails

trait R {
  type Nat
  type Succ <: Nat
  type Idx
  given ClassTag[Nat]
  given ClassTag[Succ]
  given ClassTag[Idx]
  def n: Nat
  def one: Succ
}
class RI extens R {
  type Nat = Int
  type Succ = Int
  type Idx = Int
  given ClassTag[Nat] = classOf[Integer]
  given ClassTag[Succ] = new ClassTag[Integer] {
    def runtimeClass = classOf[Integer]
    def unapply(x: Any): Option[Succ] = x match
      case n: Int if n > 0 => Some(n)
      case _ => None
  }
  given ClassTag[Idx] = classOf[Integer]
  def n: Nat = 4
  def one: Succ = 1
}
object App {
  val r1: R = new RI
  val r2: R = new RI
  r1.n match {
    case n: r2.Nat => // Should not match or should have an unchecked waring
    case n: r1.Idx => // Should not match or should have an unchecked waring
    case n: r1.Succ => // Should match only if r1.n is an r1.Succ under the constraints set in r1
  }

  r1.one match {
    case n: r2.Nat => // Should not match or should have an unchecked waring
    case n: r1.Idx => // Should not match or should have an unchecked waring
    case n: r1.Nat => // Ok
  }
}

language enhancement

Source

nicolasstucki

Most helpful comment

Note that the shapeless Typable fixes only some of the unsoundness that comes from having it encoded inside ClassTag but not all of them. ClassTag.unapply provides the exact same functionality as Typeable.cast just that it does not provide the same instances. Typeable notably still receive an Any which may lead to when involving path-dependent types and no way to check the path at runtime.

The second important contribution of the TypeTest is to avoid syntactic overhead. The extra syntactic overhead would make the reflection API almost unusable. We used to have a version of the reflection API with explicit casting extractors (Typable like extractors specialized for specific). For example here is a relatively common kind of pattern that would suffer from this

tree match

   // Using TypeTest
+  case tree @ If(cond: Literal, thenp, elsep) => 

   // using old Refect with explicit casting extractors
-  case IsIf(tree @ If(IsLiteral(cond), thenp, elsep)) => 

   // using Typable
-  case Typeable.instanceOf[If](tree @ If(Typeable.instanceOf[Literal](cond), thenp, elsep)) =>

This example reflects only a simple case that is still short enough for one line, but when we used nested pattern things get way out of hand. This kind of pattern proved to be extremely hard to reason about and to learn to use properly. Using Typable.isInstanceOf directly seems to only complicate this. They also tend to repeat the same type checks at runtime a couple of times due to the encoding.

Typable was the basis on which we build the TypeTest version on which we fixed the other unsoundness of the paths by allowing the argument of the cast to be more restrictive. And then made use of the same known mechanism used by ClassTag to avoid the syntactic overhead.

nicolasstucki on 21 Sep 2020

👍3

All 16 comments

@sjrd said in this comment

To me, from the last discussion we had, there is only one reasonable path forward that will solve all the problems at once, with a clean design, and no migration issue:

First, introduce a new typeclass
trait TypeTest[A] {
  def isInstance(x: Any): Boolean
}
that can be synthesized by the compiler for A = p.T in exactly the same situations where an x.isInstanceOf[p.T] would be valid, and synthesizes it precisely as
new TypeTest[p.T] {
  def isInstance(x: Any): Boolean = x.isInstanceOf[p.T]
}
Then, in pattern-matching, for case x: A or case Extractor(x) where the Extractor expects an A, if A cannot be tested natively, try summoning a TypeTest[A]. Only if that fails as well, try summoning a ClassTag[A].

Finally, deprecated ClassTag.unapply and summoning a ClassTag[A] to implement a type test, since they are unsound.

ClassTags were designed for Arrays. This is the use case they are widely used for, and they work _perfectly_ for that use case. It is only later on that the type-test feature was bolted on, without giving it more thought than a PR review. Please let's not pretend that ClassTags are broken and need fixing because _that bolted-on feature_ is broken. ClassTags are correct. It's using ClassTags for type tests that's broken. The correct fix is not to fix ClassTags but to fix type-tests.

The above TypeTest typeclass is slightly unsound because one can implement a custom TypeTest for any A that always returns true. The subsequent compiler-generated .asInstanceOf[A] will throw a CCE. We can fix that design by complicating a bit the TypeTest API as follows:
trait TypeTest[A] {
  def isInstance(x: Any): TypeTest.Result[x.type & A]
}
object TypeTest {
  opaque type Result[A] = Boolean

  def success[A](x: A): Result[A] = true
  def failure[A]: Result[A] = false
}
Now a synthesized TypeTest[p.T] would look like
new TypeTest[p.T] {
  def isInstance(x: Any): TypeTest.Result[x.type & p.T] = x match {
    case x: p.T => TypeTest.success(x)
    case _      => TypeTest.failure
  }
}
and we can safely write custom TypeTest[A] instances that are sound down the line.

nicolasstucki on 14 Nov 2019

Small fix:

new TypeTest[p.T] {
  def isInstance(x: Any): TypeTest.Result[x.type & p.T] = x match {
    case x: (p.T & x.type) => TypeTest.success(x)
    case _                 => TypeTest.failure
  }
}

nicolasstucki on 14 Nov 2019

There is still something missing, when we do have an implicit p2.Y in scope we can match a p1.Y without an unchecked warning.

object Test {
  def main(args: Array[String]): Unit = {
    val p1: T = T1
    import p1.given

    val p2: T = T1
    import p2.given

    (p1.y: p1.X) match {
      case x: p2.Y => // should be an unchecked cast but is not
      case x: p1.Y =>
      case _ =>
    }
  }

}

trait T {
  type X
  type Y <: X
  def x: X
  def y: Y
  given TypeTest[Y] = typeTestOfY
  protected def typeTestOfY: TypeTest[Y]
}

object T1 extends T {
  type X = Boolean
  type Y = true
  def x: X = false
  def y: Y = true
  protected def typeTestOfY: TypeTest[Y] = new {
    def isInstance(x: Any): TypeTest.Result[x.type & Y] = x match
      case x: (true & x.type) => TypeTest.success(x)
      case _ => TypeTest.failure
  }
}

This can be solved by adding adding a second type parameter like in #7394.

trait TypeTest[S, T <: S] {

  def isInstance(x: S): TypeTest.Result[x.type & T]

  def unapply(x: S): Option[x.type & T] =
    TypeTest.unapply(x)(given this)

}

object TypeTest {

  opaque type Result[A] = Boolean

  def success[A](x: A): Result[A] = true

  def failure[A]: Result[A] = false

  def unapply[S, T <: S](x: S)(given tt: TypeTest[S, T]): Option[x.type & T] =
    if tt.isInstance(x) then Some(x.asInstanceOf[x.type & T])
    else None

}

And using it in the following way

trait T {
  type X
  type Y <: X
  def x: X
  def y: Y
  given TypeTest[X, Y] = typeTestOfY
  protected def typeTestOfY: TypeTest[X, Y]
}
object T1 extends T {
  type X = Boolean
  type Y = true
  def x: X = false
  def y: Y = true
  protected def typeTestOfY: TypeTest[X, Y] = new {
    def isInstance(x: X): TypeTest.Result[x.type & Y] = x match
      case x: (true & x.type) => TypeTest.success(x)
      case _ => TypeTest.failure
  }
}

Though we would need to ensure that S in TypeTest[S, T] is defined locally to ensure we do not get the previous unsoundness.

nicolasstucki on 14 Nov 2019

We could also consider defining it like

trait TypeTest[S, T <: S] {
  def unapply(x: S): TypeTest.Result[x.type & T]
}

object TypeTest {

  opaque type Result[A] = A | Failure.type

  object Result {
    given [A](res: Result[A]) { // currently fails, not sure where to put the extension methods
      def isEmpty: Boolean = res == Failure
      def get: A = res.asInstanceOf[A]
    }
  }

  private object Failure

  def success[A](x: A): Result[A] = x

  def failure[A]: Result[A] = Failure

}

nicolasstucki on 14 Nov 2019

See also https://gist.github.com/Blaisorblade/a0eebb6a4f35344e48c4c60dc2a14ce6 by @Blaisorblade

smarter on 14 Nov 2019

👍1

In the links from my gist, there was also another proposal: https://github.com/scala/bug/issues/9565#issuecomment-293885951. IIRC it was sound, so maybe somebody should compare.

Not that I see anything wrong in @sjrd proposal, but I haven't studied it.

Fwiw, re the "Small fix" in https://github.com/lampepfl/dotty/issues/7554#issuecomment-553779688, I thought the original code was supposed to compile: the extra Singleton should be inferred, and that's central to @AleksanderBG's GADT work, even tho that wasn't part of the spec. Isn't that settled by now?

Blaisorblade on 14 Nov 2019

👍1

A less ad-hoc alternative to the opaque type approach that also does not allocate would be to define an abstract base class of <:< called <?< which would have a new Not_<:< subclass.

Then, we can define:

trait TypeTest[A] {
  def isInstance(x: Any): x.type <?< A
}

And GADT pattern matching (ping @AleksanderBG) allows us to recover the x.type <: A relation in the case where the return value is indeed <:<. Like is done today, all <:< instances can be shared behind the scenes.

Implementing TypeTest becomes:

new TypeTest[p.T] {
  def isInstance(x: Any): x.type <?< p.T = x match {
    case x: (p.T & x.type) => implicitly
    case _                 => Not_<:<
  }
}

Maybe we'll need to help the compiler and write implicitly[x.type <:< (p.T & x.type)] though.

LPTK on 15 Nov 2019

Fwiw, re the "Small fix" in #7554 (comment), I thought the original code was supposed to compile: the extra Singleton should be inferred, and that's central to @AleksanderBG's GADT work, even tho that wasn't part of the spec. Isn't that settled by now?

It's settled, but I've never gotten around to actually implementing it in the compiler. So far it's necessary to manually add the singleton type to the pattern.

And GADT pattern matching (ping @AleksanderBG) allows us to recover the x.type <: A relation in the case where the return value is indeed <:<.

That won't work, GADT patmat cannot currently constrain singleton types. A mechanism for constraining singleton types is already implemented for explicit null types in their PR, so after that gets merged, we could look into reusing it for GADTs.

abgruszecki on 18 Nov 2019

The least ad-hoc alternative and most straightforward version so far is

trait TypeTest[-S, T] {
  /** A TypeTest[S, T] can serve as an extractor that matches only S of type T.
   *
   * The compiler tries to turn unchecked type tests in pattern matches into checked ones
   * by wrapping a `(_: T)` type pattern as `tt(_: T)`, where `ct` is the `TypeTest[S, T]` instance.
   * Type tests necessary before calling other extractors are treated similarly.
   * `SomeExtractor(...)` is turned into `tt(SomeExtractor(...))` if `T` in `SomeExtractor.unapply(x: T)`
   * is uncheckable, but we have an instance of `TypeTest[S, T]`.
   */
  def unapply(x: S): Option[x.type & T]
}

The semantics are trivial as no extra concept must be added to encode the result. It also seamlessly works as ClassTag use to in the compiler. It's drawback is that we need to instantiate an Option which is the same as with the current ClassTag version.

The synthesized version would be

new TypeTest[S, T] {
  def unapply(x: S): Option[x.type & T] = x match {
    case x: T => Some(x)
    case _      => None
  }
}

which can be synthesised as a SAM

((x: S) => x match { case x: T => Some(x); case _ => None }): TypeTest[S, T]

nicolasstucki on 19 Nov 2019

👍1

There are some cases where we must provide a type test but by providing a sound type test we may provide a ClasstTag that is unsound for arrays (see #8993). This shows that the two concepts should definitively be disjoint.

nicolasstucki on 18 May 2020

Also see https://gist.github.com/nicolasstucki/ae3eaae553f594066bf9ad0e817f9a63

nicolasstucki on 20 May 2020

I agree that

case x: T

where T is abstract with a ClassTag[T] is unsound, and I am now convinced that there is no way to make it sound. So, we have to change that one.

I think we still need to issue the class-tag based test since otherwise we would change runtime behavior wrt Scala 2. But we should add an unchecked warning that this will not work for types that are not simple class types.

And then we need a type-safe way to bring back the functionality. I have been playing with a variant of shapeless Typeable in #9840. That seems to do the trick. Importantly, it does this without requiring any extension to the language.

odersky on 21 Sep 2020

tree match

   // Using TypeTest
+  case tree @ If(cond: Literal, thenp, elsep) => 

   // using old Refect with explicit casting extractors
-  case IsIf(tree @ If(IsLiteral(cond), thenp, elsep)) => 

   // using Typable
-  case Typeable.instanceOf[If](tree @ If(Typeable.instanceOf[Literal](cond), thenp, elsep)) =>

nicolasstucki on 21 Sep 2020

👍3

I forgot to mention that the explicit casting extractors also avoided the unsoundness present with the Typable.

nicolasstucki on 21 Sep 2020

Maybe @milessabin could weigh in on the soundness issue.

// Using TypeTest

case tree @ If(cond: Literal, thenp, elsep) =>

How is that obtained? Is that just an unapply in If or is there something akin to ClassTag magic but now with TypeTest?

EDIT: Looking at #7555, it's done with compiler magic. That's problematic. It's one thing to propose a complicated and hard to explain trait like TypeTest in a library. It's a different story to make this part of pattern matching semantics. This would be a significant increase of the complexity of the language itself.

Maybe the question we should ask first is: What are general improvements to extractors that would allow us to make this work with convenient syntax? I believe the general understanding and implementation of extractors is lacking. I have stated many times that it would be very worthwhile for someone to really understand pattern matching typing and be able to come up with a spec and possible improvements. The previous attempts at this fell short. So currently the situation is that, because unapply is underspecified and hard to change, nobody touches it and we work around this. But that strategy tends to add more complexity...

odersky on 22 Sep 2020

TypeTest uses exactly the same mechanisms as ClassTag does not extra magic involved. It is the same mechanism described in the ClassTag docs

implicit def IfClassTag: ClassTag[If] = ....
given IfTypeTest as TypeTest[Tree, If] = ...
object If:
  def unapply(tree: If): Some[(Tree, Tree, Tree)] = Some((tree.cond, tree.thenp, tree.elsep))

As the unapply takes an If as argument instead of the usual scrutinee type of Tree, we must do a type tests first. As the type is abstract we try to do it it using a ClassTag or TypeTest.

tree match
  case tree @ If(cond, thenp, elsep) =>
// case IfClassTag(tree @ If(cond, thenp, elsep)) =>
// case IfTypeTest(tree @ If(cond, thenp, elsep)) => 

  case tree: If =>
// case IfClassTag(tree) =>
// case IfTypeTest(tree) =>

TypeTest does not change the behaviour of unapplies with ClassTags in any way. It only restricts which instances are applicable without emmiting an unchecked waring. For example, ClassTag[If] can take any argument and assume the check can always be performed at runtime, wheras the TypeTest[Tree, If] allows us to state that only Tree can be safely checked at runtime.

The stricter argument type is what allows us to soundly state that we cannot check check at runtime that p1.Tree is a p2.If as we only have a way to check that a p2.Tree is an p2.It but p1.Tree and p2.Tree are unrelated and do not have enough information at runtime to be distinguished.

The proposed generalization in #9843 is otrtogonal to the enhancements made by TypeTest. That proposes to support new kinds of patterns which may me useful in general but does not help Reflection API as we can already konw of a simpler and sound way to encode these directly in the Reflection API. A way that was abandoned to make the API less unintuitive and verbose.

ThThe definition of Typable is perfectly fine for use in type parameters, but it does not work for abstract type members that need to be implemented without extram overhead costs. In the current state, all type members should be implemented with a class that knows of its parent at runtime and each type test must check that all parents match. If this check is not performed, then the Typable instance implemented by the user is unsound and the compiler will unfortunately not complain about it at definition nor call sites.

nicolasstucki on 22 Sep 2020

❤1

Was this page helpful?

0 / 5 - 0 ratings