Dotty: Emit more efficient code for simple for-comprehensions

Created on 4 May 2020  路  21Comments  路  Source: lampepfl/dotty

I've been using Scala for years to write performance-critical code. One the pain points has always been the sub-par performance or seemingly benign for-comprehensions.

While it is clear that complex for-comprehensions emit inefficient code, a very common use-case is using them as a replacement for simple for-loops.

As an example, the two loops below do the same thing. The for-comprehension is cleaner (i is scoped more narrowly, no need for var). But one look at the decompiled code shows you just how much more inefficient the for-comprehension is.

Minimized code

class LoopTest {
  final val count = 1000

  def testWhile(): Unit = {
    var i = 0
    while (i < count) {
      Console.println("Number: " + i)
      i += 1
    }
  }

  def testFor(): Unit = {
    for (i <- 0 until count) {
      Console.println("Number: " + i)
    }
  }
}

Output

I compiled using dotty-0.24.0-RC1. This is the decompiled output:

public void testWhile() {
  for (int i = 0; i < 1000; ++i) {
    Console$.MODULE$.println((Object)("Number: " + i));
  }
}

public void testFor() {
  RichInt$.MODULE$.until$extension(
    Predef$.MODULE$.intWrapper(0), 1000).foreach(
      (Function1)(i -> Console$.MODULE$.println((Object)("Number: " + i))));
}

Expectation

Is there some way to allow using for-comprehensions with a guarantee that performant code is being produced? Maybe it would be possible to introduce something like @scala.annotation.for (analogous to @scala.annotation.switch)?

For me, such an improvement would have a big impact: In my project, I spent hours translating for-comprehensions into simple while loops to reap (measurable) performance benefits. But it is unfortunate that in the process I had to make my code less readable and more error prone. Scala 2.13's inliner helped somewhat, but the results weren't reliable, so I mostly ended up avoiding for-comprehensions.

enhancement

Most helpful comment

also the version that takes an inline parameter should not be called when the argument we pass to it has side effects like foreach({println("a"); x => ()), that would break semantics

Ouch, that sounds like a pretty big semantic landmine... almost at the level of surprise factor as the old (deprecated) Map#mapValues. It's not uncommon to see code like xs.foreach(foo(bar)) where foo is a curried function. One wouldn't expect foo(bar) to be called again for each element of the collection (that could be terrible for performance, even if foo was pure).

Are there legitimate uses of inline parameters which do not create these semantic landmines?

All 21 comments

A simple way to remove the foreach is using inline

case class MyRange(start: Int, end: Int) {
  inline def foreach(inline f: Int => Unit) =
    var i = start
    val e = end
    while (i < e) {
      f(i)
      i += 1
    }
}

object App {
  val count: Int = 4
  for (i <- MyRange(0, count)) {
    Console.println("Number: " + i)
  }
}

Which generates the code

val count: Int = 4
    {
      val MyRange_this: MyRange = MyRange.apply(0, App.count)
      {
        var i: Int = MyRange_this.start
        val e: Int = MyRange_this.end
        while(i < e) {
          Console.println("Number: ".+(i))
          i = i.+(1)
        }
      }:Unit
    }

We could also try to use opaque types to avoid the creation of the object in some cases.

inline def foreach(inline f: Int => Unit) =

We cannot do this in practice currently because Range#foreach overrides a non-inline foreach so its parameter cannot be marked inline, so we would need to allow inline overloads which only differ in whether their parameter is inline or not (also the version that takes an inline parameter should not be called when the argument we pass to it has side effects like foreach({println("a"); x => ()), that would break semantics). Alternatively, maybe the inline typer could automatically inline pure arguments like lambdas ?

also the version that takes an inline parameter should not be called when the argument we pass to it has side effects like foreach({println("a"); x => ()), that would break semantics

Ouch, that sounds like a pretty big semantic landmine... almost at the level of surprise factor as the old (deprecated) Map#mapValues. It's not uncommon to see code like xs.foreach(foo(bar)) where foo is a curried function. One wouldn't expect foo(bar) to be called again for each element of the collection (that could be terrible for performance, even if foo was pure).

Are there legitimate uses of inline parameters which do not create these semantic landmines?

We could also try to use opaque types

I gave that a shot but pretty quickly ran into compiler issues. It seems that inline and opaque types are mutually exclusive (see https://github.com/lampepfl/dotty/issues/6802 )

@dhoepelman What if you worked around it by defining extension methods _outside_ of the companion object in a trait, then implement the trait as given in companion object?

object X {
  given ExtensionOps
}
trait ExtensionOps {
  inline final def map(...)...
}

With extension methods, I got pretty close.

object Ranges {
  opaque type ZeroBasedRange = Int

  object ZeroBasedRange {
    given ExtensionOps

    def apply(end: Int): ZeroBasedRange = end
  }

  trait ExtensionOps {
    inline final def (range: ZeroBasedRange) foreach(inline f: Int => Unit) = {
      var i = 0
      val e = range.asInstanceOf[Int]
      while (i < e) {
        f(i)
        i += 1
      }
    }
  }
}

object App {
  final val count: Int = 4
  for (i <- Ranges.ZeroBasedRange(count)) {
    Console.println("Number: " + i)
  }
}

This compiles into this:

    private App$() {
        MODULE$ = this;
        this.count = 4;
        final Ranges$ module$ = Ranges$.MODULE$;
        final int range = Ranges.ZeroBasedRange$.MODULE$.apply(this.count());
        for (int i = 0, e = range; i < e; ++i) {
            Console$.MODULE$.println((Object)("Number: " + i));
        }
        final BoxedUnit unit = BoxedUnit.UNIT;
    }

Apart from that one apply function call (which I'd assume hotspot would optimize out), this seems to be optimal code. Of course, it only works for ranges starting with 0.

I wanted to see if there's a way to generalize this to Ranges that don't start at 0. The only way I could think of is by packing two Ints into a Long like this:

object Ranges {
  opaque type AnyRange = Long

  object AnyRange {
    given AnyRangeExtensionOps

    def apply(start: Int, end: Int): AnyRange = start.toLong << 32 | end.toLong
  }

  trait AnyRangeExtensionOps {
    inline final def (range: AnyRange) foreach(inline f: Int => Unit) = {
      var i = (range.asInstanceOf[Long] >> 32).toInt
      var e = range.asInstanceOf[Long].toInt
      while (i < e) {
        f(i)
        i += 1
      }
    }
  }
}

object App {
  final val count: Int = 4
  for (i <- Ranges.AnyRange(0, count)) {
    Console.println("Number: " + i)
  }
}

This produces relatively efficient code:

    private App$() {
        MODULE$ = this;
        this.count = 4;
        final Ranges$ module$ = Ranges$.MODULE$;
        final long range = Ranges.AnyRange$.MODULE$.apply(0, this.count());
        for (int i = (int)(range >> 32), e = (int)range; i < e; ++i) {
            Console$.MODULE$.println((Object)("Number: " + i));
        }
        final BoxedUnit unit = BoxedUnit.UNIT;
    }

    // AnyRange$.apply:
    public long apply(final int start, final int end) {
        return (long)start << 32 | (long)end;
    }

While there's some bit-shifting, we at least don't pay any cost for object creation.

Note that using bitwise tricks like that can potentially confuse the JIT and prevent it from eliding bounds checks.

You can use inline matches to create and then pattern-match on a phantom compile-time only case class - this should allow you to omit the structure completely when it's known at compile-time and fallback to accessing the structure fields otherwise.

Note that using bitwise tricks like that can potentially confuse the JIT and prevent it from eliding bounds checks.

Any idea how to accomplish this with opaque types? Also, I'd assume that while it isn't ideal, the current Scala/Dotty solution of creating a Range instance would have the same issue?

You can use inline matches to create and then pattern-match on a phantom compile-time only case class - this should allow you to omit the structure completely when it's known at compile-time and fallback to accessing the structure fields otherwise.

Would you mind giving me a pointer to how this could be done? I looked up phantom classes, but it seems like they were replaced by unused, which was then renamed to ghost. And I can't find any current document about that now :-)

Any idea how to accomplish this with opaque types?

No, but I don't think that's a good solution anyway compared to getting inlining working right.

Also, I'd assume that while it isn't ideal, the current Scala/Dotty solution of creating a Range instance would have the same issue?

Depends what code the JIT manages to inline.

@danlehmann
You should be able to unpack a normal case class with inline match, e.g.:

final case class MyRange(from: Int, to: Int)

inline def (from: Int) to (to: Int): MyRange = MyRange(from, to)

inline def (inline myRange: MyRange).foreach(inline f: Int => Unit): Unit = {
  inline myRange match {
    case MyRange(from, to) => while(...) { ... }
  }
}

def test = for (i <- 0 to 5) { println(i) }

see http://dotty.epfl.ch/docs/reference/metaprogramming/inline.html#inline-matches

@neko-kai

This is perfect, thanks a lot. This creates close to optimal code without any objects or even function calls.

This could be quite useful in case the inliner isn't able to resolve the general case by the time Scala 3.0 ships.

It's possible to add an inline method overriding Range#foreach in Predef while keeping compatibility with Range type:

package example

import scala.quoted.unsafe.UnsafeExpr

object MyPredef {
  type MyRange <: Range.Inclusive & InlineForeach

  inline def (inline from: Int) to (inline to: Int): MyRange = new Range.Inclusive(from, to, 1).asInstanceOf[MyRange]

  sealed trait InlineForeach { this: MyRange =>
    inline final def foreach(inline f: Int => Unit): Unit = ${ macroImpl('this)('f) }
  }

  import quoted._
  def macroImpl(self: Expr[MyRange])(f: Expr[Int => Unit])(using qctx: QuoteContext): Expr[Unit] = {
    UnsafeExpr.underlyingArgument(self) match {
      case '{new Range.Inclusive($from, $to, 1).asInstanceOf[MyRange]} =>
        '{ inlineForeach($from, $to + 1, $f) }
    }
  }

  inline def inlineForeach(inline from: Int, inline end: Int, inline f: Int => Unit): Unit = {
    var i = from
    while (i < end) {
      f(i)
      i += 1
    }
  }

}


import MyPredef._

object App {
  def main(args: Array[String]): Unit = {
    for (i <- 1 to 5) do println(i)
  }
}

Although this version with tasty-reflect generates less optimal code than an inline match does:

    public void main(String[] args) {
        new Inclusive(1, 5, 1);

        for(int i = 1; i < 6; ++i) {
            ((ix) -> {
                .MODULE$.println(BoxesRunTime.boxToInteger(ix));
            }).apply(BoxesRunTime.boxToInteger(i));
        }

        BoxedUnit var10000 = BoxedUnit.UNIT;
    }

No idea how to eliminate the lambda and the redundant new Inclusive(1, 5, 1); at the start, it seems that unlike Scala 2 macros, dotty inline instance methods do not eliminate this when applied and I don't know how to work around that.

The only sound way to demand the elimination of the prefix is using an extension method where you define the prefix as inline (or maybe by name). Otherwise, we can only rely on a backend optimizer (including JIT).

The use of UnsafeExpr.underlyingArgument may also introduce some unsoundness.

Hmm, is there a way to unpack new Range.Inclusive($from, $to, 1) using inline match facilities without having to resort to a tasty-reflect macro? I tried a custom unapply but it didn't work.

You would need to define the inline method as

  inline def (inline self: MyRange).foreach(inline f: Int => Unit): Unit = ${ macroImpl('self)('f) }

@nicolasstucki That wouldn't work because there's already a member method Range#foreach in MyRange <: Range.Inclusive & InlineForeach that would be chosen instead of an extension method - extension method search only happens after a member method is not found, so it must be a member method - hence the cast to InlineForeach trait with a member inline method.

Indeed. Maybe, in this case, we should MyRange not extended Range.Inclusive but have an implicit conversion to it.

Why do you need to have it fall back to Range.Inclusive anyway? If I have we have an InlineRange wouldn't we expect all the operations to be inlined (hence reimplemented)?

Maybe, in this case, we should MyRange not extended Range.Inclusive but have an implicit conversion to it.
Why do you need to have it fall back to Range.Inclusive anyway? If I have we have an InlineRange wouldn't we expect all the operations to be inlined (hence reimplemented)?

Implicit conversions tend to fail in many cases where variance works well. Because I'm trying to show something that could work as part of DottyPredef, it must be maxiamlly source compatible and can't use implicit conversions. Although I guess if this issue was solved in dotty-library it would probably be easiest to just completely shadow scala.collection.immutable.Range and add inlines there directly.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ohze picture ohze  路  3Comments

LaymanMergen picture LaymanMergen  路  3Comments

odersky picture odersky  路  3Comments

liufengyun picture liufengyun  路  3Comments

ohze picture ohze  路  3Comments