Cats: Reconsider the use of fold in Validated

Created on 6 Oct 2017 · 8Comments · Source: typelevel/cats

A lot of the method implementations in Validated are extremely inefficient, which isn't ideal for projects like circe that rely on it extensively and are aiming for reasonable performance. For example, given this simplified definition (where eqF is the current === and eqP is a lightly optimized version):

import cats.kernel.Eq

sealed abstract class Validated[+E, +A] extends Product with Serializable {
  import Validated.{Invalid, Valid}

  def fold[B](fe: E => B, fa: A => B): B = this match {
    case Invalid(e) => fe(e)
    case Valid(a) => fa(a)
  }

  def eqF[EE >: E, AA >: A](that: Validated[EE, AA])
    (implicit EE: Eq[EE], AA: Eq[AA]): Boolean = fold(
      a => that.fold(EE.eqv(a, _), _ => false),
      b => that.fold(_ => false, AA.eqv(b, _))
    )

  def eqP[EE >: E, AA >: A](that: Validated[EE, AA])
    (implicit EE: Eq[EE], AA: Eq[AA]): Boolean = this match {
      case Invalid(e) => that match {
        case Invalid(ee) => EE.eqv(e, ee)
        case _ => false
      }
      case Valid(a) => that match {
        case Valid(aa) => AA.eqv(a, aa)
        case _ => false
      }
    }
}

object Validated {
  final case class Valid[+A](a: A) extends Validated[Nothing, A]
  final case class Invalid[+E](e: E) extends Validated[E, Nothing]
}

And a benchmark like this:

import cats.kernel.instances.all._
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
class ValidatedBenchmark {
  val a: Validated[String, Int] = Validated.Valid(1)
  val b: Validated[String, Int] = Validated.Valid(2)
  val c: Validated[String, Int] = Validated.Invalid("bad")
  val d: Validated[String, Int] = Validated.Invalid("worse")

  @Benchmark def eqF: Boolean =
    a.eqF(a) && !a.eqF(b) && !a.eqF(c) && c.eqF(c) && !c.eqF(d) && !c.eqF(a)

  @Benchmark def eqP: Boolean =
    a.eqP(a) && !a.eqP(b) && !a.eqP(c) && c.eqP(c) && !c.eqP(d) && !c.eqP(a)
}

The pattern matching implementation gets over four times the throughput on 2.12:

Benchmark                Mode  Cnt          Score         Error  Units
ValidatedBenchmark.eqF  thrpt   40   27131124.981 ±  164113.218  ops/s
ValidatedBenchmark.eqP  thrpt   40  118711782.519 ± 1166236.166  ops/s

The situation is even worse on 2.11, where the difference is a little larger and the fold version also involves a lot of unnecessary allocations:

Benchmark                                        Mode  Cnt          Score         Error   Units
ValidatedBenchmark.eqF:gc.alloc.rate.norm       thrpt    5        144.000 ±       0.001    B/op
ValidatedBenchmark.eqP:gc.alloc.rate.norm       thrpt    5         ≈ 10⁻⁵                  B/op

For most use cases these differences are unlikely to matter at all, but it seems like a shame to impose this extra cost on everyone just for the sake of slightly more concise implementations.

(There are similar issues in IndexedStateT, etc., but Validated is the one piece I personally really care that much about, because of how it affects circe.)

help wanted low-hanging fruit

Source

travisbrown

👍4

Most helpful comment

In general core library code in my view should have a nice API but then be
implemented as efficiently as possible. I’d love to see us use the highest
performance approach we can find.

On Thu, Oct 5, 2017 at 21:10 Julien Richard-Foy notifications@github.com
wrote:

What if fold is an abstract method in Validated and implemented in Invalid
and Valid? (we would trade pattern matching for dynamic dispatch)

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/typelevel/cats/issues/1951#issuecomment-334676701,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEJdkvM7HPVPALDnEoC2dUPjC-cRCTlks5spdJbgaJpZM4PwDNP
.

>

P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

johnynek on 6 Oct 2017

👍6

All 8 comments

👍 , it's the reasoning we took in https://github.com/typelevel/cats/pull/1289 as well

adelbertc on 6 Oct 2017

What if fold is an abstract method in Validated and implemented in Invalid and Valid? (we would trade pattern matching for dynamic dispatch)

julienrf on 6 Oct 2017

👍1

In general core library code in my view should have a nice API but then be
implemented as efficiently as possible. I’d love to see us use the highest
performance approach we can find.

On Thu, Oct 5, 2017 at 21:10 Julien Richard-Foy notifications@github.com
wrote:

What if fold is an abstract method in Validated and implemented in Invalid
and Valid? (we would trade pattern matching for dynamic dispatch)

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/typelevel/cats/issues/1951#issuecomment-334676701,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEJdkvM7HPVPALDnEoC2dUPjC-cRCTlks5spdJbgaJpZM4PwDNP
.

>

P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

johnynek on 6 Oct 2017

👍6

It will be interesting to benchmark @julienrf 's abstract fold approach.

kailuowang on 6 Oct 2017

@julienrf @kailuowang I don't think pattern matching is the real culprit here—it's more the overhead of the functions. Making fold abstract does seem to help a bit on 2.12:

Benchmark                 Mode  Cnt          Score        Error  Units
ValidatedBenchmark.eqF   thrpt   40   26938203.527 ± 189709.343  ops/s
ValidatedBenchmark.eqF2  thrpt   40   29927603.497 ± 199789.080  ops/s
ValidatedBenchmark.eqP   thrpt   40  118540469.078 ± 754414.614  ops/s

…but there's no measurable difference on 2.11 and the allocation situation is the same on both.

travisbrown on 6 Oct 2017

👍2

functions, I've always heard, are very hard on the jit due to them being megamorphic. So, one of the best optimizations you can often do in scala is remove a function and replace with literal code that the JIT won't skip over.

I would love if this could be solved in the future with dotty linker, or changes to the JIT to better handle lambdas (maybe java will start to care about this now that they have lambdas).

johnynek on 6 Oct 2017

How can we apply that in our context?

julienrf on 6 Oct 2017

@julienrf I think just rewriting fold in all of the implementations to use pattern matching (as in my example above) is a reasonable way to avoid unnecessary functions.

Users can decide whether for their own cases which approach they want. For example I use Json#fold all the time in application code that depends on circe, but don't use it anywhere internally in circe itself.

travisbrown on 6 Oct 2017

Was this page helpful?

0 / 5 - 0 ratings