Cats: Reconsider the use of fold in Validated

Created on 6 Oct 2017  ·  8Comments  ·  Source: typelevel/cats

A lot of the method implementations in Validated are extremely inefficient, which isn't ideal for projects like circe that rely on it extensively and are aiming for reasonable performance. For example, given this simplified definition (where eqF is the current === and eqP is a lightly optimized version):

import cats.kernel.Eq

sealed abstract class Validated[+E, +A] extends Product with Serializable {
  import Validated.{Invalid, Valid}

  def fold[B](fe: E => B, fa: A => B): B = this match {
    case Invalid(e) => fe(e)
    case Valid(a) => fa(a)
  }

  def eqF[EE >: E, AA >: A](that: Validated[EE, AA])
    (implicit EE: Eq[EE], AA: Eq[AA]): Boolean = fold(
      a => that.fold(EE.eqv(a, _), _ => false),
      b => that.fold(_ => false, AA.eqv(b, _))
    )

  def eqP[EE >: E, AA >: A](that: Validated[EE, AA])
    (implicit EE: Eq[EE], AA: Eq[AA]): Boolean = this match {
      case Invalid(e) => that match {
        case Invalid(ee) => EE.eqv(e, ee)
        case _ => false
      }
      case Valid(a) => that match {
        case Valid(aa) => AA.eqv(a, aa)
        case _ => false
      }
    }
}

object Validated {
  final case class Valid[+A](a: A) extends Validated[Nothing, A]
  final case class Invalid[+E](e: E) extends Validated[E, Nothing]
}

And a benchmark like this:

import cats.kernel.instances.all._
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
class ValidatedBenchmark {
  val a: Validated[String, Int] = Validated.Valid(1)
  val b: Validated[String, Int] = Validated.Valid(2)
  val c: Validated[String, Int] = Validated.Invalid("bad")
  val d: Validated[String, Int] = Validated.Invalid("worse")

  @Benchmark def eqF: Boolean =
    a.eqF(a) && !a.eqF(b) && !a.eqF(c) && c.eqF(c) && !c.eqF(d) && !c.eqF(a)

  @Benchmark def eqP: Boolean =
    a.eqP(a) && !a.eqP(b) && !a.eqP(c) && c.eqP(c) && !c.eqP(d) && !c.eqP(a)
}

The pattern matching implementation gets over four times the throughput on 2.12:

Benchmark                Mode  Cnt          Score         Error  Units
ValidatedBenchmark.eqF  thrpt   40   27131124.981 ±  164113.218  ops/s
ValidatedBenchmark.eqP  thrpt   40  118711782.519 ± 1166236.166  ops/s

The situation is even worse on 2.11, where the difference is a little larger and the fold version also involves a lot of unnecessary allocations:

Benchmark                                        Mode  Cnt          Score         Error   Units
ValidatedBenchmark.eqF:gc.alloc.rate.norm       thrpt    5        144.000 ±       0.001    B/op
ValidatedBenchmark.eqP:gc.alloc.rate.norm       thrpt    5         ≈ 10⁻⁵                  B/op

For most use cases these differences are unlikely to matter at all, but it seems like a shame to impose this extra cost on everyone just for the sake of slightly more concise implementations.

(There are similar issues in IndexedStateT, etc., but Validated is the one piece I personally really care that much about, because of how it affects circe.)

help wanted low-hanging fruit

Most helpful comment

+1

In general core library code in my view should have a nice API but then be
implemented as efficiently as possible. I’d love to see us use the highest
performance approach we can find.

On Thu, Oct 5, 2017 at 21:10 Julien Richard-Foy notifications@github.com
wrote:

What if fold is an abstract method in Validated and implemented in Invalid
and Valid? (we would trade pattern matching for dynamic dispatch)


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/typelevel/cats/issues/1951#issuecomment-334676701,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEJdkvM7HPVPALDnEoC2dUPjC-cRCTlks5spdJbgaJpZM4PwDNP
.

>

P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

All 8 comments

👍 , it's the reasoning we took in https://github.com/typelevel/cats/pull/1289 as well

What if fold is an abstract method in Validated and implemented in Invalid and Valid? (we would trade pattern matching for dynamic dispatch)

+1

In general core library code in my view should have a nice API but then be
implemented as efficiently as possible. I’d love to see us use the highest
performance approach we can find.

On Thu, Oct 5, 2017 at 21:10 Julien Richard-Foy notifications@github.com
wrote:

What if fold is an abstract method in Validated and implemented in Invalid
and Valid? (we would trade pattern matching for dynamic dispatch)


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/typelevel/cats/issues/1951#issuecomment-334676701,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEJdkvM7HPVPALDnEoC2dUPjC-cRCTlks5spdJbgaJpZM4PwDNP
.

>

P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

It will be interesting to benchmark @julienrf 's abstract fold approach.

@julienrf @kailuowang I don't think pattern matching is the real culprit here—it's more the overhead of the functions. Making fold abstract does seem to help a bit on 2.12:

Benchmark                 Mode  Cnt          Score        Error  Units
ValidatedBenchmark.eqF   thrpt   40   26938203.527 ± 189709.343  ops/s
ValidatedBenchmark.eqF2  thrpt   40   29927603.497 ± 199789.080  ops/s
ValidatedBenchmark.eqP   thrpt   40  118540469.078 ± 754414.614  ops/s

…but there's no measurable difference on 2.11 and the allocation situation is the same on both.

functions, I've always heard, are very hard on the jit due to them being megamorphic. So, one of the best optimizations you can often do in scala is remove a function and replace with literal code that the JIT won't skip over.

I would love if this could be solved in the future with dotty linker, or changes to the JIT to better handle lambdas (maybe java will start to care about this now that they have lambdas).

How can we apply that in our context?

@julienrf I think just rewriting fold in all of the implementations to use pattern matching (as in my example above) is a reasonable way to avoid unnecessary functions.

Users can decide whether for their own cases which approach they want. For example I use Json#fold all the time in application code that depends on circe, but don't use it anywhere internally in circe itself.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

SimY4 picture SimY4  ·  4Comments

Atry picture Atry  ·  5Comments

durban picture durban  ·  4Comments

diesalbla picture diesalbla  ·  4Comments

durban picture durban  ·  3Comments