Graal: Java array access from within C is slow.

Created on 10 Jul 2018 · 5 Comments · Source: oracle/graal

I've been testing a Mandelbrot benchmark for Sulong that splits the work between Java and C. Java mainly handles threading and some of the looping, and C handles the calculation. While working on it, I found that Sulong is slow to iterate through Java arrays. Benchmarking this with JMH shows that the C versions are roughly 2,500 to 14,000 times slower than their Java counterparts:

[info] Benchmark                       Mode  Cnt    Score      Error  Units
[info] MandelbrotC.incrementValuesC    avgt    5  239.103 ±   5.537  ms/op
[info] MandelbrotC.incrementValuesJ    avgt    5    0.017 ±   0.002  ms/op
[info] MandelbrotC.initValuesC         avgt    5  101.384 ±  20.213  ms/op
[info] MandelbrotC.initValuesJ         avgt    5    0.040 ±   0.035  ms/op
[info] MandelbrotC.stepThroughValuesC  avgt    5  155.466 ± 149.067  ms/op
[info] MandelbrotC.stepThroughValuesJ  avgt    5    0.019 ±   0.012  ms/op

Here's the benchmark code (Scala, using JMH):

  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime)) @OutputTimeUnit(TimeUnit.MILLISECONDS)
  @Warmup(time = 15, iterations=10, timeUnit = TimeUnit.SECONDS)
  def initValuesC(ms: MandelbrotState): Unit = {
    val doubleArray = Array.ofDim[Double](16000)
    ms.contexts(0).initValues(doubleArray, 16000)
  }


  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime)) @OutputTimeUnit(TimeUnit.MILLISECONDS)
  @Warmup(time = 15, iterations=10, timeUnit = TimeUnit.SECONDS)
  def initValuesJ(blackhole: Blackhole): Unit = {
    val wid_ht = 16000
    val i0 = Array.ofDim[Double](wid_ht)
    var xy = 0
    while(xy < wid_ht) {
      i0(xy) = 2.0 / wid_ht * xy - 1.0
      i0(xy + 1) = 2.0 / wid_ht * (xy + 1) - 1.0
      xy += 2
    }
    blackhole.consume(i0)
  }

  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime)) @OutputTimeUnit(TimeUnit.MILLISECONDS)
  @Warmup(time = 15, iterations=10, timeUnit = TimeUnit.SECONDS)
  def stepThroughValuesJ(mandelbrotState: MandelbrotState): Unit = {
    val wid_ht = 16000
    val values = Array.ofDim[Double](16000)
    var i = 0
    while(i < wid_ht) {
      mandelbrotState.blkhole = values(i)
      i += 1
    }
  }

  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime)) @OutputTimeUnit(TimeUnit.MILLISECONDS)
  @Warmup(time = 15, iterations=10, timeUnit = TimeUnit.SECONDS)
  def stepThroughValuesC(mandelbrotState: MandelbrotState): Unit = {
    val wid_ht = 16000
    val values = Array.ofDim[Double](wid_ht)
    mandelbrotState.contexts(0).stepThroughValues.executeVoid(values, wid_ht.asInstanceOf[Object])
  }

  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime)) @OutputTimeUnit(TimeUnit.MILLISECONDS)
  @Warmup(time = 15, iterations=10, timeUnit = TimeUnit.SECONDS)
  def incrementValuesJ(blackhole: Blackhole): Unit = {
    val wid_ht = 16000
    val values = Array.ofDim[Double](16000)
    var i = 0
    while(i < wid_ht) {
      values(i) += 1
      i += 1
    }
    blackhole.consume(values)
  }

  @Benchmark
  @BenchmarkMode(Array(Mode.AverageTime)) @OutputTimeUnit(TimeUnit.MILLISECONDS)
  @Warmup(time = 15, iterations=10, timeUnit = TimeUnit.SECONDS)
  def incrementValuesC(mandelbrotState: MandelbrotState): Unit = {
    val wid_ht = 16000
    val values = Array.ofDim[Double](wid_ht)
    mandelbrotState.contexts(0).incrementValues.executeVoid(values, wid_ht.asInstanceOf[Object])
  }

And the C code:

void initValues(double* i0, long wid_ht) {
    for(long xy=0; xy<wid_ht; xy+=2)
    {
        i0[xy]    = 2.0 / wid_ht *  xy    - 1.0;
        i0[xy+1]  = 2.0 / wid_ht * (xy+1) - 1.0;
    }
}

double blkhole;

void stepThroughValues(double* i0, long wid_ht) {
    for(long xy = 0; xy < wid_ht; xy++) {
        blkhole = i0[xy];
    }
}

void incrementValues(double* i0, long wid_ht) {
    for(long xy=0; xy < wid_ht; xy++) {
        i0[xy] += 1;
    }
}

You can run the benchmarks from this commit of my benchmark repository: https://github.com/markehammons/languageshootout-jmh/commit/336523efd1591cd598df02315185e6deb13f9348

You will need the environment variable GRAALVM_HOME set to the location of your GraalVM installation. Then run sbt irCompile, followed by sbt "jmh:run MandelbrotC.*Values"


Thanks for the report!

This is actually two separate issues, one in Sulong, the other in the Java interop code.

The first issue is already fixed (https://github.com/graalvm/sulong/commit/abadb1c796d766733b8d95df1b86e76ae36dce56). This should fix the biggest chunk of the performance difference:

[info] Benchmark                       Mode  Cnt  Score    Error  Units
[info] MandelbrotC.incrementValuesC    avgt    5  1.543 ±  0.249  ms/op
[info] MandelbrotC.incrementValuesJ    avgt    5  0.022 ±  0.003  ms/op
[info] MandelbrotC.initValuesC         avgt    5  0.523 ±  0.029  ms/op
[info] MandelbrotC.initValuesJ         avgt    5  0.068 ±  0.006  ms/op
[info] MandelbrotC.stepThroughValuesC  avgt    5  0.979 ±  0.013  ms/op
[info] MandelbrotC.stepThroughValuesJ  avgt    5  0.016 ±  0.001  ms/op

I'm currently working on the second issue in the Java interop code. I'm confident we can get the C performance very close, if not equal, to the Java performance.

@rschatz Great! I can't wait to see the results! Thanks for the quick work!

The second issue should be fixed in https://github.com/oracle/graal/commit/dc37c9a244fc97a92b7b37f716e484cb4b7ee5ed

Looking a lot better already ;)

[info] Benchmark                       Mode  Cnt  Score    Error  Units
[info] MandelbrotC.incrementValuesC    avgt    5  0.031 ±  0.006  ms/op
[info] MandelbrotC.incrementValuesJ    avgt    5  0.021 ±  0.001  ms/op
[info] MandelbrotC.initValuesC         avgt    5  0.068 ±  0.001  ms/op
[info] MandelbrotC.initValuesJ         avgt    5  0.068 ±  0.003  ms/op
[info] MandelbrotC.stepThroughValuesC  avgt    5  0.023 ±  0.001  ms/op
[info] MandelbrotC.stepThroughValuesJ  avgt    5  0.016 ±  0.001  ms/op

Looking forward to testing it. I'm guessing this will be in graalvm-ce-1.0.0-rc4?

I guess in theory I could build GraalVM myself.

It will be in the release targeted for beginning of August.
