Openj9: MathLoadTest+LockingLoadTest Systemtests failing on HotSpot VMs (8 and 9) on x86

Created on 23 Jan 2018 · 8 Comments · Source: eclipse/openj9

The issue is as follows: we need to determine the correct behaviour, decide which VM is giving the correct result, and adjust the test accordingly. Currently the test is set up with the expected result from J9.

MLT testStarted : testExp(net.adoptopenjdk.test.math.MathAPITest)
MLT testFailure: testExp(net.adoptopenjdk.test.math.MathAPITest): exp(double)[9] :: expected:<11.3842406513381> but was:<11.384240651338098>
MLT junit.framework.AssertionFailedError: exp(double)[9] :: expected:<11.3842406513381> but was:<11.384240651338098>
MLT at junit.framework.Assert.fail(Assert.java:57)

Standalone test case that should show the problem:


public class Exp
{
  public static void main(String[] args)
  {
    // Compare the printed value of Math.exp with the string that OpenJDK returns.
    String rc_String = Double.toString(Math.exp(-1.2232D));
    if (rc_String.equals("0.2942869403572507")) {
      System.out.println("Test 1 passed: Expected 0.2942869403572507, got " + rc_String);
    }
    else {
      System.out.println("Test 1 failed: Expected 0.2942869403572507 (value returned by OpenJDK), got " + rc_String);
    }
  }
}

Build links:
https://ci.adoptopenjdk.net/view/System%20tests/job/openjdk9_hs_systemtest_x86-64_linux/6/
https://ci.adoptopenjdk.net/view/System%20tests/job/openjdk8_hs_systemtest_x86-64_linux/4/

question

Mesbah-Alam

Most helpful comment

> While it may be permissible to return different values for the same mathematical operation on different implementations, a user may simply (and not unreasonably) expect the result of the operation to be the same across all java platforms,

As Peter says, users who want that should be using StrictMath. Different compilers, OSes, CPUs, etc may result in different values which are still within the allowed bounds of the Math operations.

The test should validate that the result is within 1 ulp, as required by the spec. Someone (Mark?) pointed out that Math has helper methods to help with this validation. The test should use those methods to determine whether the result is "valid".

DanHeidinga on 31 Jan 2018

👍2

All 8 comments

Related issue in openjdk-systemtest repo: https://github.com/AdoptOpenJDK/openjdk-systemtest/issues/8

Mesbah-Alam on 23 Jan 2018

The results from both JVMs are correct. Please check the spec for java.lang.Math.

Using java.lang.StrictMath, for this particular test case the result is "0.29428694035725067", which matches the OpenJ9 result.
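To see the difference on a given JVM, a minimal sketch along the following lines could be used. The class name ExpCompare and the helper ulpsApart are made up for illustration; only Math.exp, StrictMath.exp and Math.ulp come from the JDK. The values printed for Math.exp depend on the JVM and platform, so no expected output is given.

```java
// Sketch: compare Math.exp and StrictMath.exp for the argument used in this issue.
// ExpCompare and ulpsApart are hypothetical names, not part of the test suite.
class ExpCompare {
    // Signed distance from b to a, measured in ulps of b.
    static double ulpsApart(double a, double b) {
        return (a - b) / Math.ulp(b);
    }

    public static void main(String[] args) {
        double arg = -1.2232D;
        double fast = Math.exp(arg);         // may use an intrinsic or a native library
        double strict = StrictMath.exp(arg); // specified to give the same result everywhere

        System.out.println("Math.exp       = " + fast);
        System.out.println("StrictMath.exp = " + strict);
        System.out.println("difference in ulps = " + ulpsApart(fast, strict));
    }
}
```

A difference of 0 or of plus or minus 1 ulp here is consistent with both methods meeting their accuracy requirements.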

The spec for Math.exp(double) states: "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic."

The spec for java.lang.Math explains this:

Unlike some of the numeric methods of class StrictMath, all implementations of the equivalent functions of class Math are not defined to return the bit-for-bit same results. This relaxation permits better-performing implementations where strict reproducibility is not required.

By default many of the Math methods simply call the equivalent method in StrictMath for their implementation. Code generators are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of Math methods. Such higher-performance implementations still must conform to the specification for Math.

The quality of implementation specifications concern two properties, accuracy of the returned result and monotonicity of the method. Accuracy of the floating-point Math methods is measured in terms of ulps, units in the last place. For a given floating-point format, an ulp of a specific real number value is the distance between the two floating-point values bracketing that numerical value. When discussing the accuracy of a method as a whole rather than at a specific argument, the number of ulps cited is for the worst-case error at any argument. If a method always has an error less than 0.5 ulps, the method always returns the floating-point number nearest the exact result; such a method is correctly rounded. A correctly rounded method is generally the best a floating-point approximation can be; however, it is impractical for many floating-point methods to be correctly rounded. Instead, for the Math class, a larger error bound of 1 or 2 ulps is allowed for certain methods. Informally, with a 1 ulp error bound, when the exact result is a representable number, the exact result should be returned as the computed result; otherwise, either of the two floating-point values which bracket the exact result may be returned. For exact results large in magnitude, one of the endpoints of the bracket may be infinite. Besides accuracy at individual arguments, maintaining proper relations between the method at different arguments is also important. Therefore, most methods with more than 0.5 ulp errors are required to be semi-monotonic: whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation. Not all approximations that have 1 ulp accuracy will automatically meet the monotonicity requirements.
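As a rough illustration of what an ulp means in practice, the sketch below prints the spacing around the OpenJ9 result quoted in this thread, using Math.ulp, Math.nextDown and Math.nextUp. The class name UlpDemo and the helper neighbours are made up for this example.

```java
// Sketch: inspect the floating-point neighbourhood of a double.
// UlpDemo and neighbours are hypothetical names used only for illustration.
class UlpDemo {
    // Returns the two doubles adjacent to d: {next lower, next higher}.
    static double[] neighbours(double d) {
        return new double[] { Math.nextDown(d), Math.nextUp(d) };
    }

    public static void main(String[] args) {
        // Value reported for OpenJ9 (and StrictMath) earlier in this thread.
        double d = Double.parseDouble("0.29428694035725067");
        double[] n = neighbours(d);

        System.out.println("ulp(d)      = " + Math.ulp(d));
        System.out.println("nextDown(d) = " + n[0]);
        System.out.println("d           = " + d);
        System.out.println("nextUp(d)   = " + n[1]);
    }
}
```

With a 1 ulp error bound, an implementation may return either of the two doubles that bracket the exact result, which is how two conforming JVMs can print different last digits.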

pshipton on 23 Jan 2018

👍2

@lumpfish FYI

sxa on 25 Jan 2018

While it may be permissible to return different values for the same mathematical operation on different implementations, a user may simply (and not unreasonably) expect the result of the operation to be the same across all Java platforms, and for that result to be that of the reference implementation. What is not clear to me from the description above is whether it would be possible for openjdk-openj9 to mimic the behavior of openjdk-hotspot.
I guess in theory an application could be developed on one platform and executed in production on another, where the difference might result in a critical failure.
Regarding the failing test case, the options appear to be:

Make the JVMs behave the same
Change the test case to do its own rounding so as to compare like with like
Change the test case to query the JVM implementation and change the expected result value accordingly.

lumpfish on 31 Jan 2018

👍1

I suppose even OpenJDK might return different results on different platforms. Users who want consistent results across all JVMs should use StrictMath instead of Math.

pshipton on 31 Jan 2018

👍1

Yes, using Math.nextUp(double d) in the example code above makes the example code pass (e.g. rc_String = Double.toString(Math.nextUp(Math.exp(-1.2232D)));).

But this may not be a reliable test, since the value obtained from Oracle may itself be up to 1 ulp away from the exact result. Hence, if we accept any IBM result within 1 ulp of Oracle's result (which is taken as the 'reference' value in this test), we may end up accepting an IBM result that is more than 1 ulp from the exact value.

I asked Robert Enenkel about the issue and he gave the following explanation:

...suppose your result is x, Oracle's is o, and the exact mathematical result is e. Then (omitting some complications that occur when e is near a power of 2), suppose f0, f1, f2 and f3 are consecutive floating-point numbers with
f0 < f1 < e < f2 < f3,

f1-f0 = f2-f1 = f3-f2 = 1 ulp of e.

The only FP numbers that are within 1 ulp of e are f1 and f2.

Since o is assumed accurate to 1 ulp, we have

o = f1 or o = f2.

If x = o, then x is within 1 ulp of e and you can conclude x is ok.

But what if x != o? o is either f1 or f2, but you don't know which without access to a more accurate value. You can't simply check whether

o <= x <= nextUP(o).

This would work if o = f1, but we don't know if it is. If o = f2, then the above test translates to

f2 <= x <= f3,

which would allow an x that is > 1 ulp from e.

Conversely, you cannot simply check whether

nextDOWN(o) <= x <= o,

since if o = f1, this would translate to

f0 <= x <= f1,

which is also not what you want.

Given only Oracle's results as a reference, the only situations in which you can conclude that your value is or is not good to 1 ulp are:

If x = o then x is good to 1 ulp.

If x < nextDOWN(o) or x > nextUP(o) then x is not good to 1 ulp.

But if x != o and nextDOWN(o) <= x <= nextUP(o), you cannot conclude anything about x without a higher accuracy reference.
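The three cases above could be sketched as a small classifier. The names ReferenceCheck, Verdict and classify are made up for this illustration, and the code assumes both inputs are finite, non-NaN doubles.

```java
// Sketch of the three-way outcome described above, given only a reference
// value o that is itself assumed accurate to 1 ulp.
// ReferenceCheck, Verdict and classify are hypothetical names.
class ReferenceCheck {
    enum Verdict { GOOD, BAD, INCONCLUSIVE }

    // x: value under test, o: reference value. Assumes finite, non-NaN inputs.
    static Verdict classify(double x, double o) {
        if (x == o) {
            return Verdict.GOOD;          // same as the reference, so within 1 ulp of e
        }
        if (x < Math.nextDown(o) || x > Math.nextUp(o)) {
            return Verdict.BAD;           // two or more steps from o, so more than 1 ulp from e
        }
        return Verdict.INCONCLUSIVE;      // adjacent to o: needs a more accurate reference
    }

    public static void main(String[] args) {
        // The two values from the standalone test case in this issue.
        double o = Double.parseDouble("0.2942869403572507");
        double x = Double.parseDouble("0.29428694035725067");
        System.out.println(classify(x, o));
    }
}
```

A test built on this would have to decide how to treat INCONCLUSIVE; treating it as a pass is the lenient choice and can accept a value that is 2 ulps from the exact result.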

What you could do is generate a list of test arguments, and a corresponding list of allowable pairs of function values (f1,f2) such that your value is good if f1 <= x <= f2. Generating this list would require a high accuracy reference function, but it would not have to be available to the open source program doing the test.
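One possible way to generate such (f1,f2) pairs is sketched below. It is only an illustration: it assumes a BigDecimal Taylor series is an acceptable high-accuracy reference for the small arguments used in this test, and the names ExpBracket, expHighPrecision, bracket and goodToOneUlp are made up. The reference is itself an approximation (about 55 correct digits), which is far more than a double can hold but is not a proof of correct rounding.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Sketch: derive the allowable pair (f1, f2) for exp(arg) from a
// high-precision reference. All names here are hypothetical.
class ExpBracket {
    private static final MathContext MC = new MathContext(60);

    // exp(x) by Taylor series with 60 significant digits.
    // Intended only for finite arguments of small magnitude.
    static BigDecimal expHighPrecision(double x) {
        BigDecimal bx = new BigDecimal(x);   // exact value of the double argument
        BigDecimal term = BigDecimal.ONE;
        BigDecimal sum = BigDecimal.ONE;
        BigDecimal cutoff = new BigDecimal("1e-55");
        for (int n = 1; term.abs().compareTo(cutoff) > 0; n++) {
            term = term.multiply(bx, MC).divide(BigDecimal.valueOf(n), MC);
            sum = sum.add(term, MC);
        }
        return sum;
    }

    // Returns {f1, f2}: adjacent doubles with f1 <= exp(x) <= f2
    // (f1 == f2 only when the reference is exactly representable).
    static double[] bracket(double x) {
        BigDecimal e = expHighPrecision(x);
        double f1 = e.doubleValue();         // some double close to e
        while (new BigDecimal(f1).compareTo(e) > 0) {
            f1 = Math.nextDown(f1);          // step down until f1 <= e
        }
        while (new BigDecimal(Math.nextUp(f1)).compareTo(e) <= 0) {
            f1 = Math.nextUp(f1);            // step up while the next double is still <= e
        }
        double f2 = new BigDecimal(f1).compareTo(e) == 0 ? f1 : Math.nextUp(f1);
        return new double[] { f1, f2 };
    }

    // True if x is one of the values allowed by a 1 ulp error bound.
    static boolean goodToOneUlp(double x, double arg) {
        double[] b = bracket(arg);
        return b[0] <= x && x <= b[1];
    }

    public static void main(String[] args) {
        double arg = -1.2232D;
        double[] b = bracket(arg);
        System.out.println("f1 = " + b[0] + ", f2 = " + b[1]);
        System.out.println("Math.exp within 1 ulp? " + goodToOneUlp(Math.exp(arg), arg));
    }
}
```

The pairs could be generated once offline and stored alongside the test arguments, so the test itself would only need the two comparisons in goodToOneUlp.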

Mesbah-Alam on 2 Feb 2018

Here's the changeset where I updated the validation to:
assertTrue("exp(double)[9] ::", (Math.nextDown(Math.exp(2.43223D)) <= 11.3842406513381) || Math.nextUp(Math.exp(2.43223D)) >= 11.3842406513381);
(i.e., if nextDown(o) <= x <= nextUp(o))
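Read literally, the || in the assertion above is satisfied by any non-NaN value, because at least one of the two comparisons always holds. A sketch of the check described in the parenthetical, using && instead, might look like this. The names UlpAssert, eitherSide and withinOneStep are made up for illustration, and as explained earlier in the thread, even the && form is only a necessary condition when the expected value is itself just accurate to 1 ulp.

```java
// Sketch: the || condition from the assertion above next to an && variant.
// UlpAssert, eitherSide and withinOneStep are hypothetical names.
class UlpAssert {
    // Condition as written in the assertion: true for every non-NaN x and o.
    static boolean eitherSide(double x, double o) {
        return Math.nextDown(x) <= o || Math.nextUp(x) >= o;
    }

    // Condition described as nextDown(o) <= x <= nextUp(o), written in terms of x.
    static boolean withinOneStep(double x, double o) {
        return Math.nextDown(x) <= o && o <= Math.nextUp(x);
    }

    public static void main(String[] args) {
        double expected = 11.3842406513381;

        System.out.println(eitherSide(1000.0, expected));    // true: the || form accepts a wrong value
        System.out.println(withinOneStep(1000.0, expected)); // false
        System.out.println(withinOneStep(Math.exp(2.43223D), expected));
    }
}
```

In a JUnit test this would be used as assertTrue("exp(double)[9] ::", withinOneStep(Math.exp(2.43223D), 11.3842406513381)).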