Glow: [Bug] GemmTest error

Created on 18 Sep 2018 · 16 Comments · Source: pytorch/glow

cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DGLOW_WITH_OPENCL=0 ../glow

"ninja all" seems OK:
[1/167] Building CXX object lib/Backends/CPU/CMakeFiles/CPURuntime.dir/libjit/libjit.cpp.o
[2/167] Building CXX object lib/Backends/CPU/CMakeFiles/CPURuntime.dir/libjit/libjit_conv.cpp.o
[3/167] Building CXX object lib/Backends/CPU/CMakeFiles/CPURuntime.dir/libjit/libjit_matmul.cpp.o
[4/167] Building CXX object lib/Backends/CPU/CMakeFiles/CPURuntimeNative.dir/libjit/libjit.cpp.o
... skip ...
[162/167] Building CXX object tests/unittests/CMakeFiles/GemmTest.dir/GemmTest.cpp.o
[163/167] Building CXX object tests/unittests/CMakeFiles/onnxImporterTest.dir/onnxImporterTest.cpp.o
[164/167] Linking CXX executable tests/caffe2ImporterTest
[165/167] Linking CXX executable tests/BackendCorrectnessTest
[166/167] Linking CXX executable tests/GemmTest
In file included from /home/walker/glow/glow/tests/unittests/onnxImporterTest.cpp:21:0:
/home/walker/glow/glow/tests/googletest/googletest/include/gtest/gtest.h: In instantiation of ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = long unsigned int; T2 = int]’:
/home/walker/glow/glow/tests/googletest/googletest/include/gtest/gtest.h:1421:23: required from ‘static testing::AssertionResult testing::internal::EqHelper<lhs_is_null_literal>::Compare(const char*, const char*, const T1&, const T2&) [with T1 = long unsigned int; T2 = int; bool lhs_is_null_literal = false]’
/home/walker/glow/glow/tests/unittests/onnxImporterTest.cpp:53:3: required from here
/home/walker/glow/glow/tests/googletest/googletest/include/gtest/gtest.h:1392:11: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (lhs == rhs) {
^
[167/167] Linking CXX executable tests/onnxImporterTest

ninja test:
[1/1] Running tests...
Test project /home/walker/glow/build_Debug
Start 1: partitionTest
1/20 Test #1: partitionTest .................... Passed 0.05 sec
Start 2: tensorsTest
2/20 Test #2: tensorsTest ...................... Passed 0.43 sec
Start 3: gradCheckTest
3/20 Test #3: gradCheckTest .................... Passed 4.04 sec
Start 4: IROptTest
4/20 Test #4: IROptTest ........................ Passed 0.08 sec
Start 5: basicIRTest
5/20 Test #5: basicIRTest ...................... Passed 0.07 sec
Start 6: backendTest
6/20 Test #6: backendTest ...................... Passed 1.32 sec
Start 7: MLTest
7/20 Test #7: MLTest ........................... Passed 22.99 sec
Start 8: operatorTest
8/20 Test #8: operatorTest ..................... Passed 4.51 sec
Start 9: graphTest
9/20 Test #9: graphTest ........................ Passed 0.83 sec
Start 10: graphGradTest
10/20 Test #10: graphGradTest .................... Passed 0.03 sec
Start 11: graphOptzTest
11/20 Test #11: graphOptzTest .................... Passed 0.18 sec
Start 12: graphSchedulerTest
12/20 Test #12: graphSchedulerTest ............... Passed 0.06 sec
Start 13: quantizationTest
13/20 Test #13: quantizationTest ................. Passed 1.08 sec
Start 14: UtilsTest
14/20 Test #14: UtilsTest ........................ Passed 0.04 sec
Start 15: BackendCorrectnessTest
15/20 Test #15: BackendCorrectnessTest ........... Passed 14.09 sec
Start 16: GemmTest
16/20 Test #16: GemmTest .........................***Failed 0.03 sec
Start 17: LLVMIRGenTest
17/20 Test #17: LLVMIRGenTest .................... Passed 0.00 sec
Start 18: memoryAllocatorTest
18/20 Test #18: memoryAllocatorTest .............. Passed 0.25 sec
Start 19: caffe2ImporterTest
19/20 Test #19: caffe2ImporterTest ............... Passed 0.09 sec
Start 20: onnxImporterTest
20/20 Test #20: onnxImporterTest ................. Passed 0.01 sec

95% tests passed, 1 tests failed out of 20

Total Test time (real) = 50.25 sec

The following tests FAILED:
16 - GemmTest (Failed)
FAILED: cd /home/walker/glow/build_Debug && /usr/local/bin/ctest --force-new-ctest-process
ninja: build stopped: subcommand failed.

Why did GemmTest fail? Can you help me?

bug

HKLee2040

All 16 comments

16/20 Test #16: GemmTest .........................***Failed 0.03 sec

@HKLee2040 Can you copy/paste the exact failure? It should be in the test output (Testing/Temporary/...)

rdzhabarov on 18 Sep 2018

LastTest.log:

16/20 Testing: GemmTest
16/20 Test: GemmTest
Command: "/home/walker/glow/build_Debug/tests/GemmTest"
Directory: /home/walker/glow/build_Debug/tests/unittests
"GemmTest" start time: Sep 18 09:37 CST

Output:

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Gemm
[ RUN ] Gemm.jitTest
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
[ FAILED ] Gemm.jitTest (28 ms)
[----------] 1 test from Gemm (28 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (28 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Gemm.jitTest

1 FAILED TEST

Test time = 0.03 sec

Test Failed.
"GemmTest" end time: Sep 18 09:37 CST

"GemmTest" time elapsed: 00:00:00

HKLee2040 on 18 Sep 2018

This seems weird. Can you print some of the values being compared? (I cannot reproduce this locally or on CI, assuming you are running the latest master.)

rdzhabarov on 18 Sep 2018
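
A value dump like the one requested above could be produced with a small helper along these lines. This is only a sketch: `reportMismatches` is a hypothetical name, not part of Glow, and it works on raw float buffers rather than on Glow tensors. The tolerance mirrors the 0.001 passed to `isEqual` in the failing assertion.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not part of Glow): prints every index at which the two
// buffers differ by more than `delta` and returns the number of mismatches.
static std::size_t reportMismatches(const float *out1, const float *out2,
                                    std::size_t n, float delta) {
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (std::fabs(out1[i] - out2[i]) > delta) {
      std::printf("%zu: %f (out1) %f (out2) ==> failure\n", i, out1[i],
                  out2[i]);
      ++mismatches;
    }
  }
  return mismatches;
}
```

Printing only the mismatching indices keeps the log short and shows at a glance whether the two results are nearly equal or completely different.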

1: -22.137230 (out1) -22.137230 (out2)
2: 129.427002 129.427002
....
2390: 0.000000 -5.027690 ==> failure
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
1: -42.782421 -42.782421
2: 0.000000 (out1) 10.534184 (out2) ==> failure
....

There are 6 failed comparisons in the test, and in each of them the mismatching out1 values are all zero.

HKLee2040 on 18 Sep 2018

I cloned the code today with the following command:
git clone https://github.com/pytorch/glow.git

HKLee2040 on 18 Sep 2018

@HKLee2040 which LLVM version are you using?

@bertmaher any other ideas on what could be wrong here?

rdzhabarov on 18 Sep 2018

@rdzhabarov
It's LLVM 6.0.

HKLee2040 on 19 Sep 2018

If I reduce the dimension of the input matrix from 1024 to 31, the test passes.
If the dimension is larger than 31, e.g., 32, the test fails.

TEST(Gemm, jitTest) {              // line 36 of GemmTest.cpp
  PseudoRNG PRNG;

  for (size_t m : {1, 4, 5, 8}) {
    for (size_t n : {1, 16, 17, 31}) {
      for (size_t k : {1, 3}) {
        // ........ skip
      }
    }
  }
}

HKLee2040 on 19 Sep 2018

@HKLee2040 Thanks for working on this. Our matrix multiplication code is implemented here:

https://github.com/pytorch/glow/blob/master/lib/Backends/CPU/libjit/libjit_matmul.cpp in the function:
libjit_matmul_f

I wonder if one of our fast-path implementations has a bug in it. I would try disabling packing and vectorization and falling back to the dumb 3-loop implementation. Another possibility is that our fast-math implementation perturbs the values and we simply need to increase the delta. It could be useful to print the values and check the delta. Are we almost there or totally off?

nadavrot on 19 Sep 2018
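
For reference, the "dumb 3-loop implementation" mentioned above can be sketched as follows. This is a generic row-major matrix multiply written for illustration; `naiveMatmul` is a hypothetical name, and the code is not taken from `libjit_matmul.cpp`.

```cpp
#include <cstddef>

// Hypothetical reference routine (not Glow code): computes C = A * B, where
// A is m x k, B is k x n and C is m x n, all stored row-major.
static void naiveMatmul(const float *a, const float *b, float *c,
                        std::size_t m, std::size_t k, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float sum = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        sum += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = sum;
    }
  }
}
```

Comparing the optimized result against a routine like this separates two cases: a small rounding difference that a larger delta would absorb, and a genuinely wrong result such as entries left at zero.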

One failure case:
lhs: 5 1
rhs: 1 32
out: 5 32

lhs:
2.07712
7.90869
0.09933
4.40183
3.58721

rhs:
4.10583 5.17816 -5.11624 4.17190 4.08716 -5.74890 2.37430 -5.17158 -1.44285 -1.05856 -0.96462 2.40617 4.05037 4.43291 10.00989 0.38496 2.16319 7.14769 5.58105 5.48109 -2.27257 9.58585 -0.84964 2.41388 5.97719 -0.96761 4.90065 -4.56768 4.87285 3.71972 -5.00699 6.47236

output
8.52830 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -2.99697 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 4.49319 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 12.41533 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
32.47178 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -11.41106 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 17.10797 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 47.27177 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.40782 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.14331 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.21486 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.59369 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
18.07318 22.79339 -22.52084 18.36402 17.99098 -25.30567 10.45126 -22.76442 -6.35118 -4.65962 -4.24610 10.59157 17.82903 19.51292 44.06184 1.69451 9.52197 31.46293 24.56682 24.12682 -10.00349 42.19529 -3.73996 10.62548 26.31058 -4.25926 21.57184 -20.10616 21.44947 16.37359 -22.03990 28.49022
14.72850 18.57518 -18.35306 14.96552 14.66152 -20.62253 8.51712 -18.55157 -5.17581 -3.79730 -3.46031 8.63146 14.52954 15.90180 35.90762 1.38092 7.75981 25.64031 20.02041 19.66184 -8.15221 34.38650 -3.04784 8.65909 21.44147 -3.47103 17.57969 -16.38525 17.47996 13.34344 -17.96113 23.21774

I don't think increasing the delta will help.

HKLee2040 on 19 Sep 2018

@HKLee2040 This is good information. We'll need to debug libjit_matmul_f and check what breaks. Note that you can use printf inside this function.

nadavrot on 20 Sep 2018

My machine does not seem to support float8 in the "vectorized dot product".
Only the first element of each float8 is calculated, so I get the following result:
8.52830 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -2.99697 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000

The lower part of my result is calculated by "libjit_matmul_odd", and it is correct:
18.07318 22.79339 -22.52084 ...
14.72850 18.57518 -18.35306 ...

So, what does my system need in order to support float8?

HKLee2040 on 20 Sep 2018

Good catch! Oh, wow, this is interesting. This is really strange. What operating system and compiler are you using? I wonder if gcc just ignores the vector pragmas that turn float8 to a vector of floats.

nadavrot on 20 Sep 2018

Ubuntu 16.04

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0

HKLee2040 on 20 Sep 2018

Ah, we need to define float8 using the GCC vector extensions if we use GCC as our host compiler:

https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Vector-Extensions.html

Great finding @HKLee2040 !

nadavrot on 20 Sep 2018
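
A compiler-dependent definition along the lines suggested above might look like the sketch below. It rests on assumptions: the thread does not show how Glow declares `float8`, so the Clang branch using `ext_vector_type` is a guess at the existing form, and `axpy8` is a hypothetical helper added only to exercise the type. The GCC branch uses the `vector_size` attribute described on the linked GCC page.

```cpp
#include <cstring>

// Assumption: the Clang spelling below is illustrative only; the thread does
// not show the actual declaration used in Glow.
#if defined(__clang__)
typedef float float8 __attribute__((ext_vector_type(8)));
#elif defined(__GNUC__)
// GCC vector extension: vector_size is given in bytes, not in elements.
typedef float float8 __attribute__((vector_size(8 * sizeof(float))));
#else
#error "This sketch only covers Clang and GCC."
#endif

// Fails at compile time if float8 silently degrades to a single float.
static_assert(sizeof(float8) == 8 * sizeof(float),
              "float8 must hold eight floats");

// Hypothetical helper: out[0..8) += a * b[0..8), computed on all eight lanes.
static void axpy8(float a, const float *b, float *out) {
  float8 vb, vo;
  std::memcpy(&vb, b, sizeof(vb));
  std::memcpy(&vo, out, sizeof(vo));
  float8 va = {a, a, a, a, a, a, a, a};
  vo += va * vb;
  std::memcpy(out, &vo, sizeof(vo));
}
```

The `static_assert` is the part most relevant to this bug: if the host compiler ignores the vector attribute, the size check turns a silent wrong result into a build error.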

Wow, great find! Writing portable, optimized, vectorized code is hard :-)