Glow: [Bug] GemmTest error

Created on 18 Sep 2018 · 16 Comments · Source: pytorch/glow

cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DGLOW_WITH_OPENCL=0 ../glow

"ninja all" seems OK:
[1/167] Building CXX object lib/Backends/CPU/CMakeFiles/CPURuntime.dir/libjit/libjit.cpp.o
[2/167] Building CXX object lib/Backends/CPU/CMakeFiles/CPURuntime.dir/libjit/libjit_conv.cpp.o
[3/167] Building CXX object lib/Backends/CPU/CMakeFiles/CPURuntime.dir/libjit/libjit_matmul.cpp.o
[4/167] Building CXX object lib/Backends/CPU/CMakeFiles/CPURuntimeNative.dir/libjit/libjit.cpp.o
... skip ...
[162/167] Building CXX object tests/unittests/CMakeFiles/GemmTest.dir/GemmTest.cpp.o
[163/167] Building CXX object tests/unittests/CMakeFiles/onnxImporterTest.dir/onnxImporterTest.cpp.o
[164/167] Linking CXX executable tests/caffe2ImporterTest
[165/167] Linking CXX executable tests/BackendCorrectnessTest
[166/167] Linking CXX executable tests/GemmTest
In file included from /home/walker/glow/glow/tests/unittests/onnxImporterTest.cpp:21:0:
/home/walker/glow/glow/tests/googletest/googletest/include/gtest/gtest.h: In instantiation of ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = long unsigned int; T2 = int]’:
/home/walker/glow/glow/tests/googletest/googletest/include/gtest/gtest.h:1421:23: required from ‘static testing::AssertionResult testing::internal::EqHelper<lhs_is_null_literal>::Compare(const char*, const char*, const T1&, const T2&) [with T1 = long unsigned int; T2 = int; bool lhs_is_null_literal = false]’
/home/walker/glow/glow/tests/unittests/onnxImporterTest.cpp:53:3: required from here
/home/walker/glow/glow/tests/googletest/googletest/include/gtest/gtest.h:1392:11: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (lhs == rhs) {
^
[167/167] Linking CXX executable tests/onnxImporterTest

ninja test:
[1/1] Running tests...
Test project /home/walker/glow/build_Debug
Start 1: partitionTest
1/20 Test #1: partitionTest .................... Passed 0.05 sec
Start 2: tensorsTest
2/20 Test #2: tensorsTest ...................... Passed 0.43 sec
Start 3: gradCheckTest
3/20 Test #3: gradCheckTest .................... Passed 4.04 sec
Start 4: IROptTest
4/20 Test #4: IROptTest ........................ Passed 0.08 sec
Start 5: basicIRTest
5/20 Test #5: basicIRTest ...................... Passed 0.07 sec
Start 6: backendTest
6/20 Test #6: backendTest ...................... Passed 1.32 sec
Start 7: MLTest
7/20 Test #7: MLTest ........................... Passed 22.99 sec
Start 8: operatorTest
8/20 Test #8: operatorTest ..................... Passed 4.51 sec
Start 9: graphTest
9/20 Test #9: graphTest ........................ Passed 0.83 sec
Start 10: graphGradTest
10/20 Test #10: graphGradTest .................... Passed 0.03 sec
Start 11: graphOptzTest
11/20 Test #11: graphOptzTest .................... Passed 0.18 sec
Start 12: graphSchedulerTest
12/20 Test #12: graphSchedulerTest ............... Passed 0.06 sec
Start 13: quantizationTest
13/20 Test #13: quantizationTest ................. Passed 1.08 sec
Start 14: UtilsTest
14/20 Test #14: UtilsTest ........................ Passed 0.04 sec
Start 15: BackendCorrectnessTest
15/20 Test #15: BackendCorrectnessTest ........... Passed 14.09 sec
Start 16: GemmTest
16/20 Test #16: GemmTest .........................***Failed 0.03 sec
Start 17: LLVMIRGenTest
17/20 Test #17: LLVMIRGenTest .................... Passed 0.00 sec
Start 18: memoryAllocatorTest
18/20 Test #18: memoryAllocatorTest .............. Passed 0.25 sec
Start 19: caffe2ImporterTest
19/20 Test #19: caffe2ImporterTest ............... Passed 0.09 sec
Start 20: onnxImporterTest
20/20 Test #20: onnxImporterTest ................. Passed 0.01 sec

95% tests passed, 1 tests failed out of 20

Total Test time (real) = 50.25 sec

The following tests FAILED:
16 - GemmTest (Failed)
FAILED: cd /home/walker/glow/build_Debug && /usr/local/bin/ctest --force-new-ctest-process
ninja: build stopped: subcommand failed.

Why did GemmTest fail? Can you help me?

bug

All 16 comments

16/20 Test #16: GemmTest .........................***Failed 0.03 sec

@HKLee2040 Can you copy/paste the exact failure? It should be in the test output (Testing/Temporary/...)

LastTest.log:

16/20 Testing: GemmTest
16/20 Test: GemmTest
Command: "/home/walker/glow/build_Debug/tests/GemmTest"
Directory: /home/walker/glow/build_Debug/tests/unittests
"GemmTest" start time: Sep 18 09:37 CST

Output:

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Gemm
[ RUN ] Gemm.jitTest
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
[ FAILED ] Gemm.jitTest (28 ms)
[----------] 1 test from Gemm (28 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (28 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Gemm.jitTest

1 FAILED TEST

Test time = 0.03 sec

Test Failed.
"GemmTest" end time: Sep 18 09:37 CST

"GemmTest" time elapsed: 00:00:00

This seems weird. Can you print some values from the comparison? (I certainly cannot reproduce this locally or on CI, assuming you are running this on the latest master.)
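One way to collect the values being asked for is a small helper that walks both output buffers and prints only the pairs that disagree. This is a minimal sketch, not Glow code: `reportMismatches` and its parameters are hypothetical names, and it assumes both tensors can be viewed as plain `float` arrays of the same length.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not part of Glow): prints every element pair whose
// absolute difference exceeds `delta` and returns how many were found.
std::size_t reportMismatches(const float *out1, const float *out2,
                             std::size_t count, float delta) {
  assert(out1 != nullptr && out2 != nullptr);
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < count; i++) {
    // Written as !(diff <= delta) so that NaN values are also reported.
    if (!(std::fabs(out1[i] - out2[i]) <= delta)) {
      std::printf("%zu: %f (out1) %f (out2) ==> failure\n", i, out1[i],
                  out2[i]);
      mismatches++;
    }
  }
  return mismatches;
}
```

Calling it with the same tolerance the test uses (0.001) right before the failing `EXPECT_TRUE` shows whether the two results are merely close or completely different.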

1: -22.137230 (out1) -22.137230 (out2)
2: 129.427002 129.427002
....
2390: 0.000000 -5.027690 ==> failure
/home/walker/glow/glow/tests/unittests/GemmTest.cpp:83: Failure
Value of: out1.isEqual(out2, 0.001)
Actual: false
Expected: true
1: -42.782421 -42.782421
2: 0.000000 (out1) 10.534184 (out2) ==> failure
....

There are 6 failed comparisons in the test, and the mismatching out1 values are all zero.

I got my code using the following command (today):
git clone https://github.com/pytorch/glow.git

@HKLee2040 which LLVM version are you using?

@bertmaher any other ideas on what could be wrong here?

@rdzhabarov
It's LLVM 6.0.

If I change the dimension of the input matrix from 1024 to 31, the test passes.
If the dimension is larger than 31 (e.g., 32), the test fails.

TEST(Gemm, jitTest) {              // line 36 of GemmTest.cpp
  PseudoRNG PRNG;

  for (size_t m : {1, 4, 5, 8}) {
    for (size_t n : {1, 16, 17, 31}) {
      for (size_t k : {1, 3}) {
        ........ skip
      }
    }
  }
}

@HKLee2040 Thanks for working on this. Our matrix multiplication code is implemented here:

https://github.com/pytorch/glow/blob/master/lib/Backends/CPU/libjit/libjit_matmul.cpp in the function:
libjit_matmul_f

I wonder if one of our fast-path implementations has a bug in it. I would try disabling packing and vectorization and falling back to the dumb 3-loop implementation. Another possibility is that our fast-math implementation perturbs the values and we simply need to increase the delta. It could be useful to print the values and check the delta. Are we almost there or totally off?
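The "dumb 3-loop implementation" and the delta check suggested here can be sketched as follows. This is an illustrative reference only, not the code in libjit_matmul.cpp: `naiveMatmul` and `maxAbsDiff` are hypothetical names, and the matrices are assumed to be dense and row-major.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Reference product: c (m x n) = a (m x k) * b (k x n), all row-major.
void naiveMatmul(const float *a, const float *b, float *c, std::size_t m,
                 std::size_t n, std::size_t k) {
  assert(a != nullptr && b != nullptr && c != nullptr);
  for (std::size_t i = 0; i < m; i++) {
    for (std::size_t j = 0; j < n; j++) {
      float sum = 0.0f;
      for (std::size_t p = 0; p < k; p++) {
        sum += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = sum;
    }
  }
}

// Largest absolute element-wise difference between two buffers.
float maxAbsDiff(const float *x, const float *y, std::size_t count) {
  float worst = 0.0f;
  for (std::size_t i = 0; i < count; i++) {
    float diff = std::fabs(x[i] - y[i]);
    if (diff > worst) {
      worst = diff;
    }
  }
  return worst;
}
```

A maximum difference just above the tolerance would point to rounding from fast-math, in which case a larger delta could be justified. A difference as large as the expected values themselves, for example when one side is exactly zero, points to a bug in the fast path.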

One failure case:
lhs: 5 1
rhs: 1 32
out: 5 32

lhs:
2.07712
7.90869
0.09933
4.40183
3.58721

rhs:
4.10583 5.17816 -5.11624 4.17190 4.08716 -5.74890 2.37430 -5.17158 -1.44285 -1.05856 -0.96462 2.40617 4.05037 4.43291 10.00989 0.38496 2.16319 7.14769 5.58105 5.48109 -2.27257 9.58585 -0.84964 2.41388 5.97719 -0.96761 4.90065 -4.56768 4.87285 3.71972 -5.00699 6.47236

output
8.52830 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -2.99697 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 4.49319 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 12.41533 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
32.47178 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -11.41106 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 17.10797 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 47.27177 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.40782 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.14331 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.21486 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.59369 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
18.07318 22.79339 -22.52084 18.36402 17.99098 -25.30567 10.45126 -22.76442 -6.35118 -4.65962 -4.24610 10.59157 17.82903 19.51292 44.06184 1.69451 9.52197 31.46293 24.56682 24.12682 -10.00349 42.19529 -3.73996 10.62548 26.31058 -4.25926 21.57184 -20.10616 21.44947 16.37359 -22.03990 28.49022
14.72850 18.57518 -18.35306 14.96552 14.66152 -20.62253 8.51712 -18.55157 -5.17581 -3.79730 -3.46031 8.63146 14.52954 15.90180 35.90762 1.38092 7.75981 25.64031 20.02041 19.66184 -8.15221 34.38650 -3.04784 8.65909 21.44147 -3.47103 17.57969 -16.38525 17.47996 13.34344 -17.96113 23.21774

I don't think increasing the delta will help.

@HKLee2040 This is good information. We'll need to debug libjit_matmul_f and check what breaks. Notice that you could use printf inside this function.

My machine does not seem to support float8 in the "vectorized dot product".
Only the first element of each float8 is calculated, so I got the following result:
8.52830 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -2.99697 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000

And the lower part of my result is calculated by "libjit_matmul_odd", and the result is correct.
18.07318 22.79339 -22.52084 ...
14.72850 18.57518 -18.35306 ...

So, what system requirements do I need to support float8?

Good catch! Oh, wow, this is interesting, and really strange. What operating system and compiler are you using? I wonder if gcc just ignores the vector pragmas that turn float8 into a vector of floats.

ubuntu 16.04

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0

Ah, we need to define float8 using the gcc vector extensions if we use gcc as our host compiler:

https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Vector-Extensions.html
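A minimal sketch of what such a definition could look like is shown below. It assumes the existing `float8` typedef relies on clang's `ext_vector_type` attribute, which the thread does not show. The `vector_size` attribute is the one described in the GCC documentation linked above. The helper `axpy8` is a hypothetical name used only for illustration and is not the fix that was actually merged.

```cpp
#include <cassert>
#include <cstring>

#if defined(__clang__)
// Clang's extended vector type: 8 float lanes.
typedef float float8 __attribute__((ext_vector_type(8)));
#else
// GCC vector extension: the size is given in bytes (8 lanes * 4 bytes).
typedef float float8 __attribute__((vector_size(32)));
#endif

// Fails at compile time if the attribute was ignored and float8 ended up
// being a plain scalar float.
static_assert(sizeof(float8) == 8 * sizeof(float),
              "float8 must hold 8 floats");

// Hypothetical kernel step: out[0..7] += a * b[0..7].
inline void axpy8(float a, const float *b, float *out) {
  assert(b != nullptr && out != nullptr);
  float8 vb, vo;
  std::memcpy(&vb, b, sizeof(vb));    // unaligned-safe load
  std::memcpy(&vo, out, sizeof(vo));
  float8 va = {a, a, a, a, a, a, a, a}; // broadcast the scalar to all lanes
  vo += va * vb;
  std::memcpy(out, &vo, sizeof(vo));  // unaligned-safe store
}
```

The `static_assert` is the useful part for this bug: if a compiler silently drops the vector attribute, the build stops instead of producing results where only the first of every eight elements is computed.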

Great finding @HKLee2040 !

Wow, great find! Writing portable, optimized, vectorized code is hard :-)
