Caffe: OpenCL branch: runtest fails NetTest/0 on CPUDevice<float>

Created on 23 May 2017 · 2 Comments · Source: BVLC/caffe

Issue summary

I ran "make -j8 runtest" and noticed that a single test out of roughly 2000 fails. This is a CPU test. The numerical output differs slightly; by eye, the difference is less than 0.001. The error persists if I use OpenBLAS instead of ATLAS (switched using update-alternatives).

Steps to reproduce

```
[----------] 26 tests from NetTest/0, where TypeParam = caffe::CPUDevice
[ RUN ] NetTest/0.TestReshape
[ OK ] NetTest/0.TestReshape (1 ms)
[ RUN ] NetTest/0.TestAllInOneNetDeploy
[ OK ] NetTest/0.TestAllInOneNetDeploy (0 ms)
[ RUN ] NetTest/0.TestUnsharedWeightsDataNet
[ OK ] NetTest/0.TestUnsharedWeightsDataNet (0 ms)
[ RUN ] NetTest/0.TestSharedWeightsDataNet
[ OK ] NetTest/0.TestSharedWeightsDataNet (1 ms)
[ RUN ] NetTest/0.TestComboLossWeight
[ OK ] NetTest/0.TestComboLossWeight (2 ms)
[ RUN ] NetTest/0.TestFromTo
[ OK ] NetTest/0.TestFromTo (2 ms)
[ RUN ] NetTest/0.TestBottomNeedBackward
[ OK ] NetTest/0.TestBottomNeedBackward (0 ms)
[ RUN ] NetTest/0.TestLossWeight
[ OK ] NetTest/0.TestLossWeight (3 ms)
[ RUN ] NetTest/0.TestLossWeightMidNet
[ OK ] NetTest/0.TestLossWeightMidNet (3 ms)
[ RUN ] NetTest/0.TestUnsharedWeightsDiffNet
[ OK ] NetTest/0.TestUnsharedWeightsDiffNet (1 ms)
[ RUN ] NetTest/0.TestBottomNeedBackwardEuclideanForce
[ OK ] NetTest/0.TestBottomNeedBackwardEuclideanForce (0 ms)
[ RUN ] NetTest/0.TestSharedWeightsUpdate
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 15.553734
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 15.553722
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 6.0058942
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 6.0058975
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 13.251015
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 13.251003
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 13.776414
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 13.776409
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 3.7031746
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 3.7031784
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 11.618912
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 11.618902
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 14.129814
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 14.129803
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 14.655213
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 14.655209
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 4.5819745
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 4.5819778
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 13.396614
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 13.396606
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 9.9528046
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 9.9527969
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 0.40497398
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 0.40497208
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 6.8997421
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 6.8997383
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 7.425149
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 7.4251442
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: -2.6480846
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: -2.6480865
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 14.29718
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 14.297173
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 8.3312454
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 8.331234
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 8.8566446
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 8.8566399
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: -1.216589
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: -1.2165909
src/caffe/test/test_net.cpp:1282: Failure
Value of: shared_params.cpu_diff()[i]
Actual: 7.9619598
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 7.9619541
[ FAILED ] NetTest/0.TestSharedWeightsUpdate, where TypeParam = caffe::CPUDevice (2 ms)

```
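To put the mismatches in the log above in perspective, here is a minimal Python sketch that measures how far apart a few of the reported pairs are, both as relative error and in float32 ULPs (units in the last place). It assumes the failing check is an ULP-based float comparison in the style of googletest's `EXPECT_FLOAT_EQ`, which tolerates 4 ULPs; the issue does not show the assertion itself. The helper names `ordered_bits` and `ulp_distance` are hypothetical and not part of Caffe or googletest.

```python
import struct

def ordered_bits(x):
    """Round x to float32 and map its bit pattern to an integer that sorts like the float."""
    (u,) = struct.unpack("<I", struct.pack("<f", x))
    # Sign-magnitude to signed integer: negative floats map to negative integers.
    return -(u & 0x7FFFFFFF) if u & 0x80000000 else u

def ulp_distance(a, b):
    """Number of float32 steps separating a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))

# (actual, expected) pairs copied from the failure output above
pairs = [
    (15.553734, 15.553722),
    (6.0058942, 6.0058975),
    (0.40497398, 0.40497208),
    (-2.6480846, -2.6480865),
]

ulps = [ulp_distance(a, e) for a, e in pairs]
rel_errors = [abs(a - e) / abs(e) for a, e in pairs]

for (a, e), u, r in zip(pairs, ulps, rel_errors):
    print(f"actual={a!r} expected={e!r} ulps={u} rel_err={r:.2e}")
```

Under these assumptions, every sampled pair is more than 4 ULPs apart, which would be enough to fail a strict float equality check, while the relative error stays below 1e-5.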

Final output:
```
[----------] Global test environment tear-down
[==========] 2066 tests from 274 test cases ran. (1821400 ms total)
[ PASSED ] 2065 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] NetTest/0.TestSharedWeightsUpdate, where TypeParam = caffe::CPUDevice

1 FAILED TEST
* Aborted at 1495549404 (unix time) try "date -d @1495549404" if you are using GNU date
PC: @ 0x7fd64b48c68f caffe::SyncedMemory::~SyncedMemory()
SIGSEGV (@0x0) received by PID 9535 (TID 0x7fd64da47ac0) from PID 0; stack trace:
@ 0x7fd64a9a0390 (unknown)
@ 0x7fd64b48c68f caffe::SyncedMemory::~SyncedMemory()
@ 0x48c2d2 boost::detail::sp_counted_impl_p<>::dispose()
@ 0x48050a boost::detail::sp_counted_base::release()
@ 0x482862 boost::detail::sp_counted_impl_p<>::dispose()
@ 0x7fd64b294361 boost::detail::sp_counted_impl_p<>::dispose()
@ 0x7fd64b2920a8 std::vector<>::~vector()
@ 0x7fd64a60036a __cxa_finalize
@ 0x7fd64b22fe63 (unknown)
Makefile:672: recipe for target 'runtest' failed
make: *** [runtest] Segmentation fault
```

Your system configuration

Operating system: Linux Mint (similar to Ubuntu 16.04, with custom kernel 4.10)
Compiler: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
CUDA version (if applicable): no
CUDNN version (if applicable): no
BLAS: ATLAS
Python or MATLAB version (for pycaffe and matcaffe respectively):

I suppose it shouldn't change anything, but the CPU is a Ryzen 1700X and my compilation flags are:
CFLAGS=-O2 -mprefer-avx128 -mavx2 -mcx16 -mmovbe -mf16c -mpopcnt -mbmi -mbmi2 -mclflushopt -fomit-frame-pointer
CXXFLAGS= the same

All 2 comments

This is OK; the precision is just slightly lower than expected. You can safely ignore this, especially if you plan to use GPUs anyway.

OK, thanks!
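The remark above that the precision is slightly lower than expected can be illustrated with a small Python sketch. One plausible source of such differences (an assumption; the thread does not identify the cause) is that float32 addition is not associative, so accumulating the same gradient terms in a different order can change the last few bits of the result. The helper `add32` is hypothetical and emulates a single float32 addition by rounding a double-precision sum back to float32.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def add32(x, y):
    """Emulate one float32 addition: add, then round the result to float32."""
    return to_f32(to_f32(x) + to_f32(y))

a, b, c = 1e8, -1e8, 1.0

left = add32(add32(a, b), c)   # (a + b) + c: a and b cancel first, then c is added
right = add32(a, add32(b, c))  # a + (b + c): c is absorbed when added to b

print(left, right)
```

The example is deliberately extreme so the effect is visible at a glance; in a gradient accumulation like the one under test, the same mechanism would typically show up only in the last few digits.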
