In the file /src/caffe/layers/batch_norm_layer.cpp, I noticed that caffe_powx() has been replaced with caffe_sqrt(), so I went to /src/caffe/util/math_functions.cpp and found the definition of caffe_sqrt():
template <>
void caffe_sqrt<float>(const int n, const float* a, float* y) {
  vsSqrt(n, a, y);
}

template <>
void caffe_sqrt<double>(const int n, const double* a, double* y) {
  vdSqrt(n, a, y);
}
When I compile Caffe, the compiler reports that vsSqrt() and vdSqrt() are not defined, and I can't find their definitions anywhere either.
I believe you are using an older version of Caffe, and that you compile it with the default BLAS, which is ATLAS, while vsSqrt and vdSqrt come from MKL. Caffe provides an alternate way to handle this. In the folder /include/caffe/util, find mkl_alternate.hpp and add the line "DEFINE_VSL_UNARY_FUNC(Sqrt, y[i] = sqrt(a[i]));" after the line "DEFINE_VSL_UNARY_FUNC(Sqr, y[i] = a[i] * a[i]);".
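For readers wondering why that one line makes both missing symbols appear, here is a simplified, standalone sketch of what such a fallback macro does. It is not a copy of mkl_alternate.hpp: the macro name SKETCH_VSL_UNARY_FUNC is hypothetical, and plain assert stands in for whatever argument checks the real header performs. The idea is that one macro invocation generates a templated loop plus the float (vs...) and double (vd...) wrappers that MKL would otherwise supply.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical, simplified stand-in for the fallback macro in
// mkl_alternate.hpp. Each invocation generates:
//   v<name><Dtype>  a templated loop applying `operation` elementwise
//   vs<name>        the float wrapper
//   vd<name>        the double wrapper
#define SKETCH_VSL_UNARY_FUNC(name, operation)                     \
  template <typename Dtype>                                        \
  void v##name(const int n, const Dtype* a, Dtype* y) {            \
    assert(n > 0 && a != nullptr && y != nullptr);                 \
    for (int i = 0; i < n; ++i) { operation; }                     \
  }                                                                \
  inline void vs##name(const int n, const float* a, float* y) {    \
    v##name<float>(n, a, y);                                       \
  }                                                                \
  inline void vd##name(const int n, const double* a, double* y) {  \
    v##name<double>(n, a, y);                                      \
  }

// Elementwise square, mirroring the line that the header already has.
SKETCH_VSL_UNARY_FUNC(Sqr, y[i] = a[i] * a[i])

// Elementwise square root: this is what defines vsSqrt and vdSqrt
// when MKL is not the BLAS backend.
SKETCH_VSL_UNARY_FUNC(Sqrt, y[i] = std::sqrt(a[i]))
```

With these definitions in place, a call such as vsSqrt(n, a, y) resolves to the plain loop, which is the call that caffe_sqrt<float> makes in the snippet from the question.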
Thanks for your reply. By the way, what is the advantage of replacing caffe_powx() with caffe_sqrt()? Is it just conciseness (one less parameter), or is it computational efficiency?
You are welcome. I haven't really looked into why.
Have a look at the PR https://github.com/BVLC/caffe/pull/5136. For the CPU code, MKL may be faster (I'm not sure).
I will close this, as @du0002in's comment together with the comment in #5136 answers this:
The problem was that the square inside the variance-term was computed via the caffe_gpu_powx function. This is slow on the one hand, but also unstable, especially for negative numbers. Therefore it was replaced by a caffe_gpu_mul call.
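The quoted comment is about GPU code, but the idea carries over to a plain C++ sketch: squaring by multiplying a value with itself avoids a general power routine and is well defined for negative inputs. The helper names elementwise_mul and square_via_mul below are hypothetical and are not Caffe functions; they only illustrate the "multiply instead of pow" substitution under those assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: elementwise product y[i] = a[i] * b[i].
// Assumes a and b have the same length.
template <typename Dtype>
std::vector<Dtype> elementwise_mul(const std::vector<Dtype>& a,
                                   const std::vector<Dtype>& b) {
  std::vector<Dtype> y(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    y[i] = a[i] * b[i];
  }
  return y;
}

// Square each element by multiplying the input with itself,
// instead of calling a general pow(x, 2) routine.
template <typename Dtype>
std::vector<Dtype> square_via_mul(const std::vector<Dtype>& x) {
  return elementwise_mul(x, x);
}
```

The sign of the input does not matter here, since a product of a value with itself is always non-negative, so no special handling of negative numbers is needed.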