Hello!
I have been using dlib for a while and now I would like to see how to improve my programs.
One way is to make full use of AVX or SSE4. After building with either -DUSE_AVX_INSTRUCTIONS=ON or -DUSE_SSE4_INSTRUCTIONS=ON, I started wondering how I could verify that the instructions were really being used (it could be that I messed something up somewhere).
So, I tried adding these lines to my code:
#ifdef __AVX__
cout << "AVX on" << endl;
#ifdef DLIB_HAVE_SSE2
cout << "DLIB_HAVE_SSE2 on" << endl;
#endif
#ifdef DLIB_HAVE_SSE3
cout << "DLIB_HAVE_SSE3 on" << endl;
#endif
#ifdef DLIB_HAVE_SSE41
cout << "DLIB_HAVE_SSE41 on" << endl;
#endif
#ifdef DLIB_HAVE_AVX
cout << "DLIB_HAVE_AVX on" << endl;
#endif
#endif
Result: None of them worked.
Then I went ahead and checked the filesize for both AVX and SSE4:

There was absolutely no difference there.
So, I went ahead and looked at the executables themselves:

And there I did see a minor change, which I can possibly connect to a difference in a flag.
Here is my cmake for the example I am trying to use:
cmake_minimum_required(VERSION 2.8.12)
add_subdirectory(../dlib dlib_build)
macro(add_example name)
add_executable(${name} ${name}.cpp)
target_link_libraries(${name} dlib::dlib )
endmacro()
macro(add_gui_example name)
if (DLIB_NO_GUI_SUPPORT)
message("No GUI support, so we won't build the ${name} example.")
else()
add_example(${name})
endif()
endmacro()
if (DLIB_NO_GUI_SUPPORT)
message("No GUI support, so we won't build the webcam_face_pose_ex example.")
else()
find_package(OpenCV QUIET)
if (OpenCV_FOUND)
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(webcam_face_pose_ex webcam_face_pose_ex.cpp)
target_link_libraries(webcam_face_pose_ex dlib::dlib ${OpenCV_LIBS} )
else()
message("OpenCV not found, so we won't build the webcam_face_pose_ex example.")
endif()
endif()
if (USE_AVX_INSTRUCTIONS)
message("Using AVX")
else()
message("Not Using AVX")
endif()
This does tell me whether I set the flag or not.
So, I was wondering, is that the only visible difference between the different types?
For more information:
Also: if I enable AVX, SSE4 and SSE2 together, will the application just pick the most suitable one? Or should I keep building a separate executable for each type?
Thanks and great job with this project :)
If you enable all of them then only the newest will be used, which is AVX since SSE4 is part of AVX and SSE2 is part of SSE4.
Anyway, if you want to see what's happening look at the compiler switches that are set in visual studio's project properties. Or open cmake-gui and see if the AVX check boxes are checked. If they are checked then it's being used. Or go look at the code in dlib/simd and stuff something in there and see what happens.
Also, yes, if DLIB_HAVE_AVX isn't defined then it's not being used. Look at cmake-gui.
If you enable all of them then only the newest will be used, which is AVX since SSE4 is part of AVX and SSE2 is part of SSE4.
Does that mean that it will always use AVX, or will it dynamically check whether the computer can run it? My goal is to have the application run at its best on both newer and older machines. So, if the app dynamically switches between AVX and SSE4/2, then I just need to build one executable with all of them selected. But if it doesn't, then I still need to build different versions. Which one is it then? 😄
Anyway, if you want to see what's happening look at the compiler switches that are set in visual studio's project properties. Or open cmake-gui and see if the AVX check boxes are checked. If they are checked then it's being used. Or go look at the code in dlib/simd and stuff something in there and see what happens.
The AVX checkboxes are checked by me, so it doesn't matter much. Here is my cmake-gui:

Also, yes, if DLIB_HAVE_AVX isn't defined then it's not being used. Look at cmake-gui.
Then this means that, even though I checked USE_AVX_INSTRUCTIONS, it is not using them... How can I fix that?
Actually, I just noticed that changing the code for checking to this:
#ifdef __AVX__
cout << "AVX on" << endl;
#endif
#ifdef DLIB_HAVE_SSE2
cout << "DLIB_HAVE_SSE2 on" << endl;
#endif
#ifdef DLIB_HAVE_SSE3
cout << "DLIB_HAVE_SSE3 on" << endl;
#endif
#ifdef DLIB_HAVE_SSE41
cout << "DLIB_HAVE_SSE41 on" << endl;
#endif
#ifdef DLIB_HAVE_AVX
cout << "DLIB_HAVE_AVX on" << endl;
#endif
it correctly shows what is being used. So, yeah 😅
There is no dynamic checking. It's going to use whatever instructions you compile it with. You need to build executables for each target machine architecture.
Ok! Thanks 😄
When I run the HOG face detector in a loop on Windows with SSE4, I see 15% CPU utilization spread out over 8 (4×2) cores. Does that make sense?
Yes, it's single threaded.
Actually, I just noticed that changing the code for checking to this:
#ifdef __AVX__
cout << "AVX on" << endl;
#endif
#ifdef DLIB_HAVE_SSE2
cout << "DLIB_HAVE_SSE2 on" << endl;
#endif
#ifdef DLIB_HAVE_SSE3
cout << "DLIB_HAVE_SSE3 on" << endl;
#endif
#ifdef DLIB_HAVE_SSE41
cout << "DLIB_HAVE_SSE41 on" << endl;
#endif
#ifdef DLIB_HAVE_AVX
cout << "DLIB_HAVE_AVX on" << endl;
#endif
it correctly shows what is using. So, yeah 😅
This doesn't work for me. I get the error:
error: ‘cout’ does not name a type; did you mean ‘cosl’
But I had already included iostream and declared the std namespace. Any idea what is happening?