I am trying to call a function from 'MathFunctions' in a program built with Alpaka. Here is the structure of my workspace:
/src
- sqrt.cpp // where I modified from vectorAdd example to call ‘mysqrt’ function
- mysqrt.cpp
/MathFunction
- MathFunction.h // mysqrt is defined here
The program simply passes the values 4 and 9 to mysqrt and adds the results.
This approach works only for the CPU. When I compile for the GPU, the 'cmake' step succeeds, but 'make' gives me these errors:
herbst66@hypnos4:~/HOME/alpaka/alpaka-develop/example/sqrt_GPU/build$ make
[ 33%] Building NVCC (Device) object CMakeFiles/sqrt.dir/src/sqrt_generated_sqrt.cpp.o
ptxas fatal : Unresolved extern function '_Z6mysqrtd'
CMake Error at sqrt_generated_sqrt.cpp.o.cmake:282 (message):
Error generating file
/home/herbst66/HOME/alpaka/alpaka-develop/example/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_sqrt.cpp.o
make[2]: *** [CMakeFiles/sqrt.dir/src/sqrt_generated_sqrt.cpp.o] Error 1
make[1]: *** [CMakeFiles/sqrt.dir/all] Error 2
make: *** [all] Error 2
Does anyone have any suggestion to fix it?
PS. I enclosed the workspace as a .zip file so you can have a look.
I already had an email conversation with @jiradaherbst about the problem. This was my answer fyi:
I got it to work, but alpaka needs a fix for this to work in general. The problem is that the g++ linker cannot link device code, so the device code has to be prelinked into an intermediate object file with nvcc. This can be activated in CMake by setting CUDA_SEPARABLE_COMPILATION to ON. However, while prelinking the intermediate object file, nvcc then complains: nvcc fatal : Unknown option '-generate-code arch'. So some options passed by alpaka should be given only to the compilation pass, not to the linking pass. For now I made the build verbose with make VERBOSE=1 to see the failing command. For me it looked like this:
/usr/bin/nvcc --expt-extended-lambda --expt-relaxed-constexpr "--generate-code arch=compute_30,code=sm_30 --generate-code arch=compute_30,code=compute_30" -std=c++11 --use_fast_math --ftz=false -m64 -ccbin /usr/bin/g++-5 -dlink /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_mysqrt.cpp.o /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_sqrt.cpp.o -o /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/./sqrt_intermediate_link.o
I just removed every --generate-code option so that it looked like this:
/usr/bin/nvcc --expt-extended-lambda --expt-relaxed-constexpr -std=c++11 --use_fast_math --ftz=false -m64 -ccbin /usr/bin/g++-5 -dlink /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_mysqrt.cpp.o /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_sqrt.cpp.o -o /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/./sqrt_intermediate_link.o
I ran this command by hand and then ran make again. make found sqrt_intermediate_link.o (thinking it had built it itself...) and continued the build.
So there are two options for you now:
Device linking by hand as I did is not an option. ;)
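To make the failure concrete, here is a minimal sketch of the layout described in the question. It is not the code from sqrt.zip: the Newton iteration inside mysqrt, the HOST_DEVICE macro and the sumOfRoots helper are assumptions made up for illustration, and the three files are shown in one listing so that it also builds with a plain host compiler. Under nvcc, a device function that is only declared in the calling translation unit stays an unresolved extern unless relocatable device code and a device link step are used, which matches the ptxas error in the question.

```cpp
// Sketch only: file split, macro name and function bodies are hypothetical.

// Expands to CUDA qualifiers under nvcc and to nothing under a host-only compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// ---- MathFunction.h: declaration only ----
HOST_DEVICE double mysqrt(double x);

// ---- mysqrt.cpp: definition in its own translation unit ----
// Assumed implementation: a fixed number of Newton steps.
HOST_DEVICE double mysqrt(double x)
{
    if (x <= 0.0) return 0.0;
    double r = x;
    for (int i = 0; i < 20; ++i)
        r = 0.5 * (r + x / r);
    return r;
}

// ---- sqrt.cpp: caller ----
// When this lives in a different file than the definition, device code sees
// only the declaration of mysqrt. nvcc then needs each file compiled with -dc
// and the resulting objects prelinked with -dlink before the host linker runs.
HOST_DEVICE double sumOfRoots(double a, double b)
{
    return mysqrt(a) + mysqrt(b);
}
```

Calling sumOfRoots(4.0, 9.0) corresponds to the 4 and 9 case from the question. Keeping the declaration and definition in separate files is what makes separable compilation necessary on the GPU side.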
If someone could give me an example project I could try to make it work and add it as an integration test. I am not sure if I fully understand what you are trying to do.
I attached the example project in 'sqrt.zip' with my post.
Thank you! I will have a look at it.
Related link: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116946.html
Only nvcc supports separable compilation; clang as a native CUDA compiler does not.
Even though it took some time it is now implemented. It can be enabled when using nvcc with -DALPAKA_CUDA_NVCC_SEPARABLE_COMPILATION=ON.