Alpaka: Add support for CUDA_SEPARABLE_COMPILATION

Created on 4 Jul 2018 · 6 Comments · Source: alpaka-group/alpaka

I was trying to call a function from 'MathFunctions' in my alpaka code. Here is the structure of my workspace:
/src
- sqrt.cpp // modified from the vectorAdd example to call the 'mysqrt' function
- mysqrt.cpp
/MathFunction
- MathFunction.h // mysqrt is defined here

The program simply passes the values 4 and 9 to 'mysqrt' and adds up the results.
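As a rough host-only sketch of that layout (the file contents here are assumptions, since the real sources are only in the attached archive; `sumOfRoots` is a hypothetical name, and `std::sqrt` stands in for whatever `mysqrt` actually computes), the caller sees only a declaration and the definition lives in a different source file:

```cpp
#include <cmath>

// --- MathFunction.h (assumed contents): declaration only ---
double mysqrt(double x);

// --- sqrt.cpp (assumed): the caller only sees the declaration above ---
// sumOfRoots is a hypothetical helper standing in for the kernel body.
double sumOfRoots(double a, double b) {
    return mysqrt(a) + mysqrt(b);
}

// --- mysqrt.cpp (assumed): the definition lives in its own source file ---
// std::sqrt is a placeholder for the real implementation.
double mysqrt(double x) {
    return std::sqrt(x);
}
```

On the host, the ordinary linker resolves the call to `mysqrt` across object files. Device code gets no comparable link step unless relocatable device code (separable compilation) is enabled.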
This approach seems to work only on the CPU. When I compile for the GPU, the 'cmake' step succeeds, but 'make' gives me these errors:

herbst66@hypnos4:~/HOME/alpaka/alpaka-develop/example/sqrt_GPU/build$ make
[ 33%] Building NVCC (Device) object CMakeFiles/sqrt.dir/src/sqrt_generated_sqrt.cpp.o
ptxas fatal : Unresolved extern function '_Z6mysqrtd'
CMake Error at sqrt_generated_sqrt.cpp.o.cmake:282 (message):
Error generating file
/home/herbst66/HOME/alpaka/alpaka-develop/example/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_sqrt.cpp.o
make[2]: *** [CMakeFiles/sqrt.dir/src/sqrt_generated_sqrt.cpp.o] Error 1
make[1]: *** [CMakeFiles/sqrt.dir/all] Error 2
make: *** [all] Error 2
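The symbol in the ptxas message is a mangled C++ name. A minimal sketch for decoding it, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` (the `demangle` wrapper is a hypothetical helper, not part of alpaka):

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: demangle an Itanium-ABI symbol name.
// Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);  // the buffer returned by __cxa_demangle is malloc-allocated
    return result;
}
```

Calling `demangle("_Z6mysqrtd")` yields `mysqrt(double)`, so the unresolved device function is the `mysqrt` called from sqrt.cpp.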

Does anyone have a suggestion for how to fix it?
PS: I attached the workspace as a .zip file so you can have a look.

sqrt.zip

Enhancement

jiradaherbst

All 6 comments

I already had an email conversation with @jiradaherbst about the problem. This was my answer fyi:

I got it to work, but alpaka needs a fix for this to work in general. The problem is that the g++ linker cannot link device code, so we need to prelink the device code into an intermediate object file with nvcc. This can be activated in cmake by setting CUDA_SEPARABLE_COMPILATION to ON. However, while prelinking the intermediate object file, nvcc then complains: nvcc fatal : Unknown option '-generate-code arch'. So some options set by alpaka should only be passed to the compilation pass, not to the linking pass. For now, I made the build verbose with make VERBOSE=1 to see the failing command. For me it looked like this:

/usr/bin/nvcc --expt-extended-lambda --expt-relaxed-constexpr "--generate-code arch=compute_30,code=sm_30 --generate-code arch=compute_30,code=compute_30" -std=c++11 --use_fast_math --ftz=false -m64 -ccbin /usr/bin/g++-5 -dlink /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_mysqrt.cpp.o /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_sqrt.cpp.o -o /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/./sqrt_intermediate_link.o

I just removed every --generate-code statement so that it looked like this:

/usr/bin/nvcc --expt-extended-lambda --expt-relaxed-constexpr -std=c++11 --use_fast_math --ftz=false -m64 -ccbin /usr/bin/g++-5 -dlink /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_mysqrt.cpp.o /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/src/./sqrt_generated_sqrt.cpp.o -o /tmp/sqrt_GPU/build/CMakeFiles/sqrt.dir/./sqrt_intermediate_link.o

I ran this command by hand and then ran make again. make found sqrt_intermediate_link.o (thinking it had built the file itself...) and continued the compilation.
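The manual edit above amounts to filtering the --generate-code arguments out of the device-link command line. A minimal sketch of that filtering step, assuming the command is available as a list of arguments (`stripGenerateCode` is a hypothetical helper for illustration, not how alpaka or CMake implements it):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: return the argument list without any argument that
// mentions --generate-code. In the failing command both --generate-code
// options were passed inside a single quoted argument, so a substring match
// is used instead of an exact comparison.
std::vector<std::string> stripGenerateCode(std::vector<std::string> args) {
    args.erase(
        std::remove_if(args.begin(), args.end(),
                       [](const std::string& arg) {
                           return arg.find("--generate-code") != std::string::npos;
                       }),
        args.end());
    return args;
}
```

Applied to the arguments of the failing command, this keeps everything else (including -dlink and the object files) in its original order and drops only the quoted --generate-code argument.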

So there are two options for you now:

1. Fix alpaka or, if you are not able to do so, describe the problem in an alpaka issue (feel free to use any parts of my mail for this).
2. Don't link device code.

Device linking by hand as I did is not an option. ;)

theZiz on 4 Jul 2018

If someone could give me an example project, I could try to make it work and add it as an integration test. I am not sure I fully understand what you are trying to do.

BenjaminW3 on 25 Jul 2018

I attached the example project in 'sqrt.zip' with my post.

jiradaherbst on 26 Jul 2018

Thank you! I will have a look at it.

BenjaminW3 on 31 Jul 2018

Related link: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116946.html
Only nvcc supports separable compilation; clang as a native CUDA compiler does not.

BenjaminW3 on 19 Apr 2019

Even though it took some time it is now implemented. It can be enabled when using nvcc with -DALPAKA_CUDA_NVCC_SEPARABLE_COMPILATION=ON.

BenjaminW3 on 11 Jun 2019
