I want to test alpaka with CUDA 9, especially CUDA 9.2. For CUDA 9.0 and 9.1, see issue #426.
At the moment I use the modules on the ZIH/Taurus cluster and am trying to compile examples/bufferCopy from the current develop branch. On Taurus, only CUDA 8.0.61 currently works (with GCC 5.3.0 and Boost 1.64.0). CUDA 9.x requires at least Boost 1.65.1. I still need to build and try Boost with gcc 5.4, as it is missing on Taurus.
|CUDA 9.2.88|Boost >=1.65.1|
|---|---|
|gcc 4.9.3|:x:|
|gcc 5.3|:x:|
|gcc 5.4|:question:|
|gcc 5.5|:x:|
|gcc 6.x|:x:|
|gcc 7.x|:x:|
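
The Boost requirement stated above can be written down as a compile-time check. This is only a sketch: `packBoostVersion` and `boostNewEnoughForCuda` are hypothetical helpers, not alpaka functions, and the threshold is simply the "CUDA 9.x requires at least Boost 1.65.1" statement from this report.

```cpp
// Hypothetical helper, not part of alpaka: packs a Boost version the way
// the BOOST_VERSION macro does (major * 100000 + minor * 100 + patch).
constexpr int packBoostVersion(int major, int minor, int patch)
{
    return major * 100000 + minor * 100 + patch;
}

// Assumption taken from the report above: CUDA 9.x needs Boost >= 1.65.1.
// Older CUDA versions are not restricted by this sketch.
constexpr bool boostNewEnoughForCuda(int cudaMajor, int packedBoostVersion)
{
    return cudaMajor < 9 || packedBoostVersion >= packBoostVersion(1, 65, 1);
}

// The Taurus combinations mentioned above.
static_assert(boostNewEnoughForCuda(8, packBoostVersion(1, 64, 0)), "CUDA 8 + Boost 1.64.0");
static_assert(boostNewEnoughForCuda(9, packBoostVersion(1, 65, 1)), "CUDA 9 + Boost 1.65.1");
static_assert(!boostNewEnoughForCuda(9, packBoostVersion(1, 64, 0)), "CUDA 9 + Boost 1.64.0");
```

In a real build one would presumably pass `__CUDACC_VER_MAJOR__` (defined by nvcc) and `BOOST_VERSION` (from `<boost/version.hpp>`) instead of literals. The check covers only the Boost requirement; it says nothing about the pragma error reported below.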
Currently Loaded Modules:
1) modenv/classic (S) 2) cmake/3.10.1 3) git/2.15.1 4) gcc/7.1.0 5) bullxmpi/1.2.8.4 6) boost/1.65.1-gnu7.1 7) cuda/9.2.88
```bash
cmake .. && make
```
Fails with:
```bash
[ 50%] Building NVCC (Device) object CMakeFiles/bufferCopy.dir/src/bufferCopy_generated_bufferCopy.cpp.o
/sw/taurus/libraries/cuda/9.2.88/bin/nvcc /alpaka/example/bufferCopy/src//bufferCopy.cpp -x=cu -c -o /alpaka/example/bufferCopy/build_cuda/CMakeFiles/bufferCopy.dir/src/./bufferCopy_generated_bufferCopy.cpp.o -ccbin /sw/global/compilers/gcc/7.1.0/bin/g++ -m64 -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_FIBERS_ENABLED -DALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLED -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_DEBUG=0 -Xcompiler ,\"-fopenmp\",\"-g\" --expt-extended-lambda --expt-relaxed-constexpr --generate-code arch=compute_30,code=sm_30 --generate-code arch=compute_30,code=compute_30 -std=c++11 --use_fast_math --ftz=false -DNVCC -I/sw/taurus/libraries/cuda/9.2.88/include -I/sw/taurus/libraries/boost/1.65.1-gnu7.1/include -I/alpaka/include
/alpaka/include/alpaka/core/Assert.hpp(102): error: this pragma must immediately precede a declaration
```
The same error occurs for GCC 5.5 + Boost 1.65.1 + CUDA 9.2.88 and for GCC 7.1 + Boost 1.66.0 + CUDA 9.2.88.
This setup works, but only for CUDA 8:
1) modenv/classic (S) 2) gcc/5.3.0 3) bullxmpi/1.2.8.4 4) boost/1.64.0-gnu5.3 5) git/2.15.1 6) cmake/3.10.1 7) cuda/8.0.61
Edit: gcc 5.3 and 4.9.3 added to table.
Also, as a first try, can you just switch the order of ALPAKA_NO_HOST_ACC_WARNING and ALPAKA_FN_HOST_ACC in the erroring lines?
Is this a CUDA 9.2 or an NVCC 9.2 issue?
(Can you check whether `clang -x cuda` as a device compiler with CUDA 9.2 is affected as well?)
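
To make the "switch the order" suggestion concrete, here is a minimal sketch of the two orderings. The `SKETCH_*` macros are hypothetical stand-ins, not alpaka's actual definitions: the sketch assumes the warning macro expands to a pragma under nvcc and the function macro to `__host__ __device__`, and both expand to nothing on a plain host compiler. It does not claim which order nvcc 9.2 accepts; that is exactly what needs to be tried.

```cpp
// Hypothetical stand-ins for ALPAKA_FN_HOST_ACC and ALPAKA_NO_HOST_ACC_WARNING.
// Assumption: under nvcc the warning macro is a pragma, which is why the
// compiler insists that it "must immediately precede a declaration".
#if defined(__CUDACC__)
    #define SKETCH_FN_HOST_ACC __host__ __device__
    #define SKETCH_NO_HOST_ACC_WARNING _Pragma("hd_warning_disable")
#else
    #define SKETCH_FN_HOST_ACC
    #define SKETCH_NO_HOST_ACC_WARNING
#endif

// Order 1: warning pragma first, then the function attributes.
SKETCH_NO_HOST_ACC_WARNING
SKETCH_FN_HOST_ACC inline int incrementOrder1(int v)
{
    return v + 1;
}

// Order 2: function attributes first, then the warning pragma.
SKETCH_FN_HOST_ACC
SKETCH_NO_HOST_ACC_WARNING
inline int incrementOrder2(int v)
{
    return v + 1;
}
```

On a host-only compiler both variants are identical after preprocessing, so any difference can only show up when the file is compiled as CUDA.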
@BenjaminW3 in the matrix in README.md, let us list the range we currently test instead of "CUDA 8.0+". This is useful when looking at already released versions: if they list CUDA 8.0 - 9.1, one knows at least what was tested.
That does not mean we should assume things break on new releases, but we can document what we know to work.
At least nvcc 9.2 using clang 5.0.1 fails with the following errors:
```bash
/home/travis/llvm/lib/clang/5.0.1/include/smmintrin.h(674): error: argument of type "const __v2di *" is incompatible with parameter of type "__attribute((vector_size(16))) long long *"
/home/travis/llvm/lib/clang/5.0.1/include/avx2intrin.h(836): error: argument of type "const __v4di_aligned *" is incompatible with parameter of type "__attribute((vector_size(32))) long long *"
/home/travis/llvm/lib/clang/5.0.1/include/avx512fintrin.h(9041): error: argument of type "const __v8di_aligned *" is incompatible with parameter of type "__attribute((vector_size(64))) long long *"
```
I am currently doing a CI run testing nvcc 9.2 using clang 4.0.0 but I assume similar results.
I mistakenly tested vectorAdd, but the result should not differ from bufferCopy:
```bash
/sw/taurus/libraries/cuda/9.2.88/bin/nvcc /alpaka/example/vectorAdd/src//main.cpp -x=cu -c -o /alpaka/example/vectorAdd/build_cuda/CMakeFiles/vectorAdd.dir/src/./vectorAdd_generated_main.cpp.o -ccbin /sw/global/compilers/llvm/4.0.0/bin/clang++ -m64 -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLED -DALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLED -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_DEBUG=0 -Xcompiler ,\"-fopenmp=libiomp5\",\"-g\" --expt-extended-lambda --expt-relaxed-constexpr --generate-code arch=compute_30,code=sm_30 --generate-code arch=compute_30,code=compute_30 -std=c++11 --use_fast_math --ftz=false -DNVCC -I/sw/taurus/libraries/cuda/9.2.88/include -I/alpaka/include -I/software/boost_1.65.1/taurus/include
/alpaka/include/alpaka/core/Assert.hpp(103): error: this pragma must immediately precede a declaration
```
`--cuda-path` is set correctly; maybe clang needs something else?
```bash
[ 50%] Building CXX object CMakeFiles/bufferCopy.dir/src/bufferCopy.cpp.o
/sw/taurus/eb/Clang/5.0.0-GCC-6.4.0-2.28/bin/clang++ -DALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLED -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_DEBUG=0 -isystem /sw/taurus/libraries/cuda/9.2.88/include -I/alpaka/include -isystem /software/boost_1.65.1/taurus/include -fopenmp=libomp -Werror -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-disabled-macro-expansion -Wno-global-constructors -Wno-padded --system-header-prefix=boost/ -fopenmp=libomp --cuda-path=/sw/taurus/libraries/cuda/9.2.88 --cuda-gpu-arch=sm_30 -Qunused-arguments -Wno-unused-local-typedef -ffast-math -ffp-contract=fast -std=c++11 -ftemplate-depth=512 -x cuda -o CMakeFiles/bufferCopy.dir/src/bufferCopy.cpp.o -c /alpaka/example/bufferCopy/src/bufferCopy.cpp
clang-5.0: error: cannot find libdevice for sm_30. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
```
CUDA 9.2 is now supported with both gcc and clang. I will investigate clang 5.0 in a separate ticket.