Alpaka: Strategy update for nvc++

Created on 18 May 2020 · 1 Comment · Source: alpaka-group/alpaka

NVIDIA just announced their HPC-SDK: https://news.developer.nvidia.com/HPC-SDK/ and https://developer.nvidia.com/hpc-sdk

What strikes me the most is:
C++17 parallel algorithms enable portable parallel programming using the Standard Template Library (STL). The NVIDIA HPC SDK C++ compiler supports full C++17 on CPUs and offloading of parallel algorithms to NVIDIA GPUs, enabling GPU programming with no directives, pragmas, or annotations. Programs that use C++17 parallel algorithms are readily portable to most C++ implementations for Linux, Windows, and macOS.

This essentially means that we could throw away a lot of the macros that handle __device__, __host__, etc. Furthermore, large parts of the C++ standard library would just work, such as std::array, std::atomic, and maybe even std::vector.
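To make that concrete, here is a minimal sketch of what annotation-free code could look like. It is not taken from Alpaka or from NVIDIA's documentation. The names `Saxpy`, `saxpy`, `sum`, and the opt-in macro `DEMO_USE_PAR` are made up for illustration, and the build line in the first comment is an assumption about how nvc++ would be invoked. Without the macro, the same source builds as ordinary sequential C++17 with any conforming compiler.

```cpp
// Assumed build line for GPU offloading (not verified here):
//   nvc++ -stdpar -DDEMO_USE_PAR saxpy.cpp
#include <algorithm>
#include <numeric>
#include <vector>

#ifdef DEMO_USE_PAR  // hypothetical opt-in macro, not part of any library
#include <execution>
#endif

// A plain function object: no __host__ / __device__ annotations needed.
struct Saxpy {
    float a;
    float operator()(float x, float y) const { return a * x + y; }
};

// Computes y[i] = a * x[i] + y[i] for all i.
inline void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
#ifdef DEMO_USE_PAR
    // With an execution policy, the implementation may run this in parallel.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(), Saxpy{a});
#else
    // Sequential fallback: same algorithm, no policy argument.
    std::transform(x.begin(), x.end(), y.begin(), y.begin(), Saxpy{a});
#endif
}

// Sums all elements of v.
inline float sum(const std::vector<float>& v) {
#ifdef DEMO_USE_PAR
    return std::reduce(std::execution::par_unseq, v.begin(), v.end(), 0.0f);
#else
    return std::accumulate(v.begin(), v.end(), 0.0f);
#endif
}
```

Calling `saxpy(3.0f, x, y)` with `x` filled with 1 and `y` filled with 2 leaves every element of `y` at 5. The only difference between the two paths is the execution policy argument, which is the point of the announcement: the functor itself carries no device-specific decoration.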

Alpaka should at least support this new compiler (which seems to be called nvc++). In the longer run, we should monitor whether NVIDIA deprecates nvcc and whether AMD brings something similar into the mainstream. clang seems to be able to compile standard C++ to HSA as well.

Enhancement

bernhardmgruber

Most helpful comment

Just to add some first-hand answers to questions I asked the relevant Nvidia folks publicly:
As of today, NVC++ is a standard-compliant C++17 compiler with a GPU-enabled parallel STL. It also seems to support OpenACC.
It is not (yet?) a CUDA C++ compiler, so your view of the device is similar to what you have when coding against Thrust: you just use ISO C++ containers and algorithms instead. In fact, the Thrust maintainers develop these ISO C++17 algorithm implementations for the Nvidia platform and will replace Thrust backends with them over time.

Furthermore, NVC++ is a unified host-device compiler. If they add CUDA C++ support, it would become a really neat way to implement highly tuned kernels with less pain. Another use case would be mixing Alpaka kernels and C++17 device algorithms.

Other things that you want to use in kernels are provided by Nvidia's libcu++, which is essentially becoming an ISO standard library for the GPU plus a library of std-conforming C++ extensions for CUDA. https://gist.github.com/ax3l/9489132#device-side-c-standard-support