See this blog post: https://devblogs.nvidia.com/introducing-low-level-gpu-virtual-memory-management/
The disadvantage is that this functionality requires the driver API. Currently only the tests in alpaka depend on the driver API.
If we implement something like that, we need to make sure that we can still build on a system without a CUDA driver. The CUDA driver ships libcuda.so.
On a system without a CUDA driver we need to build against lib/stubs/libcuda.so. Overall it is no big deal, but we need to check whether our CMake can handle it correctly.
CUDA 11.2 introduces cudaMallocAsync and cudaFreeAsync with stream-ordered semantics:
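
As a rough illustration of what stream-ordered allocation looks like with the runtime API, here is a minimal sketch. The kernel `fill`, the `CHECK` macro and the buffer size are made up for this example and are not part of alpaka. It assumes CUDA 11.2 or newer and a device that supports memory pools.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper for this sketch: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: writes `value` into every element of `data`.
__global__ void fill(float* data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}

int main()
{
    const int n = 1 << 20;

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // The allocation is enqueued on the stream; the pointer may only be
    // used by work that is ordered after it on that stream.
    float* d = nullptr;
    CHECK(cudaMallocAsync(reinterpret_cast<void**>(&d), n * sizeof(float), stream));

    fill<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 1.0f);
    CHECK(cudaGetLastError());

    float first = 0.0f;
    CHECK(cudaMemcpyAsync(&first, d, sizeof(float), cudaMemcpyDeviceToHost, stream));

    // The free is enqueued as well, so it happens after the copy above
    // without blocking the host.
    CHECK(cudaFreeAsync(d, stream));

    // Wait for everything on the stream before reading the result.
    CHECK(cudaStreamSynchronize(stream));
    std::printf("first element: %f\n", first);

    CHECK(cudaStreamDestroy(stream));
    return 0;
}
```

Both calls belong to the runtime API, so a sketch like this should not need to link against libcuda.so directly, unlike the virtual memory management functions from the blog post.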