Is it beneficial to copy the functor given to alpaka into constant memory before calling it in the __global__ function?
This Stack Overflow answer says that it might have been useful for compute capability 1.x but is done automatically on newer devices.
The CUDA Programming Guide (version 9.1) confirms this.
However, Kokkos does it manually.
As far as I know, it is a good workaround to avoid the size limit for kernel parameters (256 bytes on compute capability 1.x).
I have also wondered whether we should do it, but I currently have no time to evaluate the advantages and disadvantages.
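For reference, the manual approach mentioned above can be sketched roughly as follows. This is only a minimal illustration, not how Kokkos or alpaka actually implement it: the names `g_functorBuffer`, `kFunctorBufferBytes`, `ScaleFunctor`, `runFromConstant` and `launchViaConstant` are made up for this example, and the buffer size is an arbitrary assumption. Only `cudaMemcpyToSymbol` and the other CUDA runtime calls are real API.

```cuda
#include <cstdio>
#include <cstddef>
#include <type_traits>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical staging buffer in constant memory; the size is an assumption.
constexpr std::size_t kFunctorBufferBytes = 8 * 1024;
__constant__ __align__(16) unsigned char g_functorBuffer[kFunctorBufferBytes];

// Hypothetical functor that is deliberately larger than 256 bytes.
struct ScaleFunctor {
    float factor;
    float padding[127];  // sizeof(ScaleFunctor) is 512 bytes
    __device__ void operator()(float* data, int i) const { data[i] *= factor; }
};

// The kernel takes no functor argument; it reads the functor from constant memory.
template <typename Functor>
__global__ void runFromConstant(float* data, int n) {
    Functor const& f = *reinterpret_cast<Functor const*>(g_functorBuffer);
    int const i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) f(data, i);
}

// Copies the functor into the constant buffer by hand, then launches the kernel.
template <typename Functor>
cudaError_t launchViaConstant(Functor const& f, float* d_data, int n) {
    static_assert(std::is_trivially_copyable<Functor>::value,
                  "functor is copied bytewise");
    static_assert(sizeof(Functor) <= kFunctorBufferBytes,
                  "functor does not fit into the staging buffer");
    cudaError_t err = cudaMemcpyToSymbol(g_functorBuffer, &f, sizeof(Functor));
    if (err != cudaSuccess) return err;
    int const threads = 256;
    int const blocks = (n + threads - 1) / threads;
    runFromConstant<Functor><<<blocks, threads>>>(d_data, n);
    return cudaGetLastError();
}

int main() {
    int const n = 1024;
    std::vector<float> host(n, 1.0f);
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(d_data, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    ScaleFunctor f{2.0f, {}};
    cudaError_t err = launchViaConstant(f, d_data, n);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    cudaMemcpy(host.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("host[0] = %f\n", host[0]);
    cudaFree(d_data);
    return 0;
}
```

One disadvantage visible even in this sketch: all launches share the single constant buffer, so kernels running concurrently in different streams with different functors would overwrite each other's functor unless the launches are serialized.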
As written above, the functor is already copied into constant device memory on compute capability 2.0 and greater.
Therefore I will close this ticket.