Kokkos: Add "DefaultExecutionSpace"-exclusive variants of `KOKKOS_LAMBDA` and `KOKKOS_FUNCTION`

Created on 26 Aug 2019 · 4 Comments · Source: kokkos/kokkos

For the bulk of application code, kernels will always execute in the `DefaultExecutionSpace`. To address some of the slow compilation times caused by Kokkos having to compile every single file in the application with both the host and device compiler, it would be nice if the developer could specify something like `KOKKOS_LAMBDA_DEFAULT` so that the target lambda is compiled only for the default execution space. This would potentially save on code size and compilation time.

For Cuda-as-default, this would mean "default" functions would only receive `__device__` and not `__host__`.

Some work would have to be done to determine if this would actually result in compile-time savings - I'm not fully aware of how much of Cuda compilation time is just from parsing the code vs. actually generating code.

Feature Request awaiting feedback

All 4 comments

We need to discuss this in a larger group. There are some potential downsides down the road for allowing this.

> In order to address some of the slow compilation times caused by Kokkos having to compile every single file in the application with both the host and device compiler,

This sounds like an assertion that needs some experimental backing. Doesn't most of the cost of CUDA compilation come from nvcc needing to make more passes over the file than a regular host-only compiler would need to do? If so, then it would not matter much whether you build the lambdas for host as well as device.

Moreover, there's nothing stopping the user from doing this themselves (it's like three lines of preprocessor code), but doing so within the context of the Kokkos programming model implies that there's a clean way to reason about it in that context and to differentiate it from KOKKOS_LAMBDA. That seems like unnecessary burden for a small thing unless there's substantial experimental evidence that this speeds up compile times by a lot.
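
For reference, the "three lines of preprocessor code" mentioned above might look roughly like the sketch below. The `MYAPP_*` macro names, `scale`, `apply_on_host`, and `self_check` are all hypothetical and not part of Kokkos. The sketch assumes that a CUDA build is detected via `__CUDACC__` and that device-only lambdas are usable with the compiler and Kokkos version at hand (with nvcc this relies on extended-lambda support), neither of which has been verified here.

```cpp
#include <cassert>

// Hypothetical user-side macros (MYAPP_* names are illustrative, not Kokkos API).
// Under a CUDA compiler, annotate for the device only; otherwise add nothing.
#if defined(__CUDACC__)
#define MYAPP_FUNCTION_DEVICE_ONLY __device__
#define MYAPP_LAMBDA_DEVICE_ONLY [=] __device__
#else
#define MYAPP_FUNCTION_DEVICE_ONLY
#define MYAPP_LAMBDA_DEVICE_ONLY [=]
#endif

// A helper that receives only __device__ (no __host__) in a CUDA build.
MYAPP_FUNCTION_DEVICE_ONLY inline int scale(int x, int factor) {
  return x * factor;
}

#if !defined(__CUDACC__)
// Host-only build: the macros expand to plain C++, so the kernel body can be
// called directly. In a CUDA build this function is omitted, because host
// code cannot call a device-only lambda.
inline int apply_on_host(int x) {
  const int factor = 3;
  auto kernel = MYAPP_LAMBDA_DEVICE_ONLY(int i) { return scale(i, factor); };
  return kernel(x);
}

// Small sanity check of the host fallback.
inline void self_check() { assert(apply_on_host(2) == 6); }
#endif
```

In a CUDA build, the lambda macro would be used wherever `KOKKOS_LAMBDA` is used today. Whether this actually shortens compile times is exactly the open question in this thread.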

We discussed this at the team meeting, and we agree that we need more data on this before we proceed. If this is like 0.1% of compile time, then the extra cognitive load isn't worth it, but if it's like 15%, then we should probably provide it, at least as an advanced feature.
