Having KOKKOS_INLINE_FUNCTION_DEFAULTED=inline for GCC+NVCC turns out not to be right. With __host__ __device__ on defaulted functions, NVCC spits out tons of warnings like these:
/home/daibane/src/Trilinos/kokkos/core/src/Kokkos_MemoryPool.hpp(273): warning: __device__ annotation on a defaulted function("MemoryPool") is ignored
/home/daibane/src/Trilinos/kokkos/core/src/Kokkos_MemoryPool.hpp(273): warning: __host__ annotation on a defaulted function("MemoryPool") is ignored
Despite those warnings, removing __host__ __device__ gives incorrect behavior. In particular, NVCC will call the __host__ version of a defaulted copy constructor even when the call is inside device code. It emits no warning about the declaration, but does emit this warning at the call site:
/home/daibane/src/Trilinos/kokkos/core/unit_test/TestTaskScheduler.hpp(91): warning: calling a __host__ function from a __host__ __device__ function is not allowed
None of our tests actually exercised this case, but Tacho did (it has a MemoryPool as a member of a task).
I reproduced the failure by adding a test that has a MemoryPool as a member of a task (the test segfaults). MemoryPool's defaulted copy constructor uses KOKKOS_INLINE_FUNCTION_DEFAULTED. Changing that to KOKKOS_INLINE_FUNCTION produces many warnings of the first kind, but the code runs without segfaulting.
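A minimal sketch of the two declarations being compared, to make the one-line change concrete. HD_INLINE_FUNCTION and HD_INLINE_FUNCTION_DEFAULTED are hypothetical stand-ins for KOKKOS_INLINE_FUNCTION and KOKKOS_INLINE_FUNCTION_DEFAULTED (not the real Kokkos definitions), and PoolSketch is a hypothetical, heavily simplified type, not Kokkos::MemoryPool. Under NVCC the macros add the annotations; with a plain host compiler they reduce to inline, so the sketch compiles either way.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the two Kokkos macros discussed above.
// Under NVCC (__CUDACC__ defined) they mimic the annotations; with a plain
// host compiler both reduce to `inline`, so this sketch compiles anywhere.
#if defined(__CUDACC__)
#define HD_INLINE_FUNCTION __host__ __device__ inline
#define HD_INLINE_FUNCTION_DEFAULTED inline  // the problematic choice: no annotations
#else
#define HD_INLINE_FUNCTION inline
#define HD_INLINE_FUNCTION_DEFAULTED inline
#endif

// Hypothetical simplified pool; only the copy constructor matters here.
struct PoolSketch {
  int* data = nullptr;
  std::size_t capacity = 0;

  PoolSketch() = default;

  // Failing variant (per the report): without annotations, NVCC ends up
  // calling a __host__ copy constructor even from device code.
  // HD_INLINE_FUNCTION_DEFAULTED PoolSketch(const PoolSketch&) = default;

  // Workaround from the report: keep `= default` but annotate it.
  // NVCC warns that the annotations are ignored, yet the code runs.
  HD_INLINE_FUNCTION PoolSketch(const PoolSketch&) = default;
};
```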
I consider this a very frustrating bug on the part of NVIDIA, and can see no better option than to simply stop using defaulted methods altogether.
I'm putting together a PR with two commits: one adding the reproducer test and one removing the defaulted functions.
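What "removing defaulted functions" could look like for a copyable, pool-like type: a minimal sketch, assuming the special members are simply spelled out member-wise so the host/device annotation applies to a real function body. HD_INLINE_FUNCTION is a hypothetical stand-in for KOKKOS_INLINE_FUNCTION, and PoolNoDefault with its members is hypothetical; this is not the actual MemoryPool change.

```cpp
#include <cstddef>

// Hypothetical stand-in for KOKKOS_INLINE_FUNCTION (not the real Kokkos definition).
#if defined(__CUDACC__)
#define HD_INLINE_FUNCTION __host__ __device__ inline
#else
#define HD_INLINE_FUNCTION inline
#endif

// Hypothetical pool-like type with no defaulted special members.
class PoolNoDefault {
 public:
  HD_INLINE_FUNCTION PoolNoDefault() : m_data(nullptr), m_capacity(0) {}
  HD_INLINE_FUNCTION PoolNoDefault(int* data, std::size_t capacity)
      : m_data(data), m_capacity(capacity) {}

  // Written out member-wise instead of `= default`, so the annotation is
  // not ignored and a device-callable copy constructor exists.
  HD_INLINE_FUNCTION PoolNoDefault(const PoolNoDefault& rhs)
      : m_data(rhs.m_data), m_capacity(rhs.m_capacity) {}

  // Same treatment for copy assignment.
  HD_INLINE_FUNCTION PoolNoDefault& operator=(const PoolNoDefault& rhs) {
    m_data = rhs.m_data;
    m_capacity = rhs.m_capacity;
    return *this;
  }

  HD_INLINE_FUNCTION std::size_t capacity() const { return m_capacity; }
  HD_INLINE_FUNCTION int* data() const { return m_data; }

 private:
  int* m_data;
  std::size_t m_capacity;
};
```

The trade-off is maintenance: every hand-written copy constructor and assignment operator has to be kept in sync with the member list, which `= default` would otherwise handle automatically.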
@kyungjoo-kim @crtrott @ndellingwood
Filed NVIDIA bug 2083242 for this contradictory behavior.
Based on the discussion in #1473, we've found that the "annotation on a defaulted function is ignored" warning can be suppressed with:
-Xcudafe --diag_suppress=esa_on_defaulted_function_ignored
#1471 is the PR, and we should consider patching Trilinos as well (currently Tacho just segfaults).