I get an compile error when using the CUDA atomic add function. If the CPU serial accelerator is used instead of the CUDA accelerator, it compiles and runs flawlessly.
The code used to create this error can be found here:
atomicAddTest.cpp.txt
The accompanying log can be found here:
log.txt
Edit:
Loaded Modules:
Test System:
I have not had a look at your code but my first thought is that you might have used a type that is not supported with CUDA atomic ops. CPU atomic ops are supported for many (if not all) types. Which type do you use?
My code uses atomic add with uint64_t values.
Ah, that's an odd problem deeply rooted in the C (!) type system.
CUDA provides atomic (add) functions for (unsigned) int and long long (int), which we implement.
C defines the following fundamental integer types (signed for sake of simplicity):
shortintlonglong longthese are the fundamental types, all others such as uint64_t are aliases of those.
Now on most x86 systems, such as yours, long and long long implement the same precision. Yet, they are not identical types nor are they aliases of each other.
Now things get weird, because:
uint64_t is an alias of unsigned long int on your systemunsigned long long int which we overloadunsigned long int is the same implementation as a unsigned long long intIn order to make this more comfortable for you, we should overload all fundamental types in our atomic functions and reinterpret_cast when allowed. I did something very similar recently in https://github.com/openPMD/openPMD-api/pull/337
Short-term work-around for users: use unsigned long long instead of uint64_t.
(Yes, that's sad but at least in actual data models of implementations always a 64bit uint).
Mind-blown? At least I was. Play around here: https://cuda.godbolt.org/z/DQyfge
#include <type_traits>
#include <cstdint>
// fundamental types are distinct
static_assert(std::is_same<unsigned long long int, unsigned long int>::value); // Nope
// yet might implement the same precision (and arithmetics)
static_assert(sizeof(unsigned long long int) == sizeof(unsigned long int)); // OK
// all other types we love are just aliases of these fundamental types
static_assert(std::is_same<unsigned long int, uint64_t>::value); // OK
static_assert(std::is_same<unsigned long long int, uint64_t>::value); // Nope
We may not need to use overly specific types that depend on the platform. We should be able to use SFINAE and use std::enable_if<std::is_integral<T>::value && std::is_unsigned<T>::value && sizeof(T) == 8>::value and a reinterpret_cast.
We had ticket #614 that asked for a better compiler error for such unsupported atomic type compilation errors. Within the ticket I only impemented error messages only for the cases where the operation is not supported due to too old sm. I think this user issue is a good reason to add a better error message for the cases where CUDA does not support the type.
I would not really say that this is a bug, it is more like a feature request. Furthermore, I do not think that we should backport every such enhancement to 0.3.4. It is already hard to create those backports. We should create a 0.4.0 release directly after releasing 0.3.4.
With #676 the error should not look so cryptic.
I agree with the approach for the enable_if.
We may not need to use overly specific types that depend on the platform.
Just to be precise, there are no platform specific types, only platform specific type aliases. If we allow in our API all fundamental int/float types and static_assert the ones not matching the "concept" check we are good to go, as you described.
I do not think that we should backport every such enhancement to 0.3.4
I know, but that's a bug so it needs to be at least documented for this version. We don't need it in the latest PIConGPU stable right now, so we don't backport it.
What I did: I just added it to the project "0.3.4" and moved it directly to column "skipped". We are using the project tab as a "known issues" list for such reports, in case one would hit/need it later on. It just documents our workflow and downstream issue triage.
Thanks for clearing that up! The code works now with unsigned long long.