Alpaka: Allowed Types in CUDA atomic add

Created on 8 Oct 2018  路  7Comments  路  Source: alpaka-group/alpaka

I get an compile error when using the CUDA atomic add function. If the CPU serial accelerator is used instead of the CUDA accelerator, it compiles and runs flawlessly.

The code used to create this error can be found here:
atomicAddTest.cpp.txt

The accompanying log can be found here:
log.txt

Edit:

Loaded Modules:

  • cmake 3.11.3
  • git 2.19.0
  • gcc 6.4.0
  • cuda 9.2
  • boost 1.68.0

Test System:

  • 2x Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz (12 Cores/24 Threads)
  • 4x NVIDIA Tesla P100
  • CentOS 7 (with Linux kernel 3.10)
CUDA Bug Refactoring

All 7 comments

I have not had a look at your code but my first thought is that you might have used a type that is not supported with CUDA atomic ops. CPU atomic ops are supported for many (if not all) types. Which type do you use?

My code uses atomic add with uint64_t values.

Ah, that's an odd problem deeply rooted in the C (!) type system.

CUDA provides atomic (add) functions for (unsigned) int and long long (int), which we implement.
C defines the following fundamental integer types (signed for sake of simplicity):

  • short
  • int
  • long
  • long long

these are the fundamental types, all others such as uint64_t are aliases of those.

Now on most x86 systems, such as yours, long and long long implement the same precision. Yet, they are not identical types nor are they aliases of each other.

Now things get weird, because:

  • uint64_t is an alias of unsigned long int on your system
  • Nvidia implements only unsigned long long int which we overload
  • yes, the unsigned long int is the same implementation as a unsigned long long int

In order to make this more comfortable for you, we should overload all fundamental types in our atomic functions and reinterpret_cast when allowed. I did something very similar recently in https://github.com/openPMD/openPMD-api/pull/337

Short-term work-around for users: use unsigned long long instead of uint64_t.
(Yes, that's sad but at least in actual data models of implementations always a 64bit uint).

Mind-blown? At least I was. Play around here: https://cuda.godbolt.org/z/DQyfge

#include <type_traits>
#include <cstdint>

// fundamental types are distinct
static_assert(std::is_same<unsigned long long int, unsigned long int>::value); // Nope

// yet might implement the same precision (and arithmetics)
static_assert(sizeof(unsigned long long int) == sizeof(unsigned long int)); // OK

// all other types we love are just aliases of these fundamental types
static_assert(std::is_same<unsigned long int, uint64_t>::value); // OK
static_assert(std::is_same<unsigned long long int, uint64_t>::value); // Nope

We may not need to use overly specific types that depend on the platform. We should be able to use SFINAE and use std::enable_if<std::is_integral<T>::value && std::is_unsigned<T>::value && sizeof(T) == 8>::value and a reinterpret_cast.

We had ticket #614 that asked for a better compiler error for such unsupported atomic type compilation errors. Within the ticket I only impemented error messages only for the cases where the operation is not supported due to too old sm. I think this user issue is a good reason to add a better error message for the cases where CUDA does not support the type.

I would not really say that this is a bug, it is more like a feature request. Furthermore, I do not think that we should backport every such enhancement to 0.3.4. It is already hard to create those backports. We should create a 0.4.0 release directly after releasing 0.3.4.

With #676 the error should not look so cryptic.

I agree with the approach for the enable_if.

We may not need to use overly specific types that depend on the platform.

Just to be precise, there are no platform specific types, only platform specific type aliases. If we allow in our API all fundamental int/float types and static_assert the ones not matching the "concept" check we are good to go, as you described.

I do not think that we should backport every such enhancement to 0.3.4

I know, but that's a bug so it needs to be at least documented for this version. We don't need it in the latest PIConGPU stable right now, so we don't backport it.

What I did: I just added it to the project "0.3.4" and moved it directly to column "skipped". We are using the project tab as a "known issues" list for such reports, in case one would hit/need it later on. It just documents our workflow and downstream issue triage.

Thanks for clearing that up! The code works now with unsigned long long.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

tdd11235813 picture tdd11235813  路  5Comments

ax3l picture ax3l  路  5Comments

SimeonEhrig picture SimeonEhrig  路  5Comments

BenjaminW3 picture BenjaminW3  路  3Comments

shefmarkh picture shefmarkh  路  4Comments