Alpaka: random number generators not identical across accelerators

Created on 30 Aug 2017 · 7 Comments · Source: alpaka-group/alpaka

Currently, the random number generation method for an accelerator is fixed within alpaka.
To provide different generators per accelerator depending on the user's needs, we should consider an interface change.

Why we need different generators:

  • each generator provides random numbers of a different quality
  • the size of the stored random number state differs between generators and influences memory usage and performance
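To make the memory point concrete, here is a small host-side sketch that uses engines from the C++ standard library as stand-ins (not alpaka's own generator types). The helper `stateBytes` is hypothetical; exact `sizeof` values are implementation-defined, so only relative sizes are checked.

```cpp
#include <cstddef>
#include <random>

// Bytes needed to keep one generator state per thread
// (hypothetical helper, for illustration only).
template<typename Engine>
constexpr std::size_t stateBytes(std::size_t numStates)
{
    return sizeof(Engine) * numStates;
}

// The 32-bit Mersenne Twister carries 624 words of state,
// while a minimal linear congruential engine carries a single word.
static_assert(std::mt19937::state_size == 624, "MT19937 keeps 624 words of state");
```

With one state per thread and about one million threads, the Mersenne Twister alone needs on the order of gigabytes, while a one-word engine stays in the megabyte range.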

For example, PIConGPU provided different methods until my pull request switched it to the native alpaka generator, which removes the user's ability to control the quality of the RNG.

Labels: Wontfix, Enhancement


All 7 comments

In alpaka, the generators are already separated from the distributions.
So in theory it would be possible to use different generators. However, there is some work to do:

  1. The CUDA backend only provides the Xor generator. XorMin, MRG32k3a and MRG32k3aMin could easily be added to RandCuRand.hpp. Please create corresponding pull requests.
  2. The CPU backends only provide a Mersenne Twister generator. ~(Not even as a standalone generator, but only directly within alpaka::rand::generator::createDefault.) This generator should be made its own class.~ There are some other generators provided by the C++ standard library which could be added to RandStl.hpp.
  3. I have no idea how to implement Xor, XorMin, ... generators similar to the CUDA ones for the CPU backends.
  4. Due to those differences between the available generators for CUDA and CPU, there is no way to use the same generator on all backends. alpaka::rand::generator::createDefault simply uses an unspecified generator.
  5. ~There are no unit tests for the random functions.~

Points 1, 2 and 5 can be solved, but point 4 depends on point 3, which may be very hard.

Edit: point 5 has been solved.
Edit: parts of point 2 have been solved.
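The generator/distribution split described above is the same idea the C++ standard library uses, which the CPU backend builds on. A minimal host-side sketch, standard library only and not alpaka API (the function names are hypothetical): the distribution code is written once and works with any engine, so swapping generators changes quality, speed and state size but not the calling code.

```cpp
#include <cstdint>
#include <random>

// The distribution is written once and accepts any engine type.
template<typename Engine>
float drawUniform(Engine& engine)
{
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    return dist(engine);
}

// Two standard-library engines a CPU backend could expose
// (illustrative choice, not alpaka's current generator list).
inline float drawWithMersenneTwister(std::uint32_t seed)
{
    std::mt19937 engine(seed);
    return drawUniform(engine);
}

inline float drawWithRanlux(std::uint32_t seed)
{
    std::ranlux24 engine(seed);
    return drawUniform(engine);
}
```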

Thanks for the summary. Point 3 is one reason why I opened this pull request. I think it is impossible, or too hard to maintain, to have all generators on all platforms.
I will also use this issue to think about how we can handle each platform possibly shipping different algorithms while still giving users the opportunity to write code without #if to handle the differences between platforms.

One idea is to create something like a factory where the user can set properties such as quality, performance and memory usage, and gets back the type of the best-fitting generator. If a platform implements only one algorithm, then the same generator will always be returned.
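A minimal compile-time sketch of such a factory, assuming the properties can be reduced to a single preference for brevity. All names (`Preference`, `CpuBackend`, `SingleGeneratorBackend`, `GeneratorFactory`) are hypothetical, and standard-library engines stand in for backend generators. Resolving the type at compile time keeps user code free of #if while each backend decides what it can offer.

```cpp
#include <random>
#include <type_traits>

// Hypothetical property a user could request.
enum class Preference
{
    HighQuality,
    SmallState
};

// Hypothetical backend tags.
struct CpuBackend {};
struct SingleGeneratorBackend {};

// Each backend specializes this to map a preference to a generator it implements.
template<typename Backend, Preference P>
struct GeneratorFactory;

// A backend with several algorithms picks a different type per preference.
template<>
struct GeneratorFactory<CpuBackend, Preference::HighQuality> { using type = std::mt19937; };

template<>
struct GeneratorFactory<CpuBackend, Preference::SmallState> { using type = std::minstd_rand; };

// A backend with only one algorithm returns the same generator for every preference.
template<Preference P>
struct GeneratorFactory<SingleGeneratorBackend, P> { using type = std::mt19937; };

template<typename Backend, Preference P>
using GeneratorFor = typename GeneratorFactory<Backend, P>::type;

// User code only names the preference, never the concrete algorithm.
template<typename Backend>
using SimulationRng = GeneratorFor<Backend, Preference::SmallState>;

static_assert(std::is_same<SimulationRng<SingleGeneratorBackend>, std::mt19937>::value,
              "a single-algorithm backend always returns the same generator");
```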

Such a factory might be the only viable option, although it might be hard to find the right properties to describe the generators.
I will work on point 5 and write some unit tests for the existing generators/distributions, because I am already adding some stream and event unit tests at the moment.

Admittedly, a typical PIConGPU 0.4.0-dev simulation on a Tesla P100 currently uses (wastes) 18-25% of its main memory (3 out of 12/16 GByte) just for the RNG state. Can we do anything to allow backend-specific RNGs again, like the one we had before https://github.com/ComputationalRadiationPhysics/picongpu/pull/2226 (~50% of the memory footprint)? It would be totally fine if that RNG were only usable on a specific backend (e.g. via a less specific wrapper/factory as above) and another implementation (and API) were used on other backends.
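The quoted range is the 3 GByte of RNG state relative to the two memory sizes. Assuming the number of states stays the same, a state with ~50% of the footprint would halve the share:

```latex
\frac{3\,\text{GByte}}{16\,\text{GByte}} = 18.75\,\%, \qquad
\frac{3\,\text{GByte}}{12\,\text{GByte}} = 25\,\%

\frac{1.5\,\text{GByte}}{16\,\text{GByte}} \approx 9.4\,\%, \qquad
\frac{1.5\,\text{GByte}}{12\,\text{GByte}} = 12.5\,\%
```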

Cross-linking https://github.com/ComputationalRadiationPhysics/picongpu/pull/2410, in which @psychocoderHPC works around this in parallel by bringing XorMin (6 × int32 state per thread; 50% of the footprint of the current RNG) back into PIConGPU.
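Regarding point 3 (Xor-style generators on the CPU), here is a sketch of Marsaglia's xorwow recurrence, the algorithm family behind cuRAND's XORWOW generator, with a state of exactly six 32-bit words. Assumptions: that XorMin follows this recurrence (the linked PR has the actual implementation), and that seeding is left to the caller. Seeding and subsequence skip-ahead here are not cuRAND-compatible, so this does not reproduce cuRAND's output for a given seed.

```cpp
#include <cstdint>

// Five xorshift words plus a Weyl-sequence counter: 6 x 32 bit = 24 bytes.
// The xorshift words x[0..4] must not all be zero.
struct XorwowState
{
    std::uint32_t x[5];
    std::uint32_t counter;
};

// One step of the xorwow recurrence; x[4] is the oldest word, x[0] the newest.
inline std::uint32_t xorwowNext(XorwowState& s)
{
    std::uint32_t t = s.x[4];
    std::uint32_t const newest = s.x[0];
    // Shift the word window by one position.
    s.x[4] = s.x[3];
    s.x[3] = s.x[2];
    s.x[2] = s.x[1];
    s.x[1] = newest;
    // Xorshift mixing of the oldest and newest words.
    t ^= t >> 2;
    t ^= t << 1;
    t ^= newest ^ (newest << 4);
    s.x[0] = t;
    // Add the Weyl sequence to the output.
    s.counter += 362437u;
    return t + s.counter;
}

static_assert(sizeof(XorwowState) == 6 * sizeof(std::uint32_t), "six 32-bit words of state");
```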

Proposed implementation:

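```cpp
// Proposed addition to namespace alpaka::rand
namespace alpaka
{
    namespace rand
    {
        enum class Generator
        {
            Default,
            MersenneTwister
            // , ...
        };
    }
}

// ...
auto genMersenneTwister = alpaka::rand::generator::create<
    alpaka::rand::Generator::MersenneTwister>(
    acc,
    12345u,  // seed (assumed to mirror createDefault)
    6789u);  // subsequence (assumed to mirror createDefault)
```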
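A host-only sketch of how a CPU backend might resolve the proposed enum to a concrete engine. `Generator`, `EngineFor` and `create` here are hypothetical stand-ins mirroring the proposal, not existing alpaka API, and the accelerator argument is omitted. Standard engines have no subsequence concept, so seed and subsequence are simply mixed through `std::seed_seq`; unlike cuRAND subsequences, this does not guarantee non-overlapping streams.

```cpp
#include <cstdint>
#include <random>

enum class Generator
{
    Default,
    MersenneTwister
};

// Maps each enum value to the engine this backend uses for it.
template<Generator G>
struct EngineFor;

// On this hypothetical CPU backend, Default happens to be the Mersenne Twister.
template<>
struct EngineFor<Generator::Default> { using type = std::mt19937; };

template<>
struct EngineFor<Generator::MersenneTwister> { using type = std::mt19937; };

template<Generator G>
typename EngineFor<G>::type create(std::uint32_t seed, std::uint32_t subsequence)
{
    using Engine = typename EngineFor<G>::type;
    // Mix seed and subsequence into one seed sequence (simple stand-in).
    std::seed_seq seq{seed, subsequence};
    Engine engine(seq);
    return engine;
}
```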

We discussed this in today's meeting. @sliwowitz is currently working on a separate RNG library on top of alpaka that will address this issue. This is therefore WONTFIX and will be closed once the new RNG library is public.
