Alpaka: number of threads per block

Created on 23 May 2018 · 7 Comments · Source: alpaka-group/alpaka

I am still playing with the vectorAdd example. When I execute the program on different Accs, the number of threads per block seems to be fixed by the Acc I choose. For example, I get:

AccCpuFibers: blockThreadExtent: (4) [I understand that blockThreadExtent is number of threads per block]
AccCpuThreads: blockThreadExtent: (256)
AccCpuOmp2Threads: blockThreadExtent: (32)

So the number of threads per block is 4, 256 and 32, respectively. Where does alpaka set the number of threads for each Acc?

Most helpful comment

The number of threads per block can be chosen freely. It is part of the work division (index domain subdivision) as well as the number of blocks per grid and the number of elements per thread.

However, some accelerators are very limited in what they support. In particular, the number of threads per block is often limited by the hardware, because the threads of a block really have to execute in parallel for thread synchronization to work. The limits of a given accelerator type on a given device can be read out via alpaka::acc::getAccDevProps<Acc>(dev). This function returns an AccDevProps structure with all the limits.
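To make the role of these limits concrete, here is a minimal C++ sketch that does not use alpaka at all. The `Limits` and `WorkDiv` structs, their field names, and `isValidWorkDiv` are hypothetical stand-ins invented for illustration; they are not alpaka's `AccDevProps` or any alpaka function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the limits of one accelerator on one device.
// This is NOT alpaka's AccDevProps; the field names are assumptions.
struct Limits
{
    std::size_t gridBlockCountMax;   // maximum number of blocks per grid
    std::size_t blockThreadCountMax; // maximum number of threads per block
    std::size_t threadElemCountMax;  // maximum number of elements per thread
};

// Hypothetical 1D work division: blocks per grid, threads per block,
// elements per thread.
struct WorkDiv
{
    std::size_t gridBlockExtent;
    std::size_t blockThreadExtent;
    std::size_t threadElemExtent;
};

// Returns true if every extent is at least 1 and within its limit.
inline bool isValidWorkDiv(WorkDiv const& wd, Limits const& lim)
{
    // A limit of zero would make every work division invalid, so treat it as a bug.
    assert(lim.gridBlockCountMax >= 1 && lim.blockThreadCountMax >= 1
           && lim.threadElemCountMax >= 1);

    auto const inRange = [](std::size_t value, std::size_t max)
    { return value >= 1 && value <= max; };

    return inRange(wd.gridBlockExtent, lim.gridBlockCountMax)
        && inRange(wd.blockThreadExtent, lim.blockThreadCountMax)
        && inRange(wd.threadElemExtent, lim.threadElemCountMax);
}
```

With a made-up limit of 4 threads per block, `isValidWorkDiv(WorkDiv{10, 4, 1}, lim)` holds, while a block of 256 threads is rejected. This is the kind of check that makes a hard-coded work division portable to one accelerator but not necessarily to another.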

A valid work division for a given problem size (index domain) depends on the accelerator and device in use. To make it easier to switch between accelerators, alpaka provides an alpaka::workdiv::getValidWorkDiv helper function, which takes the problem size, the accelerator, the device and some additional constraints, and calculates a valid work division for this accelerator.
This getValidWorkDiv helper function is used by the vectorAdd example. However, using it is not required (all the other examples simply hard-code the work division for their hard-coded accelerator).
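The general idea behind such a helper can be sketched in plain C++. This is an illustration under stated assumptions, not alpaka's actual algorithm: `WorkDiv1D`, `divCeil` and `makeValidWorkDiv` are hypothetical names, the sketch is one-dimensional, and it considers only the threads-per-block limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical 1D work division (not an alpaka type).
struct WorkDiv1D
{
    std::size_t gridBlockExtent;   // blocks per grid
    std::size_t blockThreadExtent; // threads per block
    std::size_t threadElemExtent;  // elements per thread
};

// Integer division rounding up; b must be non-zero.
inline std::size_t divCeil(std::size_t a, std::size_t b)
{
    return (a + b - 1) / b;
}

// Clamp the preferred block size to the device limit, then pick enough
// blocks to cover the whole problem size.
inline WorkDiv1D makeValidWorkDiv(
    std::size_t problemSize,
    std::size_t blockThreadCountMax, // assumed device limit
    std::size_t preferredThreads,    // what the caller would like to use
    std::size_t elemsPerThread = 1)
{
    assert(problemSize > 0 && blockThreadCountMax > 0);
    assert(preferredThreads > 0 && elemsPerThread > 0);

    // Never exceed the limit of the accelerator.
    std::size_t threads = std::min(preferredThreads, blockThreadCountMax);
    // Do not start more threads per block than there is work.
    threads = std::min(threads, divCeil(problemSize, elemsPerThread));

    // Enough blocks so that blocks * threads * elems >= problemSize.
    std::size_t const blocks = divCeil(problemSize, threads * elemsPerThread);
    return WorkDiv1D{blocks, threads, elemsPerThread};
}
```

For a problem size of 1000 and a preferred block size of 256, a limit of 4 threads per block gives 250 blocks of 4 threads, while a limit of 256 gives 4 blocks of 256 threads. Because the last block may overshoot the problem size, a kernel using such a division still has to bounds-check its index.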

All 7 comments

@ax3l, thanks for your answer!

Exactly, a backend already chooses an "optimal" block size (number of threads per block) depending on the target. One can still override it with a C++ trait or derive a backend with different work-splitting.

Currently, the optimal sizes are calculated in

https://github.com/ComputationalRadiationPhysics/alpaka/blob/master/include/alpaka/workdiv/WorkDivHelpers.hpp

from device properties.

Thanks for documenting the question & answer! :)

I do not see any open question anymore. @ax3l Why have you reopened it?

I was unsure if this is something we want to add to the manual, e.g. in a FAQ section

@ax3l @psychocoderHPC
I am still thinking about renaming "work division" (workdiv, workDiv) to "subdivision" (subdiv, subDiv) and moving it into the idx namespace, because I think "index (domain) subdivision" is a better fit than "work division".
The current alpaka::workdiv namespace could be merged into the alpaka::idx namespace.
This would result in the following renamings:

  • alpaka::workdiv::getWorkDiv -> alpaka::idx::getSubDiv
  • alpaka::workdiv::getValidWorkDiv -> alpaka::idx::calcValidSubDiv

From a naming perspective that sounds reasonable, but why would you like to merge the namespaces into idx? Could this cause some confusion, and might it be easier to grasp if they stay separate?
