Alpaka has a well-structured hierarchy of folders and namespaces. In the current structure all backends live in one folder tree. The advantage of this structure is that the trait implementations for all backends sit close to each other. A disadvantage is that it is hard to isolate all code that belongs to a particular backend.
My suggestion is to restructure the folders and group all code for a specific backend under a subfolder.
Suggestion:
alpaka
  |-- atomic
  |-- core
  |-- dim
  ...
  |-- backends (name is not fixed, can also be "architectures", ...)
      |- cuda
      |- cpu
      |- acc
Such a change would also allow us to create unified code for HIP/CUDA without the limitation that we can only activate either HIP or CUDA within one binary. This is possible by setting a define that specifies which backend should be used before we include all files of the backend. Then the define is changed to the next backend type and the files are included again.
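A minimal single-file sketch of the define-then-include idea, under stated assumptions: all names (demo, Backend, Queue, DEMO_UNIFIED_GPU_BACKEND) are hypothetical and not part of alpaka, and a function-like macro stands in for the shared backend headers that would really be included once per backend after changing the define.

```cpp
// Hypothetical sketch, not alpaka code.
namespace demo
{
    enum class Backend
    {
        Cuda,
        Hip
    };
}

// Stand-in for the shared backend headers. In a real tree this would be a set
// of files without a classic include guard, included once per backend after
// the selecting define has been changed. NS is the target namespace, TAG the
// backend the code is generated for.
#define DEMO_UNIFIED_GPU_BACKEND(NS, TAG)                 \
    namespace demo                                        \
    {                                                     \
        namespace NS                                      \
        {                                                 \
            struct Queue                                  \
            {                                             \
                static constexpr Backend backend()        \
                {                                         \
                    return Backend::TAG;                  \
                }                                         \
            };                                            \
        }                                                 \
    }

// First "inclusion": generate the CUDA flavour.
DEMO_UNIFIED_GPU_BACKEND(cuda, Cuda)
// Second "inclusion": same source, now generating the HIP flavour.
DEMO_UNIFIED_GPU_BACKEND(hip, Hip)

// Both flavours coexist in one translation unit.
static_assert(
    demo::cuda::Queue::backend() != demo::hip::Queue::backend(),
    "the two instantiations must be distinct backends");
```

The point of the sketch is only that one body of source can yield two distinct, simultaneously usable types when the selecting define changes between inclusions.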
cc-ing: @ComputationalRadiationPhysics/alpaka-developers @tdd11235813
What do you expect to be placed under backends/cpu/?
I already thought about moving what is currently in the alpaka/dev, alpaka/acc and alpaka/exec subfolders into the matching folders under alpaka/pltf, but as far as I understand you, you want to go much further. (This would not even be correct for a HIP backend, because it cannot be moved into a folder for gpu, cpu, cuda or anything like that.)
Currently the folder structure also exactly mirrors the namespace structure. So you have to keep this in mind when doing such changes.
Furthermore, you have to keep in mind that many files cannot (and most should not) be assigned to a specific accelerator/backend. Many files are used by multiple accelerators, and future accelerators could create even more such dependencies. It could be possible to create another CUDA accelerator with some different characteristics (other RNG or math implementations, etc.), so even the implementation details of the CUDA accelerator may not be used exclusively by one CUDA accelerator and should therefore not be placed within a backends/cuda folder.
I do not see a problem for unified code between HIP/CUDA in the current state, we would just need some new headers and it should already work exactly as you describe it.
What do you expect to be placed under backends/cpu/?
Everything that is currently branded with the name *CPU.
This would not even be correct for a HIP backend because it can not be moved into folders for gpu, cpu, cuda or something like this
HIP would get its own subfolder because it currently also has its own naming *Hip*, e.g. QueueHipRtAsync. It would then also be possible to move everything in the Hip subfolder into its own namespace, so that we no longer need to brand all classes with CPU or CUDA, because we can separate implementations by namespace.
Currently the folder structure also exactly mirrors the namespace structure. So you have to keep this in mind when doing such changes.
We can keep this structure. The only difference is that we would not create a namespace for the backends subfolder. Everything under the backends folder has the same structure as it would have directly under include/alpaka.
I do not see a problem for unified code between HIP/CUDA in the current state, we would just need some new headers and it should already work exactly as you describe it.
The point here is that it is very hard to maintain. A collection include for a subfolder is easy to understand. Creating a collection include for a backend somewhere that picks files from everywhere in the alpaka code is confusing.
I agree that it is hard to navigate through the current structure, and I would also like to see a back-end centric structure whose subfolders are more self-contained; maybe it would also allow us to reduce the depth of the structure tree. The interface for a back-end would be clearer.
It will not be possible or reasonable to keep the source subtrees completely disjoint with respect to back-end specific code, but the overlap can be kept to a minimum.
So the new structure could try to reduce the amount of hidden back-end related code at the same time.
The namespace structure is currently mostly mapped to the file structure, but there are namespaces like traits whose members are specialized across several files, IIRC.
Maybe most of the namespace logic can be kept (as a namespace hierarchy, alpaka::exec::cuda still looks preferable to alpaka::cuda::exec).
Just to be sure: would the back-end subfolder contain a subfolder scheme like math, atomic, ...?
|-- backends (name is not fixed, can also be "architectures", ...)
    |- cuda
        |- atomic
        |- block
            |- shared
                |- dyn
                |- st // maybe pack st/* and dyn/* all under shared/*
        |- mem
            |- alloc
            |- buf
            |- ... // maybe pack it all under mem/
        |- rand // ... could still provide more than one RNG library
        |- ...
Ok, now there are sources which target multiple back-ends, as Benjamin already mentioned.
I looked for CUDA in non-CUDA back-ends (btw the search would be easier in a back-end centric structure):
How about BOOST_ARCH_PTX switches:
$ grep -r "BOOST_ARCH_PTX" include|grep -v Cuda|grep -v cuda
include/alpaka/core/ConcurrentExecPool.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/core/ConcurrentExecPool.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/core/ConcurrentExecPool.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/core/ConcurrentExecPool.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/core/ConcurrentExecPool.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/core/ConcurrentExecPool.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/core/Common.hpp://! Most cases can be solved by #if BOOST_ARCH_PTX or #if BOOST_LANG_CUDA.
include/alpaka/core/Common.hpp:#if BOOST_LANG_CUDA && BOOST_ARCH_PTX
include/alpaka/core/Common.hpp:#if BOOST_LANG_CUDA && BOOST_ARCH_PTX
include/alpaka/core/BoostPredef.hpp:#if !defined(BOOST_ARCH_PTX)
include/alpaka/core/BoostPredef.hpp: #define BOOST_ARCH_PTX BOOST_PREDEF_MAKE_10_VRP(__CUDA_ARCH__)
include/alpaka/core/BoostPredef.hpp: #define BOOST_ARCH_PTX BOOST_VERSION_NUMBER_NOT_AVAILABLE
include/alpaka/core/Unroll.hpp:#if BOOST_ARCH_PTX
include/alpaka/queue/QueueCpuAsync.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/queue/QueueCpuAsync.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/event/EventCpu.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/event/EventCpu.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/event/EventCpu.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/event/EventCpu.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/exec/ExecCpuThreads.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/exec/ExecCpuThreads.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/exec/ExecCpuThreads.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/exec/ExecCpuFibers.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/exec/ExecCpuFibers.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
I guess many of these switches are there to hide code from the other back-end; for example, some Fibers code should not be seen by Clang (CUDA) so that it compiles safely.
Another example is include/alpaka/core/ConcurrentExecPool.hpp. It contains several lines of code that are disabled for Clang (CUDA). I think they should stay there.
In my humble opinion this case does not speak against the back-end centric structure.
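A rough sketch of what such a hiding switch does, with assumptions stated loudly: DEMO_DEVICE_PASS is a hypothetical stand-in for the condition `BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX` from the grep output, and HostThreadPool and hostPoolVisible are made-up names, not alpaka code.

```cpp
// Hypothetical stand-in for `BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX`.
// Defaults to 0 (host pass); define it as 1 to mimic the device pass.
#ifndef DEMO_DEVICE_PASS
#    define DEMO_DEVICE_PASS 0
#endif

namespace demo
{
#if !DEMO_DEVICE_PASS
    // Host-only code, e.g. something built on threads or fibers that the
    // device pass of the compiler should never get to see.
    struct HostThreadPool
    {
        static constexpr int defaultWorkerCount()
        {
            return 4;
        }
    };

    constexpr bool hostPoolVisible()
    {
        return true;
    }
#else
    // Device pass: the host-only type above simply does not exist here.
    constexpr bool hostPoolVisible()
    {
        return false;
    }
#endif
}
```

The guard stays with the shared code it protects, which is why this kind of switch does not depend on whether the folder layout is concept-centric or back-end centric.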
$ grep -r "CUDA" include/alpaka|grep -i -E "(CuRand|Cuda|Predef|Common)[^.]*.hpp" -v |uniq
include/alpaka/core/ConcurrentExecPool.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/core/Utility.hpp:#if BOOST_LANG_CUDA && BOOST_COMP_CLANG_CUDA
include/alpaka/core/Vectorize.hpp:#elif defined(__CUDA_ARCH__)
include/alpaka/queue/QueueCpuAsync.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/event/EventCpu.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/exec/ExecCpuThreads.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/exec/ExecCpuFibers.hpp:#if !(BOOST_COMP_CLANG_CUDA && BOOST_ARCH_PTX)
include/alpaka/mem/buf/BufCpu.hpp:// \TODO: Remove CUDA inclusion for BufCpu by replacing pinning with non CUDA code!
include/alpaka/mem/buf/BufCpu.hpp:#if defined(ALPAKA_ACC_GPU_CUDA_ENABLED) && BOOST_LANG_CUDA
include/alpaka/mem/buf/BufCpu.hpp: // The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
include/alpaka/mem/buf/BufCpu.hpp: ALPAKA_CUDA_RT_CHECK_IGNORE(
include/alpaka/mem/buf/BufCpu.hpp: "Memory pinning of BufCpu is not implemented when CUDA is not enabled!");
include/alpaka/mem/buf/BufCpu.hpp:#if defined(ALPAKA_ACC_GPU_CUDA_ENABLED) && BOOST_LANG_CUDA
include/alpaka/mem/buf/BufCpu.hpp: ALPAKA_CUDA_RT_CHECK_IGNORE(
include/alpaka/mem/buf/BufCpu.hpp: "Memory unpinning of BufCpu is not implemented when CUDA is not enabled!");
include/alpaka/mem/buf/BufCpu.hpp:#if defined(ALPAKA_ACC_GPU_CUDA_ENABLED) && BOOST_LANG_CUDA
include/alpaka/mem/view/ViewCompileTimeArray.hpp: // \FIXME: CUDA device?
include/alpaka/mem/view/ViewPlainPtr.hpp:#ifdef ALPAKA_ACC_GPU_CUDA_ENABLED
include/alpaka/mem/view/ViewPlainPtr.hpp: //! The CUDA RT device CreateStaticDevMemView trait specialization.
include/alpaka/mem/view/ViewPlainPtr.hpp: ALPAKA_CUDA_RT_CHECK(
include/alpaka/vec/Vec.hpp:// - the nvcc CUDA compiler (at least 8.0)
include/alpaka/vec/Vec.hpp:#if BOOST_COMP_NVCC || BOOST_COMP_INTEL || (BOOST_COMP_CLANG_CUDA >= BOOST_VERSION_NUMBER(4, 0, 0)) || (BOOST_COMP_GNUC >= BOOST_VERSION_NUMBER(8, 0, 0))
Looking for CUDA types in non CUDA back-end code, ViewPlainPtr.hpp is a candidate.
#ifdef ALPAKA_ACC_GPU_CUDA_ENABLED
//#############################################################################
//! The CUDA RT device CreateStaticDevMemView trait specialization.
template<>
struct CreateStaticDevMemView<
dev::DevCudaRt>
{
This is an example where mixed back-end code has to be separated (it is a struct, so that is fine). I think moving this code to the back-end subfolder would be the better option, because otherwise such back-end specific code stays hidden in files like ViewPlainPtr.hpp.
Afterwards there would be a file in:
|-- backends (name is not fixed, can also be "architectures", ...)
    |- cuda
        |- mem
            |- ViewPlainPtr.hpp // ah, there is CUDA specific code for ViewPlainPtr
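A sketch of how that split could look, reduced to one file. Only the names CreateStaticDevMemView and DevCudaRt are taken from the snippet above; the demo namespace, DevCpu, and the needsSymbolLookup member are hypothetical placeholders and do not describe the real alpaka trait.

```cpp
// ---- generic part, would stay in mem/view/ViewPlainPtr.hpp ----
namespace demo
{
    // Hypothetical device tags.
    struct DevCpu
    {
    };
    struct DevCudaRt
    {
    };

    namespace traits
    {
        // Primary template is only declared: using a device without a
        // specialization is a compile error instead of a silent fallback.
        template<typename TDev>
        struct CreateStaticDevMemView;

        template<>
        struct CreateStaticDevMemView<DevCpu>
        {
            static constexpr bool needsSymbolLookup()
            {
                return false;
            }
        };
    }
}

// ---- CUDA part, would move to backends/cuda/mem/ViewPlainPtr.hpp ----
namespace demo
{
    namespace traits
    {
        template<>
        struct CreateStaticDevMemView<DevCudaRt>
        {
            static constexpr bool needsSymbolLookup()
            {
                return true;
            }
        };
    }
}

// ---- user-facing code only talks to the trait ----
namespace demo
{
    template<typename TDev>
    constexpr bool needsSymbolLookup()
    {
        return traits::CreateStaticDevMemView<TDev>::needsSymbolLookup();
    }
}
```

Because the specialization is reached only through the trait, moving it to another file changes nothing for callers as long as that file is included before the trait is used.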
This image (even though it is outdated) shows best why alpaka is structured by the concepts (rows in the image) and not by some arbitrary accelerator backends (columns): https://github.com/ComputationalRadiationPhysics/alpaka/blob/develop/doc/markdown/user/implementation/library/structure.png?raw=true
Alpaka is not a set of hard-coded accelerator backends (even though there are some default accelerators) but a construction kit for arbitrary accelerator backends. You can add and mix concept implementations to create your own accelerator. You should not have to look into all the default accelerators just to find an implementation you want to use for a specific concept.
I am still strongly against such an inconsistent, confusing ordering by accelerator backend.
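To illustrate the construction-kit argument, here is a small sketch of an accelerator assembled from independent concept implementations. All names (demo, MathStd, MathFast, AtomicNoOp, AtomicLocked, Acc) are hypothetical; this is one possible way to express the idea, not a copy of how alpaka's accelerators are defined.

```cpp
namespace demo
{
    // Implementations of a "math" concept, usable by any accelerator.
    namespace math
    {
        enum class Kind
        {
            Std,
            Fast
        };

        struct MathStd
        {
            static constexpr Kind mathKind()
            {
                return Kind::Std;
            }
        };

        struct MathFast
        {
            static constexpr Kind mathKind()
            {
                return Kind::Fast;
            }
        };
    }

    // Implementations of an "atomic" concept, again accelerator independent.
    namespace atomic
    {
        struct AtomicNoOp
        {
            static constexpr bool isThreadSafe()
            {
                return false;
            }
        };

        struct AtomicLocked
        {
            static constexpr bool isThreadSafe()
            {
                return true;
            }
        };
    }

    // An accelerator is just a combination of concept implementations.
    template<typename TMath, typename TAtomic>
    struct Acc : TMath, TAtomic
    {
    };

    // A "default" accelerator and a user-defined mix built from the same kit.
    using AccSerial = Acc<math::MathStd, atomic::AtomicNoOp>;
    using AccCustom = Acc<math::MathFast, atomic::AtomicLocked>;
}
```

In a layout sorted by concept, MathFast is found under math no matter which accelerator first used it; in a layout sorted by accelerator it would have to live in the folder of one particular accelerator even though AccCustom reuses it.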
Yes, some things like the CreateStaticDevMemView<dev::DevCudaRt> are inconsistent and should be corrected.
I fully agree the current design also has good benefits.
The construction-kit functionality is far more exposed, and I am not sure whether it can be transferred to the accelerator-centric structure without losing the benefits of the acc-centric structure.
It is also far less likely that the API conventions break with a new back-end. In your "own" back-end you hesitate less to introduce funny new namespaces and classes, especially when you do not care about the other back-ends one might construct in combination.
And maybe the number of accelerators will decrease in the long-term future?
HIP already covers CUDA and HCC (although HIP restricts expressiveness); hipSYCL would cover even more (a prototype for SYCL with HIP support).
The namespace-file mapping is a good thing¹, although you would only "lose" one layer in the mapping consistency (corresponding alpaka namespaces would still begin with concepts, while the files for these are located under accelerators).
The acc-centric structure has its advantages as well, but I am not sure whether it is worth refactoring the current code.
¹) there are still some things like alpaka::ignore_unused (is in core),..
We internally discussed this issue again and we would keep the current design.
@psychocoderHPC maybe we can close this?
Yep, the arguments of @BenjaminW3 are convincing and we agreed on it on the last weekly meeting. Of course, we need to clean up some stray implementations such as ignore_unused (sorry! :D )
I will close this for now ;-)