Alpaka: Reduce namespace overusage

Created on 8 Jun 2020  ·  31 Comments  ·  Source: alpaka-group/alpaka

I gave an introduction to alpaka to a group of people from the EP-SFT group at CERN and while presenting an example there was feedback that alpaka makes too much use of namespaces.

Examples:

alpaka::dim::DimInt<1u>;
alpaka::acc::AccGpuCudaRt<Dim, Idx>;
alpaka::queue::Queue<Acc, alpaka::queue::Blocking>;
alpaka::vec::Vec<Dim, Idx>;
alpaka::dev::DevCpu
alpaka::mem::buf::alloc<float, Idx>
alpaka::mem::view::getPtrNative
alpaka::mem::view::copy
alpaka::workdiv::getValidWorkDiv
alpaka::kernel::createTaskKernel
alpaka::queue::enqueue
alpaka::idx::getIdx<alpaka::Grid, alpaka::Threads>
alpaka::math::min
alpaka::atomic::atomicOp<alpaka::atomic::op::Add>
alpaka::block::shared::st::allocVar<unsigned[MaxThreadsPerBlock::value / warpSize], __COUNTER__>

Very often the name of the namespace repeats in the name of the entity it contains. For example, alpaka::dim::DimInt has dim in the namespace and Dim in the name of the class template.

While there was consensus from the audience to simplify some of the namespaces and types, in my personal opinion one could probably drop all namespaces underneath alpaka and maybe rename a few entities:

alpaka::DimInt<1u>;
alpaka::AccGpuCudaRt<Dim, Idx>;
alpaka::Queue<Acc, alpaka::Blocking>;
alpaka::Vec<Dim, Idx>;
alpaka::DevCpu
alpaka::allocBuffer<float, Idx>
alpaka::getPtrNative
alpaka::copyMem
alpaka::getValidWorkDiv
alpaka::createTaskKernel
alpaka::enqueue
alpaka::getIdx<alpaka::Grid, alpaka::Threads>
alpaka::min
alpaka::atomicAdd
alpaka::allocSharedVar<unsigned[MaxThreadsPerBlock::value / warpSize], __COUNTER__>

We believe such a refactoring would relieve some of the cognitive burden when remembering the alpaka API.

Question Refactoring

All 31 comments

Just as a note, PIConGPU also uses a deep hierarchy of namespaces that match the directory structure, so that for example a file include/picongpu/aaa/bbb.hpp puts things into namespace picongpu::aaa (or picongpu::aaa::bbb when necessary). However, there is normally no duplication of namespace into the prefix, so hypothetical PIConGPU-style alpaka names would have been alpaka::dim::Int, alpaka::dev::Cpu, etc.

One could also simply eliminate some hierarchies by employing using namespace on the client side:

namespace myAlpaka
{
using namespace alpaka;
using namespace alpaka::mem;
...
}

Just as a note, PIConGPU also uses a deep hierarchy of namespaces that match the directory structure, so that for example a file include/picongpu/aaa/bbb.hpp puts things into namespace picongpu::aaa (or picongpu::aaa::bbb when necessary).

While this may be true, alpaka seems to be a framework independent of PIConGPU. At least I feel it is advertised as such. Therefore I think that there is no reason for tying alpaka's style to PIConGPU.

However, there is normally no duplication of namespace into the prefix, so hypothetical PIConGPU-style alpaka names would have been alpaka::dim::Int, alpaka::dev::Cpu, etc.

Fully agreed! Although I would rather remove the namespaces, since that eliminates even more clutter.

One could also simply eliminate some hierarchies by employing using namespace on the client side

Unfortunately, alpaka is spread over many namespaces, so there will be many using directives in many files. And this also does not solve the issue that I need to remember which namespaces I need to use ;) And since alpaka is template heavy, library code based on alpaka (e.g. mallocMC) is forced into header files, where you normally try to avoid using directives.

Yes, sure. My point was not to say let's do everything the PIConGPU way, but just to share information how we tackle this in another project.

Couldn't a user app just have a header and this myAlpaka-type namespace, just to be used everywhere in the app? Maybe alpaka could have provided one, but I feel this would make things confusing with regards to examples and documentation.

While this may be true, alpaka seems to be a framework independent of PIConGPU.

Alpaka independently came to the same solution.
Alpaka and PIConGPU are not the only projects following the style of having the namespaces follow the folder structure because this is much more intuitive.

And this does also not solve the issue that I need to remember which namespaces I need to use ;)

I do not really believe that this is the reason for disliking the current naming scheme. I heard this from others as well but it is not something that is special to alpaka. Having to remember names and where to find a method and how it is called exactly is as well an issue for CUDA, HIP or any other library. When they do not have namespaces, the same namespace hierarchy is encoded in the method name itself. I even think that having namespaces together with some IDE auto fill makes it much easier to find something.
What I do see as a disadvantage is that the namespaces make the names longer because of the namespace syntax (on average we have 2 x ::). But this is part of the price we have to pay for this C++ feature compared to C interfaces.
We could try to get rid of the prefixes duplicating the namespace as a first step.

For sure, a user-side application could have its own header for making alpaka simpler. But this is a burden every user has to carry.

Is there actually any strong reason for having all the subnamespaces in alpaka? alpaka is not a large library.

I even think that having namespaces together with some IDE auto fill makes it much easier to find something.

Unfortunately, I find this to be the exact opposite. When I type alpaka::foo, code completion only displays matches from the alpaka namespace. So if I need a buffer, a search for alpaka::buf will not find anything. I need to know that it lives in alpaka::mem. Good, there is my buffer. Now I want to copy it. I enter alpaka::mem::copy (because copying is related to buffers, so I assume they live in the same namespace) and again, my IDE gives me no results. I feel lost and pull out an alpaka example. copy is in alpaka::mem::view.

Moving everything into a global alpaka namespace will make some searches faster but it will equally make other searches slower because you now get hundreds of proposed methods and have to scroll through everything instead of having an easy option to filter via the next subnamespace.

The problem you had with the buffer copy is that alpaka has a much more structured interface when it comes to memory handling than CUDA. CUDA buffers are only raw pointers (C-style). In alpaka you have objects (C++-style) and even a std::vector can be treated as a view for alpaka. The higher type-safety and support for multiple types of buffers/views requires some more structure that is mirrored by the namespaces. A Buffer is a concept which inherits from the View concept. You can only allocate buffers, not views, but you can do everything to a buffer that you can do with a view. Therefore copy is part of the view namespace. Having this as part of the name makes it much more obvious what you can pass to a method and what is expected from the parameters, because the parameters in alpaka are templates, where this is not as easy to see as with primitive pointers in CUDA.

With the buffer and view I agree with @bernhardmgruber that copying may benefit from having one namespace less. It could all follow the logic of "each memory operation but allocation works with views, and buffer is also conceptually a view" (well, it is not, what i meant is more like each container is a range for its full self in C++20), so mem::view:: could have been just in mem::

Today in the Alpaka VC we have started to discuss how we can make the alpaka namespace more intuitive to use. In general there is a discussion whether we want to keep the deep namespaces or whether we want to change to a flatter approach where the information is also encoded in the function name.

e.g.: alpaka::block::shared::st::allocVar() vs alpaka::block::allocStaticSharedMemory()

In combination with the flat namespace, it was also the idea to create a flat API for users and to keep the deep namespace API for the internal implementation.
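The "flat API for users, deep namespaces for the implementation" idea could look roughly like the sketch below. It uses a hypothetical stand-in namespace lib rather than alpaka itself, and both allocVar and allocStaticSharedMemory are heavily simplified (they return a value-initialized object and take no accelerator), so the signatures are assumptions for illustration only.

```cpp
// Hypothetical stand-in library; NOT the real alpaka API or signatures.
namespace lib
{
    // Deep namespaces stay in place for the internal implementation.
    namespace block
    {
        namespace shared
        {
            namespace st
            {
                // Simplified placeholder: just returns a value-initialized T.
                template<typename T>
                constexpr T allocVar()
                {
                    return T{};
                }
            } // namespace st
        } // namespace shared
    } // namespace block

    // Flat, user-facing entry point that forwards to the deep implementation.
    template<typename T>
    constexpr T allocStaticSharedMemory()
    {
        return block::shared::st::allocVar<T>();
    }
} // namespace lib

// User code only needs the flat name.
constexpr int flatApiDemo()
{
    return lib::allocStaticSharedMemory<int>();
}
```

With this layout the long name remains valid for existing code, while examples and documentation can use the short one.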

Btw, due to ADL one could already write e.g.
enqueue(queue, taskKernel);
which is the same as
alpaka::queue::enqueue(queue, taskKernel);
as queue is already of type alpaka::queue::Something.

I do not think we should really go for it, just mentioning as an option. In case alpaka namespaces are flattened a little bit (e.g. mem::buf and mem::view are merged to mem, etc.), quite a lot of things can be written like that. But I don't particularly like it.
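For readers unfamiliar with argument-dependent lookup (ADL), here is a minimal sketch of the effect described above. The namespace lib::queue and the types QueueCpu and Task are hypothetical stand-ins, not alpaka's real types; the point is only that an unqualified call finds a function in the namespace of its argument's type.

```cpp
// Hypothetical stand-in library; NOT the real alpaka API.
namespace lib
{
    namespace queue
    {
        struct QueueCpu
        {
            int enqueued = 0;
        };

        struct Task
        {
        };

        // Free function in the same namespace as the type of its first argument.
        constexpr int enqueue(QueueCpu& q, Task const&)
        {
            return ++q.enqueued;
        }
    } // namespace queue
} // namespace lib

constexpr int adlDemo()
{
    lib::queue::QueueCpu q{};
    lib::queue::Task t{};
    enqueue(q, t);             // unqualified call: found via ADL through q's type
    lib::queue::enqueue(q, t); // fully qualified call: the same function
    return q.enqueued;
}
```

Both calls resolve to lib::queue::enqueue, so the counter ends up at 2. Note that ADL only considers the namespaces associated with the argument types, which is why merging namespaces would make more calls work this way.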

Here is a list of all namespaces in alpaka:

alpaka::acc
alpaka::acc::traits
alpaka::atomic
alpaka::atomic::detail
alpaka::atomic::op
alpaka::atomic::traits
alpaka::block
alpaka::block::shared
alpaka::block::shared::dyn
alpaka::block::shared::dyn::traits
alpaka::block::shared::st
alpaka::block::shared::st::detail
alpaka::block::shared::st::traits
alpaka::block::sync
alpaka::block::sync::op
alpaka::block::sync::traits
alpaka::block::sync::traits::detail
alpaka::concepts
alpaka::concepts::detail
alpaka::core
alpaka::core::align
alpaka::core::detail
alpaka::core::threads
alpaka::core::threads::detail
alpaka::core::vectorization
alpaka::cuda
alpaka::cuda::detail
alpaka::cuda::traits
alpaka::dev
alpaka::dev::cpu
alpaka::dev::cpu::detail
alpaka::dev::traits
alpaka::dim
alpaka::dim::traits
alpaka::elem
alpaka::elem::traits
alpaka::event
alpaka::event::generic
alpaka::event::generic::detail
alpaka::event::traits
alpaka::event::uniform_cuda_hip
alpaka::event::uniform_cuda_hip::detail
alpaka::example
alpaka::extent
alpaka::extent::detail
alpaka::extent::traits
alpaka::hierarchy
alpaka::idx
alpaka::idx::bt
alpaka::idx::detail
alpaka::idx::gb
alpaka::idx::traits
alpaka::intrinsic
alpaka::intrinsic::traits
alpaka::kernel
alpaka::kernel::detail
alpaka::kernel::traits
alpaka::kernel::uniform_cuda_hip
alpaka::kernel::uniform_cuda_hip::detail
alpaka::math
alpaka::math::traits
alpaka::mem
alpaka::mem::alloc
alpaka::mem::alloc::traits
alpaka::mem::buf
alpaka::mem::buf::cpu
alpaka::mem::buf::cpu::detail
alpaka::mem::buf::traits
alpaka::mem::view
alpaka::mem::view::cpu
alpaka::mem::view::cpu::detail
alpaka::mem::view::detail
alpaka::mem::view::traits
alpaka::mem::view::traits::detail
alpaka::mem::view::uniform_cuda_hip
alpaka::mem::view::uniform_cuda_hip::detail
alpaka::meta
alpaka::meta::detail
alpaka::offset
alpaka::offset::detail
alpaka::offset::traits
alpaka::origin
alpaka::pltf
alpaka::pltf::traits
alpaka::queue
alpaka::queue::cpu
alpaka::queue::generic
alpaka::queue::generic::detail
alpaka::queue::property
alpaka::queue::traits
alpaka::queue::uniform_cuda_hip
alpaka::queue::uniform_cuda_hip::detail
alpaka::rand
alpaka::rand::distribution
alpaka::rand::distribution::cpu
alpaka::rand::distribution::traits
alpaka::rand::distribution::uniform_cuda_hip
alpaka::rand::generator
alpaka::rand::generator::cpu
alpaka::rand::generator::traits
alpaka::rand::generator::uniform_cuda_hip
alpaka::time
alpaka::time::traits
alpaka::uniform_cuda_hip
alpaka::uniform_cuda_hip::detail
alpaka::unit
alpaka::vec
alpaka::vec::detail
alpaka::vec::traits
alpaka::wait
alpaka::wait::traits
alpaka::wait::traits::generic
alpaka::warp
alpaka::warp::traits
alpaka::workdiv
alpaka::workdiv::detail
alpaka::workdiv::traits

IMO: block needs to die, mem needs to be unified (no sub-namespaces), move device-side functions to alpaka::device or make them members of acc.

Here is a list of all entities:
all.txt

If we decide to do changes to the namespaces, please do it step by step and not in a huge PR.

I would also prefer if we would be using namespace.
From an implementation side the split between alpaka::mem::view and alpaka::mem::buf is useful but I can understand that from a user perspective it does not bring any advantage. Adding view and buf into mem via using and adapting the examples could be a first step.
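A minimal sketch of that first step, again with a hypothetical stand-in namespace lib::mem and made-up members (Buf, alloc, getExtent). It assumes using-declarations are used for the re-export; using-directives would be the other option.

```cpp
// Hypothetical stand-in library; NOT the real alpaka API.
namespace lib
{
    namespace mem
    {
        namespace buf
        {
            struct Buf
            {
                int size;
            };

            constexpr Buf alloc(int n)
            {
                return Buf{n};
            }
        } // namespace buf

        namespace view
        {
            constexpr int getExtent(buf::Buf const& b)
            {
                return b.size;
            }
        } // namespace view

        // Re-export into mem: the implementation keeps its split,
        // while users can write lib::mem::alloc and lib::mem::getExtent.
        using buf::alloc;
        using buf::Buf;
        using view::getExtent;
    } // namespace mem
} // namespace lib

constexpr int flatMemDemo()
{
    return lib::mem::getExtent(lib::mem::alloc(8));
}
```

The nested names such as lib::mem::buf::alloc stay valid, so examples can be adapted gradually.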

Today at the meeting we discussed an idea that is orthogonal to these namespace simplifications. Currently, all device-side functions take acc as the first parameter, and that is an informal definition of what a device function is, but they are spread over a few namespaces, and from the namespace name it's not always clear what is device-side and what is not (e.g. alpaka::dev and alpaka::kernel are for the host side). We discussed an idea to make device-side functions callable as methods of acc. Technically those would be methods of the underlying implementations of atomics, warps, etc., which are already inherited by accelerator classes, and those member functions will call the currently existing free functions or traits. As a side product, a user in the kernel then has no need to know alpaka namespaces, as everything is accessed as acc.doStuff();

However, there was no consensus for this idea, and we also figured this may be quite a large interface change, that perhaps is not very reasonable to do right now even if everyone is for it. Writing it here to document and potentially discuss further.
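To make the acc.doStuff() idea concrete, here is a rough sketch under stated assumptions: AtomicSerial and AccSerial are hypothetical stand-ins, and atomicAdd is a plain non-atomic placeholder. It only illustrates the forwarding structure (member function on the implementation class, inherited by the accelerator, calling the existing free function).

```cpp
// Hypothetical stand-in library; NOT the real alpaka API.
namespace lib
{
    struct AtomicSerial;

    // Existing style: free function taking the implementation as first parameter.
    // Placeholder logic only; a real implementation would be atomic.
    constexpr int atomicAdd(AtomicSerial const&, int& target, int value)
    {
        int const old = target;
        target += value;
        return old;
    }

    // Implementation class gains a member that forwards to the free function.
    struct AtomicSerial
    {
        constexpr int atomicAdd(int& target, int value) const
        {
            return lib::atomicAdd(*this, target, value);
        }
    };

    // The accelerator inherits the implementation, so the member is available on acc.
    struct AccSerial : AtomicSerial
    {
    };
} // namespace lib

constexpr int memberCallDemo()
{
    lib::AccSerial acc{};
    int counter = 5;
    lib::atomicAdd(acc, counter, 2); // current free-function style
    acc.atomicAdd(counter, 3);       // proposed member style
    return counter;
}
```

Both spellings end up in the same free function, so existing kernels would keep working while new code could avoid the namespaces entirely.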

Regarding my earlier ADL suggestion, no one was in favor of it, me included. Just put that as an option.

I took a bit of time and looked through all structs/classes/functions and what namespaces can be inlined or removed. Here is what we could do:

edit: Changed suggesting inline namespace to remove namespace. Added checkboxes.

  • [x] remove namespace alpaka::acc
  • [x] remove namespace alpaka::atomic
  • [x] remove namespace alpaka::block::sync
  • [x] remove namespace alpaka::block::shared
  • [ ] maybe remove namespace alpaka::core, I guess it is not supposed to be called by users?
  • [x] remove namespace alpaka::dev
  • [x] remove namespace alpaka::dim
  • [x] remove namespace alpaka::elem
  • [x] remove namespace alpaka::event
    • [x] rename alpaka::event::test to alpaka::isComplete
  • [x] remove namespace alpaka::example
  • [ ] maybe remove namespace alpaka::extent
    • possibly rename getDepth, getHeight, getWidth, setDepth, setHeight, setWidth
  • [ ] remove namespace alpaka::hierarchy
  • [x] remove namespace alpaka::idx
  • [x] remove namespace alpaka::intrinsic
  • [x] remove namespace alpaka::kernel
  • [x] remove namespace alpaka::mem
  • [x] remove namespace alpaka::offset
  • [x] remove namespace alpaka::pltf
  • [x] remove namespace alpaka::queue
  • [ ] remove namespace alpaka::rand
    • maybe rename alpaka::rand::distribution and alpaka::rand::generator
  • [x] remove namespace alpaka::time
  • [x] remove namespace alpaka::vec
    • [x] rename alpaka::vec::cast to alpaka::vec::castVec
    • [x] rename alpaka::vec::concat to alpaka::vec::concatVec
    • [x] rename alpaka::vec::reverse to alpaka::vec::reverseVec
  • [x] remove namespace alpaka::wait
  • [x] remove namespace alpaka::workdiv

Here is a tab separated table with all refactored entities:
all_ref.txt

To document my suggestion:

My suggestion was to separate device-side functions into alpaka::device and the host-side API into alpaka. After looking at @bernhardmgruber's list I also see stuff which lives in both worlds.

IMO we should move the device API into alpaka::device, host API into alpaka::host and all other directly into alpaka namespace.

I agree that alpaka::device is a nice idea. It is true that every device function has an acc parameter, but it makes auto-completion navigation and doxygen search easier if you have a namespace which says that everything inside it is available on the device.

But I don't understand the namespace host. Which function is in host and which not? For example, is it alpaka::mem::alloc() or alpaka::host::mem::alloc()? Memory allocation is done on the host normally.

Also, some stuff like vec::Vec is for both host and device. Which may be not obvious for a user who only needs it on one side.

To clarify my last message. One can use vec::Vec on both host and device side. And acc is not needed to access components, of course, so it's in that regard not like other device-side API. So if we put it to alpaka::device, it would be confusing how to use it on host. If we e.g. name a renaming for the host-side namespace, that also does not solve the ambiguity, as if a user is not aware of that, it would be not clear what happens e.g. when a "host-side" vector is passed to kernel.

To clarify my last message. One can use vec::Vec on both host and device side. And acc is not needed to access components, of course, so it's in that regard not like other device-side API. So if we put it to alpaka::device, it would be confusing how to use it on host. If we e.g. name a renaming for the host-side namespace, that also does not solve the ambiguity, as if a user is not aware of that, it would be not clear what happens e.g. when a "host-side" vector is passed to kernel.

Therefore I suggested to keep something like Vec in the namespace alpaka and have a namespace alpaka::host and alpaka::device for the API which can only be used on device xor host

I agree that alpaka::device is a nice idea. It is true that every device function has an acc parameter, but it makes auto-completion navigation and doxygen search easier if you have a namespace which says that everything inside it is available on the device.

But I don't understand the namespace host. Which function is in host and which not? For example, is it alpaka::mem::alloc() or alpaka::host::mem::alloc()? Memory allocation is done on the host normally.

Whether it is host or device depends on where you call the function, NOT where you use the result of the function.

Therefore I suggested to keep something like Vec in the namespace alpaka and have a namespace alpaka::host and alpaka::device for the API which can only be used on device xor host

I do not think this would really help a user. Imagine, a user wants to use Vec on the host side e.g. to make a work div. How do they know if it's alpaka::Vec or alpaka::host::Vec (or with some namespaces in between, but you get the point)? That would depend on whether Vec is also available on device, which a user may not know or care about.

So it turns out we cannot just inline namespaces in many cases.

Given the following example:

namespace alpaka {
  namespace dev {
    class Dev { ... };
    namespace omp5 { ... }
  }
}
namespace alpaka {
  namespace omp5 { ... }
}

We cannot inline the namespace dev, because then e.g. alpaka::omp5::... can no longer be resolved by the compiler. E.g. when resolving alpaka::omp5::detail::omp5Check(), the compiler will already error when trying to find the namespace omp5 inside alpaka and not look into both alpaka::dev::omp5 and alpaka::omp5.

So it seems we can only safely inline the innermost namespaces. Which means we really need to consider removing namespaces. Or creating a large list of using directives in alpaka.hpp.
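The safe case (inlining an innermost namespace) can be sketched as follows. The namespace lib and the function getDevCount are hypothetical stand-ins; the problematic shape from the example above is only described in a comment, since whether and how it fails depends on the compiler.

```cpp
// Hypothetical stand-in library; NOT the real alpaka API.
namespace lib
{
    // An innermost namespace made inline: its members are also reachable as lib::...
    inline namespace dev
    {
        struct DevCpu
        {
        };

        constexpr int getDevCount()
        {
            return 1;
        }
    } // namespace dev

    // Problematic shape, intentionally left as a comment:
    // if dev also contained `namespace omp5 { ... }` and lib declared its own
    // `namespace omp5 { ... }`, a qualified name such as lib::omp5::something
    // may no longer resolve, as described in the thread.
} // namespace lib

constexpr int viaFlatName()
{
    return lib::getDevCount(); // found through the inline namespace
}

constexpr int viaNestedName()
{
    return lib::dev::getDevCount(); // the old qualified name keeps working
}
```

Because both spellings stay valid, an inline namespace would allow a gradual migration wherever no name clash of the kind above exists.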

As discussed during the alpaka VC today, we will document the namespace changes in a CHANGELOG.md file in the repository root: #1167

It turns out the inline namespace issue is actually a bug and it is being addressed: http://wg21.link/p1701

There are still 2 pending refactorings I would like to address before I consider this issue addressed: removing the alpaka::view namespace and simplifying the namespace hierarchy inside alpaka::block.

I think the first round of namespace refactorings is done. Thank you everybody!

Thx, I will pull these changes into the 0.6.0 rc next week.
