Alpaka: Using write-combined memory for host-device transfers?

Created on 10 Jul 2019 · 9 Comments · Source: alpaka-group/alpaka

Hi,
cudaHostAlloc and cudaMallocHost allow allocating write-combined memory:

cudaHostAllocWriteCombined: Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the device via mapped pinned memory or host->device transfers.

Is there equivalent functionality in Alpaka or Cupla?
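
For context, the plain CUDA pattern behind this question can be sketched as follows. This is a minimal sketch, not alpaka or cupla code: `allocStaging`, `fillStaging` and `freeStaging` are hypothetical helper names, and the `std::malloc` fallback for non-CUDA builds is an assumption added so that the snippet also compiles without nvcc.

```cpp
#include <cstddef>
#include <cstdlib>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical helper: allocate a host staging buffer that the CPU only writes.
// With nvcc this requests write-combined pinned memory; otherwise it falls
// back to ordinary heap memory (assumption, for non-CUDA builds only).
inline float* allocStaging(std::size_t count)
{
    void* p = nullptr;
#ifdef __CUDACC__
    if (cudaHostAlloc(&p, count * sizeof(float), cudaHostAllocWriteCombined) != cudaSuccess)
        return nullptr;
#else
    p = std::malloc(count * sizeof(float));
#endif
    return static_cast<float*>(p);
}

// Hypothetical helper: release a buffer obtained from allocStaging.
inline void freeStaging(float* p)
{
#ifdef __CUDACC__
    cudaFreeHost(p);
#else
    std::free(p);
#endif
}

// Fill the buffer with sequential, write-only stores. Write-combined memory
// is slow to read from the CPU, so the loop never reads dst back.
inline void fillStaging(float* dst, std::size_t count, float start)
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = start + static_cast<float>(i);
}
```

In a CUDA build, the filled buffer would then be passed as the source of a host-to-device `cudaMemcpy`. Host code should avoid reading from the buffer afterwards, since that is the access pattern the CUDA documentation warns about.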

fwyzard

👍2

All 9 comments

@darcato FYI

fwyzard on 10 Jul 2019

Currently, we do not support the different memory flags for CUDA.
A not-so-nice workaround is to wrap the allocation in a factory:

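// pseudocode: Dev, Type, Dim, Idx and dev are assumed to be defined by the caller
void *ptr = nullptr;
size_t size = 1024u;  // number of elements, not bytes
#if( ALPAKA_ACC_GPU_CUDA_ENABLED == 1 )
    // allocate pinned host memory directly through the CUDA runtime;
    // pass cudaHostAllocWriteCombined instead to request write-combined memory
    CUDA_CHECK((cuplaError_t)cudaHostAlloc(
        &ptr, size * sizeof (Type),
        cudaHostAllocMapped));
    // wrap the raw pointer in a non-owning alpaka view;
    // the memory must later be released with cudaFreeHost
    using ViewPlainPtr = alpaka::mem::view::ViewPlainPtr<Dev, Type, Dim, Idx>;
    ViewPlainPtr plainBuf(static_cast<Type *>(ptr), dev,
        alpaka::vec::Vec<Dim, Idx>(static_cast<Idx>(size)));
#else
    //create normal alpaka buffer and expose this buffer as ViewPlainPtr
#endif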

psychocoderHPC on 10 Jul 2019

So, could we do something like this with Cupla?

#ifdef cudaHostAlloc
#undef cudaHostAlloc
#endif

CUPLA_HEADER_ONLY_FUNC_SPEC
cuplaError_t
cuplaHostAlloc(
    void **ptrptr,
    size_t size,
    unsigned int flags
)
{
#if ALPAKA_ACC_GPU_CUDA_ENABLED == 1
  // if compiling for CUDA, use the native allocation functions
  return (cuplaError_t) cudaHostAlloc(ptrptr, size, flags);
#else
  // otherwise, use cuplaMallocHost
  return cuplaMallocHost(ptrptr, size);
#endif
}

fwyzard on 10 Jul 2019

@fwyzard looks reasonable, though I do not think the leading #ifdef block is correct: cupla does not define cudaHostAlloc, and it is a function in the CUDA API, not a macro.

sbastrakov on 10 Jul 2019

@fwyzard If you like, you can extend cupla. Inside the wrapper, cudaHostAlloc must be undefined to get access to the native CUDA function; afterwards we define the macro again.

// we will add the following line to the cuda_to_cupla.hpp
#define cudaHostAlloc(...) cuplaHostAlloc(__VA_ARGS__)


// this code will be added to cupla memory.hpp/cpp
CUPLA_HEADER_ONLY_FUNC_SPEC
cuplaError_t
cuplaHostAlloc(
    void **ptrptr,
    size_t size,
    unsigned int flags
)
{
#if ALPAKA_ACC_GPU_CUDA_ENABLED == 1
  #undef cudaHostAlloc
  // if compiling for CUDA, use the native allocation functions
  auto tmp = (cuplaError_t) cudaHostAlloc(ptrptr, size, flags);
  #define cudaHostAlloc(...) cuplaHostAlloc(__VA_ARGS__)
  return tmp;
#else
  // otherwise, use cuplaMallocHost
  return cuplaMallocHost(ptrptr, size);
#endif
}
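
For illustration only (this is not cupla code), the redirect idea can be tried without CUDA. In this minimal sketch `nativeHostAlloc` is a hypothetical stand-in for `cudaHostAlloc`, `wrappedHostAlloc` stands in for `cuplaHostAlloc`, and the two counters exist only to make the call path observable. Instead of `#undef`/`#define`, the wrapper parenthesizes the function name, which is a standard way to suppress expansion of a function-like macro; whether that reads better than the variant above is a matter of taste, and it is untested against cupla itself.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a native API function such as cudaHostAlloc.
// It records the flags it received so the call path can be checked.
static unsigned int g_lastNativeFlags = 0;
int nativeHostAlloc(void** ptr, std::size_t size, unsigned int flags)
{
    g_lastNativeFlags = flags;
    *ptr = std::malloc(size);
    return *ptr ? 0 : 1;  // 0 means success in this sketch
}

// Redirect every later call written as nativeHostAlloc(...) to the wrapper.
// This must come after the native definition, or the definition itself
// would be rewritten by the macro.
#define nativeHostAlloc(...) wrappedHostAlloc(__VA_ARGS__)

// Hypothetical wrapper, playing the role of cuplaHostAlloc.
static int g_wrapperCalls = 0;
int wrappedHostAlloc(void** ptr, std::size_t size, unsigned int flags)
{
    ++g_wrapperCalls;
    // The parentheses around the name keep the function-like macro from
    // expanding, so this calls the real function defined above.
    return (nativeHostAlloc)(ptr, size, flags);
}

// User code keeps the original spelling and is silently redirected.
int userAllocate(void** ptr, std::size_t size)
{
    return nativeHostAlloc(ptr, size, 4u);  // expands to wrappedHostAlloc(ptr, size, 4u)
}
```

Calling `userAllocate` goes through `wrappedHostAlloc` exactly once and still reaches the native function with the flags unchanged.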

psychocoderHPC on 10 Jul 2019

FYI, I did some more benchmarks of our application, and I didn't find any benefits from write-combined memory, so we likely won't be using it any more.

fwyzard on 3 Dec 2020

👍1

> FYI, I did some more benchmarks of our application, and I didn't find any benefits from write-combined memory, so we likely won't be using it any more.

Thanks for this information.
So the provided memory allocation methods by cupla/alpaka are enough for your use cases?

psychocoderHPC on 4 Dec 2020

We're in the process of porting more code to Alpaka, but in principle I think so.

fwyzard on 4 Dec 2020

👍1

> We're in the process of porting more code to Alpaka, but in principle I think so.

Info

We are about to release a new version of alpaka: https://github.com/alpaka-group/alpaka/tree/release-0.6.0-rc3
In the new release, we refactored the namespaces to reduce their nesting depth.

psychocoderHPC on 4 Dec 2020
