Alpaka: Texture/image support

Created on 5 Feb 2021 · 9 Comments · Source: alpaka-group/alpaka

alpaka currently lacks support for the texture/image capabilities of certain backends. This concerns the CUDA backend and the SYCL backend that is under development.
Texture/image support was also requested in: https://github.com/alpaka-group/alpaka/issues/1065
The discussion also came up during the prototyping of kernel side accessors to buffers: https://github.com/alpaka-group/alpaka/issues/38 and https://github.com/alpaka-group/alpaka/pull/1249

Since backend support for this feature is scarce, we have two options to implement such a facility:

  1. emulation on backends without texture/image support, e.g. via a wrapper on alpaka::Buf
  2. do not provide the feature and fail to compile

While option 1 is certainly doable, given that only CUDA supports this feature, we might run into a situation where the feature performs suboptimally on non-CUDA backends, because we might not pick the right emulation approach for everyone. E.g. is Z-order storage really the best memory layout? How about unusual texture formats (see: https://sycl.readthedocs.io/en/latest/iface/image.html#sycl-image-channel-order)? Bilinear/trilinear interpolation on access? Edge behavior? Normalized texture coordinates? There is a lot we could get wrong, or at least do badly.
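To make the Z-order question concrete, here is a minimal sketch of how a 2D texel index could be computed for a Z-order (Morton) layout, next to a plain row-major index. The function names are hypothetical and not part of alpaka; the sketch assumes coordinates that fit in 16 bits and says nothing about which layout is actually faster on a given backend.

```cpp
#include <cstdint>

// Hypothetical helper (not an alpaka API): spread the lower 16 bits of v
// so that a zero bit sits between each pair of neighbouring bits.
constexpr std::uint32_t spreadBits16(std::uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Z-order (Morton) index of texel (x, y): the bits of x land on even
// positions, the bits of y on odd positions. Assumes x, y < 65536.
constexpr std::uint32_t mortonIndex2D(std::uint32_t x, std::uint32_t y)
{
    return spreadBits16(x) | (spreadBits16(y) << 1);
}

// Plain row-major index, for comparison.
constexpr std::uint32_t rowMajorIndex2D(std::uint32_t x, std::uint32_t y, std::uint32_t width)
{
    return y * width + x;
}
```

In a 1024-wide row-major image, the texels (1, 0) and (1, 1) are 1024 elements apart, whereas their Morton indices are 1 and 3. That locality is the motivation for the layout; whether it pays off depends on the access pattern and the hardware caches.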

Option 2 is safe from our perspective, but locks users into CUDA (and later SYCL) when they use the feature. So as it stands now they could just use CUDA directly.

We could also mix the options and just provide a very limited texture/image support that we are confident we can emulate.

What is the strategy to go forward wrt. texture/image support?

Enhancement

All 9 comments

So while HIP did not mention texture support in their documentation, the functionality seems to be there: https://github.com/ROCm-Developer-Tools/HIP/blob/main/include/hip/hcc_detail/texture_functions.h

I agree with your assessment. I do not think textures are that widely used in computational applications nowadays: GPUs have had caches for a long time now (their absence was one of the reasons to use textures for computations in the early CUDA days), and texture operations such as interpolation have limited accuracy. However, while I think emulation would not be that difficult if it only has to work, without performance requirements, it would still require continuous maintenance.

Dear all, I firmly believe this is a side quest. I think there is more important stuff to do.

While this might be a less important task for the overall goal of alpaka, ISAAC would definitely benefit from that.

To give it a little bit more context: in ISAAC we can have the case that we visualize multiple data sources with different resolutions within the same kernel. Accessing the data in a texture-like way, with normalized indices and automatic interpolation, simplifies the ray casting kernel.
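As a rough illustration of why normalized indices help here, the following 1D sketch samples two buffers of different resolution with the same coordinate. It is not ISAAC or alpaka code: `sampleNormalized1D` is a hypothetical helper, and it assumes texel centres at (i + 0.5) / n, linear filtering and clamp-to-edge behavior, which is only one of several possible conventions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: sample a non-empty 1D buffer at a normalized
// coordinate u in [0, 1], independent of the buffer's resolution.
// Assumed convention: texel i has its centre at (i + 0.5) / n.
inline float sampleNormalized1D(std::vector<float> const& data, float u)
{
    auto const n = static_cast<long>(data.size());
    // Map the normalized coordinate to texel space, relative to texel centres.
    float const pos = u * static_cast<float>(n) - 0.5f;
    float const lower = std::floor(pos);
    float const weight = pos - lower; // fractional distance to the next texel
    // Clamp-to-edge: indices outside the buffer reuse the nearest valid texel.
    auto const clampIdx = [n](long i) { return std::min(std::max(i, 0L), n - 1); };
    float const a = data[static_cast<std::size_t>(clampIdx(static_cast<long>(lower)))];
    float const b = data[static_cast<std::size_t>(clampIdx(static_cast<long>(lower) + 1))];
    return a + weight * (b - a);
}
```

A kernel can then call `sampleNormalized1D(coarse, u)` and `sampleNormalized1D(fine, u)` with the same `u` without knowing either resolution, which is the simplification described above. A native texture unit would do the coordinate mapping and the interpolation in hardware.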

Maybe we can propagate work at some point from ISAAC back into alpaka.

ISAAC would greatly benefit from textures.
The addressing is not really a problem, as it can easily be emulated with minimal overhead.
The bigger problems, which proper native texture support would solve, are:

  1. Caching: currently the data for 3D buffers lives in a plain array and is therefore cached linearly along that array. This results in many cache misses, because most accesses go to neighbouring voxels, which in at least 2 of the 3 dimensions are likely far apart in memory and therefore not cached. Textures would solve this, as they cache locally in the dimensions of the buffer.
  2. Interpolation: currently the trilinear interpolation is emulated with 8 buffer reads on neighbouring voxels, which are most likely not cached due to problem 1 and therefore carry a very high performance cost. With texture support the interpolation would be done automatically on access and be much cheaper.
  3. Buffer boundary handling: currently every read of a 3D buffer needs a boundary check, and behaviors such as texture repeat, clamp and constant color are emulated when a boundary is reached. A native texture implementation would do this much more efficiently.
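The emulation described in points 2 and 3 can be sketched roughly as follows. This is not ISAAC's actual code: the `Volume` struct, the `AddressMode` enum and the function names are assumptions made for illustration, coordinates are in unnormalized voxel units with voxel centres at integer positions, and the storage is a plain row-major array (so the caching issue from point 1 applies).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical boundary behaviors, loosely mirroring texture address modes.
enum class AddressMode { Clamp, Repeat, Border };

// Hypothetical 3D buffer: row-major storage with x as the fastest index.
struct Volume
{
    int nx, ny, nz;
    std::vector<float> voxels;
    AddressMode mode;
    float borderValue; // the "constant color" returned outside the volume
};

// Map an index to a valid one for Clamp and Repeat; Border leaves it untouched.
inline int resolveIndex(int i, int n, AddressMode mode)
{
    if(mode == AddressMode::Repeat)
        return ((i % n) + n) % n;
    if(mode == AddressMode::Clamp)
        return i < 0 ? 0 : (i >= n ? n - 1 : i);
    return i;
}

// One boundary-checked read. This check runs on every single access.
inline float fetch(Volume const& v, int x, int y, int z)
{
    x = resolveIndex(x, v.nx, v.mode);
    y = resolveIndex(y, v.ny, v.mode);
    z = resolveIndex(z, v.nz, v.mode);
    if(x < 0 || x >= v.nx || y < 0 || y >= v.ny || z < 0 || z >= v.nz)
        return v.borderValue;
    auto const idx = (static_cast<std::size_t>(z) * static_cast<std::size_t>(v.ny)
                      + static_cast<std::size_t>(y)) * static_cast<std::size_t>(v.nx)
                     + static_cast<std::size_t>(x);
    return v.voxels[idx];
}

// Trilinear interpolation emulated with 8 boundary-checked buffer reads.
inline float sampleTrilinear(Volume const& v, float x, float y, float z)
{
    float const fx = std::floor(x), fy = std::floor(y), fz = std::floor(z);
    int const x0 = static_cast<int>(fx), y0 = static_cast<int>(fy), z0 = static_cast<int>(fz);
    float const tx = x - fx, ty = y - fy, tz = z - fz;
    auto const lerp = [](float a, float b, float t) { return a + t * (b - a); };
    float const c00 = lerp(fetch(v, x0, y0, z0), fetch(v, x0 + 1, y0, z0), tx);
    float const c10 = lerp(fetch(v, x0, y0 + 1, z0), fetch(v, x0 + 1, y0 + 1, z0), tx);
    float const c01 = lerp(fetch(v, x0, y0, z0 + 1), fetch(v, x0 + 1, y0, z0 + 1), tx);
    float const c11 = lerp(fetch(v, x0, y0 + 1, z0 + 1), fetch(v, x0 + 1, y0 + 1, z0 + 1), tx);
    return lerp(lerp(c00, c10, ty), lerp(c01, c11, ty), tz);
}
```

Each call to `sampleTrilinear` costs 8 reads plus the per-read index resolution, and consecutive reads along y and z are `nx` and `nx * ny` elements apart in memory. With native textures, the address mode, the filtering and the spatially local caching would all be handled by the texture hardware.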

How long would a texture implementation in alpaka take? Can we test the performance gain by trying it in a CUDA-only branch for ISAAC?

Right now I'm trying to integrate the native CUDA textures in ISAAC, so that I can hopefully include some performance numbers in my master's thesis. And as @psychocoderHPC said, maybe we can propagate some of the work to alpaka, as I need to implement a software emulation for all non-CUDA-capable architectures anyway.
