Alpaka: mem* tests fail since e8b70cc2a7

Created on 11 Jun 2020 · 4 Comments · Source: alpaka-group/alpaka

The memory buffer/view tests are failing on my system with the changes from https://github.com/alpaka-group/alpaka/commit/e8b70cc2a7caff43824e99827b3bd18d8ebb42d6 active.

Steps to reproduce

  • configure build with cmake, enable building of tests
  • make -j && ctest

expected result

All tests pass.

actual result

Tests memBuf, memView (and memP2P) fail.

details

System: Ubuntu 18.04
Compiler: tried clang 11, gcc 7.3.0

Results vary between backends and slightly between compilers:

  • with only CpuSequential enabled:

    • clang11

      Start 23: memBuf
      22/29 Test #22: memBuf ...........................***Exception: SegFault 0.52 sec
      Start 23: memView
      23/29 Test #23: memView ..........................Child aborted***Exception: 0.57 sec
      Start 24: memP2P
      24/29 Test #24: memP2P ........................... Passed 0.05 sec

    • gcc 7.3

      Start 23: memBuf
      23/30 Test #23: memBuf ...........................Child aborted***Exception: 0.64 sec
      Start 24: memView
      24/30 Test #24: memView ..........................Child aborted***Exception: 0.69 sec
      Start 25: memP2P
      25/30 Test #25: memP2P ........................... Passed 0.00 sec

      Note: CpuSequential must be enabled for all tests to compile. In the configurations below, only CpuSequential and the one named backend are enabled.

  • with CpuOmp2Blocks enabled:

    • clang11

      Start 23: memBuf
      22/29 Test #22: memBuf ...........................Child aborted***Exception: 0.87 sec
      Start 23: memView
      23/29 Test #23: memView ..........................Child aborted***Exception: 0.99 sec
      Start 24: memP2P
      24/29 Test #24: memP2P ........................... Passed 0.05 sec

    • gcc 7.3

      Start 23: memBuf
      23/30 Test #23: memBuf ...........................Child aborted***Exception: 1.15 sec
      Start 24: memView
      24/30 Test #24: memView ..........................Child aborted***Exception: 1.26 sec
      Start 25: memP2P
      25/30 Test #25: memP2P ........................... Passed 0.00 sec

  • with GpuCudaRt enabled:

    • gcc7.3, cuda 10.1

      Start 23: memBuf
      22/29 Test #22: memBuf ...........................***Failed 0.42 sec
      Start 23: memView
      23/29 Test #23: memView ..........................***Failed 0.73 sec
      Start 24: memP2P
      24/29 Test #24: memP2P ...........................***Failed 1.24 sec

Reverting https://github.com/alpaka-group/alpaka/commit/e8b70cc2a7caff43824e99827b3bd18d8ebb42d6 makes the tests pass. This commit changes the affected tests.

Bug Testing

All 4 comments

The commit changes the dimensions of the buffers allocated to tests. In 4D, the dimensions are (16, 14, 12, 10) which means 26880 elements.

I have only looked at memBuf so far. The failing test there uses TIdx = short; longer index types pass. One reason for the failure is that TaskSetCpuBase stores m_extentWidthBytes as ExtentSize, which is a typedef for TIdx, i.e. short in this case. The element type of the buffer is float, i.e. four bytes, so it is larger than char and this will overflow.
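To make the arithmetic concrete, here is a minimal C++14 sketch of the sizes involved. It is not alpaka code: the helper names are hypothetical, std::int16_t stands in for a 16-bit short, and it only shows that the total byte count of the 4D buffer exceeds what such a type can hold. Which intermediate product actually overflows inside alpaka is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Extents of the 4D test buffer as quoted in the comment above.
constexpr std::size_t extents[4] = {16, 14, 12, 10};

// Number of elements in the buffer (product of all extents).
constexpr std::size_t elementCount()
{
    std::size_t n = 1;
    for(std::size_t e : extents)
        n *= e;
    return n;
}

// Size of the buffer in bytes for float elements (four bytes on common platforms).
constexpr std::size_t byteCount()
{
    return elementCount() * sizeof(float);
}

// Hypothetical helper: true if `value` is representable in the index type TIdx.
template<typename TIdx>
constexpr bool fitsIn(std::size_t value)
{
    return value <= static_cast<std::size_t>(std::numeric_limits<TIdx>::max());
}

static_assert(elementCount() == 26880, "16 * 14 * 12 * 10");
static_assert(fitsIn<std::int16_t>(elementCount()), "the element count still fits into 16 bits");
static_assert(!fitsIn<std::int16_t>(byteCount()), "the byte count does not fit into 16 bits");
static_assert(fitsIn<std::int32_t>(byteCount()), "a 32-bit index type is large enough");
```

So the element count alone would be fine for a 16-bit index, but any size expressed in bytes of float data is not.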

This raises the question of whether the behavior is considered correct (i.e. the test is asking for too much memory for TIdx = short) or whether this is a bug and we should store m_extentWidthBytes as size_t.
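The two options could be sketched roughly as follows. Both helpers are hypothetical and are not part of alpaka: the first computes a byte width in std::size_t, the second narrows to the index type but throws instead of wrapping silently.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: byte width of a row, computed in std::size_t so that
// the multiplication itself cannot overflow a small index type.
template<typename TElem>
std::size_t widthBytes(std::size_t extentWidth)
{
    return extentWidth * sizeof(TElem);
}

// Hypothetical helper: narrows a byte count to TIdx and reports an error
// if the value is not representable, instead of truncating it.
template<typename TIdx>
TIdx checkedNarrow(std::size_t value)
{
    if(value > static_cast<std::size_t>(std::numeric_limits<TIdx>::max()))
        throw std::overflow_error("byte extent does not fit into the index type");
    return static_cast<TIdx>(value);
}
```

With these, checkedNarrow<std::int16_t>(widthBytes<float>(10)) succeeds, while passing the full element count of 26880 throws std::overflow_error. A check like this would turn the crash into a clear diagnostic, whereas storing the value as size_t would avoid the narrowing altogether.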

Yes, this was my fault. The extents are too large for smaller Idx types. I will fix this tomorrow by using smaller extents.

I created a PR which hopefully fixes the issue.

Should be fixed now
