PIConGPU: CUDA Bug: volatile isParticle Flag necessary

Created on 26 Aug 2014 · 12 comments · Source: ComputationalRadiationPhysics/picongpu

While testing the Bunch example in 2D, I discovered the following bug, which also exists in 3D.

Not all particles get the given momentum and thus stay at their initial position.

This is clearly visible in 2D.
[screenshot: initnomomentum]
Some particles at the outer edge of the Gaussian blob do not move and stay behind as a halo.
In 3D this is not directly visible.

However, in BinEnergyElectrons.dat there are particles in the first bin (zero energy) at the first time step, for both 2D and 3D.

With the help of @psychocoderHPC we saw that if we set isParticle in Particles.kernel to true, all particles are initialized correctly. Setting blockingKernel to on did not help, and neither did changing typedef uint16_t lcellId_t; to typedef uint32_t lcellId_t; in frame_types.hpp.

The error does not occur for the KelvinHelmholtz example.

Labels: affects latest release, cuda, bug, core, third party

All 12 comments

I noticed that we have 128 cells in the y-direction, 32 of them per GPU. The black slices look like they are 1/4 of the GPU length in the y-direction; therefore, the error occurs for 8 cells at once.

I added a workaround that solves the problem.
I will check whether I can create a minimal example to submit a bug report to the nvcc developers.

please do not close this issue

#539 is a workaround. The issue will stay open until a general solution for this problem is found.

Do you agree @psychocoderHPC and @ax3l ?

The issue stays open for now; #539 implemented a work-around.

We have to write a minimal example to report the bug to get it fixed in future versions of CUDA.

Is this solved by other means than the workaround #539?

No, that was an auto-close due to the merge to master; the issue should stay open (we changed its scope after we fixed it).

The work-around was applied with #539 and, as far as we know, does not cause any problems right now.

New scope of this issue:

Since the volatile flag should not be necessary, it looks like either a race condition in our code (which is merely circumvented by the flag) or a CUDA 5.5+ bug, for which we should write a minimal example.
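To illustrate, here is a minimal sketch of the kind of standalone kernel such a bug report could start from (hypothetical names and structure, written from scratch rather than extracted from Particles.kernel): per-slot flags live in shared memory, one set of threads writes them, and other threads read them after a __syncthreads(). With the barrier in place this is race-free, so no volatile qualifier should be needed; the work-around from #539 only forces the compiler to actually re-read the flag instead of reusing a cached value.

```cuda
// Hypothetical reproducer skeleton, not PIConGPU code.
#include <cstdio>

const int numSlots = 256;

__global__ void fillSlots(int* globalOut)
{
    __shared__ int slotFilled[numSlots]; // one flag per particle slot

    // phase 1: each thread marks the slots it owns
    for (int s = threadIdx.x; s < numSlots; s += blockDim.x)
        slotFilled[s] = 1;

    __syncthreads(); // all writes to slotFilled are visible from here on

    // phase 2: each thread reads a flag written by a *different* thread.
    // The barrier makes this race-free, so no volatile should be required;
    // the #539 work-around corresponds to declaring the local copy volatile
    // so the compiler cannot keep a stale value in a register.
    for (int s = threadIdx.x; s < numSlots; s += blockDim.x)
    {
        bool isParticle = (slotFilled[(s + 1) % numSlots] != 0);
        globalOut[s] = isParticle ? 1 : 0;
    }
}

int main()
{
    int* d_out;
    cudaMalloc(&d_out, numSlots * sizeof(int));
    fillSlots<<<1, 64>>>(d_out);

    int h_out[numSlots];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // every slot must be marked; a zero would reproduce the observed symptom
    int missing = 0;
    for (int s = 0; s < numSlots; ++s)
        if (h_out[s] == 0)
            ++missing;
    printf("slots not marked: %d (expected 0)\n", missing);

    cudaFree(d_out);
    return 0;
}
```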

What is the status of this issue?

It was solved with #539, but it is not clear whether the workaround can be removed or not. We should keep it open.

We could try whether it still occurs with CUDA 8+ when the work-around is removed; if not, it was fixed upstream in nvcc.

This and other volatile usages might be explained by the libcu++ slides from SC19 shared by @ax3l. We need to check; there are not that many occurrences.
cc @psychocoderHPC.

@sbastrakov we are applying volatile to a thread-local variable to break compiler optimizations. The example in the slides (please link when available) shows a case where you guard data via an atomic variable: to get the correct data value you must call __threadfence to be sure that you are reading the latest version of the value.
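For reference, a small self-contained sketch of that guarding pattern (hypothetical code following the standard CUDA __threadfence() producer/consumer idiom, not taken from PIConGPU): each block publishes its data, issues __threadfence(), and only then updates the atomic guard; the block that observes the final guard value can then safely read all published data.

```cuda
#include <cstdio>

__device__ unsigned int blocksDone = 0; // atomic guard

__global__ void guardedSum(float* partial, float* result)
{
    if (threadIdx.x == 0)
    {
        // 1. write this block's data
        partial[blockIdx.x] = static_cast<float>(blockIdx.x);

        // 2. make the write visible to all other blocks before signalling
        __threadfence();

        // 3. update the guard; the block that increments last may read everything
        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        if (ticket == gridDim.x - 1)
        {
            float total = 0.0f;
            for (unsigned int b = 0; b < gridDim.x; ++b)
                total += partial[b]; // every producer fenced before its atomicInc
            *result = total;
            blocksDone = 0; // reset the guard for the next launch
        }
    }
}

int main()
{
    const unsigned int numBlocks = 8;
    float *partial, *result;
    cudaMalloc(&partial, numBlocks * sizeof(float));
    cudaMalloc(&result, sizeof(float));

    guardedSum<<<numBlocks, 32>>>(partial, result);

    float h = 0.0f;
    cudaMemcpy(&h, result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum of block indices: %.0f (expected 28)\n", h);

    cudaFree(partial);
    cudaFree(result);
    return 0;
}
```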
