PIConGPU: CUDA Bug: volatile isParticle Flag necessary

Created on 26 Aug 2014 · 12 comments · Source: ComputationalRadiationPhysics/picongpu

While testing the Bunch example in 2D, I discovered the following bug, which also exists in 3D.

Not all particles get the given momentum and thus stay at their initial position.

This is clearly visible in 2D.
[screenshot: initnomomentum]
Some particles at the outer edge of the Gaussian blob do not move and stay behind as a halo.
In 3D this is not directly visible.

However, in BinEnergyElectrons.dat there are particles in the first bin (zero energy) at the first time step, for both 2D and 3D.

With the help of @psychocoderHPC we saw that if we set isParticle in Particles.kernel to true, all particles are initialized correctly. Setting blockingKernel to on did not help, and neither did changing typedef uint16_t lcellId_t; to typedef uint32_t lcellId_t; in frame_types.hpp.

The error does not occur for the KelvinHelmholtz example.

Labels: affects latest release, cuda, bug, core, third party

All 12 comments

I noticed that we have 128 cells in the y-direction, 32 of them per GPU. The black slices look like they are 1/4 of the GPU length in the y-direction; therefore, the error occurs for 8 cells at once.

I added a workaround that solves the problem.
I will check whether I can create a minimal example to submit a bug report to the nvcc developers.

please do not close this issue

#539 is a workaround. The issue will stay open until a general solution for this problem is found.

Do you agree @psychocoderHPC and @ax3l ?

The issue stays open for now; #539 implemented a work-around.

We have to write a minimal example to report the bug to get it fixed in future versions of CUDA.

Is this solved by other means than the workaround #539?

No, that was an auto-close due to the merge to master; the issue should stay open (we changed its scope after we fixed it).

The work-around was applied with #539 and, as far as we know, does not cause any problems right now.

New scope of this issue:

Since the volatile flag should not be necessary, it looks like either a race condition in our code (which is merely circumvented by the flag) or a CUDA 5.5+ bug, for which we should write a minimal example.
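To illustrate, here is a minimal sketch of the kind of standalone kernel such a bug report could start from (hypothetical names and structure, written from scratch rather than extracted from Particles.kernel): per-slot flags live in shared memory, one set of threads writes them, and other threads read them after a __syncthreads(). With the barrier in place this is race-free, so no volatile qualifier should be needed; the work-around from #539 only forces the compiler to actually re-read the flag instead of reusing a cached value.

```cuda
// Hypothetical reproducer skeleton, not PIConGPU code.
#include <cstdio>

const int numSlots = 256;

__global__ void fillSlots(int* globalOut)
{
    __shared__ int slotFilled[numSlots]; // one flag per particle slot

    // phase 1: each thread marks the slots it owns
    for (int s = threadIdx.x; s < numSlots; s += blockDim.x)
        slotFilled[s] = 1;

    __syncthreads(); // all writes to slotFilled are visible from here on

    // phase 2: each thread reads a flag written by a *different* thread.
    // The barrier makes this race-free, so no volatile should be required;
    // the #539 work-around corresponds to declaring the local copy volatile
    // so the compiler cannot keep a stale value in a register.
    for (int s = threadIdx.x; s < numSlots; s += blockDim.x)
    {
        bool isParticle = (slotFilled[(s + 1) % numSlots] != 0);
        globalOut[s] = isParticle ? 1 : 0;
    }
}

int main()
{
    int* d_out;
    cudaMalloc(&d_out, numSlots * sizeof(int));
    fillSlots<<<1, 64>>>(d_out);

    int h_out[numSlots];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // every slot must be marked; a zero would reproduce the observed symptom
    int missing = 0;
    for (int s = 0; s < numSlots; ++s)
        if (h_out[s] == 0)
            ++missing;
    printf("slots not marked: %d (expected 0)\n", missing);

    cudaFree(d_out);
    return 0;
}
```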

What is the status of this issue?

It was solved with #539, but it is not clear whether the workaround can be removed or not. We should keep it open.

We could try whether it still occurs with CUDA 8+ when the work-around is removed; if not, it was fixed upstream in nvcc.

This and other volatile usages might be explained by the libcu++ slides from SC19 shared by @ax3l. We need to check; there are not that many occurrences.
cc @psychocoderHPC.

@sbastrakov we are applying volatile to a thread-local variable to break compiler optimizations. The example in the slides (please link when available) shows a case where you guard data via an atomic variable: to get the correct data value you must call __threadfence to be sure that you are reading the latest version of the value.
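For reference, a small self-contained sketch of that guarding pattern (hypothetical code following the standard CUDA __threadfence() producer/consumer idiom, not taken from PIConGPU): each block publishes its data, issues __threadfence(), and only then updates the atomic guard; the block that observes the final guard value can then safely read all published data.

```cuda
#include <cstdio>

__device__ unsigned int blocksDone = 0; // atomic guard

__global__ void guardedSum(float* partial, float* result)
{
    if (threadIdx.x == 0)
    {
        // 1. write this block's data
        partial[blockIdx.x] = static_cast<float>(blockIdx.x);

        // 2. make the write visible to all other blocks before signalling
        __threadfence();

        // 3. update the guard; the block that increments last may read everything
        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        if (ticket == gridDim.x - 1)
        {
            float total = 0.0f;
            for (unsigned int b = 0; b < gridDim.x; ++b)
                total += partial[b]; // every producer fenced before its atomicInc
            *result = total;
            blocksDone = 0; // reset the guard for the next launch
        }
    }
}

int main()
{
    const unsigned int numBlocks = 8;
    float *partial, *result;
    cudaMalloc(&partial, numBlocks * sizeof(float));
    cudaMalloc(&result, sizeof(float));

    guardedSum<<<numBlocks, 32>>>(partial, result);

    float h = 0.0f;
    cudaMemcpy(&h, result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum of block indices: %.0f (expected 28)\n", h);

    cudaFree(partial);
    cudaFree(result);
    return 0;
}
```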
