PIConGPU: negative density leads to mallocMC error

Created on 10 Sep 2018 · 9 Comments · Source: ComputationalRadiationPhysics/picongpu

When initializing a negative density in density.param, PIConGPU crashes with a mallocMC error.
This could be avoided by clamping densities to non-negative values.

@kossag14 Could you verify whether a non-zero density solved your problem?
@ComputationalRadiationPhysics/picongpu-developers would you prefer capping or crashing?

question

All 9 comments

Yes, that's why we have s = (s>0.)*s; in every free density example.
We don't apply it in general because it adds a cost that is not always necessary.
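
A minimal sketch of the clamp mentioned above, written in plain C++ rather than against PIConGPU's actual functor interface. The names `clampToZero` and `checkClampToZero` are hypothetical and only illustrate why the multiplication removes negative values without a branch.

```cpp
#include <cassert>

// Hypothetical helper, not part of PIConGPU: applies the clamp from the
// free density examples to a single value.
inline double clampToZero(double s)
{
    // (s > 0.) converts to 1 for positive s and to 0 otherwise, so any
    // non-positive value is multiplied by zero.
    s = (s > 0.) * s;
    return s;
}

// Small self-check with illustrative values.
inline void checkClampToZero()
{
    assert(clampToZero(0.5) == 0.5);  // positive values pass through
    assert(clampToZero(-1.0) == 0.0); // negative values become zero
}
```

Note that this expression does not catch a NaN input, since multiplying NaN by zero still yields NaN.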

@ax3l That is a good point. One should never remove that multiplication.
I totally agree with you from a performance point of view. However, how much does such a check or multiplication cost?

I probably don't fully understand the context, so here is a naive question: why isn't this simply a case of incorrect parameters given by a user? I mean, does a negative density make any more sense than, e.g., a negative number of cells? My point is: is negative density some kind of special case that is actually useful, so that it is reasonable to have this workaround?

@sbastrakov It definitely is a case of incorrect parameters given by a user. No question about that.
The question for me is whether it would be more user-friendly to avoid this issue by default or to throw a meaningful error message.

Current case:
In the current case, the density is described by a complex and lengthy equation. A corner case was not correctly covered, and thus a negative density could be set. That can happen quite easily because in most free-formula density descriptions the density is defined piecewise, and each equation only produces a valid result within its own range.

As context:
For LWFA simulations, running into memory errors is not that uncommon. It usually points to an underestimation of the density accumulation caused by the wakefields and requires a better distribution across the compute units (GPUs).

My current opinion:
I actually liked the fact that the simulation crashed, because it pointed to an (unspecific) error in the setup. However, the fact that it crashed with a memory error is quite confusing.
I personally would prefer catching negative densities and throwing a meaningful error message as it is much more user-friendly.
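
The kind of uncovered corner case described in this comment can be sketched as follows. This is an illustrative piecewise profile in plain C++ with made-up positions and names (`profileUncovered`, `profileCovered`); it is not the formula from the reported setup, which is not shown in this thread.

```cpp
#include <cassert>

// Hypothetical piecewise profile along y (arbitrary units), normalized to a
// base density: up-ramp on [0, 1), plateau on [1, 3), and a linear
// down-ramp starting at y = 3 that reaches zero at y = 4.
inline double profileUncovered(double y)
{
    if (y < 0.0) return 0.0; // in front of the target
    if (y < 1.0) return y;   // up-ramp
    if (y < 3.0) return 1.0; // plateau
    return 4.0 - y;          // down-ramp, only valid for y <= 4
}

// Same profile with the region behind the down-ramp handled explicitly.
inline double profileCovered(double y)
{
    if (y > 4.0) return 0.0; // behind the target
    return profileUncovered(y);
}

// Small self-check with illustrative values.
inline void checkProfiles()
{
    assert(profileUncovered(5.0) == -1.0); // ramp formula used outside its range
    assert(profileCovered(5.0) == 0.0);    // corner case covered
}
```

Each branch is correct inside its own interval; the negative value only appears where the last formula is evaluated beyond the position it was written for.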

@PrometheusPi thanks for such a detailed answer! I agree with your opinion.

The only thing we can do is to (silently) clip to zero by default, on all user-given (free) functors. That way, the user still needs to figure out why no target is there, but at least the simulation does not crash. The runtime (RT) costs are fine for that, imho, since it does not need anything that's not already in the kernel (data-wise).
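
One way to picture clipping "on all user-given functors" is a thin wrapper that applies the clamp to whatever the user functor returns. The sketch below is plain C++ under assumed names (`ClampedDensity`, `ExampleUserFunctor`); it does not reproduce PIConGPU's functor interface or the change made in #2831.

```cpp
#include <cassert>
#include <utility>

// Hypothetical wrapper: forwards all arguments to the user functor and
// clips the returned density to zero.
template <typename T_UserFunctor>
struct ClampedDensity
{
    T_UserFunctor userFunctor;

    template <typename... T_Args>
    double operator()(T_Args&&... args) const
    {
        double const s = userFunctor(std::forward<T_Args>(args)...);
        // Only uses the value that is already computed, no extra data.
        return (s > 0.) * s;
    }
};

// Hypothetical user functor whose linear ramp turns negative for y > 2.
struct ExampleUserFunctor
{
    double operator()(double y) const
    {
        return 1.0 - 0.5 * y;
    }
};

// Small self-check with illustrative values.
inline void checkClampedDensity()
{
    ClampedDensity<ExampleUserFunctor> density{ExampleUserFunctor{}};
    assert(density(1.0) == 0.5); // valid region is unchanged
    assert(density(4.0) == 0.0); // negative result is clipped silently
}
```

As the comment points out, the clipping is silent: the wrapped functor simply returns zero where the user formula went negative.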

@ax3l What about writing out a meaningful warning or error? Although, especially from the GPUs, that might not be that easy.

That's not easily or cleanly possible apart from verbose printf spilling, since you are evaluating / sampling a function at unknown points at runtime in a highly parallel manner. Also, adding such runtime warnings would always come at a (data, register, latency) cost.

fixed with #2831 (in PIConGPU 0.4.3+)
