`math::pow()` and `::powf()` broken in PIConGPU

Created on 20 Oct 2020 · 22 comments · Source: ComputationalRadiationPhysics/picongpu

Hey,

evaluating the functions `math::pow()` or `::powf()` in the following way results in NaN.
I execute the following code in some kernel (in my case in `operator()` in `YeePML.kernel`):

{
    float const base = -4.;
    float_X const exponent = -math::pow( base , 2._X);
    float_X const exponent_D = -math::pow( double(base) , 2._X);
    float_X const exponent_powf = -powf( base , 2.f);

    printf("base = %.2e\n", double(base)); // Result: -4.00e00
    printf("exponent (float_X,float_X) = %.2e\n", double(exponent)); // Result: nan
    printf("exponent (double,float_X) = %.2e\n", double(exponent_D)); // Result: -1.60e01
    printf("exponent ::powf() = %.2e\n", double(exponent_powf)); // Result: nan
}

The current workaround is to use the constexpr implementation of `pow()` from the radiation plugin:

#include "picongpu/plugins/radiation/utilities.hpp"

float_X base = -4._X;
float_X result = -::picongpu::plugins::radiation::util::pow( base, 2._X );

printf( "radiation pow = %.2e\n", double( result ) ); // Result: -1.60e01

All 22 comments

I can reproduce the issue on a K20 GPU hosted on Hemera.

  • I tried a native mini app with the same code. The mini app does not show the issue.
  • Changing the base from -4 to 4 fixes the NaN.
  • [ ] test disabling fast math

@psychocoderHPC and I just figured out that `powf()` falls back to `__powf()` when `-use_fast_math` is passed as a compiler option. `__powf(x, y)` is implemented as

exp2f(y * __log2f(x))

CUDA C++ Programming Guide, PG-02829-001_v11.1, p. 261
As the logarithm is only defined for positive values, this explains the error.
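A quick host-side sketch (plain C++, no GPU needed) that illustrates the mechanism: the logarithm of a negative number is NaN, and that NaN propagates through the whole expression even though (-4)^2 is well defined:

#include <cmath>
#include <cstdio>

int main()
{
    float const base = -4.f;
    float const exponent = 2.f;
    // std::log2( -4.f ) is NaN, so the fast-math style expression is NaN too.
    float const fastMathStyle = std::exp2( exponent * std::log2( base ) );
    float const reference = std::pow( base, exponent ); // 16
    std::printf( "exp2(y * log2(x)) = %f, pow(x, y) = %f\n",
                 double( fastMathStyle ), double( reference ) );
    return 0;
}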

In a mini app I was able to reproduce the error when compiling with -use_fast_math.
(Interestingly, the version with powf( double(base), 2. ) also results in NaN.)
Without -use_fast_math the error does not occur.
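A minimal CUDA mini app along those lines (file name and exact code are illustrative, not the original mini app), assuming nvcc is available:

// pow_test.cu -- compile twice and compare the output:
//   nvcc pow_test.cu -o pow_accurate                // prints 16.000000
//   nvcc -use_fast_math pow_test.cu -o pow_fast     // prints nan
#include <cstdio>

__global__ void powTest()
{
    float const base = -4.f;
    // With -use_fast_math this call is lowered to __powf(), i.e.
    // exp2f( 2.f * __log2f( -4.f ) ), which yields NaN.
    printf( "powf(-4, 2) = %f\n", powf( base, 2.f ) );
}

int main()
{
    powTest<<< 1, 1 >>>();
    cudaDeviceSynchronize();
    return 0;
}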

In conclusion, the observed behavior can be expected with -use_fast_math and is not an error.

Is there a way to avoid this issue in PIConGPU? Do we actually want to avoid this issue?

By the way, the pow( base, exponent ) implementation in the radiation plugin allows only integers for the exponent. Thus there is a potential error around the corner due to implicit type conversion when calling it with floating-point exponents.

To be fair, the document does not say it is implemented exactly that way, only that it relies on that expression. They could theoretically have a check for a negative base raised to a positive integer power. But from what you observed, it seems they just use the expression directly.

I think the way to avoid the issue is to use an integer-exponent implementation of pow in the code, since an expression with a negative base only makes sense with an integer exponent (in the real-number domain, where we are).

I am not sure I understand you correctly. I think there is definitely a need for a pow implementation with floating-point exponents. But I agree that this only makes sense for positive bases. The question for me is: do we want to implement such a case discrimination (potentially taking the absolute value of the base, which may be unexpected by the user)? Or is returning NaN the most effective way to tell the user that there is something "wrong" in their code because they did not choose the implementation that fits their setup?

My point was, indeed, that with a negative base only integer exponents make sense. And maybe it's okay to require using only this version (and I guess the radiation plugin already has it). With a non-negative base a floating-point exponent of course makes sense as well; in this case, as I understand it, the CUDA implementation is good both with and without fast math.

@steindev If I see NaN in my output I would guess that something went awry, but it is still a formally correct FP value, so using it to tell the user/developer something is ambiguous, IMHO.

I think a better way would be, if possible, to throw an error rather than let the user play detective to track down the source of the NaN.

I believe the issue is that it is unclear why NaNs were produced in that case, as the operation is mathematically sound. I guess this falls within what we allow with the fast math flag, but it is very unintuitive.

We should check whether we can enforce the non-fast-math pow function in alpaka to avoid this unexpected behavior!

Better to lose performance than to have these unexpected results.

https://en.cppreference.com/w/cpp/numeric/math/pow

This page also specifies that a negative base results in NaN, so it looks like we only need to update the documentation.
Maybe we forgot to check the documentation 😅

Maybe we could add an ASSERT that is only checked when compiling in Debug mode?

> Maybe we could add an ASSERT that is only checked when compiling in Debug mode?

Yes, we can; the normal assert does exactly that. This assert should be added to the alpaka pow implementation.
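A minimal sketch of what such a debug-only guard could look like (a hypothetical wrapper, not the actual alpaka implementation); the standard assert is compiled out when NDEBUG is defined, i.e. there is no runtime cost in release builds:

#include <cassert>
#include <cmath>

// Hypothetical wrapper: catch the problematic domain early in debug builds,
// keep full speed in release builds (the assert compiles to nothing).
inline float checkedPow( float const base, float const exponent )
{
    assert( base >= 0.f &&
            "pow with a negative base: use an integer-exponent pow instead" );
    return std::pow( base, exponent );
}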

I do not immediately see that statement on the cppreference page you linked, @psychocoderHPC. Could you post a quote?

Ah, I guess you meant

> pow(base, exp) returns NaN and raises FE_INVALID if base is finite and negative and exp is finite and non-integer.

Which I interpreted as integer value, but they probably mean integer type there.

> Ah, I guess you meant
>
> pow(base, exp) returns NaN and raises FE_INVALID if base is finite and negative and exp is finite and non-integer.
>
> Which I interpreted as integer value, but they probably mean integer type there.

Yes, this is what I mean: "if base is finite and negative and exp is finite and non-integer", which means: if the exponent is a floating-point value.

Okay, so it's all good in PIConGPU then? Just for the case of a negative floating-point number raised to an integer power one does not use math::pow / math::powf but rather the pow from the radiation plugin that accepts integer exponents.

Yes, IMO all is fine in PIConGPU. We should maybe add the assert to alpaka, introduce a floating-point/integer version in alpaka, and add a better description to the math trait for pow.

I am not sure we should even do that. Alpaka currently does not check for domain errors (e.g. sqrt or log of a negative number), and I think it's probably not alpaka's job, as we would then have to define and support that behavior as well.

We could maybe add a version of pow that expresses floating_point^integer to alpaka, but such a version exists in neither CUDA nor the standard library (according to cppreference, such overloads existed only pre-C++11).
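If such an overload were ever added, a minimal sketch (purely hypothetical, not alpaka's actual API or naming) could use exponentiation by squaring and also cover negative integer exponents:

#include <cstdint>

// Hypothetical floating_point^integer power via exponentiation by squaring.
// No logarithm involved, so negative bases are handled correctly.
inline float powInt( float base, int32_t const exponent )
{
    bool const negative = exponent < 0;
    uint64_t n = negative ? -static_cast< int64_t >( exponent )
                          : static_cast< int64_t >( exponent );
    float result = 1.0f;
    while( n != 0u )
    {
        if( n & 1u )
            result *= base;
        base *= base;
        n >>= 1u;
    }
    return negative ? 1.0f / result : result;
}

// powInt( -4.0f, 2 ) == 16.0f, powInt( 2.0f, -3 ) == 0.125f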

The comments were added to alpaka by the PR linked above. Maybe we should do the same for cupla as well. Closing this PIConGPU issue, since we decided not to tackle this at the PIConGPU level.
