DLR reported via mail that they see a performance decrease when the OpenMP backend (block parallel) is used and the load per block varies strongly.
Following this discussion: https://stackoverflow.com/questions/42970700/openmp-dynamic-vs-guided-scheduling
we should think about changing the OpenMP scheduling strategy from the currently used guided to schedule(auto) or schedule(runtime) and select a scheduler based on the grid size.
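To make the schedule(runtime) idea concrete, here is a minimal sketch, not alpaka code. The functions `processBlock` and `runBlocks` are hypothetical names, and the workload is artificially imbalanced so that later blocks cost more than earlier ones. With schedule(runtime), OpenMP takes the schedule from the OMP_SCHEDULE environment variable (or from omp_set_schedule), so different strategies can be compared without recompiling.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-block work: the cost grows with the block index, so the
// load per block is deliberately imbalanced.
inline std::uint64_t processBlock(int blockIdx)
{
    std::uint64_t acc = 0;
    for(int i = 0; i <= blockIdx; ++i)
        acc += static_cast<std::uint64_t>(i);
    return acc;
}

// Processes all blocks in a parallel loop. schedule(runtime) defers the
// choice of schedule to OMP_SCHEDULE or omp_set_schedule. If OpenMP is not
// enabled at compile time, the pragma is ignored and the loop runs serially.
inline std::vector<std::uint64_t> runBlocks(int blockCount)
{
    std::vector<std::uint64_t> results(static_cast<std::size_t>(blockCount));
#pragma omp parallel for schedule(runtime)
    for(int b = 0; b < blockCount; ++b)
        results[static_cast<std::size_t>(b)] = processBlock(b);
    return results;
}
```

A binary built this way could then be run with, for example, OMP_SCHEDULE="dynamic,1" or OMP_SCHEDULE="guided" to compare both strategies on the same workload.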
I was also concerned about the guided schedule in the past. We actually checked it a few months ago using @kloppstock 's code, which did not show a difference. However, that of course does not mean the difference doesn't exist for other code.
Can we maybe think of adding a CMake option that controls it, and by default use something reasonable?
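One way such a build option could reach the code is through a compile definition. The sketch below assumes a hypothetical macro `DEMO_OMP_SCHEDULE` that the build system might pass (for example as -DDEMO_OMP_SCHEDULE=dynamic); neither the macro nor the function `squareBlocks` exists in alpaka. It uses the standard `_Pragma` operator so that the schedule kind can be spliced into the OpenMP directive, and it falls back to guided when nothing is set.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical compile definition. A build option could pass, for example,
// -DDEMO_OMP_SCHEDULE=dynamic. The default keeps the guided schedule.
#ifndef DEMO_OMP_SCHEDULE
#    define DEMO_OMP_SCHEDULE guided
#endif

// Two-step expansion: the argument is macro-expanded first and only then
// turned into the string literal that _Pragma requires.
#define DEMO_STRINGIFY(x) #x
#define DEMO_PRAGMA(x) _Pragma(DEMO_STRINGIFY(x))

// Fills out[b] with b * b using the schedule selected at compile time.
inline std::vector<int> squareBlocks(int blockCount)
{
    std::vector<int> out(static_cast<std::size_t>(blockCount));
    DEMO_PRAGMA(omp parallel for schedule(DEMO_OMP_SCHEDULE))
    for(int b = 0; b < blockCount; ++b)
        out[static_cast<std::size_t>(b)] = b * b;
    return out;
}
```

Compared with schedule(runtime), this fixes the choice at build time, which avoids depending on the environment of the machine the program runs on.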
What do you guys think of some scheduling hints in the alpaka API? I'm thinking of something like
```cpp
class Kernel
{
public:
    using SchedulerHint = alpaka::ImbalancedHint;
};
```
And if the kernel class doesn't have a hint, assume the current behavior by default.
Oh yes, that can also be an option. And I actually like it, if we do it as a trait. Because then it can use our default option by default, and a user (kernel developer) does not have to specialize it, but can.
(We can of course also check whether a kernel class has an internally defined type with a fixed name like SchedulerHint and wrap it into a trait; however, I feel a plain trait would be more the alpaka way.)
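The two variants discussed here can be combined, as the following sketch shows. All names in it (`demo::SchedulerHintTrait`, `demo::DefaultHint`, `demo::ImbalancedHint` and the kernel types) are hypothetical illustrations, not existing alpaka API, and it assumes C++17 for `std::void_t`. The primary template yields the default, a partial specialization picks up a nested `SchedulerHint` type if the kernel defines one, and a user can still specialize the trait directly.

```cpp
#include <type_traits>

// All names in this block are hypothetical, not existing alpaka API.
namespace demo
{
    struct DefaultHint {};
    struct ImbalancedHint {};

    // Primary template: kernels without any hint get the default behavior.
    template<typename TKernel, typename = void>
    struct SchedulerHintTrait
    {
        using type = DefaultHint;
    };

    // Chosen only when the kernel defines a nested SchedulerHint type.
    template<typename TKernel>
    struct SchedulerHintTrait<TKernel, std::void_t<typename TKernel::SchedulerHint>>
    {
        using type = typename TKernel::SchedulerHint;
    };
} // namespace demo

struct PlainKernel {};

struct ImbalancedKernel
{
    using SchedulerHint = demo::ImbalancedHint;
};

struct ThirdPartyKernel {};

// A user can specialize the trait without modifying the kernel class.
namespace demo
{
    template<>
    struct SchedulerHintTrait<ThirdPartyKernel>
    {
        using type = ImbalancedHint;
    };
} // namespace demo

static_assert(std::is_same_v<demo::SchedulerHintTrait<PlainKernel>::type, demo::DefaultHint>);
static_assert(std::is_same_v<demo::SchedulerHintTrait<ImbalancedKernel>::type, demo::ImbalancedHint>);
static_assert(std::is_same_v<demo::SchedulerHintTrait<ThirdPartyKernel>::type, demo::ImbalancedHint>);
```

A backend could then dispatch on `SchedulerHintTrait<TKernel>::type` at compile time to pick the OpenMP schedule, while existing kernels keep the current behavior unchanged.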
solved by #1223