PIConGPU: PIConPhi

Created on 13 Sep 2017 · 4 comments · Source: ComputationalRadiationPhysics/picongpu

Does EPOCH support Intel's many integrated core architecture, the Xeon Phi?
Would I have to make any changes to the source or makefiles in order for the code to take advantage of this architecture?

documentation question

All 4 comments

Thank you for your question!

Does EPOCH support [...] Xeon Phi?

We are PIConGPU ;-), but yes, we already support KNL in dev, and more conveniently in our next stable release (in "native"/non-offloading mode).

@theZiz is currently fine-tuning a profile for the Taurus cluster (TU Dresden) in #2210, and we will write up some example setups once we have good tuning and installs figured out.

Running on CPU or GPU will be extremely easy for users to control: it can be steered with a simple switch during pic-build (pic-configure). You will not need to change anything else in your source code or build scripts. We will provide instructions/templates on how to configure the KNL hardware and how many MPI ranks to run per card for optimal performance.
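As a rough sketch of what that switch looks like in practice (the directory layout and the exact backend flag and names are assumptions and may differ between releases; check the release documentation):

```shell
# Create a simulation from a bundled example (hypothetical paths).
pic-create $PICSRC/share/picongpu/examples/LaserWakefield ~/paramSets/lwfa
cd ~/paramSets/lwfa

# Select the compute backend at configure time -- no source changes needed.
pic-configure -b cuda .     # NVIDIA GPUs
# pic-configure -b omp2b .  # OpenMP backend for CPUs / KNL (assumed name)

# Build with the chosen backend.
pic-build
```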

That sounds great! Did you benchmark PIConGPU's performance on the various architectures (GPGPU, KNL, CPU)? Do you expect GPGPU to be the fastest due to the way the code is written?

Thanks for asking!

Yes, we did!

These are the papers investigating our underlying library alpaka on various architectures, in order to prove zero-overhead abstraction with C++ metaprogramming (aka performance portability):

and on PIConGPU porting with Alpaka:

As you can see in the papers above, we have already investigated Alpaka on GPU/Power/CPU/KNL/... and PIConGPU+Alpaka on GPU/Power/CPU/.... The benchmarks for the latest Xeon Phi (KNL) are currently being tuned for the next release in #2210.

Do you expect GPGPU to be the fastest due to the way the code is written?

As we outline in more detail in the papers, the so-called floating-point efficiency, i.e. the performance you achieve relative to what your hardware implements, is similar across most platforms we benchmarked. That is great! Still, GPUs tend to be a bit more efficient, but we have to see how this plays out as we tune further (we have plenty of options with alpaka). The reasons for this are manifold, e.g. memory hierarchies (plus bandwidths and latencies) and the relatively low arithmetic intensity of the basic PIC algorithm. An interesting aspect is also power consumption per Flop, which is generally better on RISC-like architectures.
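The low-arithmetic-intensity argument can be made concrete with a small roofline-style estimate (the peak, bandwidth, and intensity numbers below are illustrative assumptions, not PIConGPU measurements):

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      intensity_flop_per_byte: float) -> float:
    """Roofline model: the achievable rate is capped either by the
    compute peak or by memory bandwidth times arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)

def fp_efficiency(achieved_gflops: float, peak_gflops: float) -> float:
    """Floating-point efficiency: achieved performance relative to peak."""
    return achieved_gflops / peak_gflops

# Illustrative numbers: a GPU with 7000 GFLOP/s peak and 900 GB/s memory
# bandwidth, running a PIC-like kernel at low intensity (~0.5 flop/byte).
roof = attainable_gflops(7000.0, 900.0, 0.5)  # memory-bound: 450 GFLOP/s
print(f"attainable: {roof:.0f} GFLOP/s, "
      f"efficiency bound: {fp_efficiency(roof, 7000.0):.1%}")
```

With such a memory-bound kernel, the efficiency ceiling is only a few percent of peak on any architecture, which is why comparable floating-point efficiency across platforms is the expected outcome.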

For more details, these blog posts by Karl Rupp + [2] try to organize the current horse race in HPC and are an interesting read.

Consequently, this leads to the most PIConGPU Flops per invested dollar on manycore hardware such as GPUs (and also to the fastest time-to-solution).

