Thank you for your question!
Does EPOCH support [...] Xeon Phi?
We are PIConGPU ;-) , but yes, we already support KNL in dev and, more conveniently, in our next stable release (in "native"/non-offloading mode).
@theZiz is currently fine-tuning a profile for the Taurus (TU Dresden) cluster in #2210, and we will write up some example setups once we have good tuning and installs figured out.
Running on CPU or GPU will be very easy for users to control: it can be steered with a simple switch during pic-build (pic-configure), with no changes to your source code or build scripts. We will provide instructions/templates on how to configure the KNL hardware and how many MPI ranks to run per card for optimal performance.
That sounds great! Did you benchmark PIConGPU's performance on the various architectures (GPGPU, KNL, CPU)? Do you expect GPGPU to be the fastest due to the way the code is written?
Thanks for asking!
Yes, we did!
These are the papers investigating our underlying library alpaka on various architectures in order to prove zero-overhead abstraction with C++ metaprogramming (aka performance portability):
DOI:10.1109/IPDPSW.2016.50 (http://arxiv.org/abs/1602.08477), paper in AsHES2016
DOI:10.5281/zenodo.49768 (thesis: diploma)
Alexander Matthes, René Widera, Erik Zenker, Benjamin Worpitz, Axel Huebl and Michael Bussmann
"Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the Alpaka library"
paper in ISC17 (P3MA), DOI:10.1007/978-3-319-67630-2_36, preprint: https://arxiv.org/abs/1706.10086
and on PIConGPU porting with Alpaka:
DOI:10.1007/978-3-319-46079-6_21 (https://arxiv.org/abs/1606.02862), paper in ISC16 (IWOPH), see cupla
E. Zenker, R. Widera, G. Juckeland et al., Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka (GTC16 talk)
https://mygtc.gputechconf.com/events/32/schedules/2792
Video: http://on-demand.gputechconf.com/gtc/2016/video/S6298.html
As you can see in the papers above, we have already investigated Alpaka+GPU/Power/CPU/KNL/... and PIConGPU+Alpaka+GPU/Power/CPU/... The benchmarks for the latest Xeon Phi (KNL) are currently being tuned for the next release in #2210.
Do you expect GPGPU to be the fastest due to way the code is written?
As we outline in more detail in the papers, the so-called floating-point efficiency (the performance you achieve relative to what your hardware implements) is similar across most platforms we benchmarked. That is great! Still, GPUs tend to be a bit more efficient, though we have to see how this plays out as we tune further (we have plenty of options with alpaka). The reasons for this are manifold, e.g. memory hierarchies (plus their bandwidths and latencies) and the relatively low arithmetic intensity of the basic PIC algorithm. Power consumption per Flop is also an interesting aspect, and it is generally better on RISC-like architectures.
For more details, the blog posts by Karl Rupp + [2] try to organize the current horse race in HPC and are an interesting read.
Consequently, manycore hardware such as GPUs currently delivers the most PIConGPU Flops per invested dollar (and also the fastest time-to-solution).