Hello,
picongpu@develop does not compile with [email protected]. Compilation of the LWFA example fails with this error:
/home/quasar/src/spack/opt/spack/linux-ubuntu20.04-skylake/gcc-9.3.0/picongpu-develop-bg7garcln6uuhro2352lmgcgefsn3htr/thirdParty/cupla/alpaka/include/alpaka/event/EventGenericThreads.hpp:280:19: error: ‘__T30’ was not declared in this scope
280 | auto vQueues(dev.getAllQueues());
There seem to be incompatibilities with the newest gcc 9 compilers, and the error message is also mentioned here: https://gcc.gnu.org/gcc-9/changes.html
P.S. I later discovered that the same error appears when compiling picongpu (master version) with [email protected].
Regards,
Cristian
Hello @cbontoiu, thanks for your report. We just got a similar report for alpaka (which is used inside PIConGPU) with CUDA 11.3. So the issue may be CUDA 11.3 (which you are also using, according to the log), not gcc 9.3. Note that alpaka only officially supports up to CUDA 11.2 so far. Could you try an earlier CUDA version?
@cbontoiu If you have installed the CUDA 11.3 driver on your system, you can simply compile PIConGPU with CUDA 11.2 and run it on the system with the CUDA 11.3 driver.
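Since the build goes through Spack, one way to do this is to pin the CUDA toolkit version in the install spec. A sketch (the exact variant names for your Spack package may differ):

```shell
# Build PIConGPU against the CUDA 11.2 toolkit explicitly; the `^` syntax
# pins a dependency version in Spack. The 11.3 driver on the system only
# needs to be at least as new as the toolkit used at compile time.
spack install picongpu ^[email protected]
spack load picongpu
```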
@psychocoderHPC This option is interesting, but I don't know how to apply it. I thought compilation uses the CUDA version from the PIConGPU installation (in my case via Spack, but using CUDA and the CUDA-aware OpenMPI from the system).
In the meantime I did a fresh install of PIConGPU, dev version, using CUDA 11.2 and OpenMPI 4.1.1 (from the system), but the latter gives some compilation errors, shown in the attached text file. You may want to investigate this incompatibility as well.
Thank you.
Hello @cbontoiu. The issue is that your build uses two MPI libraries, and that causes a conflict. One is your system OpenMPI; the other is the MPI from Anaconda, pulled in by ADIOS in Anaconda. I think it is reasonable to approach this gradually: first disable ADIOS and try without it.
Once that works, you could either rebuild ADIOS or point it to your system OpenMPI and thus avoid the conflict.
@sbastrakov Thank you. Indeed there was a clash between conda and Spack, and I managed to compile by disabling the lines written by Anaconda in my bashrc file; I don't know another way. I also managed to run the LWFA model, and I am surprised how slow it was. This model used to complete in 1 min 10 s on this machine before the openPMD plugin was included, but now, out of the box, a run with the 1.cfg file took 3 min 25 s.
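For reference, commenting out the initialization block that `conda init` writes to `~/.bashrc` is one way to keep conda's MPI/ADIOS out of the build environment; a less invasive alternative is to keep the block but turn off auto-activation (a sketch; the exact block in your bashrc may look slightly different):

```shell
# In ~/.bashrc, conda's initialization is delimited by marker comments;
# commenting out everything between them disables conda in new shells:
#   # >>> conda initialize >>>
#   ...   <- comment out these lines
#   # <<< conda initialize <<<

# Alternatively, keep the block but stop conda from activating its base
# environment automatically in every shell:
conda config --set auto_activate_base false
```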
PIConGPU: 0.6.0-dev
Build-Type: Release
Third party:
OS: Linux-5.8.0-50-generic
arch: x86_64
CXX: GNU (7.5.0)
CMake: 3.20.1
CUDA: 11.2.67
mallocMC: 2.6.0
Boost: 1.70.0
MPI:
standard: 3.1
flavor: OpenMPI (4.1.0)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: NOTFOUND
openPMD: 0.13.3
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 10sec 833msec = 10 sec
0 % = 0 | time elapsed: 9sec 150msec | avg time per step: 0msec
4 % = 102 | time elapsed: 18sec 46msec | avg time per step: 15msec
9 % = 204 | time elapsed: 26sec 977msec | avg time per step: 15msec
14 % = 306 | time elapsed: 35sec 980msec | avg time per step: 15msec
19 % = 408 | time elapsed: 45sec 6msec | avg time per step: 16msec
24 % = 510 | time elapsed: 54sec 121msec | avg time per step: 16msec
29 % = 612 | time elapsed: 1min 3sec 237msec | avg time per step: 17msec
34 % = 714 | time elapsed: 1min 12sec 362msec | avg time per step: 17msec
39 % = 816 | time elapsed: 1min 21sec 528msec | avg time per step: 17msec
44 % = 918 | time elapsed: 1min 30sec 715msec | avg time per step: 18msec
49 % = 1020 | time elapsed: 1min 40sec 102msec | avg time per step: 19msec
54 % = 1122 | time elapsed: 1min 49sec 440msec | avg time per step: 19msec
59 % = 1224 | time elapsed: 1min 58sec 815msec | avg time per step: 19msec
64 % = 1326 | time elapsed: 2min 8sec 243msec | avg time per step: 20msec
69 % = 1428 | time elapsed: 2min 17sec 718msec | avg time per step: 20msec
74 % = 1530 | time elapsed: 2min 27sec 140msec | avg time per step: 21msec
79 % = 1632 | time elapsed: 2min 36sec 562msec | avg time per step: 21msec
84 % = 1734 | time elapsed: 2min 45sec 877msec | avg time per step: 20msec
89 % = 1836 | time elapsed: 2min 55sec 71msec | avg time per step: 19msec
94 % = 1938 | time elapsed: 3min 4sec 289msec | avg time per step: 19msec
99 % = 2040 | time elapsed: 3min 13sec 423msec | avg time per step: 20msec
calculation simulation time: 3min 13sec 582msec = 193 sec
full simulation time: 3min 25sec 8msec = 205 sec
```shell
source $HOME/src/spack/share/spack/setup-env.sh && spack load picongpu && spack load openpmd-api && export PIC_BACKEND="cuda:75" && export OMPI_MCA_io=^ompio
cd /home/quasar/PIC_INPUT/PICONGPU/TESTS/myLWFA
rm -r .build/ && pic-build &> log_out.txt
tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/quasar/RawDataDisk/PICONGPU/TESTS/myLaserWakefield
```
If you want to see how the output affects run time, you could disable it in your .cfg file, or change the output period. Look for the TBG_openPMD variable there.
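For illustration, the relevant line in etc/picongpu/1.cfg might look like the following (the period values here are made up; check the flags actually present in your file):

```shell
# Write openPMD output less often, e.g. every 500 steps instead of every 100:
TBG_openPMD="--openPMD.period 500 --openPMD.file simData"

# Or disable openPMD output entirely for a timing comparison:
# TBG_openPMD=""
```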
Indeed, 1 min 25 s can be saved when running without the creation of checkpoints.
Do you keep a changelog since version 0.5.0, so that we could get an idea of where development is heading and maybe suggest features?
Yes, here. Feature suggestions and external contributions are welcome. Ideally, please create a new issue per suggestion.
We are working on a fix for CUDA 11.3: https://github.com/alpaka-group/alpaka/pull/1295
This will take some time; once alpaka and cupla support CUDA 11.3, PIConGPU will support it too.