Hello,
picongpu@develop does not compile with [email protected]. Compilation of the LWFA example fails with this error:
/home/quasar/src/spack/opt/spack/linux-ubuntu20.04-skylake/gcc-9.3.0/picongpu-develop-bg7garcln6uuhro2352lmgcgefsn3htr/thirdParty/cupla/alpaka/include/alpaka/event/EventGenericThreads.hpp:280:19: error: ‘__T30’ was not declared in this scope
280 | auto vQueues(dev.getAllQueues());
There seem to be incompatibilities with the newest gcc 9 compilers, and the error message is also mentioned here: https://gcc.gnu.org/gcc-9/changes.html
P.S. I later discovered that the same error appears when compiling picongpu (master version) with [email protected].
Regards,
Cristian
Hello @cbontoiu, thanks for your report. We just got a similar report for alpaka (which is used inside PIConGPU) with CUDA 11.3. So the issue may be CUDA 11.3 (which you are also using, according to the log), not gcc 9.3. Note that alpaka only officially supports up to CUDA 11.2 so far. Could you try an earlier CUDA version?
@cbontoiu If you have installed the CUDA 11.3 driver on your system, you can simply compile PIConGPU with CUDA 11.2 and run it on the system with the CUDA 11.3 driver.
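Since the build goes through Spack, one way to do this is to pin the CUDA toolkit version in the install spec. A sketch (the exact variant names for your Spack package may differ):

```shell
# Build PIConGPU against the CUDA 11.2 toolkit explicitly; the `^` syntax
# pins a dependency version in Spack. The 11.3 driver on the system only
# needs to be at least as new as the toolkit used at compile time.
spack install picongpu ^[email protected]
spack load picongpu
```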
@psychocoderHPC This option is interesting, but I don't know how to apply it. I thought compilation uses the CUDA version from the PIConGPU installation (in my case via Spack, but using CUDA and the CUDA-aware OpenMPI from the system).
In the meantime I did a fresh install of PIConGPU, dev version, using CUDA 11.2 and OpenMPI 4.1.1 (from the system), but the latter gives some compilation errors, shown in the attached text file. You may want to investigate this incompatibility as well.
Thank you.
Hello @cbontoiu. The issue is that your build uses two MPI libraries, and that causes a conflict. One is your system OpenMPI; the other is the MPI from Anaconda, pulled in by ADIOS in Anaconda. I think it is reasonable to approach this gradually: first disable ADIOS and try without it.
Once that works, you could either rebuild ADIOS or point it to your system OpenMPI and thus avoid the conflict.
@sbastrakov Thank you. Indeed there was a clash between conda and Spack, and I managed to compile by disabling the lines written by Anaconda in my bashrc file; I don't know another way. I also managed to run the LWFA model, and I am surprised how slow it was. This model used to complete in 1 min 10 s on this machine before the openPMD plugin was included, but now, out of the box, a run with the 1.cfg file took 3 min 25 s.
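For reference, commenting out the initialization block that `conda init` writes to `~/.bashrc` is one way to keep conda's MPI/ADIOS out of the build environment; a less invasive alternative is to keep the block but turn off auto-activation (a sketch; the exact block in your bashrc may look slightly different):

```shell
# In ~/.bashrc, conda's initialization is delimited by marker comments;
# commenting out everything between them disables conda in new shells:
#   # >>> conda initialize >>>
#   ...   <- comment out these lines
#   # <<< conda initialize <<<

# Alternatively, keep the block but stop conda from activating its base
# environment automatically in every shell:
conda config --set auto_activate_base false
```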
PIConGPU: 0.6.0-dev
Build-Type: Release
Third party:
OS: Linux-5.8.0-50-generic
arch: x86_64
CXX: GNU (7.5.0)
CMake: 3.20.1
CUDA: 11.2.67
mallocMC: 2.6.0
Boost: 1.70.0
MPI:
standard: 3.1
flavor: OpenMPI (4.1.0)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: NOTFOUND
openPMD: 0.13.3
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 10sec 833msec = 10 sec
0 % = 0 | time elapsed: 9sec 150msec | avg time per step: 0msec
4 % = 102 | time elapsed: 18sec 46msec | avg time per step: 15msec
9 % = 204 | time elapsed: 26sec 977msec | avg time per step: 15msec
14 % = 306 | time elapsed: 35sec 980msec | avg time per step: 15msec
19 % = 408 | time elapsed: 45sec 6msec | avg time per step: 16msec
24 % = 510 | time elapsed: 54sec 121msec | avg time per step: 16msec
29 % = 612 | time elapsed: 1min 3sec 237msec | avg time per step: 17msec
34 % = 714 | time elapsed: 1min 12sec 362msec | avg time per step: 17msec
39 % = 816 | time elapsed: 1min 21sec 528msec | avg time per step: 17msec
44 % = 918 | time elapsed: 1min 30sec 715msec | avg time per step: 18msec
49 % = 1020 | time elapsed: 1min 40sec 102msec | avg time per step: 19msec
54 % = 1122 | time elapsed: 1min 49sec 440msec | avg time per step: 19msec
59 % = 1224 | time elapsed: 1min 58sec 815msec | avg time per step: 19msec
64 % = 1326 | time elapsed: 2min 8sec 243msec | avg time per step: 20msec
69 % = 1428 | time elapsed: 2min 17sec 718msec | avg time per step: 20msec
74 % = 1530 | time elapsed: 2min 27sec 140msec | avg time per step: 21msec
79 % = 1632 | time elapsed: 2min 36sec 562msec | avg time per step: 21msec
84 % = 1734 | time elapsed: 2min 45sec 877msec | avg time per step: 20msec
89 % = 1836 | time elapsed: 2min 55sec 71msec | avg time per step: 19msec
94 % = 1938 | time elapsed: 3min 4sec 289msec | avg time per step: 19msec
99 % = 2040 | time elapsed: 3min 13sec 423msec | avg time per step: 20msec
calculation simulation time: 3min 13sec 582msec = 193 sec
full simulation time: 3min 25sec 8msec = 205 sec
```shell
source $HOME/src/spack/share/spack/setup-env.sh && spack load picongpu && spack load openpmd-api && export PIC_BACKEND="cuda:75" && export OMPI_MCA_io=^ompio
cd /home/quasar/PIC_INPUT/PICONGPU/TESTS/myLWFA
rm -r .build/ && pic-build &> log_out.txt
tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/quasar/RawDataDisk/PICONGPU/TESTS/myLaserWakefield
```
If you want to see how the output affects run time, you could disable it in your .cfg file, or change the output period. Look for the TBG_openPMD variable there.
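For illustration, the relevant line in etc/picongpu/1.cfg might look like the following (the period values here are made up; check the flags actually present in your file):

```shell
# Write openPMD output less often, e.g. every 500 steps instead of every 100:
TBG_openPMD="--openPMD.period 500 --openPMD.file simData"

# Or disable openPMD output entirely for a timing comparison:
# TBG_openPMD=""
```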
Indeed, 1 min 25 s can be saved when running without the creation of checkpoints.
Do you keep a changelog since version 0.5.0, so that we could get an idea of where development is heading and maybe suggest features?
Yes, here. Feature suggestions and external contributions are welcome. Ideally, please create a new issue per suggestion.
We are working on a fix for CUDA 11.3: https://github.com/alpaka-group/alpaka/pull/1295
This will take some time; once alpaka and cupla support CUDA 11.3, PIConGPU will support it too.