I am trying to check my understanding of the LWFA model.
The speciesDefinition.param contains #define PARAM_IONS 0, which means that ions (protons here) are not included in using VectorAllSpecies = MakeSeq_t< ... >;. Does this mean that no data is stored for the ions, but they still matter in the simulation? I guess so. On the other hand, including them by setting #define PARAM_IONS 1 extends the initialization time from 4 minutes to 40 minutes in my case. This happens although the cfg file does not ask for data to be written out, i.e. TBG_plugins="!TBG_pngYX". Is this normal behaviour? Is this efficient?
Related to data storing:
What happens with the png figures written out? What would namespace preParticleDensCol = colorScales::red; mean in each case?
The speciesDefinition.param contains the block
, boundElectrons
which signals a simulation in which the Hydrogen atoms enter in a neutral state and can be ionized during the interaction with the laser. I guess for Carbon atoms this is the way to go. Still, I cannot see where PARAM_IONIZATION is defined in the original model.
Could these #if () ... #endif blocks be used in the starter.param file? Otherwise, what is this file used for?
Thank you.
Hello @cbontoiu ,
So technically one can set values for such defines from both inside .param files and externally, via command-line options during compilation. To order this a little bit, our examples use the following naming scheme.
First, there are definitions starting with PARAM_; they are supposed to be provided externally. We normally do not require them to be provided, so we check whether the variable is defined and set a default otherwise. For example, here we check if the "input" variable PARAM_IONS is provided and, if not, set it to 0. To set such variables, the easiest way is to modify the cmakeFlags file at the root level of each input directory, e.g. this file for the standard LWFA example (of course, your copy of it after doing pic-create). By default, only flags[0] is used for building. So e.g. to enable ions and ionization in that simulation, you can change it to
flags[0]="-DPARAM_OVERWRITES:LIST='-DPARAM_IONS=1;-DPARAM_IONIZATION=1'" (-D is the compiler's command-line option for defining a macro).
In the .param files we use those PARAM_... variables directly, and sometimes derive other macro definitions from them. In this case, the derived names are also in all capitals, but they are not supposed to be provided directly by a user, as the PARAM_ ones are.
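The naming scheme described above can be sketched in plain C++ preprocessor code. This is only an illustration, not the content of any actual PIConGPU file: PARAM_IONS and PARAM_IONIZATION are the names from this thread, while ENABLE_ION_SPECIES and ENABLE_IONIZER are hypothetical derived names invented for this example, and coupling the ionizer to the ion species is an assumption of the sketch.

```cpp
// Externally provided switches: keep the value passed on the compiler
// command line (e.g. -DPARAM_IONS=1), otherwise fall back to a default.
#ifndef PARAM_IONS
#    define PARAM_IONS 0
#endif
#ifndef PARAM_IONIZATION
#    define PARAM_IONIZATION 0
#endif

// Derived macros (hypothetical names): all capitals as well, but they are
// computed from the PARAM_ inputs and not meant to be set by the user.
#define ENABLE_ION_SPECIES (PARAM_IONS == 1)
// Assumption of this sketch: an ionizer is only enabled together with ions.
#define ENABLE_IONIZER (PARAM_IONS == 1 && PARAM_IONIZATION == 1)

// Expose the result as typed constants for the rest of the code.
constexpr bool hasIons = ENABLE_ION_SPECIES;
constexpr bool hasIonizer = ENABLE_IONIZER;
```

Compiling this without any -D option gives the defaults (both switches off); passing -DPARAM_IONS=1 -DPARAM_IONIZATION=1 turns both constants on.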
Regarding much larger initialization time with ions, I am not sure what's the reason.
@n01r could you comment, maybe initialization with boundElectrons already involves some ionization calculations?
You can use #if () ... # endif in all .param files since they are just C++ files. However, I would imagine one needs a really good reason to tinker with starter.param, and most things are much easier accomplished without doing it.
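As a rough sketch of how such an #if block can switch a species list on and off, the following stand-alone example uses std::tuple as a simple type list in place of MakeSeq_t. The types Electrons and Ions and the alias AllSpecies are hypothetical stand-ins, not the real PIConGPU definitions.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Default for the externally provided switch (overridable with -DPARAM_IONS=1).
#ifndef PARAM_IONS
#    define PARAM_IONS 0
#endif

// Hypothetical placeholder species; real species carry many more attributes.
struct Electrons {};
struct Ions {};

// The preprocessor decides at compile time which species exist at all.
// std::tuple only plays the role of a type list here.
#if (PARAM_IONS == 1)
using AllSpecies = std::tuple<Electrons, Ions>;
#else
using AllSpecies = std::tuple<Electrons>;
#endif

// Number of species that would be allocated, initialized and pushed.
constexpr std::size_t numSpecies = std::tuple_size<AllSpecies>::value;
```

With the default PARAM_IONS 0 the list contains only the electrons, which matches the case discussed in this thread where no ion data exists in the simulation.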
@n01r could you comment, maybe initialization with boundElectrons already involves some ionization calculations?
Sure, no further ionization calculations are done on initialization.
The way the LWFA example was configured with its cmakeFlags is that in most cases, neither are ions created nor is ionization active. Only the flags[9] case activates both of these.
For most LWFA simulations, you do not need the ion motion or ionization. Still, if you care about a realistic charge in the accelerated electron bunch, you have to consider it. But @PrometheusPi knows more about this.
In general, our cmakeFlags give you the ability to do parameter scans quickly and to implement different configurations that you can switch on or off.
I would expect a higher initialization time for a case with ions but not an increase by a factor of 10.
@cbontoiu Your assumption is right. By default, the initialization only creates electrons. Since charge neutrality is initially assumed, we thus indirectly assume that the electron charge is compensated by an immobile ion background. This is a surprisingly good assumption for most LWFA cases. However, as you correctly pointed out, you could also initialize the ions. This will require more memory and some more copying. However, the ions are derived from the electrons, so this should be quicker than the preceding electron initialization. Thus both ions and electrons should be initialized in less than 2x4=8 minutes in your case. Therefore, I find the 10-fold increase in initialization time a bit alarming. Even with IO included (which, as you said, is not running), the increase should be linear and thus "only" double the time. The only plugin that causes a severe compile time increase with more species is ISAAC. Did you compile the LWFA example together with ISAAC?
@PrometheusPi and all, thanks for your quick reply. I didn't compile with ISAAC. In fact I haven't set up the server for ISAAC, though ISAAC and its dependencies were installed via Spack together with picongpu@develop. I felt something weird was going on when my usual model took a lot of time to initialize. Then I thought I was doing something wrong, so I started from scratch with the LWFA model. I first ran with #define PARAM_IONS 0: all good, quick initialization. Then I only changed to #define PARAM_IONS 1 and it is much slower (factor of 10). You can easily check, I guess, but I can also upload my model if necessary.
Here are the two outputs:
CONGPU/TESTS/myLaserWakefield_run_04
WARNING: 4 input file(s) in include/
have been modified since the last compile!
Did you forget to recompile?
Run 'pic-build -f' to recompile with the modified files.
List of modified files:
/home/cristian/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/include/picongpu/param/speciesInitialization.param
/home/cristian/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/include/picongpu/param/species.param
/home/cristian/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/include/picongpu/param/speciesDefinition.param
/home/cristian/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/include/picongpu/param/starter.param
Running program...
using default compiler
==> Error: Spec 'picongpu@develop%[email protected]+adios+hdf5+isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac@develop%[email protected]+cuda~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac-server@develop%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+shared build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~python arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] 
arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~pic arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+pic arch=linux-linuxmint19-westmere' matches no installed packages.
PIConGPU: 0.5.0-dev
Build-Type: Release
Third party:
OS: Linux-5.0.0-32-generic
arch: x86_64
CXX: GNU (7.5.0)
CMake: 3.18.4
CUDA: 10.2.89
mallocMC: 2.5.0
Boost: 1.70.0
MPI:
standard: 3.1
flavor: OpenMPI (3.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
openPMD: NOTFOUND
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 3min 43sec 605msec = 223 sec
0 % = 0 | time elapsed: 41msec | avg time per step: 0msec
4 % = 102 | time elapsed: 2sec 459msec | avg time per step: 23msec
9 % = 204 | time elapsed: 4sec 894msec | avg time per step: 23msec
14 % = 306 | time elapsed: 7sec 389msec | avg time per step: 24msec
19 % = 408 | time elapsed: 9sec 952msec | avg time per step: 24msec
24 % = 510 | time elapsed: 12sec 580msec | avg time per step: 25msec
29 % = 612 | time elapsed: 15sec 259msec | avg time per step: 25msec
34 % = 714 | time elapsed: 17sec 997msec | avg time per step: 26msec
39 % = 816 | time elapsed: 20sec 791msec | avg time per step: 27msec
44 % = 918 | time elapsed: 23sec 651msec | avg time per step: 27msec
49 % = 1020 | time elapsed: 26sec 577msec | avg time per step: 28msec
54 % = 1122 | time elapsed: 29sec 551msec | avg time per step: 28msec
59 % = 1224 | time elapsed: 32sec 585msec | avg time per step: 29msec
64 % = 1326 | time elapsed: 35sec 665msec | avg time per step: 29msec
69 % = 1428 | time elapsed: 38sec 780msec | avg time per step: 30msec
74 % = 1530 | time elapsed: 41sec 908msec | avg time per step: 30msec
79 % = 1632 | time elapsed: 45sec 39msec | avg time per step: 30msec
84 % = 1734 | time elapsed: 48sec 79msec | avg time per step: 29msec
89 % = 1836 | time elapsed: 51sec 95msec | avg time per step: 29msec
94 % = 1938 | time elapsed: 54sec 115msec | avg time per step: 29msec
99 % = 2040 | time elapsed: 57sec 125msec | avg time per step: 29msec
calculation simulation time: 57sec 363msec = 57 sec
full simulation time: 4min 41sec 190msec = 281 sec
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ rm -r .build && pic-build &> out.txt && tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/cristian/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_01
Running program...
using default compiler
==> Error: Spec 'picongpu@develop%[email protected]+adios+hdf5+isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac@develop%[email protected]+cuda~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac-server@develop%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+shared build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~python arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] 
arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~pic arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+pic arch=linux-linuxmint19-westmere' matches no installed packages.
PIConGPU: 0.5.0-dev
Build-Type: Release
Third party:
OS: Linux-5.0.0-32-generic
arch: x86_64
CXX: GNU (7.5.0)
CMake: 3.18.4
CUDA: 10.2.89
mallocMC: 2.5.0
Boost: 1.70.0
MPI:
standard: 3.1
flavor: OpenMPI (3.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
openPMD: NOTFOUND
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 43min 23sec 89msec = 2603 sec
0 % = 0 | time elapsed: 44msec | avg time per step: 0msec
4 % = 102 | time elapsed: 2sec 781msec | avg time per step: 26msec
9 % = 204 | time elapsed: 5sec 539msec | avg time per step: 26msec
14 % = 306 | time elapsed: 8sec 381msec | avg time per step: 27msec
19 % = 408 | time elapsed: 11sec 330msec | avg time per step: 28msec
24 % = 510 | time elapsed: 14sec 369msec | avg time per step: 29msec
29 % = 612 | time elapsed: 17sec 490msec | avg time per step: 30msec
34 % = 714 | time elapsed: 20sec 694msec | avg time per step: 31msec
39 % = 816 | time elapsed: 23sec 981msec | avg time per step: 31msec
44 % = 918 | time elapsed: 27sec 355msec | avg time per step: 32msec
49 % = 1020 | time elapsed: 30sec 845msec | avg time per step: 33msec
54 % = 1122 | time elapsed: 34sec 408msec | avg time per step: 34msec
59 % = 1224 | time elapsed: 38sec 38msec | avg time per step: 35msec
64 % = 1326 | time elapsed: 41sec 743msec | avg time per step: 35msec
69 % = 1428 | time elapsed: 45sec 484msec | avg time per step: 36msec
74 % = 1530 | time elapsed: 49sec 226msec | avg time per step: 36msec
79 % = 1632 | time elapsed: 52sec 977msec | avg time per step: 36msec
84 % = 1734 | time elapsed: 56sec 655msec | avg time per step: 35msec
89 % = 1836 | time elapsed: 1min 0sec 308msec | avg time per step: 35msec
94 % = 1938 | time elapsed: 1min 3sec 960msec | avg time per step: 35msec
99 % = 2040 | time elapsed: 1min 7sec 620msec | avg time per step: 35msec
calculation simulation time: 1min 7sec 909msec = 67 sec
full simulation time: 44min 31sec 263msec = 2671 sec
[formatted by psychocoderHPC]
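To put the two logs above side by side, here is a small sketch that only redoes the arithmetic on the numbers printed in them. The proton-to-electron mass ratio is not in the logs; it is an assumed literature value, and the plasma-frequency check further assumes that species i consists of protons with the same density and charge magnitude as the electrons.

```cpp
// Values copied from the two logs (first run: PARAM_IONS 0, second: PARAM_IONS 1).
constexpr double initNoIonsSec = 223.0;  // initialization time without ions
constexpr double initIonsSec = 2603.0;   // initialization time with ions
constexpr double calcNoIonsSec = 57.0;   // calculation time without ions
constexpr double calcIonsSec = 67.0;     // calculation time with ions
constexpr long particlesNoIons = 4718592; // macro particles per device
constexpr long particlesIons = 9437184;

// The particle count doubles, the step loop slows down mildly,
// but the initialization time grows by more than an order of magnitude.
constexpr double particleRatio = static_cast<double>(particlesIons) / particlesNoIons; // exactly 2
constexpr double calcRatio = calcIonsSec / calcNoIonsSec; // about 1.18
constexpr double initRatio = initIonsSec / initNoIonsSec; // about 11.7

// Consistency check on the "omega_p * dt" lines: for equal density and
// charge magnitude, omega_i / omega_e = sqrt(m_e / m_i).
constexpr double omegaDtElectrons = 0.0247974;
constexpr double omegaDtIons = 0.000578698;
constexpr double protonElectronMassRatio = 1836.15267; // assumed literature value
constexpr double omegaRatio = omegaDtIons / omegaDtElectrons;
// (omega_i / omega_e)^2 * (m_i / m_e) should be close to 1 for protons.
constexpr double massCheck = omegaRatio * omegaRatio * protonElectronMassRatio;
```

The doubled particle count and the modest change in calculation time are in line with what one would expect from adding one species; only the initialization time stands out.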
And there is also this problem with the openPMD plugin: unrecognised option '--openPMD.period'
though Spack contains the openPMD API

Regarding openPMD API not found, to check it please use .build/picongpu -v to see if it was used during compilation time, or .build/picongpu -h to see if it is in the list of options. The fact that the directory exists is not a guarantee it was found and linked with.
@cbontoiu Regarding your following answer:
In fact I haven't set up the server for ISAAC, though ISAAC and its dependencies were installed via Spack together with picongpu@develop
Does this mean that ISAAC was added as a library? This would be enough to increase compile time.
But as I reread your question, I realized that you were having issues with initialization time - not compile time. My fault.
Init time clearly looks like an IO problem. Could you please list all files in simOutput via find . and also provide the output of picongpu -h?
simOutput contains only some images and the usual log file, which is attached. As for the picongpu -h command, this does not work for me. I tried:
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ source $HOME/src/spack/share/spack/setup-env.sh && spack load picongpu +adios %[email protected] && export PIC_BACKEND="cuda:72" && export OMPI_MCA_io=^ompio
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ picongpu -h
picongpu: command not found
but this query shows the version:
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ spack info picongpu
Package: picongpu
Description:
PIConGPU: A particle-in-cell code for GPGPUs
Homepage: https://github.com/ComputationalRadiationPhysics/picongpu
Maintainers: @ax3l
Tags:
None
Preferred version:
0.5.0 https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.5.0.tar.gz
Safe versions:
develop [git] https://github.com/ComputationalRadiationPhysics/picongpu.git on branch dev
0.5.0 https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.5.0.tar.gz
0.4.3 https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.3.tar.gz
0.4.2 https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.2.tar.gz
0.4.1 https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.1.tar.gz
0.4.0-rc4 https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.0-rc4.tar.gz
0.4.0-rc3 https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.0-rc3.tar.gz
0.4.0-rc2 https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.0-rc2.tar.gz
0.4.0 https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.0.tar.gz
local [git] file:///home/cristian/src/picongpu
gtc18 [git] https://github.com/ax3l/picongpu.git on branch topic-NGCandGTC18
foilISAAC [git] https://github.com/ax3l/picongpu.git on branch topic-20171114-foilISAAC
Variants:
Name [Default] Allowed values Description
============== ============== ======================================
adios [off] on, off Enable the ADIOS plugin
backend [cuda] cuda, omp2b Control the computing backend
cudacxx [nvcc] nvcc, clang Device compiler for the CUDA backend
hdf5 [on] on, off Enable multiple plugins requiring HDF5
isaac [off] on, off Enable the ISAAC plugin
png [on] on, off Enable the PNG plugin
Installation Phases:
install
Build Dependencies:
adios boost cmake cuda isaac libsplash pngwriter zlib
Link Dependencies:
adios boost cuda isaac libsplash mpi pngwriter zlib
Run Dependencies:
cmake isaac-server mpi rsync util-linux
Virtual Packages:
None
Just picongpu does not work, because it is not in that directory, but in .build inside it. Hence .build/picongpu -v. The spack output does not give full information there, as e.g. there could be a package that does not match for some reason, etc., while the build of PIConGPU knows for sure what was found and what wasn't.
EDIT: this somewhat duplicates @sbastrakov's answer.
Okay - in your simulation directory, there is a tbg directory. It contains the file submit.start. In that file you should find a comment # Run PIConGPU - one line below, you will find the full path to picongpu. Could you please use that path to get the help of picongpu.
I first moved into the input .build folder, but neither of the two commands works.

Then I went to the tbg folder in the output folder as instructed, but the file submit.start does not contain the comment # Run PIConGPU. I will attach the whole output folder here.
Thank you for looking into this.
@cbontoiu not needed - you are already in the right directory - you just need to type in ./picongpu instead of picongpu.
@PrometheusPi actually I cannot upload more than 10 MB here. OK, so here is the output
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/.build$ ./picongpu -h
Usage picongpu [-d dx=1 dy=1 dz=1] -g width height depth [options]
:
-h [ --help ] print help message and exit
--validate validate command line parameters and
exit
-v [ --version ] print version information and exit
-c [ --config ] arg Config file(s)
PIConGPU:
-s [ --steps ] arg Simulation steps
--checkpoint.restart.loop arg (=0) Number of times to restart the
simulation after simulation has
finished (for presentations). Note:
does not yet work with all plugins, see
issue #1305
-p [ --percent ] arg (=5) Print time statistics after p percent
to stdout
--checkpoint.restart Restart simulation
--checkpoint.restart.directory arg (=checkpoints)
Directory containing checkpoints for a
restart
--checkpoint.restart.step arg Checkpoint step to restart from
--checkpoint.period arg Period for checkpoint creation
--checkpoint.directory arg (=checkpoints)
Directory for checkpoints
--author arg The author that runs the simulation and
is responsible for created output files
--mpiDirect use device direct for MPI communication
e.g. GPU direct
--versionOnce print version information once and
start
-d [ --devices ] arg number of devices in each dimension
-g [ --grid ] arg size of the simulation grid
--gridDist arg Regex to describe the static
distribution of the cells for each
device,default: equal distribution over
all devices
example:
-d 2 4 1
-g 128 192 12
--gridDist "64{2}" "64,32{2},64"
--periodic arg specifying whether the grid is periodic
(1) or not (0) in each dimension,
default: no periodic dimensions
-m [ --moving ] enable sliding/moving window
--windowMovePoint arg (=0.90000000000000002)
ratio of the global window size in y
which defines when to start sliding the
window. The window starts sliding at
the time required to pass the distance
of windowMovePoint * (global window
size in y) when moving with the speed
of light
--stopWindow arg (=-1) stops the window at stimulation step,
-1 means that window is never stopping
--autoAdjustGrid arg (=1) auto adjust the grid size if PIConGPU
conditions are not fulfilled
Initializers:
PluginController:
Checkpoint:
--checkpoint.backend arg Optional backend for checkpointing
[adios] default: adios
--checkpoint.file arg Optional checkpoint filename (prefix)
--checkpoint.restart.backend arg Optional backend for restarting [adios]
default: adios
--checkpoint.restart.file arg checkpoint restart filename (prefix)
--checkpoint.restart.chunkSize arg (=1000000)
Number of particles processed in one
kernel call during restart to prevent
frame count blowup
--checkpoint.adios.aggregators arg Number of aggregators [0 == number of
MPI processes] | default: 0
--checkpoint.adios.ost arg Number of OST | default: 1
--checkpoint.adios.disable-meta arg Disable online gather and write of a
global meta file, can be time consuming
(use `bpmeta` post-mortem) | default: 0
--checkpoint.adios.transport-params arg
additional transport parameters, see
ADIOS manual chapter 6.1.5, e.g.,
'random_offset=1;stripe_count=4' |
default:
--checkpoint.adios.compression arg ADIOS compression method, e.g., zlib
(see `adios_config -m` for help) |
default: none
EnergyFields: calculate the energy of the fields:
--fields_energy.period arg enable plugin [for each n-th step]
ADIOSWriter: dump simulation data with ADIOS:
--adios.period arg enable ADIOS IO [for each n-th step]
--adios.source arg data sources: [species_all, fields_all,
e_all, E, B, e_chargeDensity,
e_energyDensity, e_particleMomentumComp
onent] | default: species_all,
fields_all
--adios.file arg ADIOS output filename (prefix)
--adios.aggregators arg Number of aggregators [0 == number of
MPI processes] | default: 0
--adios.ost arg Number of OST | default: 1
--adios.disable-meta arg Disable online gather and write of a
global meta file, can be time consuming
(use `bpmeta` post-mortem) | default: 0
--adios.transport-params arg additional transport parameters, see
ADIOS manual chapter 6.1.5, e.g.,
'random_offset=1;stripe_count=4' |
default:
--adios.compression arg ADIOS compression method, e.g., zlib
(see `adios_config -m` for help) |
default: none
SumCurrents:
--sumcurr.period arg enable plugin [for each n-th step]
ChargeConservation: Print the maximum charge deviation between particles and div E to textfile 'chargeConservation.dat':
--chargeConservation.period arg enable plugin [for each n-th step]
IntensityPlugin: calculate the maximum and integrated E-Field energy
over laser propagation direction:
--E_intensity.period arg enable plugin [for each n-th step]
IsaacPlugin:
--isaac.period arg Enable IsaacPlugin [for each n-th
step].
--isaac.name arg (=default) The name of the simulation. Default is
"default".
--isaac.url arg (=localhost) The url of the isaac server to connect
to. Default is "localhost".
--isaac.port arg (=2460) The port of the isaac server to connect
to. Default is 2460.
--isaac.width arg (=1024) The width per isaac framebuffer.
Default is 1024.
--isaac.height arg (=768) The height per isaac framebuffer.
Default is 768.
--isaac.directPause arg (=0) Direct pausing after starting
simulation. Default is false.
--isaac.quality arg (=90) JPEG quality. Default is 90.
--isaac.reconnect arg (=1) Trying to reconnect every time an image
is rendered if the connection is lost
or could never established at all.
ResourceLog:
--resourceLog.period arg Enable ResourceLog plugin [for each
n-th step]
--resourceLog.prefix arg (=resourceLog_)
Set the filename prefix for output file
if a filestream was selected
--resourceLog.stream arg (=file) Output stream [stdout, stderr, file]
--resourceLog.properties arg List of properties to log [rank,
position, currentStep, cellCount,
particleCount]
--resourceLog.format arg (=json) Output format of log (pp for pretty
print) [json, jsonpp, xml, xmlpp]
SliceFieldPrinter: prints a slice of a field:
--B_slice.period arg notify period
--B_slice.fileName arg file name to store slices in
--B_slice.plane arg specifies the axis which stands on the
cutting plane (0,1,2)
--B_slice.slicePoint arg slice point 0.0 <= x <= 1.0
SliceFieldPrinter: prints a slice of a field:
--E_slice.period arg notify period
--E_slice.fileName arg file name to store slices in
--E_slice.plane arg specifies the axis which stands on the
cutting plane (0,1,2)
--E_slice.slicePoint arg slice point 0.0 <= x <= 1.0
SliceFieldPrinter: prints a slice of a field:
--J_slice.period arg notify period
--J_slice.fileName arg file name to store slices in
--J_slice.plane arg specifies the axis which stands on the
cutting plane (0,1,2)
--J_slice.slicePoint arg slice point 0.0 <= x <= 1.0
EnergyParticles: calculate the energy of a species:
--e_energy.period arg compute kinetic and total energy [for
each n-th step] enable plugin by
setting a non-zero value
--e_energy.filter arg particle filter: [all]
CalcEmittance: calculate the slice emittance of a species:
--e_emittance.period arg compute slice emittance[for each n-th
step] enable plugin by setting a
non-zero value
--e_emittance.filter arg particle filter: [all]
BinEnergyParticles: calculate a energy histogram of a species:
--e_energyHistogram.period arg enable plugin [for each n-th step]
--e_energyHistogram.filter arg particle filter: [all]
--e_energyHistogram.binCount arg number of bins for the energy range |
default: 1024
--e_energyHistogram.minEnergy arg minEnergy[in keV] | default: 0
--e_energyHistogram.maxEnergy arg maxEnergy[in keV]
CountParticles: count macro particles of a species:
--e_macroParticlesCount.period arg enable plugin [for each n-th step]
PngPlugin: create png's of a species and fields:
--e_png.period arg enable data output [for each n-th step]
--e_png.axis arg axis which are shown [valid values
x,y,z] example: yz
--e_png.slicePoint arg value range: 0 <= x <= 1 , point of the
slice
--e_png.folder arg folder for output files
ParticleCalorimeter: (virtually) propagates and collects particles to infinite distance:
--e_calorimeter.period arg enable plugin [for each n-th step]
--e_calorimeter.file arg output filename (prefix)
--e_calorimeter.filter arg particle filter: [all]
--e_calorimeter.numBinsYaw arg number of bins for angle yaw. |
default: 64
--e_calorimeter.numBinsPitch arg number of bins for angle pitch. |
default: 64
--e_calorimeter.numBinsEnergy arg number of bins for the energy spectrum.
Disabled by default. | default: 1
--e_calorimeter.minEnergy arg minimal detectable energy in keV. |
default: 0
--e_calorimeter.maxEnergy arg maximal detectable energy in keV. |
default: 1000
--e_calorimeter.logScale arg enable logarithmic energy scale. |
default: 0
--e_calorimeter.openingYaw arg opening angle yaw in degrees. 0 <= x <=
360. | default: 360
--e_calorimeter.openingPitch arg opening angle pitch in degrees. 0 <= x
<= 180. | default: 180
--e_calorimeter.posYaw arg yaw coordinate of calorimeter position
in degrees. Defaults to +y direction. |
default: 0
--e_calorimeter.posPitch arg pitch coordinate of calorimeter
position in degrees. Defaults to +y
direction. | default: 0
PhaseSpace: create phase space of a species:
--e_phaseSpace.period arg notify period
--e_phaseSpace.filter arg particle filter: [all]
--e_phaseSpace.space arg spatial component (x, y, z)
--e_phaseSpace.momentum arg momentum component (px, py, pz)
--e_phaseSpace.min arg min range momentum [m_species c]
--e_phaseSpace.max arg max range momentum [m_species c]
PositionsParticles: write position of one particle of a species to std::cout:
--e_position.period arg enable plugin [for each n-th step]
ParticleMerger: merges several macroparticles with similar position and momentum into a single one.
plugin disabled. Enable plugin by adding the `voronoiCellId` attribute to the particle attribute list.:
RandomizedParticleMerger: merges several macroparticles with similar position and momentum into a single one.
plugin disabled. Enable plugin by adding the `voronoiCellId` attribute to the particle attribute list.:
PerSuperCell: create hdf5 with macro particle count per superCell:
--e_macroParticlesPerSuperCell.period arg
enable plugin [for each n-th step]
and this one for the version:
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/.build$ ./picongpu -v
PIConGPU: 0.5.0-dev
Build-Type: Release
Third party:
OS: Linux-5.0.0-32-generic
arch: x86_64
CXX: GNU (7.5.0)
CMake: 3.18.4
CUDA: 10.2.89
mallocMC: 2.5.0
Boost: 1.70.0
MPI:
standard: 3.1
flavor: OpenMPI (3.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
openPMD: NOTFOUND
@cbontoiu Thanks for uploading the output. It looks like you built with ISAAC. As mentioned, this slows down the build process but not the initialization. Furthermore, the executable you used did not include any ions, so this is most likely the 4-minute initialization case. Could you please also provide the help output for the 40-minute initialization case? So far, I see no plugin that would cause a massive slowdown.
Also, according to this output the openPMD API was not found. Did you spack load it before pic-build?
PS: @cbontoiu Did you observe the increase in build/compile time from only electrons to electrons+ions?
Only for electrons + ions, and only at runtime. I cannot say exactly what happened at compilation, but it seemed as fast as usual (about 2-3 minutes of compilation time for each case).
> Also, from this output openPMD API was not found. Did you spack load it before pic-build ?
No, I didn't. This must be the reason. Thank you.
So a command like this would do the job.
source $HOME/src/spack/share/spack/setup-env.sh && spack load picongpu +adios +openpmd %[email protected] && export PIC_BACKEND="cuda:72" && export OMPI_MCA_io=^ompio
I am not an expert in spack, so I am not sure how best to do it. But I think either your option or a separate spack load openpmd should work.
Please let me know if this is an easy fix and I should wait for you to update the release. If it is something more involved, I could uninstall the @develop version and install version 0.4.3 instead.
Thank you
@cbontoiu I will try to reproduce the massive increase in initialization time. Could you please recap how you enabled the ions (via -D, by overwriting the value in the file, or some other way)?
Hello @PrometheusPi I only changed the parameter define PARAM_IONS from 0 to 1. This is defined in speciesDefinition.param
> spack load openpmd

It is actually spack load openpmd-api, run separately from loading picongpu. So spack load picongpu +adios +openpmd %[email protected] does not work.
But the issue with #if( PARAM_IONIZATION == 1 ) is still a mystery to me. C++ is a strongly typed language, like Java, which I understand better. In Java, if I used PARAM_IONIZATION without extending a class or implementing an interface where it is defined, I would get an error because the variable was not declared. It must be defined in one of the headers used in speciesDefinition.param; otherwise I cannot understand how this compiles.
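The behaviour asked about here comes from the C++ preprocessor, which runs before the type system is involved. The following is a minimal, generic sketch of that mechanism, not a copy of any PIConGPU file: the macro names PARAM_IONS and PARAM_IONIZATION are taken from this thread, while `ionsEnabled`, `ionizationEnabled` and `printConfiguration` are hypothetical names introduced only for illustration.

```cpp
#include <cstdio>

// Default pattern: define the macro only if it was not already supplied
// externally, e.g. via -DPARAM_IONS=1 on the compiler command line.
#ifndef PARAM_IONS
#   define PARAM_IONS 0
#endif

// PARAM_IONIZATION is deliberately NOT defined anywhere in this file.
// Inside an #if expression the preprocessor replaces every identifier
// that is not a defined macro with 0, so the test below is simply false.
// No "undeclared variable" error occurs, because macros are not variables.
#if (PARAM_IONIZATION == 1)
constexpr bool ionizationEnabled = true;
#else
constexpr bool ionizationEnabled = false;
#endif

#if (PARAM_IONS == 1)
constexpr bool ionsEnabled = true;
#else
constexpr bool ionsEnabled = false;
#endif

// Hypothetical helper that reports which branches were compiled in.
void printConfiguration()
{
    std::printf("ions: %d, ionization: %d\n",
                static_cast<int>(ionsEnabled),
                static_cast<int>(ionizationEnabled));
}
```

Compiling the same file with `-DPARAM_IONS=1` switches `ionsEnabled` to true without editing the source. GCC and Clang can report identifiers that are silently treated as 0 in an `#if` when the `-Wundef` warning is enabled.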
@cbontoiu Thanks for the input. I am currently compiling and testing your setup on a single K20 GPU.
@PrometheusPi
I installed version 0.4.3 on my other computer (similar in performance) and obtained different simulation times.
_electrons:_
initialization time: 22sec 257msec = 22 sec
full simulation time: 1min 11sec 204msec = 71 sec
_electrons + ions:_
initialization time: 32sec 91msec = 32 sec
full simulation time: 1min 29sec 866msec = 89 sec
Apart from the small increase in initialization time when using ions, please note the significant overall difference between versions: 0.5.0 or the @develop version requires more than 3 minutes for an initialization that version 0.4.3 completes in about 20 seconds. Maybe this helps to understand what is happening. The terminal output shown below is first for the default LWFA model (electrons only) and second for the modified LWFA model with define PARAM_IONS 1 (electrons + ions):
(base) quasar@quasar:~/PIC_INPUT/PICONGPU/TESTS/myLWFA$ tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/quasar/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_01
Running program...
using default compiler
==> Error: Spec '[email protected]%[email protected]+adios+hdf5~isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+shared arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-skylake ^[email protected]%[email 
protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~python arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email 
protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~pic arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+pic arch=linux-linuxmint19-skylake' matches no installed packages.
PIConGPU: 0.4.3
Build-Type: Release
Third party:
OS: Linux-5.0.0-32-generic
arch: x86_64
CXX: GNU (7.5.0)
CMake: 3.18.4
CUDA: 9.2.148
mallocMC: 2.3.1
Boost: 1.70.0
MPI:
standard: 3.1
flavor: OpenMPI (3.1.5)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: NOTFOUND
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 22sec 257msec = 22 sec
0 % = 0 | time elapsed: 66msec | avg time per step: 0msec
4 % = 102 | time elapsed: 1sec 940msec | avg time per step: 17msec
9 % = 204 | time elapsed: 3sec 824msec | avg time per step: 18msec
14 % = 306 | time elapsed: 5sec 729msec | avg time per step: 18msec
19 % = 408 | time elapsed: 7sec 748msec | avg time per step: 19msec
24 % = 510 | time elapsed: 9sec 973msec | avg time per step: 21msec
29 % = 612 | time elapsed: 12sec 135msec | avg time per step: 20msec
34 % = 714 | time elapsed: 14sec 307msec | avg time per step: 20msec
39 % = 816 | time elapsed: 16sec 563msec | avg time per step: 21msec
44 % = 918 | time elapsed: 18sec 905msec | avg time per step: 22msec
49 % = 1020 | time elapsed: 21sec 347msec | avg time per step: 23msec
54 % = 1122 | time elapsed: 23sec 810msec | avg time per step: 23msec
59 % = 1224 | time elapsed: 26sec 476msec | avg time per step: 25msec
64 % = 1326 | time elapsed: 29sec 213msec | avg time per step: 26msec
69 % = 1428 | time elapsed: 32sec 280msec | avg time per step: 29msec
74 % = 1530 | time elapsed: 35sec 211msec | avg time per step: 28msec
79 % = 1632 | time elapsed: 38sec 273msec | avg time per step: 29msec
84 % = 1734 | time elapsed: 40sec 888msec | avg time per step: 25msec
89 % = 1836 | time elapsed: 43sec 413msec | avg time per step: 24msec
94 % = 1938 | time elapsed: 45sec 949msec | avg time per step: 24msec
99 % = 2040 | time elapsed: 48sec 594msec | avg time per step: 25msec
calculation simulation time: 48sec 788msec = 48 sec
full simulation time: 1min 11sec 204msec = 71 sec
(base) quasar@quasar:~/PIC_INPUT/PICONGPU/TESTS/myLWFA$ rm -r .build/ && pic-build &> out.txt
(base) quasar@quasar:~/PIC_INPUT/PICONGPU/TESTS/myLWFA$ tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/quasar/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_02
Running program...
using default compiler
==> Error: Spec '[email protected]%[email protected]+adios+hdf5~isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+shared arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-skylake ^[email protected]%[email 
protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~python arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email 
protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~pic arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+pic arch=linux-linuxmint19-skylake' matches no installed packages.
PIConGPU: 0.4.3
Build-Type: Release
Third party:
OS: Linux-5.0.0-32-generic
arch: x86_64
CXX: GNU (7.5.0)
CMake: 3.18.4
CUDA: 9.2.148
mallocMC: 2.3.1
Boost: 1.70.0
MPI:
standard: 3.1
flavor: OpenMPI (3.1.5)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: NOTFOUND
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 32sec 91msec = 32 sec
0 % = 0 | time elapsed: 55msec | avg time per step: 0msec
4 % = 102 | time elapsed: 2sec 213msec | avg time per step: 20msec
9 % = 204 | time elapsed: 4sec 442msec | avg time per step: 21msec
14 % = 306 | time elapsed: 6sec 790msec | avg time per step: 22msec
19 % = 408 | time elapsed: 9sec 332msec | avg time per step: 24msec
24 % = 510 | time elapsed: 11sec 886msec | avg time per step: 24msec
29 % = 612 | time elapsed: 14sec 445msec | avg time per step: 24msec
34 % = 714 | time elapsed: 17sec 29msec | avg time per step: 24msec
39 % = 816 | time elapsed: 19sec 684msec | avg time per step: 25msec
44 % = 918 | time elapsed: 22sec 410msec | avg time per step: 26msec
49 % = 1020 | time elapsed: 25sec 238msec | avg time per step: 27msec
54 % = 1122 | time elapsed: 28sec 155msec | avg time per step: 28msec
59 % = 1224 | time elapsed: 31sec 229msec | avg time per step: 29msec
64 % = 1326 | time elapsed: 34sec 473msec | avg time per step: 31msec
69 % = 1428 | time elapsed: 37sec 843msec | avg time per step: 32msec
74 % = 1530 | time elapsed: 41sec 326msec | avg time per step: 33msec
79 % = 1632 | time elapsed: 44sec 845msec | avg time per step: 33msec
84 % = 1734 | time elapsed: 48sec 6msec | avg time per step: 30msec
89 % = 1836 | time elapsed: 51sec 51msec | avg time per step: 29msec
94 % = 1938 | time elapsed: 54sec 182msec | avg time per step: 29msec
99 % = 2040 | time elapsed: 57sec 386msec | avg time per step: 30msec
calculation simulation time: 57sec 627msec = 57 sec
full simulation time: 1min 29sec 866msec = 89 sec
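The cost of adding ions in the two 0.4.3 runs above can be read off the reported timings. The sketch below only redoes that arithmetic; the helper names `toMs`, `overheadPercent` and `printSummary` are hypothetical and not part of PIConGPU, and the numbers are copied from the two logs.

```cpp
#include <cstdint>
#include <cstdio>

// Convert a "Xmin Ysec Zmsec" triple, as printed in the logs, to milliseconds.
constexpr std::int64_t toMs(std::int64_t min, std::int64_t sec, std::int64_t msec)
{
    return (min * 60 + sec) * 1000 + msec;
}

// Relative increase of `other` over `base` in percent, rounded down.
constexpr std::int64_t overheadPercent(std::int64_t base, std::int64_t other)
{
    return (other - base) * 100 / base;
}

// Timings from the electrons-only run and the electrons + ions run.
constexpr std::int64_t initElectronsMs = toMs(0, 22, 257);
constexpr std::int64_t initWithIonsMs  = toMs(0, 32, 91);
constexpr std::int64_t calcElectronsMs = toMs(0, 48, 788);
constexpr std::int64_t calcWithIonsMs  = toMs(0, 57, 627);
constexpr std::int64_t fullElectronsMs = toMs(1, 11, 204);
constexpr std::int64_t fullWithIonsMs  = toMs(1, 29, 866);

// Macro particles per device reported in the two logs.
constexpr std::int64_t particlesElectrons = 4718592;
constexpr std::int64_t particlesWithIons  = 9437184;

// Print the relative overhead of each phase when ions are enabled.
void printSummary()
{
    std::printf("initialization: +%lld %%\n",
                static_cast<long long>(overheadPercent(initElectronsMs, initWithIonsMs)));
    std::printf("calculation:    +%lld %%\n",
                static_cast<long long>(overheadPercent(calcElectronsMs, calcWithIonsMs)));
    std::printf("full run:       +%lld %%\n",
                static_cast<long long>(overheadPercent(fullElectronsMs, fullWithIonsMs)));
}
```

With these inputs the macro particle count exactly doubles, while initialization grows by roughly 44 %, the calculation phase by roughly 18 % and the full run by roughly 26 %. That is far from the tenfold increase reported for the newer version.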
@cbontoiu I am still waiting for the jobs to start on our cluster. Currently, most GPUs are occupied by other jobs. I will keep you posted once the jobs have finished and I know whether I can reproduce the extended initialization time.
Just a quick question. Does spack install [email protected] %[email protected] install a different version than spack install picongpu@develop %[email protected]? I couldn't work it out from looking at package.py.
They should be different. 0.5.0 is the last released version, so it does not change, whereas develop is the current branch to which we add changes.
> I installed version 0.4.3 on my other computer (similar in performance) and obtained different simulation times.

Could you try 0.5.0 or the current dev branch on this system? How much main memory does the system that shows the slowdown with adios have, and how much memory does the system you used for 0.4.3 have?
[Update: I think I mixed up this issue and the adios issue, so I removed my question.]
I also tested the released version 0.5.0, and it shows a similarly long initialization time as the development version, as you can check below. Thanks for looking into this. Meanwhile I will use version 0.4.3, which does not show the problem. It would be great if you could find the problem so that we could continue with 0.5.0 and make use of its PML.
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/cristian/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_run_01
Running program...
using default compiler
==> Error: Spec '[email protected]%[email protected]+adios+hdf5+isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac@develop%[email protected]+cuda~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac-server@develop%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+shared build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~python arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] 
arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~pic arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+pic arch=linux-linuxmint19-westmere' matches no installed packages.
PIConGPU: 0.5.0
Build-Type: Release
Third party:
OS: Linux-5.0.0-32-generic
arch: x86_64
CXX: GNU (7.5.0)
CMake: 3.18.4
CUDA: 10.2.89
mallocMC: 2.3.1
Boost: 1.70.0
MPI:
standard: 3.1
flavor: OpenMPI (3.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 4min 21sec 158msec = 261 sec
0 % = 0 | time elapsed: 43msec | avg time per step: 0msec
4 % = 102 | time elapsed: 2sec 541msec | avg time per step: 24msec
9 % = 204 | time elapsed: 5sec 50msec | avg time per step: 24msec
14 % = 306 | time elapsed: 7sec 618msec | avg time per step: 24msec
19 % = 408 | time elapsed: 10sec 321msec | avg time per step: 26msec
24 % = 510 | time elapsed: 13sec 135msec | avg time per step: 27msec
29 % = 612 | time elapsed: 15sec 975msec | avg time per step: 27msec
34 % = 714 | time elapsed: 18sec 860msec | avg time per step: 27msec
39 % = 816 | time elapsed: 21sec 791msec | avg time per step: 28msec
44 % = 918 | time elapsed: 24sec 777msec | avg time per step: 28msec
49 % = 1020 | time elapsed: 27sec 851msec | avg time per step: 29msec
54 % = 1122 | time elapsed: 30sec 995msec | avg time per step: 30msec
59 % = 1224 | time elapsed: 34sec 255msec | avg time per step: 31msec
64 % = 1326 | time elapsed: 37sec 681msec | avg time per step: 33msec
69 % = 1428 | time elapsed: 41sec 240msec | avg time per step: 34msec
74 % = 1530 | time elapsed: 44sec 913msec | avg time per step: 35msec
79 % = 1632 | time elapsed: 48sec 605msec | avg time per step: 35msec
84 % = 1734 | time elapsed: 51sec 940msec | avg time per step: 32msec
89 % = 1836 | time elapsed: 55sec 143msec | avg time per step: 31msec
94 % = 1938 | time elapsed: 58sec 407msec | avg time per step: 31msec
99 % = 2040 | time elapsed: 1min 1sec 752msec | avg time per step: 32msec
calculation simulation time: 1min 2sec 3msec = 62 sec
full simulation time: 5min 23sec 382msec = 323 sec
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ rm -r .build && pic-build &> out.txt
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/cristian/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_run_02
Running program...
using default compiler
==> Error: Spec '[...]' matches no installed packages.
PIConGPU: 0.5.0
Build-Type: Release
Third party:
OS: Linux-5.0.0-32-generic
arch: x86_64
CXX: GNU (7.5.0)
CMake: 3.18.4
CUDA: 10.2.89
mallocMC: 2.3.1
Boost: 1.70.0
MPI:
standard: 3.1
flavor: OpenMPI (3.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 52min 31sec 8msec = 3151 sec
0 % = 0 | time elapsed: 44msec | avg time per step: 0msec
4 % = 102 | time elapsed: 3sec 6msec | avg time per step: 28msec
9 % = 204 | time elapsed: 5sec 979msec | avg time per step: 28msec
14 % = 306 | time elapsed: 9sec 14msec | avg time per step: 29msec
19 % = 408 | time elapsed: 12sec 208msec | avg time per step: 30msec
24 % = 510 | time elapsed: 15sec 530msec | avg time per step: 32msec
29 % = 612 | time elapsed: 18sec 899msec | avg time per step: 32msec
34 % = 714 | time elapsed: 22sec 340msec | avg time per step: 33msec
39 % = 816 | time elapsed: 25sec 827msec | avg time per step: 33msec
44 % = 918 | time elapsed: 29sec 401msec | avg time per step: 34msec
49 % = 1020 | time elapsed: 33sec 99msec | avg time per step: 35msec
54 % = 1122 | time elapsed: 36sec 878msec | avg time per step: 36msec
59 % = 1224 | time elapsed: 40sec 805msec | avg time per step: 38msec
64 % = 1326 | time elapsed: 44sec 920msec | avg time per step: 39msec
69 % = 1428 | time elapsed: 49sec 162msec | avg time per step: 41msec
74 % = 1530 | time elapsed: 53sec 517msec | avg time per step: 42msec
79 % = 1632 | time elapsed: 57sec 908msec | avg time per step: 42msec
84 % = 1734 | time elapsed: 1min 1sec 959msec | avg time per step: 39msec
89 % = 1836 | time elapsed: 1min 5sec 897msec | avg time per step: 38msec
94 % = 1938 | time elapsed: 1min 9sec 884msec | avg time per step: 38msec
99 % = 2040 | time elapsed: 1min 13sec 980msec | avg time per step: 39msec
calculation simulation time: 1min 14sec 292msec = 74 sec
full simulation time: 53min 45sec 562msec = 3225 sec
@cbontoiu How much main memory does your system have?
What kind of GPU do you have?
What is the difference between the last two runs you posted?
It is not possible to see it from the output.
Hello @psychocoderHPC and thanks for your availability. The last two runs for the 0.5.0 version show:
LWFA with electrons only: initialization time: 4min 21sec 158msec = 261 sec
LWFA with electrons + ions: initialization time: 52min 31sec 8msec = 3151 sec
I am running on two different computers: one with an RTX2070 Super and 32 GB of RAM, and the other with two RTX5000 cards and 64 GB of RAM. When ions are enabled, both computers show long initialization times for the develop and 0.5.0 versions, and only a tiny increase (relative to the electrons-only setup) for the 0.4.3 version.
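For comparing runs like these, the init time can be pulled straight out of a saved stdout file. The following is a minimal sketch, assuming the program output was redirected to a file; the function name `init_seconds` and the file names in the usage comment are hypothetical, and the pattern relies only on the `initialization time: ... = N sec` line format visible in the logs above.

```shell
# init_seconds FILE: print the initialization time in seconds, taken from
# a line of the form "initialization time: 4min 21sec 158msec = 261 sec".
init_seconds() {
    # The second-to-last field of that line is the total number of seconds.
    awk '/initialization time:/ { print $(NF - 1) }' "$1"
}

# Hypothetical usage, one saved stdout file per run:
#   init_seconds electrons_only.txt
#   init_seconds electrons_ions.txt
```

The "calculation simulation time" and "full simulation time" lines are not matched, so only the init value is printed.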
@cbontoiu I switched to a different cluster partition to test your issue on 1 K80 GPU. I could not reproduce the issue. Initialization times are pretty much equal (19 seconds) for both electrons only and electrons + ions.
stdout for electrons and ions:
Running program...
PIConGPU: 0.5.0
Build-Type: Release
Third party:
OS: Linux-3.10.0-693.17.1.el7.x86_64
arch: x86_64
CXX: GNU (7.3.0)
CMake: 3.15.2
CUDA: 10.0.130
mallocMC: 2.3.1
Boost: 1.68.0
MPI:
standard: 3.1
flavor: OpenMPI (2.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 19sec 137msec = 19 sec
0 % = 0 | time elapsed: 555msec | avg time per step: 0msec
4 % = 102 | time elapsed: 4sec 952msec | avg time per step: 42msec
9 % = 204 | time elapsed: 9sec 490msec | avg time per step: 44msec
14 % = 306 | time elapsed: 14sec 356msec | avg time per step: 47msec
19 % = 408 | time elapsed: 19sec 727msec | avg time per step: 52msec
24 % = 510 | time elapsed: 25sec 715msec | avg time per step: 56msec
29 % = 612 | time elapsed: 31sec 874msec | avg time per step: 60msec
34 % = 714 | time elapsed: 38sec 404msec | avg time per step: 63msec
39 % = 816 | time elapsed: 45sec 343msec | avg time per step: 67msec
44 % = 918 | time elapsed: 52sec 628msec | avg time per step: 71msec
49 % = 1020 | time elapsed: 1min 0sec 364msec | avg time per step: 75msec
54 % = 1122 | time elapsed: 1min 8sec 517msec | avg time per step: 79msec
59 % = 1224 | time elapsed: 1min 17sec 170msec | avg time per step: 84msec
64 % = 1326 | time elapsed: 1min 26sec 471msec | avg time per step: 90msec
69 % = 1428 | time elapsed: 1min 36sec 249msec | avg time per step: 95msec
74 % = 1530 | time elapsed: 1min 46sec 394msec | avg time per step: 99msec
79 % = 1632 | time elapsed: 1min 56sec 737msec | avg time per step: 100msec
84 % = 1734 | time elapsed: 2min 6sec 45msec | avg time per step: 90msec
89 % = 1836 | time elapsed: 2min 14sec 870msec | avg time per step: 86msec
94 % = 1938 | time elapsed: 2min 23sec 832msec | avg time per step: 87msec
99 % = 2040 | time elapsed: 2min 33sec 107msec | avg time per step: 90msec
calculation simulation time: 2min 33sec 798msec = 153 sec
full simulation time: 2min 53sec 630msec = 173 sec
stdout for electrons only:
Running program...
PIConGPU: 0.5.0
Build-Type: Release
Third party:
OS: Linux-3.10.0-693.17.1.el7.x86_64
arch: x86_64
CXX: GNU (7.3.0)
CMake: 3.15.2
CUDA: 10.0.130
mallocMC: 2.3.1
Boost: 1.68.0
MPI:
standard: 3.1
flavor: OpenMPI (2.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 18sec 965msec = 18 sec
0 % = 0 | time elapsed: 60msec | avg time per step: 0msec
4 % = 102 | time elapsed: 2sec 963msec | avg time per step: 28msec
9 % = 204 | time elapsed: 5sec 964msec | avg time per step: 29msec
14 % = 306 | time elapsed: 9sec 210msec | avg time per step: 31msec
19 % = 408 | time elapsed: 12sec 846msec | avg time per step: 35msec
24 % = 510 | time elapsed: 16sec 850msec | avg time per step: 38msec
29 % = 612 | time elapsed: 21sec 40msec | avg time per step: 40msec
34 % = 714 | time elapsed: 25sec 439msec | avg time per step: 42msec
39 % = 816 | time elapsed: 30sec 88msec | avg time per step: 45msec
44 % = 918 | time elapsed: 34sec 969msec | avg time per step: 47msec
49 % = 1020 | time elapsed: 40sec 175msec | avg time per step: 50msec
54 % = 1122 | time elapsed: 45sec 616msec | avg time per step: 53msec
59 % = 1224 | time elapsed: 51sec 433msec | avg time per step: 56msec
64 % = 1326 | time elapsed: 57sec 831msec | avg time per step: 62msec
69 % = 1428 | time elapsed: 1min 4sec 655msec | avg time per step: 66msec
74 % = 1530 | time elapsed: 1min 11sec 826msec | avg time per step: 70msec
79 % = 1632 | time elapsed: 1min 19sec 122msec | avg time per step: 71msec
84 % = 1734 | time elapsed: 1min 25sec 403msec | avg time per step: 61msec
89 % = 1836 | time elapsed: 1min 31sec 184msec | avg time per step: 56msec
94 % = 1938 | time elapsed: 1min 37sec 90msec | avg time per step: 57msec
99 % = 2040 | time elapsed: 1min 43sec 236msec | avg time per step: 59msec
calculation simulation time: 1min 43sec 683msec = 103 sec
full simulation time: 2min 2sec 829msec = 122 sec
So I am confused by the massive difference you encounter. Thus I can only guess possible issues:
git diff and perform a clean pic-create?
@psychocoderHPC @sbastrakov Do you have any other ideas?
Hello all and thank you for looking into this.
There is still another possibility. It might happen that running through Spack gives me trouble. If you could check the behaviour using Spack on a local machine, we could judge. I used two different graphics cards (RTX2070 Super and RTX5000) on two different machines with two different CUDA versions (10 and 9) and with both develop and 0.5.0 versions, always using the LWFA model which came with the distribution.
From the past we know that running on the cluster through Spack and directly on the system using installed modules is different, so this is what makes me think that Spack plays a role here. The question is why it does so for versions 0.5.0 and develop but not for 0.4.3.
@cbontoiu I agree, and I can not judge how likely this is. I will add this idea to the list above.
Spack itself merely provides dependencies. The only way it influences things is if these dependencies get wrong, or very suboptimal, settings.
I tested with ISAAC enabled and CUDA 10.2, and I still get the same init times.
electrons and ions:
Running program...
PIConGPU: 0.5.0
Build-Type: Release
Third party:
OS: Linux-3.10.0-693.11.6.el7.x86_64
arch: x86_64
CXX: GNU (7.3.0)
CMake: 3.15.2
CUDA: 10.2.89
mallocMC: 2.3.1
Boost: 1.68.0
MPI:
standard: 3.1
flavor: OpenMPI (2.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 20sec 784msec = 20 sec
0 % = 0 | time elapsed: 67msec | avg time per step: 0msec
4 % = 102 | time elapsed: 4sec 470msec | avg time per step: 42msec
9 % = 204 | time elapsed: 8sec 995msec | avg time per step: 43msec
14 % = 306 | time elapsed: 13sec 860msec | avg time per step: 47msec
19 % = 408 | time elapsed: 19sec 237msec | avg time per step: 52msec
24 % = 510 | time elapsed: 25sec 116msec | avg time per step: 57msec
29 % = 612 | time elapsed: 31sec 293msec | avg time per step: 60msec
34 % = 714 | time elapsed: 37sec 839msec | avg time per step: 63msec
39 % = 816 | time elapsed: 44sec 777msec | avg time per step: 67msec
44 % = 918 | time elapsed: 52sec 75msec | avg time per step: 71msec
49 % = 1020 | time elapsed: 59sec 825msec | avg time per step: 75msec
54 % = 1122 | time elapsed: 1min 8sec 6msec | avg time per step: 79msec
59 % = 1224 | time elapsed: 1min 16sec 721msec | avg time per step: 84msec
64 % = 1326 | time elapsed: 1min 26sec 75msec | avg time per step: 91msec
69 % = 1428 | time elapsed: 1min 35sec 906msec | avg time per step: 96msec
74 % = 1530 | time elapsed: 1min 46sec 124msec | avg time per step: 99msec
79 % = 1632 | time elapsed: 1min 56sec 504msec | avg time per step: 101msec
84 % = 1734 | time elapsed: 2min 5sec 878msec | avg time per step: 91msec
89 % = 1836 | time elapsed: 2min 14sec 719msec | avg time per step: 86msec
94 % = 1938 | time elapsed: 2min 23sec 709msec | avg time per step: 87msec
99 % = 2040 | time elapsed: 2min 33sec 31msec | avg time per step: 91msec
calculation simulation time: 2min 33sec 725msec = 153 sec
full simulation time: 2min 54sec 717msec = 174 sec
electrons only:
Running program...
PIConGPU: 0.5.0
Build-Type: Release
Third party:
OS: Linux-3.10.0-693.17.1.el7.x86_64
arch: x86_64
CXX: GNU (7.3.0)
CMake: 3.15.2
CUDA: 10.2.89
mallocMC: 2.3.1
Boost: 1.68.0
MPI:
standard: 3.1
flavor: OpenMPI (2.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 19sec 389msec = 19 sec
0 % = 0 | time elapsed: 55msec | avg time per step: 0msec
4 % = 102 | time elapsed: 2sec 962msec | avg time per step: 28msec
9 % = 204 | time elapsed: 6sec 23msec | avg time per step: 29msec
14 % = 306 | time elapsed: 9sec 264msec | avg time per step: 31msec
19 % = 408 | time elapsed: 12sec 913msec | avg time per step: 35msec
24 % = 510 | time elapsed: 16sec 946msec | avg time per step: 39msec
29 % = 612 | time elapsed: 21sec 149msec | avg time per step: 40msec
34 % = 714 | time elapsed: 25sec 571msec | avg time per step: 42msec
39 % = 816 | time elapsed: 30sec 234msec | avg time per step: 45msec
44 % = 918 | time elapsed: 35sec 133msec | avg time per step: 47msec
49 % = 1020 | time elapsed: 40sec 348msec | avg time per step: 50msec
54 % = 1122 | time elapsed: 45sec 813msec | avg time per step: 53msec
59 % = 1224 | time elapsed: 51sec 660msec | avg time per step: 57msec
64 % = 1326 | time elapsed: 58sec 96msec | avg time per step: 62msec
69 % = 1428 | time elapsed: 1min 4sec 985msec | avg time per step: 67msec
74 % = 1530 | time elapsed: 1min 12sec 210msec | avg time per step: 70msec
79 % = 1632 | time elapsed: 1min 19sec 576msec | avg time per step: 71msec
84 % = 1734 | time elapsed: 1min 25sec 901msec | avg time per step: 61msec
89 % = 1836 | time elapsed: 1min 31sec 713msec | avg time per step: 56msec
94 % = 1938 | time elapsed: 1min 37sec 650msec | avg time per step: 57msec
99 % = 2040 | time elapsed: 1min 43sec 841msec | avg time per step: 60msec
calculation simulation time: 1min 44sec 291msec = 104 sec
full simulation time: 2min 3sec 855msec = 123 sec
A slight increase from 19 to 21 seconds with ions, but other than that, it's the same.
However, the compile duration rattled me:
my electron + ions compile time: 19m33.860s = 1174 sec
my electron only compile time: 8m14.936s = 495 sec
my ratio = 2.4 --> the expected increase due to ISAAC
your electron + ion init time: 43min 23sec 89msec = 2603 sec
your electron only init time: 3min 43sec 605msec = 223 sec
your ratio: 11.7 --> unexplained init duration increase
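The two ratios can be reproduced from the numbers listed above. This is only a small sketch that uses awk for the floating-point division; the helper name `ratio` is made up for this example.

```shell
# ratio NUM DEN: print NUM / DEN rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 1174 495   # compile time, electrons + ions vs. electrons only -> 2.4
ratio 2603 223   # init time, electrons + ions vs. electrons only    -> 11.7
```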
This happens despite the fact that you are using a more modern GPU, which also shows in your avg time per step of only ~40 ms, while on the (old) K80 it is ~60 ms.
The question for me now is: did you accidentally run into JIT compilation due to a missing --arch definition?
In your profile, how did you set PIC_BACKEND?
If you go to your .build directory and do ccmake . - what parameter is set for ALPAKA_CUDA_ARCH?
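If ccmake is not handy, the same value can be read non-interactively from the CMake cache. A minimal sketch, assuming the build directory is `.build` as above and that the cache entry has the usual `NAME:TYPE=VALUE` form; the function name `cached_cuda_arch` is hypothetical.

```shell
# cached_cuda_arch BUILD_DIR: print the value of ALPAKA_CUDA_ARCH stored
# in BUILD_DIR/CMakeCache.txt (cache entries look like NAME:TYPE=VALUE).
cached_cuda_arch() {
    grep '^ALPAKA_CUDA_ARCH:' "$1/CMakeCache.txt" | cut -d= -f2
}

# Usage from the input directory:
#   cached_cuda_arch .build
```

If the printed value belongs to an older architecture generation than the GPU in use (for example 30 on a V100, which has compute capability 7.0), the CUDA driver likely has to JIT-compile the embedded PTX at startup.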
If I do not define PIC_BACKEND I get a default value of 30 in ALPAKA_CUDA_ARCH. Thus my results on k80 architectures are pretty much the same (init times being around 24 sec). This is the Kepler architecture and thus no JIT is needed.
However, if I submit such an executable to a V100 (Volta), the init time explodes to 40min 33sec 909msec = 2433 sec due to JIT.
(Exploring it interactively reveals that PIConGPU is registered in nvidia-smi but shows no GPU usage, while CPU usage is at 100%.)
Running program...
PIConGPU: 0.5.0
Build-Type: Release
Third party:
OS: Linux-3.10.0-693.11.6.el7.x86_64
arch: x86_64
CXX: GNU (7.3.0)
CMake: 3.15.2
CUDA: 10.2.89
mallocMC: 2.3.1
Boost: 1.68.0
MPI:
standard: 3.1
flavor: OpenMPI (2.1.6)
PNGwriter: 0.7.0
libSplash: 1.7.0 (Format 4.0)
ADIOS: 1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 40min 33sec 909msec = 2433 sec
0 % = 0 | time elapsed: 43msec | avg time per step: 0msec
4 % = 102 | time elapsed: 1sec 617msec | avg time per step: 15msec
9 % = 204 | time elapsed: 3sec 201msec | avg time per step: 15msec
14 % = 306 | time elapsed: 4sec 912msec | avg time per step: 16msec
19 % = 408 | time elapsed: 6sec 810msec | avg time per step: 18msec
24 % = 510 | time elapsed: 8sec 853msec | avg time per step: 19msec
29 % = 612 | time elapsed: 10sec 941msec | avg time per step: 20msec
34 % = 714 | time elapsed: 13sec 2msec | avg time per step: 20msec
39 % = 816 | time elapsed: 15sec 90msec | avg time per step: 20msec
44 % = 918 | time elapsed: 17sec 290msec | avg time per step: 21msec
49 % = 1020 | time elapsed: 19sec 626msec | avg time per step: 22msec
54 % = 1122 | time elapsed: 22sec 19msec | avg time per step: 23msec
59 % = 1224 | time elapsed: 24sec 520msec | avg time per step: 24msec
64 % = 1326 | time elapsed: 27sec 220msec | avg time per step: 26msec
69 % = 1428 | time elapsed: 30sec 69msec | avg time per step: 27msec
74 % = 1530 | time elapsed: 33sec 29msec | avg time per step: 28msec
79 % = 1632 | time elapsed: 36sec 8msec | avg time per step: 28msec
84 % = 1734 | time elapsed: 38sec 560msec | avg time per step: 24msec
89 % = 1836 | time elapsed: 41sec 11msec | avg time per step: 23msec
94 % = 1938 | time elapsed: 43sec 526msec | avg time per step: 24msec
99 % = 2040 | time elapsed: 46sec 134msec | avg time per step: 25msec
calculation simulation time: 46sec 322msec = 46 sec
full simulation time: 41min 20sec 427msec = 2480 sec
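To see how much of the wall time in the log above went into initialization, the three summary lines can be parsed with a short script. This is a minimal sketch: the helper name `summary_seconds` is made up for illustration, and it assumes the summary lines keep the `... time: ... = N sec` shape shown above.

```python
import re

# The three summary lines copied from the log above.
LOG_TAIL = """\
initialization time: 40min 33sec 909msec = 2433 sec
calculation simulation time: 46sec 322msec = 46 sec
full simulation time: 41min 20sec 427msec = 2480 sec
"""

def summary_seconds(log):
    """Map each '... time' label to the whole-second value after '='."""
    pattern = re.compile(r"^(.*?time):.*=\s*(\d+)\s*sec\s*$", re.MULTILINE)
    return {label.strip(): int(sec) for label, sec in pattern.findall(log)}

times = summary_seconds(LOG_TAIL)
init_share = times["initialization time"] / times["full simulation time"]
print(f"init share of wall time: {init_share:.0%}")
```

The progress lines (`4 % = 102 | time elapsed: ...`) are not matched by this pattern, so a full log can be passed in as well.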
Sadly, we do not have access to a Turing system. But I expect a similar init time increase for your GPU.
So in order to avoid the init time issue, please set the architecture correctly. (The issue is made worse by ISAAC's long compile time, which grows with the number of species, so if you do not need ISAAC, do not compile it.)
Thanks for investigating @PrometheusPi . Do you think we could add this note somewhere to the docs, as e.g. I was not aware of this?
Btw, to set the backend one can just provide an option to pic-build, e.g. pic-build -b cuda:60. PIC_BACKEND merely sets the default value for -b (or at least that is how it is supposed to work; I have never tried leaving it undefined).
@sbastrakov Good idea. I think both improved documentation and perhaps reporting the PIConGPU init time and the JIT init time separately in the output would help to prevent such mistakes.
@PrometheusPi and @sbastrakov
I always used export PIC_BACKEND="cuda:72"; although my cards support more, sm73 was not accepted at compilation time because of the old CUDA version. I admit that I installed ISAAC along with versions develop and 0.5.0, but I haven't used it explicitly. Does this mean that once ISAAC is installed along with picongpu, for example via spack install picongpu@develop +adios +isaac %[email protected] ^isaac@develop ^isaac-server@develop, it is always compiled, even without loading it?
I am happy to let you connect to one of my computers through TeamViewer and explore the Turing architecture. I will need to install picongpu@develop again to test your suggestion ("If you go to your .build directory and do ccmake . - what parameter is set for ALPAKA_CUDA_ARCH?"), and I always do it from scratch, wiping the whole Spack folder for fear of dependency clashes.
I have always asked myself whether having CUDA and OpenMPI installed on the system, rather than through Spack, helps to speed up PIConGPU. What is your opinion about this?
And then there is the CUDA-aware OpenMPI setup, which I managed to follow and which PIConGPU then used.
https://www.open-mpi.org/faq/?category=buildcuda
Do you think it helps in any way compared with the case where Spack builds CUDA and OpenMPI?
PROBLEM SOLVED!
For the performance shown below the specifications were:
electrons:
initialization time: 10sec 281msec = 10 sec
0 % = 0 | time elapsed: 11sec 896msec | avg time per step: 0msec
4 % = 102 | time elapsed: 24sec 35msec | avg time per step: 23msec
9 % = 204 | time elapsed: 36sec 330msec | avg time per step: 23msec
14 % = 306 | time elapsed: 48sec 674msec | avg time per step: 24msec
19 % = 408 | time elapsed: 1min 1sec 14msec | avg time per step: 24msec
24 % = 510 | time elapsed: 1min 13sec 428msec | avg time per step: 25msec
29 % = 612 | time elapsed: 1min 25sec 555msec | avg time per step: 25msec
34 % = 714 | time elapsed: 1min 37sec 800msec | avg time per step: 27msec
39 % = 816 | time elapsed: 1min 50sec 365msec | avg time per step: 26msec
44 % = 918 | time elapsed: 2min 3sec 48msec | avg time per step: 28msec
49 % = 1020 | time elapsed: 2min 15sec 733msec | avg time per step: 28msec
54 % = 1122 | time elapsed: 2min 28sec 356msec | avg time per step: 29msec
59 % = 1224 | time elapsed: 2min 41sec 152msec | avg time per step: 31msec
64 % = 1326 | time elapsed: 2min 53sec 873msec | avg time per step: 30msec
69 % = 1428 | time elapsed: 3min 6sec 905msec | avg time per step: 31msec
74 % = 1530 | time elapsed: 3min 19sec 637msec | avg time per step: 30msec
79 % = 1632 | time elapsed: 3min 32sec 728msec | avg time per step: 31msec
84 % = 1734 | time elapsed: 3min 45sec 214msec | avg time per step: 30msec
89 % = 1836 | time elapsed: 3min 57sec 580msec | avg time per step: 29msec
94 % = 1938 | time elapsed: 4min 10sec 135msec | avg time per step: 30msec
99 % = 2040 | time elapsed: 4min 22sec 389msec | avg time per step: 29msec
calculation simulation time: 4min 22sec 634msec = 262 sec
full simulation time: 4min 33sec 544msec = 273 sec
electrons + ions:
initialization time: 10sec 140msec = 10 sec
0 % = 0 | time elapsed: 23sec 689msec | avg time per step: 0msec
4 % = 102 | time elapsed: 48sec 69msec | avg time per step: 26msec
9 % = 204 | time elapsed: 1min 12sec 380msec | avg time per step: 26msec
14 % = 306 | time elapsed: 1min 36sec 784msec | avg time per step: 27msec
19 % = 408 | time elapsed: 2min 1sec 277msec | avg time per step: 28msec
24 % = 510 | time elapsed: 2min 25sec 401msec | avg time per step: 29msec
29 % = 612 | time elapsed: 2min 49sec 395msec | avg time per step: 30msec
34 % = 714 | time elapsed: 3min 13sec 532msec | avg time per step: 31msec
39 % = 816 | time elapsed: 3min 38sec 155msec | avg time per step: 33msec
44 % = 918 | time elapsed: 4min 2sec 815msec | avg time per step: 33msec
49 % = 1020 | time elapsed: 4min 27sec 390msec | avg time per step: 34msec
54 % = 1122 | time elapsed: 4min 52sec 126msec | avg time per step: 35msec
59 % = 1224 | time elapsed: 5min 17sec 49msec | avg time per step: 37msec
64 % = 1326 | time elapsed: 5min 42sec 128msec | avg time per step: 38msec
69 % = 1428 | time elapsed: 6min 7sec 17msec | avg time per step: 36msec
74 % = 1530 | time elapsed: 6min 31sec 902msec | avg time per step: 37msec
79 % = 1632 | time elapsed: 6min 56sec 549msec | avg time per step: 38msec
84 % = 1734 | time elapsed: 7min 20sec 947msec | avg time per step: 36msec
89 % = 1836 | time elapsed: 7min 45sec 313msec | avg time per step: 36msec
94 % = 1938 | time elapsed: 8min 10sec 404msec | avg time per step: 36msec
99 % = 2040 | time elapsed: 8min 34sec 2msec | avg time per step: 37msec
calculation simulation time: 8min 34sec 311msec = 514 sec
full simulation time: 8min 45sec 208msec = 525 sec
I wiped out the whole Spack folder and installed the develop version with spack install picongpu@develop +adios %[email protected], with CUDA 10.2 installed on the system. I also installed openPMD with spack install openpmd-api. On this occasion ISAAC was not installed.
Then I loaded the program as
source $HOME/src/spack/share/spack/setup-env.sh && spack load openpmd-api && spack load picongpu %[email protected] && export PIC_BACKEND="cuda:75" && export OMPI_MCA_io=^ompio
so using compute capability 7.5 for the first time.
ALPAKA_CUDA_ARCH = 75
In conclusion, if the problem was not due to ISAAC (and it shouldn't have been, since I never set it up nor loaded it), it must have been due to running the code with a slightly lower compute capability than the card can handle, that is 7.2 instead of 7.5.
Maybe an automatic selection of the CUDA compute capability could be implemented, depending on the graphics card or the CUDA toolkit identified.
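Until something like that exists, a small helper outside PIConGPU could do the lookup. The following is only a sketch under assumptions: the function names are hypothetical, and it relies on the installed nvidia-smi supporting the compute_cap query field, which older drivers may not offer. It turns the reported compute capability into a backend string of the form used above and compares it with the value shown for ALPAKA_CUDA_ARCH.

```python
import subprocess

def backend_for(compute_cap):
    """Turn a compute capability such as '7.5' into a 'cuda:75' backend string."""
    major, minor = compute_cap.strip().split(".")
    return "cuda:{}{}".format(int(major), int(minor))

def matches_build(built_arch, compute_cap):
    """True if the ALPAKA_CUDA_ARCH value (e.g. 75) equals the GPU's capability."""
    return backend_for(compute_cap) == "cuda:{}".format(built_arch)

def query_compute_caps():
    """Ask nvidia-smi for the compute capability of every visible GPU.

    Assumption: the driver's nvidia-smi accepts --query-gpu=compute_cap.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

On a machine with a GPU, `[backend_for(c) for c in query_compute_caps()]` would give the candidate values for PIC_BACKEND, and `matches_build(72, "7.5")` returns False for the mismatch described above.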
Some performance figures for the RTX 5000 GPU with the LWFA model. The CPU is an 8-core Intel Core i9-9900K.

Glad the long initialization time issue is solved!
Regarding the run time, it is a bit weird to me that using 2 GPUs gives less than 2x speedup over 1 GPU. Perhaps the problem is too small so that they are underutilized, or the workload is not evenly distributed between them.
I agree with @sbastrakov: the reduced speedup might be caused by under-utilization. Furthermore, I am confused why initialization for sm 7.3 takes 22 (35) seconds for 1 GPU and only 6 (6) seconds for two. For sm 7.5, the init times look okay.