PIConGPU: Understanding the LWFA model

Created on 7 Nov 2020  ·  53 Comments  ·  Source: ComputationalRadiationPhysics/picongpu

I am trying to check my understanding of the LWFA model.

The speciesDefinition.param file contains #define PARAM_IONS 0, which means that ions (protons here) are not included in using VectorAllSpecies = MakeSeq_t< ... >;. Does this mean that no data is stored for the ions, but they still matter in the simulation? I guess so. On the other hand, including them by setting #define PARAM_IONS 1 extends the initialization time from 4 minutes to 40 minutes in my case. This happens even though the cfg file does not ask for data to be written out, i.e. TBG_plugins="!TBG_pngYX". Is this normal behaviour? Is this efficient?

Related to data storing:

  • for electrons only (case 1) or
  • for electrons and ions (case 2),

what happens to the PNG figures that are written out? What would namespace preParticleDensCol = colorScales::red; mean in each case?

The speciesDefinition.param contains the block

#if( PARAM_IONIZATION == 1 )
    , boundElectrons
#endif

which signals a simulation in which the hydrogen atoms start in a neutral state and can be ionized during the interaction with the laser. I guess for carbon atoms this is the way to go. Still, I cannot see where PARAM_IONIZATION is defined in the original model.
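To make the effect of such a block concrete, here is a minimal sketch of how a preprocessor conditional can add or drop an entry in a compile-time type list. It is not PIConGPU code: std::tuple stands in for MakeSeq_t, and all struct names (Position, Momentum, Weighting, BoundElectrons, Electrons, Ions) as well as IonAttributes and AllSpecies are hypothetical placeholders.

```cpp
#include <cstddef>
#include <tuple>

// Keep an externally provided value, otherwise fall back to 0.
#ifndef PARAM_IONS
#    define PARAM_IONS 0
#endif
#ifndef PARAM_IONIZATION
#    define PARAM_IONIZATION 0
#endif

// Hypothetical placeholder types, not PIConGPU's real definitions.
struct Position {};
struct Momentum {};
struct Weighting {};
struct BoundElectrons {};
struct Electrons {};
struct Ions {};

// Stand-in for an ion attribute list: the extra entry only exists
// when ionization is switched on at compile time.
using IonAttributes = std::tuple<
    Position,
    Momentum,
    Weighting
#if (PARAM_IONIZATION == 1)
    ,
    BoundElectrons
#endif
    >;

// Stand-in for VectorAllSpecies = MakeSeq_t< ... >: the ion species is
// part of the list only when PARAM_IONS is 1.
using AllSpecies = std::tuple<
    Electrons
#if (PARAM_IONS == 1)
    ,
    Ions
#endif
    >;

constexpr std::size_t numIonAttributes = std::tuple_size<IonAttributes>::value;
constexpr std::size_t numSpecies = std::tuple_size<AllSpecies>::value;
```

Compiled without any -D options, this sketch yields one species and three ion attributes; compiling with -DPARAM_IONS=1 -DPARAM_IONIZATION=1 would yield two species and four attributes.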

Could these #if () ... #endif blocks be used in the starter.param file? Otherwise, what is this file used for?

Thank you.

question

Most helpful comment

PROBLEM SOLVED!
For the performance shown below the specifications were:

  • single GPU 2070 Super,
  • CPU Intel Xeon X5660,
  • CUDA 10.2.1,
  • compute capability 7.5

electrons:

initialization time: 10sec 281msec = 10 sec
  0 % =        0 | time elapsed:            11sec 896msec | avg time per step:   0msec
  4 % =      102 | time elapsed:            24sec  35msec | avg time per step:  23msec
  9 % =      204 | time elapsed:            36sec 330msec | avg time per step:  23msec
 14 % =      306 | time elapsed:            48sec 674msec | avg time per step:  24msec
 19 % =      408 | time elapsed:       1min  1sec  14msec | avg time per step:  24msec
 24 % =      510 | time elapsed:       1min 13sec 428msec | avg time per step:  25msec
 29 % =      612 | time elapsed:       1min 25sec 555msec | avg time per step:  25msec
 34 % =      714 | time elapsed:       1min 37sec 800msec | avg time per step:  27msec
 39 % =      816 | time elapsed:       1min 50sec 365msec | avg time per step:  26msec
 44 % =      918 | time elapsed:       2min  3sec  48msec | avg time per step:  28msec
 49 % =     1020 | time elapsed:       2min 15sec 733msec | avg time per step:  28msec
 54 % =     1122 | time elapsed:       2min 28sec 356msec | avg time per step:  29msec
 59 % =     1224 | time elapsed:       2min 41sec 152msec | avg time per step:  31msec
 64 % =     1326 | time elapsed:       2min 53sec 873msec | avg time per step:  30msec
 69 % =     1428 | time elapsed:       3min  6sec 905msec | avg time per step:  31msec
 74 % =     1530 | time elapsed:       3min 19sec 637msec | avg time per step:  30msec
 79 % =     1632 | time elapsed:       3min 32sec 728msec | avg time per step:  31msec
 84 % =     1734 | time elapsed:       3min 45sec 214msec | avg time per step:  30msec
 89 % =     1836 | time elapsed:       3min 57sec 580msec | avg time per step:  29msec
 94 % =     1938 | time elapsed:       4min 10sec 135msec | avg time per step:  30msec
 99 % =     2040 | time elapsed:       4min 22sec 389msec | avg time per step:  29msec
calculation  simulation time:  4min 22sec 634msec = 262 sec
full simulation time:  4min 33sec 544msec = 273 sec

electrons + ions:

initialization time: 10sec 140msec = 10 sec
  0 % =        0 | time elapsed:            23sec 689msec | avg time per step:   0msec
  4 % =      102 | time elapsed:            48sec  69msec | avg time per step:  26msec
  9 % =      204 | time elapsed:       1min 12sec 380msec | avg time per step:  26msec
 14 % =      306 | time elapsed:       1min 36sec 784msec | avg time per step:  27msec
 19 % =      408 | time elapsed:       2min  1sec 277msec | avg time per step:  28msec
 24 % =      510 | time elapsed:       2min 25sec 401msec | avg time per step:  29msec
 29 % =      612 | time elapsed:       2min 49sec 395msec | avg time per step:  30msec
 34 % =      714 | time elapsed:       3min 13sec 532msec | avg time per step:  31msec
 39 % =      816 | time elapsed:       3min 38sec 155msec | avg time per step:  33msec
 44 % =      918 | time elapsed:       4min  2sec 815msec | avg time per step:  33msec
 49 % =     1020 | time elapsed:       4min 27sec 390msec | avg time per step:  34msec
 54 % =     1122 | time elapsed:       4min 52sec 126msec | avg time per step:  35msec
 59 % =     1224 | time elapsed:       5min 17sec  49msec | avg time per step:  37msec
 64 % =     1326 | time elapsed:       5min 42sec 128msec | avg time per step:  38msec
 69 % =     1428 | time elapsed:       6min  7sec  17msec | avg time per step:  36msec
 74 % =     1530 | time elapsed:       6min 31sec 902msec | avg time per step:  37msec
 79 % =     1632 | time elapsed:       6min 56sec 549msec | avg time per step:  38msec
 84 % =     1734 | time elapsed:       7min 20sec 947msec | avg time per step:  36msec
 89 % =     1836 | time elapsed:       7min 45sec 313msec | avg time per step:  36msec
 94 % =     1938 | time elapsed:       8min 10sec 404msec | avg time per step:  36msec
 99 % =     2040 | time elapsed:       8min 34sec   2msec | avg time per step:  37msec
calculation  simulation time:  8min 34sec 311msec = 514 sec
full simulation time:  8min 45sec 208msec = 525 sec
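As a small arithmetic check on the two runs above, the following sketch converts the logged durations to seconds and compares them. The helper toSeconds and the constant names are hypothetical; the numbers are copied from the log lines.

```cpp
// Convert the "Xmin Ysec Zmsec" figures from the log into seconds.
constexpr double toSeconds(int minutes, int seconds, int milliseconds)
{
    return minutes * 60.0 + seconds + milliseconds / 1000.0;
}

// Copied from the two "calculation simulation time" lines.
constexpr double calcElectrons = toSeconds(4, 22, 634);
constexpr double calcElectronsAndIons = toSeconds(8, 34, 311);

// Copied from the two "initialization time" lines.
constexpr double initElectrons = toSeconds(0, 10, 281);
constexpr double initElectronsAndIons = toSeconds(0, 10, 140);

// How much longer the run with ions takes in the calculation phase.
constexpr double calcSlowdown = calcElectronsAndIons / calcElectrons;

// The truncated values match the "= ... sec" figures printed in the log.
static_assert(static_cast<int>(calcElectrons) == 262, "electrons-only run");
static_assert(static_cast<int>(calcElectronsAndIons) == 514, "electrons + ions run");
```

With these numbers the calculation phase takes a little under twice as long once ions are included, while the initialization time stays at about 10 seconds in both cases.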

All 53 comments

Hello @cbontoiu ,

So technically one can set values for such defines both inside .param files and externally, via command-line options during compilation. To bring some order to this, our examples use the following naming scheme.

First, there are definitions starting with PARAM_; they are supposed to be provided externally. We normally do not require them to be provided, so we check whether the variable is defined and set a default otherwise. For example, here we check if the "input" variable PARAM_IONS is provided and, if not, set it to 0. To set such variables, the easiest way is to modify the cmakeFlags file at the root level of each input directory, e.g. this file for the standard LWFA example (of course, your copy of it after doing pic-create). By default, only flags[0] is used for building. So, e.g., to enable ions and ionization in that simulation, you can change it to
flags[0]="-DPARAM_OVERWRITES:LIST='-DPARAM_IONS=1;-DPARAM_IONIZATION=1'" (-D is the compiler's command-line option for defining a macro).

In the .param files we use those PARAM_... variables directly and sometimes derive other macro definitions from them. In this case, the derived names are also in all capitals, but are not supposed to be provided directly by a user, as the PARAM_ ones are.
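The naming scheme can be sketched as follows. This is an illustration of the pattern only, not a copy of the LWFA .param files: SKETCH_HAS_BOUND_ELECTRONS, ionsEnabled and boundElectronsEnabled are hypothetical names, and the rule used to derive the second macro is an assumption made for the example.

```cpp
// "Input" macros: keep an externally supplied value (e.g. -DPARAM_IONS=1),
// otherwise fall back to a default.
#ifndef PARAM_IONS
#    define PARAM_IONS 0
#endif

#ifndef PARAM_IONIZATION
#    define PARAM_IONIZATION 0
#endif

// Derived macro: written in all capitals as well, but computed here
// instead of being passed in from outside. The name and the rule are
// hypothetical.
#if (PARAM_IONS == 1) && (PARAM_IONIZATION == 1)
#    define SKETCH_HAS_BOUND_ELECTRONS 1
#else
#    define SKETCH_HAS_BOUND_ELECTRONS 0
#endif

// Expose the result to ordinary C++ code.
constexpr bool ionsEnabled = (PARAM_IONS == 1);
constexpr bool boundElectronsEnabled = (SKETCH_HAS_BOUND_ELECTRONS == 1);

// In this sketch, bound electrons can only be enabled together with ions.
static_assert(!boundElectronsEnabled || ionsEnabled,
              "bound electrons require ions in this sketch");
```

Compiling this without any -D options leaves both switches off; passing -DPARAM_IONS=1 -DPARAM_IONIZATION=1 to the compiler turns both on without editing the file.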

Regarding the much larger initialization time with ions, I am not sure what the reason is.

@n01r could you comment, maybe initialization with boundElectrons already involves some ionization calculations?

You can use #if () ... #endif in all .param files since they are just C++ files. However, I would imagine one needs a really good reason to tinker with starter.param, and most things are much more easily accomplished without doing it.

@n01r could you comment, maybe initialization with boundElectrons already involves some ionization calculations?

Sure, no further ionization calculations are done on initialization.
The way the LWFA example was configured with its cmakeFlags is that in most cases, neither are ions created nor is ionization active. Only the flags[9] case activates both of these.
For most LWFA simulations, you do not need the ion motion or ionization. Still, if you care about a realistic charge in the accelerated electron bunch, you have to consider it. But @PrometheusPi knows more about this.

Our cmakeFlags, in general, give you the ability to do parameter scans quickly and to implement different configurations that you can switch on or off.

I would expect a higher initialization time for a case with ions but not an increase by a factor of 10.

@cbontoiu Your assumption is right. By default, the initialization only creates electrons. Since charge neutrality is assumed initially, we indirectly assume that the electron charge is compensated by an immobile ion background. This is a surprisingly good assumption for most LWFA cases. However, as you correctly pointed out, you could also initialize the ions. This will require more memory and some more copying. However, the ions are derived from the electrons, so this should be quicker than the preceding electron initialization. Thus both ions and electrons should be initialized in less than 2x4=8 minutes in your case. Therefore, I find the 10-fold increase in initialization time a bit alarming. Even with IO included (which, as you said, is not running), the increase should be linear and thus "only" double the time. The only plugin that causes a severe compile-time increase with more species is ISAAC. Did you compile the LWFA example together with ISAAC?

@PrometheusPi and all, thanks for your quick reply. I didn't compile with ISAAC. In fact, I haven't set up the server for ISAAC, though ISAAC and its dependencies were installed via Spack together with picongpu@develop. I felt something weird was going on when my usual model took a long time to initialize. Then I thought I was doing something wrong, so I started from scratch with the LWFA model. I first ran with #define PARAM_IONS 0: all good, quick initialization. Then I only changed it to #define PARAM_IONS 1 and it is much slower (a factor of 10). You can easily check this I guess, but I can also upload my model if necessary.

Here are the two outputs:

CONGPU/TESTS/myLaserWakefield_run_04
WARNING: 4 input file(s) in include/
         have been modified since the last compile!
         Did you forget to recompile?
         Run 'pic-build -f' to recompile with the modified files.
List of modified files:
/home/cristian/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/include/picongpu/param/speciesInitialization.param
/home/cristian/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/include/picongpu/param/species.param
/home/cristian/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/include/picongpu/param/speciesDefinition.param
/home/cristian/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/include/picongpu/param/starter.param
Running program...
using default compiler
==> Error: Spec 'picongpu@develop%[email protected]+adios+hdf5+isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac@develop%[email protected]+cuda~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac-server@develop%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+shared build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~python arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] 
arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~pic arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+pic arch=linux-linuxmint19-westmere' matches no installed packages.
PIConGPU: 0.5.0-dev
  Build-Type: Release

Third party:
  OS:         Linux-5.0.0-32-generic
  arch:       x86_64
  CXX:        GNU (7.5.0)
  CMake:      3.18.4
  CUDA:       10.2.89
  mallocMC:   2.5.0
  Boost:      1.70.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (3.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
  openPMD:    NOTFOUND
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time:  3min 43sec 605msec = 223 sec
  0 % =        0 | time elapsed:                   41msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             2sec 459msec | avg time per step:  23msec
  9 % =      204 | time elapsed:             4sec 894msec | avg time per step:  23msec
 14 % =      306 | time elapsed:             7sec 389msec | avg time per step:  24msec
 19 % =      408 | time elapsed:             9sec 952msec | avg time per step:  24msec
 24 % =      510 | time elapsed:            12sec 580msec | avg time per step:  25msec
 29 % =      612 | time elapsed:            15sec 259msec | avg time per step:  25msec
 34 % =      714 | time elapsed:            17sec 997msec | avg time per step:  26msec
 39 % =      816 | time elapsed:            20sec 791msec | avg time per step:  27msec
 44 % =      918 | time elapsed:            23sec 651msec | avg time per step:  27msec
 49 % =     1020 | time elapsed:            26sec 577msec | avg time per step:  28msec
 54 % =     1122 | time elapsed:            29sec 551msec | avg time per step:  28msec
 59 % =     1224 | time elapsed:            32sec 585msec | avg time per step:  29msec
 64 % =     1326 | time elapsed:            35sec 665msec | avg time per step:  29msec
 69 % =     1428 | time elapsed:            38sec 780msec | avg time per step:  30msec
 74 % =     1530 | time elapsed:            41sec 908msec | avg time per step:  30msec
 79 % =     1632 | time elapsed:            45sec  39msec | avg time per step:  30msec
 84 % =     1734 | time elapsed:            48sec  79msec | avg time per step:  29msec
 89 % =     1836 | time elapsed:            51sec  95msec | avg time per step:  29msec
 94 % =     1938 | time elapsed:            54sec 115msec | avg time per step:  29msec
 99 % =     2040 | time elapsed:            57sec 125msec | avg time per step:  29msec
calculation  simulation time: 57sec 363msec = 57 sec
full simulation time:  4min 41sec 190msec = 281 sec






cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ rm -r .build && pic-build &> out.txt && tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/cristian/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_01
Running program...
using default compiler
==> Error: Spec 'picongpu@develop%[email protected]+adios+hdf5+isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac@develop%[email protected]+cuda~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac-server@develop%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+shared build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~python arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] 
arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~pic arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+pic arch=linux-linuxmint19-westmere' matches no installed packages.






PIConGPU: 0.5.0-dev
  Build-Type: Release

Third party:
  OS:         Linux-5.0.0-32-generic
  arch:       x86_64
  CXX:        GNU (7.5.0)
  CMake:      3.18.4
  CUDA:       10.2.89
  mallocMC:   2.5.0
  Boost:      1.70.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (3.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
  openPMD:    NOTFOUND
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 43min 23sec  89msec = 2603 sec
  0 % =        0 | time elapsed:                   44msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             2sec 781msec | avg time per step:  26msec
  9 % =      204 | time elapsed:             5sec 539msec | avg time per step:  26msec
 14 % =      306 | time elapsed:             8sec 381msec | avg time per step:  27msec
 19 % =      408 | time elapsed:            11sec 330msec | avg time per step:  28msec
 24 % =      510 | time elapsed:            14sec 369msec | avg time per step:  29msec
 29 % =      612 | time elapsed:            17sec 490msec | avg time per step:  30msec
 34 % =      714 | time elapsed:            20sec 694msec | avg time per step:  31msec
 39 % =      816 | time elapsed:            23sec 981msec | avg time per step:  31msec
 44 % =      918 | time elapsed:            27sec 355msec | avg time per step:  32msec
 49 % =     1020 | time elapsed:            30sec 845msec | avg time per step:  33msec
 54 % =     1122 | time elapsed:            34sec 408msec | avg time per step:  34msec
 59 % =     1224 | time elapsed:            38sec  38msec | avg time per step:  35msec
 64 % =     1326 | time elapsed:            41sec 743msec | avg time per step:  35msec
 69 % =     1428 | time elapsed:            45sec 484msec | avg time per step:  36msec
 74 % =     1530 | time elapsed:            49sec 226msec | avg time per step:  36msec
 79 % =     1632 | time elapsed:            52sec 977msec | avg time per step:  36msec
 84 % =     1734 | time elapsed:            56sec 655msec | avg time per step:  35msec
 89 % =     1836 | time elapsed:       1min  0sec 308msec | avg time per step:  35msec
 94 % =     1938 | time elapsed:       1min  3sec 960msec | avg time per step:  35msec
 99 % =     2040 | time elapsed:       1min  7sec 620msec | avg time per step:  35msec
calculation  simulation time:  1min  7sec 909msec = 67 sec
full simulation time: 44min 31sec 263msec = 2671 sec

[formatted by psychocoderHPC]
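For reference, the numbers in the two logs above can be put side by side with a few lines of compile-time arithmetic. The struct Run and the constant names are hypothetical; the values are copied from the "initialization time", "calculation simulation time" and "macro particles per device" lines of each log.

```cpp
// One record per run, filled in from the log output.
struct Run
{
    double initSeconds;       // "initialization time" line
    double calcSeconds;       // "calculation simulation time" line
    long long macroParticles; // "macro particles per device" line
};

constexpr Run electronsOnly{223.0, 57.0, 4718592LL};
constexpr Run electronsAndIons{2603.0, 67.0, 9437184LL};

// Adding ions doubles the number of macro particles ...
constexpr double particleRatio
    = static_cast<double>(electronsAndIons.macroParticles) / electronsOnly.macroParticles;

// ... the calculation phase gets slower by well under a factor of two ...
constexpr double calcRatio = electronsAndIons.calcSeconds / electronsOnly.calcSeconds;

// ... but the initialization phase gets slower by more than a factor of ten.
constexpr double initRatio = electronsAndIons.initSeconds / electronsOnly.initSeconds;
```

Under the expectation stated earlier in the thread that initialization should scale about linearly with the particle count, initRatio would be close to particleRatio; here it is roughly 11.7 against 2.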

And there is also this problem with the openPMD plugin: unrecognised option '--openPMD.period'
though Spack contains the openPMD API


Regarding the openPMD API not being found: to check, please use .build/picongpu -v to see if it was used at compile time, or .build/picongpu -h to see if it is in the list of options. The fact that the directory exists is no guarantee that it was found and linked against.

@cbontoiu Regarding your following answer:

In fact, I haven't set up the server for ISAAC, though ISAAC and its dependencies were installed via Spack together with picongpu@develop

Does this mean that ISAAC was added as a library? This would be enough to increase compile time.

But as I reread your question, I realized that you were having issues with initialization time - not compile time. My fault.
Init time clearly looks like an IO problem. Could you please list all files in simOutput via find . and also provide the output of picongpu -h?

simOutput contains only some images and the usual log file, which is attached. As for the picongpu -h command, this does not work for me. I tried:

cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ source $HOME/src/spack/share/spack/setup-env.sh && spack load picongpu +adios %[email protected] && export PIC_BACKEND="cuda:72" && export OMPI_MCA_io=^ompio
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ picongpu -h
picongpu: command not found

but this query shows the version:

cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ spack info picongpu
Package:   picongpu
Description:
PIConGPU: A particle-in-cell code for GPGPUs
Homepage: https://github.com/ComputationalRadiationPhysics/picongpu
Maintainers: @ax3l
Tags: 
    None
Preferred version:  
    0.5.0        https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.5.0.tar.gz
Safe versions:  
    develop      [git] https://github.com/ComputationalRadiationPhysics/picongpu.git on branch dev
    0.5.0        https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.5.0.tar.gz
    0.4.3        https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.3.tar.gz
    0.4.2        https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.2.tar.gz
    0.4.1        https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.1.tar.gz
    0.4.0-rc4    https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.0-rc4.tar.gz
    0.4.0-rc3    https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.0-rc3.tar.gz
    0.4.0-rc2    https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.0-rc2.tar.gz
    0.4.0        https://github.com/ComputationalRadiationPhysics/picongpu/archive/0.4.0.tar.gz
    local        [git] file:///home/cristian/src/picongpu
    gtc18        [git] https://github.com/ax3l/picongpu.git on branch topic-NGCandGTC18
    foilISAAC    [git] https://github.com/ax3l/picongpu.git on branch topic-20171114-foilISAAC
Variants:
    Name [Default]    Allowed values    Description
    ==============    ==============    ======================================
    adios [off]       on, off           Enable the ADIOS plugin
    backend [cuda]    cuda, omp2b       Control the computing backend
    cudacxx [nvcc]    nvcc, clang       Device compiler for the CUDA backend
    hdf5 [on]         on, off           Enable multiple plugins requiring HDF5
    isaac [off]       on, off           Enable the ISAAC plugin
    png [on]          on, off           Enable the PNG plugin
Installation Phases:
    install
Build Dependencies:
    adios  boost  cmake  cuda  isaac  libsplash  pngwriter  zlib
Link Dependencies:
    adios  boost  cuda  isaac  libsplash  mpi  pngwriter  zlib
Run Dependencies:
    cmake  isaac-server  mpi  rsync  util-linux
Virtual Packages: 
    None

output.txt

Just picongpu does not work because the executable is not in that directory but in .build inside it, hence .build/picongpu -v. The Spack output does not give the full picture, since a package could, for example, fail to match for some reason, whereas the PIConGPU build knows for sure what was found and what wasn't.

EDIT: this somewhat duplicates @sbastrakov's answer.

Okay - in your simulation directory, there is a tbg directory. It contains the file submit.start. In that file you should find a comment # Run PIConGPU - one line below, you will find the full path to picongpu. Could you please use that path to get the help of picongpu.

I first moved into the input .build folder, but neither of the two commands works.

Then I went to the tbg folder inside the output folder, as instructed, but the file submit.start does not contain the comment # Run PIConGPU. I will attach the whole output folder here.

Thank you for looking into this.

@cbontoiu not needed - you are already in the right directory - you just need to type in ./picongpu instead of picongpu.

@PrometheusPi actually I cannot upload more than 10 MB here. OK, so here is the output:

cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/.build$ ./picongpu -h
Usage picongpu [-d dx=1 dy=1 dz=1] -g width height depth [options]
:
  -h [ --help ]                         print help message and exit
  --validate                            validate command line parameters and 
                                        exit
  -v [ --version ]                      print version information and exit
  -c [ --config ] arg                   Config file(s)

PIConGPU:
  -s [ --steps ] arg                    Simulation steps
  --checkpoint.restart.loop arg (=0)    Number of times to restart the 
                                        simulation after simulation has 
                                        finished (for presentations). Note: 
                                        does not yet work with all plugins, see
                                        issue #1305
  -p [ --percent ] arg (=5)             Print time statistics after p percent 
                                        to stdout
  --checkpoint.restart                  Restart simulation
  --checkpoint.restart.directory arg (=checkpoints)
                                        Directory containing checkpoints for a 
                                        restart
  --checkpoint.restart.step arg         Checkpoint step to restart from
  --checkpoint.period arg               Period for checkpoint creation
  --checkpoint.directory arg (=checkpoints)
                                        Directory for checkpoints
  --author arg                          The author that runs the simulation and
                                        is responsible for created output files
  --mpiDirect                           use device direct for MPI communication
                                        e.g. GPU direct
  --versionOnce                         print version information once and 
                                        start
  -d [ --devices ] arg                  number of devices in each dimension
  -g [ --grid ] arg                     size of the simulation grid
  --gridDist arg                        Regex to describe the static 
                                        distribution of the cells for each 
                                        device,default: equal distribution over
                                        all devices
                                          example:
                                            -d 2 4 1
                                            -g 128 192 12
                                            --gridDist "64{2}" "64,32{2},64"

  --periodic arg                        specifying whether the grid is periodic
                                        (1) or not (0) in each dimension, 
                                        default: no periodic dimensions
  -m [ --moving ]                       enable sliding/moving window
  --windowMovePoint arg (=0.90000000000000002)
                                        ratio of the global window size in y 
                                        which defines when to start sliding the
                                        window. The window starts sliding at 
                                        the time required to pass the distance 
                                        of windowMovePoint * (global window 
                                        size in y) when moving with the speed 
                                        of light
  --stopWindow arg (=-1)                stops the window at stimulation step, 
                                        -1 means that window is never stopping
  --autoAdjustGrid arg (=1)             auto adjust the grid size if PIConGPU 
                                        conditions are not fulfilled

Initializers:

PluginController:

Checkpoint:
  --checkpoint.backend arg              Optional backend for checkpointing 
                                        [adios] default: adios
  --checkpoint.file arg                 Optional checkpoint filename (prefix)
  --checkpoint.restart.backend arg      Optional backend for restarting [adios]
                                        default: adios
  --checkpoint.restart.file arg         checkpoint restart filename (prefix)
  --checkpoint.restart.chunkSize arg (=1000000)
                                        Number of particles processed in one 
                                        kernel call during restart to prevent 
                                        frame count blowup
  --checkpoint.adios.aggregators arg    Number of aggregators [0 == number of 
                                        MPI processes] | default: 0
  --checkpoint.adios.ost arg            Number of OST | default: 1
  --checkpoint.adios.disable-meta arg   Disable online gather and write of a 
                                        global meta file, can be time consuming
                                        (use `bpmeta` post-mortem) | default: 0
  --checkpoint.adios.transport-params arg
                                        additional transport parameters, see 
                                        ADIOS manual chapter 6.1.5, e.g., 
                                        'random_offset=1;stripe_count=4' | 
                                        default: 
  --checkpoint.adios.compression arg    ADIOS compression method, e.g., zlib 
                                        (see `adios_config -m` for help) | 
                                        default: none

EnergyFields: calculate the energy of the fields:
  --fields_energy.period arg            enable plugin [for each n-th step]

ADIOSWriter: dump simulation data with ADIOS:
  --adios.period arg                    enable ADIOS IO [for each n-th step]
  --adios.source arg                    data sources: [species_all, fields_all,
                                        e_all, E, B, e_chargeDensity, 
                                        e_energyDensity, e_particleMomentumComp
                                        onent] | default: species_all, 
                                        fields_all
  --adios.file arg                      ADIOS output filename (prefix)
  --adios.aggregators arg               Number of aggregators [0 == number of 
                                        MPI processes] | default: 0
  --adios.ost arg                       Number of OST | default: 1
  --adios.disable-meta arg              Disable online gather and write of a 
                                        global meta file, can be time consuming
                                        (use `bpmeta` post-mortem) | default: 0
  --adios.transport-params arg          additional transport parameters, see 
                                        ADIOS manual chapter 6.1.5, e.g., 
                                        'random_offset=1;stripe_count=4' | 
                                        default: 
  --adios.compression arg               ADIOS compression method, e.g., zlib 
                                        (see `adios_config -m` for help) | 
                                        default: none

SumCurrents:
  --sumcurr.period arg                  enable plugin [for each n-th step]

ChargeConservation: Print the maximum charge deviation between particles and div E to textfile 'chargeConservation.dat':
  --chargeConservation.period arg       enable plugin [for each n-th step]

IntensityPlugin: calculate the maximum and integrated E-Field energy
over laser propagation direction:
  --E_intensity.period arg              enable plugin [for each n-th step]

IsaacPlugin:
  --isaac.period arg                    Enable IsaacPlugin [for each n-th 
                                        step].
  --isaac.name arg (=default)           The name of the simulation. Default is 
                                        "default".
  --isaac.url arg (=localhost)          The url of the isaac server to connect 
                                        to. Default is "localhost".
  --isaac.port arg (=2460)              The port of the isaac server to connect
                                        to. Default is 2460.
  --isaac.width arg (=1024)             The width per isaac framebuffer. 
                                        Default is 1024.
  --isaac.height arg (=768)             The height per isaac framebuffer. 
                                        Default is 768.
  --isaac.directPause arg (=0)          Direct pausing after starting 
                                        simulation. Default is false.
  --isaac.quality arg (=90)             JPEG quality. Default is 90.
  --isaac.reconnect arg (=1)            Trying to reconnect every time an image
                                        is rendered if the connection is lost 
                                        or could never established at all.

ResourceLog:
  --resourceLog.period arg              Enable ResourceLog plugin [for each 
                                        n-th step]
  --resourceLog.prefix arg (=resourceLog_)
                                        Set the filename prefix for output file
                                        if a filestream was selected
  --resourceLog.stream arg (=file)      Output stream [stdout, stderr, file]
  --resourceLog.properties arg          List of properties to log [rank, 
                                        position, currentStep, cellCount, 
                                        particleCount]
  --resourceLog.format arg (=json)      Output format of log (pp for pretty 
                                        print) [json, jsonpp, xml, xmlpp]

SliceFieldPrinter: prints a slice of a field:
  --B_slice.period arg                  notify period
  --B_slice.fileName arg                file name to store slices in
  --B_slice.plane arg                   specifies the axis which stands on the 
                                        cutting plane (0,1,2)
  --B_slice.slicePoint arg              slice point 0.0 <= x <= 1.0

SliceFieldPrinter: prints a slice of a field:
  --E_slice.period arg                  notify period
  --E_slice.fileName arg                file name to store slices in
  --E_slice.plane arg                   specifies the axis which stands on the 
                                        cutting plane (0,1,2)
  --E_slice.slicePoint arg              slice point 0.0 <= x <= 1.0

SliceFieldPrinter: prints a slice of a field:
  --J_slice.period arg                  notify period
  --J_slice.fileName arg                file name to store slices in
  --J_slice.plane arg                   specifies the axis which stands on the 
                                        cutting plane (0,1,2)
  --J_slice.slicePoint arg              slice point 0.0 <= x <= 1.0

EnergyParticles: calculate the energy of a species:
  --e_energy.period arg                 compute kinetic and total energy [for 
                                        each n-th step] enable plugin by 
                                        setting a non-zero value
  --e_energy.filter arg                 particle filter: [all]

CalcEmittance: calculate the slice emittance of a species:
  --e_emittance.period arg              compute slice emittance[for each n-th 
                                        step] enable plugin by setting a 
                                        non-zero value
  --e_emittance.filter arg              particle filter: [all]

BinEnergyParticles: calculate a energy histogram of a species:
  --e_energyHistogram.period arg        enable plugin [for each n-th step]
  --e_energyHistogram.filter arg        particle filter: [all]
  --e_energyHistogram.binCount arg      number of bins for the energy range | 
                                        default: 1024
  --e_energyHistogram.minEnergy arg     minEnergy[in keV] | default: 0
  --e_energyHistogram.maxEnergy arg     maxEnergy[in keV]

CountParticles: count macro particles of a species:
  --e_macroParticlesCount.period arg    enable plugin [for each n-th step]

PngPlugin: create png's of a species and fields:
  --e_png.period arg                    enable data output [for each n-th step]
  --e_png.axis arg                      axis which are shown [valid values 
                                        x,y,z] example: yz
  --e_png.slicePoint arg                value range: 0 <= x <= 1 , point of the
                                        slice
  --e_png.folder arg                    folder for output files

ParticleCalorimeter: (virtually) propagates and collects particles to infinite distance:
  --e_calorimeter.period arg            enable plugin [for each n-th step]
  --e_calorimeter.file arg              output filename (prefix)
  --e_calorimeter.filter arg            particle filter: [all]
  --e_calorimeter.numBinsYaw arg        number of bins for angle yaw. | 
                                        default: 64
  --e_calorimeter.numBinsPitch arg      number of bins for angle pitch. | 
                                        default: 64
  --e_calorimeter.numBinsEnergy arg     number of bins for the energy spectrum.
                                        Disabled by default. | default: 1
  --e_calorimeter.minEnergy arg         minimal detectable energy in keV. | 
                                        default: 0
  --e_calorimeter.maxEnergy arg         maximal detectable energy in keV. | 
                                        default: 1000
  --e_calorimeter.logScale arg          enable logarithmic energy scale. | 
                                        default: 0
  --e_calorimeter.openingYaw arg        opening angle yaw in degrees. 0 <= x <=
                                        360. | default: 360
  --e_calorimeter.openingPitch arg      opening angle pitch in degrees. 0 <= x 
                                        <= 180. | default: 180
  --e_calorimeter.posYaw arg            yaw coordinate of calorimeter position 
                                        in degrees. Defaults to +y direction. |
                                        default: 0
  --e_calorimeter.posPitch arg          pitch coordinate of calorimeter 
                                        position in degrees. Defaults to +y 
                                        direction. | default: 0

PhaseSpace: create phase space of a species:
  --e_phaseSpace.period arg             notify period
  --e_phaseSpace.filter arg             particle filter: [all]
  --e_phaseSpace.space arg              spatial component (x, y, z)
  --e_phaseSpace.momentum arg           momentum component (px, py, pz)
  --e_phaseSpace.min arg                min range momentum [m_species c]
  --e_phaseSpace.max arg                max range momentum [m_species c]

PositionsParticles: write position of one particle of a species to std::cout:
  --e_position.period arg               enable plugin [for each n-th step]

ParticleMerger: merges several macroparticles with similar position and momentum into a single one.
plugin disabled. Enable plugin by adding the `voronoiCellId` attribute to the particle attribute list.:

RandomizedParticleMerger: merges several macroparticles with similar position and momentum into a single one.
plugin disabled. Enable plugin by adding the `voronoiCellId` attribute to the particle attribute list.:

PerSuperCell: create hdf5 with macro particle count per superCell:
  --e_macroParticlesPerSuperCell.period arg
                                        enable plugin [for each n-th step]


and this one for the version:

cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield/.build$ ./picongpu -v
PIConGPU: 0.5.0-dev
  Build-Type: Release

Third party:
  OS:         Linux-5.0.0-32-generic
  arch:       x86_64
  CXX:        GNU (7.5.0)
  CMake:      3.18.4
  CUDA:       10.2.89
  mallocMC:   2.5.0
  Boost:      1.70.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (3.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
  openPMD:    NOTFOUND

@cbontoiu Thanks for uploading the output. It looks like you built ISAAC. As mentioned, this slows down the build process but not the initialization. Furthermore, the executable you used did not include any ions, so this is most likely the 4-minute initialization case. Could you please also provide the help output for the 40-minute initialization case? So far, I see no plugin that would cause a massive slowdown.

Also, according to this output, the openPMD API was not found. Did you spack load it before pic-build?
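The check described here (does the `-v` report list a component as NOTFOUND?) can also be scripted. The following is only a minimal sketch: the function name `reportedNotFound` is hypothetical, and the `name:   value` line layout is assumed from the `-v` output quoted in this thread, not taken from PIConGPU's source.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if `versionText` (the text printed by
// `./picongpu -v`) has a line for `component` whose value contains NOTFOUND.
// Assumes lines of the form "  openPMD:    NOTFOUND".
constexpr bool reportedNotFound(std::string_view versionText, std::string_view component)
{
    std::size_t pos = 0;
    while(pos < versionText.size())
    {
        // Cut out the current line (without the trailing newline).
        auto end = versionText.find('\n', pos);
        if(end == std::string_view::npos)
            end = versionText.size();
        std::string_view const line = versionText.substr(pos, end - pos);

        // Skip the leading indentation.
        auto const first = line.find_first_not_of(' ');
        if(first != std::string_view::npos)
        {
            std::string_view const body = line.substr(first);
            // Match "<component>:" at the start and NOTFOUND anywhere after it.
            if(body.substr(0, component.size()) == component
               && body.size() > component.size()
               && body[component.size()] == ':'
               && body.find("NOTFOUND") != std::string_view::npos)
                return true;
        }
        pos = end + 1;
    }
    return false;
}
```

Fed the third-party section printed above, this would flag `openPMD` but not `ADIOS`, which matches what the build actually linked.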

PS: @cbontoiu Did you observe the increase in build/compile time from only electrons to electrons+ions?

Only for electrons + ions, and only at runtime. I cannot say what happened at compilation, but the builds seemed as fast as usual (about 2-3 minutes of compilation time for each case).

Also, from this output openPMD API was not found. Did you spack load it before pic-build ?

No, I didn't. This must be the reason. Thank you.

So a command like this would do the job.

source $HOME/src/spack/share/spack/setup-env.sh && spack load picongpu +adios +openpmd %[email protected] && export PIC_BACKEND="cuda:72" && export OMPI_MCA_io=^ompio

I am not an expert in Spack, so I am not sure how best to do it. But I think either your option or a separate spack load openpmd should work.

Please let me know if this is an easy fix and I should wait for you to update the release, or if it is something more involved, in which case I could uninstall the @develop version and install version 0.4.3.
Thank you

@cbontoiu I will try to reproduce the massive increase in initialization time. Could you please recap how you enabled the ions (by -D, by overwriting the value in the file, etc.)?

Hello @PrometheusPi I only changed the parameter define PARAM_IONS from 0 to 1. This is defined in speciesDefinition.param

spack load openpmd

It is actually spack load openpmd-api, separate from loading picongpu. So spack load picongpu +adios +openpmd %[email protected] does not work

But the #if( PARAM_IONIZATION == 1 ) issue is still a mystery to me. C++ is a strongly typed language, like Java, which I understand better. If I did this in Java without extending a class or implementing an interface where PARAM_IONIZATION is defined, I would get an error because the variable was not declared. It must be defined in one of the headers used in speciesDefinition.param; otherwise I cannot understand how this compiles.
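On the `#if( PARAM_IONIZATION == 1 )` question: that line is evaluated by the C++ preprocessor before the type system ever sees the code. Inside a `#if` expression, any identifier that is not a defined macro is replaced by 0, so an undefined `PARAM_IONIZATION` makes the condition false instead of raising an "undeclared variable" error. The sketch below illustrates this; the `#ifndef` default, the helper names, and `HYPOTHETICAL_UNDEFINED_PARAM` are illustrative assumptions, not a copy of PIConGPU's speciesDefinition.param.

```cpp
// Sketch only: names and the default value are assumptions for illustration.

// Optional default: if nothing defined the macro beforehand (for example a
// -DPARAM_IONIZATION=1 compiler flag), fall back to 0.
#ifndef PARAM_IONIZATION
#    define PARAM_IONIZATION 0
#endif

// Reports which branch the preprocessor kept for PARAM_IONIZATION.
constexpr bool ionizationEnabled()
{
#if(PARAM_IONIZATION == 1)
    return true;
#else
    return false;
#endif
}

// HYPOTHETICAL_UNDEFINED_PARAM is never defined anywhere. The preprocessor
// substitutes 0 for it, so this compiles and the #else branch is kept.
constexpr bool undefinedMacroIsTrue()
{
#if(HYPOTHETICAL_UNDEFINED_PARAM == 1)
    return true;
#else
    return false;
#endif
}

static_assert(!undefinedMacroIsTrue(), "undefined identifiers evaluate to 0 in #if");
```

This is why such blocks behave differently from a Java variable reference: the check happens at the text-substitution stage, and the discarded branch is never compiled at all.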

@cbontoiu Thanks for the input. I am currently compiling and testing your setup on a single K20 GPU.

@PrometheusPi

I installed version 0.4.3 on my other computer (similar in performance) and obtained different simulation times.

_electrons:_
initialization time: 22sec 257msec = 22 sec
full simulation time: 1min 11sec 204msec = 71 sec

_electrons + ions:_
initialization time: 32sec 91msec = 32 sec
full simulation time: 1min 29sec 866msec = 89 sec
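To compare runs like these without reading the numbers by eye, the summary lines can be parsed directly. This is a minimal sketch that assumes the `... = N sec` suffix seen in the timings above; `parseSeconds` and `secondsOutsideInit` are hypothetical helper names, not part of PIConGPU.

```cpp
#include <cstddef>
#include <string_view>

// Extracts N from a summary line ending in "= N sec", e.g.
// "initialization time: 22sec 257msec = 22 sec". Returns -1 for any other
// line, including the per-step progress lines (which end in "msec").
constexpr long parseSeconds(std::string_view line)
{
    constexpr std::string_view suffix = " sec";
    if(line.size() < suffix.size() || line.substr(line.size() - suffix.size()) != suffix)
        return -1;

    auto const eq = line.rfind('=');
    if(eq == std::string_view::npos)
        return -1;

    long value = 0;
    bool sawDigit = false;
    for(std::size_t i = eq + 1; i < line.size(); ++i)
    {
        char const c = line[i];
        if(c >= '0' && c <= '9')
        {
            value = value * 10 + (c - '0');
            sawDigit = true;
        }
        else if(sawDigit || c != ' ')
            break; // end of the number, or unexpected text before it
    }
    return sawDigit ? value : -1;
}

// Seconds of the full run that were not spent in initialization.
constexpr long secondsOutsideInit(std::string_view fullLine, std::string_view initLine)
{
    return parseSeconds(fullLine) - parseSeconds(initLine);
}
```

Applied to the two runs above, the part outside initialization is 71 - 22 = 49 s for electrons only and 89 - 32 = 57 s for electrons + ions.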

Apart from the small growth of the initialization time when using ions, please note the significant difference overall. That is, the 0.5.0 or @develop version would require more than 3 minutes for something that version 0.4.3 does in about 20 seconds. Maybe this observation helps to understand what happens. The terminal output shown below is first for the default LWFA model (only electrons) and second for the modified LWFA model with define PARAM_IONS 1, so electrons + ions:

(base) quasar@quasar:~/PIC_INPUT/PICONGPU/TESTS/myLWFA$ tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/quasar/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_01
Running program...
using default compiler
==> Error: Spec '[email protected]%[email protected]+adios+hdf5~isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+shared arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-skylake ^[email protected]%[email 
protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~python arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email 
protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~pic arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+pic arch=linux-linuxmint19-skylake' matches no installed packages.
PIConGPU: 0.4.3
  Build-Type: Release

Third party:
  OS:         Linux-5.0.0-32-generic
  arch:       x86_64
  CXX:        GNU (7.5.0)
  CMake:      3.18.4
  CUDA:       9.2.148
  mallocMC:   2.3.1
  Boost:      1.70.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (3.1.5)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      NOTFOUND
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 22sec 257msec = 22 sec
  0 % =        0 | time elapsed:                   66msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             1sec 940msec | avg time per step:  17msec
  9 % =      204 | time elapsed:             3sec 824msec | avg time per step:  18msec
 14 % =      306 | time elapsed:             5sec 729msec | avg time per step:  18msec
 19 % =      408 | time elapsed:             7sec 748msec | avg time per step:  19msec
 24 % =      510 | time elapsed:             9sec 973msec | avg time per step:  21msec
 29 % =      612 | time elapsed:            12sec 135msec | avg time per step:  20msec
 34 % =      714 | time elapsed:            14sec 307msec | avg time per step:  20msec
 39 % =      816 | time elapsed:            16sec 563msec | avg time per step:  21msec
 44 % =      918 | time elapsed:            18sec 905msec | avg time per step:  22msec
 49 % =     1020 | time elapsed:            21sec 347msec | avg time per step:  23msec
 54 % =     1122 | time elapsed:            23sec 810msec | avg time per step:  23msec
 59 % =     1224 | time elapsed:            26sec 476msec | avg time per step:  25msec
 64 % =     1326 | time elapsed:            29sec 213msec | avg time per step:  26msec
 69 % =     1428 | time elapsed:            32sec 280msec | avg time per step:  29msec
 74 % =     1530 | time elapsed:            35sec 211msec | avg time per step:  28msec
 79 % =     1632 | time elapsed:            38sec 273msec | avg time per step:  29msec
 84 % =     1734 | time elapsed:            40sec 888msec | avg time per step:  25msec
 89 % =     1836 | time elapsed:            43sec 413msec | avg time per step:  24msec
 94 % =     1938 | time elapsed:            45sec 949msec | avg time per step:  24msec
 99 % =     2040 | time elapsed:            48sec 594msec | avg time per step:  25msec
calculation  simulation time: 48sec 788msec = 48 sec
full simulation time:  1min 11sec 204msec = 71 sec
(base) quasar@quasar:~/PIC_INPUT/PICONGPU/TESTS/myLWFA$ rm -r .build/ && pic-build &> out.txt
(base) quasar@quasar:~/PIC_INPUT/PICONGPU/TESTS/myLWFA$ tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/quasar/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_02
Running program...
using default compiler
==> Error: Spec '[email protected]%[email protected]+adios+hdf5~isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+shared arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-skylake ^[email protected]%[email 
protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~python arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email 
protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected] arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~pic arch=linux-linuxmint19-skylake ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-skylake ^[email protected]%[email protected]+pic arch=linux-linuxmint19-skylake' matches no installed packages.
PIConGPU: 0.4.3
  Build-Type: Release

Third party:
  OS:         Linux-5.0.0-32-generic
  arch:       x86_64
  CXX:        GNU (7.5.0)
  CMake:      3.18.4
  CUDA:       9.2.148
  mallocMC:   2.3.1
  Boost:      1.70.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (3.1.5)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      NOTFOUND
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 32sec  91msec = 32 sec
  0 % =        0 | time elapsed:                   55msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             2sec 213msec | avg time per step:  20msec
  9 % =      204 | time elapsed:             4sec 442msec | avg time per step:  21msec
 14 % =      306 | time elapsed:             6sec 790msec | avg time per step:  22msec
 19 % =      408 | time elapsed:             9sec 332msec | avg time per step:  24msec
 24 % =      510 | time elapsed:            11sec 886msec | avg time per step:  24msec
 29 % =      612 | time elapsed:            14sec 445msec | avg time per step:  24msec
 34 % =      714 | time elapsed:            17sec  29msec | avg time per step:  24msec
 39 % =      816 | time elapsed:            19sec 684msec | avg time per step:  25msec
 44 % =      918 | time elapsed:            22sec 410msec | avg time per step:  26msec
 49 % =     1020 | time elapsed:            25sec 238msec | avg time per step:  27msec
 54 % =     1122 | time elapsed:            28sec 155msec | avg time per step:  28msec
 59 % =     1224 | time elapsed:            31sec 229msec | avg time per step:  29msec
 64 % =     1326 | time elapsed:            34sec 473msec | avg time per step:  31msec
 69 % =     1428 | time elapsed:            37sec 843msec | avg time per step:  32msec
 74 % =     1530 | time elapsed:            41sec 326msec | avg time per step:  33msec
 79 % =     1632 | time elapsed:            44sec 845msec | avg time per step:  33msec
 84 % =     1734 | time elapsed:            48sec   6msec | avg time per step:  30msec
 89 % =     1836 | time elapsed:            51sec  51msec | avg time per step:  29msec
 94 % =     1938 | time elapsed:            54sec 182msec | avg time per step:  29msec
 99 % =     2040 | time elapsed:            57sec 386msec | avg time per step:  30msec
calculation  simulation time: 57sec 627msec = 57 sec
full simulation time:  1min 29sec 866msec = 89 sec

@cbontoiu I am still waiting for the jobs to start on our cluster. Currently, most GPUs are blocked by other jobs. I will keep you posted once the jobs have finished and let you know whether I could reproduce the extended initialization time.

Just a quick question: does `spack install [email protected] %[email protected]` install a different version than `spack install picongpu@develop %[email protected]`? I couldn't tell from looking at `package.py`.

They should be different. 0.5.0 is the last released version, so it does not change, while develop is the current branch to which we add changes.

I installed version 0.4.3 on my other computer (similar in performance) and obtained different simulation times.

Could you try 0.5.0 or the current dev branch on this system? How much main memory does the system that shows the slowdown with ADIOS have, and how much memory does the system you used for 0.4.3 have?

[Update: I think I mixed up this issue and the ADIOS issue, so I removed my question.]

I also tested the release version 0.5.0, and it shows a similarly long initialization time to the development version, as you can check below. Thanks for looking into this. Meanwhile I will use version 0.4.3, which does not show the problem. It would be great if you could find the problem so that we could continue with 0.5.0 and make use of its PML.

cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/cristian/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_run_01
Running program...
using default compiler
==> Error: Spec '[email protected]%[email protected]+adios+hdf5+isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac@develop%[email protected]+cuda~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac-server@develop%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+shared build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~python arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] 
arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~pic arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+pic arch=linux-linuxmint19-westmere' matches no installed packages.
PIConGPU: 0.5.0
  Build-Type: Release

Third party:
  OS:         Linux-5.0.0-32-generic
  arch:       x86_64
  CXX:        GNU (7.5.0)
  CMake:      3.18.4
  CUDA:       10.2.89
  mallocMC:   2.3.1
  Boost:      1.70.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (3.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time:  4min 21sec 158msec = 261 sec
  0 % =        0 | time elapsed:                   43msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             2sec 541msec | avg time per step:  24msec
  9 % =      204 | time elapsed:             5sec  50msec | avg time per step:  24msec
 14 % =      306 | time elapsed:             7sec 618msec | avg time per step:  24msec
 19 % =      408 | time elapsed:            10sec 321msec | avg time per step:  26msec
 24 % =      510 | time elapsed:            13sec 135msec | avg time per step:  27msec
 29 % =      612 | time elapsed:            15sec 975msec | avg time per step:  27msec
 34 % =      714 | time elapsed:            18sec 860msec | avg time per step:  27msec
 39 % =      816 | time elapsed:            21sec 791msec | avg time per step:  28msec
 44 % =      918 | time elapsed:            24sec 777msec | avg time per step:  28msec
 49 % =     1020 | time elapsed:            27sec 851msec | avg time per step:  29msec
 54 % =     1122 | time elapsed:            30sec 995msec | avg time per step:  30msec
 59 % =     1224 | time elapsed:            34sec 255msec | avg time per step:  31msec
 64 % =     1326 | time elapsed:            37sec 681msec | avg time per step:  33msec
 69 % =     1428 | time elapsed:            41sec 240msec | avg time per step:  34msec
 74 % =     1530 | time elapsed:            44sec 913msec | avg time per step:  35msec
 79 % =     1632 | time elapsed:            48sec 605msec | avg time per step:  35msec
 84 % =     1734 | time elapsed:            51sec 940msec | avg time per step:  32msec
 89 % =     1836 | time elapsed:            55sec 143msec | avg time per step:  31msec
 94 % =     1938 | time elapsed:            58sec 407msec | avg time per step:  31msec
 99 % =     2040 | time elapsed:       1min  1sec 752msec | avg time per step:  32msec
calculation  simulation time:  1min  2sec   3msec = 62 sec
full simulation time:  5min 23sec 382msec = 323 sec
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ rm -r .build && pic-build &> out.txt
cristian@T7500:~/PIC_INPUT/PICONGPU/TESTS/myLaserWakefield$ tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpiexec.tpl /media/cristian/RawDataDisk/PICONGPU/TESTS/myLaserWakefield_run_02
Running program...
using default compiler
==> Error: Spec '[email protected]%[email protected]+adios+hdf5+isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+avx2~ipo build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt patches=bf695e3febb222da2ed94b3beea600650e4318975da90e4a71d6f31a6d5d8c3d arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+libbsd arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac@develop%[email protected]+cuda~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^isaac-server@develop%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+shared build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~python arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint19-westmere ^[email 
protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint19-westmere ^[email protected]%[email protected] patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94 arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~ipo+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~fortran~hdf5~ipo~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] arch=linux-linuxmint19-westmere ^[email protected]%[email protected] 
arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~pic arch=linux-linuxmint19-westmere ^[email protected]%[email protected]~aligned~fasthash~ipo~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint19-westmere ^[email protected]%[email protected]+pic arch=linux-linuxmint19-westmere' matches no installed packages.
PIConGPU: 0.5.0
  Build-Type: Release

Third party:
  OS:         Linux-5.0.0-32-generic
  arch:       x86_64
  CXX:        GNU (7.5.0)
  CMake:      3.18.4
  CUDA:       10.2.89
  mallocMC:   2.3.1
  Boost:      1.70.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (3.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 52min 31sec   8msec = 3151 sec
  0 % =        0 | time elapsed:                   44msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             3sec   6msec | avg time per step:  28msec
  9 % =      204 | time elapsed:             5sec 979msec | avg time per step:  28msec
 14 % =      306 | time elapsed:             9sec  14msec | avg time per step:  29msec
 19 % =      408 | time elapsed:            12sec 208msec | avg time per step:  30msec
 24 % =      510 | time elapsed:            15sec 530msec | avg time per step:  32msec
 29 % =      612 | time elapsed:            18sec 899msec | avg time per step:  32msec
 34 % =      714 | time elapsed:            22sec 340msec | avg time per step:  33msec
 39 % =      816 | time elapsed:            25sec 827msec | avg time per step:  33msec
 44 % =      918 | time elapsed:            29sec 401msec | avg time per step:  34msec
 49 % =     1020 | time elapsed:            33sec  99msec | avg time per step:  35msec
 54 % =     1122 | time elapsed:            36sec 878msec | avg time per step:  36msec
 59 % =     1224 | time elapsed:            40sec 805msec | avg time per step:  38msec
 64 % =     1326 | time elapsed:            44sec 920msec | avg time per step:  39msec
 69 % =     1428 | time elapsed:            49sec 162msec | avg time per step:  41msec
 74 % =     1530 | time elapsed:            53sec 517msec | avg time per step:  42msec
 79 % =     1632 | time elapsed:            57sec 908msec | avg time per step:  42msec
 84 % =     1734 | time elapsed:       1min  1sec 959msec | avg time per step:  39msec
 89 % =     1836 | time elapsed:       1min  5sec 897msec | avg time per step:  38msec
 94 % =     1938 | time elapsed:       1min  9sec 884msec | avg time per step:  38msec
 99 % =     2040 | time elapsed:       1min 13sec 980msec | avg time per step:  39msec
calculation  simulation time:  1min 14sec 292msec = 74 sec
full simulation time: 53min 45sec 562msec = 3225 sec

@cbontoiu How much main memory does your system have?
What kind of GPU do you have?

What is the difference between the last two runs you posted?
It is not possible to tell from the output.

Hello @psychocoderHPC and thanks for your availability. The last two runs for the 0.5.0 version show:

LWFA with electrons only: initialization time: 4min 21sec 158msec = 261 sec
LWFA with electrons + ions: initialization time: 52min 31sec 8msec = 3151 sec
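
To put a number on the slowdown, the two `initialization time` lines can be compared directly. This is only a minimal sketch, assuming the log format shown in the output above; the helper name `init_seconds` is my own and not part of PIConGPU.

```python
import re

# Matches the summary line printed in the logs above, e.g.
# "initialization time:  4min 21sec 158msec = 261 sec"
# and captures the total in whole seconds after the "=" sign.
INIT_RE = re.compile(r"initialization time:.*=\s*(\d+)\s*sec")


def init_seconds(log_text):
    """Return the initialization time in seconds found in log_text.

    Hypothetical helper for illustration; it relies only on the
    line format visible in the pasted output.
    """
    match = INIT_RE.search(log_text)
    if match is None:
        raise ValueError("no 'initialization time' line found")
    return int(match.group(1))


# Lines copied from the two 0.5.0 runs posted above.
electrons_only = "initialization time:  4min 21sec 158msec = 261 sec"
electrons_ions = "initialization time: 52min 31sec   8msec = 3151 sec"

t_e = init_seconds(electrons_only)
t_ei = init_seconds(electrons_ions)
slowdown = t_ei / t_e

print(f"electrons only: {t_e} s, electrons + ions: {t_ei} s, "
      f"ratio: {slowdown:.1f}x")
```

So enabling ions makes the initialization roughly twelve times longer for 0.5.0 on this machine, while the time per step during the calculation changes far less.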

I am running on two different computers: one with an RTX2070 Super and 32 GB of RAM, and the other with two RTX5000 cards and 64 GB of RAM. When ions are enabled, both computers show long initialization times for the develop and 0.5.0 versions, and only a small increase (relative to the electrons-only setup) for the 0.4.3 version.

@cbontoiu I switched to a different cluster partition to test your issue on one K80 GPU. I could not reproduce the issue. Initialization times are pretty much equal (19 seconds) for both electrons only and electrons + ions.

stdout for electrons and ions:

Running program...
PIConGPU: 0.5.0
  Build-Type: Release

Third party:
  OS:         Linux-3.10.0-693.17.1.el7.x86_64
  arch:       x86_64
  CXX:        GNU (7.3.0)
  CMake:      3.15.2
  CUDA:       10.0.130
  mallocMC:   2.3.1
  Boost:      1.68.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (2.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 19sec 137msec = 19 sec
  0 % =        0 | time elapsed:                  555msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             4sec 952msec | avg time per step:  42msec
  9 % =      204 | time elapsed:             9sec 490msec | avg time per step:  44msec
 14 % =      306 | time elapsed:            14sec 356msec | avg time per step:  47msec
 19 % =      408 | time elapsed:            19sec 727msec | avg time per step:  52msec
 24 % =      510 | time elapsed:            25sec 715msec | avg time per step:  56msec
 29 % =      612 | time elapsed:            31sec 874msec | avg time per step:  60msec
 34 % =      714 | time elapsed:            38sec 404msec | avg time per step:  63msec
 39 % =      816 | time elapsed:            45sec 343msec | avg time per step:  67msec
 44 % =      918 | time elapsed:            52sec 628msec | avg time per step:  71msec
 49 % =     1020 | time elapsed:       1min  0sec 364msec | avg time per step:  75msec
 54 % =     1122 | time elapsed:       1min  8sec 517msec | avg time per step:  79msec
 59 % =     1224 | time elapsed:       1min 17sec 170msec | avg time per step:  84msec
 64 % =     1326 | time elapsed:       1min 26sec 471msec | avg time per step:  90msec
 69 % =     1428 | time elapsed:       1min 36sec 249msec | avg time per step:  95msec
 74 % =     1530 | time elapsed:       1min 46sec 394msec | avg time per step:  99msec
 79 % =     1632 | time elapsed:       1min 56sec 737msec | avg time per step: 100msec
 84 % =     1734 | time elapsed:       2min  6sec  45msec | avg time per step:  90msec
 89 % =     1836 | time elapsed:       2min 14sec 870msec | avg time per step:  86msec
 94 % =     1938 | time elapsed:       2min 23sec 832msec | avg time per step:  87msec
 99 % =     2040 | time elapsed:       2min 33sec 107msec | avg time per step:  90msec
calculation  simulation time:  2min 33sec 798msec = 153 sec
full simulation time:  2min 53sec 630msec = 173 sec

stdout for electrons only:

Running program...
PIConGPU: 0.5.0
  Build-Type: Release

Third party:
  OS:         Linux-3.10.0-693.17.1.el7.x86_64
  arch:       x86_64
  CXX:        GNU (7.3.0)
  CMake:      3.15.2
  CUDA:       10.0.130
  mallocMC:   2.3.1
  Boost:      1.68.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (2.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 18sec 965msec = 18 sec
  0 % =        0 | time elapsed:                   60msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             2sec 963msec | avg time per step:  28msec
  9 % =      204 | time elapsed:             5sec 964msec | avg time per step:  29msec
 14 % =      306 | time elapsed:             9sec 210msec | avg time per step:  31msec
 19 % =      408 | time elapsed:            12sec 846msec | avg time per step:  35msec
 24 % =      510 | time elapsed:            16sec 850msec | avg time per step:  38msec
 29 % =      612 | time elapsed:            21sec  40msec | avg time per step:  40msec
 34 % =      714 | time elapsed:            25sec 439msec | avg time per step:  42msec
 39 % =      816 | time elapsed:            30sec  88msec | avg time per step:  45msec
 44 % =      918 | time elapsed:            34sec 969msec | avg time per step:  47msec
 49 % =     1020 | time elapsed:            40sec 175msec | avg time per step:  50msec
 54 % =     1122 | time elapsed:            45sec 616msec | avg time per step:  53msec
 59 % =     1224 | time elapsed:            51sec 433msec | avg time per step:  56msec
 64 % =     1326 | time elapsed:            57sec 831msec | avg time per step:  62msec
 69 % =     1428 | time elapsed:       1min  4sec 655msec | avg time per step:  66msec
 74 % =     1530 | time elapsed:       1min 11sec 826msec | avg time per step:  70msec
 79 % =     1632 | time elapsed:       1min 19sec 122msec | avg time per step:  71msec
 84 % =     1734 | time elapsed:       1min 25sec 403msec | avg time per step:  61msec
 89 % =     1836 | time elapsed:       1min 31sec 184msec | avg time per step:  56msec
 94 % =     1938 | time elapsed:       1min 37sec  90msec | avg time per step:  57msec
 99 % =     2040 | time elapsed:       1min 43sec 236msec | avg time per step:  59msec
calculation  simulation time:  1min 43sec 683msec = 103 sec
full simulation time:  2min  2sec 829msec = 122 sec

So I am confused by the massive difference you encounter. Thus I can only guess possible issues:

  • accidentally changed code in either the setup or the main source code (differences in init pipeline or particle definitions and thus attributes) - could you check git diff and perform a clean pic-create?
  • we used different CUDA, Boost, etc. versions - a difference in performance might arise from that, but I think this is highly unlikely
  • some strange behavior of the GPU you used - could you test this on another GPU type?
  • including (not using) ISAAC (very unlikely) causes an increase in init time - I will test this now.
  • EDIT (suggested by @cbontoiu): SPACK causes slow down

@psychocoderHPC @sbastrakov Do you have any other ideas?

Hello all and thank you for looking into this.

There is still another possibility. It might happen that running through Spack gives me trouble. If you could check the behaviour using Spack on a local machine, we could judge. I used two different graphics cards (RTX2070 Super and RTX5000) on two different machines with two different CUDA versions (10 and 9) and with both develop and 0.5.0 versions, always using the LWFA model which came with the distribution.

From the past we know that running on the cluster through Spack differs from running directly on the system with installed modules, so this is what makes me think that Spack plays a role here. The question is why it does so for versions 0.5.0 and develop but not for 0.4.3.

@cbontoiu I agree, and I can not judge how likely this is. I will add this idea to the list above.

Spack itself merely provides dependencies. The only way it can influence things is if these dependencies end up with wrong, or very suboptimal, settings.

I tested with ISAAC enabled and cuda 10.2 - I still get the same init times.

electrons and ions:

Running program...
PIConGPU: 0.5.0
  Build-Type: Release

Third party:
  OS:         Linux-3.10.0-693.11.6.el7.x86_64
  arch:       x86_64
  CXX:        GNU (7.3.0)
  CMake:      3.15.2
  CUDA:       10.2.89
  mallocMC:   2.3.1
  Boost:      1.68.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (2.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 20sec 784msec = 20 sec
  0 % =        0 | time elapsed:                   67msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             4sec 470msec | avg time per step:  42msec
  9 % =      204 | time elapsed:             8sec 995msec | avg time per step:  43msec
 14 % =      306 | time elapsed:            13sec 860msec | avg time per step:  47msec
 19 % =      408 | time elapsed:            19sec 237msec | avg time per step:  52msec
 24 % =      510 | time elapsed:            25sec 116msec | avg time per step:  57msec
 29 % =      612 | time elapsed:            31sec 293msec | avg time per step:  60msec
 34 % =      714 | time elapsed:            37sec 839msec | avg time per step:  63msec
 39 % =      816 | time elapsed:            44sec 777msec | avg time per step:  67msec
 44 % =      918 | time elapsed:            52sec  75msec | avg time per step:  71msec
 49 % =     1020 | time elapsed:            59sec 825msec | avg time per step:  75msec
 54 % =     1122 | time elapsed:       1min  8sec   6msec | avg time per step:  79msec
 59 % =     1224 | time elapsed:       1min 16sec 721msec | avg time per step:  84msec
 64 % =     1326 | time elapsed:       1min 26sec  75msec | avg time per step:  91msec
 69 % =     1428 | time elapsed:       1min 35sec 906msec | avg time per step:  96msec
 74 % =     1530 | time elapsed:       1min 46sec 124msec | avg time per step:  99msec
 79 % =     1632 | time elapsed:       1min 56sec 504msec | avg time per step: 101msec
 84 % =     1734 | time elapsed:       2min  5sec 878msec | avg time per step:  91msec
 89 % =     1836 | time elapsed:       2min 14sec 719msec | avg time per step:  86msec
 94 % =     1938 | time elapsed:       2min 23sec 709msec | avg time per step:  87msec
 99 % =     2040 | time elapsed:       2min 33sec  31msec | avg time per step:  91msec
calculation  simulation time:  2min 33sec 725msec = 153 sec
full simulation time:  2min 54sec 717msec = 174 sec

electrons only:

Running program...
PIConGPU: 0.5.0
  Build-Type: Release

Third party:
  OS:         Linux-3.10.0-693.17.1.el7.x86_64
  arch:       x86_64
  CXX:        GNU (7.3.0)
  CMake:      3.15.2
  CUDA:       10.2.89
  mallocMC:   2.3.1
  Boost:      1.68.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (2.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 4718592
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 19sec 389msec = 19 sec
  0 % =        0 | time elapsed:                   55msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             2sec 962msec | avg time per step:  28msec
  9 % =      204 | time elapsed:             6sec  23msec | avg time per step:  29msec
 14 % =      306 | time elapsed:             9sec 264msec | avg time per step:  31msec
 19 % =      408 | time elapsed:            12sec 913msec | avg time per step:  35msec
 24 % =      510 | time elapsed:            16sec 946msec | avg time per step:  39msec
 29 % =      612 | time elapsed:            21sec 149msec | avg time per step:  40msec
 34 % =      714 | time elapsed:            25sec 571msec | avg time per step:  42msec
 39 % =      816 | time elapsed:            30sec 234msec | avg time per step:  45msec
 44 % =      918 | time elapsed:            35sec 133msec | avg time per step:  47msec
 49 % =     1020 | time elapsed:            40sec 348msec | avg time per step:  50msec
 54 % =     1122 | time elapsed:            45sec 813msec | avg time per step:  53msec
 59 % =     1224 | time elapsed:            51sec 660msec | avg time per step:  57msec
 64 % =     1326 | time elapsed:            58sec  96msec | avg time per step:  62msec
 69 % =     1428 | time elapsed:       1min  4sec 985msec | avg time per step:  67msec
 74 % =     1530 | time elapsed:       1min 12sec 210msec | avg time per step:  70msec
 79 % =     1632 | time elapsed:       1min 19sec 576msec | avg time per step:  71msec
 84 % =     1734 | time elapsed:       1min 25sec 901msec | avg time per step:  61msec
 89 % =     1836 | time elapsed:       1min 31sec 713msec | avg time per step:  56msec
 94 % =     1938 | time elapsed:       1min 37sec 650msec | avg time per step:  57msec
 99 % =     2040 | time elapsed:       1min 43sec 841msec | avg time per step:  60msec
calculation  simulation time:  1min 44sec 291msec = 104 sec
full simulation time:  2min  3sec 855msec = 123 sec

A slight increase from 19 to 21 seconds with ions, but other than that, it's the same.

However, the compile duration rattled me:

my electron + ions compile time: 19m33.860s = 1174 sec
my electron only compile time: 8m14.936s = 495 sec
my ratio = 2.4 --> the expected increase due to ISAAC

your electron + ion init time: 43min 23sec 89msec = 2603 sec
your electron only init time: 3min 43sec 605msec = 223 sec
your ratio: 11.7 --> unexplained init duration increase
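
The ratios above can be recomputed from the summary lines that PIConGPU prints. The following is a minimal sketch, not part of PIConGPU: the helper names `duration_to_seconds` and `init_time_ratio` are made up for illustration, and only the units that appear in this thread's logs (min, sec, msec) are handled. Because it uses the millisecond values, the result can differ slightly from a ratio computed from whole seconds.

```python
import re

# Seconds per unit, limited to the units that appear in the logs in this thread.
_UNIT_SECONDS = {"min": 60.0, "sec": 1.0, "msec": 0.001}


def duration_to_seconds(text: str) -> float:
    """Convert a duration such as '43min 23sec 89msec' to seconds.

    A full summary line ('initialization time: ... = 2603 sec') is accepted;
    everything after '=' is ignored so the rounded total is not counted twice.
    """
    duration_part = text.split("=")[0]
    parts = re.findall(r"(\d+)\s*(msec|sec|min)\b", duration_part)
    if not parts:
        raise ValueError(f"no duration found in {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def init_time_ratio(with_ions: str, electrons_only: str) -> float:
    """Ratio of two initialization times given as PIConGPU-style durations."""
    return duration_to_seconds(with_ions) / duration_to_seconds(electrons_only)


if __name__ == "__main__":
    # Init times quoted in the comparison above.
    ratio = init_time_ratio("43min 23sec 89msec", "3min 43sec 605msec")
    print(f"init time ratio: {ratio:.1f}")
```

The function is meant for the `initialization time` and `full simulation time` summary lines only; progress lines contain a second duration (`avg time per step`) that would be added to the total.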

This happens despite the fact that you are using a more modern GPU, which also shows in your avg time per step being only ~40ms, while on the (old) K80 it is ~60ms.

The question for me now is: did you accidentally run into JIT compilation due to a missing --arch definition?

In your profile, how did you set PIC_BACKEND?
If you go to your .build directory and do ccmake . - what parameter is set for ALPAKA_CUDA_ARCH?

If I do not define PIC_BACKEND I get a default value of 30 in ALPAKA_CUDA_ARCH. Thus my results on k80 architectures are pretty much the same (init times being around 24 sec). This is the Kepler architecture and thus no JIT is needed.

However, if I submit such an executable to a V100 (Volta), the init time explodes to 40min 33sec 909msec = 2433 sec due to JIT.
(Exploring it interactively reveals that PIConGPU is registered in nvidia-smi but shows no GPU usage, while CPU usage is at 100%.)

Running program...
PIConGPU: 0.5.0
  Build-Type: Release

Third party:
  OS:         Linux-3.10.0-693.11.6.el7.x86_64
  arch:       x86_64
  CXX:        GNU (7.3.0)
  CMake:      3.15.2
  CUDA:       10.2.89
  mallocMC:   2.3.1
  Boost:      1.68.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (2.1.6)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per device: 9437184
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 40min 33sec 909msec = 2433 sec
  0 % =        0 | time elapsed:                   43msec | avg time per step:   0msec
  4 % =      102 | time elapsed:             1sec 617msec | avg time per step:  15msec
  9 % =      204 | time elapsed:             3sec 201msec | avg time per step:  15msec
 14 % =      306 | time elapsed:             4sec 912msec | avg time per step:  16msec
 19 % =      408 | time elapsed:             6sec 810msec | avg time per step:  18msec
 24 % =      510 | time elapsed:             8sec 853msec | avg time per step:  19msec
 29 % =      612 | time elapsed:            10sec 941msec | avg time per step:  20msec
 34 % =      714 | time elapsed:            13sec   2msec | avg time per step:  20msec
 39 % =      816 | time elapsed:            15sec  90msec | avg time per step:  20msec
 44 % =      918 | time elapsed:            17sec 290msec | avg time per step:  21msec
 49 % =     1020 | time elapsed:            19sec 626msec | avg time per step:  22msec
 54 % =     1122 | time elapsed:            22sec  19msec | avg time per step:  23msec
 59 % =     1224 | time elapsed:            24sec 520msec | avg time per step:  24msec
 64 % =     1326 | time elapsed:            27sec 220msec | avg time per step:  26msec
 69 % =     1428 | time elapsed:            30sec  69msec | avg time per step:  27msec
 74 % =     1530 | time elapsed:            33sec  29msec | avg time per step:  28msec
 79 % =     1632 | time elapsed:            36sec   8msec | avg time per step:  28msec
 84 % =     1734 | time elapsed:            38sec 560msec | avg time per step:  24msec
 89 % =     1836 | time elapsed:            41sec  11msec | avg time per step:  23msec
 94 % =     1938 | time elapsed:            43sec 526msec | avg time per step:  24msec
 99 % =     2040 | time elapsed:            46sec 134msec | avg time per step:  25msec
calculation  simulation time: 46sec 322msec = 46 sec
full simulation time: 41min 20sec 427msec = 2480 sec

Sadly, we do not have access to a Turing system. But I expect a similar init time increase for your GPU.

So in order to avoid the init time issue, please set the architecture correctly. (The issue is further worsened by ISAAC's long compile time, which grows with the number of species - so if you do not need ISAAC, do not compile it.)
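
As a quick sanity check before building, the backend string can be compared with the compute capability of the target GPU. This is only a sketch with hypothetical helper names; it assumes the `cuda:NN` form of `PIC_BACKEND` seen in this thread and a compute capability supplied by the user as `major.minor`. It flags any mismatch and does not inspect what the binary actually embeds, so a mismatch is a prompt to check rather than proof of JIT compilation (the arch-30 build on a K80 above initialized quickly).

```python
def arch_from_backend(backend: str) -> int:
    """Extract the architecture number from a string such as 'cuda:75'."""
    name, _, arch = backend.partition(":")
    if name != "cuda" or not arch.isdigit():
        raise ValueError(f"expected 'cuda:<arch>', got {backend!r}")
    return int(arch)


def arch_from_compute_capability(capability: str) -> int:
    """Convert a compute capability such as '7.5' to the number 75."""
    major, minor = capability.strip().split(".")
    return int(major) * 10 + int(minor)


def build_matches_device(backend: str, capability: str) -> bool:
    """True when the build architecture equals the device capability."""
    return arch_from_backend(backend) == arch_from_compute_capability(capability)


if __name__ == "__main__":
    # Combinations reported in this thread: (backend, device compute capability).
    cases = [
        ("cuda:30", "7.0"),  # default arch run on a V100: slow init reported
        ("cuda:72", "7.5"),  # RTX 2070 Super built for 7.2: slow init reported
        ("cuda:75", "7.5"),  # RTX 2070 Super built for 7.5: fast init reported
    ]
    for backend, capability in cases:
        status = "match" if build_matches_device(backend, capability) else "MISMATCH"
        print(f"{backend} on compute capability {capability}: {status}")
```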

Thanks for investigating @PrometheusPi . Do you think we could add this note somewhere to the docs, as e.g. I was not aware of this?

Btw, to set the backend one can just provide an option to pic-build, e.g. pic-build -b cuda:60. PIC_BACKEND merely sets the default value for -b (or at least that's how it is supposed to work; however, I never tried leaving it undefined).

@sbastrakov Good idea. I think both improved documentation and perhaps separate output for the PIConGPU init time and the JIT init time would help to prevent such mistakes.

@PrometheusPi and @sbastrakov
I always used export PIC_BACKEND="cuda:72". Though my cards support more, sm73 was not accepted at compilation time due to the old CUDA. I admit that I installed ISAAC along with versions develop and 0.5.0, but I haven't used it explicitly. Does this mean that once ISAAC is installed along with picongpu, for example as spack install picongpu@develop +adios +isaac %[email protected] ^isaac@develop ^isaac-server@develop, it is always compiled, even without loading it?

I am happy to let you connect to one of my computers through TeamViewer and explore the Turing architecture. I will need to install picongpu@develop again to test your suggestion ("If you go to your .build directory and do ccmake . - what parameter is set for ALPAKA_CUDA_ARCH?"). I always install from scratch, wiping the whole Spack folder for fear of dependency clashes.

I have always asked myself whether having CUDA and OpenMPI installed on the system, rather than in Spack, helps to speed up PIConGPU. What is your opinion about this?

And then there is the CUDA-aware OpenMPI setup, which I managed to follow and which PIConGPU used.
https://www.open-mpi.org/faq/?category=buildcuda
Do you think it helps in any way as compared with the case when Spack builds CUDA and openMPI?

PROBLEM SOLVED!
For the performance shown below the specifications were:

  • single GPU 2070 Super,
  • CPU Intel Xeon X5660,
  • CUDA 10.2.1,
  • compute capability 7.5

electrons:

initialization time: 10sec 281msec = 10 sec
  0 % =        0 | time elapsed:            11sec 896msec | avg time per step:   0msec
  4 % =      102 | time elapsed:            24sec  35msec | avg time per step:  23msec
  9 % =      204 | time elapsed:            36sec 330msec | avg time per step:  23msec
 14 % =      306 | time elapsed:            48sec 674msec | avg time per step:  24msec
 19 % =      408 | time elapsed:       1min  1sec  14msec | avg time per step:  24msec
 24 % =      510 | time elapsed:       1min 13sec 428msec | avg time per step:  25msec
 29 % =      612 | time elapsed:       1min 25sec 555msec | avg time per step:  25msec
 34 % =      714 | time elapsed:       1min 37sec 800msec | avg time per step:  27msec
 39 % =      816 | time elapsed:       1min 50sec 365msec | avg time per step:  26msec
 44 % =      918 | time elapsed:       2min  3sec  48msec | avg time per step:  28msec
 49 % =     1020 | time elapsed:       2min 15sec 733msec | avg time per step:  28msec
 54 % =     1122 | time elapsed:       2min 28sec 356msec | avg time per step:  29msec
 59 % =     1224 | time elapsed:       2min 41sec 152msec | avg time per step:  31msec
 64 % =     1326 | time elapsed:       2min 53sec 873msec | avg time per step:  30msec
 69 % =     1428 | time elapsed:       3min  6sec 905msec | avg time per step:  31msec
 74 % =     1530 | time elapsed:       3min 19sec 637msec | avg time per step:  30msec
 79 % =     1632 | time elapsed:       3min 32sec 728msec | avg time per step:  31msec
 84 % =     1734 | time elapsed:       3min 45sec 214msec | avg time per step:  30msec
 89 % =     1836 | time elapsed:       3min 57sec 580msec | avg time per step:  29msec
 94 % =     1938 | time elapsed:       4min 10sec 135msec | avg time per step:  30msec
 99 % =     2040 | time elapsed:       4min 22sec 389msec | avg time per step:  29msec
calculation  simulation time:  4min 22sec 634msec = 262 sec
full simulation time:  4min 33sec 544msec = 273 sec

electrons + ions:

initialization time: 10sec 140msec = 10 sec
  0 % =        0 | time elapsed:            23sec 689msec | avg time per step:   0msec
  4 % =      102 | time elapsed:            48sec  69msec | avg time per step:  26msec
  9 % =      204 | time elapsed:       1min 12sec 380msec | avg time per step:  26msec
 14 % =      306 | time elapsed:       1min 36sec 784msec | avg time per step:  27msec
 19 % =      408 | time elapsed:       2min  1sec 277msec | avg time per step:  28msec
 24 % =      510 | time elapsed:       2min 25sec 401msec | avg time per step:  29msec
 29 % =      612 | time elapsed:       2min 49sec 395msec | avg time per step:  30msec
 34 % =      714 | time elapsed:       3min 13sec 532msec | avg time per step:  31msec
 39 % =      816 | time elapsed:       3min 38sec 155msec | avg time per step:  33msec
 44 % =      918 | time elapsed:       4min  2sec 815msec | avg time per step:  33msec
 49 % =     1020 | time elapsed:       4min 27sec 390msec | avg time per step:  34msec
 54 % =     1122 | time elapsed:       4min 52sec 126msec | avg time per step:  35msec
 59 % =     1224 | time elapsed:       5min 17sec  49msec | avg time per step:  37msec
 64 % =     1326 | time elapsed:       5min 42sec 128msec | avg time per step:  38msec
 69 % =     1428 | time elapsed:       6min  7sec  17msec | avg time per step:  36msec
 74 % =     1530 | time elapsed:       6min 31sec 902msec | avg time per step:  37msec
 79 % =     1632 | time elapsed:       6min 56sec 549msec | avg time per step:  38msec
 84 % =     1734 | time elapsed:       7min 20sec 947msec | avg time per step:  36msec
 89 % =     1836 | time elapsed:       7min 45sec 313msec | avg time per step:  36msec
 94 % =     1938 | time elapsed:       8min 10sec 404msec | avg time per step:  36msec
 99 % =     2040 | time elapsed:       8min 34sec   2msec | avg time per step:  37msec
calculation  simulation time:  8min 34sec 311msec = 514 sec
full simulation time:  8min 45sec 208msec = 525 sec

I wiped out the whole Spack folder and installed the develop version as spack install picongpu@develop +adios %[email protected] with CUDA 10.2 installed on the system. I also installed openPMD as spack install openpmd-api. On this occasion ISAAC was not installed.

Then I loaded the program as
source $HOME/src/spack/share/spack/setup-env.sh && spack load openpmd-api && spack load picongpu %[email protected] && export PIC_BACKEND="cuda:75" && export OMPI_MCA_io=^ompio
so using compute capability 7.5 for the first time.

ALPAKA_CUDA_ARCH = 75

In conclusion, if the problem was not due to ISAAC (and it should not have been, since I never set it up or loaded it), it must have been due to running the code with a slightly lower compute capability than what the card can handle, that is 7.2 instead of 7.5.

Maybe an automatic selection of the CUDA capability could be implemented, depending on the graphics card identified or on the CUDA toolkit identified.
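
A user-side version of this idea could look like the sketch below. It is not an existing PIConGPU feature: the function names are hypothetical, and it assumes an nvidia-smi recent enough to support the `compute_cap` query field. The installed CUDA toolkit must also accept the resulting architecture, which the sketch does not check (an older CUDA rejecting a higher value is mentioned earlier in this thread).

```python
import subprocess


def backend_from_capabilities(query_output: str) -> str:
    """Turn nvidia-smi compute_cap output (one 'major.minor' line per GPU)
    into a backend string such as 'cuda:75'."""
    capabilities = [line.strip() for line in query_output.splitlines() if line.strip()]
    if not capabilities:
        raise ValueError("no GPU reported")
    if len(set(capabilities)) != 1:
        # Mixed GPU types need a manual decision, so refuse to guess.
        raise ValueError(f"mixed compute capabilities: {sorted(set(capabilities))}")
    major, minor = capabilities[0].split(".")
    return f"cuda:{major}{minor}"


def query_compute_capabilities() -> str:
    """Ask nvidia-smi for the compute capability of every visible GPU.

    Assumption: the installed nvidia-smi supports the 'compute_cap' field.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    backend = backend_from_capabilities(query_compute_capabilities())
    # Print a line that can be pasted into the shell profile.
    print(f'export PIC_BACKEND="{backend}"')
```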

Some performance figures for the RTX 5000 GPU with the LWFA model. The CPU is an 8-core Intel Core i9-9900K.

[Image: times]

Glad the long initialization time issue is solved!

Regarding the run time, it is a bit weird to me that using 2 GPUs gives less than 2x speedup over 1 GPU. Perhaps the problem is too small so that they are underutilized, or the workload is not evenly distributed between them.
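
To put a number on "less than 2x", speedup and parallel efficiency can be computed from the calculation times of the two runs. This is a generic sketch; the timings in the example are placeholders and are not taken from the measurements posted above.

```python
def speedup(time_one_gpu: float, time_n_gpus: float) -> float:
    """Speedup of the multi-GPU run relative to the single-GPU run."""
    return time_one_gpu / time_n_gpus


def parallel_efficiency(time_one_gpu: float, time_n_gpus: float, n_gpus: int) -> float:
    """Fraction of the ideal n-fold speedup that was actually reached."""
    return speedup(time_one_gpu, time_n_gpus) / n_gpus


if __name__ == "__main__":
    # Placeholder timings in seconds, for illustration only.
    t1, t2 = 100.0, 60.0
    print(f"speedup:    {speedup(t1, t2):.2f}x")
    print(f"efficiency: {parallel_efficiency(t1, t2, 2):.0%}")
```

An efficiency well below 100% on a small problem is consistent with the under-utilization suspected here, but it does not by itself distinguish that from load imbalance.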

I agree with @sbastrakov: the reduced speedup might be caused by under-utilization. Furthermore, I am confused why initialization for sm 7.3 takes 22 (35) seconds for 1 GPU and only 6 (6) seconds for two. For sm 7.5, the init times look okay.
