Picongpu: Compilation errors

Created on 15 Oct 2020 · 21 comments · Source: ComputationalRadiationPhysics/picongpu

Hello,

I did a fresh install of picongpu@develop with gcc 9.3.0, CUDA 11.1 and openmpi 4.0.5, and I get some compilation errors for my model when loading PIConGPU with spack load picongpu +adios %gcc@9.3.0 && export PIC_BACKEND="cuda:75".

The first error was:
include/picongpu/param/particle.param:9:10: fatal error: pmacc/nvidia/rng/distributions/Uniform_float.hpp: No such file or directory

Since I don't seem to use that header, I disabled it in particle.param, but then the second error came:
spack/linux-linuxmint20-westmere/gcc-9.3.0/picongpu-develop-xihuo4szr2tdyxw5p5gx6scr7n3omjtp/include/pmacc/../pmacc/nvidia/atomic.hpp(95): error #304: no instance of function template "alpaka::warp::activemask" matches the argument list

I attach the whole model as a folder, where you can find the out.txt and out_2.txt compilation log files.

I also tried the stable PIConGPU version, but there were some gcc compiler incompatibilities, so I switched to picongpu@develop.

Could you please help me debug this model?
Thank you.
Cristian

LAYERS-2D-PLATES.zip

All 21 comments

Hello @cbontoiu, thanks for your report. Regarding the first issue, with including that file: I think you used an older setup that still had this include, while the file has since been removed from PIConGPU and all its example setups. In this particular case, I believe the include was redundant to begin with, so just commenting it out should be sufficient. Regarding the second error: interestingly, we had what looks like similar issues a couple of months ago as well; perhaps the fix was incomplete. I will take a look tomorrow.

pmacc/nvidia/rng/distributions/Uniform_float.hpp

This file was deleted in #3351
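For reference, the fix suggested above amounts to disabling the stale include in the setup's own particle.param (line 9 according to the compiler error). This is a sketch of the edited line only; it assumes the include is written with angle brackets, and your copy may spell it with quotes instead.

```cpp
// include/picongpu/param/particle.param of the user setup (around line 9).
// The header was deleted from PMacc in #3351 and the include was redundant,
// so it can simply be commented out (or deleted):
// #include <pmacc/nvidia/rng/distributions/Uniform_float.hpp>
```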

Yes, that's what I meant when I suggested the setup was created before that change.

I should add that the LWFA example from the distribution compiles fine but fails at runtime with my setup. Here is what I get in the terminal:

Running program... using default compiler
==> Error: Spec 'picongpu@develop%[email protected]+adios+hdf5~isaac+png backend=cuda cudacxx=nvcc arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+blosc~bzip2~fortran~hdf5~infiniband+lz4+mpi~netcdf+shared+sz~szip+zfp+zlib patches=01113e9efb929d71c28bf33cc8b7f215d85195ec700e99cb41164e2f8f830640,8ae17f655248e87cbab1d1ed794e15364a38d2f5f8d971b1086702f72d79bd42,d24b79b795f66e40ddcd331ea4be896ac9c393d6f68f4318616d23928b0694e9 staging=none arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=11 visibility=hidden arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+shared arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+avx2 build_type=RelWithDebInfo patches=cd40604a26157a0e018ea496cf3267e116e6ec5ff80a7d1cef11b841c154c388 arch=linux-linuxmint20-westmere ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+libbsd arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-linuxmint20-westmere ^[email protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+mpi build_type=RelWithDebInfo patches=669608721dfce0ada7cef1ac84344352791a8916b7bb98ca8a0d4e6d4670e744 arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected]~python arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-linuxmint20-westmere ^[email protected]%[email protected]~symlinks+termlib arch=linux-linuxmint20-westmere ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+systemcerts arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] build_type=RelWithDebInfo arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+pic+shared build_type=RelWithDebInfo patches=c9cfecb1f7a623418590cf4e00ae7d308d1c3faeb15046c2e5090e38221da7cd arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+column_metadata+fts~functions~rtree arch=linux-linuxmint20-westmere ^[email protected]%[email protected]~fortran~hdf5~netcdf~pastri~python~random_access+shared~stats~time_compression build_type=RelWithDebInfo arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected] arch=linux-linuxmint20-westmere ^[email protected]%[email protected]~pic arch=linux-linuxmint20-westmere ^[email protected]%[email protected]~aligned~fasthash~profile+shared~strided~twoway bsws=64 build_type=RelWithDebInfo arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+optimize+pic+shared arch=linux-linuxmint20-westmere ^[email protected]%[email protected]+pic arch=linux-linuxmint20-westmere' matches no installed packages.
tbg/submit.start: line 53: unexpected EOF while looking for matching `"'
tbg/submit.start: line 61: syntax error: unexpected end of file

and here is the compilation log file:

out.txt

@cbontoiu The second issue could be caused by alpaka having been installed into the include folder of your parameter set in the past.
Could you please remove the .build and include/alpaka folders by running the following commands in your case folder?

rm -r .build
rm -r include/alpaka

OK, now the model compiles, but I see something new: openPMD: NOTFOUND. Could you please tell me what this is about? Maybe this can also be installed through Spack? In the end, removing the alpaka folder solves the problem, and removing the header in particle.param is not required.
Thank you.

PIConGPU: 0.5.0-dev
  Build-Type: Release

Third party:
  OS:         Linux-5.4.0-51-generic
  arch:       x86_64
  CXX:        GNU (9.3.0)
  CMake:      3.18.2
  CUDA:       11.1.74
  mallocMC:   2.5.0
  Boost:      1.70.0
  MPI:        
    standard: 3.1
    flavor:   OpenMPI (4.0.5)
  PNGwriter:  0.7.0
  libSplash:  1.7.0 (Format 4.0)
  ADIOS:      1.13.1
  openPMD:    NOTFOUND

Yes, you can install openpmd-api with 'spack install openpmd-api' (see https://openpmd-api.readthedocs.io/en/0.12.0-alpha/).

We removed our HDF5 plugin, so the command-line options --hdf5.* are no longer available. With the openPMD API you can write HDF5 or ADIOS1/ADIOS2 files.
Note: today we found a bug in the output, so particle data currently written with the openPMD API is not openPMD-conformant: https://github.com/ComputationalRadiationPhysics/picongpu/issues/3383

We will remove the native ADIOS plugin from PIConGPU soon, too.

At runtime I also noticed a problem with this model that didn't exist before. Maybe there is an issue with the grid and the memory allocation. Perhaps, instead of using just one mesh cell in the z-direction for a 2D model, the code now requires 4 cells (because of the circular polarization laser?), and this drives the required memory beyond the available GPU memory in my case.

I get:

Dimension z: Local grid size is not a multiple of supercell size. Auto adjust from 1 to 4
Dimension z: Local grid size is not containing at least 3 supercells. Auto adjust from 4 to 12
Dimension z: Local grid size must be greater or equal than the largest absorber. Auto adjust from 12 to 32
Dimension z: Invalid global grid size. Auto adjust from 1 to 32
 new grid size (global|local|offset): {1040,10400,32}|{1040,10400,32}|{0,0,0}
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
Unhandled exception of type 'St13runtime_error' with message '/home/cristi/src/spack/opt/spack/linux-linuxmint20-westmere/gcc-9.3.0/picongpu-develop-xihuo4szr2tdyxw5p5gx6scr7n3omjtp/thirdParty/cupla/alpaka/include/alpaka/mem/buf/BufUniformCudaHipRt.hpp(487) 'cudaMalloc3D( &pitchedPtrVal, extentVal)' returned error  : 'cudaErrorMemoryAllocation': 'out of memory'!', terminating

You need at least 3 supercells in each dimension; PIConGPU adjusts this automatically. If you have no periodic boundaries, the absorber must also fit into the simulation area. That's why the domain is auto-adjusted to 32 cells.
To run a 2D simulation, please change the dimension in dimension.param,
e.g. in the laser wake field example https://github.com/ComputationalRadiationPhysics/picongpu/blob/454a28efe6a6eb0fa1e0e97031ef2569a1ad313b/share/picongpu/examples/LaserWakefield/include/picongpu/param/dimension.param#L23

You will then run a 2D3V simulation (note: 1D simulations are not possible).
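A minimal sketch of that change in your own setup's dimension.param, assuming it selects the dimensionality through the SIMDIM macro like the LaserWakefield file linked above (check the exact line in your copy before editing):

```cpp
// include/picongpu/param/dimension.param in the user setup.
// Assumption: the setup ships with "#define SIMDIM DIM3";
// switching it to DIM2 turns the run into a 2D3V simulation.
#define SIMDIM DIM2
```

After this change, the grid size in the .cfg file is given with two components instead of three.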

@psychocoderHPC This issue was about my model, which is 2D, not about LWFA.

You can use 'pic-edit dimension.param' to define that your case should be 2D, too. The link was only an example so that you know what to change.

Yes, just to reiterate @psychocoderHPC's reply: in PIConGPU, a 2D simulation is selected by modifying the contents of dimension.param, and then, e.g., the grid size should be given as a 2D vector. However, having the dimension set to 3 and a grid size of the form N x M x 1 does not result in a 2D run. Instead, PIConGPU adjusts the 1 to the smallest allowed size, which was 32 in your case, and runs like that, using far more memory than a user would probably expect.

Just a side note: I am impressed by PIConGPU and your work, guys! Is this the most advanced PIC code available today? If so, why do most people in the laser-plasma community use FBPIC or EPOCH, even in Germany? Is it due to a lack of visibility? Is there already a dedicated place for sharing papers based on PIConGPU? Here is my humble contribution: https://iopscience.iop.org/article/10.1088/1742-6596/1596/1/012028/meta

Thanks for the kind words and for sharing the paper, @cbontoiu! We actually have an issue for papers, #3258, inspired by you, I believe; however, it seems to have gotten a little stuck, so feel free to give it a bump.

We are of course very grateful when people spread the word, be it in discussions or via papers stating that the simulations were done using PIConGPU.

I still have some problems with the new installation. In Spack I have picongpu 0.4.3 installed with the system gcc 7.5.0, and CUDA 11.1 and openmpi 4.0.5, also from the system, for picongpu to use.

I fetch the LWFA example and compile it like this:

Screenshot_2020-10-21_21-58-37

and I get an error like spack/opt/spack/linux-linuxmint20-westmere/gcc-7.5.0/picongpu-0.4.3-nvff4ydogmuyq62tzaqzaplyfianjo72/thirdParty/alpaka/include/alpaka/core/ConcurrentExecPool.hpp:361:19: error: base operand of ‘->’ is not a pointer.

The whole log file is attached.

out.txt

I would like to use the latest version, that is picongpu@develop or version 0.5.0, but I don't know which compiler is right for it.

The error looks like a compiler issue, or an issue with the combination of CUDA 11 + gcc 7.5. The code itself is definitely correct.

I would like to use the latest version, that is picongpu@develop or version 0.5.0, but I don't know which compiler is right for it.
You can use any C++11 compiler for 0.5.0 and any C++14 compiler for the development branch. Any gcc 6+ should work in both cases.

@psychocoderHPC
Thank you. Yes, I managed to compile the LWFA and Bremsstrahlung examples with CUDA 11.1, openmpi 4.0.5, gcc 9.3.0 and picongpu@develop with Spack. Both models give me the same error at runtime:

tbg/submit.start: line 53: unexpected EOF while looking for matching `"'
tbg/submit.start: line 61: syntax error: unexpected end of file

I attach the tbg/submit.start file.

submit.start.txt

In line 53 of your submit.start, you have to add a " before the word Note so that the note becomes a string.
I just don't understand why it is not there. Is it possible that you worked on that file before and accidentally deleted the character?
Probably we will find the reason if further errors occur. :roll_eyes:

Thanks! I will try tomorrow. I definitely didn't alter the model, and it happens with two different models from the distribution, so it might be a small bug.

@cbontoiu Could you post the file submit.tpl so that we can check whether the missing " is already there? I checked the

@cbontoiu Looks like the issue is already in the dev branch: https://github.com/ComputationalRadiationPhysics/picongpu/blob/4bb3936fffac6df6122269cbbf26c2dc56f38b4b/etc/picongpu/bash/mpiexec.tpl#L56

I will open a PR to fix it.

A bug fix for the last issue is open: #3403

I will close this issue because it already contains a mix of different issues.
