picongpu run on workstation

Created on 25 Sep 2018 · 35 comments · Source: ComputationalRadiationPhysics/picongpu

Hi,

My question may be elementary. I installed PIConGPU on a workstation (two Intel Xeon E5-2630 v3 CPUs and two NVIDIA Quadro K4200 GPUs) using Spack. I looked for instructions on running the code on a desktop/workstation after a Spack install and couldn't find any.

Where do I locate the example files?
What are the files I need to edit? Do I need to create some profile or config file?
How do I run the code for a given example?

Labels: machine/system, question

All 35 comments

Thank you for the question!

We do not have much experience with the Nvidia Quadro line, but the K4200 with SM 3.0 should work just fine.

Our spack install (repo) is a preview. After installing everything with spack, you just do:

spack load picongpu

This is identical to sourcing a manually written picongpu.profile in the other install methods. The only thing it does not set is export MY_NAME="$(whoami) <$MY_MAIL>" which you can put in your .bashrc or .profile if you want this meta-data in your output files (e.g. HDF5, PNG).
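For example, the exports could be added once to a shell profile; this is a sketch with a placeholder name and address (a scratch file is used here so it is safe to try, but on your workstation you would append the two export lines to "$HOME/.profile" or "$HOME/.bashrc" instead):

```shell
# Sketch: add the author metadata exports to a shell profile.
# The e-mail address is a placeholder; a scratch file stands in
# for "$HOME/.profile" so this can be tested without side effects.
profile=$(mktemp)
cat >> "$profile" <<'EOF'
export MY_MAIL="jane.doe@example.com"
export MY_NAME="$(whoami) <$MY_MAIL>"
EOF
. "$profile"
echo "$MY_NAME"
```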

You now have variables such as $PIC_EXAMPLES and $PICSRC defined as well as all paths for the tools such as tbg and pic-create, pic-build that are used throughout the rest of the manual.
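A quick way to confirm the load worked is to probe those variables and tools. The following sketch only prints hints and never aborts, so it can be pasted into any shell after `spack load picongpu`:

```shell
# Sanity check after `spack load picongpu`: probe the expected
# environment variables and command-line tools, printing hints only.
status=ok
for tool in tbg pic-create pic-build pic-configure; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "hint: '$tool' not on PATH"
    status=incomplete
  fi
done
if [ -z "${PIC_EXAMPLES:-}" ]; then
  echo "hint: \$PIC_EXAMPLES is not set"
  status=incomplete
fi
if [ -z "${PICSRC:-}" ]; then
  echo "hint: \$PICSRC is not set"
  status=incomplete
fi
echo "environment check: $status"
```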

The command "spack load picongpu" successfully loaded PIConGPU, but I had no clue about these environment variables.

Thanks for your help.

Glad this works for you!
Yes, feel free to ask any such questions, so we know where to improve the docs :)

Hi @ajitup73, you just reopened the issue. Do you have further questions regarding it or was this by accident? :)

Hi,

Now I land into new problems. They are listed below:

  1. I do "spack load backend=cuda"
    and pic-create $PIC_EXAMPLES/LaserWakefield/ $HOME/paramSets/myLWFA
    and cd ~/build && pic-configure $HOME/paramSets/myLWFA/
    and make -j install

It configures the example successfully (see attached file pic-config_cuda.txt),
BUT when I run make -j install, it gives an error (see attached file make_cuda.txt).

  2. I do "spack load backend=omp2b"
    and pic-create $PIC_EXAMPLES/LaserWakefield/ $HOME/paramSets/myLWFA
    and cd ~/build && pic-configure $HOME/paramSets/myLWFA/
    and make -j install

It configures the example successfully (see attached file pic-config_omp2b.txt).
The make command is also successful (see attached file make_omp2b.txt).
Now, when I run the wakefield example, it runs and gives an error (see attached run-omp2b.txt, output-omp2b.txt).

i.e. the example has a compilation error with "cuda" and a runtime error with "omp2b".

Kindly let me know how to deal with these errors.

make_cuda.txt
make_omp2b.txt
output-omp2b.txt
pic-config_cuda.txt
pic-config_omp2b.txt
run-omp2b.txt

I forgot to mention that I unload picongpu-cuda before loading picongpu-omp2b.

Both of them have been installed with spack %gcc@4.9.4, as they were giving errors with gcc-5.4.0 and gcc-7.1.0.

Hm, that's an upstream bug with recent changes in spack load spec matching. I am currently out of office, but can escalate the issue next week.

Will wait for your suggestions.

I just looked the issue up and it's a known spack issue https://github.com/spack/spack/issues/6314 that our recipe triggers.

You could try installing without ADIOS. Did you add +adios by hand? It is not on by default as far as I can see in the recipe.
(Update: just saw that our mandatory OpenMPI dependency also triggers this.)

I found a typo in your usage.

Instead of unload, just open a fresh terminal to be sure.

During load, you forgot the picongpu package name:

spack load picongpu backend=omp2b
# ...

In my previous attempts, I edited the package.py file in spack-repo and enabled the ADIOS plugin there.

The command I run is "spack load picongpu backend=omp2b" or "spack load picongpu backend=cuda". It was a typo while writing the issue here. I rechecked the commands and they give the results I have reported here.

I also tried from a different user account where I used "spack install picongpu+adios %gcc@4.9.4"
and then followed by it "spack load picongpu".
and pic-create $PIC_EXAMPLES/LaserWakefield/ $HOME/paramSets/myLWFA
and cd ~/build && pic-configure $HOME/paramSets/myLWFA/
and make -j install. It is still giving the same error while compiling the example.

Again, in a new user area, I tried only "spack install picongpu %gcc@4.9.4"
and then followed by it "spack load picongpu".
and pic-create $PIC_EXAMPLES/LaserWakefield/ $HOME/paramSets/myLWFA
and cd ~/build && pic-configure $HOME/paramSets/myLWFA/
and make -j install. It is still giving the same error.

Again, in a different user area, I tried only "spack install picongpu backend=omp2b %gcc@4.9.4"
and then followed by it "spack load picongpu backend=omp2b".
and pic-create $PIC_EXAMPLES/LaserWakefield/ $HOME/paramSets/myLWFA
and cd ~/build && pic-configure $HOME/paramSets/myLWFA/
and make -j install.
Here, it completes the installation of the example, and when I run it, it still gives the error reported in my first post.

Kindly help.

Ok. This is a spack bug, so we need to hack around it temporarily (and I did this now).

enable the adios plugin there itself

Don't. Please keep it disabled; you won't need it on a workstation. Please keep the default (off).

spack load picongpu backend=omp2b

Let us work on this, the CPU backend, first since it already installs for you.

First, do

spack uninstall --all picongpu

Now, update your spack-repo/, I added a flaky work-around in https://github.com/ComputationalRadiationPhysics/spack-repo/commit/76171f57d18091b397473805a0e3722fb8bb5e36

cd $HOME/src/spack-repo
git fetch --all
git pull --ff-only
# please check this succeeds!

Now repeat the steps from

spack install picongpu backend=omp2b
spack load picongpu backend=omp2b

# ... basic PIConGPU usage now

Running with tbg should now work.

I followed the instructions and did "spack install picongpu backend=omp2b %gcc@my_gcc_version"
and it seems to work. I am waiting for the results and will get back to you if I have any further problems or problems with the results. Thanks for this. Now, what about backend=cuda? How do I go about it?


tbg -s bash -c etc/picongpu/1.cfg -t etc/picongpu/bash/mpirun.tpl $HOME/runs/lwfa_001
Running program...
no binary 'cuda_memtest' available, skip GPU memory test
Data for JOB [59488,1] offset 0 Total slots allocated 16

======================== JOB MAP ========================

Data for node: plasma01 Num slots: 16 Max slots: 0 Num procs: 1
Process OMPI jobid: [59488,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0-1]]:[BB/../../../../../../..][../../../../../../../..]

=============================================================
[1,0]:PIConGPU: 0.4.0-rc2
[1,0]: Build-Type: Release
[1,0]:
[1,0]:Third party:
[1,0]: OS: Linux-4.4.76-1-default
[1,0]: arch: x86_64
[1,0]: CXX: GNU (4.9.4)
[1,0]: CMake: 3.12.2
[1,0]: Boost: 1.65.1
[1,0]: MPI:
[1,0]: standard: 3.1
[1,0]: flavor: OpenMPI (3.1.2)
[1,0]: PNGwriter: 0.7.0
[1,0]: libSplash: 1.7.0 (Format 4.0)
[1,0]: ADIOS: NOTFOUND
[1,0]:PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
[1,0]:PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3AlpakaRand seed: 42
[1,0]:PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
[1,0]:PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
[1,0]:PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
[1,0]:PIConGPUVerbose PHYSICS(1) | macro particles per device: 8388608
[1,0]:PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
[1,0]:PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
[1,0]:PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
[1,0]:PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
[1,0]:PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
[1,0]:PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
[1,0]:PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
[1,0]:PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
[1,0]:PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
[1,0]:initialization time: 2sec 349msec = 2 sec
[1,0]: 0 % = 0 | time elapsed: 978msec | avg time per step: 0msec

Yay, looks good so far!

Yes, now we'll check what's off there. Can you try to install it again and check whether the error message is still the same?

I created a fresh user account and installed picongpu (without adios, isaac).

spack install picongpu %gcc@4.9.4
spack load picongpu
pic-create $PIC_EXAMPLES/LaserWakefield/ $HOME/paramSets/myLWFA
tempo@plasma01:~/build> pic-configure $HOME/paramSets/myLWFA
tempo@plasma01:~/build> make -j install

Now everything is default and the backend is cuda. Compiling the example gives the same error (see the attached file).

I uninstalled picongpu and re-installed with isaac; again it fails with the same error.
Then I uninstalled and re-installed with adios+isaac; it fails with the same error.

make-output.txt

This looks like a problem with our code and CUDA 9.2.
I isolated this into #2714 to investigate.

You can do one of the following to go on:
a) try again with e.g. GCC 5.4 (as described in the manual, if you have this GCC version) instead of 4.9 or
b) try again with CUDA 9.1 instead

For b), first uninstall the GPU version via:

spack uninstall --all picongpu backend=cuda

Then go into your $HOME/src/spack-repo/packages/picongpu/package.py and change this line
from

depends_on('cuda@8.0:9.2', when='backend=cuda')

to

depends_on('cuda@8.0:9.1', when='backend=cuda')

then

spack install ...
spack load ...
# ...

as always.
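If you prefer, the one-line version pin can also be scripted with sed. This sketch rehearses the substitution on a scratch copy first; the recipe line shown is reconstructed from this thread, so compare it against your actual $HOME/src/spack-repo/packages/picongpu/package.py before editing the real file:

```shell
# Rehearse the CUDA pin change (:9.2 -> :9.1) on a scratch copy of
# the recipe line before editing the real package.py.
workdir=$(mktemp -d)
cat > "$workdir/package.py" <<'EOF'
    depends_on('cuda@8.0:9.2', when='backend=cuda')
EOF
# Cap the CUDA dependency at 9.1 instead of 9.2:
sed -i "s/:9\.2'/:9.1'/" "$workdir/package.py"
cat "$workdir/package.py"
```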

@ajitup73 sorry to bother you, but can I ask you for a third option c) to try out?

Same steps required as b) but change the lines

    depends_on('boost@1.62.0:1.65.1')
    depends_on('boost@1.65.1', when='backend=cuda ^cuda@9:')

to

    depends_on('boost@1.62.0:1.65.1 cxxstd=11')
    depends_on('boost@1.65.1 cxxstd=11', when='backend=cuda ^cuda@9:')

and keep GCC 4.9 as you did.

I just applied this and it should make "all things boost" more reliable.

I had reverted to cuda-9.1.

With CUDA 9.1 and the Boost option changed to C++11, it is working fine.
Now the example has compiled and is running.
Thanks for your kind support.

On Monday, I will uninstall cuda-9.1 version of picongpu, will try with cuda-9.2 while retaining the boost option change, and I am sure that it will run. I will let you know!

Thanks again.

I tried the following configuration:

depends_on('cuda@8.0:9.2', when='backend=cuda')
depends_on('boost@1.62.0:1.65.1 cxxstd=11')
depends_on('boost@1.65.1 cxxstd=11', when='backend=cuda ^cuda@9:')

and the problem is still there. After loading PIConGPU, the example gets configured but fails on the make command with the error reported earlier in this issue.

So, there could be some problem between CUDA 9.2 and the code!

Thanks, that means #2714 is a bug we have to fix or a boost dependency update we have to push.

Out of curiosity, is the same problem present in newer Boost releases? You can check via:

depends_on('cuda@9.2', when='backend=cuda')
depends_on('boost@1.67.0 cxxstd=11')

Yes, the "make -j install" fails after configuring the example.
makeError_cuda9.2_boost1.67.txt

depends_on('cuda@8.0:9.2', when='backend=cuda')
depends_on('boost@1.62.0:1.65.1 cxxstd=11')
depends_on('boost@1.65.1 cxxstd=11', when='backend=cuda ^cuda@9:')

the example program compiles and installs successfully. However, when I run it, it fails with the following error. The NVIDIA system details are also in the attached file.
system-details.txt

tempo@plasma01:~/paramSets/myLWFA2> tbg -s bash -c etc/picongpu/2.cfg -t etc/picongpu/bash/mpirun.tpl $HOME/runs/lwfa_004
Running program...
 Data for JOB [56519,1] offset 0 Total slots allocated 16

 ========================   JOB MAP   ========================

 Data for node: plasma01        Num slots: 16   Max slots: 0    Num procs: 2
        Process OMPI jobid: [56519,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0-1]]:[BB/../../../../../../..][../../../../../../../..]
        Process OMPI jobid: [56519,1] App: 0 Process rank: 1 Bound: socket 0[core 1[hwt 0-1]]:[../BB/../../../../../..][../../../../../../../..]

 =============================================================
</home/tempo/src/spack/opt/spack/linux-opensuse_leap15-x86_64/gcc-7.3.0/picongpu-0.4.0-rc3-dk6p52imkiimygfftbxu5pdxzx5bawko/thirdParty/cuda_memtest/misc.cpp>:33
terminate called after throwing an instance of 'std::runtime_error'
  what():  [NVML] Error: Not Supported
</home/tempo/src/spack/opt/spack/linux-opensuse_leap15-x86_64/gcc-7.3.0/picongpu-0.4.0-rc3-dk6p52imkiimygfftbxu5pdxzx5bawko/thirdParty/cuda_memtest/misc.cpp>:33
terminate called after throwing an instance of 'std::runtime_error'
  what():  [NVML] Error: Not Supported
cuda_memtest crash: see file /home/tempo/runs/lwfa_004/simOutput/cuda_memtest_plasma01_1.err
cuda_memtest crash: see file /home/tempo/runs/lwfa_004/simOutput/cuda_memtest_plasma01_0.err
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[56519,1],1]
  Exit code:    1
--------------------------------------------------------------------------

Yes, the "make -j install" fails after configuring the example.
makeError_cuda9.2_boost1.67.txt

Thanks for testing and posting the report!
Confusing result: For some reason your CMake finds Boost 1.67.0 (which we support) but it compiles / throws while building PIConGPU with Boost 1.68.0 (which we don't support yet in 0.4.0-rc3).

However, when I run it, it fails with following error:
cuda_memtest: [NVML] Error: Not Supported

This is a bug in one of the Nvidia support libraries, NVML, and caused by our GPU health checker cuda_memtest. We can work-around this https://github.com/ComputationalRadiationPhysics/cuda_memtest/issues/16 (will be fixed in PIConGPU 0.4.0-rc4)

Until then: you don't need the cuda_memtest tool, so let's disable it during compile with

pic-build -b cuda:30 -c "-DCUDAMEMTEST_ENABLE=OFF"

for your K4200 workstation.

If the tool is not compiled, tbg will automatically skip it during execution of PIConGPU.

I am running it in the Spack environment.
Which file do I edit to insert this option for disabling cuda_memtest?

Just during your normal PIConGPU basic usage:

  1. pic-create ... ... (and cd)
  2. pic-build ...
  3. tbg ...

add the shown option -c "-DCUDAMEMTEST_ENABLE=OFF" to the pic-build command.
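Put together, the whole CUDA session might look as follows. This is a sketch: cuda:30 matches the K4200 (SM 3.0), the paths and 1.cfg follow the earlier posts in this thread, and everything is guarded so the snippet does nothing but print a hint where PIConGPU is not loaded:

```shell
# Hypothetical end-to-end CUDA session with the GPU health check
# disabled (cuda:30 = sm_30 for the Quadro K4200). Guarded: no-op
# except for a hint if PIConGPU is not loaded in this shell.
if command -v pic-build >/dev/null 2>&1; then
  picongpu_loaded=yes
  pic-create "$PIC_EXAMPLES/LaserWakefield/" "$HOME/paramSets/myLWFA"
  cd "$HOME/paramSets/myLWFA"
  pic-build -b cuda:30 -c "-DCUDAMEMTEST_ENABLE=OFF"
  tbg -s bash -c etc/picongpu/1.cfg \
      -t etc/picongpu/bash/mpirun.tpl "$HOME/runs/lwfa_001"
else
  picongpu_loaded=no
  echo "hint: run 'spack load picongpu backend=cuda' first"
fi
```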

Thanks for your help.
The compilation and run advanced a few steps after disabling cuda_memtest, but the run still fails after initializing a few things.
see the attachment
pic-build-log.txt
run-log.txt


The error

alpaka/include/alpaka/event/EventCudaRt.hpp(195)
'ret = cudaEventQuery( event.m_spEventImpl->m_CudaEvent)' returned error  : 'cudaErrorLaunchTimeout': 'the launch timed out and was terminated'!

indicates your X window server is running on the same GPU. That will interrupt long-running compute kernels.

How many GPUs do you have on the system?
If it's only one, you should disable the window system fully when running PIConGPU, so it can take the whole GPU for computations and does not run into kernel timeouts because of the window manager.

Assuming your openSUSE uses systemd, you can stop the window manager temporarily via

sudo systemctl stop display-manager

and restart it later with

sudo systemctl start display-manager

Careful: save and close all open applications before doing this. This will shut down your graphical display and give you a plain terminal.

An alternative configuration, assuming your system also has some kind of on-board graphics, would be to reconfigure your X11 service to not use the Quadro GPU but only the on-board GPU.
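To see up front whether the display server is occupying the GPU, a check along these lines may help. nvidia-smi ships with the NVIDIA driver; the snippet is guarded so it only prints a hint where the driver is absent:

```shell
# Check whether an X server (e.g. Xorg) currently shows up on the GPU.
# Purely informative; safe to run anywhere.
if command -v nvidia-smi >/dev/null 2>&1; then
  if nvidia-smi | grep -qi xorg; then
    x_on_gpu=yes
    echo "an X server is using the GPU; stop the display manager first"
  else
    x_on_gpu=no
    echo "no X server visible on the GPU"
  fi
else
  x_on_gpu=unknown
  echo "hint: nvidia-smi not found; is the NVIDIA driver installed?"
fi
```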

Thanks.
It is running in the terminal after I shut down the display manager.

Excellent!
Just to close the thread: can you show me the spack install problem you saw with ISL when installing GCC 7.3.0?

Also: a big thank you again for trying our 0.4.0 release candidates! Your feedback made our release tremendously more stable and reliable. Thanks a lot! :sparkles:

problem you saw with ISL when installing GCC 7.3.0?

It seems it had to do with some system setting, as it is not recurring. Thanks for providing continuous help.

Alright, thanks a lot! Feel free to open new GitHub issues if anything further arises! :)

@ajitup73 uh, one last thing: do you mind adding yourself to our community map? :)
https://github.com/ComputationalRadiationPhysics/picongpu-communitymap

