Picongpu: Fail to restart simulation

Created on 2 Dec 2019 · 5 Comments · Source: ComputationalRadiationPhysics/picongpu

Hi, I am running a simulation with a large number of time steps. The walltime is limited to 24 hours, so I restart from the previous checkpoint. I have already restarted twice without problems. On the third restart I got the following error:

WARN : ADIOS is set to abort on error
terminate called after throwing an instance of 'std::runtime_error'
  what(): ADIOS: File does not exist.
[gpu27:19242] *** Process received signal ***
[gpu27:19242] Signal: Aborted (6)
[gpu27:19242] Signal code: (-6)
[gpu27:19242] [ 0] /lib64/libpthread.so.0(+0xf7e0)[0x7ff2f3d4b7e0]
[gpu27:19242] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x7ff2f01604f5]
[gpu27:19242] [ 2] /lib64/libc.so.6(abort+0x175)[0x7ff2f0161cd5]
[gpu27:19242] [ 3] /apps/compilers/gnu/6.4.0/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x7ff2f09958ed]
[gpu27:19242] [ 4] /apps/compilers/gnu/6.4.0/lib64/libstdc++.so.6(+0x8e8a6)[0x7ff2f09938a6]
[gpu27:19242] [ 5] /apps/compilers/gnu/6.4.0/lib64/libstdc++.so.6(+0x8e8f1)[0x7ff2f09938f1]
[gpu27:19242] [ 6] /apps/compilers/gnu/6.4.0/lib64/libstdc++.so.6(+0x8eb08)[0x7ff2f0993b08]
[gpu27:19242] [ 7] /work2/hpce3/jfuhong/runs/BLASTER00013D_4/input/bin/picongpu[0x66d2c5]
[gpu27:19242] [ 8] /work2/hpce3/jfuhong/runs/BLASTER00013D_4/input/bin/picongpu[0x5e1ad6]
[gpu27:19242] [ 9] /work2/hpce3/jfuhong/runs/BLASTER00013D_4/input/bin/picongpu[0x61dc2a]
[gpu27:19242] [10] /work2/hpce3/jfuhong/runs/BLASTER00013D_4/input/bin/picongpu[0x6be8db]
[gpu27:19242] [11] /work2/hpce3/jfuhong/runs/BLASTER00013D_4/input/bin/picongpu[0x6bf970]
[gpu27:19242] [12] /work2/hpce3/jfuhong/runs/BLASTER00013D_4/input/bin/picongpu[0x42bb9b]
[gpu27:19242] [13] /lib64/libc.so.6(__libc_start_main+0x100)[0x7ff2f014cd20]
[gpu27:19242] [14] /work2/hpce3/jfuhong/runs/BLASTER00013D_4/input/bin/picongpu[0x42d1f1]
[gpu27:19242] *** End of error message ***

I attached the stderr for run3 and run4.

stderr.zip

plugin question

All 5 comments

Hello @StevE-Ong, thanks for your report.

Sorry for suggesting something very basic. The error says that the checkpoint file cannot be opened. This could be because the path is wrong, or because something happened to the file or to your filesystem. To check the path, look in your stdout file, which should contain a line of the form ADIOS: open file: [filename], and verify that this is the right path. That should confirm or rule out the wrong-path conjecture. Unfortunately, the other two causes would be trickier to investigate.
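Before resubmitting a restart job, the path check can also be done by hand. The following is only a minimal sketch: it assumes the checkpoint for a step is a single file named checkpoint_<step>.bp inside the checkpoint directory (the naming used later in this thread), and the helper names checkpoint_path and check_checkpoint are made up for illustration, not part of PIConGPU.

```python
from pathlib import Path


def checkpoint_path(checkpoint_dir, step):
    """Build the expected checkpoint file path for a time step.

    Assumption: files are named checkpoint_<step>.bp.
    """
    return Path(checkpoint_dir) / f"checkpoint_{step}.bp"


def check_checkpoint(checkpoint_dir, step):
    """Return a short diagnosis for the checkpoint of the given step."""
    if not Path(checkpoint_dir).is_dir():
        return "directory missing"
    path = checkpoint_path(checkpoint_dir, step)
    if not path.is_file():
        return "file missing"
    if path.stat().st_size == 0:
        return "file empty"
    return "ok"
```

For example, check_checkpoint("simOutput/checkpoints", 10000) would be called with whatever directory and step the restart is supposed to use. Anything other than "ok" points to the wrong-path case rather than to a damaged file.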

Hello @sbastrakov, thank you for your reply.
The path is now corrected and the simulation is running. You mentioned ADIOS: open file: [filename] in the stdout file, but I have never seen this line since my first run of PIConGPU. Is there something I missed? I only see the ADIOS version printed.

Glad it works now @StevE-Ong .

Sorry, my earlier description of the output was imprecise. We do have this output, but it is not enabled by default. To enable it, one needs to rebuild and pass -c "-DPIC_VERBOSE_LVL=32" as an option to pic-build.
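Once the verbose output is enabled, the opened file names can be pulled out of the stdout file with a few lines of Python. This is a sketch under the assumption that the line has exactly the form quoted above (ADIOS: open file: followed by the file name); the real log line may carry extra prefixes, and the function name opened_files is hypothetical.

```python
import re

# Assumed log format: "ADIOS: open file: <filename>".
# A path containing spaces would be cut at the first space.
OPEN_FILE_RE = re.compile(r"ADIOS: open file:\s*(\S+)")


def opened_files(stdout_text):
    """Return every file name reported as opened by ADIOS, in log order."""
    return [match.group(1) for match in OPEN_FILE_RE.finditer(stdout_text)]
```

Passing the contents of the stdout file to opened_files gives the list of paths to compare against the checkpoint that was meant to be loaded.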

Shall I close this issue?

I see. Thanks.

Did you perhaps go to simOutput/checkpoints/ and run bpmeta checkpoint_<yourStep>.bp to generate a .bp meta file from the .bp.dir raw data?
You only have to do this if you use the disable-meta option in ADIOS1 checkpoints. Setting this option is recommended for large-scale runs because it gives faster I/O, but it requires this additional step with bpmeta.
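To see which checkpoints still need the bpmeta step, one can list the steps that have raw data but no meta file. The sketch below assumes the raw data lives in a directory named checkpoint_<step>.bp.dir next to the expected checkpoint_<step>.bp meta file; that layout and the function name steps_missing_meta are assumptions for illustration.

```python
import re
from pathlib import Path

# Assumed layout: raw data in checkpoint_<step>.bp.dir/, meta file in checkpoint_<step>.bp
RAW_DIR_RE = re.compile(r"^checkpoint_(\d+)\.bp\.dir$")


def steps_missing_meta(checkpoint_dir):
    """Return the sorted time steps that have raw data but no .bp meta file."""
    missing = []
    for entry in Path(checkpoint_dir).iterdir():
        match = RAW_DIR_RE.match(entry.name)
        if match and entry.is_dir():
            # Drop the trailing ".dir" to get the meta file name.
            meta_file = entry.with_name(entry.name[: -len(".dir")])
            if not meta_file.is_file():
                missing.append(int(match.group(1)))
    return sorted(missing)
```

Each step returned here is a candidate for running bpmeta as described above before restarting from it.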
