PIConGPU: Compile suite not working

Created on 14 Oct 2019 · 13 Comments · Source: ComputationalRadiationPhysics/picongpu

Since Thursday last week (2019-10-10), the compile suite that runs the compile test of all PIConGPU commits is not responding.

@ax3l and @psychocoderHPC Could you please restart/fix it? You are the only ones with access to that server.

tools machine/system question

All 13 comments

This currently affects the following pull requests:

3086

3084

3052

Fixed. The CI was stopped last week during the power outage, and our marker that a job is still being checked remained active even though the job had already finished.

@psychocoderHPC Do we need to push the pull requests again?

@ComputationalRadiationPhysics/picongpu-maintainers Would it be a good idea to have at least a second person at HZDR who is capable of restarting the compile test?

At least we are not the only labs with power outages ;)

Everyone can restart compile tests, just push to the PR again.

The compile node is under git and gets checked in here from time to time: https://github.com/ComputationalRadiationPhysics/compileNode Its software, recipes and modules are in /opt, and lmod is used. Think of this as manual spack before spack was a thing.

The compile node is reachable via hypnos and hemera as icn019a and we can add more users there, although it should retire soon.
When logged in, the compile user is sudo su - buildbot and after power-offs / netsplits usually only the file autoTests/runGuard must be removed which safeguards the CI cronjob against already running jobs.
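
For readers unfamiliar with guard files, here is a minimal sketch of the pattern described above. It is an illustration only, not the actual buildbot code: the function names run_guarded and clear_stale_guard are hypothetical, and the real cronjob on the compile node may be implemented quite differently. The sketch shows why an interruption such as a power-off can leave the guard behind, and why removing it is enough to unblock the next run.

```python
import os
import tempfile
from pathlib import Path


def run_guarded(guard: Path, job) -> bool:
    """Run job() unless the guard file exists; return True if the job ran."""
    try:
        # O_CREAT | O_EXCL makes "check and create" one atomic step.
        fd = os.open(guard, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # a running job (or a stale guard) blocks this run
    os.close(fd)
    try:
        job()
    finally:
        # Not reached if the machine loses power mid-job,
        # which is how a stale guard gets left behind.
        guard.unlink()
    return True


def clear_stale_guard(guard: Path) -> bool:
    """Manual recovery: remove a leftover guard; True if one was removed."""
    try:
        guard.unlink()
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        guard = Path(tmp) / "runGuard"   # stand-in for autoTests/runGuard
        guard.touch()                    # simulate a guard left by an outage
        print(run_guarded(guard, lambda: None))  # blocked by the stale guard
        clear_stale_guard(guard)         # the manual fix
        print(run_guarded(guard, lambda: None))  # runs again
```

A guard like this cannot tell a crashed job from a running one, so someone has to confirm that no job is active before deleting the file by hand.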

The proxy at https://ci.plasma.ninja runs this software: https://github.com/ax3l/github_status_proxy

Another high-level overview is provided in DOI:10.5281/zenodo.15924, chapter 3.4.1.

The error was not caused by a power outage but by a 1h network downtime.
Strangely, pushing again or submitting a new pull request didn't trigger the buildbot.

@ax3l Thanks for all the information. I think that even though icn019a will be deprecated soon, a few more users would be beneficial.

> When logged in, the compile user is sudo su - buildbot and after power-offs usually only the file autoTests/runGuard must be removed which safeguards the CI cronjob against already running jobs.

Yes, this is what I did. We are well on the way to replacing this CI with our CI nodes at HZDR soon. There is only a minor issue to fix before we can move PIConGPU over to the CI nodes.

@psychocoderHPC Okay - thanks for the update. Does that mean you would prefer to wait with, or not give, access to icn019a, since it will be deprecated soon enough that we would not need access?

Discussed offline: everyone can get access, but there will be no introduction, since I am also aware of all workflows.

pls be careful, it's a fragile node and I did not check in (backup to git remote) the updates of the new modules added to it, which came in in the last year or so ;-p
