One process failed, and the .nextflow.log file shows the following error snippet:
一月-30 17:22:35.267 [Task monitor] DEBUG nextflow.executor.GridTaskHandler - Failed to get exist status for process TaskHandler[jobId: 46706; id: 68; name: quantify_01assemble_stat (EXN4_2); status: RUNNING; exit: -; error: -; workDir: /mnt/beegfs/u-analysis/UserData/yhfu/visual/6059b55cf1d241a8_1548678182040/work/df/9addc0825bc46ef13fb658d0535dd9 started: 1548814647433; exited: -; ] -- exitStatusReadTimeoutMillis: 270000; delta: 8345432
Current queue status:
> (null)
Content of workDir: /mnt/beegfs/u-analysis/UserData/yhfu/visual/6059b55cf1d241a8_1548678182040/work/df/9addc0825bc46ef13fb658d0535dd9
null
一月-30 17:22:35.275 [Task monitor] DEBUG nextflow.executor.GridTaskHandler - clear complete job 46706 result : true
一月-30 17:22:35.276 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 46706; id: 68; name: quantify_01assemble_stat (EXN4_2); status: COMPLETED; exit: -; error: -; workDir: /mnt/beegfs/u-analysis/UserData/yhfu/visual/6059b55cf1d241a8_1548678182040/work/df/9addc0825bc46ef13fb658d0535dd9 started: 1548814647433; exited: -; ]
一月-30 17:22:35.438 [Task monitor] INFO nextflow.processor.TaskProcessor - [df/9addc0] NOTE: Process `quantify_01assemble_stat (EXN4_2)` terminated for an unknown reason -- Likely it has been terminated by the external system -- Execution is retried (1)
I then checked the process's .exitcode file and it contained 0. Using qacct -j 46706, I can see that the job finished at 18:40, so why did Nextflow think the process had finished at 17:22?
Expected behavior: Nextflow should report the real status of a job running on SGE, consistent with the final status reported by SGE.
Actual behavior: Nextflow reports the job as terminated before it has actually finished.
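The log line above prints two numbers, exitStatusReadTimeoutMillis: 270000 and delta: 8345432, together with an empty queue status. A minimal sketch of the kind of decision these fields suggest is shown here. It is a simplified model, not Nextflow's actual code: the function name, its parameters, and the reading of delta as "time spent waiting without seeing either the job or its .exitcode file" are all assumptions.

```python
from typing import Optional

# Value printed in the log line above.
EXIT_STATUS_READ_TIMEOUT_MS = 270_000


def resolve_exit_status(exit_file_visible: bool,
                        job_listed_in_queue: bool,
                        waited_ms: int,
                        timeout_ms: int = EXIT_STATUS_READ_TIMEOUT_MS) -> Optional[str]:
    """Hypothetical model of the handler's decision (an assumption, not Nextflow source).

    Returns None while the handler should keep waiting, otherwise a verdict string.
    """
    if exit_file_visible:
        return "read-exitcode"      # normal path: the exit status can be read
    if job_listed_in_queue:
        return None                 # scheduler still reports the job: keep waiting
    if waited_ms < timeout_ms:
        return None                 # grace period for slow shared file systems
    return "unknown-failure"        # job not listed, no exit file, timeout exceeded


# Inputs taken from the log: queue status was empty and delta was 8345432 ms.
print(resolve_exit_status(False, False, 8_345_432))
```

Under this model, a job that is still running but missing from the queue listing is treated the same as a job that died without writing .exitcode, which would match the "terminated for an unknown reason" message.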
Sorry, this happens only intermittently, but a bit too often. The process and the result are always the same, and I always use the same environment.
Please check my uploaded .nextflow(2).log file. It is a little large because I set NXF_TRACE=nextflow.
.nextflow (2).log
I checked the SGE qmaster log for 16:00-18:00 and found lots of chdir errors, which came from another colleague's tasks. Those errors are also very strange, so I think there may have been a problem with the distributed file system. I'm using the BeeGFS file system here.
So I would like to know whether you have ever encountered such an error, and what could lead to this situation.
Is the log produced with a custom nextflow build?
Er... I have built Nextflow myself, but my build only writes out some files describing the running jobs, which are watched by external tools. It does not affect Nextflow's core execution logic. As for the job termination issue above, it happens with both the official Nextflow and my own build. I have already tested this.
Today I happened to catch another process termination error. See the next two screenshots:
Nextflow reported the error on 2-1 at 17:36

The job was still running on 2-2; the qstat output for the job is shown below

I'm closing this issue because the log file has not been produced with the Nextflow distribution and does not contain the expected information. Therefore it is useless.
One reason for this problem can be that you use a cluster submission queue that is an alias for another. As a consequence, Nextflow successfully submits jobs to it, however, checking for running jobs using qstat -q ... results in an empty list.
Since Nextflow doesn't check immediately for running jobs, this problem only affects jobs running for a few minutes.
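The alias scenario described in this comment can be modelled with a small sketch. It assumes, as the comment states, that the queue-filtered listing only returns jobs whose actual queue matches the filter. The queue names `short` and `all.q` and the helper function are hypothetical and do not come from the original report.

```python
from typing import Dict, Set


def jobs_visible_for_queue(scheduler_jobs: Dict[str, str], queue_filter: str) -> Set[str]:
    """Model of a queue-filtered job listing (hypothetical helper).

    scheduler_jobs maps job id -> the queue the scheduler actually placed it in.
    """
    return {job_id for job_id, queue in scheduler_jobs.items() if queue == queue_filter}


# Hypothetical setup: the job was submitted to the alias "short",
# but the scheduler runs it in the real queue "all.q".
running = {"46706": "all.q"}
submitted_to = "short"

visible = jobs_visible_for_queue(running, submitted_to)
print(visible)  # the running job is not in the filtered listing
```

If the listing is filtered by the alias name, the job never appears in it even though it is running, so a monitor that relies on that listing would see an empty queue.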
More likely it was due to #1045