When calling Job#finished on job which is already removed by call to Job#remove or has removeOnComplete true in JobOpts then this function never resolves or rejects.
First test case:
``` node js
let Queue = require("bull");
let queue = new Queue("queue");
queue.process(_job => console.log("started"));
(async function()
{
let job = await queue.add({});
await job.finished(); // ensure job finished so we can remove it, we could also wait here some time
await job.remove();
console.log("removed");
await job.finished(); // finished on removed job never resolves or rejects
console.log("finished"); // this is never printed
})();
Second test case:
``` node js
let Queue = require("bull");
let queue = new Queue("queue");
queue.process(_job => console.log("started"));
(async function()
{
let job = await queue.add({}, { removeOnComplete: true });
await job.finished(); // wait until is job completed and removed, also could be used await Promise.delay(1000)
console.log("finished 1");
await job.finished();
console.log("finished 2"); // this is never printed
})();
3.10.0
In function Job#finished is first check whether is job finished and if yes then it resolves or rejects immediately. In other case it is waiting for global events completed or failed and then either resolves or rejects but in case job is already removed isFinished probably returns that it is not finished and it will never fire completed or failed events.
Also it is not enough to check whether job exists here. Job#finished relies on assumption that job cannot be removed without firing completed or failed events. But it can be removed in case job is removed before it had chance to run. Job#finished must listen also for global:removed event. Unfortunately this event is still not implemented:
removeJob-9.lua:
-- TODO PUBLISH global event 'removed'
At last if would be listeners for global:completed and global:failed registered before call to isFinished then there would be no need to have that "watchdog" checking isFinished every 5 seconds.
I think that for this particular case, isFinished should just throw and exception if the job does not exist.
You meant reject as it is returning promise.
There will still be at least 5 seconds waiting in Job#finished when happens that job is removed during waiting after was isFinished called. Ideal solution would be implemented global:removed event.
In current solution there still can occur this 5 seconds waiting if is job completed or failed right after isFinished and before listening on events. There are some calls at start of Job#finished like _registerEvent (I first thought from that point it can catch these events but not). I did not mean these. I meant line where it actually starts listening on events:
node js
this.queue.on('global:completed', onCompleted);
this.queue.on('global:failed', onFailed);
Ideally this should be before call to isFinished so it will never wait 5 seconds (watchdog interval) and there will be no need to have this watchdog.
Another good reason to have global:removed is to delete associated resources together with job. I assume it is not very good to store large files in job data. Jobs with removeOnComplete/removeOnFail are removed without even local event "removed" and as removeOnComplete/removeOnFail can be also "A number specified the amount of jobs to keep" it is hard to predict when which job will be removed and when to remove associated files (lets say one does not want to remove them when is task finished or failed for possible manual job retries, inspection and so on).
Maybe it is another bug that local removed is not called when using removeOnComplete/removeOnFail.
I'm having a similar issue. I use removeOnComplete: true to create a large number of jobs and when I to do a Promise.all on all the job.finished results it never completes.
@manast wouldn't you consider this a bug?
@sagivf yes, sorry for the late response. It has been a busy summer. I would like to fix the issue but regarding the watchdog we still need it because events based on pubsub are not 100% reliable. On Bull 4.0 we base events on XSTREAMS and there we do not need the watchdog for this, see the implementation here: https://github.com/taskforcesh/bullmq/blob/development-4.0/src/classes/job.ts#L291-L355
@manast I suppose in bull 4.0 Scripts.isFinished throws if job is already removed. But consider these 2 cases:
Removal of delayed job:
Can be solved by listening on event "removed".
This could probably happen only if bull uses different redis connection for Scripts.isFinished and for receiving events
Can be solved by registering listeners before call to Scripts.isFinished assuming strong consistency on redis side.
Also there is still watchdog which although does not look so awful can be eliminated completely. When is queue closing maybe it also could fire some local event on which this function can listen coupled by one-time check of this.queue.closing instead of checking this.queue.closing in intervals.
@manast thanks for your reply, sorry i'm late to reply this time :)
I'm not versed enough in the project (nor redis and watchdog) but if there is anything I can do to help and you can point me in the right direction I can try to help. We have a work around in place as a fix for now.
Any update on this, I am getting this problem a lot. My process get hung with await job.finished() when it is removed on completion.
I increased the prio
I'm running into this issue as well.
I tried to workaround it and remove the job after the job.finihsed() is fired.
it seems like I see another issue now (maybe we should open a separate issue about it as well), the jobs are removed from the Bull Queues successfully, but sometimes (happens indeterminately) some of the jobs are not getting deleted from Redis causing a memory leak there.
Is someone actively working on that?
Most helpful comment
@manast wouldn't you consider this a bug?