Bull: Error: Unable to renew nonexisting lock

Created on 29 Nov 2016  路  19Comments  路  Source: OptimalBits/bull

Hi,

It is the 2nd time I'm having this issue in production after upgrading to the latest version "bull": "^2.0.0". I'm still using node v4.6.0.

Error: Unable to renew nonexisting lock
1 at Object.scripts.takeLock (/home/ubuntu/project/node_modules/bull/lib/scripts.js line 314 col 13)
         throw new Error('Unable to renew nonexisting lock');
2 at Job.takeLock (/home/ubuntu/project/node_modules/bull/lib/job.js line 127 col 18)
         return scripts.takeLock(this.queue, this, renew, ensureActive)
3 at lockRenewer (/home/ubuntu/project/node_modules/bull/lib/queue.js line 644 col 16)
         return job.takeLock(renew, true).then(function(lock){
4 at null.<anonymous> (/home/ubuntu/project/node_modules/bull/lib/timer-manager.js line 80 col 5)
         fn();
5 at ontimeout (timers.js line 198 col 34)
6 at Timer.listOnTimeout (timers.js line 92 col 15)

Most helpful comment

+1 (bump)
we're receiving the same error as well.
when will a new version of the module be published to npm?

All 19 comments

Having the same issue here. Haven't had any luck trying to figure it out.

Currently using node v7.2.0

I also am seeing this error fairly often, not sure if it's related.

Error renewing lock LockError: Exceeded 0 attempts to lock the resource "bull:aob-order-synchronizer:3555:lock".

Having a similar issue as well!

Error renewing lock LockError: Cannot extend lock on resource "bull:user[415]-prospect-queue:21:lock" because the lock has already expired.

ping @doublerebel

@bradvogel on it.

@igorissen @cklmercer @emhagman Can you give me some more details about your setup? That will help me pinpoint the best solution.

  • Is Bull being used with NodeJS cluster?
  • How many concurrent workers are processing jobs?
  • Does the error crash Bull, or do you just see it in the logs?
  • Do any of these errors coincide with failed jobs?

Explanation:

In 2.0.0 we catch more errors where they were being dropped before, which is an improvement. However, in certain situations Bull just logs the error to console (related: #58, #196), so now these caught errors show in logs but _should_ be harmless.

I most often see @cklmercer 's error:

Error renewing lock LockError: Exceeded 0 attempts to lock the resource "bull:aob-order-synchronizer:3555:lock".

This happens when multiple workers attempt to process the same job and one of them doesn't win the lock [EDIT: Travis example from test suite]. This can be safely ignored. Let me know any details you can, and we'll take it from there.

@doublerebel Thank you for your reply. It's reassuring to know the cause behind that particular error message.

Regarding the primary error, here's some of those details you asked for...

I have tried my particular code both with and without the cluster module, both throw the same error.

The error appears at seemingly random times. Sometimes it's happens on the 1st job, sometimes it's 50 jobs in, and sometimes its 40k+ jobs in. (Same data, same job.)

Here's the particular bit of code that puts the jobs into the queue.

// This is a MySQL database stream. I've tried streaming result sets of various sizes.
const stream = db.query(query)

stream.on('result', item => queue.add(item, {
  delay: 100, // From 0 - 500
  attempts: 3, // From 0 - 5
  removeOnComplete: true // false
 }))

Here's the code that processes the queued jobs.

queue.process((job, done) => {
  let order = job.data

  // use promise to perform upsert into a different database
  app.service('order')
     .find(order.id)
     .then(found => updateOrCreateOrder(order, found))
     .catch(reason => console.error(reason))

  // nothing to return

  done()
})

Some times I'll have some schema validation error bubble up from the database, but those are logged to the console and operation doesn't miss a beat.

Whenever I receive the error from the OP node _seemingly_ crashes completely.

@doublerebel

  • Is Bull being used with NodeJS cluster?

Using PM2 to manage all processes in fork mode -x.

  • How many concurrent workers are processing jobs?

Using only 1 concurrent worker for all queues.

  • Does the error crash Bull, or do you just see it in the logs?

Yes, Bull is crashing I have 4 errors in Rollbar and 4 restarts for one queue process.

  • Do any of these errors coincide with failed jobs?

I don't think because I'm using Rollbar to manage all errors even uncaugthException. I'm tracking the error when removing a completed job.

queue.on("completed", function (job) {
  Logger.info("Job #" + job.jobId + " completed");
  job.remove()
    .then(function () {
      Logger.info("Job #" + job.jobId + " removed");
    })
    .error(function (err) {
      Logger.error(err);
      ErrorTracker.handleErrorWithPayloadData(err, { level: "error", custom: job.data });
    });
});
queue.on("failed", function (job, err) {
  Logger.info("Job #" + job.jobId + " failed");
  Logger.error(err);
});
  • Is Bull being used with NodeJS cluster?
    Using cluster module with two workers (cluster) on my local machine (inside Docker w/ Node 6.6.0) and one separate process to handle the queues.

  • How many concurrent workers are processing jobs?
    Using 1 concurrent worker for all queues.

  • Does the error crash Bull, or do you just see it in the logs?
    No Bull is not crashing.

  • Do any of these errors coincide with failed jobs?
    Yes, every time this happens it appears that the job fails and then tries to run again.

Thanks all for the feedback, that's a great help. I'll take a look at this right away.

@doublerebel Thank you!

@emhagman in your case:

LockError: Cannot extend lock ... because the lock has already expired.

it may be a normal job failure. The error is caught, the job is failed and retries. It sounds like something may be blocking the event loop (high CPU use?) or slowing redis communication (high redis usage?), because either one of those situations could cause the lock to expire before redis recieves and processes a renew request. Does that sound plausible? If not I can look into it further.

@igorissen @cklmercer in your cases:

Error: Unable to renew nonexisting lock
...
    at lockRenewer (/home/ubuntu/project/node_modules/bull/lib/queue.js line 644 

I think I know what is happening. There is some processing between where Bull releases the lock, and where the timer is canceled, meaning it is possible the lock reaches renewal after being released but before the async chain reaches the renew cancellation.

This should be harmless, except that it really shouldn't happen especially if we don't catch the error and it's crashing the process. I'm working on a test case here so that I can make sure I'm reproducing your error and it's prevented from happening again.

@doublerebel
Regarding error when lock has already expired. Could we just get the lock anyways? I mean, the only problem is if some other worker did "steal" the lock from the first worker that is processing the job. So if the lock expired it could just log a warning ("Lock expired, process too slow?"), and just take the lock again and continue working.

Regarding the second error, cant we just change the order: first cancel the timer and then release the lock?

I am having the same problem: Error: unable to renew nonexisting lock
It was actually happening quite often, as a temporary solution I did this:

/lib/queue.js 643
var lockRenewer = function(){ try { return job.takeLock(renew, true).then(function(lock){ if(lock){ renew = true; lockRenewId = _this.timers.set('lockRenewer', _this.LOCK_RENEW_TIME / 2, lockRenewer); } // TODO: if we failed to re-acquire the lock while trying to renew, should we let the job // handler know and cancel the timer? return lock; }, function(err){ console.error('Error renewing lock ' + err); }); } catch(err) { console.error('Caught renew lock problem ' + err); } };

Seems to be working :)

@manast I think you're correct about @emhagman 's case. Perhaps we should raise a warning event or provide an option for the user to add their own logging, as in #58? I'd like to separate that case from the original nonexisting lock error here.

In the original case, if @hiddestokvis and @floolean are satisfied with just catching the error, then (you're right) it should be an easy fix to clear the timer right before releasing the lock. I've been able to reproduce the case here:
https://travis-ci.org/nextorigin/bull/builds/180869746

I'll write up the fix now and make the PR.

+1 Receiving this error -- Is there a new npm version with this fix?

Error renewing lock LockError: Exceeded 0 attempts to lock the resource "bull:integrations-dev:93:lock".
Error: Unable to renew nonexisting lock at Object.scripts.takeLock (node_modules/bull/lib/scripts.js:314:13) at Job.takeLock 
(node_modules/bull/lib/job.js:127:18) at lockRenewer 
(node_modules/bull/lib/queue.js:644:16) at Timeout.<anonymous> 
(node_modules/bull/lib/timer-manager.js:80:5) at Timeout.ontimeout (timers.js:336:34)
    at tryOnTimeout (timers.js:224:11)
    at Timer.listOnTimeout (timers.js:198:5) '*** uncaughtException'

+1 (bump)
we're receiving the same error as well.
when will a new version of the module be published to npm?

+1

@Floby Are you still getting this issue with version 2.2.6?

please and thank you :]

This is 2.2.6
I'm getting the message Cannot extend lock on resource "bull:..." because the lock has already expired.
Is it possible that this sort of problem could flag the job as failing ?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

sibelius picture sibelius  路  3Comments

pigaov10 picture pigaov10  路  3Comments

PhillippOhlandt picture PhillippOhlandt  路  4Comments

DevBrent picture DevBrent  路  4Comments

davedbase picture davedbase  路  3Comments