Bull: Reconnection problem

Created on 16 Mar 2018  ·  21Comments  ·  Source: OptimalBits/bull

Description

I create a queue and add tasks to queue every 10 seconds. Use bluebird library to set a delay in a task processor.

The point is - I'm trying to check behavior when redis connection is lost. When I switch off the redis server in prcessors work phase, everything is ok, we finish the task, reconnect and continue task processing.

If I try to switch off the redis server, when there is no tasks in queue, task processing is stop and it doesn't reconnect to the server. The connection is in "end" status.

I need to get any way to establish a connection after redis server was restarted. It's vital for our project.

Logs when I restart redis server on waiting period of the time (non-task processing time):

➜  queue git:(master) ✗ node test.js
start task
resolve task!
start task
resolve task!
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: ReplyError: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context

Logs when I restart redis server while a task is executing (processing time):

➜  queue git:(master) ✗ node test.js
start task
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: ReplyError: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context
resolve task!
Error in bull queue happend: Error: Missing lock for job 1 failed
start task
resolve task!
start task
resolve task!

Look at the end of the copy-paste - new tasks are processing. Everything looks fine unlike the previous example.

Test code to reproduce

const Promise = require('bluebird');
const Queue = require('bull');

const queue = new Queue('test', {
  redis: {
    host: 'localhost',
    port: 6379
  }
});

queue
  .on('error', function (error) {
    console.error(`Error in bull queue happend: ${error}`);
  })
  .on('failed', function (job, error) {
    console.error(`Task was failed with reason: ${error}`);
  });

queue.process('test_task', () => {
  console.log('start task');
  return Promise.delay(5000).then(() => {
    console.log('resolve task!');
  });
});

setInterval(function () {
  queue.add('test_task', {});
}, 10000);

Bull version

bull - 3.3.10, node - v6.10.2

bug

Most helpful comment

Hi,

We had a problem of re-connection on our queues, whenever the redis was stopped the "listening" queues could not restart and there was this message each time:

ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context

We solved it.

Problem is that when reconnecting ioredis will check if server is ready. This check (https://github.com/luin/ioredis/blob/6dd3730617ece16f874e3ce3e5da0e113ac272d3/lib/redis.js#L413) use the info command which is not allowed in the queue context.

So we just had to disable the enableReadyCheck option on our configuration to stop this check. After that our config look like :

Queue(config.queueName, {
  redis: {
    port: xxx,
    host: xxx,
    maxRetriesPerRequest: null,
    enableReadyCheck: false
  },
});

All the queues are now reconnecting has they should do and processing remaining jobs. :+1:

We hope this option solves some of the connection problems in this thread (maybe not ...), @ks-s-a you might test this ;-)

All 21 comments

I have not tested further, but the signature to your process function is wrong, you can only have one argument, job, and done, but if you specify the done callback you need to use it!.

queue.process('test_task', function (job) {
  console.log('start task');
  return Promise.delay(5000).then(() => {
    console.log('resolve task!');
  });
});

actually, after fixing the signature I could test that the queue works well even with redis connection interruptions. please verify and close if it works for you.

@manast Ok. Throw out my small wrappers and use only vanilla library. The result is the same.

Added log outputs and cleaned code snippet. (see topic starter message)

please try version 3.3.10

@manast the same story. =(

Fixed version in description.

@manast Is it reproducible on your machine?

yes I did, seems like a bug in ioredis: seems like it handles incorrectly blocking commands in the event of a disconnection. I did not have time to report it as an issue and/or find a workaround.

I did some tests where using this redis option did help: {enableOfflineQueue: false} could be a temporary workaround until ioredis gets fixed.

@manast It didn't help me. Absolutely the same result of the test.

@manast Could you add a link to the ioredis issue? It's important for me to track it. May be if I have some time, I'll try to help them or will investigate the problem.

Is there any new info about this. The ticket raised with the ioredis maintainer has not been updated. Are there any known workarounds for this issue?

@lincoln-spiteri I tried the workaround suggested in ioredis#610 - disabling the built-in retryStrategy and reconnecting manually. It's not great, resending seems broken for example, but it stays connected. I'm also using it with another library, and this library is throwing a few exceptions every time I reconnect...

Also hoping for a better solution soon.

Sorry for the late response. I tested locally using the code provided by @ks-s-a but the result looks correct for me:

bull-test ❯ node index.js                                                                                                                  ⏎
start task
resolve task!
start task
resolve task!
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
start task
resolve task!
start task

I restarted the redis server multiple times at different stages (non-task processing time & processing time) but the result stayed the same. Did I miss something?

Tested with bull 3.3.10 & 3.4.1.

Unhandled rejection Error: connect ECONNREFUSED 127.0.0.1:6379 at Object.exports._errnoException (util.js:1018:11) at exports._exceptionWithHostPort (util.js:1041:20) at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1086:14)

Hi,

We had a problem of re-connection on our queues, whenever the redis was stopped the "listening" queues could not restart and there was this message each time:

ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context

We solved it.

Problem is that when reconnecting ioredis will check if server is ready. This check (https://github.com/luin/ioredis/blob/6dd3730617ece16f874e3ce3e5da0e113ac272d3/lib/redis.js#L413) use the info command which is not allowed in the queue context.

So we just had to disable the enableReadyCheck option on our configuration to stop this check. After that our config look like :

Queue(config.queueName, {
  redis: {
    port: xxx,
    host: xxx,
    maxRetriesPerRequest: null,
    enableReadyCheck: false
  },
});

All the queues are now reconnecting has they should do and processing remaining jobs. :+1:

We hope this option solves some of the connection problems in this thread (maybe not ...), @ks-s-a you might test this ;-)

@adogor Thank you for pointing this out. I'm wondering how to reproduce the error since it works well on my machine with the code provided by @ks-s-a . The error is strange since ioredis should handle the case that ready check is enabled while a subscribe command is issued.

@luin tested same code provided by @ks-s-a , still have problem with reconnecting, same output as @ks-s-a provided. Any updates for this issue?

Windows 10
Node v10.16.0 / npm 6.9.0
bull version 3.10.0
redis: Docker (Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=7983a619928f1f2d)
D:\Code\nodetest>node test.js 
start task
resolve task!
start task
resolve task!
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: ReplyError: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context

no more information even waiting a long time.

If I change Promise.delay(5000) to Promise.delay(1), then new job will not be processed anymore. New job is added one every 10s but not be processed.
image
And enableReadyCheck: false option makes it resume from this situation.

@adogor Does not work for me. (3.10.0).
I get another error: connect ETIMEDOUT when my application tries to add new jobs. Before it was the connection refused error.

Are there any updates on this?

@psychonetic if you have a specific use-case where your problem is reproduced I recommend you to post that, because most issues in this thread are obsolete or user-code specific.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

inn0vative1 picture inn0vative1  ·  4Comments

sibelius picture sibelius  ·  3Comments

DevBrent picture DevBrent  ·  4Comments

thelinuxlich picture thelinuxlich  ·  3Comments

alolis picture alolis  ·  4Comments