Moleculer: amqp disableBalancer = true => Reconnect not working

Created on 29 Mar 2020  路  10Comments  路  Source: moleculerjs/moleculer

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [ x] I am running the latest version
  • [ x] I checked the documentation and found no answer
  • [ x] I checked to make sure that this issue has not already been filed
  • [x ] I'm reporting the issue to the correct repository

Current Behavior

I use disableBalancer = true on service and client and rabbitmq as transport.
If the rabbitmq server restarts the client don't reach the service anymore.

If i set disableBalancer = false on both servcie and client. The reconnect works.

Expected Behavior

With disableBalancer=true, after the reconnect of the amqp-server the cllient should reach the service again.

Steps to Reproduce

I created a repo: moleculer-disablebalancer-test

  1. Given is a running rabbitmq-server
  2. start service => node source/server.js
  3. start client => node source/client.js
    Every three seconds the client calls the service. You should see "call" from client and the response as "pong"
  4. restart your rabbitmq-server
  5. after the reconnect you will only see "call" but no responses anymore

if you retry it with disableBalancer = false on both, all works like expected

Context

Please provide any relevant information about your setup. This is important in case the issue is not reproducible except for under certain conditions.

  • Moleculer version: 0.14.5
  • NodeJS version: 13.10
  • Operating System: manjaro
Need reproduce

Most helpful comment

Thanks for the writeup @me23, I'll look into it.

All 10 comments

Thanks for the writeup @me23, I'll look into it.

After some investigation I think I've figured it out.

For AMQP 0-9-1, transit.afterConnect will always fire when a connection is established and "wasReconnect" will always be false. This means that makeSubscriptions will be called every time. This part is all working well 馃憤

The issue is that transit.makeSubscriptions calls tx.makeSubscriptions, but not tx.makeBalancedSubscriptions. The result is that, on a reconnect, the action calls are correctly enqueued to the REQB queue, but REQB queues have no consumers.

Locally, the issue is fixed if in transit.js I add this code before the existing .then block here https://github.com/moleculerjs/moleculer/blob/master/src/transit.js#L259

.then(() => {
  if (this.broker.options.disableBalancer) {
    return this.tx.makeBalancedSubscriptions();
  }
})

@icebob If we make the above change, do we still need the following line? https://github.com/moleculerjs/moleculer/blob/master/src/transit.js#L1062

@Nathan-Schwartz it's a good question. I don't know. The point is, that we should call the makeBalancedSubscriptions when all services available (started) and not when the transporter connected.

If you read this about broker lifecycle you will see, that the Transit send INFO packet two times. First send it without the local services (because they are not started yet, we don't accept incoming requests) and after all local services started it will send an INFO again wit all loaded & started services. So I think the good place is in sendNodeInfo method.

By the way, the sendNodeInfo will be called again after reconnecting, so it should call the makeBalancedSubscriptions method.

Thanks @icebob, that's helpful!

It sound like we want to call makeSubscriptions but not makeBalancedSubscriptions after the transporter connects. Is this because a balanced transporter might already have messages in a queue, so we can't prevent incoming requests by sending the empty info packet?

I confirmed that sendNodeInfo is called on reconnect, but makeBalancedSubscriptions is not called because no INFO broadcast is sent (only DISCOVER and targeted INFO).

Right now I have two ideas:
1) Send a MOL.INFO broadcast when transporter reconnects. This works but it might be a protocol change and I don't know all the edge cases.

2) This is messier, but we could add a workaround to fix this issue by putting the following snippet in afterConnect:

.then(() => {
  if (!wasReconnect)
    return Promise.all([
      this.makeSubscriptions(),
      (this.broker.started && this.broker.options.disableBalancer) ? this.makeBalancedSubscriptions() : Promise.resolve(),
    ])
})

In sendNodeInfo, makeBalancedSubscriptions would still be called for the initial connection, but not for reconnects.
In afterConnect, makeBalancedSubscriptions would run for reconnects but not for the initial connection because broker.started isn't true yet.

I'll keep thinking about it and I'll let you know if I have more ideas.

What do you think @icebob?

@icebob Btw I noticed that for AMQP there are extra HEARTBEAT packets each time the transporter reconnects. I suspect we'd need to move reconnection logic from transit -> transporter to fix it. I might try it later

Update:

I found a different solution that implements amqp-connection-manager and doesn't require any changes to transit. I'm working on a PR

I was able to repro the heartbeat issue for MQTT too, so I don't think it is an AMQP issue. To see the issue, I logged sent packet types and started one node. At first it is 1 HEARTBEAT packet per 5 seconds, but every MQTT/AMQP broker restart adds one extra.

Ohh, this heartbeat issue is very interesting. How I didn't find it, yet. Did you find any reason why?

Regarding reconnection issue, I think Transit will send broadcast INFO after reconnecting because it does same things like at first connection.
I've checked the code. It is another missing logic. After reconnecting Transit sends only a DISCOVER and not a full INFO. But this logic can cause problem if other new nodes are connected between disconnecting & reconnecting. They won't receive INFO from the reconnected node and it's a problem. I will try to reproduce this use-case.

Heartbeat issue: The problem is here.
https://github.com/moleculerjs/moleculer/blob/aacf6ab70c8755249d05de9ee41b5415481ce595/src/registry/node-catalog.js#L45
After reconnecting this event is fired, so it creates timer again after reconnecting.
We should call the stopHeartbeatTimers at first.

@me23 recent commits should have fixed the issue. Would you mind testing out v0.14.6 to confirm that this issue is resolved and can be closed?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

SushKenyNeosoft picture SushKenyNeosoft  路  3Comments

kesslerdev picture kesslerdev  路  4Comments

nurdism picture nurdism  路  3Comments

icebob picture icebob  路  3Comments

dylanwulf picture dylanwulf  路  4Comments