Moleculer: 'Orphan response is received' when using NATS balancer and stream params

Created on 2 Apr 2020  路  4Comments  路  Source: moleculerjs/moleculer

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest version
  • [x] I checked the documentation and found no answer
  • [x] I checked to make sure that this issue has not already been filed
  • [x] I'm reporting the issue to the correct repository

Current Behavior

Assuming the following conditions are true:

  • I am using NATS transporter
  • disableBalancer is set to true
  • I am calling an action on the same node and passing a Stream as params (problem does not occur if the action is on a different node)

If the called action consumes the stream and resolves after the stream's end event fires, then I am seeing the message 'Orphan response is received' and the caller never receives a response.
If the called action consumes the stream and resolves before the stream's end event fires, then my action fires twice even though I only called it once.

Expected Behavior

Response should be returned to the action caller as soon as my action resolves, and my action should not be called twice.

Failure Information

Steps to Reproduce

Please provide detailed steps for reproducing the issue.

  1. Install nats and run on port 4222
  2. Clone repo https://github.com/dylanwulf/moleculer-stream-sample
  3. Run npm install to install packages
  4. Run node index.js to run reproduction
  5. See message in console Orphan response is received. Maybe the request is timed out earlier.

Context

Please provide any relevant information about your setup. This is important in case the issue is not reproducible except for under certain conditions.

  • Moleculer version: 0.14.5
  • NATS version: 1.4.8
  • NodeJS version: 12.16.1
  • Operating System: Windows 10 x64
Transporter In Progress Bug

Most helpful comment

I've been working with @dylanwulf to figure out what is happening here. I believe its an issue with using a transporter (due to disableBalancer being true) but having the same node sending both the request and the response.

In digging through the two different ways that his examples process the stream, I think different timings cause the two different improper behaviors.

In the readStreamWithReadableEvent example, the stream gets consumed by the called action. It then responds. The caller gets the response and in transit.js will call this.removePendingRequest. This removes all requests streams from both the sender and receiver of the stream because they are the same node. The caller then sends the stream close message, but there is no longer a stream in the pendingReqStreams Map. It then fails to properly close the stream and treats it like a regular action call. Since the action expects to get a stream, the action throws an error.

In the readStreamWithDataEvents example, the stream gets consumed by the called action but does not respond as it is awaiting the stream end event. The sender passes the message to close the stream, and the receiver proceeds to close the stream and then calls this.removePendingRequest which again, removes the reference to this stream for both the sender and receiver. The stream end event gets emitted, the response is sent, but the sender no longer has a reference to the request so it treats it as an orphan.

This explains why the problem does not occur when the sender and receiver of the streams are local (no transit) and why it works if the sender and receiver are different nodes (different transit instances and different request maps). So this problem appears specific to using disableBalancer but calling actions that are on the same node as each other.

All 4 comments

I've been working with @dylanwulf to figure out what is happening here. I believe its an issue with using a transporter (due to disableBalancer being true) but having the same node sending both the request and the response.

In digging through the two different ways that his examples process the stream, I think different timings cause the two different improper behaviors.

In the readStreamWithReadableEvent example, the stream gets consumed by the called action. It then responds. The caller gets the response and in transit.js will call this.removePendingRequest. This removes all requests streams from both the sender and receiver of the stream because they are the same node. The caller then sends the stream close message, but there is no longer a stream in the pendingReqStreams Map. It then fails to properly close the stream and treats it like a regular action call. Since the action expects to get a stream, the action throws an error.

In the readStreamWithDataEvents example, the stream gets consumed by the called action but does not respond as it is awaiting the stream end event. The sender passes the message to close the stream, and the receiver proceeds to close the stream and then calls this.removePendingRequest which again, removes the reference to this stream for both the sender and receiver. The stream end event gets emitted, the response is sent, but the sender no longer has a reference to the request so it treats it as an orphan.

This explains why the problem does not occur when the sender and receiver of the streams are local (no transit) and why it works if the sender and receiver are different nodes (different transit instances and different request maps). So this problem appears specific to using disableBalancer but calling actions that are on the same node as each other.

I think the solve here would be to be more precise with which of the request and response maps are cleared under various scenarios. Currently, they are all cleared at the same time. But since the sending and receiving nodes have different points in time when the maps are necessary, clearing all simultaneously causes an issue when the sender and receiver are on the same node but transport is being used.

Yeah, the problem is what @shawnmcknight described, Transit uses the same Maps to store streams for both of send & receive sides. As disableBalancer is enabled, moreover the consumer & producer nodes are same, thus Transit overwrites the sent stream with the received stream.
Thanks @dylanwulf the reproduction code, It's very useful to reproduce it.
I'm starting to write the "failed" unit tests for this case and after that, I'm going to fix it.

Hm, I was wrong. The request store Maps are good, just there was a wrong cleanup code in the request stream handling method.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

darky picture darky  路  5Comments

rozhddmi picture rozhddmi  路  4Comments

ngraef picture ngraef  路  4Comments

molobala picture molobala  路  3Comments

developez picture developez  路  5Comments