Node-slack-sdk: error: message received with unknown reply_to after every message

Created on 14 Jul 2016  路  21Comments  路  Source: slackapi/node-slack-sdk

Hello,

I noticed that I'm seeing this error every time I send a message. It's happening because, after sending a message, my app sees two responses. Upon sending a message, e.g.:

{ text: 'Foo',
  channel: 'G1Q9TF94J',
  type: 'message',
  id: 1 }

It first sees the message it just posted (the "echo"):

{ reply_to: 1,
  type: 'message',
  channel: 'G1Q9TF94J',
  user: 'U1Q2DRD6D',
  text: 'Foo',
  ts: '1468520040.000085' }

It later receives an ack for the same message:

{ ok: true,
  reply_to: 1,
  ts: '1468520064.000086',
  text: 'Foo' }

The first incoming message (the echo) causes the message state to be cleared inside RTMClient.prototype._handleMsgResponse; when the second message arrives (the ack), the state has been cleared so I get the error message received with unknown reply_to.

To put this in more concrete terms, after sending a message, I'm getting _both_ a messageAck and a mostRecentMsgReply (echo). The only way I can think of solving this is to store a separate list of valid outgoing replyTo IDs and, when an ack or echo comes back for one of these, for which the handler has already been called, we just drop it silently.

bug

All 21 comments

I'm going to go out on a limb here and say: The code's behavior as-is appears correct. What concerns me is that you are getting these message echos. That should basically not happen. Of course, when it does, yes it will get deleted and the ack will fail for sure. But, in truth that error message is harmless鈥攁nnoying at worst.

What I'd like to understand is why you are seeing these echos. That's most definitely an issue, albeit with our Message Servers. Do you have some server logs you can share with us?

Unfortunately I no longer have access to the slack instance in question. Will revert if I have a chance to test this on another instance.

Yeah, I'm getting this.

I'm using the code from the README.md, with my modification from https://github.com/slackapi/node-slack-sdk/issues/300#issuecomment-270806014 to get the actual channel id of #general.

Here's what I get:

Logged in as gandelf of team Magikcraft Servers, but not yet connected to a channel
Member of general at C3NF1QWLE
Connected!
null
Unhandled rejection Error
    at RTMClient.handlePendingMessage (/shire/node_modules/@slack/client/lib/clients/rtm/client.js:570:25)
    at wrapper (/shire/node_modules/lodash/lodash.js:4968:19)
    at arrayEach (/shire/node_modules/lodash/lodash.js:537:11)
    at forEach (/shire/node_modules/lodash/lodash.js:9359:14)
    at RTMClient._handleMostRecentMsgReply (/shire/node_modules/@slack/client/lib/clients/rtm/client.js:563:3)
    at RTMClient._handleWsMessageViaEventHandler (/shire/node_modules/@slack/client/lib/clients/rtm/client.js:461:12)
    at RTMClient.handleWsMessage (/shire/node_modules/@slack/client/lib/clients/rtm/client.js:419:10)
    at WebSocket.wrapper (/shire/node_modules/lodash/lodash.js:4968:19)
    at emitTwo (events.js:106:13)
    at WebSocket.emit (events.js:191:7)
    at Receiver.ontext (/shire/node_modules/ws/lib/WebSocket.js:841:10)
    at /shire/node_modules/ws/lib/Receiver.js:536:18
    at Receiver.applyExtensions (/shire/node_modules/ws/lib/Receiver.js:371:5)
    at /shire/node_modules/ws/lib/Receiver.js:508:14
    at Receiver.flush (/shire/node_modules/ws/lib/Receiver.js:347:3)
    at Receiver.finish (/shire/node_modules/ws/lib/Receiver.js:541:12)
error: message received with unknown reply_to: {"ok":true,"reply_to":1,"ts":"1483703111.000007","text":"Hello!"}
error: message received with unknown reply_to: {"ok":false,"reply_to":2,"error":{"msg":"message text is missing","code":2}}

And if I send no message, here is what I get:

Logged in as gandelf of team Magikcraft Servers, but not yet connected to a channel
Member of general at C3NF1QWLE
Connected!
info: message received on reconnect with no registered callback:
{"reply_to":1,"type":"message","channel":"C3NF1QWLE","user":"U3NEE6YKH","text":"Hello!","ts":"1483703111.000007"}

I just tried the code from README.md and I'm seeing the same Unhandled rejection Error mentioned here. The result of console.log inside the event AUTHENTICATED is the one expected but the command rtm.sendMessage('Hello!', channel); is throwing the error mentioned.

As mentioned in https://github.com/slackapi/node-slack-sdk/issues/300#issuecomment-266804306 you can only use channel/group/DM id's. You cannot use channel names, user names, or user id's.

@adamk33n3r I'm using the channel id, but its still not working.

I am also having this exact problem. Any fix?

@aoberoi ? I'm going to bow out of this conversation :(

so.... any updates? I keep getting the same error.

I am using this example from the Readme.md:

var RtmClient = require('@slack/client').RtmClient;
var CLIENT_EVENTS = require('@slack/client').CLIENT_EVENTS;

var bot_token = process.env.SLACK_BOT_TOKEN || '';

var rtm = new RtmClient(bot_token);

let channel;

// The client will emit an RTM.AUTHENTICATED event on successful connection, with the `rtm.start` payload
rtm.on(CLIENT_EVENTS.RTM.AUTHENTICATED, (rtmStartData) => {
  for (const c of rtmStartData.channels) {
      if (c.is_member && c.name ==='general') { channel = c.id }
  }
  console.log(`Logged in as ${rtmStartData.self.name} of team ${rtmStartData.team.name}, but not yet connected to a channel`);
});

// you need to wait for the client to fully connect before you can send messages
rtm.on(CLIENT_EVENTS.RTM.RTM_CONNECTION_OPENED, function () {
  rtm.sendMessage("Hello!", channel);
});

rtm.start();

My bot works fine.
However, I get this error in my terminal:

error: message received with unknown reply_to: {"ok":true,"reply_to":1,"ts":"1499931980.242327","text":"Hello!"}

How can we fix this? Even though its not affecting my bot, it is a bit worrying to be seeing this error.

Still getting this, not really a problem, it's just annoying.

thanks for the reports everyone. i haven't had a chance to try reproducing this one yet. it doesn't seem urgent but we will eventually try to narrow this one down.

Getting it also.

info: message received on reconnect with no registered callback:
{"reply_to":1,"type":"message","channel":"C6MMF4ZRB","user":"U6T6C7C31","text":"Hello Josh","ts":"1503432803.000112"}

Has anyone figured this out? It's causing problems with my sockets by sending duplicate messages.

i've been able to reproduce it, so it is a confirmed bug. i'll leave more notes when i've understood more about what's happening.

background:

at a high level there are two ways a client might reconnect to the RTM API:
1) when the previous connection ended and the process terminated.
2) when the network connection was interrupted, so within the same process a retry is initiated and succeeds.

the client generates an id for each message it sends to the server. when the server "commits" it or when it wants to inform the client of an error, it will send the client a reply using that same id. this RtmClient implementation uses the ids in those replies to trigger the callback or Promise you might attach when sending a message.

in this implementation those ids are just a monotonically increasing integer, starting from 1 for the first message. the client stores a map of ids : handlers (where handlers are either a callback function or functions to complete the Promise). every sent message gets a handler, so if you don't specify one, the implementation will just use a no-op callback.

now some specifics regarding this issue:

when a client connects to the RTM API, in order to help the it recover from network interruptions (case 2), the server will send the last reply over again (let's call this the resume ACK). this helps the client understand which of its sent messages weren't "committed". this implementation does so by succeeding the handler with the id received in the resume ACK and failing all handlers whose id is greater than that.

but what happens when the connection is fresh from startup (case 1)? the server still sends the resume ACK relative to the ids from the previous run. this client starts its ids over from 1. so there is a collision. the message the server refers to in the resume ACK is most definitely not the message with that id now, and the handlers it succeeds or fails as a consequence are wrong. this is a bug.

the symptom you've all experienced is that at some point the resume ACK consumed the handler that is supposed to be triggered, so at the point where slack is acknowledging the message you just sent, that handler is no longer found and an error is logged.

so how do we fix this?

one idea is to stop using monotonically increasing integers for IDs. if each message is identified by a UUID then the probability of a collision is almost 0. but that complicates failing each of the handlers that were not "committed" by the server since we cannot directly compare ids.

another idea is to store some state related to reconnection so that we can determine if a specific resume ACK should be operated on as in case (1) or case (2). the downside is: more state and state is hard. the reconnection logic is already kinda convoluted and i'm not 100% confident we could manage that state, and our testing ability for the RtmClient isn't super.

does anyone else have any better ideas?

cc @CharlieHess

HO'KAY i think i got a better solution. shower thoughts ftw.

on startup we can store a unique id for each RtmClient instance. then when we create the id for each message, we encode both the unique instance id AND a monotonically increasing integer (e.g. {instanceId}:1. now after an initial connect (case 1) the resume ACK message's id won't match because there's a different instance ID. and in the interrupted condition (case 2) we can still compare the ID's directly. :tada:

Ooh, that sounds good!

plans are foiled. id has to be an integer 馃槩 .

the only viable plan i can think of is the one i described above where we keep some state on initial connect and check that.

Not really get what to do now? I can reproduce this too.

@chenfanggm at this point i'm open to accepting PRs that can resolve the issue and/or implement tests that verify the correct behavior. we haven't begun to work on a fix yet, and i personally consider this a medium priority issue because its an annoyance but hasn't caused complete failure in the wild.

Was this page helpful?
0 / 5 - 0 ratings