Botkit: Long running issue.

Created on 25 Jan 2016 · 13 Comments · Source: howdyai/botkit

If I run Botkit for 2-3 hours, it stops working and my bot logs out of Slack. Any idea why?

vijayrawatsan

All 13 comments

Hi there, I experienced the same issue this morning. It was the first time for me: node was running and all seemed fine, but my bot was disconnected from the only Slack account it was connected to. I don't know when it disconnected, though.

guillaumepotier on 25 Jan 2016

Same here. The bot disconnects from Slack, but the process is still running. Does someone have any leads on what the problem might be?

kosio on 18 Feb 2016

I ran into the same/similar issue although my bot was timing out due to issues with the proxy that it is behind. I found a solution and can submit a pull request for it shortly. If you look at the Slack documentation on the real time messaging, they suggest sending a ping "every few seconds".

Here is the relevant portion of the docs...

Ping and Pong

Clients should try to quickly detect disconnections, even in idle periods, so that users can easily tell the difference between being disconnected and everyone being quiet. Not all web browsers support the WebSocket ping spec, so the RTM protocol also supports ping/pong messages. When there is no other activity clients should send a ping every few seconds. To send a ping, send the following JSON:
{
    "id": 1234, // ID, see "sending messages" above
    "type": "ping",
    …
}

petemichel77 on 18 Feb 2016

No idea what could cause that, but I've implemented a rudimentary ping solution to detect a disconnection and then reboot the bot:

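  // Ping/pong keep-alive that tests whether the bot is still connected to Slack.
  // `_pongs` maps a team token to a stack holding one entry per unanswered ping.
  keepAlive: function (team, bot) {
    var that = this;

    // First call for this team: create the stack and register the pong listener once.
    if (!this._pongs[team.token]) {
      this._pongs[team.token] = [];

      bot.rtm.on('pong', function () {
        // Each pong acknowledges one outstanding ping.
        that._pongs[team.token].pop();
      });
    }

    setTimeout(function () {
      // Too many unanswered pings: assume the connection is dead and restart.
      if (that._pongs[team.token].length >= (that.config.pings_unack_tresshold || 3)) {
        that.controller.logger.error('ping_not_responding', team);

        return that.restart(team);
      }

      // Native WebSocket ping; the matching 'pong' event pops the entry pushed here.
      bot.rtm.ping();
      that._pongs[team.token].push(true);

      // Schedule the next check.
      that.keepAlive(team, bot);
    }, this.config.pings_interval_ms || (30 * 1000));
  },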

Hope that helps, dunno if I could submit a PR to integrate that natively into botkit. WDYT?

Best

guillaumepotier on 18 Feb 2016

👍2

Ah, we posted at the same moment, @petemichel77; I did not see your answer before posting mine.

I tried the "Slack" way, but could not get a pong answer :( so I chose to use the native WebSocket ping/pong implementation in my code above.

guillaumepotier on 18 Feb 2016

@guillaumepotier Mine is pretty basic so far and it is probably better to use the native ws implementation.

I was simply using bot.rtm.send() to send a message of type ping. The pong comes back in the on message handler.

I'll test yours out with my proxy and see how it works. Thanks for this

petemichel77 on 18 Feb 2016
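
For readers who want to try the message-level ping described above, here is a minimal sketch, not the code either commenter actually used. It assumes pongs echo the ping's id in a reply_to field, as the Slack RTM docs describe. The names createPingTracker, send, and maxUnanswered are hypothetical, and send stands in for whatever delivers a JSON string over the open RTM socket.

```javascript
// Minimal sketch of Slack RTM-level ping/pong tracking.
// `send` is a hypothetical injected function that writes one JSON string to
// the open RTM socket, so nothing here depends on Botkit itself.
function createPingTracker(send, maxUnanswered) {
  var nextId = 1;
  var pending = {}; // ping id -> true while no pong has arrived
  var limit = maxUnanswered || 3;

  return {
    // Send one ping. Returns false (and sends nothing) once `limit` pings
    // are still unanswered, which the caller can treat as a dead connection.
    ping: function () {
      if (Object.keys(pending).length >= limit) {
        return false;
      }
      var id = nextId++;
      pending[id] = true;
      send(JSON.stringify({ id: id, type: 'ping' }));
      return true;
    },

    // Feed every raw incoming RTM message here. A pong carries the id of the
    // ping it answers in `reply_to`.
    handleMessage: function (raw) {
      var msg = JSON.parse(raw);
      if (msg.type === 'pong' && pending[msg.reply_to]) {
        delete pending[msg.reply_to];
      }
    },

    // Number of pings still waiting for a pong.
    unanswered: function () {
      return Object.keys(pending).length;
    }
  };
}
```

With Botkit, send could plausibly wrap bot.rtm.send(), with handleMessage called from the socket's message handler. Calling ping() on a timer and restarting when it returns false mirrors the keepAlive approach posted earlier in this thread.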

@guillaumepotier using the native ws ping method seems to work for my case of being behind a proxy. I didn't use your code verbatim, but the ping method works pretty well.

petemichel77 on 19 Feb 2016

I had this issue and thought that updating to the latest version would correct it, but I still found that it was happening. After some debugging I realized that it was disconnecting silently after a pong response took too long, and that it was not attempting to reconnect because connection retries are disabled by default.

I suggest adding some logging to make it clear when and why a disconnection has happened, and perhaps enabling connection retries by default (although a log line alone could go a long way towards making it clear that this is something that needs to be turned on).

JonathanGuberman on 20 May 2016

👍5
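
As a rough illustration of the two suggestions above (turning retries on and logging connection changes), the sketch below assumes the retry spawn option and the rtm_open / rtm_close controller events behave as in Botkit's Slack documentation of that period. The helper name startWithRetry and its onReady callback are hypothetical.

```javascript
// Sketch only: spawn a Slack bot with reconnects enabled and log RTM state
// changes. `controller` is expected to be a Botkit Slack controller, e.g. one
// created with require('botkit').slackbot({}).
function startWithRetry(controller, token, onReady) {
  // Log connection changes so a silent disconnect becomes visible.
  controller.on('rtm_open', function () {
    console.log('RTM connection opened');
  });
  controller.on('rtm_close', function () {
    console.log('RTM connection closed');
  });

  var bot = controller.spawn({
    token: token,
    // Assumption: `retry` is the maximum number of reconnect attempts and
    // accepts a positive integer or Infinity; retries are off by default.
    retry: Infinity
  });

  bot.startRTM(function (err) {
    if (err) {
      console.error('Could not connect to Slack RTM:', err);
    }
    if (onReady) {
      onReady(err, bot);
    }
  });

  return bot;
}
```

A finite value such as retry: 10 may be preferable if the process should eventually give up and let a supervisor restart it.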

Yes, it seems like retries should be on by default. We'll consider this for an upcoming release!

benbrown on 23 May 2016

👍2

For others having this issue, note that you can also register a handler for the rtm_close event. So, for instance, if enabling retry isn't what you want to do, you might do something like:

controller.on('rtm_close', function() {
  //Do something - eg log it, or maybe process.exit();
});

In my case, I have a process manager that will restart the process if it detects it died, but your use case may differ. Just wanted to highlight that you have some options for closer control here.

anyonecancode on 26 May 2016

👍5

@anyonecancode can you provide more detail when you say "I have a process manager that will restart the process if it detects it died"?

I'm using PM2 in the most basic way to keep our Node app up on Heroku. Are you saying I could also use PM2 to re-connect a dropped RTM session? Ideally if you have some example code, that would be really helpful.

sundeepgupta on 18 Nov 2016

Sure -- in my use case, I actually am using Spotify Helios to deploy my script inside of a docker container.

Helios will restart your container if it dies, and docker containers die if the main process inside them stops. So the flow ends up looking like this:

Detect rtm_close, call process.exit() -> Causes docker container to stop -> Causes helios to restart container (and hence my node script).

I haven't used PM2 so can't speak to that, but generically I'm forcing my app to stop running if it notes a bad state (eg lack of connection), and relying on my deployment tool to restart my app when it detects it stops running.

anyonecancode on 18 Nov 2016
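
The "exit and let the supervisor restart" flow described above might look roughly like the sketch below. It is an illustration, not anyonecancode's actual script. The helper name exitOnRtmClose and its injectable exit parameter are hypothetical, and the (bot, err) handler arguments are an assumption about what Botkit passes to rtm_close handlers.

```javascript
// Sketch only: stop the whole process when the RTM connection closes, so an
// external supervisor (Helios, Docker restart policy, PM2, ...) restarts it.
// `exit` is injectable for testing and defaults to process.exit.
function exitOnRtmClose(controller, exit) {
  var stop = exit || function (code) { process.exit(code); };

  controller.on('rtm_close', function (bot, err) {
    // Assumption: the handler receives the bot and, possibly, an error.
    console.error('RTM connection closed, exiting', err || '');
    // A non-zero exit code signals an abnormal stop to the supervisor.
    stop(1);
  });
}
```

This only helps if something outside the process restarts it. Whether a given process manager restarts on a non-zero exit depends on that tool's own configuration.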

Thanks! I see now what you mean.

sundeepgupta on 19 Nov 2016
