I'm still looking into the root cause of this, but noticed that there appears to be a memory leak when an RTM client loses its connection and performs automatic reconnects.
Thanks for the heads-up! Let us know what your research turns up!
I just upgraded my client from 2.0.6 to 3.5.1, and I now notice the memory leak as well. This is very annoying, as the app always ends up crashing against the Heroku memory limit. I'm trying to fix it and will report back if I find something.
For the record, adding autoReconnect: false to the RTM client does not change the problem
Downgrading back to 2.0.6 has solved the problem (@DEGoodmanWilson, if I were you I'd bump up the priority of that issue)
Well poo.
Is this issue still happening? I'm looking at node-slack-sdk to implement a bot with a very large number of connections, so any feedback is most welcome :smile: @mvaragnat are you still on 2.0.X? Thanks!
Also maybe worth renaming the issue from "Memory leak on RTM client automatic reconnects" to "Memory leak on RTM", if it is confirmed that the leak occurs even without automatic reconnect?
Hello @thbar, I think the title is appropriate because most memory issues I have seen of late are linked to Slack API disconnects. When Slack forces many connections to close (several hundred at once, like a server hiccup), my app usually explodes in memory usage and crashes. However, it's kind of a good thing, because it cleans the state and sets up a clean reconnect...
I am still on 2.0 but more because of "if it works, don't fix it". I think I traced the most aggravating issue to the New Relic monitoring module (go debug a memory leak when it's caused by the monitoring tool!)
Ping @DEGoodmanWilson by the way
it sounds like what we need is a good reproduction case. if anyone who has experienced this issue (@mvaragnat, @mpcowan) has any guidance on how i'd be able to set up a test case for this, would you mind adding some implementation details? i could try spinning up a Docker container with a tiny mem limit, set up a couple hundred connections to the RTM API in my node process, and then simulate a disconnect. does that sound adequate?
I think that would be a good way. I suggest you test with 1000 connections disconnecting at the exact same moment (like when the RTM server freezes, pong gets too old, and all connections reset at the exact same time).
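As a rough illustration of the shape such a test harness could take, here is a minimal sketch that drops many connections in the same tick. FakeClient and the helper functions are hypothetical stand-ins built on Node's EventEmitter; they are not the SDK's RTM client, and a real reproduction would replace them with actual RTM connections and a real network interruption.

```javascript
const EventEmitter = require('events');

// FakeClient is a hypothetical stand-in for an RTM client. It only tracks
// connection state; it is NOT the node-slack-sdk API.
class FakeClient extends EventEmitter {
  constructor() {
    super();
    this.state = 'CONNECTED';
    this.reconnectAttempts = 0;
    this.on('disconnect', () => {
      this.state = 'ATTEMPTING_RECONNECT';
      this.reconnectAttempts += 1;
    });
    this.on('reconnect', () => {
      this.state = 'CONNECTED';
    });
  }
}

// Create `count` fake clients, all starting in the CONNECTED state.
function makeClients(count) {
  const clients = [];
  for (let i = 0; i < count; i += 1) {
    clients.push(new FakeClient());
  }
  return clients;
}

// Drop every connection synchronously, as in a server-side hiccup.
function dropAll(clients) {
  clients.forEach((client) => client.emit('disconnect'));
}

// Bring every connection back, as when the network returns.
function restoreAll(clients) {
  clients.forEach((client) => client.emit('reconnect'));
}

// Count how many clients are currently in the given state.
function countInState(clients, state) {
  return clients.filter((client) => client.state === state).length;
}
```

Calling makeClients(1000), then dropAll, then restoreAll gives one disconnect/reconnect cycle; memory can be sampled between cycles to compare SDK versions.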
Try to compare 3.6 to 2.0.6 in terms of memory management, perhaps. I found that 2.0.6 was not giving nearly as many problems as 3.X
So I ran an experiment and I have some results to share:

I ran my experiment program locally to analyze the memory behavior. What happens is that 1000 RTM connections are made, and once they have all connected, I disconnect the network. Then when they are all in an ATTEMPTING_RECONNECT state, and a couple of GC events have occurred, I reconnect the network. I did this cycle ~7 times to observe the above.
If there were a memory leak, I would expect to see the memory usage rise unboundedly. Instead, the memory usage seems stable, except for during the first occurrence of reconnection attempts. I believe this is the expected behavior, and since the memory usage does not seem unbounded in other retries, I don't think we have a memory leak.
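One way to turn that reasoning into a check is to record heap usage after each disconnect/reconnect cycle and look at the trend once the first cycle is excluded. The sketch below is an assumption about how this could be done, not the code used in the experiment; sampleHeapMb and growthPerCycle are hypothetical helper names. A slope near zero suggests bounded usage, while a consistently positive slope would point to a leak.

```javascript
// Return the current heap usage in megabytes. If node was started with
// --expose-gc, force a collection first so the sample is less noisy.
function sampleHeapMb() {
  if (typeof global.gc === 'function') {
    global.gc();
  }
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

// Least-squares slope (MB per cycle) of the samples, ignoring the first
// `skip` cycles, which may include a one-time warm-up increase.
function growthPerCycle(samples, skip) {
  const ys = samples.slice(skip);
  const n = ys.length;
  if (n < 2) {
    return 0;
  }
  const meanX = (n - 1) / 2;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let numerator = 0;
  let denominator = 0;
  ys.forEach((y, x) => {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) * (x - meanX);
  });
  return numerator / denominator;
}
```

In use, sampleHeapMb() would be called once per cycle and the collected values passed to growthPerCycle(samples, 1), so that the first reconnection spike does not dominate the result.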
You can take a look at my raw results here: https://docs.google.com/spreadsheets/d/1wCKtZtOyTMFgIwG0fVoJCVN0DeJFWrErZHQcbucYLxM/edit?usp=sharing.
Feel free to give the experiment a go yourself, or even better, take a look at it and see if I am missing anything. I'd love to see your results.
As far as next steps go:
cc @mvaragnat @thbar @DEGoodmanWilson @mpcowan
Thank you @aoberoi for this great analysis. I agree that it's quite possible that the spike in memory could crash the app. Perhaps, if my app is already running high in terms of dyno memory (hence slower to respond), or if there is a long-running task, it could skip sending "ping" to the server, and the server then thinks the connection is dead. Reconnection attempts would increase the memory load, slowing the app even more, until it crashes. Would that make sense?
One more thing I'd like to ask you, perhaps, would be to run these tests again on SDK 2.0.6. I saw far fewer issues with 2.0.6 than with 3.x, and I have the impression that something is different in terms of memory usage. I use Node 5 in both cases.
i just wanted to leave a link here to a tool i thought might be useful for any future analysis: https://github.com/andywer/leakage
seeing as how we haven't had any other recent reports of this issue, i'm moving it to the "needs feedback" category. i'll need more data or reports of this issue in the wild in order to make progress.
i'm going to close this issue because it hasn't gotten any engagement in a long time, and it's probably only relevant to v3. if you do find this issue is impacting you, feel free to comment and i will reopen.