Version 2.0.1 exhibits a problem for me when using the default engine (uWebSockets): it pushes the Node.js server process to 100% CPU.
I get the problem consistently if I have an application running in the browser with a web-socket connection in place. I then reload the application in the browser (which forces the old socket to be removed and a new one put in place). At this point the server goes into a spinning loop, eating 100% of the CPU.
Replacing the engine with "ws" fixes the issue.
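For reference, switching the engine looks like this (a minimal sketch assuming a plain http server; the wsEngine option is forwarded by socket.io to engine.io):

var server = require('http').createServer();
var io = require('socket.io')(server, {
  wsEngine: 'ws' // use the 'ws' implementation instead of the default uws
});
server.listen(8080);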
Here's what node debug reports when I pause during 100% CPU utilisation:
ubuntu@ip-172-30-1-57:~/qewd$ node debug localhost:7000
connecting to localhost:7000 ... ok
debug> pause
break in node_modules/uws/uws.js:8
6 const DEFAULT_PAYLOAD_LIMIT = 16777216;
7
8 function noop() {}
9
10 function abortConnection(socket, code, name) {
debug>
It appears to be stuck in a loop calling this function
Just a follow-up - I've tried to get some further diagnostic info on what's actually happening, but no joy. All I can tell you is that it does seem to be related to a socket disconnection event and when using the uWebSockets engine. The noop() function in uws.js starts being fired repeatedly - I've been unable to find out what's calling it: noop.caller just returns null, and there's no stack trace if you force an error within noop(). I don't believe I'm doing anything unusual within my app the way I use web-sockets, and switching to "ws" as the engine returns it to normality. If you can give me any suggestions to try out to help diagnose the problem, do let me know, and I'll see what I can find out.
@robtweed I am not able to reproduce (same OS, same browser, same socket.io version). Can you reproduce consistently? Could you provide some code please?
OK I eventually tracked it down. It happens if you've forked a child process, and the process environment of the master process is inherited by the child. If socket.io is loaded in the master process environment at the time the child process is forked, then reloading a socket client throws the master process into the noop() spin.
Initially everything is fine - as soon as you reload the browser, causing the original socket to be closed and a new one created, the master process goes berserk - for me it happens 100% of the time.
For example, I've managed to strip it back to this, so hopefully it's replicable:
var express = require('express');
var app = express();
var server = app.listen(8080);
var io = require('socket.io')(server);
app.use('/', express.static(__dirname + '/www/'));
var cp = require('child_process');
var workerStarted = false;

function startWorker() {
  var workerProcess = cp.fork(__dirname + '/workerKeepAlive.js', null, { env: process.env });
  workerStarted = true;
}

io.on('connection', function (socket) {
  console.log('socket connected: ' + socket.id);

  var handleMessage = function(data) {
    console.log(socket.id + ' handling message: ' + JSON.stringify(data));
    if (!workerStarted) startWorker(); // start up the child process
    var type = data.type;
    var responseObj;
    if (type === 'ewd-register' || type === 'ewd-reregister') {
      console.log('ewd-register ' + socket.id);
      data.socketId = socket.id;
      data.ipAddress = socket.request.connection.remoteAddress;
      responseObj = {"type": "ewd-register", "finished": true, "message": {"token": "3fc58f5f-9ed3-4165-bdd2-650eca1ed9b6"}};
    }
    else {
      responseObj = {
        type: type,
        finished: true,
        message: {
          hello: 'world'
        }
      };
    }
    socket.emit('ewdjs', responseObj);
  };

  socket.on('ewdjs', handleMessage);

  socket.on('disconnect', function() {
    console.log('socket ' + socket.id + ' disconnected');
  });
});
Here's the very simple child process keepalive script:
console.log('child process started: ' + process.pid);

setTimeout(function() {
  console.log('child process ' + process.pid + ' terminating');
  process.exit();
}, 20000);
Note: if you start the child process before socket.io is loaded, it behaves OK. Start it once socket.io is loaded and the problem occurs.
I'd hazard a guess that it's an IPC-related problem, with the child process having a handle to the socket.io instance. Switch the socket.io engine to "ws" and the problem goes away.
For my situation, I need to be able to start child processes after socket.io has started, and the run-time environment of the child process needs to be the same as the parent process's (although I don't need the socket.io environment in the worker).
Just a quick follow-up - it turns out the process.env issue is a red herring - it even fails with this:
cp.fork(__dirname + '/workerKeepAlive.js');
The critical thing is when the child process is started. Start it inside the io.on('connection', function (socket) {..}) closure and you get the CPU runaway when you reload the browser; start it before this and it's OK.
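To make the difference concrete, here's a minimal sketch (reusing the express setup and workerKeepAlive.js from the repro above):

var express = require('express');
var app = express();
var server = app.listen(8080);
var io = require('socket.io')(server); // default uws engine in 2.0.x
var cp = require('child_process');

// Forking here, before any socket has connected, does not trigger the runaway:
cp.fork(__dirname + '/workerKeepAlive.js');

io.on('connection', function (socket) {
  // Forking here instead, inside the connection closure, triggers the
  // 100% CPU spin as soon as the browser is reloaded:
  // cp.fork(__dirname + '/workerKeepAlive.js');
});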
Hi! I'm experiencing a similar issue, with slightly different conditions:
1- the client connects via socket.io
2- the server spawns a child process with same env via a require('child_process').exec() call
3- the client disconnects before the child process exits
Then the server CPU goes to 100%.
If the child process ends before disconnection, nothing happens.
As reported above, the option {wsEngine: 'ws'} fixes the issue.
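A minimal sketch of the three steps above (untested, for illustration only; 'sleep 30' just stands in for a long-running child):

var server = require('http').createServer();
var io = require('socket.io')(server); // default engine (uws in 2.0.x)
var cp = require('child_process');

io.on('connection', function (socket) {
  // 2 - spawn a child process with the same env via exec()
  cp.exec('sleep 30', { env: process.env }, function (err, stdout, stderr) {
    console.log('child process finished');
  });
  // 3 - disconnect the client before the child exits and the CPU goes to 100%
});

server.listen(8080);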
Has any resolution to this been sorted out? It would be good to be able to switch to uws as the engine.
I also ran into this problem. I've set it to use polling only for now.
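For reference, one way to do that on the server side looks like this (a minimal sketch; the transports option is forwarded to engine.io, and the client can also restrict its own transports):

var server = require('http').createServer();
var io = require('socket.io')(server, {
  transports: ['polling'] // websocket is never negotiated, so uws is never used
});
server.listen(8080);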
Me too.
This problem happens on Linux (Debian stretch, Ubuntu 16.04).
However, it does not happen on macOS.
Hi, it seems that this bug has disappeared on my side with recent updates.
@valette could you please elaborate? I just checked npm and the latest version was published 4 months ago.
Are you referring to the latest develop branch here in this repo?
No, I do not know exactly, but something changed and the bug is no longer visible (maybe an update in core Node?). I am not using any development branch, only released packages.
I load tested the library with wsEngine set to ws; after only ~500 worker kill/forks the 100% CPU problem comes back and keeps haunting me. This is basically a no-go for me now.
I'm putting together demo code and will try to host it somewhere for others to play with.
Just so you know, I'm having the same issue with a 30-worker environment. Same pattern. At least now that I have found your issue, I know I'm not crazy. I'll keep making some tests and post anything I can find.
Same issue for me, same OS, and I am using React Native on the client side. wsEngine: 'ws' does not fix the issue for me... any other ideas?
Hi,
I saw symptoms similar to the ones described above, namely:
Socket.onclose()
I was able to work around this by handling the Socket "disconnecting" event (new in 2.x?) and cleaning up associated forks before the rest of the onclose() executes. See below for the onclose() impl to see why this works. EventEmitter emits disconnecting before anything else is done, so you can safely kill all relevant child processes in an event handler here.
Socket.prototype.onclose = function(reason){
  if (!this.connected) return this;
  debug('closing socket - reason %s', reason);
  this.emit('disconnecting', reason);
  this.leaveAll();
  this.nsp.remove(this);
  this.client.remove(this);
  this.connected = false;
  this.disconnected = true;
  delete this.nsp.connected[this.id];
  this.emit('disconnect', reason);
};
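A minimal sketch of that workaround, assuming an io instance as in the earlier examples (the workers map and the bookkeeping around it are hypothetical; the point is just to kill any forked children tied to the socket before the rest of onclose() runs):

var workers = {}; // hypothetical bookkeeping: socket.id -> forked ChildProcess

io.on('connection', function (socket) {
  // ... fork workers for this socket and record them in `workers` ...

  socket.on('disconnecting', function (reason) {
    var worker = workers[socket.id];
    if (worker) {
      worker.kill();              // kill the fork before onclose() continues
      delete workers[socket.id];
    }
  });
});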
Hope this helps anyone else having this issue currently.
@darrachequesne I'd recommend a closer look at the onclose flow and see if there is a place where retry logic can be handled better during socket cleanup. Maybe an increasing retry interval would be acceptable, followed by a thrown Error after n retries? At least then it wouldn't blow up the event loop with constant retries.
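Purely as an illustration of the kind of thing I mean (not socket.io's actual cleanup code), something along these lines instead of retrying in a tight loop:

// Generic sketch: retry a cleanup task with an increasing delay, then give up.
function retryWithBackoff(task, maxRetries, baseDelayMs, onGiveUp) {
  var attempt = 0;
  function tryOnce() {
    if (task()) return; // cleanup succeeded, stop retrying
    attempt++;
    if (attempt >= maxRetries) {
      onGiveUp(new Error('cleanup failed after ' + maxRetries + ' retries'));
      return;
    }
    setTimeout(tryOnce, baseDelayMs * attempt); // back off instead of spinning
  }
  tryOnce();
}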
Just in case this helps, I wanted to note that this issue does not exist in 1.3.5. I didn't notice anything in the diffs between 1.3.5 and 2.0.4 that could be the culprit, but it could be inside a dependency, as many of these were upgraded in 2.x.
DETAILS: Node.js 8.4.0, socket.io 2.0.4 running on Debian 8.9
@juergengunz Hello, can this be used in a React Native project? I plan to use it. Please advise.
Note: ws will be restored as the default wsEngine starting with version 2.1.0.
I am having the same issue, even without using child_process. This only happens after the server has been active for quite a long time!
Is this still an issue with Node >= 11.1.0? After a chrome-devtools inspect, it looks like a Node.js infinite loop.
Looks like we are having the same issue still in 2020.
Socket.io version 2.2.0
Node.js version 12.15.0
As far as I can see, the default is now { "wsEngine": "ws" }, so I'm not sure if it is related to the same issue.
One specific thing is that we are using socket.io-redis as adapter for running multiple instances.
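For context, the adapter setup looks roughly like this (a minimal sketch; host and port are placeholders):

const server = require('http').createServer();
const io = require('socket.io')(server, { wsEngine: 'ws' });
const redisAdapter = require('socket.io-redis');

io.adapter(redisAdapter({ host: 'localhost', port: 6379 })); // share events across instances via Redis pub/sub

server.listen(3000);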
Same issue here. I'm also using the redis adapter. Thinking of using ws instead.
Same issue. I'm also using redis adapter.