Version 2.0.1 exhibits a problem for me when using the default engine (uWebSockets): it pushes the Node.js server process to 100% CPU.
I get the problem consistently if I have an application running in the browser with a web-socket connection in place. I then reload the application in the browser (which forces the old socket to be removed and a new one put in place). At this point the server goes into a spinning loop, eating 100% of the CPU.
Replacing the engine with "ws" fixes the issue.
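For reference, switching the engine looks like this (a minimal sketch assuming a plain http server; the wsEngine option is forwarded by socket.io to engine.io):

var server = require('http').createServer();
var io = require('socket.io')(server, {
  wsEngine: 'ws' // use the 'ws' implementation instead of the default uws
});
server.listen(8080);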
Here's what node debug reports when I pause during 100% CPU utilisation:
ubuntu@ip-172-30-1-57:~/qewd$ node debug localhost:7000
connecting to localhost:7000 ... ok
debug> pause
break in node_modules/uws/uws.js:8
6 const DEFAULT_PAYLOAD_LIMIT = 16777216;
7
8 function noop() {}
9
10 function abortConnection(socket, code, name) {
debug>
It appears to be stuck in a loop calling this function
Just a follow-up - I've tried to get some further diagnostic info on what's actually happening, but no joy. All I can tell you is that it does seem to be related to a socket disconnection event and when using the uWebSockets engine. The noop() function in uws.js starts being fired repeatedly - I've been unable to find out what's calling it: noop.caller just returns null, and there's no stack trace if you force an error within noop(). I don't believe I'm doing anything unusual within my app the way I use web-sockets, and switching to "ws" as the engine returns it to normality. If you can give me any suggestions to try out to help diagnose the problem, do let me know, and I'll see what I can find out.
@robtweed I am not able to reproduce (same OS, same browser, same socket.io version). Can you reproduce consistently? Could you provide some code please?
OK I eventually tracked it down. It happens if you've forked a child process, and the process environment of the master process is inherited by the child. If socket.io is loaded in the master process environment at the time the child process is forked, then reloading a socket client throws the master process into the noop() spin.
Initially everything is fine - as soon as you reload the browser, causing the original socket to be closed and a new one created, the master process goes berserk - for me it happens 100% of the time.
For example, I've managed to strip it back to this, so hopefully it's replicable:
var express = require('express');
var app = express();
var server = app.listen(8080);
var io = require('socket.io')(server);
app.use('/', express.static(__dirname + '/www/'));
var cp = require('child_process');
var workerStarted = false;

function startWorker() {
  var workerProcess = cp.fork(__dirname + '/workerKeepAlive.js', null, { env: process.env });
  workerStarted = true;
}

io.on('connection', function (socket) {
  console.log('socket connected: ' + socket.id);

  var handleMessage = function(data) {
    console.log(socket.id + ' handling message: ' + JSON.stringify(data));
    if (!workerStarted) startWorker(); // start up the child process
    var type = data.type;
    var responseObj;
    if (type === 'ewd-register' || type === 'ewd-reregister') {
      console.log('ewd-register ' + socket.id);
      data.socketId = socket.id;
      data.ipAddress = socket.request.connection.remoteAddress;
      responseObj = {"type": "ewd-register", "finished": true, "message": {"token": "3fc58f5f-9ed3-4165-bdd2-650eca1ed9b6"}};
    }
    else {
      responseObj = {
        type: type,
        finished: true,
        message: {
          hello: 'world'
        }
      };
    }
    socket.emit('ewdjs', responseObj);
  };

  socket.on('ewdjs', handleMessage);

  socket.on('disconnect', function() {
    console.log('socket ' + socket.id + ' disconnected');
  });
});
Here's the very simple child process keepalive script:
console.log('child process started: ' + process.pid);

setTimeout(function() {
  console.log('child process ' + process.pid + ' terminating');
  process.exit();
}, 20000);
Note: if you start the child process before socket.io is loaded, it behaves OK. Start it once socket.io is loaded and the problem occurs.
I'd hazard a guess that it's an IPC-related problem, with the child process having a handle to the socket.io instance. Switch the socket.io engine to "ws" and the problem goes away.
For my situation, I need to be able to start child processes after socket.io has started, and the run-time environment of the child process needs to be the same as the parent process's (although I don't need the socket.io environment in the worker).
Just a quick follow-up - it turns out the process.env issue is a red herring - it even fails with this:
cp.fork(__dirname + '/workerKeepAlive.js');
The critical thing is when the child process is started. Start it inside the io.on('connection', function (socket) {..}) closure and you get the CPU runaway when you reload the browser; start it before this and it's OK.
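To make the difference concrete, here's a minimal sketch (reusing the express setup and workerKeepAlive.js from the repro above):

var express = require('express');
var app = express();
var server = app.listen(8080);
var io = require('socket.io')(server); // default uws engine in 2.0.x
var cp = require('child_process');

// Forking here, before any socket has connected, does not trigger the runaway:
cp.fork(__dirname + '/workerKeepAlive.js');

io.on('connection', function (socket) {
  // Forking here instead, inside the connection closure, triggers the
  // 100% CPU spin as soon as the browser is reloaded:
  // cp.fork(__dirname + '/workerKeepAlive.js');
});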
Hi! I'm experiencing a similar issue, with slightly different conditions:
1- the client connects via socket.io
2- the server spawns a child process with same env via a require('child_process').exec() call
3- the client disconnects before the child process exits
Then the server CPU goes to 100%.
If the child process ends before disconnection, nothing happens.
As reported above, the option {wsEngine: 'ws'} fixes the issue.
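A minimal sketch of the three steps above (untested, for illustration only; 'sleep 30' just stands in for a long-running child):

var server = require('http').createServer();
var io = require('socket.io')(server); // default engine (uws in 2.0.x)
var cp = require('child_process');

io.on('connection', function (socket) {
  // 2 - spawn a child process with the same env via exec()
  cp.exec('sleep 30', { env: process.env }, function (err, stdout, stderr) {
    console.log('child process finished');
  });
  // 3 - disconnect the client before the child exits and the CPU goes to 100%
});

server.listen(8080);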
Has any resolution to this been sorted out? It would be good to be able to switch to uws as the engine.
I also ran into this problem. I've set it to use polling only for now.
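For reference, one way to do that on the server side looks like this (a minimal sketch; the transports option is forwarded to engine.io, and the client can also restrict its own transports):

var server = require('http').createServer();
var io = require('socket.io')(server, {
  transports: ['polling'] // websocket is never negotiated, so uws is never used
});
server.listen(8080);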
Me too.
This problem happens on Linux (Debian stretch, Ubuntu 16.04).
However, it does not happen on macOS.
Hi, it seems that this bug has disappeared on my side with recent updates.
@valette could you please elaborate? I just checked npm and the latest version was published 4 months ago.
Are you referring to the latest develop branch here in this repo?
No, I do not know exactly, but something changed and the bug is no longer visible (maybe an update in core Node?). I am not using any development branch, only released packages.
I load tested the library with wsEngine set to ws; after only ~500 worker kill/forks the 100% CPU problem comes back and keeps haunting me. This is basically a no-go for me now.
I'm putting together demo code and will try to host it somewhere for others to play with.
Just so you know, I'm having the same issue with a 30-worker environment. Same pattern. At least now that I have found your issue, I know I'm not crazy. I'll keep making some tests and post anything I can find.
Same issue for me, same OS, and I am using React Native on the client side. wsEngine: 'ws' does not fix the issue for me... any other ideas?
Hi,
I saw symptoms similar to the ones described above, namely:
Socket.onclose()
I was able to work around this by handling the Socket "disconnecting" event (new in 2.x?) and cleaning up associated forks before the rest of the onclose() executes. See below for the onclose() impl to see why this works. EventEmitter emits disconnecting before anything else is done, so you can safely kill all relevant child processes in an event handler here.
Socket.prototype.onclose = function(reason){
  if (!this.connected) return this;
  debug('closing socket - reason %s', reason);
  this.emit('disconnecting', reason);
  this.leaveAll();
  this.nsp.remove(this);
  this.client.remove(this);
  this.connected = false;
  this.disconnected = true;
  delete this.nsp.connected[this.id];
  this.emit('disconnect', reason);
};
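A minimal sketch of that workaround, assuming an io instance as in the earlier examples (the workers map and the bookkeeping around it are hypothetical; the point is just to kill any forked children tied to the socket before the rest of onclose() runs):

var workers = {}; // hypothetical bookkeeping: socket.id -> forked ChildProcess

io.on('connection', function (socket) {
  // ... fork workers for this socket and record them in `workers` ...

  socket.on('disconnecting', function (reason) {
    var worker = workers[socket.id];
    if (worker) {
      worker.kill();              // kill the fork before onclose() continues
      delete workers[socket.id];
    }
  });
});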
Hope this helps anyone else having this issue currently.
@darrachequesne I'd recommend a closer look at the onclose flow and see if there is a place where retry logic can be handled better during socket cleanup. Maybe an increasing retry interval would be acceptable, followed by a thrown Error after n retries? At least then it wouldn't blow up the event loop with constant retries.
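Purely as an illustration of the kind of thing I mean (not socket.io's actual cleanup code), something along these lines instead of retrying in a tight loop:

// Generic sketch: retry a cleanup task with an increasing delay, then give up.
function retryWithBackoff(task, maxRetries, baseDelayMs, onGiveUp) {
  var attempt = 0;
  function tryOnce() {
    if (task()) return; // cleanup succeeded, stop retrying
    attempt++;
    if (attempt >= maxRetries) {
      onGiveUp(new Error('cleanup failed after ' + maxRetries + ' retries'));
      return;
    }
    setTimeout(tryOnce, baseDelayMs * attempt); // back off instead of spinning
  }
  tryOnce();
}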
Just in case this helps, I wanted to note that this issue does not exist in 1.3.5. I didn't notice anything in the diffs between 1.3.5 and 2.0.4 that could be the culprit, but it could be inside a dependency, as many of these were upgraded in 2.x.
DETAILS: Node.js 8.4.0, socket.io 2.0.4 running on Debian 8.9
@juergengunz Hello, can this be used in a React Native project? I plan to use it. Please advise.
Note: ws will be restored as the default wsEngine starting with version 2.1.0.
I am having the same issue, even without using child_process. This only happens after the server has been active for quite a long time!
Is this still an issue with Node >= 11.1.0? After a chrome-devtools inspect, it looks like a Node.js infinite loop.
Looks like we are having the same issue still in 2020.
Socket.io version 2.2.0
Node.js version 12.15.0
As far as I can see, the default is now { "wsEngine": "ws" }, so I'm not sure if it is related to the same issue.
One specific thing is that we are using socket.io-redis as adapter for running multiple instances.
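For context, the adapter setup looks roughly like this (a minimal sketch; host and port are placeholders):

const server = require('http').createServer();
const io = require('socket.io')(server, { wsEngine: 'ws' });
const redisAdapter = require('socket.io-redis');

io.adapter(redisAdapter({ host: 'localhost', port: 6379 })); // share events across instances via Redis pub/sub

server.listen(3000);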
Same issue here. I'm also using the redis adapter. Thinking of using ws instead.
Same issue. I'm also using redis adapter.