Node: HTTP connections aborted after 5s / keepAliveTimeout

Created on 2 Jun 2017 · 37 comments · Source: nodejs/node

  • Version: v8.0.0
  • Platform: Windows 10, 64bit
  • Subsystem: http

Short Description

Node 8 introduced a change in the handling of HTTP keep-alive connections. IMHO, this is (at least) a breaking change. When an HTTP server serves long-running requests (>5s) and the client requests a Connection: keep-alive connection, the server closes the connection after 5s. This can cause browsers to re-send the request, even if it is a POST request.
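
For context, the failing scenario boils down to a handler like this (a minimal sketch, not the linked repo's exact code; the route, port, and delay are illustrative):

const http = require('http');

// Responds after 10s, mirroring the repo's /longpost handler.
http.createServer((req, res) => {
  setTimeout(() => {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end('{"status":"OK"}');
  }, 10000);
}).listen(3000);

Under Node 8's defaults, the kept-alive socket can be torn down at the 5s mark, before this 10s response is written.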

To Reproduce

Clone https://github.com/pbininda/node8keepAliveTimeout and run npm install. Then

    npm test

Starts a little express server (server.js) and a client.

  • The server is a standard Express server with a long-running POST handler (/longpost takes 10s).
  • The client calls POST /longpost, preceded by a preflight OPTIONS /longpost.

The test runs through fine on node 6 and node 7:

> node test.js

got request OPTIONS /longpost
got options response 200
sending post request
got request POST /longpost
got post response 200 { status: 'OK' }

but fails on node 8 with

> node test.js

got request OPTIONS /longpost
got options response 200
sending post request
got request POST /longpost
C:\Users\pbininda\projects\ATRON\node8keepAliveTimeout\client.js:39
            throw err;
            ^

Error: socket hang up
    at createHangUpError (_http_client.js:343:15)
    at Socket.socketOnEnd (_http_client.js:435:23)
    at emitNone (events.js:110:20)
    at Socket.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1045:12)
    at _combinedTickCallback (internal/process/next_tick.js:102:11)
    at process._tickCallback (internal/process/next_tick.js:161:9)

Browser Retries

It seems most of the major browsers (Chrome, Firefox, Edge) implement https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.4. Since the server closes the connection on which it received the POST request before sending a response, the browsers re-send the POST. Note that you don't see the re-send in Chrome dev tools, but Wireshark shows the retransmission. To have a look at this, run

    npm start

which launches the server (server.js) and then load browsertest.html in chrome. This runs browsertest.js in the browser which does a simple $.ajax request against the server. On the server side you will see:

> node server.js

got request OPTIONS /longpost
got request POST /longpost
got request POST /longpost
format called 5003ms after previous

This shows that the server received two POST requests, the second one 5s after the first, even though the browser client code only makes one request.

Bug or Breaking Change?

I'm not sure if this is a bug or a breaking change. It was probably introduced through #2534. It only seems to happen when two connections are used (that's why the preflight OPTIONS is forced to happen in my code), so it may be that the wrong connection is being closed here.

Workaround

Setting the keepAliveTimeout (see https://nodejs.org/dist/latest-v8.x/docs/api/http.html#http_server_keepalivetimeout) of the http server to a value greater than the maximum duration of a request solves the problem. You can try this with

    npm start -- --keepAliveTimeout 20000

and then in another terminal

    node client.js
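
In code, the same workaround looks like this (a minimal sketch; server is whatever http.createServer or app.listen returned, and the handler is illustrative):

const http = require('http');

const server = http.createServer((req, res) => res.end('OK'));
// Keep sockets alive longer than the slowest expected request.
server.keepAliveTimeout = 20000; // ms; Node 8's default is 5000
server.listen(3000);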

All 37 comments

/cc @indutny @tshemsedinov @aqrln
Could be related to #2534.

Thanks for the ping, I'll be able to look into this in a few hours.

Thanks for bringing this up @pbininda. We hit this issue as well when running requests longer than a few seconds. We also see multiple POSTs affecting all browsers on 8. We reverted back to 7.10 and all is well. Seems like a pretty pernicious bug. We're using Koa 1.x middleware on Linux.

Same behaviour here; a request ran a bit longer than usual and the browser kept re-sending it. A general remark: Firefox and Chrome did re-send the request, but Safari didn't. We're using Express 4.15.2. Finally, we reverted to 7.10 and it is working normally.

Thanks for the note regarding Safari, I'll change the wording regarding "all major browsers".

Ugh... sorry for the delay, I was busier than I hoped. Let's fix it today. Thanks a lot for the detailed report and reproduction!

Confirmed the issue is still present in release 8.1 as well.

@pbininda @lvpro @juanecabellob I'm very sorry for not making it in time for the 8.1 release. I took a look at the reproduction back then, but didn't have an opportunity to debug it until now. https://github.com/nodejs/node/pull/13549 should fix it.

Don't be sorry @aqrln! Thank you very much for getting this resolved! :)

@aqrln No problem, I can keep the server.keepAliveTimeout workaround in place until the fix is released. Thanks for your effort.

I have this issue too with webpack-dev-server. It somehow does not work.

How would I pass the keepAliveTimeout parameter so that the default timeout is 20 seconds?

Above, this is the example given:

npm start -- --keepAliveTimeout 20000

I tried this:

 npm run-script start -- --"keepAliveTimeout 20000"

It gets expanded to:

webpack-dev-server --inline --hot --history-api-fallback  --public --host 104.222.96.51 --port 9090 "--keepAliveTimeout 20000"

Unfortunately it does not have any effect.

This problem still exists in node 8.7.0

@rickywu which problem? Please show the code that produces it.

@rickywu I am using webpack-dev-server to host a bundle.js file that is about 5 MB in size.

webpack-dev-server --inline --hot --history-api-fallback  --public --host 127.0.0.1 --port 9090 --keepAliveTimeout 20000

It all worked when the bundle size was 2 MB. Now that it is bigger, I get timeouts in the browser.
I believe this is a pretty well-known problem. The first link shows some solutions, which unfortunately do not work; the other two links show that the issue is present in webpack-dev-server and also in angular-cli:

https://github.com/nodejs/node/issues/13391
https://github.com/webpack/webpack-dev-server/issues/1044
https://github.com/angular/angular-cli/pull/7563

I had this bug until Node 8.2, as far as I know. It is fixed now and is not related to this. I have bundles of about 10 MB, and if the server responds, everything is fine. I recommend tuning the MTU if you are using a VPN.

@Delagen This is really strange. I run this on a dedicated server with a direct IP. I have worked around the issue so far by creating a bundle that I now serve via express.js, where I set the timeout in code. To me, all the descriptions fit 100% with the issue that came up in Node recently. I cannot run webpack-dev-server on my server anymore; it only works on localhost on my development machine, as there I have no transfer delay.

@Delagen see this issue

Setting server.keepAliveTimeout = 30000 in app.js solved it for me, but I think the default value of 5s is too short.
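
For an Express app that looks roughly like the following (a sketch; the route and timings are illustrative, and app.listen returns the underlying http.Server):

const express = require('express');

const app = express();
app.post('/longpost', (req, res) => {
  setTimeout(() => res.json({ status: 'OK' }), 10000); // slow handler
});

// app.listen returns the underlying http.Server instance.
const server = app.listen(3000);
server.keepAliveTimeout = 30000; // longer than the slowest request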

@rickywu For me, the error on Express was a repeated request callback. I have no problems with Express on Node.js 8.7.0. I only set timeout because I have heavy requests that need up to a minute to begin responding, but I don't set any keepAliveTimeout.

FWIW, there is a PR open that's dealing with this: https://github.com/nodejs/node/pull/15791 and there's a more recent (open) issue: https://github.com/nodejs/node/issues/15082

@apapirovski As I understand it, does it only occur when TLS is used in Node?

@Delagen No, it occurs in both tls & net (the PR just fixes both tls & net issues related to it). See https://github.com/nodejs/node/issues/15082 for a good explanation of what triggers it.

But to summarize: once Node starts writing data, the timeout starts ticking (whether it's the keep-alive one or the regular one), and it completely ignores whether any writing is still going on. The timer is only updated when a write starts or ends. That's why most reports of this issue involve downloading/serving large files. Also, this issue has existed for a very long time; it just wasn't obvious before keepAliveTimeout because the regular timeout is 120s.
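
For reference, both knobs are documented http.Server properties (a sketch showing where each one lives, with the defaults mentioned above):

const http = require('http');

const server = http.createServer((req, res) => res.end('OK'));
server.timeout = 120000;        // regular socket timeout, 120s by default
server.keepAliveTimeout = 5000; // idle keep-alive timeout, 5s default in Node 8
server.listen(8000);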

@Delagen What's the size of your data per request? Mine is about 3 MB.
Using a debug tool to monitor the network, I see the transmission suspend after 6s if the keep-alive timeout is not set.

About 5 MB max, but I seem to have a fast connection, so I don't hit this timeout.

FYI, a fix for this just landed on master, so it should be in 8.x LTS & 9.x at some point within the next month. With the fix, there should no longer be a need to bump up keepAliveTimeout just so transfers don't time out.

@apapirovski can you link to the fix please :D

v4.x and v6.x are unaffected, correct?

v6.x is affected. Probably v4.x too.

Here's a way to test:

const http = require('http');

const server = http.createServer((req, res) => {
  // A 30 MB payload, so the write outlasts the 5s timeout set below.
  const content = Buffer.alloc(30000000, 0x44);

  res.writeHead(200, {
    'Content-Type': 'application/octet-stream',
    'Content-Length': content.length.toString(),
    'Vary': 'Accept-Encoding'
  });

  // _handle is internal; it is grabbed here only to log how much data
  // is still queued when the end callback fires.
  const handle = res.socket._handle;
  res.end(content, function() {
    console.log(handle.writeQueueSize);
  });
});
server.timeout = 5000;

server.listen(8000);

Then, in another terminal, slow the download down so it takes longer than the timeout:

    curl -v http://localhost:8000/ | dd bs=1 of=/dev/null

I don't know if my code can be backported straight up, though. Maybe label the PR with backport-requested and I'll try to get around to figuring out how to make it work on those lines.

@apapirovski please make a backport to the v6.x and v4.x branches.

We are doing a maintenance release of v4.x soon, which can be postponed for this.

Ok, I labeled the PR and will work on it this week. Will keep you posted on it. Let me know if it lands cleanly on v8.x or not; I want to make sure it makes it into the first LTS release of that.

@apapirovski I just landed it in v8.x-staging in f6a725ea09, it should hopefully go out in tomorrow's release.

@MylesBorins You'll need the other commit: https://github.com/nodejs/node/commit/aaf2a1c2264fffd6eea82fe3778f41ac046e6349; they probably should've landed as one, or I should've indicated that the 2nd one depends on the first working properly. My bad.

@apapirovski done

landed in eeef06a4bb

Would this have been present in node 10.6.0? Test code above seems to pass.

3 years and the bug still exists? How's that possible?

How's that possible?

As with all things in Node.js, it requires someone free to work on it. Pull requests are always welcome. One thing that may be helpful is a reproduction of the issue in the form of a known_issue test that can be used to guide someone in making a fix.

Not intended to sound rude. I've been trying to solve this riddle for hours now; maybe someone else has hit it as well:

  • Trying to send, in a response, an 8 KB array of 50 items that takes 6.7 seconds to produce fails with an error.
  • When I slice(10) the array, it suddenly takes 2 seconds and everything works like a charm.

My question would be: how is that possible? It seems that if the data sent is too big (?), the response just freezes and then times out.

I'm using Node 10.19.0. Thanks for your help.

In case this helps: I think there's an additional use case which isn't related to transmission size; rather, it's a race condition between the start/end of requests on the same connection.

  1. First request, new connection opened, keep-alive timer starts.
  2. First request ends
  3. Keep-alive timer kicks in, "timeout" event is about to be emitted on the socket
    3.1 First bytes of second request come in on the soon-to-be-destroyed socket
  4. "timeout" event is emitted on the socket, resulting in the destruction of the socket
    4.1 Second request gets dropped
  5. The second request's parsing (parseOnIncoming...) starts, trying to reset the timeout on the already-destroyed socket
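
A rough way to provoke that timing (a sketch only, and inherently non-deterministic: it reuses a single keep-alive socket and fires a second request just as the default 5s keep-alive deadline expires; the port and paths are illustrative):

const http = require('http');

// One keep-alive socket, reused for both requests.
const agent = new http.Agent({ keepAlive: true, maxSockets: 1 });

function get(path, onEnd) {
  http.get({ port: 8000, path, agent }, (res) => {
    res.resume();
    res.on('end', onEnd);
  }).on('error', (err) => console.error(path, err.message));
}

// The second request is timed to arrive just as the server's
// keep-alive timeout (5000 ms by default) is firing.
get('/first', () => {
  setTimeout(() => get('/second', () => console.log('done')), 5000);
});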

Important:
Note that in this test (see "Reproduce" below) we're not transmitting large amounts of data on the connection, which is (I think) why f6a725e doesn't fix this issue. Put differently, I think it's a timing issue: one request ends (and the keep-alive timer is about to fire), and at (almost) the same time a new request comes in.

*I'm not 100% sure this is the correct flow; it's just my best educated guess, based on some debugging.*

Reproduce:

Tested Versions:

  • v8.16.1
  • v12.18.0
  • v12.19.1
  • v12.20.0
  • v14.7.0

Naive Approach to a Fix
Assuming my debug analysis is correct, there are two main ways to go about a fix for this:

  • either prevent the timeout event from firing on the socket while the socket is still in use,
  • or, whenever the timeout event has fired, check whether the socket is still in use, and if so do nothing.

Here's a naive code listing of the second approach:

// file: _http_server.js
function socketOnTimeout() {
  // "this" is the socket
  if (this.isInUse) {
    return;
  }
  // ... existing timeout handling ...
}
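
And for comparison, a sketch of the first approach (the isInUse flag is hypothetical here too, as are both hook functions; socket.setTimeout(0) is the real net.Socket call for disabling a pending timeout):

// Instead of letting the timer fire and bailing out afterwards,
// disarm it while a request is active and re-arm it when done.
function onRequestStart(socket) {
  socket.isInUse = true;  // hypothetical flag
  socket.setTimeout(0);   // disable the pending timeout
}

function onResponseFinish(socket, server) {
  socket.isInUse = false;
  socket.setTimeout(server.keepAliveTimeout); // re-arm keep-alive timer
}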
