I use hapi as an API server on AWS EC2.
The load balancer returns a 502 error several times a day.
Looking at the TCP stream, the server receives a POST request while its side of the connection is already in the FIN-WAIT-1 state and answers with RST, which seems to be what produces the 502.
AWS Technical Support asked me, "Does Node.js properly implement closing processing for requests?" How can I properly handle connection closing with hapi?
60090 2018-02-01 08:43:23.014057 ALB_IP NODE_SERVER_IP TCP 74 23495 → 3000 [SYN] Seq=0 Win=26883 Len=0 MSS=8961 SACK_PERM=1 TSval=37350007 TSecr=0 WS=256
60091 2018-02-01 08:43:23.014086 NODE_SERVER_IP ALB_IP TCP 74 3000 → 23495 [SYN, ACK] Seq=0 Ack=1 Win=26847 Len=0 MSS=8961 SACK_PERM=1 TSval=45343656 TSecr=37350007 WS=128
60092 2018-02-01 08:43:23.014462 ALB_IP NODE_SERVER_IP TCP 66 23495 → 3000 [ACK] Seq=1 Ack=1 Win=27136 Len=0 TSval=37350007 TSecr=45343656
60093 2018-02-01 08:43:23.014487 ALB_IP NODE_SERVER_IP HTTP 722 POST POST_PATH HTTP/1.1 (application/json)
60094 2018-02-01 08:43:23.014493 NODE_SERVER_IP ALB_IP TCP 66 3000 → 23495 [ACK] Seq=1 Ack=657 Win=28160 Len=0 TSval=45343656 TSecr=37350007
60099 2018-02-01 08:43:23.017253 NODE_SERVER_IP ALB_IP HTTP 293 HTTP/1.1 200 OK (application/json)
60100 2018-02-01 08:43:23.017505 ALB_IP NODE_SERVER_IP TCP 66 23495 → 3000 [ACK] Seq=657 Ack=228 Win=28160 Len=0 TSval=37350008 TSecr=45343657
60142 2018-02-01 08:43:28.019353 NODE_SERVER_IP ALB_IP TCP 66 3000 → 23495 [FIN, ACK] Seq=228 Ack=657 Win=28160 Len=0 TSval=45344908 TSecr=37350008
60143 2018-02-01 08:43:28.019908 ALB_IP NODE_SERVER_IP HTTP 745 POST POST_PATH HTTP/1.1 (application/json)
60144 2018-02-01 08:43:28.019931 NODE_SERVER_IP ALB_IP TCP 54 3000 → 23495 [RST] Seq=229 Win=0 Len=0
60145 2018-02-01 08:43:28.019942 ALB_IP NODE_SERVER_IP TCP 66 23495 → 3000 [FIN, ACK] Seq=1336 Ack=229 Win=28160 Len=0 TSval=37351258 TSecr=45344908
60146 2018-02-01 08:43:28.019946 NODE_SERVER_IP ALB_IP TCP 54 3000 → 23495 [RST] Seq=229 Win=0 Len=0
// hapi v16-style handler: replies with a random UUID, hyphens stripped.
internals.handler = (request, reply) => {
    const random = uuid.v4().replace(/-/g, '');
    reply({
        uuid: random
    }).code(200);
};
Application Load Balancer -> EC2 (Node.js with hapi on port 3000)
@kanongil any ideas on this?
Looks like a race condition of sorts.
A response is sent to the client for the original request. The server then waits exactly 5 seconds for more data and closes its side of the connection with FIN, ACK. The client sends another request immediately afterwards, but the server considers the connection already closed, so it sends RST.
There's a couple of things that come to mind:
I guess the 5 seconds comes from the default Node server keep-alive timeout: https://nodejs.org/dist/latest-v8.x/docs/api/http.html#http_server_keepalivetimeout.
You may want to tweak the Node and ELB timeout settings to arrive at something more harmonious.
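One way to read "more harmonious" is to make the Node keep-alive timeout longer than the load balancer's idle timeout, so that the load balancer is the side that closes idle connections. The following is a minimal sketch using Node's built-in http module rather than hapi. The 60-second idle timeout and the 5-second margin are assumptions for illustration (check the value configured on your own load balancer), and keepAliveTimeoutMs is a hypothetical helper name.

const http = require('http');

// Assumption: the load balancer's idle timeout, in seconds.
// Replace with the value configured on your own load balancer.
const LB_IDLE_TIMEOUT_SECONDS = 60;
// Assumption: extra headroom so the server never closes first.
const SAFETY_MARGIN_SECONDS = 5;

// Hypothetical helper: a keep-alive timeout (ms) that outlasts the LB idle timeout.
function keepAliveTimeoutMs(lbIdleSeconds, marginSeconds) {
    return (lbIdleSeconds + marginSeconds) * 1000;
}

const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ ok: true }));
});

// Node 8+ exposes this property on http.Server; the default is 5000 ms.
server.keepAliveTimeout = keepAliveTimeoutMs(LB_IDLE_TIMEOUT_SECONDS, SAFETY_MARGIN_SECONDS);

// server.listen(3000) would start accepting connections; omitted here.

With these assumed values the server keeps idle connections open for 65 seconds, so a load balancer with a 60-second idle timeout should give up on an idle connection before the server does.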
@mtharrison thanks, I'll try it.
Is there a keep-alive timeout setting in the hapi options?
I can't find one in the docs.
https://hapijs.com/api/16.6.2#serverconnections
https://hapijs.com/api/16.6.2#serverstartcallback
This is related; it's the same issue you describe: https://github.com/nodejs/node/issues/17749. Are you on Node 8? I think Node 8 changed the timeout from 2 minutes to 5 seconds.
If you wanted the old behaviour you'd do something like:
const Hapi = require('hapi');
const server = Hapi.server({ port: ... });
server.listener.keepAliveTimeout = 120e3; // 120 seconds, in milliseconds
....
We've been having intermittent 502s since node 8 as well, so it was interesting to see this issue. I was able to repro this with a similar configuration in Azure and a loop of POST requests running every 5.01 seconds. I applied @mtharrison's keepAliveTimeout as above and am no longer able to repro the FIN and resulting 502 from the LB. Thanks!
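For anyone who wants to try a similar repro, here is a rough sketch of such a request loop using only Node's built-in http module. The TARGET values are hypothetical placeholders and should point at the load balancer in front of the server, since the 502 comes from the load balancer. Nothing runs until startRepro() is called.

const http = require('http');

// Hypothetical target; point this at the load balancer under test.
const TARGET = { host: '127.0.0.1', port: 3000, path: '/', method: 'POST' };
// Just past Node 8's default 5000 ms keepAliveTimeout.
const INTERVAL_MS = 5010;

// Reuse a single TCP connection across requests.
const agent = new http.Agent({ keepAlive: true, maxSockets: 1 });

function sendPost(onDone) {
    const body = JSON.stringify({ ping: Date.now() });
    const options = Object.assign({}, TARGET, {
        agent,
        headers: {
            'Content-Type': 'application/json',
            'Content-Length': Buffer.byteLength(body)
        }
    });
    const req = http.request(options, (res) => {
        res.resume(); // drain the body so the socket can be reused
        res.on('end', () => onDone(null, res.statusCode));
    });
    req.on('error', (err) => onDone(err));
    req.end(body);
}

// Logs one line per request; look for 502 statuses or connection errors.
function startRepro() {
    return setInterval(() => {
        sendPost((err, status) => {
            console.log(new Date().toISOString(), err ? 'error: ' + err.code : 'status: ' + status);
        });
    }, INTERVAL_MS);
}

Calling startRepro() returns the interval handle, so clearInterval(handle) stops the loop. Whether a given run actually hits the race depends on timing, so treat a clean run as inconclusive.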
solved. thanks!
Here's my own experience with this new situation: https://github.com/nodejs/node/issues/20256