We are running a Node.js app in Azure. During high load we see high response times due to SNAT port exhaustion; we appear to be making more than 1000 simultaneous outgoing requests. I have tried to limit this by setting maxSockets on the HTTPS agent to 160 and enabling HTTP keep-alive, which I thought would cap the number of outbound connections at 160. See the code below.
Things I'm wondering about:
```js
const http = require('http')
const https = require('https')
const got = require('got')

const agentOptions = {
  maxSockets: 160,
  maxFreeSockets: 10,
  keepAlive: true,
  timeout: 30000
}

const httpAgent = new http.Agent(agentOptions)
const httpsAgent = new https.Agent(agentOptions)

const gotClient = got.extend({
  headers: headers, // `headers` is defined elsewhere in our code
  agent: {
    http: httpAgent,
    https: httpsAgent
  }
})
```
```js
const client = {
  get: async (url = '') => {
    const response = await gotClient.get(url, {
      headers,
      json: true
    })
    return response.body
  }
  // ...
}
```
I see that I'm not running the latest versions of Node and Got, but my questions are more general and should be valid for older versions as well.
`maxSockets` is a per-host limit, and so is `maxFreeSockets`.
> enabling HTTP KeepAlive
That's a very good approach, but it has no effect if you query a new host every time.
> Does the number of threads affect this?

No, because Node.js is based on an event loop.
Thanks for replying!
All my requests are to the same backend, e.g. my-backend.azurewebsites.net/api/getstuff/{uuid}/,
so that should be treated as the same destination, right?
if it's the same hostname, then yes
make sure you're replying with `connection: keep-alive` and not with `connection: close`
> timeout: 30000
Try setting this to 10000 or 5000
> make sure you're replying with `connection: keep-alive` and not with `connection: close`
Ah, thanks, will verify that in the backend code. But even if I was, I shouldn't be opening >1000 connections, right? Somehow the code isn't listening to the maxSockets setting? Or is there any way I can verify this locally by logging?
> Somehow the code isn't listening to the maxSockets setting?
Not possible. I've studied Node.js HTTP code long hours. I double checked everything :P
Haha! :) But the fact that Azure shows >1000 failed SNAT port acquisitions proves to me (at least) that the app is trying to open more than 160 connections, right? Could it be due to an old version of NodeJS (10.6) or an old version of Got (9.6)? Or is the problem that I'm not setting the agent per connection?
> Or is there any way I can verify this locally by logging?
> But the fact that Azure shows >1000 failed SNAT port acquisitions proves to me (at least) that the app is trying to open more than 160 connections, right?
weird, I cannot say anything about this as I don't use Azure
> Could it be due to an old version of NodeJS (10.6)
Possibly, try using Node.js 12
> old version of Got (9.6)
Well, that version is no longer developed so it may be a bug but I don't think so.
> Or is the problem that I'm not setting the agent per connection?
there is no such thing like "agent per connection" :rofl:
Hehe, sorry for being unclear. I meant: is it enough to just do:

```js
const gotClient = got.extend({
  headers: headers,
  agent: {
    http: httpAgentWithKeepAliveAndMaxSocketSet,
    https: httpsAgentWithKeepAliveAndMaxSocketSet
  }
})
```
Or do I need to also pass the agent along with each request:

```js
const response = await gotClient.get(url, {
  headers: headers,
  agent: agent,
  json: true
})
```
The first one is fine, you don't have to do the latter, that's the point of the custom Got instances :)
> Or is there any way I can verify this locally by logging?
Wow, thanks for taking the time to write that example! I see that it will log each time I create a connection. However, if I want to see the number of active connections, do I need to do some corresponding thing for closing connections and then have a global variable that I update?
`createConnection` basically returns a `net.Socket` instance; it's in the Node.js docs
Or, to put it as a more open-ended question: how would you, who has spent long hours in the Node.js HTTP code ;), go about debugging this? Somehow it seems the maxSockets setting is not being listened to?
(When I look at my process list I see about 13 node instances, but I'm assuming the extra 7 (6 are default, right?) are just extra threads for DNS lookups, since I set UV_THREADPOOL_SIZE=128. I was wondering whether this could be a sign that there are multiple node instances doing requests...)
> debugging this
https://github.com/GoogleChromeLabs/ndb is a great tool, I strongly recommend it :)
> Somehow it seems the MaxSockets setting is not being listened to?
Try something like this: https://runkit.com/szmarczak/5e5e64d2c38f7e0013896198
First off I cannot begin to explain how thankful I am for you taking the time to help me! I've spent the last _days_ trying to fix this.
I tried your code and just updated the log line to show if you were connecting or disconnecting:
```js
let activeConnections = 0

const createConnection = httpAgent.createConnection.bind(httpAgent)
httpAgent.createConnection = (options, onCreate) => {
  const socket = createConnection(options, onCreate)
  console.log(`Connecting to ${options.hostname}:${options.port}`)
  socket.once('connect', () => {
    activeConnections++
    console.log(`Connect. Active connections: ${activeConnections}`)
    socket.once('end', () => {
      activeConnections--
      console.log(`End. Active connections: ${activeConnections}`)
    })
  })
  return socket
}
```
The result kind of blows my mind (I tried setting keepAlive to false during the tests):

```
Connecting to localhost:8070
Active connections: 2
Connecting to localhost:8070
Active connections: 3
Connecting to localhost:8070
Connecting to localhost:8070
Active connections: 1
Active connections: 1
Connecting to localhost:8070
Active connections: 1
Connecting to localhost:8070
Active connections: 1
Connecting to localhost:8070
```
How can the number go from 2 to 1, without printing the End log line!? Is this a sign that I'm instantiating new Got clients every time or something?
> Is this a sign that I'm instantiating new Got clients every time or something?
Seems like you're creating a new Agent every time. Try not to pass any agent at all, it should use https.globalAgent:
I found the problem!!! It had nothing to do with the Got library or Azure; it was a bug in our code. The code snippet you wrote (I had to change `socket.once('end', ...)` to `socket.once('close', ...)` to make it work) helped me realise that we weren't reusing the connections; we weren't even reusing the same instance of the Got client!? In fact, we were creating a new instance of the Got client for every new connection 😋. This bug had been there since way before I started on this project, so I'm surprised we didn't run into problems until now.
I would probably have wasted (even more) days on this without your fantastic support @szmarczak. Is there anything I can do to show my appreciation? I am truly thankful for you taking the time to help a random software developer with his problems.
You can become a sponsor :sparkles:
Your issue made me wonder whether it's a Got bug or not, as your code example is totally valid. It means that you have read the documentation carefully. Some people tick "I have read the documentation" even though they had not. Issues can be of a good quality too, so it's best to provide as much information as possible.
Just made a donation over Paypal. Thanks again for your help!
Yep, I have received it. I am veeery grateful! :D
This issue was very helpful while debugging why each request was opening up a new connection for my Azure App Service. Thank you for your work here!
I was using https://github.com/node-modules/agentkeepalive, and by using that library's getCurrentStatus method I figured out that the destination server didn't support keep-alive: after my load tests the opened and closed socket counts matched, with no free sockets left open. Swapping in a Node http global agent with the code above also showed that connections couldn't stay open and were being closed. Other servers known to support keep-alive produced the expected connection pooling.
I then used `curl -Iv` to confirm, and sure enough the connection was being closed each time.