Got: Too many outgoing connections cause SNAT Port Exhaustion

Created on 3 Mar 2020 · 26 comments · Source: sindresorhus/got

Question

  • Got version: 9.6.0
  • Node.js version: 10.16-alpine
  • OS & version: Azure App Service Linux

We are running a Node.js app in Azure. During high load we get high response times due to SNAT port exhaustion: the app appears to open more than 1000 simultaneous outgoing connections. I have tried to limit this by setting maxSockets on the HTTPS agent to 160 and enabling HTTP keep-alive, which I expected would cap the number of outbound connections at 160. See the code below.

Things I'm wondering about:

  1. Am I doing it wrong? Is there another way to reuse connections and limit their number? Or do I need to set the agent per request?
  2. Does the number of threads affect this? I have set UV_THREADPOOL_SIZE to 128, which I assume should not affect this behaviour. Or is the limit somehow per thread?
  3. Is there any way for me to log the number of outgoing connections?

Code to reproduce

  const http = require('http')
  const https = require('https')
  const got = require('got')

  // Shared request headers (defined elsewhere in our code base)
  const headers = {}

  const agentOptions = {
    maxSockets: 160,
    maxFreeSockets: 10,
    keepAlive: true,
    timeout: 30000
  }
  const httpAgent = new http.Agent(agentOptions)
  const httpsAgent = new https.Agent(agentOptions)

  const gotClient = got.extend({
    headers: headers,
    agent: {
      http: httpAgent,
      https: httpsAgent
    }
  })

  const client = {
    get: async (url = '') => {
      const response = await gotClient.get(url, {
        headers,
        json: true
      })
      return response.body
    }
    // ...
  }
  • [x] I have read the documentation.
  • [ ] I have tried my code with the latest version of Node.js and Got.

I see that I'm not running the latest versions of Node and Got, but my questions are more general and should be valid for older versions as well.

All 26 comments

maxSockets is a per-host limit, and so is maxFreeSockets.

enabling HTTP KeepAlive

That's a very good approach, but it has no effect if you query a new host every time.

Does the no of threads affect this?

No, because Node.js is based on an event loop.
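
To answer the logging question directly: Node's Agent tracks its connection pools per origin, so you can inspect them at runtime. A minimal sketch (the interval and the agent shown here are illustrative, not code from this thread):

  const https = require('https')

  const httpsAgent = new https.Agent({ keepAlive: true, maxSockets: 160 })

  // Periodically dump how many sockets the agent holds per origin.
  // agent.sockets  = sockets currently in use
  // agent.freeSockets = idle keep-alive sockets waiting for reuse
  setInterval(() => {
    for (const [origin, sockets] of Object.entries(httpsAgent.sockets)) {
      console.log(`${origin}: ${sockets.length} in use`)
    }
    for (const [origin, sockets] of Object.entries(httpsAgent.freeSockets)) {
      console.log(`${origin}: ${sockets.length} idle (keep-alive)`)
    }
  }, 5000).unref()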

enabling HTTP KeepAlive

That's a very good approach, but it has no effect if you query a new host every time.

Thanks for replying!
All my requests are to the same backend, e.g. my-backend.azurewebsites.net/api/getstuff/{uuid}/,
so that should be treated as the same destination, right?

if it's the same hostname, then yes

make sure you're replying with connection: keep-alive and not with connection: close
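
One quick way to check what the backend actually sends back is to log the Connection response header. A sketch using got 9's promise API (the URL is the placeholder backend mentioned above; replace it with a real endpoint):

  const got = require('got')

  // Placeholder URL: point this at a real endpoint on your backend.
  const url = 'https://my-backend.azurewebsites.net/api/getstuff/some-uuid'

  got(url).then(response => {
    // Should be 'keep-alive' (or absent on HTTP/1.1, where keep-alive is the
    // default), not 'close'.
    console.log('connection header:', response.headers.connection)
  }).catch(console.error)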

timeout: 30000

Try setting this to 10000 or 5000

make sure you're replying with connection: keep-alive and not with connection: close

Ah, thanks, I will verify that in the backend code. But even if I was, I shouldn't be opening >1000 connections, right? Somehow the code isn't listening to the maxSockets-setting? Or is there any way I can verify this locally by logging?

Somehow the code isn't listening to the maxSockets-setting?

Not possible. I've studied the Node.js HTTP code for long hours. I double-checked everything :P

Somehow the code isn't listening to the maxSockets-setting?

Not possible. I've studied the Node.js HTTP code for long hours. I double-checked everything :P

Haha! :) But the fact that Azure shows >1000 failed SNAT port acquisitions proves to me (at least) that the app is trying to open more than 160 connections, right? Could it be due to an old version of Node.js (10.6) or an old version of Got (9.6)? Or is the problem that I'm not setting the agent per connection?

Or is there any way I can verify this locally by logging?

https://runkit.com/szmarczak/5e5e5be815265f001327f3ef

But the fact that Azure shows >1000 failed SNAT port acquisitions proves to me (at least) that the app is trying to open more than 160 connections, right?

weird, I cannot say anything about this as I don't use Azure

Could it be due to an old version of Node.js (10.6)

Possibly, try using Node.js 12

old version of Got (9.6)

Well, that version is no longer maintained, so it may be a bug, but I don't think so.

Or is the problem that I'm not setting the agent per connection?

there is no such thing as an "agent per connection" :rofl:

Or is the problem that I'm not setting the agent per connection?

there is no such thing as an "agent per connection" :rofl:

Hehe, sorry for being unclear. I mean: is it enough to just do:

  const gotClient = got.extend({
    headers: headers,
    agent: {
      http: httpAgentWithKeepAliveAndMaxSocketSet,
      https: httpsAgentWithKeepAliveAndMaxSocketSet
    }
  })

Or do I need to also pass the agent along with each request:

  const response = await gotClient.get(url, {
    headers: headers,
    agent: agent,
    json: true
  })

The first one is fine; you don't have to do the latter. That's the point of custom Got instances :)

Or is there any way I can verify this locally by logging?

https://runkit.com/szmarczak/5e5e5be815265f001327f3ef

Wow, thanks for taking the time to write that example! I see that it will log each time I create a connection. However, if I want to see the number of active connections, do I need to do some corresponding thing for closing connections and then have a global variable that I update?

createConnection basically returns a net.Socket instance; it's in the Node.js docs.

Or, to ask a more open-ended question: how would you, who has spent long hours in the Node.js HTTP code ;), go about debugging this? Somehow it seems the maxSockets setting is not being respected?

(When I look at my process list I see about 13 node instances, but I'm assuming the extra 7 (6 are default, right?) are just extra threads for DNS lookups, because I set UV_THREADPOOL_SIZE = 128. I was wondering whether this was somehow a sign that there are multiple Node instances doing requests...)

debugging this

https://github.com/GoogleChromeLabs/ndb is a great tool, I strongly recommend it :)

Somehow it seems the maxSockets setting is not being respected?

Try something like this: https://runkit.com/szmarczak/5e5e64d2c38f7e0013896198

First off, I cannot begin to explain how thankful I am for you taking the time to help me! I've spent the last _days_ trying to fix this.

I tried your code and just updated the log line to show whether the socket was connecting or disconnecting:

  let activeConnections = 0

  const createConnection = httpAgent.createConnection.bind(httpAgent)
  httpAgent.createConnection = (options, onCreate) => {
    const socket = createConnection(options, onCreate)
    console.log(`Connecting to ${options.hostname}:${options.port}`)

    socket.once('connect', () => {
      activeConnections++
      console.log(`Connect. Active connections: ${activeConnections}`)

      socket.once('end', () => {
        activeConnections--
        console.log(`End. Active connections: ${activeConnections}`)
      })
    })

    return socket
  }

The result kind of blows my mind (I tried setting keepAlive to false during the tests):

Connecting to localhost:8070
Active connections: 2
Connecting to localhost:8070
Active connections: 3
Connecting to localhost:8070
Connecting to localhost:8070
Active connections: 1
Active connections: 1
Connecting to localhost:8070
Active connections: 1
Connecting to localhost:8070
Active connections: 1
Connecting to localhost:8070

How can the number go from 2 to 1, without printing the End log line!? Is this a sign that I'm instantiating new Got clients every time or something?

Is this a sign that I'm instantiating new Got clients every time or something?

Seems like you're creating a new Agent every time. Try not to pass any agent at all, it should use https.globalAgent:

https://runkit.com/szmarczak/5e5e6a5010e7d500137cb9ae

I found the problem!!! It had nothing to do with the Got library or Azure: it was a bug in our code. The code snippet you wrote (I had to change socket.once('end', ...) to socket.once('close', ...) to make it work) helped me realise that we weren't reusing connections; we weren't even reusing the same instance of the Got client! In fact, we were creating a new instance of the Got client for every new connection. This bug had been there since way before I started on this project, so I'm surprised we didn't run into problems until now.
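
For readers hitting the same symptom, the shape of the fix implied here is to create the agent and the extended Got instance once, at module scope, and share that single instance everywhere. A sketch (the module layout and names are illustrative, not the poster's actual code):

  // http-client.js: create the agent and the Got instance once, at module
  // load, and export the single shared instance.
  const https = require('https')
  const got = require('got')

  const httpsAgent = new https.Agent({
    keepAlive: true,
    maxSockets: 160,
    maxFreeSockets: 10
  })

  // Every caller that requires this module shares the same instance, so all
  // requests reuse one connection pool.
  module.exports = got.extend({ agent: { https: httpsAgent } })

Anything that instead calls got.extend(...) or new https.Agent(...) inside the per-request path builds a fresh, empty pool each time, which is exactly what opens a new TCP connection (and consumes an SNAT port) per request.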

I would probably have wasted (even more) days on this without your fantastic support @szmarczak. Is there anything I can do to show my appreciation? I am truly thankful for you taking the time to help a random software developer with his problems.

You can become a sponsor :sparkles:

Your issue made me wonder whether it was a Got bug or not, as your code example is totally valid. It means that you have read the documentation carefully. Some people tick "I have read the documentation" even though they haven't. A well-written issue like this makes it much easier to help, so it's always best to provide as much information as possible.

Just made a donation over Paypal. Thanks again for your help!

Yep, I have received it. I am veeery grateful! :D

This issue was very helpful while debugging why each request was opening up a new connection for my Azure App Service. Thank you for your work here!

I was using https://github.com/node-modules/agentkeepalive, and that library's getCurrentStatus method showed me that the destination server didn't support keep-alive: after my load tests the opened and closed socket counts matched, with no free sockets left open. Swapping in a Node http global agent with the code above also showed that connections couldn't stay open and were being closed. Other servers known to support keep-alive produced the expected results with connection pooling.

I then used curl -Iv to confirm, and sure enough the connection was being closed each time.
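
For reference, here is a sketch of that kind of check with agentkeepalive and Got (the endpoint and the request count are placeholders, not the commenter's actual setup):

  const { HttpsAgent } = require('agentkeepalive')
  const got = require('got')

  const keepaliveAgent = new HttpsAgent({ maxSockets: 160, maxFreeSockets: 10 })
  const client = got.extend({ agent: { https: keepaliveAgent } })

  async function probeKeepAlive () {
    // Placeholder endpoint: point this at the server under test.
    const url = 'https://my-backend.azurewebsites.net/api/health'
    await Promise.all(Array.from({ length: 50 }, () => client.get(url)))

    // If createSocketCount roughly equals closeSocketCount and freeSockets is
    // empty afterwards, the server is closing connections and keep-alive is
    // not taking effect.
    console.log(keepaliveAgent.getCurrentStatus())
  }

  probeKeepAlive().catch(console.error)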
