Aws-sdk-js: SQS consumer performance concern

Created on 22 Nov 2018  ·  18 Comments  ·  Source: aws/aws-sdk-js

While looking at the CPU usage of our workers, we noticed that they use 100% of the available CPU, so we started running a profiler.

[Screenshot, 2018-11-22 16:52:36]

[Screenshot, 2018-11-22 16:49:01]

It feels like we are spending most of the time performing DNS lookups, TCP connections, TLS handshakes and AWS ACL verifications.

Is there a way to reuse the same agent and credentials across repeated calls to
sqs.receiveMessage().promise() and sqs.deleteMessage().promise()?

guidance


@oliviertassinari

How are you setting up sqs?

Is that the same client? Or are you re-instantiating that between calls?

We use the same client. So if I understand correctly, this is not the expected default behavior? I can share the source code.

SqsConsumer.js

import AWS from 'aws-sdk'
import config from 'config'
import log from 'modules/scripts/log'
import crashReporter from 'modules/crashReporter/common'
import { addTeardown, removeTeardown, isShuttingDown } from 'modules/process/handleKillSignals'
import Queue from 'modules/async/Queue'

export default class SqsConsumer {
  constructor(options) {
    const QueueUrl = options.queueUrl || config.get(`sqs.${options.topic}`)
    if (!QueueUrl) {
      throw new Error(`The following config is missing: sqs.${options.topic}`)
    }

    this.name = `sqsConsumer[${options.topic}]`
    this.sqs = new AWS.SQS({
      apiVersion: '2012-11-05',
      params: {
        MaxNumberOfMessages: 10,
        QueueUrl,
        WaitTimeSeconds: 5, // Long polling
      },
      region: config.get('aws.region'),
    })
    this.queue = new Queue(
      async ({ message, options: { handler } }) => {
        try {
          await handler(JSON.parse(message.Body))
          await this.sqs.deleteMessage({ ReceiptHandle: message.ReceiptHandle }).promise()
        } catch (err) {
          if (isShuttingDown() && err.signal === 'SIGINT') {
            return
          }

          crashReporter.captureException(err, {
            extra: {
              message,
            },
            tags: {
              scope: this.name,
            },
          })
        }
      },
      { concurrency: options.concurrency }
    )
    addTeardown((this.tearDown = { callback: this.destroy, nice: 3 }))
  }

  destroy = async () => {
    log.info({ name: this.name, msg: 'closing' })
    removeTeardown(this.tearDown)
    await this.queue.destroy()
    log.info({ name: this.name, msg: 'closed' })
  }

  consume = async options => {
    const { Messages: messages = [] } = await this.sqs.receiveMessage().promise()

    if (isShuttingDown()) {
      return
    }

    this.queue.push(messages.map(message => ({ message, options })))

    await this.queue.waitUntil({ predicate: 'almost-empty' })
  }

  consumeAll = async options => {
    log.info({ name: this.name, msg: 'consumeAll' })
    // Poll until shutdown; an error thrown by consume() rejects the returned promise.
    while (!isShuttingDown()) {
      // eslint-disable-next-line no-await-in-loop
      await this.consume(options)
    }
  }
}

@oliviertassinari

Thanks for following up with this detail.

Does enabling keepAlive improve the performance?

this.sqs = new AWS.SQS({
  apiVersion: '2012-11-05',
  httpOptions: {
    agent: new http.Agent({
      keepAlive: true
    })
  },
  params: {
    MaxNumberOfMessages: 10,
    QueueUrl,
    WaitTimeSeconds: 5, // Long polling
  },
  region: config.get('aws.region'),
})

@srchase We will try that out next Monday :). Should the agent be https, or does it not matter? (For us, the data is not sensitive.)

No luck:

NetworkingError: Protocol "https:" not supported. Expected "http:"
at new ClientRequest (_http_client.js:118:11)
at Object.request (https.js:280:10)
at features.constructor.handleRequest (/onepixel/app/node_modules/aws-sdk/lib/http/node.js:42:23)
at executeSend (/onepixel/app/node_modules/aws-sdk/lib/event_listeners.js:333:29)
at Request.SEND (/onepixel/app/node_modules/aws-sdk/lib/event_listeners.js:347:9)
at Request.callListeners (/onepixel/app/node_modules/aws-sdk/lib/sequential_executor.js:102:18)
at Request.emit (/onepixel/app/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/onepixel/app/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/onepixel/app/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/onepixel/app/node_modules/aws-sdk/lib/state_machine.js:14:12)

Using an https agent doesn't help either.
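
The `Protocol "https:" not supported. Expected "http:"` error above is what Node raises when an agent from the `http` module is passed to a request made through the `https` module, which is the situation when an `http.Agent` is given to a client that talks to an HTTPS endpoint. Below is a minimal sketch of picking the agent class from the endpoint's protocol. The names `keepAliveAgentFor` and `mismatchErrorCode` are hypothetical helpers for illustration, not part of the SDK, and the endpoint URL is only an example.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: choose the agent class that matches the endpoint's
// protocol, so an https endpoint never receives an http.Agent.
function keepAliveAgentFor(endpoint) {
  const { protocol } = new URL(endpoint);
  const AgentClass = protocol === 'https:' ? https.Agent : http.Agent;
  return new AgentClass({ keepAlive: true });
}

// Reproduces the mismatch without touching the network: Node validates the
// protocol while constructing the request, before any socket is opened.
function mismatchErrorCode() {
  try {
    https.request({ host: 'localhost', agent: new http.Agent() });
    return null;
  } catch (err) {
    return err.code;
  }
}

const agent = keepAliveAgentFor('https://sqs.us-west-2.amazonaws.com');
console.log(agent instanceof https.Agent);
console.log(mismatchErrorCode());
```

Matching the agent to the protocol only removes the networking error; it does not by itself show whether connections are being reused.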

@oliviertassinari

Yes, it should be https.

Here's a simplified example:

const AWS = require('aws-sdk');
const https = require('https');

const sqs = new AWS.SQS({
  region: 'us-west-2',
  httpOptions: {
    agent: new https.Agent({
      keepAlive: true
    })
  }
})

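sqs.listQueues({}, (err, data) => {
  // Surface request failures instead of silently logging undefined.
  if (err) {
    console.error(err)
    return
  }
  console.log(data)
})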

@srchase Yes, thanks. I tried an https agent after seeing Node's http networking error. However, the flame graph is the same.

@oliviertassinari

I should've asked earlier, but are you using Node? Or by 'worker' did you mean web worker?

What version of Node? What version of the SDK? Any other environmental details that would help for digging into profiling this?

@srchase We are using:

  • Node v10.12.0
  • aws-sdk v2.335.0
  • The Node process runs in a Docker container on Fargate.

@oliviertassinari,

There were some recent changes to the credential provider that may help to improve the performance you're seeing, depending on what kind of credentials you are using.

Would you have time to re-test using the latest version of the SDK?

@srchase Yes, I can test it again Tuesday :).

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

@oliviertassinari, did it work? How did you solve this?

@thales-gaddini I have stopped using SQS.

If anyone else stumbles here as I did, the keepAlive fix worked for me.
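
For readers applying that fix, here is a minimal sketch of sharing one keep-alive agent across every SQS client in a process, so that receiveMessage and deleteMessage calls can reuse established TLS connections. The names `sharedAgent` and `sqsOptions`, the `maxSockets` value, and the queue URL are assumptions for illustration. The returned object is meant to be passed to `new AWS.SQS(...)`, as in the examples earlier in the thread.

```javascript
const https = require('https');

// One agent for the whole process; the maxSockets value of 50 is an
// arbitrary illustrative cap, not a recommendation from the thread.
const sharedAgent = new https.Agent({ keepAlive: true, maxSockets: 50 });

// Hypothetical helper that builds constructor options for `new AWS.SQS(...)`.
function sqsOptions({ queueUrl, region }) {
  return {
    apiVersion: '2012-11-05',
    httpOptions: { agent: sharedAgent },
    params: {
      MaxNumberOfMessages: 10,
      QueueUrl: queueUrl,
      WaitTimeSeconds: 5, // Long polling
    },
    region,
  };
}

// Placeholder queue URL and region, for illustration only.
const options = sqsOptions({
  queueUrl: 'https://sqs.us-west-2.amazonaws.com/123456789012/example-queue',
  region: 'us-west-2',
});
console.log(options.httpOptions.agent === sharedAgent);
```

With the SDK installed, `new AWS.SQS(sqsOptions({ queueUrl, region }))` would create a client on the shared agent. Whether this lowers CPU usage depends on the workload, as the mixed reports in this thread show.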

Closing this issue now, please reach out if you have any further questions.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
