Azure-sdk-for-js: [Service Bus] Performance Testing of the library

Created on 18 Mar 2019 · 13 Comments · Source: Azure/azure-sdk-for-js

We need to test the performance of the send and receive operations provided by the @azure/service-bus library.

For example, compare the time taken to send and receive a thousand messages against the established expectations from the corresponding library in other languages.

Client Service Bus

Most helpful comment

@mikeharder The link to the msdn blog you provided is broken, can you share the link again?

@ramya0820 For performance tests, I would concentrate on the number of messages that we send/receive and compare the time taken (with other sdks) rather than long running tests.

Some variations I can think of

  • Send 10k messages using send()
  • Receive 10k messages using streaming receiver in PeekLock mode with maxConcurrency set to 1, 10, 100, 1000
  • Receive 10k messages using streaming receiver in ReceiveAndDelete mode with maxConcurrency set to 1, 10, 100, 1000
  • Receive total of 10k messages in PeekLock mode using receiveBatch(n) where n is 10, 100, 1k, 2k
  • Receive total of 10k messages in ReceiveAndDelete mode using receiveBatch(n) where n is 10, 100, 1k, 2k
  • Peek() 10k times

We already know via https://github.com/Azure/azure-sdk-for-js/issues/1389#issuecomment-476783383 that the performance of the two modes (PeekLock and ReceiveAndDelete) is not the same, and it is worth our time to dig into this.
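Each of the variations above comes down to timing a fixed number of operations. A minimal sketch of such a timing helper follows; `measureThroughput` and its `op` callback are hypothetical names, not part of @azure/service-bus, and in a real test the callback would wrap the library's send, receive, or peek call.

```typescript
// Hypothetical helper for illustration; not part of @azure/service-bus.
interface ThroughputResult {
  count: number;      // number of operations completed
  elapsedMs: number;  // wall-clock time for all operations
  perSecond: number;  // operations per second
}

// Runs `op` `count` times, one after the other, and reports the throughput.
async function measureThroughput(
  count: number,
  op: (index: number) => Promise<void>
): Promise<ThroughputResult> {
  const start = Date.now();
  for (let i = 0; i < count; i++) {
    await op(i);
  }
  const elapsedMs = Date.now() - start;
  // Avoid dividing by zero when the run finishes within the same millisecond.
  const perSecond = elapsedMs > 0 ? (count * 1000) / elapsedMs : Infinity;
  return { count, elapsedMs, perSecond };
}
```

Running the same helper with the same message count against each SDK would give numbers that can be compared directly, assuming the callbacks do equivalent work.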

All 13 comments

This is directly relevant to the purpose of the load and stress tests identified in https://github.com/Azure/azure-sdk-for-js/issues/1478

If we want to set up long-running tests with minimal load, I suggest we refer to them as 'Durable Tests' for clarity, as that is not the same as stress/load tests.
As pointed out earlier, I think 'Performance Tests' is an ambiguous term, as it basically refers to capturing how the app (here, the SDK) performs under different kinds of situations/request patterns, which is the objective of all non-functional tests.

@ramya-rao-a @AlexGhiondea @mikeharder Thoughts?

@ramya-rao-a: Updated link to new Service Bus blog location.

I was able to get some results already for send() with the .NET client:

Service MU | Language | App | Client Version | MessageSizeInBytes | MaxInflight | Send (msg/s) | CPU
-- | -- | -- | -- | -- | -- | -- | --
2 | .NET | ServiceBusClientPerf | 3.3.0 | 1024 | 1000 | 13419 | 80%

I don't have results for JavaScript yet, but it should be fairly easy to create a JavaScript version of my .NET test client:

https://github.com/mikeharder/ServiceBusClientPerf/tree/master/net/ServiceBusClientPerf/ServiceBusClientPerf

I will try to get results for JavaScript send() later today.

I was looking at writing a test case for the bug https://github.com/Azure/azure-sdk-for-js/issues/1611, i.e. to ensure that we can send/receive more than 2048 messages (2048 being the rhea buffer limit). I soon realized that having a live test for this in our CI is not reliable: the test sometimes passes in 5 seconds and at other times takes more than 2 minutes.

The perf tests for the scenarios I mentioned above can cover this case, and it would be a good idea to run them on a regular cadence.

Next steps:

  1. Investigate why the rhea-promise implementation is so much faster than the service-bus implementation. Is the rhea-promise implementation an apples-to-apples comparison? Is it correctly enforcing the maximum number of inflight messages?

  2. Add tests for receive and peek scenarios.
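On the question of whether the maximum number of inflight messages is enforced, one way to check is to have the test client record its own peak. The sketch below assumes a model where `maxInflight` concurrent "flights" each send one message at a time; `runFlights` and `sendOne` are hypothetical names, and `sendOne` stands in for whatever send call is under test.

```typescript
// Hypothetical sketch; `sendOne` stands in for the send call being measured.
interface FlightStats {
  sent: number;         // messages whose send completed
  peakInflight: number; // highest number of sends pending at the same time
}

async function runFlights(
  maxInflight: number,
  totalMessages: number,
  sendOne: () => Promise<void>
): Promise<FlightStats> {
  let started = 0;
  let inflight = 0;
  let peakInflight = 0;
  let sent = 0;

  // Each flight sends sequentially, so at most `maxInflight` sends are pending.
  const flight = async (): Promise<void> => {
    // The check and the increment run without an await in between,
    // so flights cannot start more than `totalMessages` sends.
    while (started < totalMessages) {
      started++;
      inflight++;
      peakInflight = Math.max(peakInflight, inflight);
      try {
        await sendOne();
        sent++;
      } finally {
        inflight--;
      }
    }
  };

  const flights: Promise<void>[] = [];
  for (let i = 0; i < maxInflight; i++) {
    flights.push(flight());
  }
  await Promise.all(flights);
  return { sent, peakInflight };
}
```

If a client reports a `peakInflight` above the configured limit, or only reaches its throughput by not waiting on `sendOne` at all, the comparison with other clients is probably not like for like.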

> Is the rhea-promise implementation an apples-to-apples comparison?

I took another look at both test samples; there is one key difference between the two.

The service-bus sample waits for a message to be successfully sent before sending the next message.
The rhea-promise sample doesn't do the same. It fires send requests one after the other, without checking whether each request has completed.

Updating the service-bus source code to not care about the success/failure of the send operation gives the same perf as rhea-promise.
Updating the rhea-promise sample to wait for success/failure of the send operation gives the same perf as service-bus.

So, no, the current way of comparing the service-bus and rhea-promise samples is not right.
With the above correction, both perform about the same.
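The difference between the two samples can be illustrated without a live namespace. The sketch below uses a hypothetical `FakeSender` whose `send()` resolves after a simulated settlement delay; it is a stand-in for illustration and does not reflect the rhea-promise or service-bus API.

```typescript
// Hypothetical stand-in for an AMQP sender; `send` resolves once the
// simulated broker settles the delivery.
class FakeSender {
  settled = 0;
  ackDelayMs: number;

  constructor(ackDelayMs: number) {
    this.ackDelayMs = ackDelayMs;
  }

  send(): Promise<void> {
    return new Promise<void>((resolve) => {
      setTimeout(() => {
        this.settled++;
        resolve();
      }, this.ackDelayMs);
    });
  }
}

// Waits for each delivery to settle before sending the next one,
// so the elapsed time includes every round trip.
async function sendAwaited(sender: FakeSender, count: number): Promise<number> {
  const start = Date.now();
  for (let i = 0; i < count; i++) {
    await sender.send();
  }
  return Date.now() - start;
}

// Fires sends back to back; the timer stops before any delivery has settled,
// so the elapsed time only covers handing the messages to the sender.
function sendFireAndForget(sender: FakeSender, count: number): number {
  const start = Date.now();
  for (let i = 0; i < count; i++) {
    void sender.send();
  }
  return Date.now() - start;
}
```

In this model the fire-and-forget loop returns with `settled` still at zero, which is why its timing is not comparable with a loop that waits for each settlement.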

I found a key difference when comparing the .NET test code with service-bus as well.

Each sender in .NET gets its own AMQP connection and, therefore, its own link.

But the test code for the Node.js SDK re-uses the same sender link across each "flight".

To get a similar comparison with .NET, we need to update the JS sample code to use a separate ServiceBusClient in each flight, so that each flight gets its own AMQP connection.

@mikeharder Let me know if my understanding of the .net test code is wrong

For service-bus vs rhea-promise, I think the rhea-promise implementation should be updated to wait for success/failure. @ramya-rao-a @HarshaNalluru : Can one of you create a PR to update rhea-promise in this way, and please add me as a reviewer?

For JS vs .NET, I created a different implementation in .NET which should be an apples-to-apples comparison with JS. I will create a PR to move the .NET implementation from my personal GitHub to the .NET monorepo. Here is the code to create the senders in each case:

https://github.com/mikeharder/ServiceBusClientPerf/blob/master/net/Program.cs#L37
https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/servicebus/service-bus/test/perf/service-bus/send.ts#L50

I assumed the best practice was to use a single sender instance for the whole process, similar to the guidance for, say, HttpClient in .NET. If this is not the practice we would recommend to customers for optimal performance, then I agree we should change both the JS and .NET implementations.

@ramya-rao-a
Updated the rhea-promise sample to wait for success/failure of the send operation in my branch.
https://github.com/Azure/azure-sdk-for-js/compare/master...HarshaNalluru:StressPerfRepro?expand=1

> Updating the rhea-promise sample to wait for success/failure of the send operation gives the same perf as service-bus

I see rhea-promise sample performing better.
Am I missing something?

@HarshaNalluru: Can you create a PR for this change so I can comment on it? Also, can you please post the numbers you are seeing before and after this change in the PR? I can run the tests on my own VM to confirm your results.

@HarshaNalluru Maybe because your sample is not using promises?
Below is what I tried inside ExecuteSendsAsync, i.e. the same things that we do in service-bus:

  // Send messages one at a time, waiting for each delivery to settle
  // before sending the next (the same approach as the service-bus sender).
  while (++_messages <= messages) {
    // Wait until the link has credit to send.
    while (!sender.sendable()) {
      await delay(0.01);
    }

    await new Promise<void>((resolve) => {
      // Any outcome (accepted, rejected, released, modified) completes the send.
      const events = [
        SenderEvents.accepted,
        SenderEvents.rejected,
        SenderEvents.released,
        SenderEvents.modified
      ];
      const onSettled = (_context: EventContext) => {
        // Remove all four listeners so they do not pile up across sends.
        events.forEach((event) => sender.removeListener(event, onSettled));
        resolve();
      };

      events.forEach((event) => sender.on(event, onSettled));
      sender.send({ body: _payload });
    });
  }

Closing this issue as we plan to revisit the perf & stress testing story for Service Bus this month, potentially based on the common perf & stress infrastructure.
