Orleans: Best practices for IO intensive stateless workers

Created on 25 Dec 2017 · 2 Comments · Source: dotnet/orleans

Suppose you have a message queue whose messages are processed by stateless workers subscribed to a stream, and every message leads to some IO work.
The Orleans task scheduler executes each worker activation's code in turns, which is good in general, but for heavily IO-intensive work this quickly becomes a bottleneck.

Quick math:
Say 40 activations, each with an average IO wait of 10ms per message, give a throughput of 4000 messages/sec.
In practice, though, I reached a far smaller number, and that 10 milliseconds can be much higher.
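
The estimate above can be written out as a small calculation. This is only a sketch, assuming each activation handles one message at a time and spends its whole turn waiting on IO; `max_throughput` is a hypothetical helper name, not an Orleans API.

```python
def max_throughput(activations: int, io_wait_ms: float) -> float:
    """Upper bound on messages per second for turn-based activations."""
    # Each activation completes at most 1000 / io_wait_ms messages per second.
    return activations * 1000 / io_wait_ms


if __name__ == "__main__":
    print(max_throughput(40, 10))   # the 40-activation, 10 ms case from the post
    print(max_throughput(40, 100))  # same activations with 100 ms of IO wait
```

The bound scales inversely with the IO wait, so any extra latency per message lowers the ceiling proportionally.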

One way would be to increase the number of workers. Say each message waits 100ms on IO. To respond in about 100ms while 1000 messages are pushed into the queue every second, you need roughly 100 activations if arrivals are evenly spread (1000 msg/s × 0.1 s), and up to 1000 if they all arrive at once. At 10k messages per second both figures grow tenfold, and so on.
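
A quick way to sanity-check sizing like this is Little's law: the average number of messages in flight equals the arrival rate times the time each message spends in the system. The sketch below assumes turn-based activations (one message in flight per activation) and a steady arrival rate; `activations_needed` is a hypothetical helper, and bursty arrivals would need more headroom than this average suggests.

```python
import math


def activations_needed(arrival_rate_per_s: float, io_wait_ms: float) -> int:
    """Average number of busy turn-based activations under steady arrivals."""
    # Little's law: in-flight messages = arrival rate * time per message.
    return math.ceil(arrival_rate_per_s * io_wait_ms / 1000)


if __name__ == "__main__":
    for rate in (1_000, 10_000):
        print(rate, "msg/s ->", activations_needed(rate, 100), "activations")
```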
I understand that having a lot of grains is kind of the point of Orleans, but this scenario is pretty common and can be solved with much less overhead (almost shouting on "much").

The other way is obviously to break out of the Orleans task scheduler and run these jobs on another scheduler. But what if you need the result of that IO work?

I couldn't find any suggestions about this on the web, so I'm looking for a better approach to this problem here or, if I'm lucky, support for non-turn-based stateless workers 😊 (in the form of instructing the framework to run certain types of grains on another task scheduler, I would think 🤔).

_To be clear I'm talking about asynchronous IO here._

question

alirezajm

Most helpful comment

I thought that would be the case with reentrant grains, but I didn't see any performance gain when I tested it. Something else was probably causing the poor performance; I will try to find the problem and benchmark again.
Thanks for the response 👍

alirezajm on 4 Jan 2018

All 2 comments

Say 40 activations, each with an average IO wait of 10ms per message, give a throughput of 4000 messages/sec.

This is true if the activations are non-reentrant (which is the default). If you make them reentrant, they can process many more messages while awaiting the 10ms IO calls.
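
The difference can be illustrated with a small model. This is a Python `asyncio` sketch, not Orleans code: it only mimics one activation that either finishes each message before starting the next (turn-based) or lets messages interleave at their `await` points (reentrant). In Orleans itself, reentrancy is enabled by marking the grain class with the `[Reentrant]` attribute. The names `handle` and `run_activation`, the 20-message batch, and the use of `asyncio.sleep` as a stand-in for the IO call are all assumptions made for illustration.

```python
import asyncio

IO_WAIT_S = 0.01  # assumed 10 ms IO latency, as in the example above


async def handle(stats: dict) -> None:
    """Process one message: all of the work is an awaited IO call."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(IO_WAIT_S)  # stands in for the asynchronous IO
    stats["in_flight"] -= 1


async def run_activation(message_count: int, reentrant: bool) -> tuple[int, float]:
    """Return (peak messages in flight, elapsed seconds) for one activation."""
    stats = {"in_flight": 0, "peak": 0}
    loop = asyncio.get_running_loop()
    started = loop.time()
    if reentrant:
        # Interleaved: every message starts before the earlier awaits finish.
        await asyncio.gather(*(handle(stats) for _ in range(message_count)))
    else:
        # Turn-based: the next message starts only after the previous one completes.
        for _ in range(message_count):
            await handle(stats)
    return stats["peak"], loop.time() - started


if __name__ == "__main__":
    for mode in (False, True):
        peak, elapsed = asyncio.run(run_activation(20, reentrant=mode))
        print(f"reentrant={mode}: peak in flight={peak}, elapsed={elapsed:.3f}s")
```

In this model the turn-based run never has more than one message in flight, while the interleaved run overlaps all of the waits. The gain only appears when the handler really spends its time awaiting; CPU-bound work would not overlap this way.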