Orleans: Add Streams on top of relational database

Created on 22 Jul 2015  ·  11Comments  ·  Source: dotnet/orleans

Implementing an Orleans Stream Provider on top of a relational backend has come up as a subject a few times in Gitter, in https://github.com/dotnet/orleans/issues/623 and is connected to https://github.com/dotnet/orleans/issues/255. We could open an official issue about it as queuing, and streams, is a common scenario and especially on-premises development happens largely on top of relational backends. The following ideas are applicable to all relational databases, but to keep this simple, let’s assume SQL Server for the time being.

As mentioned in https://github.com/dotnet/orleans/issues/623 already, there’s an interesting article Using tables as Queues by _Remus Rusanu_. The ideas presented about FIFO queues look like sound plan to implement Orleans Streams of interest are the ordering guarantees. However, it might be prudent to explore the use of ANSI SQL 2003 SEQUENCE. This feature is widely available in various relational databases and also on SQL Server 2012 and later.

As of myself, I don't have a more concrete plan or schedule on starting this (though I've expressed interest), but a rough idea for the SEQUENCE part could be to mimick a ring buffer on a table. The table could something like following (might be inaccurate with regard to the needs of Orleans Streams, but the gist of it)

CREATE TABLE StreamRingBuffer
(
    [SequenceToken] BIGINT NOT NULL,
    [Slot] BIGINT NOT NULL,
    [StreamId] NVARCHAR(256),
    [EnqueuedOn] DATETIME2(7) NOT NULL,
    [ReferenceCount] TINYINT NOT NULL,
    [Payload] VARBINARY(MAX)
);

Here sequence token would be generated like

CREATE SEQUENCE EnqueueSequence AS BIGINT START WITH 1 INCREMENT BY 1 CACHE 500000;
SET @sequenceToken = NEXT VALUE FOR EnqueueSequence;
SET @slot = @sequenceToken % 500000;

This table could, or better perhaps should, be prefilled to avoid constant CPU churn on statistics, index updating, page allocation churn etc. If a variable size is needed, perhaps a feasible strategy would be to add a trigger that allocates a bigger table (as a note, perhaps doubling always the size, cf. amortization).

If I were to implement this, I would continue on the path of adding a few new queries to the database and use the same idea in code as with other tables.

enhancement hacktoberfest help wanted

All 11 comments

I want to point Super Scaling Queues Using the LMax Disruptor Pattern which discussed apparently the same idea, puts out some improvements and most of all, has some very detailed analysis.

<edit: Also Super Scaling Queues Using the LMax Disruptor Pattern And The In-Memory OLTP Engine.

Has anyone considered SQL Server Service Broker functionality as a somewhat drop-in implementation? I'm sure there is a lot that I'm missing. Just curious at this point if the idea has been floated around before.

https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/sql-server-service-broker?view=sql-server-2017

@normanhh3 I think so... The problem was the support for Cloud offerings like Azure SQL who doesn't have it...

Plus the performance is -- or has been at least -- really, really abysmal. @normanhh3 You'd like to have this feature? :)

If performance is an issue, then maybe not something I am interested in. I would be interested to hear of anyone trying it out though.

We discussed this a while back. I think the solution for that would be to use In-memory tables in OLtP engine with a very thin native-compiled procedure to insert and pool messages.

In orleans side, the provider would need logic for partition messages and pool them using an exponential backoff mechanism just like in EventHubs provider.

The problem with that approach is that you would need to foresee the partition number so you can create one pooling agent per partition. For some use cases it would be more than enough, but for other very hight throughput ones, it may be lead to contentions as the “lock” on the partition for a single reader(pooling agent) would not allow multiple readers to consume them in parallel.

@normanhh3 The performance isn't necessarily an issue, see the initial sketch of an idea using a ring-buffer like layout and the two linked articles which explore a similar one and take it even further.

What @galvesribeiro has mentions has been discussed also. You may not know this, @normanhh3, so a bit of a background, the partition number isn't the only issue, others are basically that the features aren't widely available for erveryone. But the system has been designed so that one can change the layout and queries used as long as the interface invariants hold, i.e. the queries are keyed, have an id, and use certain types of parameters with certain names. This would allow one to use more performant features when available and then also not. This is a bit done already in Storage (table) in SQL Server and even more when there was still support fro SQL Server 2000 (it required more changes).

The way I see forwards would be to build the Orleans side code and produce a performant general database layout with queries and possibly a special case one using in-memory tables. Or, say, someone doing a in-memory layout with compiled procedures and then adding a general one. One just needs to take care the interface between Orleans and the DB isn't unnecessarily constrained to some specific features.

There isn't specifically anything why one couldn't also modify the structures when siloes are working, like through the management system or through the injected storage drivers. I think this should be possible now with the new configuration system. One particular place I see this useful is that if I want to partition (as in have a new filegroup, maybe schemas, to split Storage into Storage1, Storage2 etc. by some parameters, which can be types too) dynamically (and maybe grant the Orleans process only temporarily the rights to do such things on the DB). In fact, this sort of splitting and using a different schema and filegroup, maybe also encrypt data, would be a useful measure from the GDPR perspective to split sensitive data. Albeit it could be done otherwise too.

It just happens that when I had a bit more time, the code-side was in the flux so I waited a bit, I might have time this summer, I'm not sure, it depends how much there's on the code-side to work (which files to look at, @jason-bragg ?). Maybe what I'm trying to tell too is that this isn't too difficult, but I've got myself mixed to some busy day jobs and then some startup stuff. :)

Thank you for the detailed conversation, the relevant links and generally
enlightening discussion.

This has proved useful already.

Tangential question, what is the philosophy for creation of DB-specific
optimizations? For instance, indexes on the Storage table.

Our schema version appears to have not included any indexing. Is that just
considered a local optimization?

On Sun, Jun 17, 2018, 03:33 Veikko Eeva notifications@github.com wrote:

@normanhh3 https://github.com/normanhh3 The performance isn't
necessarily an issue, see the initial sketch of an idea using a ring-buffer
like layout and the two linked articles which explore a similar one and
take it even further.

What @galvesribeiro https://github.com/galvesribeiro has mentions has
been discussed also. You may not know this, @normanhh3
https://github.com/normanhh3, so a bit of a background, the partition
number isn't the only issue, others are basically that the features aren't
widely available for erveryone. But the system has been designed so that
one can change the layout and queries used as long as the interface
invariants hold, i.e. the queries are keyed, have an id, and use certain
types of parameters with certain names. This would allow one to use more
performant features when available and then also not. This is a bit done
already in Storage (table) in SQL Server and even more when there was
still support fro SQL Server 2000 (it required more changes).

The way I see forwards would be to build the Orleans side code and produce
a performant general database layout with queries and possibly a special
case one using in-memory tables. Or, say, someone doing a in-memory layout
with compiled procedures and then adding a general one. One just needs to
take care the interface between Orleans and the DB isn't unnecessarily
constrained to some specific features.

There isn't specifically anything why one couldn't also modify the
structures when siloes are working, like through the management system or
through the injected storage drivers. I think this should be possible now
with the new configuration system. One particular place I see this useful
is that if I want to partition (as in have a new filegroup, maybe schemas,
to split Storage into Storage1, Storage2 etc. by some parameters, which
can be types too) dynamically (and maybe grant the Orleans process only
temporarily the rights to do such things on the DB). In fact, this sort of
splitting and using a different schema and filegroup, maybe also encrypt
data, would be a useful measure from the GDPR perspective to split
sensitive data. Albeit it could be done otherwise too.

It just happens that when I had a bit more time, the code-side was in the
flux so I waited a bit, I might have time this summer, I'm not sure, it
depends how much there's on the code-side to work (which files to look at,
@jason-bragg https://github.com/jason-bragg ?). Maybe what I'm trying
to tell too is that this isn't too difficult, but I've got myself mixed to
some busy day jobs and then some startup stuff. :)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dotnet/orleans/issues/634#issuecomment-397860507, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABj7xLMTdxG_5B-inRrvbQietMNWslyAks5t9gZWgaJpZM4FdmiM
.

@normanhh3 It's written in the scripts: everything goes as long as the names and types in queries remain the same. You can add indices, for example, at your leisure. Do you have specific index in mind for Storage?

Cross-referencing due to expected simplification in Orleans side code: https://github.com/dotnet/orleans/pull/6711.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bobanco picture bobanco  ·  3Comments

gabikliot picture gabikliot  ·  4Comments

SebastianStehle picture SebastianStehle  ·  4Comments

luciobemquerer picture luciobemquerer  ·  4Comments

guopenglun picture guopenglun  ·  3Comments