After much slacking, I finally got my Cassandra integration module in shape and put it on GitHub. The repo is here, and I would love it if someone would take a look at the source. We have been running a 1.5-compatible version of this source for a few months now, without errors so far, though it hasn't been under a lot of load. I'm hoping to eventually get the source into OrleansContrib.
This is awesome! I think it would be better to put it in OrleansContrib right away. That should make it visible to more people for review and use.
Thanks! So (forgive the noob question, I'm an avid Microsoft fan and only ever used TFS up until now) how does one put a project on OrleansContrib?
No worries. Somebody will create a repo there for you and grant you permissions to it. Paging @richorama and @galvesribeiro, who've done this for many projects already.
Hey @Arshia001, I have made you an 'owner' in the OrleansContrib org. You should now be able to transfer the ownership of your repo over to OrleansContrib (you should see this in the danger zone of your repo settings). You'll still have full control over it.
Ping me an email if you have any problems ([email protected])
It's done. Thank you.
Thank you, @Arshia001!
But now we have 2 implementations of Clustering and Persistence providers for Cassandra?
https://github.com/OrleansContrib/Orleans.Persistence.Cassandra and
https://github.com/OrleansContrib/Orleans.Clustering.Cassandra by @denisivan0v
https://github.com/OrleansContrib/OrleansCassandraUtils
Does it make sense to merge them somehow?
It's really great to have two implementations; it's much better than nothing :)
I took a quick look at the implementation by @Arshia001 and found out that we use different Cassandra driver APIs at the moment.
I'm really looking forward to having full and solid support for Cassandra, and I'll be happy to take the best parts from both implementations. @Arshia001, can we set up a call to discuss this?
@denisivan0v I'm open to any and all discussions. Naturally, I also looked through yours, and here's a quick comparison (I'll use "yours" and "mine" throughout this, because there's no simpler way to distinguish the two. I'll also place my own reasoning within brackets to keep it separate from the comparison itself):
That's about it I think. As for merging the two, I don't really believe it's possible, since we use such radically different approaches. I think the best course of action is to choose one, and expand it to include features from the other.
@denisivan0v any thoughts?
@Arshia001, sorry for the delay, and many thanks for the detailed comparison. Here are a few comments:
Can I ask you to give my providers a try in your environment? You can get them here: https://www.myget.org/feed/Packages/orleans-cassandra.
The project that I'm working on is going to prod in the early fall, so all missing features in providers for Cassandra will be implemented in the near future, and we will be highly focused on performance.
JSON is extremely wasteful in both space and performance. It's best to avoid it where possible.
I don't know why you think ETags are slow. You just need an IF in your query, and I don't think the performance impact is noticeable compared to serialization. Your current implementation defaults to no ETag for all types, which will cause errors later on. IMO you should at least assume all types need ETags by default.
Mapping adds overhead, which is hardly necessary in this case.
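The "IF in your query" approach above refers to Cassandra lightweight transactions. A minimal sketch of an ETag-guarded write with the DataStax C# driver might look like this; the table and column names (`grain_state`, `etag`, etc.) are hypothetical, not taken from either repo:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Cassandra; // DataStax C# driver

static class ETagSketch
{
    // Conditional write: succeeds only if the stored etag still matches the
    // one the caller last read, otherwise signals a concurrent modification.
    public static async Task WriteStateAsync(
        ISession session, string grainType, string grainId,
        byte[] newData, string newETag, string expectedETag)
    {
        var statement = new SimpleStatement(
            @"UPDATE grain_state SET data = ?, etag = ?
              WHERE grain_type = ? AND grain_id = ?
              IF etag = ?", // the IF clause makes this a lightweight transaction
            newData, newETag, grainType, grainId, expectedETag);

        var rs = await session.ExecuteAsync(statement);

        // Cassandra reports the outcome of the condition in an [applied] column.
        if (!rs.First().GetValue<bool>("[applied]"))
            throw new InvalidOperationException(
                "ETag mismatch: state was modified concurrently.");
    }
}
```

The LWT does add a Paxos round per write, which is the cost being debated above; the point is that it is paid per conditional write, not per serialized byte.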
Cross DC deployments don't need the cluster ID field. A cluster can span many datacenters.
I'll give it a try when I have time, but I don't know when that'll be.
JSON seems to be the recommended approach for storing state, since the data will be human-readable. Maybe it's also easier to change the state object's shape when JSON is used?
I remember reading somewhere that the team decided JSON wasn't a suitable option... @sergeybykov can we have your opinion?
I just did a little test. When used to serialize a relatively big grain (containing 1,000,000 records, each with a GUID and a ulong), JSON is ~43% slower and the resulting data is ~172% larger. For JSON serialization, I'm using Newtonsoft.Json. For binary, I'm using a modified version of Bond, which allows me to specify a custom serialization routine for any type. I also used each serializer once as warm-up before measuring the time. The results are as follows:
I also tried deserializing the same data (which is originally a skip list) with these results:
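For reference, the JSON half of a measurement like the one above can be sketched as follows. This is a hypothetical reconstruction, not the actual benchmark: the `Record` type is made up from the description (a GUID and a ulong per entry), the Bond half is elided because it relies on a modified serializer, and the numbers will vary by machine:

```csharp
using System;
using System.Diagnostics;
using Newtonsoft.Json;

class Record
{
    public Guid Id { get; set; }
    public ulong Value { get; set; }
}

static class SerializationBench
{
    static void Main()
    {
        var rng = new Random(42);
        var records = new Record[1_000_000];
        for (int i = 0; i < records.Length; i++)
            records[i] = new Record { Id = Guid.NewGuid(), Value = (ulong)rng.Next() };

        JsonConvert.SerializeObject(records);          // warm-up pass, not timed

        var sw = Stopwatch.StartNew();
        string json = JsonConvert.SerializeObject(records);
        sw.Stop();

        Console.WriteLine($"JSON: {sw.ElapsedMilliseconds} ms, {json.Length:N0} chars");
        // The binary side would time the Bond serializer the same way and
        // compare elapsed time and payload size.
    }
}
```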
Version tolerance is an important feature you should consider, which is a benefit of JSON.
@richorama version tolerance is also a feature of some binary serializers.
@Arshia001 actually, as far as I can tell, the Orleans team now actively discourages using binary serialization and recommends JSON. It's already been mentioned above that this allows people to read that data, but the additional benefit of that is the ability to analyse and maintain (e.g. patch) that data where needed.
To me it looks like the Orleans team recommends something that is versionable.
I think that no matter what is done, a refactoring (changing a class name, namespace, or the like) can make the serialized data backwards-incompatible as far as tooling is concerned, and then a human-written transformation needs to be inserted into the pipeline. This could be arbitrary custom code run to make the transformation succeed. It could work so that the transformation function takes the cluster ID, grain ID, grain type, and data as parameters, and the developer uses that information to perform the transformation while the system still holds data that needs transforming (in-storage transformations could be done as well).
The (de)serialization could also work so that one could plug in arbitrary (de)serializers and select them with arbitrary parameters, as with the transformations. It would be helpful if, as with transformations, one could also change the serialization format.
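The transformation hook described above could be sketched as a delegate plus a small pipeline runner. All the names here are illustrative; nothing like this exists in Orleans or the repos under discussion:

```csharp
using System;
using System.Collections.Generic;

// The pipeline hands the developer enough context (cluster ID, grain ID,
// grain type) plus the raw payload, and gets back the migrated payload.
public delegate byte[] StateTransform(
    string clusterId, string grainId, string grainType, byte[] data);

public static class TransformPipeline
{
    // Apply a chain of transforms in order, e.g. one per schema change,
    // so data written under any old schema can be walked up to the current one.
    public static byte[] Run(
        IEnumerable<StateTransform> transforms,
        string clusterId, string grainId, string grainType, byte[] data)
    {
        foreach (var t in transforms)
            data = t(clusterId, grainId, grainType, data);
        return data;
    }
}
```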
There is some prior work on this in the ADO.NET provider. See for instance https://github.com/dotnet/orleans/blob/master/src/AdoNet/Orleans.Persistence.AdoNet/Storage/Provider/OrleansStorageDefaultJsonDeserializer.cs, which implements a canonical interface to load in a JSON deserializer. This predates the Orleans 2.0 DI system, but the idea was to let the user wrap any (de)serializer and have the system happily use it. The ADO.NET provider currently supports JSON, XML, and binary, each stored in its own respective field type if available (in relational storage there's some extra to be gained from using "native types"). It has the other features mentioned, and even a test for a change of serialization format. A bit crude, but the prior art is there if there's interest in working towards common ground on this. :)
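The "canonical interface" idea can be sketched roughly like this. To be clear, this is NOT the actual ADO.NET provider interface (see the linked file for that), just the shape of the contract being described:

```csharp
using System;

// Illustrative sketch of a pluggable storage (de)serializer contract.
// A provider would select an implementation by its tag and store the tag
// alongside the payload, so the right deserializer can be picked on read.
public interface IStorageSerializerSketch
{
    string Tag { get; }                        // e.g. "json", "xml", "binary"

    // Returns the wire payload: a string for text formats, byte[] for binary.
    object Serialize(object state);

    // Reconstructs the state object from a payload written under this tag.
    object Deserialize(Type stateType, object payload);
}
```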
Again, version tolerant binary is possible and already available. If viewing the stored data is a requirement, one can implement a viewer utility. My storage module already supports custom serializers, and I'm using it together with Bond in production. I can put the code somewhere and we can all have a look once I'm back home.
@Arshia001 It might make sense to come up with a system that looks the same in both. My point wasn't solely about version-tolerant binary, but about plugging in any (de)serializer one thinks is called for, even on a per-grain basis if necessary (I'm thinking of avoiding all sorts of format-transformation overhead).
@veikkoeeva Per-grain storage selection is already supported in Orleans; I don't know how beneficial it would be to also support it at the storage-provider level. As for plugging in serializers, my storage module already does that, and it also supports Orleans' default serializer as a fallback.
Anyway, here's the serializer I was talking about. It's version-tolerant, though a bit of manual work is required. You just assign new IDs to new data members and remove old ones. The rest is handled by Bond. I'm using it in production, and it works really well, but the source is a mess. You've been warned XD
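The manual bookkeeping described above (assigning IDs to new data members, retiring removed ones) looks roughly like this with Bond's C# attributes. The type and its fields are made up for illustration, not taken from the linked source:

```csharp
using Bond;

// Illustrative sketch of Bond's ID-based version tolerance.
[Schema]
class PlayerState
{
    [Id(0)] public string Name { get; set; } = string.Empty;

    // [Id(1)] used to be an `int Score` member; it was removed and its ID is
    // retired (never reused). Old payloads that still carry field 1 are
    // skipped on deserialization, and new payloads simply omit it.

    [Id(2)] public ulong HighScore { get; set; } // added later, with a fresh ID
}
```

Because readers match fields by numeric ID rather than by name or position, unknown fields are skipped and missing fields fall back to their defaults, which is what makes old and new payloads interchangeable.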
@Arshia001 Now that we have https://github.com/OrleansContrib/OrleansCassandraUtils as well as https://github.com/OrleansContrib/Orleans.Clustering.Cassandra and https://github.com/OrleansContrib/Orleans.Persistence.Cassandra in OrleansContrib, should we close this issue? IIRC there was a discussion about merging these three repos into one, but that's a topic for a separate issue I think.
@sergeybykov Yes, I agree. I'm still open to discussions and ready to help with integrating the modules too.
@Arshia001 Thanks for confirming.