Objects deserialized by protocol buffers very often have very similar lifetimes and could benefit strongly from arena allocation, as in the C++ version. (https://developers.google.com/protocol-buffers/docs/reference/arenas)
C# 7.0 introduces a powerful new feature called ref returns and ref locals (https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/ref-returns), which practically makes structs first-class citizens.
Using arena allocation could greatly reduce GC pressure during deserialization (and, combined with some other performance work, could remove the GC from the deserialization path completely).
This feature requires dramatic changes to the generated code: an additional struct has to be generated for every message contract. Structs are required for custom memory management, and we do not want to break compatibility or the APIs of pre-existing code. There should also be some way to convert structs to classes if users decide to keep a struct a bit longer (at least longer than the arena's lifetime).
I have created some early POC structs that implement arena allocation for protocol buffers deserialization, and they seem to do the job. However, as I said, I don't see even a remote possibility of doing this with classes.
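To make the ref returns / ref locals point concrete, here is a minimal sketch (the `Point` struct and `RefDemo` class are hypothetical names, not part of protobuf). A method returns a reference to an array element instead of a copy, and the caller mutates the struct in place through a ref local:

```csharp
using System;

// Hypothetical mutable struct standing in for a generated message.
public struct Point
{
    public int X;
    public int Y;
}

public static class RefDemo
{
    private static readonly Point[] Storage = new Point[4];

    // Ref return: hands out a reference to the array element itself, not a copy.
    public static ref Point At(int index) => ref Storage[index];

    public static void Main()
    {
        // Ref local: 'p' aliases Storage[1]; no struct copy is made.
        ref Point p = ref At(1);
        p.X = 42;
        p.Y = 7;

        // The writes are visible through the array because 'p' was an alias.
        Console.WriteLine($"{At(1).X},{At(1).Y}"); // prints "42,7"
    }
}
```

Without ref returns, `At` would have to return a copy of the struct, and mutations would be lost, which is what made structs awkward for this kind of design before C# 7.0.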
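A minimal sketch of why an arena takes pressure off the GC, assuming a simple "bump" allocator over one pre-allocated array of structs (`StructArena<T>` and `PhoneNumberStruct` are hypothetical names, not the PoC's actual types). After the array is created, handing out slots and clearing the arena allocate nothing on the managed heap:

```csharp
using System;

// Hypothetical struct that generated code might emit for a message.
public struct PhoneNumberStruct
{
    public int Type;
    public long Number;
}

// One pre-allocated array handed out slot by slot ("bump" allocation).
// Clear() recycles every slot at once, so steady-state use allocates nothing.
public sealed class StructArena<T> where T : struct
{
    private readonly T[] _slots;
    private int _next;

    public StructArena(int capacity) => _slots = new T[capacity];

    public int Count => _next;

    // Returns a reference to a fresh, zeroed slot inside the arena's array.
    public ref T Allocate()
    {
        if (_next == _slots.Length)
            throw new InvalidOperationException("Arena is full.");
        ref T slot = ref _slots[_next++];
        slot = default; // reset leftovers from before the last Clear()
        return ref slot;
    }

    // Releases every allocation at once; O(1) because slots are zeroed lazily on reuse.
    public void Clear() => _next = 0;
}

public static class ArenaDemo
{
    public static void Main()
    {
        var arena = new StructArena<PhoneNumberStruct>(capacity: 1024);
        for (int round = 0; round < 3; round++)
        {
            for (int i = 0; i < 100; i++)
            {
                ref PhoneNumberStruct phone = ref arena.Allocate();
                phone.Type = i % 3;
                phone.Number = 5550000 + i;
            }
            Console.WriteLine(arena.Count); // prints 100 each round
            arena.Clear();                  // the whole batch is released at once
        }
    }
}
```

A real design would also need to handle strings, bytes and repeated fields, and a policy for a full arena; this sketch only shows the allocation pattern.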
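One way the struct-to-class conversion could look, as a sketch only: the existing class API stays untouched, and the additional struct gets a method that copies its data into an ordinary heap object. `Person`, `PersonStruct` and `ToClass()` are hypothetical names, not generated protobuf code:

```csharp
using System;

// Existing class API (unchanged for pre-existing code).
public sealed class Person
{
    public string Name { get; set; } = "";
    public int Id { get; set; }
}

// Hypothetical additional struct used only for arena-based deserialization.
public struct PersonStruct
{
    public string Name;
    public int Id;

    // Copies the data into a regular heap object so it can outlive the arena.
    public Person ToClass() => new Person { Name = Name, Id = Id };
}

public static class ConversionDemo
{
    public static void Main()
    {
        var slots = new PersonStruct[1]; // stands in for arena storage
        ref PersonStruct p = ref slots[0];
        p.Name = "Alice";
        p.Id = 1;

        Person kept = p.ToClass();       // survives even if the slot is recycled
        slots[0] = default;              // simulate the arena being cleared
        Console.WriteLine($"{kept.Name} {kept.Id}"); // prints "Alice 1"
    }
}
```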
Are you interested in accepting such a PR?
There is also a decision to be made about priorities: performance (refs everywhere, limited safety checks) versus preventing developers from hurting themselves by freeing the arena and then still using leftover refs.
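To illustrate the "safety" end of that trade-off, here is a sketch under the assumption that the arena hands out generation-stamped handles instead of raw refs (`CheckedArena<T>` and `Handle` are hypothetical). `Clear()` bumps the generation, so a handle kept past `Clear()` fails loudly instead of silently reading recycled memory; the extra check on every access is the performance cost:

```csharp
using System;

public sealed class CheckedArena<T> where T : struct
{
    private readonly T[] _slots;
    private int _next;
    private int _generation;

    // A handle remembers which "generation" of the arena it was allocated in.
    public readonly struct Handle
    {
        internal readonly int Index;
        internal readonly int Generation;
        internal Handle(int index, int generation) { Index = index; Generation = generation; }
    }

    public CheckedArena(int capacity) => _slots = new T[capacity];

    public Handle Allocate()
    {
        if (_next == _slots.Length)
            throw new InvalidOperationException("Arena is full.");
        _slots[_next] = default;
        return new Handle(_next++, _generation);
    }

    // Validated access: throws if the handle predates the last Clear().
    public ref T Get(Handle h)
    {
        if (h.Generation != _generation)
            throw new ObjectDisposedException("CheckedArena", "Handle used after the arena was cleared.");
        return ref _slots[h.Index];
    }

    public void Clear()
    {
        _next = 0;
        _generation++;
    }
}

public static class CheckedDemo
{
    public static void Main()
    {
        var arena = new CheckedArena<int>(16);
        var h = arena.Allocate();
        arena.Get(h) = 10; // fine: the handle is current
        arena.Clear();
        try { Console.WriteLine(arena.Get(h)); }
        catch (ObjectDisposedException) { Console.WriteLine("stale handle rejected"); }
    }
}
```

The "performance" alternative is the raw `ref T` returned directly from the arena, with no check and no protection against use after clearing.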
Hi, I had some time today and created a prototype implementation.
I have implemented a VERY basic prototype using the Address book contract.
From my very trivial "benchmarks":
~100KB message size
~250KB arena size
10000 deserialization repetitions, including arena clearing
.NET Core 2.0
I can observe:
10-15% performance gain: deserialization takes less time, even in a single-threaded test (I would expect a much bigger impact in a multithreaded environment)
0 (!) garbage collections
I was still using a lot of managed memory, so there is plenty of room for optimization. But the current results look quite promising.
Data:
Original PB:
Elapsed: 25824ms GC0=1021 GC1=470 GC2=0 PeakMem=21815296
My PB:
Elapsed: 23422ms GC0=0 GC1=0 GC2=0 PeakMem=15925248
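Numbers in this format can be collected with a small harness like the following sketch. It is hypothetical, not the harness used above: it counts collections with `GC.CollectionCount`, times with `Stopwatch`, and reads peak memory from `Process.PeakWorkingSet64`, which is an assumption since the thread does not say how PeakMem was measured:

```csharp
using System;
using System.Diagnostics;

public static class GcBench
{
    // Runs 'body' N times; reports wall time and collections per generation meanwhile.
    public static string Run(int repetitions, Action body)
    {
        int gc0 = GC.CollectionCount(0), gc1 = GC.CollectionCount(1), gc2 = GC.CollectionCount(2);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++)
            body();
        sw.Stop();
        long peak = Process.GetCurrentProcess().PeakWorkingSet64; // assumed meaning of PeakMem
        return $"Elapsed: {sw.ElapsedMilliseconds}ms " +
               $"GC0={GC.CollectionCount(0) - gc0} GC1={GC.CollectionCount(1) - gc1} " +
               $"GC2={GC.CollectionCount(2) - gc2} PeakMem={peak}";
    }

    public static void Main()
    {
        // Placeholder workloads; the real test deserialized a ~100KB Address book message.
        Console.WriteLine(Run(10000, () => { var garbage = new byte[1024]; GC.KeepAlive(garbage); }));
        var reused = new byte[1024];
        Console.WriteLine(Run(10000, () => Array.Clear(reused, 0, reused.Length)));
    }
}
```

The first workload allocates on every iteration and will typically trigger gen0 collections; the second reuses one buffer, which is the pattern an arena aims for.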
You can preview my prototype implementation here: https://github.com/mkosieradzki/protobuf/tree/arena-allocator
This is just a PoC!!!
@jskeet You might be interested in this one ;).
I'm interested, but I have very little time - protobuf is only a very small part of my work these days, I'm afraid. I don't know when I'll have time to take a close look at this.
Quick update: I have run the same test on a different computer in "Release" mode:
Original PB:
Elapsed: 6307ms GC0=989 GC1=459 GC2=0 PeakMem=21917696
My PB:
Elapsed: 5383ms GC0=0 GC1=0 GC2=0 PeakMem=16003072
With compiler/JIT optimizations enabled, the difference is even bigger.
Update: I am also experimenting with the new language features (and the System.Memory package):
https://github.com/mkosieradzki/protobuf/tree/arena-allocator-spans
However, I am seeing significant performance regressions there (hopefully caused by the wrong compiler version). On the other hand, I have succeeded in implementing a version that could enforce safety (without using unsafe code outside of the arena allocator).
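A sketch of how `Span<T>` allows an arena without unsafe code, assuming every allocation is a slice of one pre-allocated byte array (`SpanArena` is a hypothetical name, not the branch's actual type). Because `Span<T>` is a ref struct, the compiler refuses to store it in class fields, box it, or capture it in lambdas, which limits how long a slice can be held; it does not by itself prevent use after `Reset()`, which still needs a convention or a check:

```csharp
using System;

public sealed class SpanArena
{
    private readonly byte[] _buffer;
    private int _offset;

    public SpanArena(int capacity) => _buffer = new byte[capacity];

    public int Used => _offset;

    // Hands out a zeroed slice of the backing array; no unsafe code involved.
    public Span<byte> Allocate(int length)
    {
        if (length < 0 || length > _buffer.Length - _offset)
            throw new InvalidOperationException("Arena is full.");
        var slice = new Span<byte>(_buffer, _offset, length);
        _offset += length;
        slice.Clear();
        return slice;
    }

    // Makes the whole buffer available again.
    public void Reset() => _offset = 0;
}

public static class SpanArenaDemo
{
    public static void Main()
    {
        var arena = new SpanArena(256);
        Span<byte> a = arena.Allocate(4);
        Span<byte> b = arena.Allocate(8);
        a.Fill(1);
        b.Fill(2);
        Console.WriteLine($"{arena.Used} {a[0]} {b[7]}"); // prints "12 1 2"
        arena.Reset();
        Console.WriteLine(arena.Used); // prints 0
    }
}
```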