Protobuf: [CSharp] Consider implementing arena allocation in C#

Created on 19 Aug 2017 · 4 Comments · Source: protocolbuffers/protobuf

Background

Objects deserialized by protocol buffers very often have very similar lifetimes and can benefit strongly from arena allocation, as in the C++ version (https://developers.google.com/protocol-buffers/docs/reference/arenas).

C# 7.0 introduces a powerful new feature called ref returns and ref locals (https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/ref-returns), which practically makes structs first-class citizens.
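As a minimal sketch of why ref returns matter here: a method can hand out a reference to a struct stored in an array, so the caller mutates the stored value in place instead of working on a copy. The names `Point` and `PointStore` are hypothetical and only serve the illustration; they are not part of protobuf.

```csharp
using System;

// Hypothetical message-like struct, used only for illustration.
public struct Point
{
    public int X;
    public int Y;
}

public sealed class PointStore
{
    private readonly Point[] items;

    public PointStore(int capacity)
    {
        items = new Point[capacity];
    }

    // Ref return: the caller receives a reference to the array slot, not a copy.
    public ref Point At(int index)
    {
        return ref items[index];
    }

    // Ordinary by-value return, for comparison: the caller receives a copy.
    public Point CopyAt(int index)
    {
        return items[index];
    }
}
```

A caller would write `ref Point p = ref store.At(2); p.X = 42;` to update the slot itself, whereas changing the result of `CopyAt(2)` leaves the stored value untouched.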

Benefits

Using arena allocation can strongly decrease pressure on the GC during deserialization (and, together with some other performance work, can remove the GC from the deserialization path completely).

Drawbacks

This feature requires dramatic changes to the generated code: an additional struct has to be generated for every message contract. Structs are required for custom memory management, and we do not want to break compatibility and APIs for pre-existing code. There should also be some way to convert structs to classes in case users decide to keep their data a bit longer (at least longer than the arena lifetime).
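To make the struct-plus-class idea concrete, here is a rough sketch under simplifying assumptions: the arena is just a preallocated managed array used as a bump allocator, and the message has only integer fields. `Person`, `PersonData` and `Arena<T>` are hypothetical names, not the generated code or the POC.

```csharp
using System;

// Hypothetical class-based message, standing in for an existing generated type.
public sealed class Person
{
    public int Id { get; set; }
    public int Age { get; set; }
}

// Hypothetical struct counterpart that would be generated in addition.
public struct PersonData
{
    public int Id;
    public int Age;

    // Copies the data into a heap object that can outlive the arena.
    public Person ToClass()
    {
        return new Person { Id = Id, Age = Age };
    }
}

// Minimal bump allocator over one preallocated array (an assumption of this sketch).
public sealed class Arena<T> where T : struct
{
    private readonly T[] slots;
    private int used;

    public Arena(int capacity)
    {
        slots = new T[capacity];
    }

    public int Used => used;

    // Hands out the next free slot by reference; no per-object heap allocation.
    public ref T Allocate()
    {
        if (used == slots.Length)
            throw new InvalidOperationException("Arena is full.");
        slots[used] = default(T);
        return ref slots[used++];
    }

    // Releases every slot at once; refs handed out earlier must no longer be used.
    public void Reset()
    {
        used = 0;
    }
}
```

Anything that must survive `Reset()` would be copied out with `ToClass()` first; everything else is reclaimed in one step without involving the GC.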

POC

I have created some early POC structs that make arena allocation possible for protocol buffers deserialization, and they seem to do the job. However, as I said, I see not even a remote possibility of doing this with classes.

Open questions

Are you interested in accepting such a PR?

There is also a decision to be made about whether to prefer performance (refs everywhere, limited safety checks) over preventing developers from hurting themselves by freeing an arena while still trying to use some remaining refs.
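One possible way to picture this trade-off, as a sketch only: the arena carries a generation counter that is bumped on every reset, and each handle remembers the generation it was created in. The checked accessor pays for one comparison per access; the unchecked accessor skips it and silently aliases reused memory. `CheckedArena<T>` and `Handle<T>` are hypothetical names, not the POC's API.

```csharp
using System;

public sealed class CheckedArena<T> where T : struct
{
    private readonly T[] slots;
    private int used;
    private int generation;

    public CheckedArena(int capacity)
    {
        slots = new T[capacity];
    }

    public Handle<T> Allocate()
    {
        if (used == slots.Length)
            throw new InvalidOperationException("Arena is full.");
        slots[used] = default(T);
        return new Handle<T>(this, used++, generation);
    }

    // Bumping the generation invalidates every handle created before the reset.
    public void Reset()
    {
        used = 0;
        generation++;
    }

    // Fast path: no validation at all.
    internal ref T UncheckedGet(int index)
    {
        return ref slots[index];
    }

    // Safe path: one extra comparison per access.
    internal ref T CheckedGet(int index, int handleGeneration)
    {
        if (handleGeneration != generation)
            throw new InvalidOperationException("Handle used after arena reset.");
        return ref slots[index];
    }
}

public struct Handle<T> where T : struct
{
    private readonly CheckedArena<T> arena;
    private readonly int index;
    private readonly int generation;

    internal Handle(CheckedArena<T> arena, int index, int generation)
    {
        this.arena = arena;
        this.index = index;
        this.generation = generation;
    }

    // Throws if the arena was reset after this handle was created.
    public ref T Value => ref arena.CheckedGet(index, generation);

    // No check: after a reset this refers to whatever now occupies the slot.
    public ref T UnsafeValue => ref arena.UncheckedGet(index);
}
```

The check catches use after reset at run time rather than at compile time, so it is a middle ground between "refs everywhere" and full safety.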

c# customer issue

All 4 comments

Hi, I had some time today and created a prototype implementation.

I have implemented a VERY basic prototype using the Address Book contract.

From my very trivial "benchmarks":
~100KB message size
~250KB arena size
10000 repetitions (deserialization), including arena clearing
.NET Core 2.0

I can observe:
A 10-15% performance gain (deserialization takes less time), even in a single-threaded test (I would expect a much bigger impact in a multithreaded environment)
0 (!) garbage collections

I was using a lot of managed memory, so there is still a lot of room for optimization. But the current results look quite promising.

Data:
Original PB:
Elapsed: 25824ms GC0=1021 GC1=470 GC2=0 PeakMem=21815296
My PB:
Elapsed: 23422ms GC0=0 GC1=0 GC2=0 PeakMem=15925248
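For readers who want to collect similar figures, the elapsed time and per-generation collection counts can be gathered with `Stopwatch` and `GC.CollectionCount`. This is a sketch of one way to do it, not necessarily how the numbers above were produced; `GcBenchmark` and `RunStats` are hypothetical names, and the PeakMem value is left out because the thread does not say how it was measured.

```csharp
using System;
using System.Diagnostics;

public struct RunStats
{
    public long ElapsedMs;
    public int Gen0;
    public int Gen1;
    public int Gen2;
}

public static class GcBenchmark
{
    public static RunStats Measure(Action body, int repetitions)
    {
        // Start from a settled heap so earlier garbage does not skew the counts.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++)
        {
            body();
        }
        watch.Stop();

        // Report only the collections that happened during the measured loop.
        return new RunStats
        {
            ElapsedMs = watch.ElapsedMilliseconds,
            Gen0 = GC.CollectionCount(0) - gen0,
            Gen1 = GC.CollectionCount(1) - gen1,
            Gen2 = GC.CollectionCount(2) - gen2
        };
    }

    public static string Format(RunStats s)
    {
        return $"Elapsed: {s.ElapsedMs}ms GC0={s.Gen0} GC1={s.Gen1} GC2={s.Gen2}";
    }
}
```

The `body` delegate would wrap one deserialization plus the arena reset, and `repetitions` would be 10000 to match the setup described above.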

You can preview my prototype implementation here: https://github.com/mkosieradzki/protobuf/tree/arena-allocator

This is just a PoC!!!

@jskeet You might be interested in this one ;).

I'm interested, but I have very little time - protobuf is only a very small part of my work these days, I'm afraid. I don't know when I'll have time to take a close look at this.

Quick update: I have run the same test on a different computer in "Release" mode:
Original PB:
Elapsed: 6307ms GC0=989 GC1=459 GC2=0 PeakMem=21917696
My PB:
Elapsed: 5383ms GC0=0 GC1=0 GC2=0 PeakMem=16003072

With compiler/JIT optimizations the difference is even bigger.

Update: I am simultaneously experimenting with the new language features (and the System.Memory package):
https://github.com/mkosieradzki/protobuf/tree/arena-allocator-spans

However, I am getting significant performance regressions there (hopefully just due to the wrong compiler version). On the other hand, I have succeeded in implementing a version that can possibly enforce safety (without using unsafe code outside of the arena allocator).
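As a rough illustration of what a span-based arena can look like (not the code in the branch above), the allocator below hands out `Span<byte>` slices of a single buffer. Indexing a span is bounds-checked, and because `Span<T>` is a stack-only type it cannot be stored in a class field, which limits how far a slice can escape. `ByteArena` is a hypothetical name, and this sketch does not by itself detect use of a slice after `Reset()`.

```csharp
using System;

public sealed class ByteArena
{
    private readonly byte[] buffer;
    private int offset;

    public ByteArena(int size)
    {
        buffer = new byte[size];
    }

    // Returns a zeroed, bounds-checked window into the shared buffer.
    public Span<byte> Allocate(int length)
    {
        if (length < 0 || length > buffer.Length - offset)
            throw new InvalidOperationException("Not enough space left in the arena.");
        Span<byte> slice = new Span<byte>(buffer, offset, length);
        offset += length;
        slice.Clear();
        return slice;
    }

    // Makes the whole buffer available again; existing slices still point into it.
    public void Reset()
    {
        offset = 0;
    }
}
```

No `unsafe` code is needed on the caller's side: a write such as `slice[slice.Length] = 1` throws instead of corrupting a neighbouring allocation.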
