Protobuf: C# ByteString, CodedInputStream performance improvements desired for messages with large byte array field

Created on 21 Jan 2018 · 17Comments · Source: protocolbuffers/protobuf

Currently if you have a message with an array of bytes field that translates into a ByteString immutable object.
ByteString can't be created without incurring a GC memory allocation and an array copy.
When you deserialize a ByteString using CodedInputStream, the same penalty is paid.
Furthermore, when you then go to consume the bytes you received you pay this same penalty yet again because of ByteString.ToByteArray().

In my application these penalties are adding up to excessive GC pressure and memory copies which is reducing my message throughput substantially. This leaves me some unatractive options for my .NET C# application, either use C++ protobuffs communication layer or send my large byte array messages natively without using protobuffs.

It would be really nice if you allowed an option to have more fine grained control over the "bytes" field such that the code generator would spit out just a native byte[] type field or a new class (i.e. UnsafeByteString) which bypasses the above mentioned penalties and user takes on full responsibility for the fact that it is a mutable unsafe byte array.
CodedInputStream also needs to provide an option for user to inject his own byte array allocator so that user application can have fine grained control over byte array memory allocations to minimize GC penalties.

I think this is really important to allow for truly high performance C# protobuf apps.

Source

d79ima

👍19

Most helpful comment

I wrote a PR that allows a ByteString to be created without copying the data. It matches what is available in Java - https://github.com/protocolbuffers/protobuf/pull/7645

No progress on merging it right now.

JamesNK on 6 Oct 2020

👍2

All 17 comments

Somewhat related gRPC issue: https://github.com/grpc/grpc/issues/14378

jtattermusch on 26 Feb 2018

I would want to have support for unsafe ByteString that would support setting it's length as well. When sending chunked data, last chunk is usually smaller.

@d79ima are you are aware that if you allocate 85000 bytes or more in .NET Framework allocation goes directly to LOH (Large Object Heap) and skips GC generations? It will add lot of GC pressure and LOH can only be compacted manually. So if you want to send binary data it should done in less than 85000 byte chunks.

wanton7 on 23 May 2018

I'm seconding this wish for a garbage free api.

dpsenner on 16 Feb 2019

+1, we are observing the same problem. How do we move forward on this?

smflorentino on 17 May 2019

https://github.com/protocolbuffers/protobuf/pull/5888

jtattermusch on 18 May 2019

A simple enhancement that would address most of this issue is if the C# gpc library had a way to pool no longer needed ByteString (like Go's sync.Pool) and so then could be internally reused. This would prevent the crazy firehose of ByteStrings constantly being created that the GC has to chase.

If I were implementing it, I'd make 4-8 size ByteStrings pools/buckets internally so the internal faux-allocator chooses from a pool such that waste isn't excessive. And there could be a manual polling call to drain them or a periodic task that slowly drains 1-5% of each bucket every 10-60 secs. That way no timestamps/tracking is needed, it covers the lion's share of cases, and there are no freak/worst-case scenarios. That extension could be added and tested in a couple days by one person. I'd volunteer to do if someone here told me there was a good chance of the enhancement being picked up.

Please help us Protobuf heroes! We absolutely love protobufs and we're astonished they're not used more. I do think that will change as interoperability is recognized as more important.

Drew

drew-512 on 26 Jun 2019

I believe the way forward is implementing https://github.com/protocolbuffers/protobuf/pull/5888 (which allows parsing messages directory from a ReadOnlySequence).

For avoiding excessive allocations in ByteString, that's a separate problem and we'd need to come up with a separate design. The main problem of using array pools with ByteString is that ByteString wasn't designed to be disposable, so there's no way of expressing that the rented array can be returned to the pool.

jtattermusch on 28 Jun 2019

👍1

Hi!

Maybe I'm not following correctly, but my vision was that a gpb consumer would explicitly drop off ByteStrings that are known to be no longer used (Return() in C# parlance). For example, we are processing a firehose of messages all the time loaded w/ ByteStrings, but after we unmarshal them, we would return those (so grpc/gpb could reuse them). Sure, that reuse will waste some here and there (as an alloc request needs 800 and the bucket it draws from the 1024 bucket), but w/ that bucket approach, most waste is addressed and the GC workload would vanish. It's a totally worthy trade imo to drop the GC workload in exchange for ByteStrings wasting part of their allocation. Basically, if we all agree a ByteString is basically a temp/scrap object, then lets treat them that way and assume ppl will extract their data and Return them to the pool if they care about performance.

Sure, I love the idea of 5888, but it seems to be a larger bolder ask of development, whereas adding a ByteString Return() call that funnels out into 6-12 ArrayPool buckets seems doable, no-risk, and poses no burden on people who don't use the enhancement.

Hopefully I'm making sense or maybe I'm missing what you're saying.

drew-512 on 28 Jun 2019

Would something similar to Stream.Read(byte[], int, int) work?

https://docs.microsoft.com/en-us/dotnet/api/system.io.stream.read

dpsenner on 29 Jun 2019

I'm having trouble imagining some API additions for ByteString that would make it possible to manage them more efficiently:

the most efficient future way of parsing protobufs is going to be from "ReadOnlySequence" (as mentioned above), but ReadOnlySequence is an ephemeral data source and in many implementations, it will become invalid once parsing the message is done (so we cannot remember the "reference" for the input data and make ByteString read from there).
currently ByteString is not disposable, so renting an array from a pool and returning it back once not needed is problematic, because it's not clear when to return the rented array from the pool and also messages are recursive and they don't have a concept of disposing (and traversing messages to dispose all the byteStrings seems like a very clumsy approach).

Because ReadOnlySequence is ephemeral, the only approach I could see is somehow making ByteString parsing customizable (an advanced API) and optionally make ByteString only remember a Slice of the input ReadOnlySequence (which will become invalid as soon as the input ROS becomes invalid). After that, the user would need to make sure to extract the data from byteString themselves before the input ROS is invalidated. This could be useful in some high-performance scenarios, but it's sound like an api that's very specialized and hard to use, so I'm not a fan.

jtattermusch on 8 Jul 2019

I think there are two scenarios for creating ByteString:

Deserialization via CodedInputStream/CodedInputReader. The ByteString is being created from a byte[]/ReadOnlySequence<byte> buffer. Because the buffer is owned by someone else, e.g. web server, and it is likely to be reused, we have to create a copy of the data inside the ByteString. The best way to avoid allocations here is using an array pool, but there is no way to track message lifetime. A possible way to solve this problem is to provide an array lifetime management API to the CodedInputXXX. It will handle providing arrays used by ByteStrings it creates out of a pool. Those arrays are added to a list. At the end of its lifetime scope, e.g. a web request, the lifetime management API will release all the arrays in the list.
Manually creating the ByteString for serialization via CodedOutputStream/CodedOutputWriter. In this situation the developer owns the byte array, and can choose how to create the ByteString. To avoid allocations/copies I think there should be a new way to create ByteString: ByteString.FromMemory(ReadOnlyMemory<byte> bytes). This will simply set an internal ROM<byte> field. There are existing internal methods like this on ByteString, although they are internal:

https://github.com/protocolbuffers/protobuf/blob/6263268b8c1b78a8a9b65acd6f5dd5c04dd9b0e1/csharp/src/Google.Protobuf/ByteString.cs#L57-L78

To emphasis its unsafe nature, the name could alternatively be ByteString.Unsafe.FromMemory(ReadOnlyMemory<byte> bytes).

Adding this new way of creation would sacrifice the current safety ByteString has of it always being immutable. I don't know if this is important or not. If it is, perhaps ByteString could have a flag on it to identify whether it is an "unsafe" or not?

#1 complex to do.
#2 is pretty simple.

@davidfowl @JunTaoLuo @shirhatti

JamesNK on 19 Dec 2019

👍1

It's it possible to codegen a different type besides ByteString for the unsafe "optimized" variant?

davidfowl on 19 Dec 2019

Is there any progress / workaround for this issue?

I belive that, as @JamesNK mentioned, new public Unsafe method should solve this problem.

cc @jtattermusch

Ursanon on 6 Oct 2020

I wrote a PR that allows a ByteString to be created without copying the data. It matches what is available in Java - https://github.com/protocolbuffers/protobuf/pull/7645

No progress on merging it right now.

JamesNK on 6 Oct 2020

👍2

When will the change arrive? For gRPC messages with large payloads (read images) a GC friendly design is a must.