Protobuf: Why should we use arena?

Created on 22 Feb 2018 · 9 Comments · Source: protocolbuffers/protobuf

As far as I know, Arena is not a memory pool that reuses allocated memory by maintaining a free list; it just holds on to more and more memory as messages are created on it. Isn't Google's tcmalloc a better and more straightforward way to improve overall performance? I only want to use protobuf for its network-transmission benefits.

c++ question

little-bird-in-china

Most helpful comment

Right now using arena with open-source protobuf doesn't gain you much, but inside Google we have seen massive improvements from adopting arena. I think protobuf arena has two advantages that tcmalloc can't offer:

1. The ability to deallocate an entire proto message tree in big chunks. With arena, it's possible to allocate everything in a proto message tree in one or several bulk chunks of memory. When you are done with the message, you just need to deallocate these few large chunks. Without arena, deleting a proto message tree results in numerous small delete calls, one for every small object a proto may hold. Basically, with arena we can skip all the destructor calls, which we can't do otherwise.
2. Better locality. With protobuf arena, objects belonging to the same proto message tree are placed in adjacent memory, whereas tcmalloc doesn't know whether an object is part of a proto message tree and is likely to interleave protos with non-protos.

I think most of the benefit we saw came from (1). Unfortunately, this isn't the case with open-source protobuf, because string fields are not allocated in the arena. Internally we have a hack that allocates something that looks like a string in the arena and casts it to a string when accessed, but that isn't portable. We also don't have ctype=STRING_PIECE support in open source, which could help with the issue. I know some users run arena with their own patch that implements ctype=STRING_PIECE. I don't think arena can be widely used until we address the string issue.
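
To make advantage (1) concrete, here is a minimal sketch of a bump-pointer arena. It is not protobuf's implementation: TinyArena, Node and BuildListAndCountBlocks are hypothetical names made up for illustration, and the sketch only accepts trivially destructible types, precisely because destructors are skipped. It shows many small objects being carved out of a few large blocks that are then freed in one pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena, for illustration only.
class TinyArena {
 public:
  explicit TinyArena(std::size_t block_size = 4096) : block_size_(block_size) {}
  TinyArena(const TinyArena&) = delete;
  TinyArena& operator=(const TinyArena&) = delete;

  // Frees every block at once; no per-object delete or destructor runs.
  ~TinyArena() {
    for (void* block : blocks_) std::free(block);
  }

  // Constructs a T inside the current block.
  template <typename T, typename... Args>
  T* Create(Args&&... args) {
    static_assert(std::is_trivially_destructible<T>::value,
                  "the destructor would never be called");
    void* p = Allocate(sizeof(T), alignof(T));
    return new (p) T(std::forward<Args>(args)...);
  }

  std::size_t block_count() const { return blocks_.size(); }

 private:
  void* Allocate(std::size_t size, std::size_t align) {
    // Round the current offset up to the requested alignment.
    std::size_t offset = (used_ + align - 1) / align * align;
    if (blocks_.empty() || offset + size > capacity_) {
      // Current block is full: start a new one (oversized if needed).
      blocks_.reserve(blocks_.size() + 1);  // so push_back below cannot throw
      std::size_t bytes = size > block_size_ ? size : block_size_;
      void* block = std::malloc(bytes);
      if (block == nullptr) throw std::bad_alloc();
      blocks_.push_back(block);
      capacity_ = bytes;
      offset = 0;
    }
    used_ = offset + size;
    return static_cast<char*>(blocks_.back()) + offset;
  }

  std::size_t block_size_;
  std::size_t capacity_ = 0;  // size of the current block
  std::size_t used_ = 0;      // bytes consumed in the current block
  std::vector<void*> blocks_;
};

// Stand-in for the small objects that make up a message tree.
struct Node {
  int value;
  Node* next;
};

// Builds a 1000-node list in the arena and reports how many blocks
// (and therefore how many free() calls) it took.
std::size_t BuildListAndCountBlocks() {
  TinyArena arena;
  Node* head = nullptr;
  for (int i = 0; i < 1000; ++i) head = arena.Create<Node>(Node{i, head});
  return arena.block_count();
}
```

With the default 4096-byte blocks, the 1000 nodes end up in a handful of blocks, so tearing the list down costs a handful of free() calls instead of 1000 deletes, and consecutive nodes sit next to each other in memory, which is the locality point in (2).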

xfxyjwf on 23 Feb 2018

👍6

All 9 comments

@xfxyjwf Thanks for your quick reply. I have two more questions about my scenario, a server that holds tens of thousands of TCP connections kept alive with heartbeat packets:

1. Should I set a threshold for each connection to control overall memory usage, and when the threshold is reached, free all messages by resetting the corresponding arena? Or should I use arena in a different way?
2. Since I only need to hold a few messages per thread in memory, can I detach out-of-date messages and reuse the memory they hold, so that I don't need to request memory from the OS again?

little-bird-in-china on 23 Feb 2018

The common patterns:

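1. With arena: one arena per message. Something like:

{
  // In open-source protobuf the class is google::protobuf::Arena
  // (proto2::Arena is the Google-internal namespace).
  google::protobuf::Arena arena;
  // The arena owns foo: do not wrap it in unique_ptr or delete it.
  Foo* foo = google::protobuf::Arena::CreateMessage<Foo>(&arena);
  foo->ParseFromString(data);
  ... use foo ...
  // arena is destructed here, releasing foo and everything it allocated
}

2. Without arena: reuse proto messages with a free list.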

Foo* foo = free_list_->Pop();
foo->ParseFromString(data);
... use foo ...
free_list_->Push(foo);

(1) works well if the message structure is complex. You can also fine-tune the memory allocation using ArenaOptions. For example, you can provide an initial block, so that if the message fits into this block, no memory allocation or deallocation happens at all. However, as I mentioned, string fields won't be allocated on the arena, so it doesn't help if you have lots of string fields.
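
The initial-block idea can be illustrated without protobuf. The sketch below is an analogy that uses the C++17 standard library's std::pmr::monotonic_buffer_resource in place of protobuf's Arena; in protobuf itself the corresponding ArenaOptions fields are initial_block and initial_block_size. FitsInInitialBlock is a hypothetical helper name, and the 4096-byte buffer is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Returns true if the whole workload fit into the caller-provided initial
// block, i.e. the arena never had to ask for more memory.
bool FitsInInitialBlock(std::size_t element_count) {
  alignas(std::max_align_t) char initial_block[4096];

  // null_memory_resource() as the upstream makes any allocation beyond the
  // initial block throw std::bad_alloc, so falling back to the heap is
  // observable instead of silent.
  std::pmr::monotonic_buffer_resource arena(
      initial_block, sizeof(initial_block), std::pmr::null_memory_resource());

  try {
    std::pmr::vector<int> values(&arena);
    values.reserve(element_count);
    for (std::size_t i = 0; i < element_count; ++i) {
      values.push_back(static_cast<int>(i));
    }
    return true;
  } catch (const std::bad_alloc&) {
    return false;
  }
}
```

For example, FitsInInitialBlock(100) stays inside the stack buffer, while FitsInInitialBlock(10000) needs far more than 4096 bytes and reports false. A real arena would fall back to heap blocks at that point rather than fail.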

(2) is the most common pattern used before we had arena support, and that's probably still true today. Protobuf objects have the property that proto.Clear() doesn't deallocate any memory but instead caches it for reuse. So if you reuse the same proto object, memory allocation is kept to a minimum. Compared to arena, proto.Clear() still has a cost because it needs to traverse the entire message tree, but it's much better than deleting the proto object and is therefore used very widely. This is likely the best pattern for your use case as well. You can use either a global free list or a per-thread free list. In its simplest form you can just reuse one single proto object again and again.

There is one catch: because proto.Clear() doesn't deallocate memory, the memory usage of the reused proto keeps increasing. The reused proto basically allocates enough memory to accommodate every message ever parsed into it. For example, if one message uses repeated field "a" and another message uses repeated field "b", the reused proto will keep the memory for both. The more complex your message structure is, the faster the memory usage grows. For this reason, free-list implementations usually delete an object after a certain number of uses, so that a newly allocated object starts accumulating memory afresh.
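
A minimal sketch of such a free list with retirement follows. All names here (FakeMessage, RecyclingFreeList, Slot, AllocationsForTenParses) are hypothetical; FakeMessage stands in for a generated message whose Clear() keeps its capacity, and the retirement threshold of 4 uses is arbitrary. The class is not thread-safe, so it is meant to be used as a per-thread list or behind a lock.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated message. Like proto.Clear(),
// Clear() drops the contents but keeps the allocated capacity.
struct FakeMessage {
  std::vector<int> values;
  void Clear() { values.clear(); }
};

// Free list that retires an object after max_uses round trips, so its
// cached memory cannot grow without bound.
template <typename T>
class RecyclingFreeList {
 public:
  struct Slot {
    T message;
    int uses = 0;  // how many times this object has been handed back
  };

  explicit RecyclingFreeList(int max_uses) : max_uses_(max_uses) {}

  // Hands out a cached object, or allocates a fresh one if none is cached.
  std::unique_ptr<Slot> Pop() {
    if (free_.empty()) {
      ++allocations_;
      return std::make_unique<Slot>();
    }
    std::unique_ptr<Slot> slot = std::move(free_.back());
    free_.pop_back();
    return slot;
  }

  // Returns an object to the list, or retires it once it is worn out.
  void Push(std::unique_ptr<Slot> slot) {
    if (++slot->uses >= max_uses_) return;  // slot is deleted here
    slot->message.Clear();
    free_.push_back(std::move(slot));
  }

  int allocations() const { return allocations_; }

 private:
  int max_uses_;
  int allocations_ = 0;
  std::vector<std::unique_ptr<Slot>> free_;
};

// Simulates ten request/response cycles and reports how many message
// objects had to be allocated in total.
int AllocationsForTenParses() {
  RecyclingFreeList<FakeMessage> pool(/*max_uses=*/4);
  for (int i = 0; i < 10; ++i) {
    auto slot = pool.Pop();
    slot->message.values.assign(100, i);  // stands in for ParseFromString
    pool.Push(std::move(slot));
  }
  return pool.allocations();
}
```

With a threshold of 4, ten cycles allocate three objects: the first two are each retired after their fourth use and the third is still cached at the end. Picking the threshold is a trade-off between allocation churn (low values) and accumulated memory (high values).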

xfxyjwf on 23 Feb 2018

I think I got it.

little-bird-in-china on 24 Feb 2018

@xfxyjwf

You mention strings not working great in arenas, but what about bytes? Bytes are pseudo-strings, but since they don’t need to be marshaled into some object, my assumption would be that arenas would be excellent for receiving bytes.

This would be especially useful if you wanted to receive these bytes directly into some special block of pinned memory, e.g. cudaMallocHost memory, using ArenaOptions.

Do arenas make sense for FlatBuffers? It seems like this might be the mechanism to do zero copy directly in and out of the memory blocks you reserve for messages.

ryanolson on 12 Aug 2018

@ryanolson In the protobuf C++ API, string fields and bytes fields are both stored as std::string, so the same issue applies: neither of them is stored efficiently in a protobuf arena. That can be solved by open-sourcing the zero-copy support (see https://github.com/google/protobuf/issues/1896), which includes StringPiece (basically std::string_view) support and would allow a string or bytes field to alias memory in the arena directly.
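
The difference between owning and aliasing storage can be seen with the standard library alone. The sketch below does not use protobuf; PointsInto, AliasCheck and CompareStorage are hypothetical names, and the wire buffer is made-up data standing in for a received message. It checks where each object's bytes live: a std::string_view points back into the original buffer, while a std::string built from it holds a copy in its own storage.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>

// True if p points into the bytes covered by buffer. std::less gives a
// well-defined ordering even for pointers into unrelated objects.
bool PointsInto(const char* p, std::string_view buffer) {
  std::less<const char*> before;
  return !before(p, buffer.data()) &&
         before(p, buffer.data() + buffer.size());
}

struct AliasCheck {
  bool view_aliases;    // does the string_view share the buffer's bytes?
  bool string_aliases;  // does the std::string share the buffer's bytes?
};

AliasCheck CompareStorage() {
  // Made-up bytes standing in for a received wire buffer.
  static const char wire[] =
      "....payload: a-field-value-longer-than-any-small-string-buffer....";
  std::string_view buffer(wire, sizeof(wire) - 1);

  std::string_view view = buffer.substr(4, 20);  // no copy, just a window
  std::string owned(view);                       // copies into own storage

  return {PointsInto(view.data(), buffer), PointsInto(owned.data(), buffer)};
}
```

Here view_aliases comes out true and string_aliases false. This is the reason a std::string-backed field cannot simply live inside an arena block, whereas a view-based field type could point at bytes that already sit there.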

xfxyjwf on 12 Aug 2018

@xfxyjwf Hi, I have a question about arena.
protobuf 3.6.1 now supports creating strings in the arena, so regarding your remark "I don't think arena can be widely used until we address the string issue":
can I now use this version to improve performance?

Sorry, my English is bad. Thank you.
Looking forward to your reply.

yuinm on 15 Oct 2018

@ly82882592 No, unfortunately we still do not have a solution for this. We will probably need to introduce a string ctype based on std::string_view to be able to store string data directly on the arena.