`int` is used to express the size of serialized messages. If the size exceeds 4 GiB, the application may allocate a buffer that is too small, or protobuf itself does so in `google::protobuf::MessageLite::SerializeToString`. This leads to a heap buffer overflow, which may be exploitable for code execution in some cases.
It has been suggested that serialization of messages larger than 2 GiB is unsupported. But there is no good way for an application to ensure that the limit is not exceeded accidentally, short of imposing rather draconian limits. To some degree, this is a `gets`-style interface: there is no way to use it that is guaranteed to be safe.
Right now, this is more or less harmless because the message sizes involved are substantial and therefore rare in practice. But this will change over time. My worry is that it will be difficult to fix, because some of the overflowing computations end up in generated `*.pb.cc` files, so the eventual fix will not be a simple library update.
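For illustration, here is a minimal sketch of how the overflow can be triggered on a 64-bit system. The `Blob` message type and the payload size are hypothetical, not something from this issue:

```cpp
#include <string>

#include "blob.pb.h"  // hypothetical: message Blob { bytes data = 1; }

int main() {
  Blob blob;
  // Hypothetical payload: 3 GiB of data, so the serialized size
  // exceeds what a signed 32-bit int can represent.
  blob.mutable_data()->resize(3ULL * 1024 * 1024 * 1024);

  std::string out;
  // In affected versions, the internal size computation overflows int,
  // an undersized buffer can be allocated, and the write overflows the
  // heap. After the fix described later in this thread, the call is
  // expected to fail and return false instead.
  bool ok = blob.SerializeToString(&out);
  return ok ? 0 : 1;
}
```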
The recommendation has been to keep protobuf messages relatively small (several MBs). There is code checking for sizes when parsing:
https://github.com/google/protobuf/blob/master/src/google/protobuf/io/coded_stream.h#L612
If you have a message that large (> 4 GB), you might already be aware of this limit from parsing and have overridden the default limit. That's a chance to reconsider your design to avoid large messages.
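A sketch of what overriding the parse limit looks like, using the same hypothetical `Blob` type as above and a hypothetical helper name:

```cpp
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

#include "blob.pb.h"  // hypothetical message type, as above

// Hypothetical helper: parse a message larger than the default
// total-bytes limit enforced by CodedInputStream.
bool ParseLargeMessage(int fd, Blob* msg) {
  google::protobuf::io::FileInputStream raw(fd);
  google::protobuf::io::CodedInputStream coded(&raw);
  // Raise the limit to 512 MB. Releases contemporary with this thread
  // take a second warning-threshold argument; newer releases take only
  // the limit itself.
  coded.SetTotalBytesLimit(512 * 1024 * 1024, 512 * 1024 * 1024);
  return msg->ParseFromCodedStream(&coded);
}
```

Note that having to raise this limit is itself the signal the comment above describes: a prompt to reconsider whether such large messages are the right design.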
@xfxyjwf, this issue is about serialization, not parsing. There is no default limit for serialization.
Do you mean the data is only serialized but never parsed back? I think serialization and parsing are always paired.
I'm not sure why you are asking this question. There is no serialized data to parse because the process crashes before in-memory serialization completes, due to the heap buffer overflow. In other words, this crash/security problem resides on the sending side, while constructing a protobuf message.
I was thinking of a more realistic case where the message gradually grows to exceed 2G as the software evolves (i.e., you add more and more fields and data to the message). If you start straight out with a message larger than 4G, it will fail loudly, and I don't think anyone ships such software. A more likely case is that you ship software which serializes/parses data close to 2G, and some unusual user input pushes it past the protobuf limit. What I was saying is that in such cases, the developer should already be aware of the limit, because protobuf parsing fails at a much smaller limit than 2G.
Anyway, for this issue, we can improve the documentation to make it explicit that protobuf does not support messages larger than 2G. We can probably also guard against >2G messages with CHECK errors in some cases, but that may not apply to all cases (e.g., >4G). Making protobuf support >4G messages is a non-goal.
@xfxyjwf: What was the resolution on this issue? Is there some limit in more recent versions of protobuf?
@anderius We now have a `MessageLite::ByteSizeLong()` method that returns the size as `size_t` instead of `int`, and serialization is guaranteed to fail when `ByteSizeLong() >= 2G`.
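A sketch of how an application can use this to make the limit an explicit, checkable error rather than relying on serialization to fail internally (the helper name is hypothetical):

```cpp
#include <limits>
#include <string>

#include <google/protobuf/message_lite.h>

// Hypothetical helper: reject oversized messages up front.
bool SerializeIfWithinLimit(const google::protobuf::MessageLite& msg,
                            std::string* out) {
  // ByteSizeLong() returns size_t, so the size computation itself does
  // not overflow the way the old int-returning ByteSize() could.
  const size_t size = msg.ByteSizeLong();
  if (size >= static_cast<size_t>(std::numeric_limits<int>::max())) {
    return false;  // at or above the 2 GiB boundary; refuse to serialize
  }
  return msg.SerializeToString(out);
}
```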
In which release was this issue resolved?
@stephfonder It was first introduced in the v3.4.0 release:
https://github.com/google/protobuf/releases/tag/v3.4.0