Protobuf: add SetStringByArray(..., const char *, size_t) method in Reflection

Created on 18 Jul 2019  路  7Comments  路  Source: protocolbuffers/protobuf

What language does this apply to?
cxx and other binding languages such as python's cpp_implenentation

Describe the problem you are trying to solve.
the Reflection class in message.h only has SetString(..., const std::string &value)interface. in some cases, we just have raw buffer (char * and size), to call the SetString method, we have to construct/memcpy/deconstruct a std::string. But in some time-sensitive case, this consumes much, especially when the the bytes is large enough (eg. 8MB), which will cause large amount of page-faults.

Describe the solution you'd like
Add SetStringByArray(..., const char *, size_t ) interface, and call it when necessorily.

Describe alternatives you've considered
even in protobuf, construct 8MB bytes(std::string) will cause large amount of page-fault, this will slow down much. is it possible to setup an allocator for the protobuf. I did not seen such method.

Additional context
Add any other context or screenshots about the feature request here.

Most helpful comment

If the string is passed by value or by rvalue reference then we can eliminate one of the two copies. The user can create the string and then it can be moved directly into place.

All 7 comments

Unfortunately I don't think there's any easy way to do this. Internally messages store string and bytes fields as std::string, so ultimately we have to construct a real std::string. However, we could potentially add an overload that takes a std::string&& and then we could at least move the string into place without creating another copy.

@acozzette
I'm sorry, I did not describe it clearly.
I uses the contruct/memcpy/deconstruct did not mean the protobuf's internal data storage but the extra std::string, which must be create to adapt to the interface.
like the following code, could we remove the temp std::string's construct/memcpy/deconstruct

bool foo(const char *buf, size_t size,...) {
  ...
  std::string temp(buf, size); // actually, we don't need to create a temporary std::string here, if we have 
                               // some method like SetStringByArray(..., buf, size); 
  reflection->SetString(...., temp);
  ...
}

the original case comes from the following code, and I measured it in a 8MB bytes case. the value_string's construct and memcpy take about 3ms( actually due to the page fault).
https://github.com/protocolbuffers/protobuf/blob/a03d332aca5d33c5d4b2cd25037c9e37d57eff02/python/google/protobuf/pyext/message.cc#L787

If the string is passed by value or by rvalue reference then we can eliminate one of the two copies. The user can create the string and then it can be moved directly into place.

Adding an overloaded SetString(..., std::string&& value) which utilizes move semantics will avoid using the copy constructor and the std::string object will be moved, not copied which is less costly and will improve performance significantly

copy that. when will this feature be added to the master?

I'm afraid I don't have time to work on it, but if you want to try implementing it yourself then feel free to send me a pull request.

solution:

  1. change the Reflection's prototype from SetString(..., const std::string&) to SetString(..., std::string).
  2. if the caller do not use the created std::string s any more , using SetString(..., std::move(s))
  3. if the caller would like to keep the created std::string s, using SetString(..., s)
  4. in the SetString, becuase we get the moved or copied string, we just move it forward
Was this page helpful?
0 / 5 - 0 ratings