Here's a very simple proto file:
$ cat test.proto
syntax = "proto3";
message Message {
  uint64 foo1 = 1;
  uint32 foo2 = 2;
  float foo3 = 3;
  double foo4 = 4;
  bool foo5 = 5;
  int64 foo6 = 6;
  int32 foo7 = 7;
}
and I created a protobuf message in Python as follows:
In [1]: import test_pb2
In [2]: message = test_pb2.Message()
In [3]: message.foo1 = 11
In [4]: message.foo2 = 22
In [5]: message.foo3 = 33
In [6]: message.foo4 = 44
In [7]: message.foo5 = True
In [8]: message.foo6 = 66
In [9]: message.foo7 = 77
In [11]: message
Out[11]:
foo1: 11
foo2: 22
foo3: 33
foo4: 44
foo5: true
foo6: 66
foo7: 77
In [12]: from google.protobuf.json_format import MessageToJson
In [13]: print(MessageToJson(message))
{
  "foo6": "66",
  "foo4": 44,
  "foo7": 77,
  "foo5": true,
  "foo2": 22,
  "foo1": "11",
  "foo3": 33
}
Note that foo6 and foo1 should not be quoted, since they are integers, not strings. The protobuf_to_dict module does not have this problem:
In [15]: from protobuf_to_dict import protobuf_to_dict
In [16]: protobuf_to_dict(message)
Out[16]:
{'foo1': 11,
'foo2': 22,
'foo3': 33.0,
'foo4': 44.0,
'foo5': True,
'foo6': 66,
'foo7': 77}
In [17]: import json
In [18]: json.dumps(protobuf_to_dict(message))
Out[18]: '{"foo6": 66, "foo4": 44.0, "foo7": 77, "foo5": true, "foo2": 22, "foo1": 11, "foo3": 33.0}'
My protoc version is 3.2.0 and my Python version is 3.5.2:
$ protoc --version
libprotoc 3.2.0
$ python --version
Python 3.5.2 :: Anaconda custom (x86_64)
As per the proto3 JSON spec, uint64/int64 fields should be printed as decimal strings. See:
https://developers.google.com/protocol-buffers/docs/proto3#json
The reason is that JSON has no 64-bit integer type, and many JSON libraries parse every number as a double-precision float, which represents integers exactly only up to 2^53. To prevent precision loss, our proto3 JSON spec requires serializers to put int64/uint64 values in strings.
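To make the precision argument concrete, here is a minimal sketch that uses only the standard library. It assumes a consumer that stores every JSON number as a double, simulated here with `parse_int=float`. The variable names are illustrative and not part of protobuf.

```python
import json

# 2**53 + 1 fits in a uint64/int64 but has no exact double representation.
big = 2**53 + 1

# Simulate a JSON parser that stores every number as a double.
as_double = json.loads(json.dumps(big), parse_int=float)
lost = int(as_double) != big  # the bare number comes back changed

# The same value sent as a decimal string survives that parser unchanged.
quoted = json.loads(json.dumps(str(big)), parse_int=float)
recovered = int(quoted)

print(big, int(as_double), recovered)
```

Python's own json module keeps arbitrary-precision integers by default, so the loss only appears once a double-based parser is involved.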
I wanted to ask the same question about MessageToDict here before creating another issue.
I accept that MessageToJson converts int64 and uint64 to strings, even though it is hard to understand why an integer value should be turned into a string when the JSON spec puts no limit on the size of an integer.
Regardless of this _unexpected behavior_, I assume from the name alone that MessageToDict has nothing to do with the JSON spec, and that all it is expected to do is take a message object and return a _Python dictionary_ without touching its data types. Unfortunately, its behavior is the same as MessageToJson's, which makes these utility functions impossible to use for those of us who deal with int64 data types.
The question is: should MessageToDict return these values in the Python dictionary without changing their data types?
Isn't MessageToDict part of the json_format package? It should honor the same proto3 JSON spec as MessageToJson.
It's all nice that the json_format package follows the spec, but it's laughable that I can't do this:
m = test_pb2.Message.FromString(serialized_proto_bytes)
d = MessageToDict(m)
copy = test_pb2.Message(**d)
because of type inconsistency.
I know I'm using the wrong package for this, but if you went so far as to produce a Python dict that is JSON-spec compatible, and you allow creating message instances from unpacked nested dict structures, then why do we have to use another Python package to add the missing link for proper dict serialization?
I'll add my vote to @WloHu's point that the code fragment above should work (i.e. the process should be reversible), even if we have to specify some optional argument like:
d = MessageToDict(m, preserve_int64_as_int=True)
Is something like that open for consideration, or is there an existing recommended solution?
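Until such an option exists, one possible workaround is to post-process the dict and turn the quoted 64-bit fields back into ints. The sketch below is only an illustration: `restore_int64` and `_to_int` are hypothetical helpers, not part of protobuf, and the set of int64/uint64 field names has to be supplied by hand so that genuine string fields containing digits are left alone. The input is a plain dict shaped like the MessageToJson output shown earlier in this thread. If the goal is only to rebuild a message, json_format.ParseDict should, as far as I know, accept the quoted form directly.

```python
def _to_int(item):
    # A repeated int64 field would arrive as a list of decimal strings.
    if isinstance(item, list):
        return [int(x) for x in item]
    return int(item)


def restore_int64(value, int64_fields):
    """Return a copy of value with the named fields converted from str to int."""
    if isinstance(value, dict):
        return {
            key: (_to_int(item) if key in int64_fields
                  else restore_int64(item, int64_fields))
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [restore_int64(item, int64_fields) for item in value]
    return value


# Plain dict with the 64-bit fields quoted, as in the JSON output above.
as_dict = {"foo1": "11", "foo2": 22, "foo3": 33.0, "foo4": 44.0,
           "foo5": True, "foo6": "66", "foo7": 77}
fixed = restore_int64(as_dict, {"foo1", "foo6"})
print(fixed)
```

The helper matches on field names only, so a nested message that reuses one of those names for a string field would need a descriptor-aware version instead.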