Protobuf: MessageToJson outputs the wrong type for uint64 and int64 in Python

Created on 7 Apr 2017 · 5 Comments · Source: protocolbuffers/protobuf

Here's a very simple proto file:

$ cat test.proto
syntax = "proto3";

message Message {
    uint64 foo1 = 1;
    uint32 foo2 = 2;
    float  foo3 = 3;
    double foo4 = 4;
    bool   foo5 = 5;
    int64  foo6 = 6;
    int32  foo7 = 7;
}

and I created a protobuf message in Python as follows:

In [1]: import test_pb2
In [2]: message = test_pb2.Message()

In [3]: message.foo1 = 11

In [4]: message.foo2 = 22

In [5]: message.foo3 = 33

In [6]: message.foo4 = 44

In [7]: message.foo5 = True

In [8]: message.foo6 = 66

In [9]: message.foo7 = 77

In [11]: message
Out[11]:
foo1: 11
foo2: 22
foo3: 33
foo4: 44
foo5: true
foo6: 66
foo7: 77

In [12]: from google.protobuf.json_format import MessageToJson

In [13]: print(MessageToJson(message))
{
  "foo6": "66",
  "foo4": 44,
  "foo7": 77,
  "foo5": true,
  "foo2": 22,
  "foo1": "11",
  "foo3": 33
}

Note that foo6 and foo1 should not be quoted, since they are integers and not strings. The protobuf_to_dict module does not have this problem:

In [15]: from protobuf_to_dict import protobuf_to_dict

In [16]: protobuf_to_dict(message)
Out[16]:
{'foo1': 11,
 'foo2': 22,
 'foo3': 33.0,
 'foo4': 44.0,
 'foo5': True,
 'foo6': 66,
 'foo7': 77}

In [17]: import json

In [18]: json.dumps(protobuf_to_dict(message))
Out[18]: '{"foo6": 66, "foo4": 44.0, "foo7": 77, "foo5": true, "foo2": 22, "foo1": 11, "foo3": 33.0}'

My protoc version is 3.2.0 and my Python version is 3.5.2:

$ protoc --version
libprotoc 3.2.0

$ python --version
Python 3.5.2 :: Anaconda custom (x86_64)

mortada

Most helpful comment

It's all nice that json_format package follows the spec but it's laughable that I can't do this:

m = test_pb2.Message.FromString(serialized_proto_bytes)
d = MessageToDict(m)
copy = test_pb2.Message(**d)

because of type inconsistency.
I know I'm using the wrong package for this. But if you went as far as producing a Python dict that is compatible with the JSON spec, and you allow message instances to be created from unpacked nested dict structures, then why do we have to use another Python package to add the missing link for proper dict serialization?

WloHu on 14 Mar 2019

👍10

All 5 comments

As per proto3 JSON spec, uint64/int64 fields should be printed as decimal strings. See:
https://developers.google.com/protocol-buffers/docs/proto3#json

The reason is that uint64/int64 is not part of the JSON spec, and many JSON libraries only support double precision. To prevent precision loss, our proto3 JSON spec requires serializers to put int64/uint64 values in strings.
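
The precision concern can be checked without protobuf at all. The sketch below is not from the thread. It assumes a JSON consumer that stores every number as an IEEE 754 double (as JavaScript does) and emulates that consumer with Python's `float` via the `parse_int` hook of `json.loads`.

```python
import json

# Doubles represent every integer exactly only up to 2**53.
value = 2**53 + 1  # 9007199254740993, well within int64/uint64 range

# Python's own json module keeps integers exact, so nothing is lost here.
exact = json.loads(json.dumps({"foo1": value}))["foo1"]

# Emulate a parser that stores every JSON number as a double.
as_double = json.loads(json.dumps({"foo1": value}), parse_int=float)["foo1"]

# Encoding the value as a decimal string, as the proto3 JSON mapping does,
# survives the round trip regardless of how the parser treats numbers.
as_string = int(json.loads(json.dumps({"foo1": str(value)}))["foo1"])

print(exact == value)           # True
print(int(as_double) == value)  # False: the double rounds to 2**53
print(as_string == value)       # True
```

Small values such as 11 and 66 would survive either way; the string encoding is applied to every 64-bit field because the serializer cannot know which values are safe.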

xfxyjwf on 8 Apr 2017

I wanted to ask the same question about MessageToDict here before creating another issue.

I accept that MessageToJson converts int64 and uint64 into strings for some reason, even though it is hard to understand why a long/integer value (with no precision to lose?) should be translated into a string when the spec places no limit on a JSON integer.

Regardless of this _unexpected behavior_, I assume from the name alone that MessageToDict has nothing to do with the JSON spec, and that all it is expected to do is take a message object and give back a _Python dictionary_ without touching its data types. Unfortunately, its behavior is the same as MessageToJson's, which makes these utility functions impossible to use for those of us who deal with int64 data types.

The question is: should MessageToDict return these values in a Python dictionary without touching their data types?

kirpit on 20 Nov 2017

👍7

Isn't MessageToDict part of the json_format package? It should honor the same proto3 JSON spec as MessageToJson.

xfxyjwf on 20 Nov 2017

I'll add my vote to @WloHu's point that the code fragment above should work (i.e. the process should be reversible), even if we have to specify some optional argument like:
d = MessageToDict(m, preserve_int64_as_int=True)
Is something like that open for consideration, or is there an existing recommended solution for this?
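
One possible workaround, sketched here as an assumption rather than anything `json_format` provides: post-process the dict and convert the decimal strings of known 64-bit fields back to `int`. The helper name `restore_int64` and its `int64_fields` argument are hypothetical. The input dict is copied from the `MessageToJson` output at the top of the issue, so the sketch runs without the generated `test_pb2` module.

```python
def restore_int64(value, int64_fields):
    """Return a copy of `value` with the named fields converted from str to int.

    Hypothetical helper, not part of protobuf. `int64_fields` is a set of key
    names known to hold int64/uint64 values. Nested dicts and lists are walked.
    """
    if isinstance(value, dict):
        result = {}
        for key, item in value.items():
            if key in int64_fields and isinstance(item, str):
                result[key] = int(item)  # e.g. "66" -> 66
            elif key in int64_fields and isinstance(item, list):
                # Repeated 64-bit field: a list of decimal strings.
                result[key] = [int(x) if isinstance(x, str) else x for x in item]
            else:
                result[key] = restore_int64(item, int64_fields)
        return result
    if isinstance(value, list):
        return [restore_int64(item, int64_fields) for item in value]
    return value


# Same keys and values as the MessageToJson output shown in the issue.
d = {"foo6": "66", "foo4": 44, "foo7": 77, "foo5": True,
     "foo2": 22, "foo1": "11", "foo3": 33}

fixed = restore_int64(d, {"foo1", "foo6"})
print(fixed["foo1"], fixed["foo6"])
```

The field names have to be supplied by the caller here; deriving them from the message descriptor would be a natural extension. If the goal is only to rebuild a message, `json_format.ParseDict` is the counterpart of `MessageToDict` and should accept the string-encoded values directly, which avoids `Message(**d)` altogether.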