protobuf 🚀 - Missing value/null support for scalar value types in proto 3

While proto3 mentions using proto2 Message Types

It's possible to import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used in proto3 syntax.

It might not be good application design to expect the .proto files and binary format to stay the same after switching to proto3.

Even though it appears that using wrappers.proto is the way proto3 addresses nullable fields, and the C#/.NET library has convenient nullable formatting for them, they will typically end up consuming two extra bytes per value (a tag and a length prefix) because of the additional message layer.
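As a rough check on the overhead of the extra message layer, here is a minimal sketch that hand-encodes both forms in plain Python. It does not use the protobuf library; the helper names (`encode_varint`, `plain_int32`, `wrapped_int32`) are made up for illustration, and only non-negative values are handled.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """Field key: (field_number << 3) | wire_type, varint-encoded."""
    return encode_varint((field_number << 3) | wire_type)


def plain_int32(field_number: int, value: int) -> bytes:
    """proto3 scalar without presence: the default (0) is not serialized."""
    if value == 0:
        return b""
    return tag(field_number, 0) + encode_varint(value)


def wrapped_int32(field_number: int, value) -> bytes:
    """Field of type google.protobuf.Int32Value; None means 'wrapper absent'."""
    if value is None:
        return b""
    inner = plain_int32(1, value)  # Int32Value has a single field, value = 1
    return tag(field_number, 2) + encode_varint(len(inner)) + inner


plain = plain_int32(1, 5)      # tag + value
wrapped = wrapped_int32(1, 5)  # tag + length + inner tag + value
print(plain.hex(), wrapped.hex())
```

By this count the wrapper costs two extra bytes for a small non-zero value (outer tag plus length), and a present wrapper holding 0 costs two bytes where the plain proto3 scalar costs none.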

ngbrown on 27 May 2016

@lostindark, workaround (2) using an oneof is a wire compatible change.

xfxyjwf on 27 May 2016

@lostindark, workaround (2) using an oneof is a wire compatible change.

The issue is that it doesn't produce nice code, and it needs boxing/unboxing when used (which hurts performance).

lostindark on 3 Jun 2016

@lostindark what if we optimize that case (one primitive field in an oneof) to eliminate the box/unbox cost?

xfxyjwf on 3 Jun 2016

@xfxyjwf That seems a good solution with minimal change required. However, I do wish I didn't need to name the oneof, as below:
message Test1 { oneof { int32 a = 1; } }

lostindark on 4 Jun 2016

I am thinking of a dedicated syntax for that. Something like:

message Foo {
  nullable int32 value = 1;
}

which is basically syntactic sugar for:

message Foo {
  oneof value_oneof {
    int32 value = 1;
  }
}

I'll bring this up with the team and see how others think about this.

xfxyjwf on 4 Jun 2016

👍61

@xfxyjwf

message Foo
{
int x = 1;
oneof v1 { int32 value = 1;}
oneof v2 { string value = 1;}
}

gets error on v2: "value" is already defined in "Foo".
Is it possible to have several independent oneofs in one message?
Moreover it's not clear to me which tag is assigned for v1 and v2 fields?

skorokhod on 8 Jun 2016

@skorokhod

Oneof fields are just like regular fields and are in the parent message's naming scope. You need to define your message as:

message Foo {
  int32 x = 1;
  oneof v1 { int32 value1 = 2; }
  oneof v2 { int32 value2 = 3; }
}

All these fields need different names and field numbers.
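To make the scoping rule concrete, here is a small sketch (not part of protoc; `find_conflicts` is a hypothetical helper) that flattens a message's fields, including those declared inside oneofs, and reports reused names and field numbers.

```python
from collections import Counter


def find_conflicts(fields):
    """fields: (name, number) pairs for every field in the message,
    including fields declared inside oneofs, since they share one scope."""
    names = Counter(name for name, _ in fields)
    numbers = Counter(number for _, number in fields)
    dup_names = sorted(n for n, count in names.items() if count > 1)
    dup_numbers = sorted(n for n, count in numbers.items() if count > 1)
    return dup_names, dup_numbers


# The message from the question: x = 1, v1 { value = 1 }, v2 { value = 1 }
broken = [("x", 1), ("value", 1), ("value", 1)]
# The corrected message: distinct names and distinct numbers
fixed = [("x", 1), ("value1", 2), ("value2", 3)]

print(find_conflicts(broken))
print(find_conflicts(fixed))
```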

xfxyjwf on 8 Jun 2016

👍2

The C#/.NET library already optimizes for the wrappers.proto case and doesn't box/unbox those nullable values. Looping in @jskeet to ask if it can/should optimize both cases?

Even in my own application which used protobuf2, I relied on the HasXXX functions for field presence. This was used to broadcast updates, only filling in the fields that had changed. Not present didn't mean that it was the default value, but that it hadn't changed from the previous value.

I think though, this was probably viewed as a mistake in the original design/spec since it conflicts with the eliding of fields with default values. Going forward, I don't see many realistic ways to say that v2 and v3 are binary compatible for applications/messages that relied on this behavior.

ngbrown on 8 Jun 2016

👍4

@ngbrown: I don't want to start special-casing the situation where there's a oneof with only a single field in it, no. That feels like complexity for little benefit. The way proto3 has been designed to work is that if you want a primitive field with presence, you use the wrapper type. I wouldn't want to subvert that.

jskeet on 8 Jun 2016

In struggling (#1655) with this 'bug' I realized that I don't understand the rationale behind having default values at all when you can't set the default value or make the field required, i.e., trust that you should use the default value if the field isn't set on the wire. Why not do away with defaults altogether? If a field is set, put it on the wire even if it is a zero-length string or a zero. Otherwise one can assume it is null.

This problem is killing me because I have a situation where a int32 field may or may not be set and one of the valid data values is zero. As it stands I have to use oneof or do something hackish like +1 to all values zero or greater before serialization and -1 after parsing the message.
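The "+1" hack mentioned above could look roughly like this. It is only a sketch of the idea as described (shift values zero or greater by one so that the proto3 default 0 can stand for "not set"); `to_wire` and `from_wire` are hypothetical names.

```python
def to_wire(value):
    """Map an optional int to the int32 actually stored in the message.

    None  -> 0 (the proto3 default, which is not serialized)
    v >= 0 -> v + 1
    v < 0  -> v (unchanged)
    """
    if value is None:
        return 0
    return value + 1 if value >= 0 else value


def from_wire(raw):
    """Inverse of to_wire: 0 means the field was never set."""
    if raw == 0:
        return None
    return raw - 1 if raw > 0 else raw


for original in (None, 0, 41, -3):
    print(original, "->", to_wire(original), "->", from_wire(to_wire(original)))
```

The obvious cost is that every reader and writer has to know about the offset, which is why it feels hackish.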

In the current incarnation enums are particularly problematic. Given that "every enum definition must contain a constant that maps to zero as its first element" and fields are inherently optional it is impossible to use it in an application. Was the zeroth element explicitly set before serialization? Was it not set? There should be a big warning- don't use the zeroth element in an enum. You won't know if it has been set or not. It is best to set the first element to something that will never be selected, e.g., NOT_SET = 0;
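A minimal sketch of the NOT_SET convention on the application side, using a made-up `Color` enum rather than generated code: the zero element is reserved, so a field that arrives as 0 is read as "no value".

```python
from enum import IntEnum


class Color(IntEnum):
    """Hypothetical enum; the zero element is reserved for 'not set'."""
    NOT_SET = 0
    RED = 1
    GREEN = 2


def decode_color(raw: int = 0):
    """raw is 0 both when the field was absent and when it was left at its default."""
    color = Color(raw)
    return None if color is Color.NOT_SET else color


print(decode_color())   # field absent from the wire
print(decode_color(1))  # field explicitly set to RED
```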

christian-storm on 10 Jun 2016

👍19

Ran across this in python_message.py and thought it may be pertinent. Just trying to help...

def _AddPropertiesForNonRepeatedScalarField(field, cls):
  """Adds a public property for a nonrepeated, scalar protocol message field.
  Clients can use this property to get and directly set the value of the field.
  Note that when the client sets the value of a field by using this property,
  all necessary "has" bits are set as a side-effect, and we also perform
  type-checking.
 <snip>
  def getter(self):
    # TODO(protobuf-team): This may be broken since there may not be
    # default_value.  Combine with has_default_value somehow.
    return self._fields.get(field, default_value)

christian-storm on 12 Jun 2016

@jskeet The problem with the wrapper type is that it is not wire-format compatible with existing proto 2 messages. Assume you want to upgrade to proto 3 for some services in a large project: what is the option here if the project already leverages the HasXXX feature in proto 2? Upgrade to a different wire format? The cost might be too high.
I know proto 3 is not 100% compatible with proto 2. But I think it shouldn't break important functionality that people may frequently depend on. Maybe in the proto 3 design the choice was made to get rid of this support for simplicity. However, this breaks an important feature and makes the upgrade from proto 2 painful.

lostindark on 20 Jun 2016

@xfxyjwf Is there any update on the discussion?

lostindark on 20 Jun 2016

Thanks for replying @lostindark. I thought my comment may have fallen on deaf ears. I'm actually coming to this from a new project perspective. I was attracted to v3 vs v2 because of its tight integration with JSON. Since v3 doesn't support HasXXX I've so far worked around this issue by wrapping fields that may be set to the default value in oneof statements. This is very clumsy and brittle to me. Enums take the cake though. All of my enums have the necessary 0th enum element set to NOT_SET. I still don't get why defaults are needed at all in v3 since they are effectively not supported anymore.

I know defaults aren't set over the wire. One open question that I have is whether fields set to a default should have some sort of has_been_set_bit flipped to true so that the receiver knows to set the value of a seemingly valueless field to the default? That way when you try to access it you get the default value instead of nothing.

christian-storm on 21 Jun 2016

👍7

@christian-storm In my perspective proto 3 has some design issue.

In the doc it says:

Message fields can be one of the following:
•singular:
a well-formed message can have zero or one of this field (but not more than one).
•repeated:

This makes people think all fields are optional now.
However, the generated code cannot tell whether a field exists or not, as it can't tell the difference between a missing value and the default value. This means those fields are not optional; instead, they're actually "required". The fields are always there. If you don't set them, they have the default value. The default value is stored as a "nonexistent field" in the wire format, though.

Do people need optional fields? Yes, we do. We don't want wrapper fields (why the extra space? and they're not compatible with proto 2), and we don't want the ugly oneof approach.

I hope this will be fixed in proto 3, or else it will be a big problem for people who have similar requirements.

lostindark on 29 Jun 2016

👍33

We also need this. Adding a oneof to every field that should be nullable is pretty verbose, and a syntactic sugar version would be fantastic. What do you think?

dopuskh3 on 23 Aug 2016

message Foo {
  oneof value_oneof {
    int32 value = 1;
  }
}

so, this is the recommended/only way to handle a nullable field?

this is quite a non-obvious usage for oneof, the docs barely mention that it actually means 'one or none of'
https://developers.google.com/protocol-buffers/docs/proto3#oneof

surely nearly everyone designing their first protobuf messages has this question, the docs could give more guidance I think

anentropic on 29 Aug 2016

👍7

another possibility I've stumbled on is using a wrapper type, since the default value for message fields is null

so you can:

import "google/protobuf/wrappers.proto";

message Foo {
  google.protobuf.Int32Value value = 1;
}

the available wrappers correspond to the "Well-Known Types" listed in the docs https://developers.google.com/protocol-buffers/docs/reference/google.protobuf although I didn't see it explained anywhere what they're intended to be used for or how to import them

but it seems a bit nicer than the oneof way, because you can't do this with oneof:

message Foo {
  oneof bar {
    int32 value = 1;
  }
  oneof baz {
    int32 value = 2;
  }
}

...the compiler complains about re-use of value label, so you'd want to come up with your own convention like oneof bar { int32 bar_value = 1; } and you have to get/set like bar.bar_value = x

whereas with the wrappers, the field is always just called value so you can get/set like bar.value = x which is a bit nicer

it's annoying that, either way, your nullable fields therefore have different get/set code to other fields

to be honest, if message fields can be unset and effectively have null value I don't understand why all the fields weren't designed that way. default values should be responsibility of the application. it seems dangerous to have eg int32 defaults to 0 when in many domains that is a meaningful value

I am wondering if it makes sense to just use the wrappers for _all_ my fields.

anentropic on 29 Aug 2016

👍26 👎1

A question to the protobuf team if they're around. Why not add a new wire type for null values? There are 3 bits reserved in the format allowing for 8 values, and only 6 are used (2 of which are deprecated).
I'm running into the same issue, and the alternative of wrapping every nullable field in a *Value adds a couple of bytes to the wire packet for each. In my case, I'm reading from a DB where (almost) any field can be null.

ckamel on 27 Sep 2016

👍3

@ckamel, adding a new wire type could work, but I think there are a few big downsides to doing that:

- We only have two unused wire types left (6 and 7), so if we want to add any new ones in the future we have to be very thrifty with them.
- Older parsers wouldn't know how to interpret the new wire type, so it would be tricky to roll out the change in a way that doesn't break anything. We would possibly have to do something like update all parsers to understand the new type, wait a year or so for them to be deployed, and only then start serializing in the new format.
- The proto3 semantics is already specified and being used, and so reintroducing nullability could break a lot of existing code and be disruptive.
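To illustrate the point about older parsers, here is a rough sketch of how a parser skips fields it does not recognize. It is hand-written for illustration (not the real library code) and leaves out the deprecated group wire types 3 and 4. The key observation is that skipping depends on the wire type telling the parser how long the value is, so a wire type it has never heard of (6 in this example) cannot be skipped.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def skip_field(buf: bytes, pos: int, wire_type: int) -> int:
    """Return the position just past a field value of the given wire type."""
    if wire_type == 0:    # varint
        _, pos = read_varint(buf, pos)
        return pos
    if wire_type == 1:    # fixed 64-bit
        return pos + 8
    if wire_type == 2:    # length-delimited
        length, pos = read_varint(buf, pos)
        return pos + length
    if wire_type == 5:    # fixed 32-bit
        return pos + 4
    # Groups (3, 4) are omitted here; anything else has an unknown length.
    raise ValueError(f"cannot skip wire type {wire_type}")


def field_numbers(buf: bytes):
    """List the field numbers in a message, skipping every value."""
    pos, seen = 0, []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        seen.append(key >> 3)
        pos = skip_field(buf, pos, key & 0x07)
    return seen


print(field_numbers(b"\x08\x05\x12\x01a"))  # field 1 (varint), field 2 (bytes)
```

Feeding it a key with wire type 6, for example `b"\x0e"`, raises `ValueError` in this sketch, which is roughly the situation an already-deployed parser would be in.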

acozzette on 29 Sep 2016

That makes sense. I think null would be a worthy cause for one of those unused wire types left :) But if the older parsers can't ignore unknown wire types then rolling this out would be complicated.
I worked around this by wrapping every field in a message; the downside is a couple of extra bytes per field whenever it's not null (which is the predominant case).
Thanks, @acozzette!

ckamel on 29 Sep 2016

Hey,

Right now, not having nullable fields is a very annoying issue:

- Switching from an encoding that has nullable types to proto can be extra hard, and results in wrapping everything using Google wrappers or adding a huge number of boolean fields.
- Using messages for machine learning applications, or anywhere you just throw a set of dimensions (the message) into a black box: not having a null type in this case breaks a lot of such applications.

You will probably argue that this kind of use case should clearly end in adding extra fields in order to explicitly encode the meaning of a null value, which is the cleanest way to go.
Unfortunately, this type of change can be extra hard to achieve in legacy code when trying to migrate from an encoding like JSON to protobuf. This is clearly a blocker.

@acozzette I understand the above point but let me point out a few things:

- Adding a nullable type may not break other implementations if the unknown type is simply ignored, which is the most common case.
- In the current implementation, I propose adding a generated isXXXNull method that returns true if the field is null. Getters would still return the type's default value, remaining backward compatible.
- Regarding the fact that only 2 free wire types are left: I think this nullable feature is such a pain for users that the question is worth asking. Plus, the deprecated group start/end values may be reusable at some point.

dopuskh3 on 14 Jan 2017

👍23

I think @dopuskh3 's suggestion would be interesting, adding a isXXXDefined (or isXXXnull) method will not hurt the existing code using protobuf 3.

qinghui-xu on 15 Jan 2017

I concur with @dopuskh3, having nullable types in ProtoBuf is a very interesting option. I think people migrating from JSON to ProtoBuf are faced with that problem sooner or later, and I'd love to see a clean way to make the transition smoother.

In some applications, null actually carries information. Consider for instance the time since last event: if the event actually occurred, then you have an int value, otherwise null.

natbraun on 17 Jan 2017

@dopuskh3 Have you considered using proto2? Proto2 already has generated methods for checking presence like the one you suggested--if you have a field x you can call has_x() to see if it's set (this is C++ but it's similar in other languages). Proto2 is still fully supported and we're still working on new improvements to it with every release.

For proto3 I think currently the best approach is to either use the wrapper types or oneof fields in cases where you need nullability. The oneof option is especially nice because it is wire-compatible with proto2. I would say that if the oneof trick proves to be too awkward, we should spend more time looking into @xfxyjwf's idea above, which would basically be to add some nice syntactic sugar around the oneof trick. This would probably be the easiest way forward because it would be very backward-compatible (both with the wire format and API) and wouldn't require much work since the oneof functionality already exists. Does that sound reasonable?

acozzette on 17 Jan 2017

👍2

We considered sticking with protobuf2 but some implementations (C#) only support proto3 (and we are using C# extensively)

The syntactic sugar option looks interesting, but as is it makes accessing and setting fields kind of bloated. An ideal option would be for the nullable syntactic sugar to also alias getXXX() to getXXX().getValue(), and hasXXX() to a check for a defined value inside the oneof.
That would be the ideal solution IMO:

- From the user's point of view, it results in exactly the same behavior as what was proposed with the null type above.
- Compatible with the current specification.
- Mimics the proto2 API.
- Probably smaller overhead (I can't find the encoding specification for oneof, though) than a nullable type (or storing the list of defined field ids in the message itself, which is what we are thinking of implementing to overcome this issue).

Although that would probably be feasible using insertion points, I think such an API out of the box would be useful for a lot of users.

Does it sound acceptable? I would be happy to contribute if it's OK.

dopuskh3 on 17 Jan 2017

👍1

Having looked at the code, it looks like it would be easier to define nullable as an option instead of a label. It looks like multiple options would require more work (as for 'nullable repeated') and would probably break more existing implementations.

After adding this field option to descriptor.proto I propose to treat fields having this option the following way:

string field = 12 [nullable=true];

Exactly as

oneof field {
  string value = 12;
}

Also I would like to mimic proto2 behavior for those fields with a has_field and get_field accessible from the container message.
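Roughly, the accessors I have in mind would behave like the sketch below. This is not generated code; `Container` and its method names are placeholders for whatever the generator would emit for the proposed `[nullable=true]` field.

```python
class Container:
    """Hypothetical container message with one nullable string field."""

    def __init__(self):
        self._field = None  # None means "not set"

    def has_field(self) -> bool:
        return self._field is not None

    def get_field(self) -> str:
        # Like proto2: reading an unset field yields the type default.
        return "" if self._field is None else self._field

    def set_field(self, value: str) -> None:
        self._field = value

    def clear_field(self) -> None:
        self._field = None


msg = Container()
print(msg.has_field(), repr(msg.get_field()))  # unset, reads as default
msg.set_field("")                              # explicitly set to the default
print(msg.has_field(), repr(msg.get_field()))
```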

Does that sound reasonable?

dopuskh3 on 18 Jan 2017

👍1

@dopuskh3 Those ideas sound good but I am thinking we should wait for @xfxyjwf to get back from vacation in about 2.5 weeks to get his opinion on the nullable idea since he originally came up with it and was investigating it. I think the idea may be controversial because to some degree it goes against the proto3 way of doing things, and it would also be a fair amount of work to implement it for all languages. If you're interested you could try implementing something, but I would be careful not to spend too much time on it because in the end we might not agree on going with that approach.

acozzette on 18 Jan 2017

@acozzette I started to have a look at the code but I'm waiting to reach a consensus before spending more time on that topic.

Regarding the argument that this goes against the proto3 way of doing things, I would like to hear more. This solution sounds like the least intrusive and the most backward-compatible one, IMO.

Could you provide some details about potentially unwanted impacts that could lead to rejecting this option? I'm not an expert in all protobuf use cases, so there are probably some drawbacks I'm not aware of.

dopuskh3 on 18 Jan 2017

Sure, here are what I think would be the main objections:

- First, proto3 deliberately removed field presence for primitive types because it's expected that in the common case users don't need to distinguish between missing and 0/"". The idea was that in the unusual cases where you need field presence, you would use one of the wrapper types or the oneof trick, and so if we now introduce a nullable label or option then that adds yet another way of accomplishing the same thing. (If we add this new feature then what are the wrapper types even for?)
- Adding in nullable can be confusing because it's basically equivalent to optional from proto2, but with a different name.
- If this nullable label/option changes the generated code then we will have to have support for it in every language, which will be a fair amount of work.
- We would need to make sure this works for proto3 JSON; I haven't used it much so I'm not sure of what the issues are there.
- Finally, we usually are just somewhat conservative about adding any new options, flags, etc. since once they are in, we have to maintain them indefinitely.

acozzette on 18 Jan 2017

👎13 👍1

Hello @acozzette and @dopuskh3,

So our main concern is your first point. We have the need to differentiate between absent and default value for most of our fields. We are not in the common case, and the solutions offered to us have major drawbacks:
- the use of a wrapper field adds 2 bytes per field when it is set
- the use of oneof does not have any wire overhead but dramatically increases the complexity of client code
- we can wrap each field with a oneof, at a huge cost in readability for the schema and usability for the generated code
If nullable is the same as optional, why not add it back to proto3? What is the impact of doing so, and what was the intention when you decided to remove it in the first place?
I agree that if nullable is added to the spec, it has to be supported in every language. I think that @dopuskh3 is asking whether this is complex and/or doable.
I don't see how it will impact proto3 JSON, as JSON is kind of the same as the proto3 wire format, i.e. for a nullable field:
- if it is not in the JSON payload, then we can consider it absent
- if it is present in the JSON payload, then it is set
- if we set the value to the default one in code, the library will have to send it on the wire
I agree that adding a new option, flag or label is really the last option to consider, but the current request is fairly non-intrusive. The only impact on the wire format will be for default values of scalar types: the library will have to send bytes for them. It seems that it will not break current implementations, and it will dramatically ease our development (avoiding the use of oneof and providing a method isFieldNameSet or hasFieldName).

mchataigner on 19 Jan 2017

👍13

@acozzette I used proto2's optional capabilities as a substitute for _null_. By not setting the value, I really meant that the value hadn't changed since the last time the receiver saw that value. I did not in any way mean that it was the default value, which is generally zero or empty string (""). Other people used _null_ to mean other things in their applications. _null_ is a valid value for JSON.

By removing from applications the ability to control when to send and when not to send a default value, proto3 took something away. This request is to give something back. It doesn't have to be the same thing, but it needs to be something in kind with what was taken away. Since the nullability of fields is the most common use the _has field_ functions were for, it makes sense that nullability is what is being requested to be given back.

I best like the idea of using one of the wire types to flag for _null_, and am ok with introducing it in a non-breaking manner of explicitly setting _null_ and checking for _null_; existing APIs would get the default value of the missing field (like proto2 did).

ngbrown on 19 Jan 2017

👍1

I also see great need for nullable primitive data. In our application, we collect tons of data from embedded devices where each source of data can potentially fail.

I very much like @dopuskh3 's post https://github.com/google/protobuf/issues/1606#issuecomment-272616014 and I would like to add a few thoughts:

1) https://developers.google.com/protocol-buffers/docs/encoding currently does not contain any information about "oneof". I assume that it overrides the semantics that an element not sent over the wire is to be interpreted as the default value of that element. That is, of all elements in an oneof-clause, the one element written to last will be encoded on the wire regardless of its value.

2)
a) proto3 does not allow the receiver to distinguish for wire types 0, 1, and 5 whether a value has been set: "Also note that if a scalar message field is set to its default, the value will not be serialized on the wire." (https://developers.google.com/protocol-buffers/docs/proto3#scalar)

b) proto2 does say "When a message is parsed, if it does not contain an optional element, the corresponding field in the parsed object is set to the default value for that field." (https://developers.google.com/protocol-buffers/docs/proto#optional). It does not state whether an optional field set to the default value must be sent over the wire and the encoding reference only states "For any non-repeated fields in proto3, or optional fields in proto2, the encoded message may or may not have a key-value pair with that tag number." (https://developers.google.com/protocol-buffers/docs/encoding#optional).

Having never worked with proto2, it looks like an implementation decision on whether to encode the default value on the wire. I might be wrong here, but I cannot find a requirement that enforces the ability of proto2 to tell absent from default. Please let me know if I am wrong!

3) Messages and Well-Known Types cause significant overhead. The latter are also missing signed integers (https://github.com/google/protobuf/issues/2603).

4) Oneof causes ugly, hard to read code. If I understand it correctly, it also causes significant overhead for elements that are frequently set to their default values.

So there are three options to choose from:
a) Do not do anything: this would be sad. The length and number of participants in this discussion states some need.
b) Add syntactic sugar: simple solution that would still keep the disadvantage of (4) at no cost with regard to wiretypes and on-the-wire compatibility.
c) Add native support by wiretype: uses up a very precious wiretype

I would strongly suggest (c). I find the cost of using up the last free wiretype 6 (type 7 will probably have to become an extension marker) acceptable regarding the importance of this feature.

Default values would continue to be encoded by absence keeping compatibility with old code and highest efficiency for values that are frequently at their default value.
Null could be encoded extremely efficiently. Null could also become a valid default value, allowing for highest efficiency where applicable.

However, I need help in evaluating the impact on existing systems. What happens with unknown wire types when a message is preprocessed by legacy code and forwarded to the final consumer? If the NULL information is replaced by absence by legacy code, this would be a serious downside and might sway my opinion towards (b). Is this a problem?

VolkerKamin on 19 Jan 2017

@mchataigner I believe the main motivations for not supporting field presence for primitive fields in proto3 were (a) much of the time no one relied on the distinction and (b) removing it simplifies implementations and possibly makes them faster in some cases (to get or set a field you just directly access the memory, no need to check whether it's set).

I don't think JSON would be a big issue, but someone just needs to work through the details and make sure it makes sense. For example, what does it mean if a JSON value is null? Probably we should treat that field as unset, but that's kind of weird because then there are two ways to express an unset field in JSON (omit it entirely or set it explicitly to null).
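One possible reading of that policy, sketched with the standard `json` module only (this is not the proto3 JSON mapping from the library, and `read_nullable` is a made-up helper): an omitted key and an explicit null are both treated as "unset", while any other value, including the type default, counts as "set".

```python
import json


def read_nullable(payload: str, name: str):
    """Return (is_set, value) for one field of a JSON object.

    Assumed policy: a missing key and an explicit null both mean 'unset'.
    """
    obj = json.loads(payload)
    value = obj.get(name)  # None for a missing key and for JSON null
    if value is None:
        return False, None
    return True, value


print(read_nullable('{}', "x"))           # omitted
print(read_nullable('{"x": null}', "x"))  # explicit null
print(read_nullable('{"x": 0}', "x"))     # set to the default value
```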

@ngbrown Don't forget that proto2 is still fully supported and with each release we are making new improvements to both proto2 and proto3 in parallel, so if proto2 optional fields are working well for you then sticking with that is a good option.

@VolkerKamin I believe your interpretation of the wire format is correct except for a couple of things:

2b: Proto2 does make a distinction as to whether a primitive field is set or not. If it's unset and you read it anyway, you get the default value. If you explicitly set a field to the default value then proto2 does have to serialize it so that the receiver knows that the value was set.

4: I don't know of any significant overhead for using the oneof trick, other than just the overhead you'll inevitably have to pay for if you want nullability. That is one advantage of the proto3 field presence--if a field is set to its default then it costs nothing on the wire.

About using a new wire type to represent nullability, this would be quite difficult to roll out because existing parsers would be unable to handle it. Imitating the proto2 format for optional fields makes much more sense to me because it is fully wire-compatible and saves space when the field is unset. This is also the exact same wire format that you get if you use the oneof trick (a oneof with a single primitive field inside).
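The difference between the two presence models can be sketched by hand for a single field `int32 value = 1`. This is an illustration only, with hypothetical function names, and it is restricted to values 0..127 so that every varint fits in one byte.

```python
TAG_FIELD_1_VARINT = 0x08  # (1 << 3) | wire type 0


def encode_implicit(value: int) -> bytes:
    """proto3 scalar without presence: a zero value is simply left out."""
    return b"" if value == 0 else bytes([TAG_FIELD_1_VARINT, value])


def encode_explicit(value) -> bytes:
    """proto2 optional field, or a oneof holding only this field.

    None means 'not set'; a field set to 0 is still written.
    """
    return b"" if value is None else bytes([TAG_FIELD_1_VARINT, value])


def decode_explicit(data: bytes):
    """Absence on the wire decodes to None, not to the default."""
    return data[1] if data else None


print(encode_implicit(0), encode_explicit(0), encode_explicit(None))
```

For non-zero values both encoders produce the same bytes, which is why the single-field oneof can read and write data produced by a proto2 optional field.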

acozzette on 19 Jan 2017

@acozzette If your application does not need to differentiate between a value set to default and a value not set, you will not lose performance.

It will not have any impact in performance, as the way to get or set value will continue to access memory directly. It will only add back the method hasXXX for those who require it.

Concerning protobuf 2, we are using C#. As I can see from the documentation, the C# library supports only syntax 3 for code generation. If I understand you correctly, protobuf 2 is still supported and in development. So, do you have any plans to add syntax 2 support for C# code generation?

I know that some third-party plugins/libraries do this partially (i.e. custom options are missing).

Thanks.

mchataigner on 20 Jan 2017

👍2

I think the plan was that we hope to eventually support proto2 for C# but at the moment we don't have any concrete plans in place because no one has the cycles to work on it right now.

acozzette on 20 Jan 2017

+1.
oneof does the job but there should be something more readable and that can be used to generate more usable code (Eg. in java: a Float instead of a float).

cbornet on 30 Jan 2017

👍3

@acozzette @xfxyjwf I think we have a pretty fair consensus that implementing syntactic sugar and an easier API on top of it is a pretty good way to go.

Do you think it's worth starting implementing it?

dopuskh3 on 2 Feb 2017

@dopuskh3 I'll be happy to help you in implementing this!

natbraun on 2 Feb 2017

Here is a very rough implementation of the nullable syntactic sugar for scalars: https://github.com/criteo-forks/protobuf/commit/8298aff178ccffd0c7c99806e714d0f14f40faf8

dopuskh3 on 4 Feb 2017

Hi,

I brought up the syntax sugar proposal in https://github.com/google/protobuf/issues/1606#issuecomment-223729746 to protobuf team meetings and after some discussions the decision is to not move forward with the proposal:

Rationale for removing field presence in proto3:
- Field presence in proto2 has caused confusion and it complicates the semantics, e.g. one has to distinguish between absent fields and fields set to their default values; users usually check presence before accessing the fields, which is unnecessary. We believe in most cases, field presence info is not needed.
- Removing field presence makes Proto3 significantly easier to implement with open struct representations, as in languages like Android Java (go/nano-proto), or Go. The easier implementation in turn makes it more accessible to external implementer communities.

If such presence info is explicitly needed, there are several workarounds, e.g. wrappers or an explicit has_field boolean. Oneof can also be used if backward wire compatibility with proto2 optional fields is desired.

Introducing a new keyword or reusing an existing keyword to support field presence in proto3 will complicate protobuf semantics. We believe it will lead to confusion and misuse, which defeats the purpose of removing field presence in proto3.

We may reconsider the proposal in the future when there is more data showing field presence is used more often than we expect in proto3, but at the moment we recommend users design their proto3 protos without relying on field presence.

xfxyjwf on 22 Feb 2017

👎77 😕28 👍5

Closing for now as the official decision is to take no action at this time.

haberman on 9 Mar 2017

👎67 😕11 👀2

Do you think a oneof containing only one field can be detected by the compiler so that nullable types are used by language implementations when possible? (either by hasXXX() methods or boxed types)

cbornet on 9 Mar 2017

👍4

+1 to the above @cbornet

And in general I would love to see field presence introduced. Clients/Servers don't always follow contracts and trying to use the default values to work out when something was not set is flakey. Wrapper types are really only a sticking plaster (buggy code may well not set the value inside the wrapper type!)

I don't know if repeated fields have been mentioned in this chat, but I think it's worth bringing up as they would also be extremely useful to support 'nullable' cases. Again, the 'default' is just an empty array and I don't know whether the other end meant to send that or they just forgot. It would be a real shame if we added nullable to scalars but not arrays

Finally, I'd like to hear from the team what the canonical implementation of embedded messages is in languages that have decent null support? has_X methods can be a bit ugly if your language has proper optional support

samskiter on 15 Mar 2017

👍6

About nullability for repeated fields, I think this would be quite complicated to implement because the wire format has no way of even representing any distinction between an empty repeated field and a null one. So if we wanted to implement this, we would have to come up with a whole new scheme for encoding it and it probably would not be wire-compatible with the existing encoding for repeated fields. So for this particular use case of nullable repeated fields, I think it makes much more sense to just use a message wrapper rather than to expend a huge amount of effort implementing it as a special feature.
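To show what that means in practice, here is a hand-rolled sketch of a packed `repeated int32 items = 1` field (hypothetical helper names, values limited to 0..127 and fewer than 128 elements so every varint is a single byte). An empty list and a missing list both serialize to zero bytes, so the receiver cannot tell them apart.

```python
TAG_FIELD_1_LEN = 0x0A  # (1 << 3) | wire type 2 (length-delimited)


def encode_repeated(values) -> bytes:
    """Packed encoding: tag, payload length, then one byte per small value."""
    if not values:  # covers both None and []
        return b""
    return bytes([TAG_FIELD_1_LEN, len(values)]) + bytes(values)


def decode_repeated(data: bytes):
    """Nothing on the wire always decodes to an empty list."""
    if not data:
        return []
    length = data[1]
    return list(data[2:2 + length])


print(encode_repeated([1, 2]))
print(encode_repeated([]) == encode_repeated(None))
```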

acozzette on 15 Mar 2017

👍1

Hi,

here is the way we are adding nullable semantics for all field types.

We declare a map in each message and nested message.
Each time a field is set to its type's default value, the map item corresponding to that field's id is set to True. This is done by inserting some code in the field setters.
We add hasXXX methods for each field: if the field is not set to its default, return true; if the field is set to its default value, return the map value for the field, else False.

This approach required adding a few insertion points in setters though (see #2684).

We implemented this approach as a protobuf plugin and it's working pretty well:

limited overhead: add a map which is almost empty most of the time
backward compatible.
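The bookkeeping described above can be sketched in a few lines of plain Python. This is not the plugin itself: `TrackedMessage`, its field names and its defaults are hypothetical, and the sketch only covers the in-memory logic, not how the map travels with the message.

```python
class TrackedMessage:
    """Hypothetical message with a side map recording explicit defaults."""

    DEFAULTS = {"age": 0, "name": ""}

    def __init__(self):
        self._values = dict(self.DEFAULTS)
        # field name -> True when the field was deliberately set to its default
        self._explicit_default = {}

    def set(self, field, value):
        self._values[field] = value
        if value == self.DEFAULTS[field]:
            self._explicit_default[field] = True
        else:
            # Non-default values speak for themselves, so keep the map small.
            self._explicit_default.pop(field, None)

    def get(self, field):
        return self._values[field]

    def has(self, field):
        if self._values[field] != self.DEFAULTS[field]:
            return True
        return self._explicit_default.get(field, False)


msg = TrackedMessage()
print(msg.has("age"))   # False: never set
msg.set("age", 0)
print(msg.has("age"))   # True: explicitly set to the default
msg.set("name", "bob")
print(msg.has("name"))  # True: non-default value, no map entry needed
```

The map only gains entries for fields that are explicitly set to their default, which is why it stays almost empty in typical use.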

Alternatives

Use protobuf2: and don't get C# support (this was not an option for us)
Use oneofs for all fields: that would emulate somehow proto2 behavior. repeated fields are not supported.
Use wrappers: this can have a huge overhead. Making generated objects bloated.

dopuskh3 on 15 Mar 2017

👍15 😄3

What an ugly solution! I need to distinguish between 'no value' and 'default value' for string and I can't use protobuf2 because I use gRPC. What should I do?

4ntoine on 26 Jun 2017

👍2

@4ntoine: You should use StringValue. That's what it's there for. Regardless of what you want the solution to be, that's the solution for proto3.

jskeet on 26 Jun 2017

@jskeet: Thanks! But I'm still shocked by the design

4ntoine on 26 Jun 2017

👍24 😄2

In all discussions like this (inc how to best write protos) I hear a lot of 'but the wire format' / 'it's not efficient on the wire' which IMO stinks of bad abstraction. These are often performance arguments which are rarely relevant (esp. when you consider gRPC as a replacement for REST+JSON). I think this is largely a by-product of this being a Google-developed tool and will make it trickier to persuade people over from 'traditional' REST/JSON

As a design decision - I'd rather have good abstraction and semantic expressiveness over shaving a few bits off on the wire.

samskiter on 30 Jun 2017

👍31 ❤11

Yes agreed, we have not gone down the route of gRPC as we need to represent null values without workarounds.

For us a 0 is a valid value of an int!

leearmstrong on 3 Jul 2017

👍2

@samskiter I don't think the serialized format is the issue here. It makes sense to avoid representing missing values from a forward compatibility standpoint. The problem is that the generated code (at least generated Java code) doesn't represent it at all. If scalar values had been generated as the object wrappers for the primitives instead of the primitive types, null would be enough to note missing values (as long as Java <8 compatibility is maintained and Optional is not a choice). The problem is that no-value is coerced into whatever the default value is for a particular primitive type.

liqweed on 4 Jul 2017

The problem is that no-value is coerced into whatever the default value is for a particular primitive type.

@liqweed I might be wrong, but the coercing is not actually done on the receiver's end; rather, the field is always filled over the wire. There is some explanation on the Cap'n'proto discussion forum here, which does essentially boil down to "it needs another bit on the wire".

samskiter on 5 Jul 2017

BTW worth adding we had an error recently where a wrapper was set, but the integer within it was not set to the correct value. Wrappers just feel like a sticking plaster to me.

Would the team reconsider syntactic sugar for 'oneof' and let developers decide when performance trumps accurate, safe, representation?

samskiter on 5 Jul 2017

@samskiter don't hesitate to upvote #2984 for having hazzer functions for oneof fields, which would already make our lives better!

cbornet on 5 Jul 2017

👍1

I use the json to pb feature a lot and have issues especially with zero and zero crossing primitives. Ex a field representing dBm. I can't just take zero. Yes, proto2's field presence works, but it doesn't have a pb to json converter, or maps, Timestamps, etc.

nullable seems reasonable to me sort of like std::optional.

ekigwana on 7 Jul 2017

Hi, just wanted to chime in to say that I find all proposed solutions appalling.
I also find it hard to believe that the concepts of a missing field and a default value can be confusing to a developer and lead to bad coding practices. That seems an after the fact rationalization. I think the easier implementation is the real reason behind the choice, which is perfectly legitimate.
On the other hand I cannot think of a single project I have been involved in where the serialization layers didn't need to handle missing fields. In many cases there was no "flag" value available that could have been used instead, and, as I mentioned above, wrappers and oneofs are ugly (both to write in the proto and handle in code).
I, for one, will not switch to proto3 for old projects, and will have to look for alternatives for new ones.

sky87 on 19 Nov 2017

👍35

This design decision puts us in a bit of a bind, because we have a massive v2 .proto file with heavy use of missing as meaning null for primitives. We now wish to support C#, which forces us to use v3. However, the impact on our Java and C++ code is far too great to actually move them to v3, so we want to have a v3 .proto file which is wire compatible with our v2 one. If C# were supported in v2 that would have solved our problem. We are now forced to have thousands of oneofs around our nullable primitives in the v3 file, to maintain wire compatibility, and we have to keep 2 massive .proto files in sync for eternity...
Not trying to be a jerk, but imagine what would happen if a database vendor removed nullable primitive column types... Of course db vendors would not do that because of the backwards compatibility problems. But keep in mind one of the main selling points of protobuf is backwards compatibility...

kenlars99 on 11 Dec 2017

👍17

I think at this point we are just grieving.

ekigwana on 11 Dec 2017

👍7

sky87 on 11 Dec 2017

❤24

Proto2 isn't frozen: there are some people who have been working to add proto2 support to Ruby. Adding proto2 support to C# is a totally realistic idea. I don't believe we have plans to implement this ourselves, but anyone who is interested in this could definitely consider sending PR's for this.

Do get in touch with us before doing too much work on it, so we can make sure we're on the same page about the design (APIs and implementation). But proto2 is alive and well, not deprecated, and totally open to new development.

haberman on 11 Dec 2017

I think WKT is more complicated and more ugly than a nullable scalar type

lawrsp on 15 Mar 2018

What is the decision on this? So we have to use oneof{} to wrap the value if we want protobuf to detect false and 0 being set?

willdzeng on 16 Mar 2018

👍1

I was one of the first to complain and waited a while for a solution. Pain in the derriere but after this long it seems this is what we are stuck with.

Christian Storm, Ph.D.
CTO and co-founder DeepGrammar, Inc.

christian-storm on 22 Mar 2018

The funny/sad thing is that almost all internal usage of protobufs at Google remain on proto2 for this reason.

kurtome on 18 Apr 2018

😄18 😕8 👎1

Yes, null is important for data intensive applications.

Discarded proto3 in favour of json+jsonschema for my last project where source data required a lot of validation and cleaning.

DXist on 23 Apr 2018

👍2 👎1

This is an incredibly sad decision, thumbs down from a fellow Google engineer.
We really need to be able to tell the difference between a boolean/int field that was set or not, without having to use a really verbose 'oneof'.

erwincoumans on 30 Aug 2018

👍12 😄1

I now abuse oneof.

ekigwana on 30 Aug 2018

👍1

I literally have 10,000 lines of stuff like this now:

oneof lastModifiedBy_ { string lastModifiedBy = 5;}
oneof tag_ { string tag = 6;}
oneof uuid_ { string uuid = 7;}
oneof partitionUnid_ { int32 partitionUnid = 8;}
oneof accountUnid_ { int32 accountUnid = 9;}
oneof name_ { string name = 10;}
oneof nodeUnid_ { int32 nodeUnid = 11;}
oneof unid_ { int32 unid = 12;}
oneof comments_ { string comments = 13;}
oneof locUnid_ { int32 locUnid = 14;}

kenlars99 on 30 Aug 2018

😄10

I tried to, but cannot, since the check 'has_value' is made private?!?
~~~
oneof dummy0 { bool value = 1;}
~~~

erwincoumans on 30 Aug 2018

I check if the message set is equal to the generated enumeration.

ekigwana on 30 Aug 2018

Why did they make the has_ function private, when using oneof? That makes the workaround suggested here even more verbose.

erwincoumans on 30 Aug 2018

👍1

I tried to defend the case : https://github.com/protocolbuffers/protobuf/issues/2984 but... :cry:
Don't hesitate to upvote/comment.

cbornet on 30 Aug 2018

👍1

Amazing: it seems @acozzette is contradicting himself, read above, recommending oneof:
~~~
For proto3 I think currently the best approach is to either use the wrapper types
or oneof fields in cases where you need nullability.
~~~
And here in #2984 telling it is not their intended use:
~~~
Another thing is that although oneofs can be useful for providing nullability,
that is not really their intended use.
~~~
So you say that oneof is the best approach, but at the same time you make it very difficult/involved to use oneof for this. Please listen to the community and make it easier to check for nullability.

erwincoumans on 30 Aug 2018

👍4

in short, the solution is onethousandof { oneof }

kenlars99 on 30 Aug 2018

😄4

Yes, not only thousands of oneof, but also thousands of
~~~~~
if (dummy_case()== ::some_grpc::some_field::HasDummy::kUseValue)
~~~~~
instead of simply
~~~~~
some_grpc->hasValue()
~~~~~

Has someone come up with a macro that expands some 'hasValue' into a case/constant check, to workaround this foolishness?

erwincoumans on 30 Aug 2018

👍1

I'm sad to be at the bottom of this thread without seeing a good idea for supporting presence on enum fields.
This creates a big problem at my company.

briansorahan on 13 Sep 2018

👍13

When can we expect proto 4 that will support a basic feature required by basically everyone? (even though some people are pretending that nobody needs it by yelling and putting fingers in their ears)

boreq on 15 Sep 2018

👍14

Yes, not only thousands of oneof, but also thousands of
if (dummy_case()== ::some_grpc::some_field::HasDummy::kUseValue)
instead of simply
some_grpc->hasValue()
Has someone come up with a macro that expands some 'hasValue' into a case/constant check, to workaround this foolishness?

@erwincoumans I just do this:
if (foo.dummy_case()) as there will be a ONEOF_DUMMY_NOT_SET = 0 enum defined that will evaluate as false if it's not set.

korhadris on 10 Oct 2018

👍1

Given the maintainers' intransigence, maybe the open source community can fork this project and add the has_foo methods to the code generators. The two projects would continue to be wire-compatible since the only changes are in the codegen.

sharvil on 11 Oct 2018

👍1

What was the problem being solved when nullable fields were removed? I've read in this thread that it promotes bad coding practice but that sounds like a non-answer to me.

When working on v3, what was the problem that prompted removing this established feature?

kieraneglin on 15 Oct 2018

👍1

This change was made before I started working on protobuf, but from what I understand the main motivation was to enable an "open struct" kind of implementation, which is what the Go implementation ended up using. The idea is that you can represent a message as a struct with simple public data members for fields. This doesn't work well if primitive fields can be nullable, since then you can't directly access the data from the struct without going through a pointer indirection.

acozzette on 15 Oct 2018

For my .02:

I'm prototyping an API that has a bunch of optional double and string values on it. The obvious option would be to make them nullable, but, given that I can't, both have a reasonable null-like default: NaN and _empty-string_ respectively. The latter should work reasonably well since it's protobuf's default anyway. But 0.0 to NaN is a problem: in our domain 0.0 is a perfectly valid (and otherwise indistinguishably correct) value. And I'm not looking forward to the "oh I forgot to set the foo.bar value to NaN, leaving it as its default 0.0, so I got a funny behaviour" bug.

Is there really no room to explore a further solution? What about a macro nullable that expands to the oneof solution? Can we _the project owners_ make the import facility more powerful or otherwise include a nullability plugin similar to the grpc plugin to add this functionality?

Groostav on 1 Nov 2018

Any chance to see this issue reopened someday?
I know that the official decision is to take no action, but could you have a rethink about it?

laymain on 28 Jan 2019

👀11 👍2

As a workaround Google provides wrappers

Example:

syntax = "proto3";

package mypackage;

import "google/protobuf/wrappers.proto";

message DummyMessage {
    google.protobuf.BoolValue consent = 1;
    google.protobuf.StringValue name = 2;
    google.protobuf.Int32Value age = 3;
    google.protobuf.FloatValue size = 4;
}

which gives, when translated into Java using protoc:

DummyMessage message = ...;
// Each getter returns the wrapper message (e.g. BoolValue); getValue() unwraps it.
if (message.hasConsent()) {
  boolean consent = message.getConsent().getValue();
}
if (message.hasName()) {
  String name = message.getName().getValue();
}
if (message.hasAge()) {
  int age = message.getAge().getValue();
}
if (message.hasSize()) {
  float size = message.getSize().getValue();
}
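For a rough sense of what a wrapper costs on the wire, the following pure-Python sketch hand-encodes field 3 (age) both as a plain int32 and as an Int32Value-style submessage. The helper names are invented for the illustration, only non-negative values are handled, and the numbers are easy to check by hand.

```python
from typing import Optional


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_plain_int32(field_number: int, value: int) -> bytes:
    """proto3 scalar without presence: zero is simply not written."""
    if value == 0:
        return b""
    return encode_varint(field_number << 3) + encode_varint(value)  # wire type 0


def encode_wrapped_int32(field_number: int, value: Optional[int]) -> bytes:
    """Wrapper-style field: a submessage whose own field 1 holds the value.

    None means the wrapper is absent. A wrapper holding 0 is still written,
    as a zero-length submessage, and that is what carries the presence bit.
    """
    if value is None:
        return b""
    inner = encode_plain_int32(1, value)
    tag = encode_varint((field_number << 3) | 2)  # wire type 2 = length-delimited
    return tag + encode_varint(len(inner)) + inner


print(encode_plain_int32(3, 30).hex())      # 181e     (2 bytes)
print(encode_wrapped_int32(3, 30).hex())    # 1a02081e (4 bytes)
print(encode_wrapped_int32(3, 0).hex())     # 1a00     (2 bytes, present but zero)
print(encode_wrapped_int32(3, None).hex())  # empty string: absent
```

So in this small example the wrapper adds two bytes to a non-zero value, and a present-but-zero value costs two bytes where the plain field costs none.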

laymain on 29 Jan 2019

👍5

If you start with Proto3 Wrappers could be used. However, they take up more memory and are ugly to work with.

Bring has back! Why, if everything is optional, would we possibly assume everything has a value? This is particularly problematic for storing values in a DB. If a developer makes a mistake and doesn't check a has method, the default value is returned and they successfully store a bad value in the database. Why does get return a value on an optional? Return null, throw an exception, not a default value.

efenglu on 12 Feb 2019

I agree that it is error prone, it's a workaround. I still want to see a correct nullable/optional values handling here

laymain on 12 Feb 2019

Even with Wrappers how should I deal with the case I outlined above?

If a developer makes a mistake and doesn't check a has method the default value is returned and they successfully store a bad value in the database.

Why does get return a value on an optional?

I'd prefer a fail-fast approach here. Something akin to java.util.Optional. If a developer calls get on an optional that is null, an exception is thrown.

efenglu on 12 Feb 2019

Wrappers are null by default and have hazzer methods. But they still suck...

cbornet on 12 Feb 2019

Even with Wrappers how should I deal with the case I outlined above?

You can't, as I said, I agree with you, wrappers are error prone

laymain on 12 Feb 2019

Wrappers are null by default and have hazzer methods. But they still suck...

They are and they aren't null. They are null IFF you check the has method. If you go straight to the get method (a developer coding wrong), they are NOT null. In fact they have the default object value. For example Int32Value would return 0.

Yikes!

So it is both Null and not null!

efenglu on 12 Feb 2019

😄1

For those of you playing at home I've opened a separate issue here #5697

efenglu on 12 Feb 2019

It's strange that we went from default all primitives to pointers! to no way in hell will you get primitive pointers!. There are very real and common use cases in which an application or human needs to differentiate between 0 and unset. Creating wrappers for this introduces other challenges, like needing to put hacks into your protoc plugins to treat wrapped values as primitives, and serializing to other data formats such as JSON. I wish this were a decision the maintainers would reconsider. We have been exploring other modeling techniques due to rigidity factors such as this.

wikiwong on 19 Feb 2019

👍13 👀4

I'm starting a brand new project but I'm going to use proto2 instead of proto3 even though proto3 is the "latest" version and has more features.

To me, the major reason to use protobuf is to enable communication between different clients of different languages and different versions. It's very important that an old client can talk to a new client in a meaningful way. But proto3 provides default values that are indistinguishable from empty values. The other client may not be aware of a field at all, and the new client won't be able to tell that. It may also be useful to specify only a few fields, like in a SQL where clause.

The problem is that the default values that proto3 chooses are unintuitive and almost never what an application expects. As someone said, it's effectively no support for default values at all. Messages are like a sentence and fields are like adjectives, so if you want to describe something like "big blue bottle that costs $5" but you don't know the exact color, size and price, you are never going to expect default values like "small black bottle that costs $0".

vincent-163 on 21 Mar 2019

👍18

Using protobuf 2 and wanted to migrate to protobuf 3 but this single subject is really cumbersome ...
The response from the team seems hacky at best when this seems like a core issue for people using protobuf to make different languages talk with each other ...

TehBakker on 9 May 2019

👍3

Proto3: The Microsoft Vista of Protobuf.

odyth on 18 May 2019

👍45 😄2

@xfxyjwf I'm curious how you plan to get data on whether or not people intend to use field presence in proto3 if you don't support field presence in proto3?

Is there a strawpoll I can get my name into?!?

Nonetheless, given no official avenue to offer up that data, consider this comment, along with the uncountable comments (and downvotes on your post) since your post as data to support presence.

That is all.

Cruuncher on 3 Jul 2019

😄1 👍1

I was going to use proto3 for old client app that does not use protobuf now but custom format for serialization but has default values for each field (which is not always false for booleans, or 0 for numbers).

After reading all this, I take a step back.

Adding oneof to 23 optional fields in my proto file increased the JS client size from 40KB to 60KB.
The code is littered with structures like this:

proto.dmconf.DmConf.ShowtopheaderoptionalCase = {
  SHOWTOPHEADEROPTIONAL_NOT_SET: 0,
  SHOWTOPHEADER: 1
};

Only to add 1 bit of information more.

Also the proto file looks ugly and there is a lot of keyword duplication:

oneof showHeaderOptional {
   bool showHeader = 2;
}
oneof showFooterOptional {
   bool showFooter = 3;
}

If the team would offer support for built-in optional it would be far better because we would not need to ask these questions :

Am I using the "right" workaround?
Is the "right" workaround providing good serialization form?
Is the workaround inflating my client code?

pstanoev on 22 Jul 2019

Obviously, very many have issues with the absence of nullability support, so why is this issue closed? The fact that there is no solution doesn't matter. Is that also a sign of bad project management?

idntfy on 23 Jul 2019

👍3

Obviously, very many have issues with the absence of nullability support, so why is this issue closed? The fact that there is no solution doesn't matter. Is that also a sign of bad project management?

oneof is a design choice of the project. Avro and JSONSchema also provide nulls via union types. If you don't like this design decision you have an option to choose another serialization format (like flatbuffers).

Blaming others is easier.

DXist on 23 Jul 2019

👎8 😕2

FWIW if you're like me and had to use proto3 in GRPC and want to know what fields were set by clients, we took inspiration from the suggested has_field boolean
https://github.com/protocolbuffers/protobuf/issues/1606#issuecomment-281832148

But instead of adding an explicit Boolean per field or mass oneofs for each message, we included
a FieldMask which contained all paths of fields that were explicitly set. (https://developers.google.com/protocol-buffers/docs/reference/csharp/class/google/protobuf/well-known-types/field-mask)

And client libraries essentially would append to the field masks whenever they set a field.

So the has_field methods essentially just did a lookup of this path list to know if the client set it or not.
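As a rough illustration of that idea, the sketch below keeps a list of explicitly set paths next to the field values. `TrackedProfile`, its field paths and its plain `mask_paths` list are hypothetical stand-ins; in the real setup the list would live in a FieldMask's `paths` and be sent with the message.

```python
class TrackedProfile:
    """Hypothetical client-side wrapper that records which paths were set."""

    def __init__(self):
        self.fields = {"name": "", "address.zip": 0}
        self.mask_paths = []  # stands in for FieldMask.paths

    def set(self, path, value):
        self.fields[path] = value
        # Client libraries append to the mask whenever they set a field.
        if path not in self.mask_paths:
            self.mask_paths.append(path)

    def has_field(self, path):
        # Presence is a lookup in the path list, independent of the value.
        return path in self.mask_paths


profile = TrackedProfile()
profile.set("address.zip", 0)            # explicitly set to the default
print(profile.has_field("address.zip"))  # True
print(profile.has_field("name"))         # False
print(profile.mask_paths)                # ['address.zip']
```

The cost is that every setter has to remember to update the mask, which is the manual and error-prone part raised elsewhere in this thread.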

gary-lo on 23 Jul 2019

👍2

I wonder what internal Google users are saying about this. Does Google not have this issue?

GSPP on 24 Jul 2019

👍19

None of the proposed workarounds is acceptable for me, as either the readability of the protos suffers (when wrappers are used) and/or the size is blown up (by the added bitmap or boolean per field).

What I don't understand: According to the documentation, Protobuf 3 does not encode fields that are not set; instead they are filled with the default value of the field type when decoded (for example false for booleans). Why can't this information be exploited by the client and exposed with a wasSet() method, for example? For my purpose this would be enough as I'm only interested in whether the value was set.

Also the argument that this is a rare case, as put forward by some Googlers, is not comprehensible to me. Think about a system where you send delta-updates for entities like an address over the wire. Of course users can update only a subset of the fields and then it's more efficient to send only the delta.

vanthome on 20 Jan 2020

👍4

This issue was a non starter for us. But we DID find a solution. We forked the protoc compiler.

With a ONE LINE modification we brought back has support for primitive fields. (At least for Java)

src/google/protobuf/compiler/java/java_helpers.h (347)

inline bool SupportFieldPresence(const FileDescriptor* descriptor) {
-  return descriptor->syntax() != FileDescriptor::SYNTAX_PROTO3;
+  return true;
}

I'm still at a loss as to why this is such a big sticking point to NOT support. Proto2 had support, and we could very easily add support to proto3. I get removing the concept of default values and required fields. BUT this is different as outlined by all the comments above.

Google, PLEASE, come down from the ivory tower and help us commoners get work done.

efenglu on 21 Jan 2020

👍4 😄1

The question is simple: should we stick with Proto2 in this scenario? Proto2 would be deprecated soon or it's just another flavor of Protocol Buffers and it'll be maintained like Proto3?

I'm starting a new project and this is a big concern.

Thanks in advance!

perezzini on 21 Jan 2020

The question is simple: should we stick with Proto2 in this scenario? Proto2 would be deprecated soon or it's just another flavor of Protocol Buffers and it'll be maintained like Proto3?

I'm starting a new project and this is a big concern.

Thanks in advance!

According to the Protobuf Google Groups, proto3 is not a superset of proto2, therefore proto2 will not be deprecated anytime soon and is still actively being developed (although this message is from 2017). And while the Guide recommends switching to proto3, it also mentions that proto2 will continue to be supported.

daniel-shuy on 21 Jan 2020

Thanks for your response, @daniel-shuy. I wrote a message with this concern in the Google Groups thread (the messages are all from 2017, maybe they've changed their opinions).

I'll be commenting as soon as I receive some kind of response!

perezzini on 21 Jan 2020

From @acozzette in Google Groups thread: "Right, we are still maintaining both proto2 and proto3 and plan to keep supporting both flavors indefinitely."

perezzini on 21 Jan 2020

👍1

Sounds like there are two Google camps and lines were drawn a long time ago

ekigwana on 21 Jan 2020

Can someone please answer my previous question:

What I don't understand: According to the documentation, Protobuf 3 does not encode fields that are not set; instead they are filled with the default value of the field type when decoded (for example false for booleans). Why can't this information be exploited by gRPC clients and exposed with a wasSet() method, for example? For my purpose this would be enough as I'm only interested in whether the value was set.

vanthome on 10 Feb 2020

Because it also doesn't encode fields that are set to their default value, so there is no difference in the encoding between "not set" and "equal to the default value".
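A small hand-rolled encoder shows why. This is a sketch of the proto3 behaviour for a scalar int32 without presence tracking, not library code; `encode_int32_implicit` is a made-up name.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int32_implicit(field_number: int, value: int) -> bytes:
    """Encode a proto3 int32 scalar that has no presence tracking.

    The in-memory field always holds some value (0 if untouched), and the
    serializer skips it whenever that value equals the default.
    """
    if value == 0:
        return b""
    tag = encode_varint(field_number << 3)  # wire type 0 = varint
    # Negative int32 values are sign-extended to 64 bits on the wire.
    return tag + encode_varint(value & 0xFFFFFFFFFFFFFFFF)


untouched = encode_int32_implicit(1, 0)    # sender never set the field
explicit_zero = encode_int32_implicit(1, 0)  # sender called set(0)

print(untouched == explicit_zero == b"")     # True: nothing for wasSet() to inspect
print(encode_int32_implicit(1, 5).hex())     # 0805
```

Both cases go through the same code path and produce the same zero bytes, so a receiver has nothing left to base a wasSet() answer on.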

gregmarr on 10 Feb 2020

👍1

@gregmarr ok, that makes sense of course.

vanthome on 11 Feb 2020

Even if proto2 is not deprecated anytime soon, library writers don't want the pain to have to maintain two library versions for eternity, and many would rather just support the newest one. Not supporting nulls is dumb.

domino14 on 12 Feb 2020

...and not to mention that you are forced to use Proto3 when you want to use gRPC. In summary I would say proto2 is only a viable solution in certain cases.

vanthome on 12 Feb 2020

In the end we decided against using Protobuf in our project due to this.
I fail to see how such a primitive feature, which can be brought back with a small modification, is seemingly unimportant to the development team.
This is the most commented closed issue in the repo currently.

pstanoev on 12 Feb 2020

👍1

Even if proto2 is not deprecated anytime soon, library writers don't want the pain to have to maintain two library versions for eternity, and many would rather just support the newest one. Not supporting nulls is dumb.

An example is parquet-protobuf library which now supports only protobuf 3.

qinghui-xu on 13 Feb 2020

I'm struggling with the exact same problem everyone else here is too. I want to build a gRPC API to an existing and complicated DB data model that allows the vast majority of fields to be unspecified (i.e. - NULL) and I can't assume that the default value for scalars means unspecified.

My options seem to be:

1) use proto2
2) use Well-Known-Type wrapper messages
3) use singleton oneofs
4) use additional has_xyz bools everywhere per field
5) embed a map or FieldMask into every message type to indicate which fields are actually set or not

Honestly, 1. seems the cleanest and most attractive, although I read some concerns above that gRPC requires proto3? I'm also concerned that more work is being put into proto3 so proto2 will atrophy over time?

On 2., I don't really care much about the extra byte or two per field on the wire. My concern is more about the code that interacts with these types. I have to jump through a submessage and use .value everywhere now rather than just directly operating on the thing. Or you've written a giant special case into the system for WKTs to get special treatment to act as their underlying scalars, which would be hilarious if that was your "fix." Also, why aren't there WKT wrapper classes for ALL of the basic scalar types (e.g. - sint32 because int32 is terribly implemented)???

On 3., this is surely ugly in the .proto file, but I could live with that if it didn't complicate the underlying code much. People above seem to be saying that it does. I'll take a look.

On 4., this is me manually doing something ugly that should be done automatically and is highly error prone.

On 5., this is slightly nicer than 4. but is subject to the same problem of being manual and highly error prone.

jschultz410 on 20 Feb 2020

👍3

One more question: what is wrong with what @efenglu did above?

Here: https://github.com/protocolbuffers/protobuf/issues/1606#issuecomment-576463010

It seems to me that the real design mistake in proto3 was the decision that default values should not be serialized. In the hopes of optimizing serialization costs you greatly complicated the lives and application code of very many people as the above thread documents.

proto2's design decision that only set fields (default value or otherwise) should be serialized is far more sensible.

jschultz410 on 20 Feb 2020

Thx @jschultz410 for this nice write up of the situation and the good arguments why this must be resolved. I see one more option:

Google or the community defines Protocol Buffers 4 (or 3.1), which brings back this feature if compatibility is a concern.

vanthome on 21 Feb 2020

Dear users,

On behalf the pröto3 team, I'm happy to announce that, we feel your pain, that's why we are rolling out proto3++, which will completely blow your mind with these features:

A completely mind blowing new message type called Maybe, with the introducing of Maybe, the infamous null will be dis-loved and throw to trash, every field in a message definition, be it a primitive type such as int32, or a message definition, must be set (except when the message is annotated with a new glorious delta annotation). There are two kinds of Maybe: Nothing for when you want to leave the field as "null", or Just(x) for when you want it to contain a value x
When a message definition Foo is annotated with a delta annotation, the compiler will automatically derive a new message named FooDelta, which has the same field definitions as Foo, with two differences: a) every field in FooDelta will have an associated method is_set() b) a field in FooDelta, unlike in an ordinary message, can be left unset, resulting in is_set() == false
FooDelta can be sent across wire, on the receiving end, there is a method called to_Foo(), a call of FooDelta.toFoo(foo :: Foo) will return a new Foo, when a field is set in FooDelta, the final Foo will have that value, otherwise, it will have the same value with foo
We also introducing a completely new glorified version of the enum, called sum, with this, you can write enums with fields, how awesome is that!:

sum Animal {
  Human {
  string name = 1;
  DateTime dob = 2;
  },
  Dog {
  Breed = 1; // puns, this is equivalent to `Breed breed = 1;`
  },
  Cat
}

I hope you enjoy using proto3++, because once we lose passion about it, we will abandon it.

qwfy on 21 Feb 2020

😄6 😕2 ❤1

Yup basically, it's a super leaky abstraction and imparts design decisions about what goes on the wire to the end user. If you have to start the explanation with 'because the wire format....' you messed up.

Not sending the 'set' bit is a performance optimisation that your average user doesn't care about or even need. I see the benefit it could bring, but IMO it should be an extra/power user feature that you can enable - not the default.

samskiter on 21 Feb 2020

👍1

@samskiter Yeah, the design decision here was pretty weird.

proto3 made the correct decision that basically every field should be optional and that required fields were a bad idea. Ok, that makes complete sense.

It also makes complete sense that if a field isn't set to some value, then there is no need for it to be serialized and go on the wire. If you just create a Message, don't set any of its fields and send it, then there is no reason to serialize all of the (optional) fields that weren't set. On the reading end, the parser can see that none of those fields were set too and very nicely reflect that information up to the user. This is basically how proto2 works for optional fields.

But then someone went one step further and said "Aha! We can save some more on serialization if a field is set to its default value by also not serializing those fields too!" Yes, that can save a bit more on serialization, but it also greatly degrades the basic ability to check if a field was explicitly set or not because now you can't distinguish between 'field not set' versus 'field set but to default value.'
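The contrast can be sketched with a toy encoder and decoder for varint fields. Here `None` models "not set", which is how a proto2 optional field or a single-field oneof member behaves on the wire; all function names are invented for this illustration and only non-negative values are handled.

```python
from typing import Dict, Optional, Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_explicit(field_number: int, value: Optional[int]) -> bytes:
    """Explicit presence: write the field whenever it is set, even to 0."""
    if value is None:
        return b""
    return encode_varint(field_number << 3) + encode_varint(value)


def decode_varint_fields(data: bytes) -> Dict[int, int]:
    """Return {field_number: value} for a message made only of varint fields."""
    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        if key & 7 != 0:
            raise ValueError("only varint fields are handled in this sketch")
        value, pos = read_varint(data, pos)
        fields[key >> 3] = value
    return fields


print(encode_explicit(1, 0).hex())                     # 0800
print(1 in decode_varint_fields(encode_explicit(1, 0)))     # True: set to 0
print(1 in decode_varint_fields(encode_explicit(1, None)))  # False: never set
```

Writing the two bytes for an explicit zero is the entire cost of keeping presence recoverable on the receiving side.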

For bonus weirdness points, this very common desire to unambiguously check for field presence or absence is still kinda, sorta supported. It seems the proto3 answer is "If you DB people really want to do this ugly thing that we do not like, then you can abuse the oneof feature to get back that functionality and it really won't cost anything on the wire or cause many other problems either. But no, we will not add a nicer syntax (even just sugar) for this common use case because we don't like it. Nor will we advertise that this is the preferred solution for testing for scalar field presence / absence. People will have to somehow figure this out the hard way. Nor will we make working with oneofs easier in languages like Java because we really don't like what you are trying to do here."

jschultz410 on 21 Feb 2020

👍8

Yeah proto3 is screwed up. Stay on proto2 !

cbornet on 21 Feb 2020

@cbornet I'm going to use proto3 since that is where most of the development effort is going, but just abuse oneof like crazy. For Python generated code at least, this will behave very much like proto2. Sorry about Java not getting a similarly clean API for oneofs.

The really disappointing thing is that it shouldn't have taken me a treasure hunt on github closed issues to figure out that this is a good answer to this problem.

It should be pointed out quite clearly in the documentation and supported (preferably with a newer, cleaner syntax, and easy APIs in all supported languages).

jschultz410 on 21 Feb 2020

You can use proto2 with gRPC. Heck, you don't even have to use protobuf

daniel-shuy on 21 Feb 2020

@daniel-shuy Very true, but you would think the proto3 people would care somewhat about an issue that is driving people away from using their solution, especially when they already kinda, sorta have an obscured fix for it (i.e. - singleton oneofs).

jschultz410 on 21 Feb 2020

What would be great is if someone created a tool that would take a proto2 .proto file and generate the equivalent proto3 .proto file with the oneof wrappers. That way you can use proto2 or proto3. We need to use proto2 in C++, but we use proto3 in C#, using the "equivalent" protocol.

z9security on 21 Feb 2020

👍1

All, not trying to plug myself, but if you are looking for a good rundown of how we've dealt with protobuf v3 and null support, I've written up a pretty detailed article with examples. Would be willing to talk more with people if you need help.

Protobuf and Null Support

efenglu on 25 Feb 2020

👍1

Thanks for the writeup Erik!

I too quickly realized that even if you use oneof singletons to be able to tell if a scalar field was explicitly set or not, then there are still some wrinkles in interpreting the meaning of set versus unset.

In particular, if a field is unset, which at least you can now detect, then does that mean the user is really trying to specify that value as null or do they just not care about that field for this operation?

I kind of like what you did with the oneof explicitly containing a null value or a field value to get past that secondary ambiguity. On the other hand, it looks like it might complicate the code more than I'd like. I haven't decided which way to go on that point myself yet.

jschultz410 on 25 Feb 2020

In particular, if a field is unset, which at least you can now detect, then does that mean the user is really trying to specify that value as null or do they just not care about that field for this operation?

I believe this nuance is one of the primary reasons the protobuf team decided to remove the isSet flag from proto3. Not saying I totally agree with that design decision, but I get that it forces people's hand into deciding how to interpret default values instead of having them try to decide what the semantics of an unset value are in every API.

kurtome on 25 Feb 2020

A slight twist on one of @efenglu's suggestions from his writeup above:

message Foo
{
    oneof goo_ { string goo = 1; google.protobuf.NullValue goo_null = 2; }
    oneof baz_ { string baz = 3; google.protobuf.NullValue baz_null = 4; }
}

In generated Python at least, the above is very clean and easy to use. You can ask msg.WhichOneof("goo_"), msg.HasField("goo"), and/or msg.HasField("goo_null") to explicitly detect field absence / presence. You can directly operate on msg.goo as a string. It doesn't require you to define any wrapper classes and then hop through them to operate on the underlying thing. This also doesn't incur any more overhead on the wire and can be backwards compatible with proto2, if you care about those things.

In this approach, you can completely distinguish between 'unspecified,' 'specified with some normal value,' and 'specified as null' without any ambiguity, nor needing to interpret absence as meaning anything other than 'unspecified.' Then how your API handles 'unspecified' versus 'specified' in any particular context can be very explicit and clear cut.
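To make the three states concrete, here is a minimal Python sketch that classifies the `goo_` oneof straight from the wire bytes. It does not use the protobuf library; `read_varint` and `goo_state` are hypothetical helpers, and the field numbers (goo = 1, goo_null = 2) are taken from the Foo message above. It relies on oneof members being written whenever they are set, even to a default value.

```python
from typing import Optional, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def goo_state(buf: bytes) -> Tuple[str, Optional[str]]:
    """Classify the goo_ oneof as 'unspecified', 'null', or 'value'."""
    state: Tuple[str, Optional[str]] = ("unspecified", None)
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:  # varint (e.g. the NullValue enum)
            _, pos = read_varint(buf, pos)
            if field_number == 2:
                state = ("null", None)  # last oneof member seen wins
        elif wire_type == 2:  # length-delimited (e.g. a string)
            length, pos = read_varint(buf, pos)
            payload = buf[pos:pos + length]
            pos += length
            if field_number == 1:
                state = ("value", payload.decode("utf-8"))
        else:
            raise ValueError("wire type not handled in this sketch")
    return state
```

An empty buffer yields 'unspecified', the bytes `0a 00` yield an explicitly set empty string, and `10 00` yield an explicit null, which is roughly what `msg.WhichOneof("goo_")` reports in generated Python.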

However, I see at least a few drawbacks to this approach: 1) the one logical field you are after is now somewhat split across two fields (and 3 names) that you have to treat as a pair, 2) other languages aren't as friendly as Python with oneofs, 3) repeated fields can't exist inside oneofs, and 4) this isn't the typical / recommended way people do this.

On 2), dealing with oneofs and scalars inside oneofs isn't as friendly in other generated languages such as Java.

On 3), this might not be important, but because repeated fields can't exist inside a oneof, you would probably want to instead add an explicit bool field to indicate whether an empty list means 'unspecified' or 'specified but empty.' The alternative is to wrap the repeated field in a message, which can allow you to distinguish between 'specified' and 'unspecified.' But wrapping every such repeated field in a message is fugly. Then again, so is this:

message Foo
{
    repeated string bars     = 1;
    bool            bars_set = 2;  // only has meaning when bars is empty, else ignored
}
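The reading rule for this pair of fields is small enough to write down. A minimal Python sketch, assuming the convention in the comment above (the flag matters only when the list is empty); `bars_state` is a hypothetical helper, not generated code:

```python
from typing import List


def bars_state(bars: List[str], bars_set: bool) -> str:
    """Interpret the (bars, bars_set) pair as one logical tri-state field."""
    if bars:
        return "specified"  # bars_set is ignored when bars is non-empty
    return "specified_empty" if bars_set else "unspecified"
```

Every reader of the message has to apply this rule consistently, which is part of why the pairing feels clumsy.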

All of that being said, on 4) it seems the typically recommended way to do this is to instead use wrapper messages for everything. Then use the presence / absence of the wrapper message in combination with some kind of explicit re-specification (e.g. - google.protobuf.FieldMask) to determine the full meaning of the field.

So, for example, to set a string field to be NULL in a database, you'd have the string be wrapped inside a google.protobuf.StringValue, you'd have a google.protobuf.FieldMask specify that you really are operating on this StringValue by field name, and then its absence in the containing message can be interpreted as meaning NULL.

This seems pretty ridiculous to me, but there you are. The fact that everyone uses FieldMasks everywhere to explicitly specify the field names (!!!) that are being operated on tells me that there is a fundamental representational problem going on here. First, explicitly listing out all of the field names in a FieldMask is getting us right back towards an explicit and verbose specification like JSON!!! Also, because we are listing field names rather than field numbers, we are effectively forcing in and locking down field names in the serialization schema too. Second, the root of this problem, again, seems to be the design decision to be unable to distinguish between 'unspecified' and 'default' after serialization. For example, I see no good reason why on an update for a resource, I couldn't just specify the fields that I want to update in the resource (e.g. - one single string field out of many potential fields) and have the server be able to understand that I was only interested in operating on those fields without any ambiguity at all:

message Foo
{
    string name  = 1;
    sint32 bling = 2;
    Goo    blang = 3;
    ...
    string blarg = 22;
    ...
}

message UpdateFooRequest
{
    string options = 1;
    Foo    foo     = 2;
}

In C++ client pseudo-code:

UpdateFooRequest msg;  // nothing is specified; EVERYTHING IS **ACTUALLY** OPTIONAL

msg.set_foo();
msg.foo.set_name("TheResourceName");
msg.foo.set_blarg("Just update me!");
send(msg);

In C++ server pseudo-code:

UpdateFooRequest msg = recv();

// NOTE: we could instead iterate over msg.fields() like we do over msg.foo.fields() below but we 
// want to ensure we handle 'options' first here and this code only cares about 'options' and 'foo'

if ( msg.has_options() )
    handle_options( msg );

if ( msg.has_foo() )
{
    FooRecord rec = getRecord( msg.foo.get_name() );  // get_name() would throw an exception if name was not specified

    for ( it = msg.foo.fields(); it != msg.foo.fields_end(); ++it )
    {
        switch ( it->fieldNumber() )
        {
        case Foo.NAME:                                  break;
        case Foo.BLING: handle_update_bling(rec, it);   break;
        case Foo.BLANG: handle_update_blang(rec, it);   break;
        ...
        case Foo.BLARG: handle_update_blarg(rec, it);   break;
        default:        handle_update_unknown(rec, it); break;
        }
    }
}
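The same presence-driven update can be sketched in plain Python. This is only an illustration of the idea, not a protobuf API: `apply_update` is hypothetical, the request is modelled as a dict holding only the fields the sender explicitly set, and the field numbers mirror the Foo message above.

```python
from typing import Any, Callable, Dict

# Hypothetical field numbers mirroring the Foo message above.
NAME, BLING, BLARG = 1, 2, 22


def apply_update(record: Dict[str, Any], present: Dict[int, Any]) -> Dict[str, Any]:
    """Return a copy of record with only the explicitly present fields applied.

    Absent fields are left untouched; a present field is applied even when
    its value is a default such as 0 or "".
    """
    if NAME not in present:
        raise KeyError("name must be specified to locate the record")
    handlers: Dict[int, Callable[[Dict[str, Any], Any], None]] = {
        BLING: lambda rec, v: rec.__setitem__("bling", v),
        BLARG: lambda rec, v: rec.__setitem__("blarg", v),
    }
    updated = dict(record)
    for number, value in present.items():
        if number == NAME:
            continue  # name is used for lookup only
        handler = handlers.get(number)
        if handler is None:
            raise ValueError(f"unknown field number {number}")
        handler(updated, value)
    return updated
```

With presence available, a request that sets only `name` and `blarg` touches only `blarg`, and a request that sets `bling` to 0 really does write 0, with no field mask needed.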

Ok, I've wasted enough time agonizing over this. I guess I'll go with the typical recommendation of using wrapper classes, crappy FieldMasks everywhere, and treating absence as NULL in proper context.

jschultz410 on 29 Feb 2020

❤1

TL;DR: for the love of God, if there ever is a proto4, then PLEASE, PLEASE, PLEASE reconsider (1) explicitly and fully supporting the notion of null similar to the way JSON does, (2) make it easy to distinguish between 'unspecified' and 'specified' fields (regardless of their value!!!) without ambiguity, and (3) make it easy to iterate over just the specified fields.

I completely understand that you want to make it very easy to interact with the protobuf representational objects in every language, but that goal should not effectively force your hand into making everything always present and specified but with default value (that won't be serialized), which causes a whole host of other problems as this thread demonstrates.

The fact that everyone uses FieldMasks everywhere in protobuf based APIs to explicitly and poorly (re)specify the field names they are actually interested in is a GIANT hint that your basic representational approach isn't cutting it for this most common of use cases.

jschultz410 on 29 Feb 2020

👍14

But proto3 does support null??? If you only pass a string of your serialized class🤣🤣🤣

devhl-labs on 18 Apr 2020

On behalf of the protobuf team, I'm happy to report that field presence is coming to proto3. It will be experimental in the upcoming protobuf 3.12 release, and will be generally available hopefully in 3.13.

For more info please see:

Please give it a try once 3.12 is released and let us know if you run into any issues.

haberman on 23 Apr 2020

🎉29 ❤9 🚀7

Huzzah! The Null gods win again! TAKE THAT GOLANG

_[Note from maintainers: Sorry for the hijack, but a quick note... the API for testing presence isn't null-based per se. Please see the notes @haberman referenced for per-language details. -- @dlj-NaN ]_

kgreav on 24 Apr 2020

👎1

They didn't go that far @kgreav ... at least not yet! :)

They just added back the ability to detect whether a field was actually set or not, regardless of its value. Basically, they added some nice syntactic sugar for automagically wrapping a field in a oneof, which is how I skinned this cat explicitly.

jschultz410 on 24 Apr 2020

something something battle but not the war.... yes, I'd still like to be able to tell the difference between a 0 and a present but null number, but that would require proto4.

at least we won't have to use all the other hideous solutions like field masks and oneofs.

kgreav on 24 Apr 2020

👀1 ❤1

@kgreav I'm in too deep already! I don't think I can turn back now! :)

My DB API has something like 30 resources and probably 150 message types with tons of:

message Foo
{
    oneof name_ { string name = 1; }
    oneof goo_  { string goo  = 2; google.protobuf.NullValue goo_null = 3; }
    oneof baz_  { sint32 baz  = 4; google.protobuf.NullValue baz_null = 5; }
    repeated      string bars = 6; bool bars_set = 7;  // bars_set only has meaning if bars is empty
}

message GetFooRequest
{
    string name            = 1;
    Foo    field_mask      = 2;  // the fields of Foo that should be returned as set
    bool   field_mask_pstv = 3;  // by default the empty field_mask is negated
}

This way, for every resource data field, I can differentiate between 'unspecified', 'specified as normal value (including defaults!)', and 'specified as null.'

I have no need for FieldMasks. My equivalent to FieldMasks is to simply send a message and use field presence to indicate the fields that should be operated on and field absence to indicate the fields that should be ignored. Field mask negation is through an explicit companion boolean.
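As a rough illustration of how such a mask could be evaluated, here is a minimal Python sketch. It assumes one reading of the comments in GetFooRequest: the mask is negated unless `field_mask_pstv` is true, so an empty mask with the default flag selects every field. `ALL_FOO_FIELDS` and `fields_to_return` are hypothetical names.

```python
from typing import FrozenSet, Set

# Logical data fields of the Foo message above (hypothetical constant).
ALL_FOO_FIELDS: FrozenSet[str] = frozenset({"name", "goo", "baz", "bars"})


def fields_to_return(mask_present: Set[str], positive: bool = False) -> Set[str]:
    """Select the fields to return from the set of fields present in the mask.

    positive=True  -> return exactly the fields present in the mask.
    positive=False -> the mask is negated: return everything except them.
    """
    if positive:
        return set(mask_present) & ALL_FOO_FIELDS
    return set(ALL_FOO_FIELDS) - mask_present
```

Under this reading, a request that sets nothing returns the whole resource, which is usually the default a caller wants.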

If I tried to change my design now, then I think my dev team would mutiny.

jschultz410 on 24 Apr 2020

I'll humbly take credit for inspiring this change. You're welcome everyone!

For real though, thanks to the team for this change.

devhl-labs on 24 Apr 2020

Huzzah, get ready for required optionals again!

// Required
optional int32 foo = 1;

samskiter on 24 Apr 2020

👍3

Five years of pain coming to an end!

nilsonsfj on 28 Apr 2020

🎉8 😕2 😄2

Huzzah, get ready for required optionals again!
// Required
optional int32 foo = 1;

Frustrating but working. I hope this is not required in 3.13.

marshallma21 on 29 May 2020

Thanks for the effort on this! Did this get released (no longer experimental) in v3.13.0 (https://github.com/protocolbuffers/protobuf/releases/tag/v3.13.0)? I did not see it on the Release page and wanted to double check.

yulrizka on 21 Oct 2020

Funny, we just added very similar support for null scalars (not present, not default) in FlatBuffers: https://github.com/google/flatbuffers/issues/6014

aardappel on 21 Oct 2020

Will encoders be obliged to write every field they know about, regardless of whether it is equivalent to a zero? That is, will checking for presence of a field indicate whether the write-side knew of the existence of the field?

kriskowal on 21 Oct 2020

@yulrizka Proto3 optional is still guarded by an experimental flag in the 3.13 release.

@kriskowal No, encoders with proto3 optional fields will work the same way as optional fields in proto2. The field will be written only if it was present, and the notion of presence is unrelated to whether the value is 0 or not.
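The behavior described in this reply can be sketched with a hand-rolled encoder and decoder. This is a minimal Python illustration, not the protobuf library: the helper names are hypothetical, only non-negative varint fields are handled, and `None` stands in for "not present".

```python
from typing import Optional, Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_optional_int32(field_number: int, value: Optional[int]) -> bytes:
    """Explicit presence: write the field whenever it is present, even if 0."""
    if value is None:
        return b""
    return encode_varint(field_number << 3) + encode_varint(value)


def decode_optional_int32(field_number: int, buf: bytes) -> Optional[int]:
    """Return the value if the field appears on the wire, else None."""
    pos = 0
    found: Optional[int] = None
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        if tag & 0x7 != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(buf, pos)
        if tag >> 3 == field_number:
            found = value  # last occurrence wins
    return found
```

A present zero costs two bytes on the wire here (tag plus value), and an absent field costs nothing, so presence survives the round trip regardless of the value.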

acozzette on 22 Oct 2020

👍1

Is there a target release version for the field presence feature to ship without the experimental flag?

smund01 on 19 Nov 2020

👍4
