Protobuf: Java Option for Null

Created on 8 Feb 2019  路  10Comments  路  Source: protocolbuffers/protobuf

What language does this apply to?
Proto3
Java Generated Code

Describe the problem you are trying to solve.
Java null support. I understand the desire to not have null. However, the fact remains many teams like, even depend on using NULL even if its considered a bad idea.

Describe the solution you'd like
Add a option:

option java_null_fields = true;

This option would modify the code generator to:

  1. In the set method if the value is null call clear, otherwise set the value. Don't throw NPE
  2. In the get method if the value is unset return null. Don't return the default object instance

Describe alternatives you've considered
I tried to create a plugin for protoc. I was successful in adding a setOrClear* method and a optional* method.

However since the get and set methods still had this NPE behavior it was very confusing to lots of our developers.

I have successfully added the change to protoc code and all the tests pass. Since this is opt in it shouldn't break any existing code.

I've also considered modifying the generated code with the maven replacer plugin. This works, however, it seems like a terrible idea to modify generated code. Not to mention it is incredibly fragile.

Additional context
Sample code:

public Builder setFieldValue(Timestamp value) {
    if (value == null) {
        clearFieldValue();
        return this;
    } else {
      ...
    }
}
public Timestamp getFieldValue() {
    return value_;
}
java

Most helpful comment

Proto4?

All 10 comments

An alternative to returning null would be to throw an exception in the get if the value is unset.

The main concern my team has is when the value is unset and get is called there is a chance we could get invalid data. Exceptions would be preferred to data that is invalid.

So I got it working more or less: https://github.com/efenglu/protobuf/tree/refactor/eenglund/getNPE

I have created an option:

  // Java Bean style of Get/Set Methods
  enum JavaBeanStyle {
    DEFAULT_INSTANCE = 1;   // Returns a default instance value, set throws NPE if passed null
    FAIL_FAST = 2;                   // Set and Get throw a Null Pointer Exception if the value is null, similar to java.util.Optional
    NULL_CLEAR = 3;              // Get returns null, set accepts null and clears the value
  }
  optional JavaBeanStyle java_bean_style = 2 [default=DEFAULT_INSTANCE];

Generated Code sample:

message TestTypes {
    string value = 1;
    google.protobuf.Timestamp changeDateTime = 2;
}
/**
   * <code>.google.protobuf.Timestamp changeDateTime = 2;</code>
   */
  public com.google.protobuf.Timestamp getChangeDateTime() {
    if (changeDateTime_ == null) {
       throw new NullPointerException("'changeDateTime' is not set");
    } else {
       return changeDateTime_;
    }
  }

if I could do the same thing somehow for primitives I would be tickled pink!

I could support primitives with proto2 syntax but proto3 doesn't have enough information

Hello, @efenglu
In fact in order to support correctly java null semantics, it's more ticky than just modifying get and set behavior. Especially, if your proto object gets serialized, all fields with default value are not sent to wire, thus when you decode the binaries to a protobuf object, you could not tell if a field is missing (thus null in java) or it should contain default value.
That being said, without a modification of the wire format (sending default values), it would be difficult to implement properly the null semantics.

I think I understand the confusion. I interpreted optional as meaning something that may or may not have a value. This is not the case. optional in this context means it has a value or the default value.

The problem I have with this is that the default value is often, if not always, wrong. This in my mind is worse than having no value at all. I would actual propose eliminate default value.

Here's why.

I have a developer write code to take a protobuf object and insert it into a database.

Lets say it looks something like this:

import "google/protobuf/timestamp.proto";
message Entry {
    string id = 1;
    google.protobuf.Timestamp createTime = 2;
    google.protobuf.Timestamp endTime = 3;
}

Now this developer isn't very good about writing tests. They get there code to work. We have a large team and the following isn't found in a code review:

preparedStmt.setInt(4, Timestamps.toMillis(entry.getCreateTime()));
preparedStmt.setInt(5, Timestamps.toMillis(entry.getEndTime()));

The above code will work. Not only that it will ALSO work when endtime is not provided. Meaning, we didn't get an error about the developer mistake and even worse we inserted BAD data into the database.

Obviously, the developer should have done a entry.hasEndTime() but is besides the point. The interface to the data is prone to developer misuse and doesn't report that misuse.

How should I deal with this?

The has*() method is available for nested MessageType's but with primitives we don't even have this option.

Unfortunately I don't think we would want an option that change the semantics of getters / setters. It is too surprising to developers when some Message's getters return default instance while others may return null or throw NPE. We made a decision to return default instance in proto3. While it may not be ideal for many cases, it is consistent everywhere, so developers who are familiar with protos can reliably tell what is going to happen. Whereas if we introduce an option, developer and code reviewer will need to look at the proto file definition to figure out what will happen in case of null. If you chain the getters like entry.getCreationTime().getSeconds(), it is very hard to notice a bug. If you depend on only the .jar file without looking at the actual proto, then you may not even know what will happen.

One not-so-elegant solution for you case would be to add some extra fields to indicate if your fields are set. You can also add some preconditions check to ensure that the data is valid before write to the database. Obviously all of these rely on the developers knowing the behavior for protobuf so they can watch out for it, but it is a one-time learning vs. never knowing what the behavior is.

Proto4?

yes, does the binary format ever get modified. it would be nice to be able to releasea server that responds with "you need to upgrade your protocol version to talk to this server" Sooooo, that I can at least start using the new binary format that passes default values and empty if null for fields that are nullable. I personally would want ease of development over wire protocol since most businesses, the developers time in development cost my business way more time, than a very compact protocol.....I mean hell, most use json as it is which is very slow. Squeezing the protocol seems kind of hard on developers.

ouch, more importantly, how do I do the variations of json then? json wire format options

"name": {defaultValue=0 for int} is on the wire
"name": null
{name not exist}

These are all used in the wild as different triggers. If the protobuf binary format can't tell the difference between these 3 things, then it can't truly do json, can it?

Was this page helpful?
0 / 5 - 0 ratings