What version of protobuf and what language are you using?
Version: v3.6.0
Language: any
What operating system (Linux, Windows, ...) and version?
n/a
What runtime / compiler are you using (e.g., python version or gcc version)
n/a
What did you do?
Steps to reproduce the behavior:
test.proto:
syntax = "proto3";
package testing;
message Animal {
string name = 1;
bool has_hooves = 2;
int32 birth_year = 3 [json_name = "year_of_birth"];
}
Command:
protoc -o /dev/stdout test.proto | protoc --decode=google.protobuf.FileDescriptorSet descriptor.proto
Output:
file {
name: "test.proto"
package: "testing"
message_type {
name: "Animal"
field {
name: "name"
number: 1
label: LABEL_OPTIONAL
type: TYPE_STRING
json_name: "name"
}
field {
name: "has_hooves"
number: 2
label: LABEL_OPTIONAL
type: TYPE_BOOL
json_name: "hasHooves"
}
field {
name: "birth_year"
number: 3
label: LABEL_OPTIONAL
type: TYPE_INT32
json_name: "year_of_birth"
}
}
syntax: "proto3"
}
What did you expect to see
It should be possible to tell whether json_name was specified in the .proto file. In particular, there should be a way to distinguish that json_name was specified for birth_year, and not specified for has_hooves.
What did you see instead?
It is impossible to tell whether json_name was specified in the .proto file, because protoc invents a json_name for each field, regardless.
Anything else
jsonpb, there is an OrigName option. But it's all-or-nothing. If you opt out of the java-ish names, you also opt out of being able to override a name..proto file from the descriptor.The one thing this issue fails to address is how this works in a language neutral way. If two language runtimes fail to make the same default json field name because of a quirk with one runtime then they can't work together. Sure that might be a bug with the quirked runtime, but the whole issue could just be avoided if the runtime was given the same json name as the other runtimes via protoc.
On top of that, calculating a json name takes time. It may not be a long time, but it is time and memory. It's not time and memory to calculate it ahead of time and reference it as a constant value (depending on the runtime).
The last point doesn't seem like much of an issue. You should be distributing .proto files to anyone that needs them. The only people I know that can't get .proto files distributed to them are people writing Steam bots and Pokemon Go bots. Feel free to list any legitimate reasons people should be able to more easily reverse engineer a .proto file from a descriptor.
The biggest complaint is that it makes it impossible to tell whether a user specified a json_name or not. I suppose it would be possible to add an additional bool explicit_json_name field or something.
We have a lot of code still on protobuf 3.0.0, and when we try to upgrade to a newer protobuf release, all the json field names change. We can use the OrigName option in jsonpb, but that will completely disallow _any_ fieldname customization. What we really need is a PreferOrigName option that still allows explicit field renaming. But the fact that protoc is synthesizing stuff into the descriptor precludes the possibility of implementing such an option, either in upstream golang/protobuf or even in our own custom serialization code.
As far as the time burden of calculating json names… in compiled runtimes that often happens at generation time (it does in Go, currently), and even in interpreted languages, it's surely highly cacheable. Besides, they all have to have the code/machinery to be able to handle descriptors that lack json_name anyway, so it's already there.
It's true that recreating a .proto file from the descriptor is uncommon (although they do/can include all kinds of comment and line number information in the SourceCodeInfo message, which seems built for that purpose); I guess I was trying to articulate the fact that it “feels wrong” to introduce extra synthetic information into the descriptor.
https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/compiler/command_line_interface.cc#L1972
I think we can have a command line option to decide whether to include json name in the output. @acozzette how do you think?
Yeah, I think a commandline option is perfect. It should always reflect the json_name if it was specified in the .proto file, but the commandline option could turn off the synthesis of googleCase defaults.
Any updates on this? Would sending a PR help?
Sorry this slipped off my radar. Usually we try to avoid new options but maybe it is justified in this case. I'm guessing if we just removed json_name unconditionally then things might break. @haon4 What do you think?
I don't think we should change the current behavior as users are already relying on it. I am also not a fan of adding more command line option, as this doesn't seem like a common feature that other people are asking for, and I really don't like to add more switch for one use-case.
Can this problem be solved at the application level rather than at the protobuf level? Does it really matter if the json_name was explicitly specified or not? If you make the lowerCamelCase transformation on the field name and that matches the json_name option, can you assume that the json_name is not specified? Even if the json_name is explicitly specified in that case, does it matter?
I thought I described why earlier, but no, it is not solvable at an application level, because the way protoc currently populates json_name for _all_ fields makes it impossible to tell whether any given field actually had a json_name property.
A commandline flag would be perfect.
btw, if y'all don't add a commandline option, we will be forced to fork protoc internally, and add one 😞 Either that or specify a json_name on every single existing field that includes underscores (also 😞). I'm happy to rough together a PR adding a commandline option, if that would help 🙂.
Fair enough. If you can put together a PR for this command line option, it may help. Point taken that this can reduce the size bloat on the serialized descriptor, and most languages should already handle the fallback in case json_name is not there, so I think that may be something we would want to enable by default in the long term. I see there are still many places that are making the assumption about json_name being there though, but having the command line option may enable us to slowly knock them out.
I think removing json_name from returned FileDescriptorSets is unfortunately a breaking change at this point - external plugins could totally be relying on that to remove the need to compute json_name themselves. @jhump
Another option would be to add a field to google.protobuf.FieldDescriptorProto that is something like i.e. optional bool json_name_explicitly_set = 11; that protoc could populate, and plugins could make decisions based on this. If this was done, the builtin plugins should handle this field as well re: #6175. This seems dirty, but may actually be cleaner than a protoc command-line flag, and it is kind of how Protobuf is intended to be used (you can't change the json_name field, but you can iterate on your message definition by adding new fields).
Yea, I think having something like optional bool json_name_explicitly_set = 11; would be quite nice, actually. I would be happy with that.
On second thought, I think it has to be the negative for backwards compatibility reasons, unless I'm confusing myself, ie optional bool json_name_NOT_explicitly_set = 11;. If you don't have this, then you can't tell the difference between explicitly_set == false being because of using an old version of protoc, or because it was actually not set. I might be confusing myself though in double-negatives.
The work as I see it:
descriptor.proto, preferably before the 3.8.0 release but I doubt that's possible at this point.FileDescriptorSets.FileDescriptorSet to the builtin plugins when compiling Protobuf files directory as is passed when running --descriptor_set_in (the former does not send a set json_name to the builtin plugins)Note that for backwards compatibility, when running a plugin (both the builtins and external plugins), you need to do the following with this (in pseudo-code):
if get(jsonNameField) != "" {
// if this is false, it could be due to not being set at all with older versions of protoc,
// in which case we have to act as if the jsonName was explicitly set
if get(jsonNameNotExplicitlySetField) {
actAsIfNotSet()
} else {
actAsIfSet()
}
}
I think it has to be the negative for backwards compatibility reasons
Remember that descriptor.proto is proto2 syntax. So it could just use a default value:
optional bool json_name_configured = 11 [default=true];
Good point. One issue though is that makes the definition a tad weird - even if no FileDescriptorSet provider (ie protoc et al) would produce this, what does it mean when json_name == “” && json_name_configured == true?
string foo = 1 [json_name=""];
:P
(Related: https://github.com/protocolbuffers/protobuf/issues/5682)
Lol
I'm fine with json_name_explicitly_set with a default of false. If you're in this situation, you can probably just start using a new version of protoc. Anything I can help do to push this forward?
I think it needs to be json_name_not_explicitly_set, this does matter for being able to really reason about the value of the field, see above