5.3
amd64,windows
Playgroud link: http://play.golang.org/p/9j0ome9HqK
@rsc, @adg, thoughts? This surprised me. I thought we only did the case insensitive thing when there was no struct tag.
But from the ietf link above:
is looks quite diabolical for security as it is trivial to create valid JSON values that will be interpreted differently by different implementations.
Related to "differently by different implementations", we permit JSON keys twice: http://play.golang.org/p/lPgEj1T6Zk (two "alg"). That probably differs between implementations in either its output or whether it's even successful.
This has been the behavior at least as far back as Go 1.2 (I can't run
earlier on my Mac).
The docs also seem to state quite clearly that this is what happens:
To unmarshal JSON into a struct, Unmarshal matches incoming object keys to
the keys used by Marshal (either the struct field name or its tag),
preferring an exact match but also accepting a case-insensitive match.
Unmarshal will only set exported fields of the struct.
I understand there are security implications if JSON is used in security
contexts, and I was a little surprised too, but the docs are very clear. I
doubt that encoding/json's behavior should be dictated by security
concerns. FWIW, I don't believe we should go out of our way to reject
repeated fields either, especially not now. I might go so far as to argue
that using JSON in security standards is a mistake anyway, but I won't do
that here.
In any event it's too late to change the defaults. If we want to support
the security use case, maybe we could have a UseStrictNames method on
Decoder (like UseNumber).
UseStrictNames SGTM.
I would consider addressing a bunch of related issue as an option:
https://github.com/golang/go/issues/14749
https://github.com/golang/go/issues/14135
The other issues are unrelated and shouldn't be mixed in here. For one
thing, we were talking about a decoding problem and #14749 is an encoding
problem.
UseStrictNames does look like a decent way to add case-sensitive decoding support in a backward compatible way.
However, while that would make case-sensitive decodes possible, it wouldn't help make this safer mode common or the default.
How about also defining a "strict" field tag option (c.f. "omitempty")? That should allow types to be safely used via newDecoder, or Unmarshall, or when the decoding is within a library. You can define the safety in the type, without finding every place it might be decoded.
Shall I send a UseStrictNames CL? Should that apply strict names across the board, whether or not the user supplied a struct tag?
type Foo struct {
Bar string
Baz string `json:"baz"`
}
So that would only match exactly "Bar" and "baz".
@cespare If it's simple, then yes sure. But after a thought on #15314 I wonder if maybe the UseStrictNames method is the wrong approach and instead it should be a tag attribute on the field. Then it can be enforced by the author of the struct instead of the author of the unmarshal call. Specifically, something like json:"Foo,exactname"
.
Would it get super repetitive if the person had to set it on every tag name?
I want to implement that thing. It was such a disgrace when I started working with financial and ticker data which has lots of single-char keys in JSON (to be as short as possible) and I have to define all of the keys/fields in order to protect the fields of the struct to be eventually overwritten. Meh.
p.s. I'm new to Go so any assistance or mentoring would be very supportive and motivative :)
@ianlancetaylor @rsc @bradfitz anyone, guys? :)
I'm a bit confused by this issue. From the godoc that has already been quoted:
preferring an exact match but also accepting a case-insensitive match.
What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same, as if the godoc was:
accepting either an exact or a case-insensitive match.
Going back to the very first playground link; if we're decoding both "alg"
and "ALG"
into Alg string `json:"alg"`
, I'd presume that "ALG"
would be ignored, as "alg"
was previously decoded in the same object and has preference.
This is still an issue - https://play.golang.org/p/uyYYa186mez :(
@npenzin, that's why this bug is still open.
I took a stab at implementing this: https://go-review.googlesource.com/c/go/+/132735
/cc @dsnet to help make a decision
I very much agree that we must do something. I'd personally prefer an encoder option instead of a struct field tag. As far as I've seen, users tend to either care about this for an entire operation, or not at all. Users could always use maps or custom unmarshalers if they want their own case sensitivity rules.
What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same
The logic does direct comparison first, then falls back on unicode equal fold logic:
https://github.com/golang/go/blob/f082dbfd4f23b0c95ee1de5c2b091dad2ff6d930/src/encoding/json/decode.go#L700-L706
In regards to json
API changes, Russ is the ultimate decision maker.
That said, it seems like there's two clear APIs for this:
json.Encoder
option.Another thing to consider is that sometimes a type needs to implement UnmarshalJSON
themselves, where they call json.Unmarshal
again for some sub-value. In such situations it is a bit odd that a "SetStrictCasing" option on the top-level unmarshal has no effect on a sub-tree of the JSON input. However, my observation of that oddity is already true for Decoder.DisallowUnknownFields and Decoder.UseNumber. There is a general issue where top-level unmarshal options are not propagated to recursive calls to UnmarshalJSON
and more top-level options will only exacerbate the issue.
It doesn't seem like there's been much discussion to inform which approach to take.
Thinking about it more, I want to expand more on the problem of top-level options not propagating through recursive UnmarshalJSON
calls as we're hitting similar issues with the increase of various protobuf implementations.
Imagine the sequence of events over the lifetime of a package that I'm the author of:
Foo
that is a struct with a set of fields.Foo
in their programs with the encoding/json
package. It's not a use-case I intend, but fine, Hyrum's law.json
because there are other fields I want to add for which encoding/json
cannot handle (e.g., unmarshaling into an unexported field). Thus, I want to add a UnmarshalJSON
method.Am I allowed to add the UnmarshalJSON
method?
UnmarshalJSON
is a magic method, maybe I should at least replicate as much of the behavior of json.Unmarshal
being called on my type. However, the presence of top-level options that I don't know about makes it impossible for me to implement UnmarshalJSON
in any way that is backwards compatible.Thus, I'm actually concerned about more top-level options being added to the unmarshaler as that feature does not cooperate well with the fact that the json
package also respects the json.Unmarshaler
interface. It subverts control away from the author of types.
In that case should there maybe be an UnmarshalJSONWithOptions
or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON
?
Or should UseNumber
be a struct tag option as well? But DisallowUnknownFields
is a bit harder to implement as struct tag.
I know Hyrum's law but how often do people really marshal/unmarshal structs from packages that they have no control over? I know I never do since most of the time these structs have internal fields as well that are important for the state as well.
In that case should there maybe be an UnmarshalJSONWithOptions or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON?
UnmarshalJSONWithOptions
is a pretty costly interface for users to start implementing. While it provides flexibility in pushing down top-level options, it also has downsides:
encoding/json
package, when that wasn't the case with the simpler UnmarshalJSON
method. how often do people really marshal/unmarshal structs from packages that they have no control over?
Unfortunately, often enough. The situation I came up with above was not a hypothetical, but modeled off real problems I run into.
The logic does direct comparison first, then falls back on unicode equal fold logic
That seems like just case insensitive matching to me. I can't come up with a scenario where a case sensitive match takes precedence over a case insensitive one, from the user's perspective. Since both match, it's always the last that wins - independently of which is the case sensitive match.
Perhaps the code was meant as a quick path to avoid equalFold. Otherwise, I can't figure out its purpose.
Since both match, it's always the last that wins - independently of which is the case sensitive match.
This is demonstrably how the parser works. I agree it functions in a case-insensitive manner.
@mvdan Yes it's a fast path to try and avoid equalFold
which can be expensive in case of unicode strings.
@dsnet I see your point, but making it a struct tag option would mean the package that defined the type has to specify if it should be case sensitive or not. In your example where Foo
wasn't intended to be used with encoding/json
initially it probably wouldn't have these struct tags and would always be case insensitive.
On the one hand it feels weird if the person who maintains the package of Foo
gets to decide if my JSON has to be case sensitive or not. On the other hand that person can already enforce any rule they want by implementing UnmarshalJSON
on Foo
.
Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.
Then my point is that this piece of godoc should be removed:
preferring an exact match but also accepting a case-insensitive match.
The decoder always prefers the latest match, not the case-sensitive one.
@mvdan the code is:
for i := range fields {
if bytes.Equal(ff.nameBytes, key) {
f = ff
break
}
if f == nil && ff.equalFold(ff.nameBytes, key) {
f = ff
}
}
which means that when it's an exact match (bytes.Equal
) it doesn't execute the equalFold
as we already break
out of the loop before that.
Besides this is not the point of this issue and not the point of the discussion here.
I think the documentation is clearly misleading for the current situation. The flow of recent tickets shows it. I was and still am confused by this:
Unmarshal matches incoming object keys to the keys used by Marshal ..., preferring an exact match but also accepting a case-insensitive match
I mean, technically and logically it is correct, but still confusing. Should we at least to mention that it will match the last matching value from the json whenever there is more than one possible choice?
I don't think it's 100% correct, since even if there's an exact match with the field tag it won't always choose it (like in #28190).
@mvdan pointed me here from https://github.com/golang/go/issues/36932#issue-558228750
in my opinion, this is not a doc issue, as it can lead to nondeterministic behavior (json key ordering is not guaranteed, and any golang app consuming json is at the mercy of whoever generated the json, deterministic or not).
Today I also encounted this problem. It surprised me. Apparently what the doc describes is not the same as what it performs. If we write field value in JSON string, we need the exact match, not case insensitive match.
Does anybody have some solutions for this? Now I just write all properties match all cases. But I still can not promise whether the result is right in all situation.
@fredgan the workaround is to write a custom unmarshaller using a map, and then convert the keys in the map explicitly to the type you care about.
here's an example: https://play.golang.org/p/nAz4Hs56mtB
in the above example, i used a map[string]string
, but you may need to use a map[string]interface{}
if you have heterogeneous types and use type assertions before setting them on your target struct.
Change https://golang.org/cl/221117 mentions this issue: encoding/json: clarify how we decode into structs
I've come to realise that pretty much all of my previous comments in this thread were wrong :) The current documentation is technically correct, as @rsc points out in https://github.com/golang/go/issues/14750#issuecomment-194911844, but a bit misleading, as @ysmolsky points out in https://github.com/golang/go/issues/14750#issuecomment-429574698.
I do think that many parts of the json package could be designed better, and I think the edge cases concerning missing, repeated, or case-insensitive-matching keys are some of them.
Having said that, this issue is largely not about making a breaking change to the JSON decoder, or adding an option to change its behavior. This is a documentation issue; far too many users (myself included!) were misreading the docs. I hope that my CL above fixes that problem.
I think that a proposal to modify the decoder's API or behavior should be filed separately, once this issue is fixed with docs. This thread has become too long to discuss any one particular solution, and proposals have had their own process for a few years now.
@mvdan i don't understand how the docs are correct today. they say an exact match is preferred, but it is obviously not at all. that's what throws people off. there is not preferring happening at all, it's a race case of what is encountered first.
i would be happy with a change to the docs to reflect that it's simply a case-insensitive match, with a brief mention of the proper approach if a case-sensitive match is required (writing a custom unmarshaller using a map with string keys). my concerns about it being a bug were a result of the fact that i didn't think there was a workaround to get a case-sensitive match. knowing there is a workaround, i can accept this behavior with the documentation change.
@justinrixx please read the comments on the CL above.
To summarize my exchange with @dsnet on Gerrit: Yes, the current behavior is technically correct from the point of view of the decoder, as "incoming object" can easily be understood to be the key-value pairs being parsed in the order in which they appear.
However, what Joe argues is that this is an implementation detail - the JSON spec doesn't specify object keys to be ordered in any way. We could change the underlying behavior to be closer to what most users would expect, not to stick to what's most efficient or most logical from the decoder's point of view. For example, by skipping case-insensitive matches of a field if a case-sensitive match for it already happened.
The devil is in the details, of course. Would this change slow down the decoder too much? Would it break too much existing code? That is yet to be determined, and I'd like to find some time to experiment with it in the near future.
Change https://golang.org/cl/224079 mentions this issue: encoding/json: skip inexact matches after exact matches
The CL above is the first iteration of the experiment I mentioned two weeks ago. Decoding structs is ~1% slower, but we get the benefit we want. Please test the CL and let me know if your code breaks, or if it properly fixes this issue for you.
1% performance loss is unfortunate, but I can't figure out a way around it. Alternative patches without the slow-down are welcome.
I think the CL is interesting, but it would, from my perspective, be even more useful to explicitly define an option. It's useful to be able to set that a case-insensitive match be treated as an unknown field.
As a comment on https://github.com/golang/go/issues/14750#issuecomment-422234166:
I would vote for allowing case-sensitive matching as a decoder option enabled through a method call. (UseStrictNames
, DisableEqualFold
, or some better name) over struct-tags.
I want to advocate for this solution not because it's a better solution, but because it's in line with the current design for DisallowUnknownFields
and UseNumbers
. Having one annoying -- but consistent -- API has a value; it means the same work-around can be used for all cases where a custom json.Unmarshaler
implementation is needed.
The DisallowUnknownFields
and UseNumbers
are in my opinion within the same class of problems, and that puts some restrains on how a strictly case-sensitve parser should be implemented.
Some notes:
jsonDecoder(data []byte) json.Decoder
helper-functions to reduce the boiler plate in json.Unmarshaler
implementations.json.Unmarshaler
interface that accepts passing along options (or better still, accept a pre-initialized (sub) json.Docoder
with inherited options) is certainly possible, but it's a separate issue. It is not _given_ that it's always preferred.PS! The Encoder/Decoder interfaces defined in #12001 (old issue) could serve as a way of passing along decoder option. Perhaps that problem can be discussed there, although the motivation here is different.
Most helpful comment
I think the documentation is clearly misleading for the current situation. The flow of recent tickets shows it. I was and still am confused by this:
I mean, technically and logically it is correct, but still confusing. Should we at least to mention that it will match the last matching value from the json whenever there is more than one possible choice?