Go: encoding/json: parser ignores the case of member names

Created on 10 Mar 2016  路  38Comments  路  Source: golang/go

  1. What version of Go are you using? 5.3
  2. What operating system and processor architecture are you using? amd64,windows
  3. What did you do?
    Read this: https://mailarchive.ietf.org/arch/msg/json/Ju-bwuRv-bq9IuOGzwqlV3aU9XE
  4. What did you expect to see?
    ...
  5. What did you see instead?
    ...
NeedsDecision Security

Most helpful comment

I think the documentation is clearly misleading for the current situation. The flow of recent tickets shows it. I was and still am confused by this:

Unmarshal matches incoming object keys to the keys used by Marshal ..., preferring an exact match but also accepting a case-insensitive match

I mean, technically and logically it is correct, but still confusing. Should we at least to mention that it will match the last matching value from the json whenever there is more than one possible choice?

All 38 comments

Playgroud link: http://play.golang.org/p/9j0ome9HqK

@rsc, @adg, thoughts? This surprised me. I thought we only did the case insensitive thing when there was no struct tag.

But from the ietf link above:

is looks quite diabolical for security as it is trivial to create valid JSON values that will be interpreted differently by different implementations.

Related to "differently by different implementations", we permit JSON keys twice: http://play.golang.org/p/lPgEj1T6Zk (two "alg"). That probably differs between implementations in either its output or whether it's even successful.

This has been the behavior at least as far back as Go 1.2 (I can't run
earlier on my Mac).
The docs also seem to state quite clearly that this is what happens:

To unmarshal JSON into a struct, Unmarshal matches incoming object keys to
the keys used by Marshal (either the struct field name or its tag),
preferring an exact match but also accepting a case-insensitive match.
Unmarshal will only set exported fields of the struct.

I understand there are security implications if JSON is used in security
contexts, and I was a little surprised too, but the docs are very clear. I
doubt that encoding/json's behavior should be dictated by security
concerns. FWIW, I don't believe we should go out of our way to reject
repeated fields either, especially not now. I might go so far as to argue
that using JSON in security standards is a mistake anyway, but I won't do
that here.

In any event it's too late to change the defaults. If we want to support
the security use case, maybe we could have a UseStrictNames method on
Decoder (like UseNumber).

UseStrictNames SGTM.

I would consider addressing a bunch of related issue as an option:
https://github.com/golang/go/issues/14749
https://github.com/golang/go/issues/14135

The other issues are unrelated and shouldn't be mixed in here. For one
thing, we were talking about a decoding problem and #14749 is an encoding
problem.

UseStrictNames does look like a decent way to add case-sensitive decoding support in a backward compatible way.

However, while that would make case-sensitive decodes possible, it wouldn't help make this safer mode common or the default.

How about also defining a "strict" field tag option (c.f. "omitempty")? That should allow types to be safely used via newDecoder, or Unmarshall, or when the decoding is within a library. You can define the safety in the type, without finding every place it might be decoded.

Shall I send a UseStrictNames CL? Should that apply strict names across the board, whether or not the user supplied a struct tag?

type Foo struct {
        Bar string
        Baz string `json:"baz"`
}

So that would only match exactly "Bar" and "baz".

@cespare If it's simple, then yes sure. But after a thought on #15314 I wonder if maybe the UseStrictNames method is the wrong approach and instead it should be a tag attribute on the field. Then it can be enforced by the author of the struct instead of the author of the unmarshal call. Specifically, something like json:"Foo,exactname".

Would it get super repetitive if the person had to set it on every tag name?

I want to implement that thing. It was such a disgrace when I started working with financial and ticker data which has lots of single-char keys in JSON (to be as short as possible) and I have to define all of the keys/fields in order to protect the fields of the struct to be eventually overwritten. Meh.

p.s. I'm new to Go so any assistance or mentoring would be very supportive and motivative :)

@ianlancetaylor @rsc @bradfitz anyone, guys? :)

I'm a bit confused by this issue. From the godoc that has already been quoted:

preferring an exact match but also accepting a case-insensitive match.

What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same, as if the godoc was:

accepting either an exact or a case-insensitive match.

Going back to the very first playground link; if we're decoding both "alg" and "ALG" into Alg string `json:"alg"`, I'd presume that "ALG" would be ignored, as "alg" was previously decoded in the same object and has preference.

This is still an issue - https://play.golang.org/p/uyYYa186mez :(

@npenzin, that's why this bug is still open.

I took a stab at implementing this: https://go-review.googlesource.com/c/go/+/132735

/cc @dsnet to help make a decision

I very much agree that we must do something. I'd personally prefer an encoder option instead of a struct field tag. As far as I've seen, users tend to either care about this for an entire operation, or not at all. Users could always use maps or custom unmarshalers if they want their own case sensitivity rules.

What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same

The logic does direct comparison first, then falls back on unicode equal fold logic:
https://github.com/golang/go/blob/f082dbfd4f23b0c95ee1de5c2b091dad2ff6d930/src/encoding/json/decode.go#L700-L706

In regards to json API changes, Russ is the ultimate decision maker.

That said, it seems like there's two clear APIs for this:

  • Be a struct field tag option.

    • Has the advantage that the equal fold logic is only relevant on struct fields, and so a field tag is the exact granularity control for this.

    • Has the disadvantage of being very verbose disabling this functionality for all fields.

    • Places the the author of the struct in control of the strictness of casing.

  • Be an json.Encoder option.

    • Has the advantage of being a single place place control this for the entire unmarshal.

    • Has the disadvantage of possibly over-specifying behavior on more than desired.

    • Places the unmarshal caller in control of strictness of casing.

Another thing to consider is that sometimes a type needs to implement UnmarshalJSON themselves, where they call json.Unmarshal again for some sub-value. In such situations it is a bit odd that a "SetStrictCasing" option on the top-level unmarshal has no effect on a sub-tree of the JSON input. However, my observation of that oddity is already true for Decoder.DisallowUnknownFields and Decoder.UseNumber. There is a general issue where top-level unmarshal options are not propagated to recursive calls to UnmarshalJSON and more top-level options will only exacerbate the issue.

It doesn't seem like there's been much discussion to inform which approach to take.

Thinking about it more, I want to expand more on the problem of top-level options not propagating through recursive UnmarshalJSON calls as we're hitting similar issues with the increase of various protobuf implementations.

Imagine the sequence of events over the lifetime of a package that I'm the author of:

  • At the initial release of my package, I have a type Foo that is a struct with a set of fields.
  • People use type Foo in their programs with the encoding/json package. It's not a use-case I intend, but fine, Hyrum's law.
  • At a later point, I decide to add better support for json because there are other fields I want to add for which encoding/json cannot handle (e.g., unmarshaling into an unexported field). Thus, I want to add a UnmarshalJSON method.

Am I allowed to add the UnmarshalJSON method?

  • According to typical Go1 compatibility rules, I should have the freedom to add any methods I want.
  • Given that UnmarshalJSON is a magic method, maybe I should at least replicate as much of the behavior of json.Unmarshal being called on my type. However, the presence of top-level options that I don't know about makes it impossible for me to implement UnmarshalJSON in any way that is backwards compatible.

Thus, I'm actually concerned about more top-level options being added to the unmarshaler as that feature does not cooperate well with the fact that the json package also respects the json.Unmarshaler interface. It subverts control away from the author of types.

In that case should there maybe be an UnmarshalJSONWithOptions or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON?

Or should UseNumber be a struct tag option as well? But DisallowUnknownFields is a bit harder to implement as struct tag.

I know Hyrum's law but how often do people really marshal/unmarshal structs from packages that they have no control over? I know I never do since most of the time these structs have internal fields as well that are important for the state as well.

In that case should there maybe be an UnmarshalJSONWithOptions or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON?

UnmarshalJSONWithOptions is a pretty costly interface for users to start implementing. While it provides flexibility in pushing down top-level options, it also has downsides:

  • What do you do when a new top-level option is added that a current implementer does not know about?

    • We had to deal with this problem with protobufs (our solution here). The approach taken for protobufs is heavy-weight because we don't anticipate that many people to actually implement their own protobuf messages. However, it seems common for users to implement their own JSON unmarshaler.

  • That new API is tied to the encoding/json package, when that wasn't the case with the simpler UnmarshalJSON method.

how often do people really marshal/unmarshal structs from packages that they have no control over?

Unfortunately, often enough. The situation I came up with above was not a hypothetical, but modeled off real problems I run into.

The logic does direct comparison first, then falls back on unicode equal fold logic

That seems like just case insensitive matching to me. I can't come up with a scenario where a case sensitive match takes precedence over a case insensitive one, from the user's perspective. Since both match, it's always the last that wins - independently of which is the case sensitive match.

Perhaps the code was meant as a quick path to avoid equalFold. Otherwise, I can't figure out its purpose.

Since both match, it's always the last that wins - independently of which is the case sensitive match.

This is demonstrably how the parser works. I agree it functions in a case-insensitive manner.

@mvdan Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.

@dsnet I see your point, but making it a struct tag option would mean the package that defined the type has to specify if it should be case sensitive or not. In your example where Foo wasn't intended to be used with encoding/json initially it probably wouldn't have these struct tags and would always be case insensitive.

On the one hand it feels weird if the person who maintains the package of Foo gets to decide if my JSON has to be case sensitive or not. On the other hand that person can already enforce any rule they want by implementing UnmarshalJSON on Foo.

Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.

Then my point is that this piece of godoc should be removed:

preferring an exact match but also accepting a case-insensitive match.

The decoder always prefers the latest match, not the case-sensitive one.

@mvdan the code is:

for i := range fields {
    if bytes.Equal(ff.nameBytes, key) {
        f = ff
        break
    }
    if f == nil && ff.equalFold(ff.nameBytes, key) {
        f = ff
    }
}

which means that when it's an exact match (bytes.Equal) it doesn't execute the equalFold as we already break out of the loop before that.

Besides this is not the point of this issue and not the point of the discussion here.

I think the documentation is clearly misleading for the current situation. The flow of recent tickets shows it. I was and still am confused by this:

Unmarshal matches incoming object keys to the keys used by Marshal ..., preferring an exact match but also accepting a case-insensitive match

I mean, technically and logically it is correct, but still confusing. Should we at least to mention that it will match the last matching value from the json whenever there is more than one possible choice?

I don't think it's 100% correct, since even if there's an exact match with the field tag it won't always choose it (like in #28190).

@mvdan pointed me here from https://github.com/golang/go/issues/36932#issue-558228750

in my opinion, this is not a doc issue, as it can lead to nondeterministic behavior (json key ordering is not guaranteed, and any golang app consuming json is at the mercy of whoever generated the json, deterministic or not).

Today I also encounted this problem. It surprised me. Apparently what the doc describes is not the same as what it performs. If we write field value in JSON string, we need the exact match, not case insensitive match.

Does anybody have some solutions for this? Now I just write all properties match all cases. But I still can not promise whether the result is right in all situation.

@fredgan the workaround is to write a custom unmarshaller using a map, and then convert the keys in the map explicitly to the type you care about.

here's an example: https://play.golang.org/p/nAz4Hs56mtB

in the above example, i used a map[string]string, but you may need to use a map[string]interface{} if you have heterogeneous types and use type assertions before setting them on your target struct.

Change https://golang.org/cl/221117 mentions this issue: encoding/json: clarify how we decode into structs

I've come to realise that pretty much all of my previous comments in this thread were wrong :) The current documentation is technically correct, as @rsc points out in https://github.com/golang/go/issues/14750#issuecomment-194911844, but a bit misleading, as @ysmolsky points out in https://github.com/golang/go/issues/14750#issuecomment-429574698.

I do think that many parts of the json package could be designed better, and I think the edge cases concerning missing, repeated, or case-insensitive-matching keys are some of them.

Having said that, this issue is largely not about making a breaking change to the JSON decoder, or adding an option to change its behavior. This is a documentation issue; far too many users (myself included!) were misreading the docs. I hope that my CL above fixes that problem.

I think that a proposal to modify the decoder's API or behavior should be filed separately, once this issue is fixed with docs. This thread has become too long to discuss any one particular solution, and proposals have had their own process for a few years now.

@mvdan i don't understand how the docs are correct today. they say an exact match is preferred, but it is obviously not at all. that's what throws people off. there is not preferring happening at all, it's a race case of what is encountered first.

i would be happy with a change to the docs to reflect that it's simply a case-insensitive match, with a brief mention of the proper approach if a case-sensitive match is required (writing a custom unmarshaller using a map with string keys). my concerns about it being a bug were a result of the fact that i didn't think there was a workaround to get a case-sensitive match. knowing there is a workaround, i can accept this behavior with the documentation change.

@justinrixx please read the comments on the CL above.

To summarize my exchange with @dsnet on Gerrit: Yes, the current behavior is technically correct from the point of view of the decoder, as "incoming object" can easily be understood to be the key-value pairs being parsed in the order in which they appear.

However, what Joe argues is that this is an implementation detail - the JSON spec doesn't specify object keys to be ordered in any way. We could change the underlying behavior to be closer to what most users would expect, not to stick to what's most efficient or most logical from the decoder's point of view. For example, by skipping case-insensitive matches of a field if a case-sensitive match for it already happened.

The devil is in the details, of course. Would this change slow down the decoder too much? Would it break too much existing code? That is yet to be determined, and I'd like to find some time to experiment with it in the near future.

Change https://golang.org/cl/224079 mentions this issue: encoding/json: skip inexact matches after exact matches

The CL above is the first iteration of the experiment I mentioned two weeks ago. Decoding structs is ~1% slower, but we get the benefit we want. Please test the CL and let me know if your code breaks, or if it properly fixes this issue for you.

1% performance loss is unfortunate, but I can't figure out a way around it. Alternative patches without the slow-down are welcome.

I think the CL is interesting, but it would, from my perspective, be even more useful to explicitly define an option. It's useful to be able to set that a case-insensitive match be treated as an unknown field.

As a comment on https://github.com/golang/go/issues/14750#issuecomment-422234166:

I would vote for allowing case-sensitive matching as a decoder option enabled through a method call. (UseStrictNames, DisableEqualFold, or some better name) over struct-tags.

I want to advocate for this solution not because it's a better solution, but because it's in line with the current design for DisallowUnknownFields and UseNumbers. Having one annoying -- but consistent -- API has a value; it means the same work-around can be used for all cases where a custom json.Unmarshaler implementation is needed.

The DisallowUnknownFields and UseNumbers are in my opinion within the same class of problems, and that puts some restrains on how a strictly case-sensitve parser should be implemented.

Some notes:

  • Projects can easily define their own jsonDecoder(data []byte) json.Decoder helper-functions to reduce the boiler plate in json.Unmarshaler implementations.
  • Making a new json.Unmarshaler interface that accepts passing along options (or better still, accept a pre-initialized (sub) json.Docoder with inherited options) is certainly possible, but it's a separate issue. It is not _given_ that it's always preferred.

PS! The Encoder/Decoder interfaces defined in #12001 (old issue) could serve as a way of passing along decoder option. Perhaps that problem can be discussed there, although the motivation here is different.

Was this page helpful?
0 / 5 - 0 ratings