Go: encoding/json: json.Number accepts quoted values by default

Created on 23 Sep 2019  Â·  31Comments  Â·  Source: golang/go

What version of Go are you using (go version)?

$ go version
go version go1.13 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/viacheslav.poturaev/Library/Caches/go-build"
GOENV="/Users/viacheslav.poturaev/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY="github.com/hellofresh"
GONOSUMDB="github.com/hellofresh"
GOOS="darwin"
GOPATH="/Users/viacheslav.poturaev/go"
GOPRIVATE="github.com/hellofresh"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/h7/7qg3nbt91bb6mgk2xtqqpwc40000gp/T/go-build850653985=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?


I've decoded a JSON value of "123" into json.Number.
https://play.golang.org/p/9Nwjn3rBFxI

What did you expect to see?

I've expected to see an error.

What did you see instead?

I've seen a successful result and an int64 value of 123 accessible with .Int64().

I could not find any documentation that json.Number accepts quoted string values by default and was expecting the opposite (as per "being precise" motivation expressed in https://github.com/golang/go/issues/22463#issuecomment-340567609).

Not sure if this is a documentation or implementation issue.

NeedsDecision

Most helpful comment

I don't think this behaviour should change because I rely on it as I'm sure many other people do.

All 31 comments

Looks like the Number validation changed very recently (in CL 195045; see #14702).

Is this still reproducible using gotip?

Same behavior with go version devel +361ab73 Mon Sep 23 05:57:54 2019 +0000 darwin/amd64.

I think such behavior is historical, at least in go1.12 it is still same. Maybe the documentation has to be more explicit.

CC @dsnet @bradfitz @mvdan

Thanks for checking.

Some changes related to json.Number from @breml were merged recently, perhaps he has thoughts.

Without looking deeply into the issue, remember that json.Number is a string type underneath, so perhaps this was meant to be supported by design.

json.Number is a string; the Int64() and Float64() methods return an error if the conversion is invalid.

I can't think of why this would need to exist unless it were intended to work this way.

To force conformance to a specific type in JSON, you could use the type directly instead of json.Number.

I think this could be clarified in documentation but changing the behavior would cause previously-working code to break.

@kentquirk being a string for raw JSON value is ok, being a quoted string is less ok.

As you would not expect the following code to return valid integer:

strconv.Atoi(`"123"`)

JSON.org defines values as:

A value can be a string in double quotes, or a number, or true or false or null, or an object or an array.

Which makes an explicit difference between double-quoted string and number.

The documentation says:

A Number represents a JSON number literal.

I also think this issue in likely about clarifying documentation given the potential impact of behavior change. Though maybe it would make sense to revisit behavior for Go 2.

I quickly checked the code and I have come to the following conclusions:

  • This has worked for a long time, the relevant lines (https://github.com/golang/go/blame/c1000c500cb4cec2991f8c1924cd5fff05279658/src/encoding/json/decode.go#L951 and https://github.com/golang/go/blame/c1000c500cb4cec2991f8c1924cd5fff05279658/src/encoding/json/decode.go#L955) have been unchanged since more than 9 years.
  • This issue can be seen as related to the issue I fixed some days ago (https://github.com/golang/go/issues/14702).
  • The decoding of a quoted numerical value into json.Number should (in my opinion) only work, if the option ,string is set in the struct tags.
  • My change https://github.com/golang/go/commit/c1000c500cb4cec2991f8c1924cd5fff05279658 did not alter this behavior.
  • But the condition I inserted in the above mentioned commit could be extended to handle this case as well if the block https://github.com/golang/go/blame/c1000c500cb4cec2991f8c1924cd5fff05279658/src/encoding/json/decode.go#L952-L954 would be changed to:
if v.Type() == numberType && (!fromQuoted || !isValidNumber(string(s))) {
    return fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", item)
}

I feel like the entire point of json.Number is to deal with the complex mess of things that JS programs will do, because JS is way too casual about this stuff.

If you have JSON coming from JS that you don't control, you may see numbers encoded as strings. Worse, you may see them sometimes encoded as strings and sometimes as floats or integers, even for the same field. Your options to consume that in a Go API are:

  • treat it as json.RawMessage and then evaluate the byte stream and unmarshal it manually
  • treat is as a json.Number and use .Int64 or .Float64()

I would argue that the latter is preferable -- because if you knew for sure that it was it not a string, you could have simply unmarshaled it directly into an int or float directly.

If you're going to insist that json.Number force the value to be not-a-string, the correct fix should be to remove it entirely, because it is otherwise useless.

Additional purpose of json.Number is dealing with float64 53-bit precision of int64.

I think that's a good idea but can you point me to any code that supports that? If you marshal an int64 to a string you may store a value that doesn't successfully convert to a JS Number but as far as I can tell, json.Number won't care and can't help you there.

I don't think this behaviour should change because I rely on it as I'm sure many other people do.

@Gobd unless I'm missing something, this would be trivial to fix when upgrading to Go 1.14, if the change is done in that release. The json.Number fields in question would just need to get the ,string tag option added. Adding that option shouldn't break earlier Go versions either.

Non-API breaking changes should be kept at a minimum for sure, but given that the program in question is relying on the opposite of what the documentation says, and it would have a very simple fix, I don't see why we should be conservative. Particularly when the new behavior would be more consistent.

There's also the argument that json.Number is especially useful since it allows both non-strings and strings when decoding at the same time. That was never documented and I think it was clearly a mistake. I personally think that keeping that behavior is wrong; why should json.Number allow both at the same time, while other types like int and float64 do not? It would make encoding/json more inconsistent.

I would argue that the latter is preferable -- because if you knew for sure that it was it not a string, you could have simply unmarshaled it directly into an int or float directly.

No - the purpose of json.Number has nothing to do with decoding strings. Its sole purpose is to not lose precision and formatting when encoding/decoding. See https://codereview.appspot.com/6202068.

If it has been used by some to accept strings and non-strings quickly, that's simply using an unintended and undocumented side effect, which I'm arguing against at the start of my comment.

To support the idea that this was never an intended or documented use of json.Number - applying @breml's fix above doesn't break any of the existing tests or examples.

The json.Number fields in question would just need to get the ,string tag option added.

The upgrade would be more complicated in case of non-object, e.g. "123" or ["123", 456, 789.0].

What benefit does @breml's fix have over documenting & adding tests to show that it supports strings & non-strings? I think that would be prefered vs changing current behaviour for what seems like no benefit.

@Gobd the benefit is consistency, and actually following what has been documented since the beginning, including the purpose of the string option. The more special cases the json package documents and implements, the harder it is to understand and maintain in the long run.

The upgrade would be more complicated in case of non-object

Please read my longer comment above; I mention this specific use case.

For example, here's a simple piece of code to accept both a string and a non-string for integer values (not using json.Number since that accepts both today):

https://play.golang.org/p/bQBzvSvMv1e

It requires ten more lines of code, but I think that's completely acceptable when your restriction is that you need to accept different kinds of JSON at once. And as said before, json.RawMessage or interface{} are always an option.

This behavior has been supporting quoted numbers since before go1.0. Changing it now (regardless of how good an argument could be made for it) seems like a bad idea. Just because an argument could be made that programs that are relying upon this behavior are incorrect, or buggy, does not change that people who are relying upon this behavior will feel like there was a violation of the go1 compat guarantee because their programs that worked before have stopped working now.

I will also note that the specification behavior in the proto3 JSON mapping is to _always_ quote int64 values, because it is actually impossible by specification to make a guarantee of more than 53-bits of accuracy with JSON. That Go’s encoding/json permits higher than 53-bits of accuracy is irrelevant to the fact that proxies or any other handling of JSON is technically by spec permitted to drop anything more than 53-bits of accuracy.

So, it remains, the only way to guarantee full int64 range and accuracy is to quote the number in a string. Making json.Number restrictive to only unquoted numbers makes supporting this method of guaranteeing numeric precision more difficult.

This change just broke the "field":"" which previously worked and the use of json.Number was on purpose. I don't have the option to change the json input I'm getting.

@mvrhov there has been no change yet. If there was, this issue would not be open with the NeedsDecision label.

@puellanivis it's indeed a tough choice. If we can't come to an agreement on changing it (which seems to be the case), then it's probably too late or painful to change it. I can have a look at a documentation change for 1.15, to at least have the documentation reflect the internal behavior that too many people have been relying on already.

@mvdan A documentation change would not be enough. If we decide to raise the status of the current behavior to "fully supported", then we need to add test cases for this as well.

~In general, I am still in favor of this change, because in my opinion the ,string tag is there to support unmarshaling of quoted (numeric) values.~

In regards to the go1 compat guarantee I do not agree with @puellanivis. I do not think, that this fix would be a violation, because this behavior is nowhere documented or specified. If we would treat this as a violation, we would end up in a situation, where we can never fix a bug in Go or the stdlib, because someone could potentially rely on the buggy behavior and it would break their code.

Additionally, I would like to mention, that the documentation of type Number in the encoding/json package states the following:

A Number represents a JSON number literal.

In the JSON specification, a number literal is defined as (link):

integer fraction exponent

So the type Number is clearly specified and a JSON number literal is never quoted.

I am not arguing that just because someone might depend upon a buggy behavior, that bugs cannot be fixed.

I am saying that changing unspecified behavior has to be done intelligently, and with respect to the users that might (accidentally) be relying upon that unspecified behavior.

Even if we “fix” all of this, or introduce a ,string tag, or some other way to continue supporting the current behavior, we should not just shove out a “your code now fails” patch all in one version. We should properly document that the behavior was unspecified, that it was never intended, and that it will stop working in the future.

And now after reading the two main standards ECMA-404, and RFC 8259, I don’t even think a change is necessary.

The ECMA-404 standard specifically says:

The goal of this specification is only to define the syntax of valid JSON texts. Its intent is not to provide any semantics or interpretation of text conforming to that syntax

And RFC 8259 also only defines only a grammar, not semantics.

So, even with json.Number accepting quoted integers, this package accepts all text that conforms to the JSON grammars that both standards layout. And there are no standards pointing to any MUST semantics here, both of them only cover syntax. And a number encoded in a string is valid syntax.

@puellanivis I am not sure, if I get your point in your last comment.

The documentation of json.Number states: "A Number represents a JSON number literal.".
A JSON number literal has a clearly defined syntax and this syntax does not include any quotes.

Of course the encoding/json package accepts all text that conforms to the JSON grammars, but if you want to decode a quoted string (maybe containing a character sequence, that is a valid JSON number literal), json.Number is not the right target type, because the documentation limits the usage of json.Number to "JSON number literals".

Regardless of what the documentation says, it has, since before go1.0, accepted string-quoted numeric values as well. The documentation and the implementation are in disagreement. The solution is to either fix the code, or fix the documentation.

Because the JSON standard does not dictate semantics, there is no ground to stand on to say that we cannot have json.Number support quoted numeric values like it has for over 9 years.

Nothing breaks if we fix the documentation, it is a strict superset of functionality. However, many things will break if we force the code to fit the documentation.

To put it a different way: we can consider this an _implementation_ bug (“json.Number accepts quoted values when its documentation says it should only accept number literals”), or a _documentation_ bug (“json.Number's documentation says it represents a JSON number literal, when it actually represents any reasonable JSON encoding of a number”).

@puellanivis's point, as I understand it, is that the choice between those interpretations should be informed by existing usage.

Yes, that is way more clearly what I wanted to say.

Though, I wouldn’t say that I am firmly against changing the code to fit the documentation, but it should at least be carefully measured about what the effect either decision would lead to.

* The decoding of a quoted numerical value into `json.Number` should (in my opinion) only work, if the option `,string` is set in the struct tags.

Can't @breml 's suggestion be implemented the opposite way? Like, instead of a ,string tag that ensures it accepts quoted numbers, we can use a ,strictlyNumeric (or something else) tag to ensure it accepts only numeric values and not strings. This would not break compatibility, and would practically solve the problem.

Such a change would still require a change of documentation to document the current behavior, and the new “strictlyNumeric” functionality.

What @puellanivis said. Adding a new option is not a good solution; if anything, the bar for new json options is pretty high at this point.

Personally, I've started to give up trying to fix the historical design issues with encoding/json; I've tried with a number of other undocumented or unintended behaviors (or even those that go directly against the documentation), but most of them end up getting reverted because they break existing code. So I'm pretty sure this would fall under the same category.

Whether or not the documentation should be updated is up to personal preference, I guess. I would personally not document these implementation quirks that were never meant to be.

Was this page helpful?
0 / 5 - 0 ratings