Go: encoding/json: unmarshal into slice reuses element data between len and cap

Created on 20 Jul 2017 · 19 comments · Source: golang/go

What version of Go are you using (go version)?

go1.8

What operating system and processor architecture are you using (go env)?

GOOS=nacl
GOARCH=amd64p32

What did you do?

https://play.golang.org/p/lbYUhgOe--

What did you expect to see?

According to the json Unmarshal docs, Unmarshal resets the slice length to zero and then appends each element to the slice.

What did you see instead?

Instead Unmarshal resets the slice length to zero and then modifies the elements of the underlying slice.
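
The playground program is not reproduced above, so here is a minimal sketch of the kind of code that shows the surprise (the Person type with Name and Age matches the discussion below; the exact values are illustrative):

package main

import (
	"encoding/json"
	"fmt"
)

type Person struct {
	Name string
	Age  int
}

func main() {
	// Reslice to length 0 so the old element still sits in the backing
	// array between len and cap.
	people := []Person{{Name: "Alice", Age: 30}}
	people = people[:0]

	// The JSON object has no Age field.
	if err := json.Unmarshal([]byte(`[{"Name":"Bob"}]`), &people); err != nil {
		panic(err)
	}

	// With the behaviour reported here, the stale Age leaks through and this
	// prints [{Bob 30}] instead of the expected [{Bob 0}].
	fmt.Println(people)
}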

NeedsFix


All 19 comments

This is working as intended. See https://blog.golang.org/slices for how slices work.

"resets the slice length to zero and then appends each element to the slice" and "resets the slice length to zero and then modifies the elements of the underlying slice" are the same thing.

https://golang.org/pkg/encoding/json/#Unmarshal

To unmarshal a JSON array into a slice, Unmarshal resets the slice length to zero and then appends each element to the slice

@bradfitz This may be working as intended, but not as described in the docs. The docs describe the slice length being set to zero (which happens), then appending (which is not what is happening).

The builtin append will replace the struct in a slice wholly. Unmarshal does not. Since the builtin append does one thing, and Unmarshal does a second, we should not say that Unmarshal is appending.
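
A minimal sketch of the contrast being drawn here, assuming the same Person type as in the issue's playground example:

package main

import "fmt"

type Person struct {
	Name string
	Age  int
}

func main() {
	// The builtin append overwrites the whole element slot, even when it
	// reuses memory between len and cap: any field the new value does not
	// set ends up as the zero value.
	people := []Person{{Name: "Alice", Age: 30}}
	people = people[:0]
	people = append(people, Person{Name: "Bob"})
	fmt.Println(people[0].Age) // 0: the stale Age does not survive the append

	// Unmarshal over the same kind of reused slot merges into the stale
	// element instead, so a field missing from the JSON keeps its old value
	// (the behaviour reported in this issue), which is why describing it as
	// "append" is misleading.
}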

TL;DR
We should probably update the documentation to describe the actual behaviour of Unmarshal, since the current wording is misleading.

Extended version:
I had a look at this issue, and here are my findings:

On the encoding/json Unmarshal documentation this is mentioned:

If a JSON value is not appropriate for a given target type, or if a JSON number overflows the target type, Unmarshal skips that field and completes the unmarshaling as best it can.

Now, we could argue that in the use case described in this issue, a JSON object with just Name and no Age is a value which is not "appropriate" for the struct target type, which expects both, and therefore the code is completing the unmarshaling "as best it can".

In the counterexample with the manual SliceAppend, creating a new Person initialises Age to its zero value, so the append overrides everything; during unmarshalling this does not happen.

One option is to set the element to the target type's zero value before assigning the decoded value from the JSON: that way every field starts at its zero value (as in the new Person{} example) and the unmarshal still does its job "as best it can" for the rest of the data, for example here:

// (inside the slice/array decoding loop of encoding/json's decode.go)
if i < v.Len() {
   // Calculate the zero value for the target element type.
   z := reflect.Zero(v.Type().Elem())
   // Reset the element to its zero value before decoding.
   v.Index(i).Set(z)
   // Decode into element.
   d.value(v.Index(i))
}

In general, I don't see the use case for passing an initialised slice in order to unmarshal new values over the old ones, when I could just pass a new empty slice (and I agree that the description in this case is misleading).

I have a green test, but I'm not sure of the performance implications of adding those steps (i.e. setting the value twice: first the zero value, then the decoded one).
If it makes sense to proceed, I can write more tests to cover types with different zero values, as at the moment I'm only using strings and ints.

At first sight, I would probably suggest updating the documentation, as I expect someone is relying on the current behaviour.

[update]
If I run all tests with this solution, the jsonrpc tests break, so the approach outlined above needs further investigation should we decide to proceed in that direction.

This does not look like it is working as intended; it looks more like data corruption. Appending must overwrite the old value. This must print 1/1, but it prints 1/3:
https://play.golang.org/p/myimTJdn42

The code does not even seem to respect the slice len. So if one uses this to get some kind of merging, she can get unexpected garbage from whatever was there between len and cap.

I agree that when appending to a slice, unmarshal should start with fresh zeroed slice elements and not the old slice elements that happen to sit between len and cap. Too late for Go 1.10 though.

Dmitry's example is clearly incorrect behavior and should be fixed. However, the fix should preserve the current behavior in this variant: https://play.golang.org/p/H0kRWEEiEW. (Resetting the length to 0 does not mean resetting the capacity to 0.)

Duplicate (closed) bug with other repros: https://github.com/golang/go/issues/24155

Change https://golang.org/cl/98516 mentions this issue: encoding/json: Fix for #21092 - Zero out target before decoding

As someone who actually depends on the fact that Unmarshal does NOT zero out maps/slices, this would break a lot of my code.

@OneOfOne out of curiosity, what is your use case that requires data to be retained for maps/slices?

@JeremyLoy in one case, to override the default values in a map; in a few others, to fill the map from multiple input files.

Maybe you can add an extra field to the json decoder to make this optional.

@OneOfOne Based on reading @rsc's comment, I don't think your case will be affected.

Please excuse my rambling explanation: unmarshalling onto existing slices would continue to merge data for elements up to len() and only zero the elements between len() and cap(). If you read in a file with 3 slice elements and then read a second file with 4 slice elements, the first 3 elements will be merged: fields present in the JSON objects in the second file overwrite the values from the first file, while values set only by the first file are left intact. The fourth element will be exactly what was contained in the fourth element of the second file.

If you are, for some reason, reading in the first file and then setting your slice header back with a = a[:0] before reading in the second file, this will affect you. But I don't see why you would be doing that.
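
A sketch of the two-file scenario described above (the Item type, field names, and file contents are made up for illustration):

package main

import (
	"encoding/json"
	"fmt"
)

type Item struct {
	A string
	B string
}

func main() {
	file1 := []byte(`[{"A":"a1","B":"b1"},{"A":"a2","B":"b2"},{"A":"a3","B":"b3"}]`)
	file2 := []byte(`[{"A":"x1"},{"A":"x2"},{"A":"x3"},{"A":"x4","B":"y4"}]`)

	var items []Item
	_ = json.Unmarshal(file1, &items) // 3 elements
	_ = json.Unmarshal(file2, &items) // 4 elements, decoded over the same slice

	// The first three elements are merged: A comes from file2, but B, which
	// file2 omits, keeps the value from file1. The fourth element is exactly
	// what file2 contained.
	fmt.Println(items)
}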

Change https://golang.org/cl/191783 mentions this issue: encoding/json: don't reuse slice/array elements when decoding

I wasn't happy with the approach in the previous CL, and it also lacked proper testing and benchmarking, so I've started from scratch. I think this should be merged well before the freeze: some users might be depending on the opposite of what the documentation says, so the surprises should come sooner rather than later :)

This change has produced a small but significant number of test failures in Google's codebase. A reduced example of one case which was broken is:

type T struct {
  Index    int
  Elements []json.RawMessage
}

var message T
tmp := []interface{}{&message.Index, &message.Elements}
err = json.Unmarshal(raw[0], &tmp)
if err != nil {
  return message, err
}
return message, nil

https://play.golang.org/p/iNBD_-mhWTI
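
For context, the trick this reduced example relies on is decoding a heterogeneous JSON array positionally through pointers that are already stored in the target slice. A self-contained sketch of the working behaviour, with made-up input data:

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	var index int
	var elements []json.RawMessage

	// Pre-populate the target slice with pointers to the destinations. The
	// decoder reuses these existing elements and follows the pointers, so
	// the array's values land in index and elements.
	tmp := []interface{}{&index, &elements}
	if err := json.Unmarshal([]byte(`[3, [{"x":1}, {"x":2}]]`), &tmp); err != nil {
		panic(err)
	}
	fmt.Println(index, len(elements)) // 3 2

	// Zeroing the slice elements before decoding would throw those pointers
	// away, which is how the change broke code written in this style.
}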

@neild, this issue is closed but #39427 has an active discussion.

The change is being reverted. Reopening to allow me to retry a smaller change in 1.16.

I didn't have time to try this again in 1.16; moving to 1.17.
