Consider:
```c#
var json = "[1, 2, 3, 128]";
```
Deserializing this as a byte[] currently fails since the byte[] converter expects a base-64 encoded string:
```c#
JsonSerializer.Deserialize<byte[]>(json);
```
```sh
Unhandled exception. System.Text.Json.JsonException: The JSON value could not be converted to System.Byte[]. Path: $ | LineNumber: 0 | BytePositionInLine: 1.
 ---> System.InvalidOperationException: Cannot get the value of a token type 'StartArray' as a string.
   at System.Text.Json.Utf8JsonReader.TryGetBytesFromBase64(Byte[]& value)
   at System.Text.Json.Utf8JsonReader.GetBytesFromBase64()
   at System.Text.Json.Serialization.Converters.JsonConverterByteArray.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
```
While reading, the converter could be modified to support an actual array type:
```c#
if (reader.TokenType == JsonTokenType.StartArray)
{
    // Read the JSON array of numbers element by element.
    var bytes = new List<byte>();
    while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
    {
        bytes.Add(reader.GetByte());
    }
    return bytes.ToArray();
}
else
{
    return reader.GetBytesFromBase64();
}
```
Serialization can remain the same as users are free to change the behavior if they want a different format.
/cc @JamesNK @ahsonkhan @steveharter
Json.NET supports deserializing both formats. byte[] is still always written as base64.
The current approach is intentional:
https://github.com/dotnet/corefx/blob/6b21afcb486552ab6cc25e1f198de11502b6d8b6/src/System.Text.Json/tests/Serialization/Array.ReadTests.cs#L77-L78
In general, the S.T.Json stack has tried to be rigid with very little type coercion or inference (particularly on deserialization), at least by default. Like most customization, I believe this problem could be resolved by a custom converter.
See previous issue comment: https://github.com/dotnet/corefx/issues/38494#issuecomment-501571451
In this case, I would be OK with making deserialization more flexible by default, or if there are concerns with that, introduce an option/attribute to opt-in to the flexibility. @bartonjs, what are your thoughts here?
Deserializing a JSON array of numbers into a byte array is perfectly normal. Just like deserializing a JSON array of numbers into an Int32 array is normal. Accepting both base64 string and JSON array is very easy.
If people want to output a JSON array for byte[] then at that point you should tell them to write a custom converter.
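As a rough illustration, such a converter could look like the following minimal sketch (the class name and error message are illustrative, not from the library; it reads both base64 strings and number arrays, and writes a number array):

```c#
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed class ByteArrayAsNumberArrayConverter : JsonConverter<byte[]>
{
    public override byte[] Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        // Accept the existing base64 representation...
        if (reader.TokenType == JsonTokenType.String)
        {
            return reader.GetBytesFromBase64();
        }

        if (reader.TokenType != JsonTokenType.StartArray)
        {
            throw new JsonException("Expected a base64 string or an array of numbers.");
        }

        // ...or a JSON array of numbers.
        var bytes = new List<byte>();
        while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
        {
            bytes.Add(reader.GetByte());
        }
        return bytes.ToArray();
    }

    public override void Write(Utf8JsonWriter writer, byte[] value, JsonSerializerOptions options)
    {
        // Always write the array form on the way out.
        writer.WriteStartArray();
        foreach (byte b in value)
        {
            writer.WriteNumberValue(b);
        }
        writer.WriteEndArray();
    }
}
```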
In browser land, byte arrays are represented as Uint8Array. Here's what you get if you try to stringify it:
```js
JSON.stringify(new Uint8Array([1, 2, 3]))
// "{"0":1,"1":2,"2":3}"

JSON.stringify(Array.from(new Uint8Array([1, 2, 3])))
// "[1,2,3]"
```
Producing a base64 encoded string from Uint8Array is tricky since it requires polyfills: https://stackoverflow.com/a/12713326. It'll just be much more convenient to support the array structure (the second one) out of the box.
> Json.NET supports deserializing both formats. byte[] is still always written as base64.
My feelings here are the same with NumericString vs number: Having the deserialize support both by default makes it harder for someone to understand why the format of data changes between deserialize and serialize (which is mainly a problem for interior nodes, like App -> FE, FE -> BE, when part of the data is processed/inserted/replaced by the FE and part is passed "as-is").
Similar to NumericString, I'm happy with a switch to support dual-format-read (that's read-only-what-you-write by default) and/or an attribute on the individual property to say that this one is different and special.
If int[] and byte[] both work, but behave differently (I assume int[] is a JSON Array of JSON Numbers) that's probably something that needs to be called out in a moderately visible place in the serializer docs. Simplicity in transition would lead me to suggest that byte[] should be to/from Array by default, and an attribute is required to make it base64... even if base64 is what "everyone" wants.
I am having a similar issue when de-serializing to a byte array with elements in the `0:100` range. I am not sure if it is related, as the same code works with `sbyte[]`, `ushort[]` and `short[]`.
Repro:
```sh
$ dotnet --info
...snip...
Host (useful for support):
  Version: 3.0.0-preview8-28379-12
  Commit:  38f6ef72ca
...snip...
```
(downloaded from: https://dotnetcli.azureedge.net/dotnet/Sdk/release/3.0.1xx/dotnet-sdk-latest-win-x64.zip)
dotnet new console -n dotnet/corefx#39961 and type dotnet/corefx#39961\Program.cs:
```c#
using System.Text.Json;

public static class Program
{
    public static void Main() =>
        JsonSerializer.Deserialize<byte[]>("[1, 2, 3]");
}
```

`dotnet run` throws:

```sh
Unhandled exception. System.Text.Json.JsonException: The JSON value could not be converted to System.Byte[]. Path: $ | LineNumber: 0 | BytePositionInLine: 1.
 ---> System.InvalidOperationException: Cannot get the value of a token type 'StartArray' as a string.
   at System.Text.Json.Utf8JsonReader.TryGetBytesFromBase64(Byte[]& value)
   at System.Text.Json.Utf8JsonReader.GetBytesFromBase64()
```
Replacing byte[] with sbyte[], ushort[] or short[] works fine (though in my real-world app I then had to apply .Cast<byte> to get back to bytes, which incurs an undesired overhead).
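As an interim workaround, a minimal sketch of that approach (assuming a recent C# with top-level statements; note that `Enumerable.Cast<byte>()` can throw `InvalidCastException` when the boxed element type differs, so an explicit per-element cast is the safer form):

```c#
using System.Linq;
using System.Text.Json;

// Deserialize into a wider integer type, which the default array converter accepts...
ushort[] wide = JsonSerializer.Deserialize<ushort[]>("[1, 2, 3, 128]");

// ...then narrow each element back down to byte (incurring an extra copy).
byte[] bytes = wide.Select(v => (byte)v).ToArray();
```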
> Having the deserialize support both by default makes it harder for someone to understand why the format of data changes between deserialize and serialize
I don't think it's hard to understand that reading can support multiple formats, but writing has to pick a specific one. IMO, it makes sense to follow Postel's law in this case.
As a concept? No, it's not hard to understand. But my principle is this: if a decision has to be made, err on the side that makes the problem easier to discover.
If Serialize(Deserialize(payload)) changed an array to base64 the downstream receiver (particularly if it's a different framework) will likely just report the data is invalid, with no specific notion of what went wrong. So now you're searching for "stuff doesn't work after re-serializing"... good luck. If the default is single-format-read and you get base64 when it expects an array you get an exception in your app that tells you precisely where in the content it failed. Now you're searching for something like "JsonSerializer base64 byte array", and you'll quickly find the answer.
> I am having a similar issue when de-serializing to byte array with elements in `0:100` range. I am not sure if it is related, as the same code works with `sbyte[]`, `ushort[]` and `short[]`.
Yes, this is the same issue, and intentional. We had opted to make the default representation for byte[] be a base64-encoded string, as that was the more common use case/expectation. For instance, serializing a byte[] produces a base64 string (which can then be deserialized back to a byte[]).
Here were the feasible options:
1) Treat byte[] as Base64 and read/write just that format, by default (users requiring flexibility OR requiring byte[] support need to implement a converter). <- chosen
2) Treat byte[] as byte[] and read/write just that format, by default (users requiring flexibility OR requiring Base64 support need to implement a converter).
3) Read both Base64 and byte[] as byte[], but write byte[] as Base64 only, by default (stricter serialization/deserialization would require a converter - read/write asymmetry)
As a general question, is byte[]/Base64 one of the few cases where this level of flexibility is required, or are there other commonly occurring data format changes that folks expect when deserializing JSON (that follow "Postel's law")? We have held the line on symmetry (by default) in other instances as well - such as what format certain .NET types are accepted in (DateTime, numbers, etc.).
A custom converter can be written for this. We can reopen if there's more customer demand.
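For reference, wiring up such a converter (here a hypothetical `MyByteArrayConverter` implementing `JsonConverter<byte[]>`; the name is illustrative) can be done either globally via options or per property via the attribute:

```c#
using System.Text.Json;
using System.Text.Json.Serialization;

// Global opt-in: every byte[] handled through these options uses the converter.
var options = new JsonSerializerOptions();
options.Converters.Add(new MyByteArrayConverter());
byte[] bytes = JsonSerializer.Deserialize<byte[]>("[1, 2, 3]", options);

public class Payload
{
    // Per-property opt-in: only this member uses the custom format.
    [JsonConverter(typeof(MyByteArrayConverter))]
    public byte[] Data { get; set; }
}
```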
JSON (JavaScript Object Notation) only has one number type. The equivalent in C# is byte, sbyte, short, ushort, int, uint, long, double, etc.; it's a question of bit length. A byte[] is not always binary data. For communication with JavaScript we expect byte[] as number[]. Now we have to write a lot of code and confuse our developers and reviewers because of this base64 issue. Converting binary data to base64 is a nice idea. It's good for a Blob, Buffer or other binary data, and maybe for byte[] too. But not as the default. This should be an option (global or local) on System.Text.Json.JsonSerializer. In my case I just have to transmit 8 bytes. I don't want casts, custom converters or other special implementations for a number[] type.
```c#
class JsonData {
    public uint id { get; set; }
    public byte[] data { get; set; }
}
```
json:
```json
{ "id": 42, "data": [1, 2, 3, 4, 5, 6, 7, 8] }
```
Again, the base64 conversion is a nice idea. But it should be a simple option without any hassle.
This issue is preventing me from sending byte array across javascript interop in my blazor application. Javascript's atob does not deserialize it correctly.
How do you write a custom converter for byte arrays? That's not clear to me.
@AceHack You write it like the built-in one:
https://github.com/dotnet/runtime/blob/4a4e347b964a2c8d2216ec382e4fb481965bb2fc/src/libraries/System.Text.Json/src/System/Text/Json/Serialization/Converters/Value/ByteArrayConverter.cs#L7-L18
I am currently seeing issues with byte[] conversion.
In the built-in ByteArrayConverter what is the expected json output?
When I serialise a byte[] I am seeing the value as an array of ints.
{"value":[1,2,3,4,5,6,7,8]}
What I am expecting (which is probably a wrong assumption) is:
{"value":"h7w(%eP!"}
NOTE: values above are for example and not real base64 converted values.
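For what it's worth, the built-in converter base64-encodes the raw bytes, so (assuming default serializer options) the example payload above would be expected to come out like this:

```c#
using System;
using System.Text.Json;

var payload = new { value = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 } };
Console.WriteLine(JsonSerializer.Serialize(payload));
// {"value":"AQIDBAUGBwg="}
```

Seeing an array of ints instead usually means a different converter (or a different serializer entirely) is handling the property.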
I have also created my own converter and assigned it to my property using the JsonConverterAttribute and it appears it never gets called.
How can I find out why this is not converting as expected?
EDIT:
I have just re-run my code and it would appear my custom converter is being called for deserialisation, but not for serialisation.
Unsure why this is happening, as the JSON handling is part of a Web API controller implementation and there is certainly no difference in the JsonSerializerOptions being used.
Why with the same options are there different converters being used for reading and writing?
Especially as I am specifically targeting my custom converter using the attribute.
> I have just re-run my code and it would appear my custom converter is being called for deserialisation, but not for serialisation.
> Unsure why this is happening, as the JSON handling is part of a Web API controller implementation and there is certainly no difference in the JsonSerializerOptions being used.
> Why with the same options are there different converters being used for reading and writing?
> Especially as I am specifically targeting my custom converter using the attribute.
Please file a new issue and provide a simplified repro of the incorrect behavior you are seeing, along with the model you are trying to serialize.
You will have better luck getting attention to it that way.
https://github.com/dotnet/runtime/issues/new/choose