Runtime: Enable Utf8JsonReader to read json from stream

Created on 22 Jul 2019 · 11 Comments · Source: dotnet/runtime

Basically, implement an analogue of JsonTextReader(TextReader).

My scenario is reading the result of: docker inspect [image] (which produces a JSON document), either called via Process.Start() or piped in via standard input. Both scenarios result in TextReader objects. I'd like to see either a new constructor to enable this scenario or some straightforward collaborative mechanism between the two readers.

Related: https://github.com/dotnet/corefx/issues/38581

area-System.Text.Json

All 11 comments

Maybe but not really since it's a ref struct, you'd need to make sure all of the data was in the stream before you parse, which kinda defeats the purpose of a Stream. We'd need to make the reader a class so that you could store the stream as a field or we'd need some new class that used the reader and a stream together (like the JsonSerializer).

Why not use the JsonSerializer directly into a JsonElement?
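For reference, a minimal sketch of that suggestion. The `MemoryStream` and its JSON payload are hypothetical stand-ins for the `docker inspect` output; with a real process, something like `process.StandardOutput.BaseStream` should provide the underlying `Stream`. Note that this materializes the whole document rather than streaming tokens.

```C#
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for the `docker inspect` output; any readable Stream works.
using var stream = new MemoryStream(
    Encoding.UTF8.GetBytes("[{\"Id\": \"sha256:abc\", \"Size\": 1234}]"));

// DeserializeAsync reads the stream in chunks and builds the full document.
JsonElement root = await JsonSerializer.DeserializeAsync<JsonElement>(stream);

// Navigate the resulting element tree.
string id = root[0].GetProperty("Id").GetString();
long size = root[0].GetProperty("Size").GetInt64();
Console.WriteLine($"{id} {size}");
```

`JsonDocument.ParseAsync(stream)` is a similar option if you prefer to work with a disposable `JsonDocument`.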

Or copy this logic https://github.com/dotnet/corefx/blob/347412c9a917c71a744d8e20b090da90aa558a79/src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.Stream.cs#L75-L226 😄

> Maybe but not really since it's a ref struct, you'd need to make sure all of the data was in the stream before you parse, which kinda defeats the purpose of a Stream.

A ref struct can hold a normal reference just fine, so it could theoretically work. IMO it would be bad, though, because the ValueSpan (or ValueSequence) properties would be returning Spans to buffers the user never owned, making their lifetime ambiguous (at best).

Certainly we could make a Stream-based wrapper to do the buffer management, which inverts the flow:

```C#
public class Utf8JsonStreamReader
{
    ...
    public JsonTokenType TokenType { get; }
    public int TokenStartIndex { get; }
    public int TokenLength { get; }
    public void CopyTokenValue(Span<byte> destination);
    public void Read();
    ...
}
```

But that seems awkward.

> A ref struct can hold a normal reference just fine, so it could theoretically work

But it can't have async operations (Rich didn't mention that in his description, but I expect David was assuming that as a necessity).

I'd be happy w/o the wrapper (avoiding lifetime and async challenges), and for the ability to provide the json reader with document lines, one at a time. Ideally (for my scenario), I could give the reader an IEnumerable<Span<Byte>> but that isn't possible for other reasons.
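As a rough sketch of feeding the reader one piece at a time: `Utf8JsonReader` can be re-created per chunk from a saved `JsonReaderState`, as long as the caller carries over whatever bytes the previous pass did not consume. The `chunks` array is a hypothetical input, split at arbitrary positions to show that tokens may straddle chunk boundaries; the per-chunk array allocation is for clarity only.

```C#
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical input: one JSON document split at arbitrary positions.
string[] chunks = { "{\"Id\": \"sha2", "56:abc\", \"Size\"", ": 1234}" };

var tokens = new List<JsonTokenType>();
var state = new JsonReaderState();
byte[] leftover = Array.Empty<byte>();

for (int i = 0; i < chunks.Length; i++)
{
    // Prepend the bytes the previous pass could not consume (an incomplete token).
    byte[] chunk = Encoding.UTF8.GetBytes(chunks[i]);
    byte[] data = new byte[leftover.Length + chunk.Length];
    leftover.CopyTo(data, 0);
    chunk.CopyTo(data, leftover.Length);

    bool isFinalBlock = i == chunks.Length - 1;
    var reader = new Utf8JsonReader(data, isFinalBlock, state);
    while (reader.Read())
    {
        tokens.Add(reader.TokenType);
    }

    // Persist the parser state and keep the unconsumed tail for the next pass.
    state = reader.CurrentState;
    leftover = data.AsSpan((int)reader.BytesConsumed).ToArray();
}

Console.WriteLine(string.Join(", ", tokens));
```

The reader itself never outlives one loop iteration, which is what keeps this pattern compatible with the ref struct restrictions discussed above.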

I am looking to read JSON from a file stream and am struggling to understand how to do that.
I found this Stack Overflow question in which someone has written a complicated wrapper. Is this really necessary? https://stackoverflow.com/questions/54983533/parsing-a-json-file-with-net-core-3-0-system-text-json

My use case is that I want to open a JSON file, navigate to a particular section of it, and deserialize just that section. I thought it best to use Utf8JsonReader for this so that I can read through the stream, skipping irrelevant tokens until I get to the relevant section of JSON, process just that section, and then close the file stream, without having to load the whole file into memory or read any more information than is strictly necessary.
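A minimal sketch of the navigation part only, assuming the JSON is already in a single in-memory buffer. It does not solve the streaming half of the use case, which still needs buffer management. `ReadSection` and the sample payload are hypothetical names introduced for illustration.

```C#
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical payload with one section of interest ("Labels").
byte[] json = Encoding.UTF8.GetBytes(
    "{\"Id\": \"sha256:abc\", \"Config\": {\"Env\": [\"A=1\"]}, \"Labels\": {\"maintainer\": \"me\"}}");

Dictionary<string, string> labels = ReadSection(json, "Labels");
Console.WriteLine(labels["maintainer"]);

// Scans the top-level properties and deserializes only the requested one.
static Dictionary<string, string> ReadSection(ReadOnlySpan<byte> utf8Json, string propertyName)
{
    var reader = new Utf8JsonReader(utf8Json);
    reader.Read(); // move to the root StartObject
    while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
    {
        if (reader.ValueTextEquals(propertyName))
        {
            // Deserialize advances from the property name to its value and reads it.
            return JsonSerializer.Deserialize<Dictionary<string, string>>(ref reader);
        }
        reader.Skip(); // jump over this property's value, including nested objects/arrays
    }
    return null; // section not found
}
```

`Skip()` only works when the reader holds the final block of data; with partial buffers, `TrySkip()` is the variant to look at.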

I hope this gets fixed.

@dazinator in the comments on the same answer you linked, somebody found a few bugs and fixed them in this repo.

I came here to figure out these DeserializeAsync errors:

']' is an invalid start of a property name. Expected a '"'. LineNumber: XXXXX | BytePositionInLine: XX.

'0x0D' is invalid within a JSON string. The string should be correctly escaped. Path: $[5] | LineNumber: XXXXX | BytePositionInLine: XX.

Or similar, depending on the order of items in the JSON array of the source stream.

I also have a custom JsonConverter which can process the full item type with referencing properties.

Reason: the Utf8JsonReader buffer is not large enough to hold one item of the array.

Solution: increase JsonSerializerOptions.DefaultBufferSize to cover the item size (in my case, to 128 KB).
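For anyone wanting to try the same workaround, a small sketch of setting the buffer size. The payload here is a hypothetical placeholder and no custom converter is involved, so it only shows where the option goes and does not reproduce the errors above.

```C#
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// 128 KB matches the value mentioned above; pick a size that covers your largest array item.
var options = new JsonSerializerOptions { DefaultBufferSize = 128 * 1024 };

// Hypothetical payload standing in for the real source stream.
using var stream = new MemoryStream(
    Encoding.UTF8.GetBytes("[{\"name\": \"a\"}, {\"name\": \"b\"}]"));

JsonElement[] items = await JsonSerializer.DeserializeAsync<JsonElement[]>(stream, options);
Console.WriteLine(items.Length);
```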

I get the point above that Utf8JsonReader may not be the best place for deserializing from a stream. But given that the docs for DataContractJsonSerializer direct people to this namespace, there should be _some_ entry point in the System.Text.Json namespace that handles this.

Especially since Utf8JsonWriter _does_ support Stream as a data sink. It's weird that the API is not symmetric.
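To illustrate the asymmetry, this is roughly what the write side looks like today: `Utf8JsonWriter` has a constructor that takes a `Stream` directly. The property names and values are made up for the example.

```C#
using System;
using System.IO;
using System.Text;
using System.Text.Json;

using var output = new MemoryStream();
using (var writer = new Utf8JsonWriter(output))
{
    writer.WriteStartObject();
    writer.WriteString("Id", "sha256:abc");
    writer.WriteNumber("Size", 1234);
    writer.WriteEndObject();
    writer.Flush(); // push buffered JSON text into the stream
}

string json = Encoding.UTF8.GetString(output.ToArray());
Console.WriteLine(json);
```

There is no equivalent `new Utf8JsonReader(stream)`; the reader only accepts `ReadOnlySpan<byte>` or `ReadOnlySequence<byte>`.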

I came up with the following code. It reads a JSON dataset as an IAsyncEnumerable sequence of Dictionary<string, object> records:

```C#
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Threading;
using Opw.HttpExceptions;

namespace YourApp
{
using Record = IEnumerable<KeyValuePair<string, object>>;

public static class JsonArrayReader
{
    public static IAsyncEnumerable<Record> ReadJsonRecords(this Stream input, CancellationToken cancellationToken)
    {
        bool isArrayStart = true;
        return Parse(input, cancellationToken, (ref Utf8JsonReader reader) =>
        {
            if (isArrayStart)
            {
                ReadArrayStart(ref reader);
                isArrayStart = false;
            }
            return ReadRecords(ref reader);
        });
    }

    private delegate IEnumerable<T> Parser<T>(ref Utf8JsonReader reader);

    // inspired by https://github.com/scalablecory/system-text-json-samples/blob/master/json-test/JsonParser.ParseSimpleAsync.cs
    private static async IAsyncEnumerable<T> Parse<T>(Stream input, [EnumeratorCancellation] CancellationToken cancellationToken, Parser<T> parser)
    {
        var buffer = new byte[4096];
        var fill = 0;
        var consumed = 0;
        var done = false;
        var readerState = new JsonReaderState();

        while (!done)
        {
            if (fill == buffer.Length)
            {
                if (consumed != 0)
                {
                    buffer.AsSpan(consumed).CopyTo(buffer);
                    fill -= consumed;
                    consumed = 0;
                }
                else
                {
                    Array.Resize(ref buffer, buffer.Length * 3 / 2);
                }
            }

            int read = await input.ReadAsync(buffer.AsMemory(fill), cancellationToken).ConfigureAwait(false);

            fill += read;
            done = read == 0;

            foreach (var item in ParseBuffer())
            {
                yield return item;
            }
        }

        IEnumerable<T> ParseBuffer()
        {
            var reader = new Utf8JsonReader(buffer.AsSpan(consumed, fill - consumed), done, readerState);
            var result = parser(ref reader);
            consumed += (int)reader.BytesConsumed;
            readerState = reader.CurrentState;
            return result;
        }
    }

    private static void ReadArrayStart(ref Utf8JsonReader reader)
    {
        if (!reader.Read())
        {
            throw new BadRequestException("Unexpected EOF");
        }

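        // skip leading comments (only surfaced when the reader state allows comments);
        // Skip() does not advance past a Comment token, so read the next token instead
        while (reader.TokenType == JsonTokenType.Comment)
        {
            if (!reader.Read())
            {
                throw new BadRequestException("Unexpected EOF");
            }
        }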

        if (reader.TokenType != JsonTokenType.StartArray)
        {
            throw new BadRequestException($"Expect JSON array, but got {reader.TokenType}");
        }
    }

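    private static IEnumerable<Record> ReadRecords(ref Utf8JsonReader reader)
    {
        var records = new List<Record>();
        while (true)
        {
            // Copy the reader (it is a struct) so we can roll back if the buffer ends mid-record.
            var savePoint = reader;

            if (!reader.Read())
            {
                // No complete token left in this buffer: stop and let the caller refill it.
                // On the final block the reader reports truncated JSON by throwing JsonException.
                break;
            }

            if (reader.TokenType == JsonTokenType.EndArray)
            {
                break;
            }

            if (reader.TokenType != JsonTokenType.StartObject)
            {
                throw new BadRequestException($"Expect {JsonTokenType.StartObject}, but got {reader.TokenType}");
            }

            var record = ReadRecord(ref reader);
            if (record == null)
            {
                // The record is incomplete: un-consume its StartObject token so that
                // the next buffer starts at the beginning of this record.
                reader = savePoint;
                break;
            }
            records.Add(record);
        }

        return records;
    }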

    private static Record ReadRecord(ref Utf8JsonReader reader)
    {
        try
        {
            var savePoint = reader;
            var result = JsonSerializer.Deserialize<Dictionary<string, object>>(ref savePoint);
            reader = savePoint;
            return result;
        }
        catch (JsonException)
        {
            return null;
        }
    }
}

}
```

It reuses the idea from https://github.com/scalablecory/system-text-json-samples/blob/master/json-test/JsonParser.ParseSimpleAsync.cs.

Also, if you target .NET Core 3.x or later, you might have to implement a JsonConverter for Dictionary<string, object> because of https://github.com/dotnet/runtime/issues/1573.

@alexandrvslv, can you please file a separate issue with a simplified repro test app of the issue you were seeing? At first glance, this seems like a bug, and you shouldn't need to increase the DefaultBufferSize to fix it. It would be good for us to understand what your JSON payload looks like, your custom JsonConverter implementation, and the root cause of the issue you were seeing (also include your TFM or STJ package version).

@ahsonkhan, I will try to reproduce the issue. It may take some time to implement a test that serializes to JSON with a custom formatter, processes the data through a pipe, and deserializes it with a custom parser.
I have made a lot of changes since February 2020; the model has even been simplified and the transfer separated into sub-requests.
