Basically, implement an analogue of JsonTextReader(TextReader).
My scenario is reading the result of: docker inspect [image]
(which produces a JSON document), either called via Process.Start() or piped in via standard input. Both scenarios result in TextReader objects. I'd like to see either a new constructor to enable this scenario or some straightforward collaborative mechanism between the two readers.
Maybe but not really since it's a ref struct, you'd need to make sure all of the data was in the stream before you parse, which kinda defeats the purpose of a Stream. We'd need to make the reader a class so that you could store the stream as a field or we'd need some new class that used the reader and a stream together (like the JsonSerializer).
Why not use the JsonSerializer directly into a JsonElement?
Or copy this logic https://github.com/dotnet/corefx/blob/347412c9a917c71a744d8e20b090da90aa558a79/src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.Stream.cs#L75-L226 😄
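A minimal sketch of the "deserialize straight into a JsonElement" suggestion, assuming the whole document can be held in memory. `InspectReader` and `ReadRootAsync` are hypothetical names used only for illustration; `JsonSerializer.DeserializeAsync` is the existing Stream-based entry point.

```C#
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

public static class InspectReader
{
    // Hypothetical helper: reads one complete JSON document from a stream
    // and returns its root element. The serializer does the buffering.
    public static async Task<JsonElement> ReadRootAsync(Stream utf8Json)
    {
        return await JsonSerializer.DeserializeAsync<JsonElement>(utf8Json);
    }
}
```

For the two scenarios in the original request, the underlying streams should be reachable without going through a TextReader: `Console.OpenStandardInput()` for piped input, and `process.StandardOutput.BaseStream` for a process started with redirected output. Both are assumed to carry UTF-8 here.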
> Maybe but not really since it's a ref struct, you'd need to make sure all of the data was in the stream before you parse, which kinda defeats the purpose of a Stream.
A ref struct can hold a normal reference just fine, so it could theoretically work. IMO it would be bad, though, because the ValueSpan (or ValueSequence) properties would be returning Spans to buffers the user never owned, making their lifetime ambiguous (at best).
Certainly we could make a Stream-based wrapper to do the buffer management, which inverts the flow:
```C#
public class Utf8JsonStreamReader
{
    ...
    public JsonTokenType TokenType { get; }
    public int TokenStartIndex { get; }
    public int TokenLength { get; }
    public void CopyTokenValue(Span<byte> destination);
    public void Read();
    ...
}
```
But that seems awkward.
> A ref struct can hold a normal reference just fine, so it could theoretically work
But it can't have async operations (Rich didn't mention that in his description, but I expect David was assuming that as a necessity).
I'd be happy w/o the wrapper (avoiding lifetime and async challenges), with just the ability to provide the JSON reader with document lines, one at a time. Ideally (for my scenario), I could give the reader an IEnumerable<Span<Byte>>, but that isn't possible for other reasons.
I am looking to read JSON from a file stream. Struggling to understand how to do that.
I found this Stack Overflow question in which someone has written a complicated wrapper. Is this really necessary? https://stackoverflow.com/questions/54983533/parsing-a-json-file-with-net-core-3-0-system-text-json
My use case is that I want to open a JSON file, navigate to a particular section of it, then deserialise just that section. I thought it best to use Utf8JsonReader for this, so that I can read through the stream skipping irrelevant tokens until I get to the relevant section of JSON, process just that section, and then close the file stream, without having to load the whole file into memory or read any more information than is strictly necessary.
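The "skip to one section, deserialize only that" part can be sketched as follows. This is not a streaming solution: it assumes the JSON is already in a single in-memory buffer, which is exactly the limitation this issue is about. `SectionReader` and `ReadSection` are hypothetical names.

```C#
using System;
using System.Text.Json;

public static class SectionReader
{
    // Hypothetical helper: returns the value of the first top-level property
    // called `name`, deserialized as T, or default(T) if it is not present.
    public static T ReadSection<T>(ReadOnlySpan<byte> utf8Json, string name)
    {
        var reader = new Utf8JsonReader(utf8Json);
        if (!reader.Read() || reader.TokenType != JsonTokenType.StartObject)
        {
            throw new JsonException("Expected a JSON object at the root.");
        }

        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            if (reader.ValueTextEquals(name))
            {
                // When positioned on a property name, Deserialize reads the value that follows.
                return JsonSerializer.Deserialize<T>(ref reader);
            }

            // On a property name, Skip moves past the entire value, including nested children.
            reader.Skip();
        }

        return default;
    }
}
```

Note that `Skip()` throws when the reader was created with `isFinalBlock: false`, so a version that works on partial buffers would need `TrySkip()` plus a refill loop instead.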
I hope this gets fixed.
@dazinator, in the comments on the same answer you linked, somebody found a few bugs and fixed them in this repo.
I came here to figure out DeserializeAsync errors:
']' is an invalid start of a property name. Expected a '"'. LineNumber: XXXXX | BytePositionInLine: XX.
'0x0D' is invalid within a JSON string. The string should be correctly escaped. Path: $[5] | LineNumber: XXXXX | BytePositionInLine: XX.
Or similar, depending on the order of items in the JSON array of the source stream.
I also have a custom JsonConverter which can process the full item type with referencing properties.
Reason: the Utf8JsonReader buffer size is not enough to cover one item in the array.
Decision: increase JsonSerializerOptions.DefaultBufferSize to cover the item size (in my case, to 128 KB).
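A minimal sketch of that workaround, assuming a 128 KB buffer is large enough for the biggest array item. `LargeItemReader` and `ReadAsync` are hypothetical names; the custom converter from the comment is not shown and would be registered through `Options.Converters`.

```C#
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

public static class LargeItemReader
{
    // Options are cached because creating them per call is expensive.
    // 128 KB mirrors the value reported above; it is not a recommended default.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        DefaultBufferSize = 128 * 1024
    };

    // Hypothetical helper: deserializes a UTF-8 JSON stream using the enlarged buffer.
    public static async Task<T> ReadAsync<T>(Stream utf8Json)
    {
        return await JsonSerializer.DeserializeAsync<T>(utf8Json, Options);
    }
}
```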
I get the point above that Utf8JsonReader may not be the best place for deserializing from a stream. But given that the docs for DataContractJsonSerializer direct people to this namespace, there should be _some_ entry point in the System.Text.Json namespace that handles this.
Especially since Utf8JsonWriter _does_ support Stream as a data sink. It's weird that the API is not symmetric.
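To make the asymmetry concrete, here is a small sketch of the writer side, which accepts a Stream directly. `WriterToStreamExample` and the property written are illustrative only; there is no equivalent `new Utf8JsonReader(stream)` constructor.

```C#
using System.IO;
using System.Text;
using System.Text.Json;

public static class WriterToStreamExample
{
    // Writes a tiny JSON object into a MemoryStream and returns it as text.
    public static string WriteSample()
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartObject();
            writer.WriteString("image", "example");
            writer.WriteEndObject();
            writer.Flush(); // push buffered bytes into the stream
        }

        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```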
I came up with the following code. It reads a JSON dataset as an IAsyncEnumerable sequence of Dictionary<string, object> records:
```C#
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Threading;
using Opw.HttpExceptions; // third-party package that provides BadRequestException

namespace YourApp
{
    using Record = IEnumerable<KeyValuePair<string, object>>;

    public static class JsonArrayReader
    {
        public static IAsyncEnumerable<Record> ReadJsonRecords(this Stream input, CancellationToken cancellationToken)
        {
            bool isArrayStart = true;
            return Parse(input, cancellationToken, (ref Utf8JsonReader reader) =>
            {
                if (isArrayStart)
                {
                    ReadArrayStart(ref reader);
                    isArrayStart = false;
                }
                return ReadRecords(ref reader);
            });
        }

        private delegate IEnumerable<T> Parser<T>(ref Utf8JsonReader reader);

        // inspired by https://github.com/scalablecory/system-text-json-samples/blob/master/json-test/JsonParser.ParseSimpleAsync.cs
        private static async IAsyncEnumerable<T> Parse<T>(Stream input, [EnumeratorCancellation] CancellationToken cancellationToken, Parser<T> parser)
        {
            var buffer = new byte[4096];
            var fill = 0;      // number of valid bytes in the buffer
            var consumed = 0;  // number of bytes the reader has already processed
            var done = false;
            var readerState = new JsonReaderState();

            while (!done)
            {
                if (fill == buffer.Length)
                {
                    if (consumed != 0)
                    {
                        // Buffer is full: drop the consumed prefix to make room.
                        buffer.AsSpan(consumed).CopyTo(buffer);
                        fill -= consumed;
                        consumed = 0;
                    }
                    else
                    {
                        // Nothing could be consumed, so a single record exceeds the buffer: grow it.
                        Array.Resize(ref buffer, buffer.Length * 3 / 2);
                    }
                }

                int read = await input.ReadAsync(buffer.AsMemory(fill), cancellationToken).ConfigureAwait(false);
                fill += read;
                done = read == 0;

                foreach (var item in ParseBuffer())
                {
                    yield return item;
                }
            }

            // Utf8JsonReader is a ref struct, so it is confined to this synchronous local function.
            IEnumerable<T> ParseBuffer()
            {
                var reader = new Utf8JsonReader(buffer.AsSpan(consumed, fill - consumed), done, readerState);
                var result = parser(ref reader);
                consumed += (int)reader.BytesConsumed;
                readerState = reader.CurrentState;
                return result;
            }
        }

        private static void ReadArrayStart(ref Utf8JsonReader reader)
        {
            if (!reader.Read())
            {
                throw new BadRequestException("Unexpected EOF");
            }

            // Skip comments. Skip() does not advance past a Comment token and throws
            // on a non-final block, so Read() is used to move to the next token.
            while (reader.TokenType == JsonTokenType.Comment)
            {
                if (!reader.Read())
                {
                    throw new BadRequestException("Unexpected EOF");
                }
            }

            if (reader.TokenType != JsonTokenType.StartArray)
            {
                throw new BadRequestException($"Expect JSON array, but got {reader.TokenType}");
            }
        }

        private static IEnumerable<Record> ReadRecords(ref Utf8JsonReader reader)
        {
            var records = new List<Record>();
            while (true)
            {
                // Snapshot taken before the next token, so that an incomplete record can be rolled back.
                var savePoint = reader;

                if (!reader.Read())
                {
                    // Out of buffered data. This is an error only if the stream has ended mid-array.
                    if (reader.IsFinalBlock && reader.TokenType != JsonTokenType.EndArray)
                    {
                        throw new BadRequestException("Unexpected EOF");
                    }
                    break;
                }

                if (reader.TokenType == JsonTokenType.EndArray)
                {
                    break;
                }

                if (reader.TokenType != JsonTokenType.StartObject)
                {
                    throw new BadRequestException($"Expect {JsonTokenType.StartObject}, but got {reader.TokenType}");
                }

                var record = ReadRecord(ref reader);
                if (record == null)
                {
                    // The record is not fully buffered yet. Rewind to before its StartObject
                    // so that it is parsed again from the start once more data has arrived.
                    reader = savePoint;
                    break;
                }

                records.Add(record);
            }
            return records;
        }

        private static Record ReadRecord(ref Utf8JsonReader reader)
        {
            try
            {
                var savePoint = reader;
                var result = JsonSerializer.Deserialize<Dictionary<string, object>>(ref savePoint);
                reader = savePoint;
                return result;
            }
            catch (JsonException)
            {
                // Treated as "record not complete in this buffer"; malformed JSON also ends up here.
                return null;
            }
        }
    }
}
```
It reuses the idea from https://github.com/scalablecory/system-text-json-samples/blob/master/json-test/JsonParser.ParseSimpleAsync.cs.
Also, if you target .NET 3+, you might have to implement a JsonConverter for Dictionary<string, object> because of https://github.com/dotnet/runtime/issues/1573.
@alexandrvslv, can you please file a separate issue with a simplified repro test app of the issue you were seeing? At first glance, this seems like a bug, and you shouldn't need to increase the DefaultBufferSize to fix it. It would be good for us to understand what your JSON payload looks like, your custom JsonConverter implementation, and the root cause of the issue you were seeing (also include your TFM or STJ package version).
@ahsonkhan, I will try to reproduce the issue. It may take some time to implement a test that serializes to JSON with a custom formatter, processes the data through a pipe, and deserializes it with a custom parser.
I have made a lot of changes since Feb 2020; even the model is simplified and the transfer is separated into sub-requests.