We are working on improving our position in the TechEmpower JSON benchmark.
Our recent changes in the networking stack improved throughput by 20%, but we are slowly reaching the point where we won't be able to optimize it any further, so we are looking for other places that could be improved.
Naturally, one of them can be JSON serialization itself.
As of today, we spend roughly 4.6% of total CPU time on JSON serialization in the JSON benchmark; 1% of CPU time translates to roughly 10,000 requests per second.
@steveharter Could you please take a look at the breakdown below and see if there is anything that we could improve?
The breakdown:
The code itself is super simple; it's more or less:

```C#
private static readonly JsonSerializerOptions SerializerOptions = new JsonSerializerOptions();

using (Utf8JsonWriter utf8JsonWriter = new Utf8JsonWriter(writer.Output))
{
    JsonSerializer.Serialize<JsonMessage>(utf8JsonWriter, new JsonMessage { message = "Hello, World!" }, SerializerOptions);
}

public struct JsonMessage
{
    public string message { get; set; }
}
```
The actual code can be found in the aspnet/benchmarks repo.
The command required to run the benchmark:
```
git clone https://github.com/aspnet/Benchmarks.git
cd benchmarks\src\BenchmarksDriver
dotnet run -- --jobs ..\BenchmarksApps\Kestrel\PlatformBenchmarks\benchmarks.json.json --scenario "JsonPlatform"
```
Every command should contain the addresses of the server and client machines, which you can get from @sebastienros:

```
--server "$secret1" --client "$secret2"
```
If you want to test how your change in System.Text.Json.dll (or any other .dll) affects the performance of the TechEmpower benchmark, you need to pass it to the driver:

```
--output-file "C:\Projects\runtime\artifacts\bin\System.Text.Json\net5.0-Release\System.Text.Json.dll"
```
cc @Jozkee @layomia and @ahsonkhan
If we make the simplifying assumptions that Utf8JsonWriter is as optimized as it can be and that JsonSerializer must use it for its JSON writing, there's still some measurable overhead that could be reduced, but it's not the majority:
```C#
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Buffers;
using System.Text.Json;

[MemoryDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    private static readonly JsonSerializerOptions SerializerOptions = new JsonSerializerOptions();
    private static readonly ArrayBufferWriter<byte> Writer = new ArrayBufferWriter<byte>();

    public struct JsonMessage
    {
        public string message { get; set; }
    }

    [Benchmark]
    public void Serialize()
    {
        Writer.Clear();
        using (var utf8JsonWriter = new Utf8JsonWriter(Writer))
        {
            JsonSerializer.Serialize(utf8JsonWriter, new JsonMessage { message = "Hello, World!" }, SerializerOptions);
        }
    }

    [Benchmark]
    public void Write()
    {
        Writer.Clear();
        using (var utf8JsonWriter = new Utf8JsonWriter(Writer))
        {
            var message = new JsonMessage { message = "Hello, World!" };
            utf8JsonWriter.WriteStartObject();
            utf8JsonWriter.WriteString("message", message.message);
            utf8JsonWriter.WriteEndObject();
        }
    }
}
```
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------- |---------:|--------:|--------:|-------:|------:|------:|----------:|
| Serialize | 199.1 ns | 0.68 ns | 0.60 ns | 0.0229 | - | - | 144 B |
| Write | 126.0 ns | 1.88 ns | 1.67 ns | 0.0191 | - | - | 120 B |
@adamsitnik, you might look at tweaking the benchmark itself to reuse the Utf8JsonWriter, e.g.
```C#
[Benchmark]
public void Serialize1()
{
    Writer.Clear();
    using (var utf8JsonWriter = new Utf8JsonWriter(Writer))
    {
        JsonSerializer.Serialize(utf8JsonWriter, new JsonMessage { message = "Hello, World!" }, SerializerOptions);
    }
}

[ThreadStatic]
private static Utf8JsonWriter t_writer = new Utf8JsonWriter(Writer);

[Benchmark]
public void Serialize2()
{
    Writer.Clear();
    Utf8JsonWriter utf8JsonWriter = t_writer;
    utf8JsonWriter.Reset(Writer);
    JsonSerializer.Serialize(utf8JsonWriter, new JsonMessage { message = "Hello, World!" }, SerializerOptions);
}
```
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |---------:|--------:|--------:|-------:|------:|------:|----------:|
| Serialize1 | 190.9 ns | 0.78 ns | 0.69 ns | 0.0229 | - | - | 144 B |
| Serialize2 | 182.2 ns | 2.46 ns | 2.30 ns | 0.0038 | - | - | 24 B |
Not huge, but there might be ways to trim it further, and every little bit helps :)
@AraHaan
Oh, that reminds me: I am also working on an attribute for JSON classes, similar to JsonIgnore and the other attributes, that allows specifying the serialization options as well.
It seems you're discussing https://github.com/dotnet/runtime/issues/36671. If so, it would be better to add these notes to that issue, otherwise log a new one.
Oh, that also reminds me: why does the Write method not have a means of knowing which property name to serialize to in the JSON?
Can you log an issue with your question, and include code samples if possible?
I'll mark the above comments "Off topic" so that this issue can stay focused on perf discussions.
Comparing the current System.Text.Json serializer with Utf8Json/SpanJson, STJ is still roughly twice as slow, even for the small TE message. See the benchmarks in the details below.
Possible improvements:
- `ArrayBufferWriter` currently allocates 256 bytes by default, while the message is only about 21 bytes; you could reduce the size or track it. SpanJson tracks the size of the buffer for the last serialized message to estimate how big the next message could be. I have no idea what the actual performance difference is; back when I wrote SpanJson it was maybe 5%.
- Precalculate the property name + delimiter: `"message":`. Some of the precalculation might not be useful until source generators are available.
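To illustrate the buffer-tracking idea, here is a minimal sketch. The type and its members are hypothetical (not part of System.Text.Json or SpanJson): it remembers the last payload size and uses it as the initial capacity of the next `ArrayBufferWriter<byte>` instead of always starting from the 256-byte default.

```C#
using System;
using System.Buffers;

// Hypothetical sketch: right-size the output buffer from the previous
// message's length, assuming successive messages have similar sizes.
public sealed class SizeTrackingBufferSource
{
    private int _lastSize = 64; // starting guess before the first message

    public ArrayBufferWriter<byte> Rent()
        => new ArrayBufferWriter<byte>(_lastSize);

    // Track what was actually written so the next Rent() is right-sized.
    public void Return(ArrayBufferWriter<byte> writer)
        => _lastSize = Math.Max(writer.WrittenCount, 16);
}
```

For steady workloads like the TE "Hello, World!" message this avoids over-allocation; for highly variable message sizes a smoothed estimate might work better.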
P.S. I haven't looked into STJ for a while, but back in the day the writer was pretty much on par with the writers of Utf8Json/SpanJson. That was when it was still a ref struct; I guess it has grown since, having received more usability features and a safety harness.
As for the actual writer, looking at the source code of `Utf8JsonWriter`:
https://github.com/dotnet/runtime/blob/6ce2c39f4df40c9acaae4318ee4a61adab86e0ba/src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs#L103
Are Throw Helpers not necessary anymore in 5.0?
In some places they are used; in others they are not:
https://github.com/dotnet/runtime/blob/6ce2c39f4df40c9acaae4318ee4a61adab86e0ba/src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs#L1022-L1025
https://github.com/dotnet/runtime/blob/6ce2c39f4df40c9acaae4318ee4a61adab86e0ba/src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteProperties.String.cs#L425-L428
Maybe fast-path the parts where there is no stream (not sure where the Stream overload is used; afaik not in the benchmarks): move the stream handling into a separate method and handle the IBufferWriter version here. That should reduce the code size of the method, maybe up to the point where it can be inlined:
https://github.com/dotnet/runtime/blob/6ce2c39f4df40c9acaae4318ee4a61adab86e0ba/src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs#L244-L255
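As a sketch of the suggested split (all names here are hypothetical; this is not the actual `Utf8JsonWriter` code), the `IBufferWriter` fast path can stay small enough to inline while the rarely-used `Stream` path is moved into a separate, non-inlined method:

```C#
using System;
using System.Buffers;
using System.Diagnostics;
using System.IO;
using System.Runtime.CompilerServices;

// Hypothetical sketch of splitting the buffer-acquisition path:
// the IBufferWriter branch stays tiny, the Stream branch is kept
// out of the inlined body to shrink the hot method's code size.
public sealed class SplitPathWriter
{
    private readonly IBufferWriter<byte>? _output;
    private readonly Stream? _stream;
    private byte[] _streamBuffer = Array.Empty<byte>();

    public SplitPathWriter(IBufferWriter<byte> output) => _output = output;
    public SplitPathWriter(Stream stream) => _stream = stream;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public Memory<byte> GetMemory(int sizeHint)
    {
        if (_output is not null)
            return _output.GetMemory(sizeHint); // fast path: delegate directly

        return GetMemoryForStream(sizeHint);    // slow path: separate method
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private Memory<byte> GetMemoryForStream(int sizeHint)
    {
        Debug.Assert(_stream is not null);
        // Slow path: grow a private buffer that would later be flushed to the
        // stream (flushing is elided in this sketch).
        if (_streamBuffer.Length < sizeHint)
            _streamBuffer = new byte[Math.Max(sizeHint, 256)];
        return _streamBuffer;
    }
}
```

The `[MethodImpl]` attributes only express intent; whether the JIT actually inlines the fast path still depends on the final method size.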
I'm not sure if anything can be done about the runtime `_options` checking (like indentation and validation), but as far as I can see most of it already has a fast path/slow path split.
```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.264 (2004/?/20H1)
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.100-preview.5.20224.12
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.22404, CoreFX 5.0.20.22404), X64 RyuJIT
  Job-MJKIUC : .NET Core 5.0.0 (CoreCLR 5.0.20.22404, CoreFX 5.0.20.22404), X64 RyuJIT

UnrollFactor=2
```
| Method | Mean | Error | StdDev | Code Size | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------------- |----------:|---------:|---------:|----------:|-------:|------:|------:|----------:|
| SpanJsonSerialize | 74.34 ns | 0.962 ns | 1.251 ns | 453 B | 0.0067 | - | - | 56 B |
| SpanJsonSerializeUnsafe | 64.21 ns | 0.139 ns | 0.123 ns | 1340 B | - | - | - | - |
| SpanJsonWriteMessageDirectly | 52.00 ns | 0.364 ns | 0.340 ns | 2749 B | - | - | - | - |
| SpanJsonWriteMessageDirectlyNameSpan | 59.79 ns | 0.079 ns | 0.066 ns | 2839 B | - | - | - | - |
| SystemTextJsonSerialize | 178.82 ns | 1.572 ns | 1.470 ns | 672 B | 0.0172 | - | - | 144 B |
| SystemTextJsonWriteMessageDirectly | 111.32 ns | 1.342 ns | 1.190 ns | 955 B | 0.0143 | - | - | 120 B |
| Utf8JsonSerialize | 53.18 ns | 0.123 ns | 0.109 ns | 477 B | 0.0067 | - | - | 56 B |
| Utf8JsonSerializeUnsafe | 47.94 ns | 0.383 ns | 0.339 ns | 176 B | - | - | - | - |
| Utf8JsonWriteMessageDirectly | 61.42 ns | 0.059 ns | 0.055 ns | 16325 B | - | - | - | - |
```C#
using System;
using System.Buffers;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Order;
using BenchmarkDotNet.Running;
using SpanJson;
using JsonSerializer = SpanJson.JsonSerializer;

namespace HelloWorldBenchmarks
{
    public class Program
    {
        private static void Main(string[] args)
        {
            BenchmarkRunner.Run<HelloWorldMessageBenchmark>();
        }
    }

    public class MyConfig : ManualConfig
    {
        public MyConfig()
        {
            AddJob(Job.Default.WithUnrollFactor(2));
            AddDiagnoser(MemoryDiagnoser.Default);
            Orderer = new DefaultOrderer(SummaryOrderPolicy.Default, MethodOrderPolicy.Alphabetical);
        }
    }

    [Config(typeof(MyConfig))]
    [DisassemblyDiagnoser]
    public class HelloWorldMessageBenchmark
    {
        public struct JsonMessage
        {
            public string message { get; set; }
        }

        private const string Message = "Hello, World!";
        private static readonly JsonMessage JsonMessageInput = new JsonMessage { message = Message };
        private static readonly JsonSerializerOptions SerializerOptions = new JsonSerializerOptions();
        private static readonly ArrayBufferWriter<byte> Writer = new ArrayBufferWriter<byte>();
        private static readonly byte[] NameByteArray = Encoding.UTF8.GetBytes("\"message\":");

        private static ReadOnlySpan<byte> NameSpan =>
            new byte[] { 0x22, 0x6D, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x22, 0x3A };

        [Benchmark]
        public byte[] SpanJsonSerialize()
        {
            var message = JsonMessageInput;
            return JsonSerializer.Generic.Utf8.Serialize(message);
        }

        [Benchmark]
        public void SpanJsonSerializeUnsafe()
        {
            var message = JsonMessageInput;
            var buffer = JsonSerializer.Generic.Utf8.SerializeToArrayPool(message);
            ArrayPool<byte>.Shared.Return(buffer.Array);
        }

        [Benchmark]
        public void Utf8JsonSerializeUnsafe()
        {
            var message = JsonMessageInput;
            Utf8Json.JsonSerializer.SerializeUnsafe(message);
        }

        [Benchmark]
        public void Utf8JsonSerialize()
        {
            var message = JsonMessageInput;
            Utf8Json.JsonSerializer.Serialize(message);
        }

        [Benchmark]
        public void SpanJsonWriteMessageDirectly()
        {
            using (var jsonWriter = new JsonWriter<byte>(64))
            {
                jsonWriter.WriteUtf8BeginObject();
                jsonWriter.WriteUtf8Verbatim(7306916068917079330UL, 14882);
                jsonWriter.WriteUtf8String(Message);
                jsonWriter.WriteUtf8EndObject();
            }
        }

        [Benchmark]
        public void SpanJsonWriteMessageDirectlyNameSpan()
        {
            using (var jsonWriter = new JsonWriter<byte>(64))
            {
                jsonWriter.WriteUtf8BeginObject();
                jsonWriter.WriteUtf8Verbatim(NameSpan);
                jsonWriter.WriteUtf8String(Message);
                jsonWriter.WriteUtf8EndObject();
            }
        }

        [Benchmark]
        public void Utf8JsonWriteMessageDirectly()
        {
            var buffer = ArrayPool<byte>.Shared.Rent(64);
            var jsonWriter = new Utf8Json.JsonWriter(buffer);
            jsonWriter.WriteBeginObject();
            jsonWriter.WriteRaw(NameByteArray);
            jsonWriter.WriteString(Message);
            jsonWriter.WriteEndObject();
            ArrayPool<byte>.Shared.Return(buffer);
        }

        [Benchmark]
        public void SystemTextJsonSerialize()
        {
            Writer.Clear();
            var message = JsonMessageInput;
            using (var utf8JsonWriter = new Utf8JsonWriter(Writer))
            {
                System.Text.Json.JsonSerializer.Serialize(utf8JsonWriter, message, SerializerOptions);
            }
        }

        [Benchmark]
        public void SystemTextJsonWriteMessageDirectly()
        {
            Writer.Clear();
            var message = JsonMessageInput;
            using (var utf8JsonWriter = new Utf8JsonWriter(Writer))
            {
                utf8JsonWriter.WriteStartObject();
                utf8JsonWriter.WriteString("message", message.message);
                utf8JsonWriter.WriteEndObject();
            }
        }
    }
}
```
On the writer, there are likely several micro-optimizations (as above) that may help a bit. @ahsonkhan has invested a lot in the current design and optimization, so he would be best to cover that.
On the serializer, we are considering a code-gen effort for POCOs but I don't think that will help much for these simple POCO write scenarios unless they include cold-start scenarios where caches haven't been warmed up yet.
Also, to date our serializer benchmarks and subsequent optimizations have not focused specifically on these TechEmpower scenarios, which use a very simple POCO. Plus, we have spent more time and effort on deserialization performance than on serialization. I'll spend some time on this simple POCO serialization scenario and see if anything pops out.
I plan on having a PR out by the weekend that should have some decent gains for small POCOs.
Nice. Thanks, Steve.
The PR will need to wait until next week. I hope to get 13-15% when all is said and done for the "tiny POCO struct" scenario.
Also, does the POCO need to be a struct? Can it be a class? Using a class is actually faster, but the code changes I'm working on bring them closer in perf.
Basic changes:
Also note that STJ is fastest compared to Jil, Json.NET and Utf8Json when serializing more complicated POCOs or those with large collections (but still ~twice as slow as Utf8Json for tiny POCOs). On an existing benchmark, here's the result:
| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------- |---------:|---------:|---------:|---------:|---------:|---------:|--------:|-------:|------:|----------:|
| Jil | 32.63 us | 0.136 us | 0.127 us | 32.65 us | 32.40 us | 32.84 us | 12.3424 | 1.7069 | - | 56.65 KB |
| JSON.NET | 36.48 us | 0.323 us | 0.302 us | 36.48 us | 35.99 us | 36.96 us | 12.3188 | 2.0290 | - | 59.33 KB |
| Utf8Json | 20.33 us | 0.153 us | 0.143 us | 20.34 us | 20.07 us | 20.52 us | 3.9059 | 0.3189 | - | 24.55 KB |
| STJ | 17.57 us | 0.100 us | 0.089 us | 17.56 us | 17.39 us | 17.71 us | 4.0630 | 0.2802 | - | 24.97 KB |
Nice, I wish all the libraries that cannot currently upgrade to STJ could benefit from this. Sadly, a library I use (Discord.Net) relies on things not yet in STJ (or at least not known to be). However, with some help, if the move were at least started on a fork, maybe eventually it could be ported in a fully working way that won't break everything on it. Other than that, I am really happy with what STJ does in my project and with its speed as well. I am just bummed that the library I depend on pulls in Newtonsoft.Json, an extra dependency that I would love to cut out completely but can't yet.
@Tornhoof
Precalculate the propertyname + delimiter: "message":
Yes thanks. The linked PR does that.
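For illustration, the public API already lets callers precompute the escaped, UTF-8-encoded property name via `JsonEncodedText`, which is one way to express the precalculation outside the serializer (the wrapper type below is made up for demonstration):

```C#
using System.Buffers;
using System.Text;
using System.Text.Json;

public static class PreEncodedExample
{
    // Encode (and escape) the property name once, instead of re-encoding
    // "message" on every serialization.
    private static readonly JsonEncodedText MessageName = JsonEncodedText.Encode("message");

    public static string Write(string value)
    {
        var buffer = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(buffer))
        {
            writer.WriteStartObject();
            writer.WriteString(MessageName, value); // no per-call name encoding
            writer.WriteEndObject();
        }
        return Encoding.UTF8.GetString(buffer.WrittenSpan);
    }
}
```

With default options, `PreEncodedExample.Write("Hello, World!")` produces `{"message":"Hello, World!"}`.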
Are Throw Helpers not necessary anymore in 5.0?
In some places they are used, in some others they are not
It looks like the ThrowHelper pattern is not normally used in public entry points. I believe this is so the exception contains a nicer call stack.
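A minimal sketch of the ThrowHelper pattern under discussion (the types and names below are made up for illustration): moving the `throw` into a non-inlined helper keeps the hot method body small, at the cost of an extra frame in the call stack, which is why inline throws may be preferred at public entry points.

```C#
using System;
using System.Runtime.CompilerServices;

public static class ThrowHelperExample
{
    // Hot method: the validation check stays, but the throw is moved out so
    // the method body is small enough for the JIT to inline.
    public static int Double(int value)
    {
        if (value < 0)
            ThrowValueNegative();
        return value * 2;
    }

    // Cold path: never inlined, so its code size doesn't count against the
    // caller's inlining budget.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowValueNegative()
        => throw new ArgumentOutOfRangeException("value");
}
```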
Update on the code-gen front: other than having much faster first-time (de)serialization and less memory consumption, throughput will increase by ~10% on deserialization and ~15% on serialization. We are still determining whether 5.0 will support the code-gen effort or whether that will move to 6.0.