I noticed a large performance difference between Utf8Parser and the UTF-16 parsers (Guid.TryParseExact and TimeSpan.TryParseExact) for Guid and TimeSpan.
Code is in the details at the end.
TimeSpan
For format 'c' of TimeSpan, the UTF-16 parser is close to the UTF-8 one (about 114 ns vs. 97 ns). For the other two formats, 'G' and 'g', it is roughly 10 to 16 times slower.
Guid
Guid suffers from similar problems. Looking into Guid.cs, I see that the checks for invalid symbols are fairly inefficient.
I guess that's visible in the benchmark below too, but it does not appear to be the bulk of the difference (as 'B' and 'P' are still twice as slow as the UTF8 version).
If I didn't mess up the benchmarks too much (again, both @ahsonkhan and @stephentoub fixed my bad benchmark last time), it might be useful to port the utf8 code to the utf16 code base.
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-4790K CPU 4.00GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
Frequency=3906246 Hz, Resolution=256.0003 ns, Timer=TSC
.NET Core SDK=2.1.300
[Host] : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
| Method | Mean | Error | StdDev |
|--------------------------------- |----------:|----------:|----------:|
| UTF16_Guid_TryParse_Format_D | 215.37 ns | 1.5212 ns | 1.2702 ns |
| UTF8_Guid_TryParse_Format_D | 86.62 ns | 0.3426 ns | 0.3205 ns |
| UTF16_Guid_TryParse_Format_N | 281.59 ns | 4.3998 ns | 4.1156 ns |
| UTF8_Guid_TryParse_Format_N | 69.79 ns | 0.1989 ns | 0.1763 ns |
| UTF16_Guid_TryParse_Format_B | 202.41 ns | 1.8355 ns | 1.6271 ns |
| UTF8_Guid_TryParse_Format_B | 90.40 ns | 0.1929 ns | 0.1710 ns |
| UTF16_Guid_TryParse_Format_P | 203.18 ns | 0.1306 ns | 0.1158 ns |
| UTF8_Guid_TryParse_Format_P | 90.31 ns | 0.4293 ns | 0.4015 ns |
| UTF16_TimeSpan_TryParse_Format_c | 113.62 ns | 0.0920 ns | 0.0816 ns |
| UTF8_TimeSpan_TryParse_Format_c | 96.62 ns | 0.1512 ns | 0.1415 ns |
| UTF16_TimeSpan_TryParse_Format_G | 934.01 ns | 8.0425 ns | 7.5230 ns |
| UTF8_TimeSpan_TryParse_Format_G | 57.75 ns | 0.0955 ns | 0.0893 ns |
| UTF16_TimeSpan_TryParse_Format_g | 923.88 ns | 0.9195 ns | 0.7678 ns |
| UTF8_TimeSpan_TryParse_Format_g | 95.00 ns | 0.1921 ns | 0.1797 ns |
using System;
using System.Buffers.Text;
using System.Globalization;
using System.Text;
using BenchmarkDotNet.Attributes;

public class ParserBenchmark
{
    // Inputs are created once so that only parsing is measured.
    private static readonly Guid Guid = Guid.NewGuid();
    private static readonly string GuidStringD = Guid.ToString("D");
    private static readonly string GuidStringN = Guid.ToString("N");
    private static readonly string GuidStringB = Guid.ToString("B");
    private static readonly string GuidStringP = Guid.ToString("P");
    private static readonly byte[] GuidBytesD = Encoding.UTF8.GetBytes(GuidStringD);
    private static readonly byte[] GuidBytesN = Encoding.UTF8.GetBytes(GuidStringN);
    private static readonly byte[] GuidBytesB = Encoding.UTF8.GetBytes(GuidStringB);
    private static readonly byte[] GuidBytesP = Encoding.UTF8.GetBytes(GuidStringP);

    private static readonly TimeSpan TimeSpan = TimeSpan.MinValue;
    private static readonly string TimeSpanStringc = TimeSpan.ToString("c", CultureInfo.InvariantCulture);
    private static readonly string TimeSpanStringG = TimeSpan.ToString("G", CultureInfo.InvariantCulture);
    private static readonly string TimeSpanStringg = TimeSpan.ToString("g", CultureInfo.InvariantCulture);
    private static readonly byte[] TimeSpanBytesc = Encoding.UTF8.GetBytes(TimeSpanStringc);
    private static readonly byte[] TimeSpanBytesG = Encoding.UTF8.GetBytes(TimeSpanStringG);
    private static readonly byte[] TimeSpanBytesg = Encoding.UTF8.GetBytes(TimeSpanStringg);

    // Guid: UTF-16 (Guid.TryParseExact) vs. UTF-8 (Utf8Parser.TryParse) per format.
    [Benchmark]
    public Guid UTF16_Guid_TryParse_Format_D()
    {
        Guid.TryParseExact(GuidStringD, "D", out var result);
        return result;
    }

    [Benchmark]
    public Guid UTF8_Guid_TryParse_Format_D()
    {
        Utf8Parser.TryParse(GuidBytesD, out Guid result, out _, 'D');
        return result;
    }

    [Benchmark]
    public Guid UTF16_Guid_TryParse_Format_N()
    {
        Guid.TryParseExact(GuidStringN, "N", out var result);
        return result;
    }

    [Benchmark]
    public Guid UTF8_Guid_TryParse_Format_N()
    {
        Utf8Parser.TryParse(GuidBytesN, out Guid result, out _, 'N');
        return result;
    }

    [Benchmark]
    public Guid UTF16_Guid_TryParse_Format_B()
    {
        Guid.TryParseExact(GuidStringB, "B", out var result);
        return result;
    }

    [Benchmark]
    public Guid UTF8_Guid_TryParse_Format_B()
    {
        Utf8Parser.TryParse(GuidBytesB, out Guid result, out _, 'B');
        return result;
    }

    [Benchmark]
    public Guid UTF16_Guid_TryParse_Format_P()
    {
        Guid.TryParseExact(GuidStringP, "P", out var result);
        return result;
    }

    [Benchmark]
    public Guid UTF8_Guid_TryParse_Format_P()
    {
        Utf8Parser.TryParse(GuidBytesP, out Guid result, out _, 'P');
        return result;
    }

    // TimeSpan: UTF-16 (TimeSpan.TryParseExact) vs. UTF-8 (Utf8Parser.TryParse) per format.
    [Benchmark]
    public TimeSpan UTF16_TimeSpan_TryParse_Format_c()
    {
        TimeSpan.TryParseExact(TimeSpanStringc, "c", CultureInfo.InvariantCulture, out var result);
        return result;
    }

    [Benchmark]
    public TimeSpan UTF8_TimeSpan_TryParse_Format_c()
    {
        Utf8Parser.TryParse(TimeSpanBytesc, out TimeSpan result, out _, 'c');
        return result;
    }

    [Benchmark]
    public TimeSpan UTF16_TimeSpan_TryParse_Format_G()
    {
        TimeSpan.TryParseExact(TimeSpanStringG, "G", CultureInfo.InvariantCulture, out var result);
        return result;
    }

    [Benchmark]
    public TimeSpan UTF8_TimeSpan_TryParse_Format_G()
    {
        Utf8Parser.TryParse(TimeSpanBytesG, out TimeSpan result, out _, 'G');
        return result;
    }

    [Benchmark]
    public TimeSpan UTF16_TimeSpan_TryParse_Format_g()
    {
        TimeSpan.TryParseExact(TimeSpanStringg, "g", CultureInfo.InvariantCulture, out var result);
        return result;
    }

    [Benchmark]
    public TimeSpan UTF8_TimeSpan_TryParse_Format_g()
    {
        Utf8Parser.TryParse(TimeSpanBytesg, out TimeSpan result, out _, 'g');
        return result;
    }
}
Thanks. I've not yet reviewed your benchmark, but in general a lot of attention was paid to the performance of the new parsers/formatters, and not all of that work was ported back to the original parsers/formatters... but should be.
@joshfree, @ahsonkhan, it'd be great if the remaining work there could be catalogued and either done or issues opened so that others can tackle it.
Here is the list of all UTF8 formatters and parsers. We should go through each of them and port any applicable optimizations to the CoreLib UTF16 ones:
Formatters:
Parsers:
Misc:
To add some support for this, in my test benchmarks I found it to be twice as fast to copy ASCII bytes from a string to a byte array and then call Utf8Parser.TryParse instead of just calling int.TryParse directly on the strings.
Can you share those benchmarks? What inputs? What version of .NET Core? (There's definitely significant room for improvement; seeing your benchmarks will help whoever works on improving it.)
Thanks.
Oops, I left out the link: https://gist.github.com/Zhentar/07b92a52c619641ab61aab50b1e5ec91
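For readers who do not open the gist, the approach described above (copy the ASCII characters of a string into a byte buffer, then call Utf8Parser.TryParse) might look roughly like the sketch below. This is not the code from the linked gist. The class and method names and the 32-character cap are assumptions made for illustration. Note also that Utf8Parser is not a drop-in replacement for int.TryParse: it ignores culture and may treat whitespace and other edge cases differently.

```csharp
using System;
using System.Buffers.Text;

// Hypothetical helper for illustration; not part of any library.
public static class AsciiParseSketch
{
    public static bool TryParseInt32ViaUtf8(string text, out int value)
    {
        value = 0;

        // Assumption: cap the input length so the stack buffer stays small.
        // Longer inputs are rejected rather than parsed.
        if (string.IsNullOrEmpty(text) || text.Length > 32)
        {
            return false;
        }

        Span<byte> buffer = stackalloc byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c > 0x7F)
            {
                return false; // not ASCII, so a plain narrowing copy would be wrong
            }
            buffer[i] = (byte)c;
        }

        // Require that the whole input was consumed, not just a numeric prefix.
        return Utf8Parser.TryParse(buffer, out value, out int consumed)
            && consumed == buffer.Length;
    }
}
```

Called as `AsciiParseSketch.TryParseInt32ViaUtf8("12345", out int n)`, this goes through the UTF-8 code path instead of int.TryParse. Whether it is actually faster depends on the runtime version and the inputs, which is what the benchmarks in the gist measure.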
> Timespan: For Format 'c' of Timespan the performance of Utf16 is pretty much equal to Utf8, for the other two formats it's quite different.
Those other formats are culture-sensitive with TimeSpan.ToString/TryFormat, but Utf8Formatter ignores culture; while I'm sure there are improvements that can/should be made to TimeSpan.ToString/TryFormat for those formats, it needs to continue to respect the current culture, which incurs cost.
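A small sketch of the culture dependence mentioned above: it formats the same TimeSpan with 'c' and 'G' under two cultures so the outputs can be compared. The class name, the helper method, and the choice of "en-US" and "fr-FR" are assumptions for illustration. The output depends on the culture data available on the machine, and in globalization-invariant mode both cultures may format identically.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class for illustration only.
public static class TimeSpanCultureSketch
{
    // Formats one value with the given standard format under the named culture.
    public static string Format(TimeSpan value, string format, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return value.ToString(format, culture);
    }

    public static void Main()
    {
        TimeSpan value = TimeSpan.FromMilliseconds(1500);

        foreach (string name in new[] { "en-US", "fr-FR" })
        {
            // 'c' is the constant (culture-insensitive) format;
            // 'G' uses the culture's decimal separator for fractional seconds.
            Console.WriteLine(
                $"{name}: c={Format(value, "c", name)}  G={Format(value, "G", name)}");
        }
    }
}
```

The 'c' column should be identical for both cultures, while the 'G' column can differ in the fractional-seconds separator. That lookup is work the culture-agnostic Utf8Parser/Utf8Formatter paths never have to do.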
@pentp, is there anything relevant from the decimal Utf8Parser/Formatter support to port over to coreclr, or should we check those off as well?
I don't think there's anything special about decimal in Utf8Parser, it uses the general TryParseNumber method which is in Utf8Parser.Number.cs.
The Utf8Formatter part is more involved, I don't know if it's faster or not, so probably needs some investigation at least.
Ok, thanks.
Will not make 3.0
Most of them did. I'd be ok closing this at this point.