Runtime: Performance regression in ByteMark.BenchIDEAEncryption

Created on 1 Sep 2020  路  15Comments  路  Source: dotnet/runtime

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --filter ByteMark.BenchIDEAEncryption

By looking at the full historical data it looks that this particular regression has been introduced in the last few weeks

obraz

And it's not related to memory or GC:

| Method | Runtime | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |-------------- |-----------:|--------:|--------:|-----------:|-----------:|-----------:|------:|------:|------:|------:|----------:|
| BenchIDEAEncryption | .NET Core 3.1 | 949.9 ms | 3.77 ms | 3.53 ms | 949.2 ms | 942.4 ms | 955.5 ms | 1.00 | - | - | - | 610.2 KB |
| BenchIDEAEncryption | .NET Core 5.0 | 1,452.8 ms | 3.59 ms | 3.36 ms | 1,452.8 ms | 1,446.5 ms | 1,458.9 ms | 1.53 | - | - | - | 610.2 KB |

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@DrewScoggins this regression did not get detected by the bot. I am not sure why, perhaps it's too fresh?

cc @AndyAyersMS it might be some interesting low-level regression I guess?

area-CodeGen-coreclr tenet-performance

Most helpful comment

Note this isn't benchmarking anything in System.Security - its benchmarking a hand-rolled implementation of the IDEA algorithm.

All 15 comments

Tagging subscribers to this area: @bartonjs, @vcsjones, @krwq
See info in area-owners.md if you want to be subscribed.

I assume this is Windows specific @adamsitnik ?

Note this isn't benchmarking anything in System.Security - its benchmarking a hand-rolled implementation of the IDEA algorithm.

I assume this is Windows specific @adamsitnik ?

@danmosemsft It looks like it's not Windows-specific

The Ubuntu historical data

obraz

It looks like it's not Windows-specific

Ah, good info, I guess that means we can exclude machine configuration/the crypto work itself.

I guess that means we can exclude ... the crypto work itself.

It's not using the crypto stack, it's a custom algorithm implementation inside the performance repository just for use as a benchmark. So it's just low level things like array allocation, array read/write, field access, and GC.

https://github.com/dotnet/performance/blob/454476401e17ed7f4d8b899ecf7661eb6cd63bad/src/benchmarks/micro/runtime/Bytemark/ByteMark.cs

https://github.com/dotnet/performance/blob/8aed638c9ee65c034fe0cca4ea2bdc3a68d2a6b5/src/benchmarks/micro/runtime/Bytemark/idea.cs

Disregard, I missed your comment @vcsjones

This is a jit specific benchmark, will relabel as codegen.

History strongly implicates #40535. Wonder if #40871 has any impact here?

cc @briansull @echesakovMSFT @dotnet/jit-contrib

This is known and due to #40535

From comments in #40535

it has many unsafe casts and is a awkward port of the C code:
The regression occurs in
private static void cipher_idea(byte[] xin, byte[] xout, int offset, char[] Z)
Where we inline the MUL and low16 methods, into the inner loop.
This forces x1, x4, t2 and t1 to become GT_LCL_FLD and thus memory accesses.

            MUL(ref x1, Z[idx++]);
            MUL(ref x4, Z[idx++]);
            MUL(ref t2, Z[idx++]);
            MUL(ref t1, Z[idx++]);

// These were macros in the original C code

/* #define low16(x) ((x) & 0x0FFFF) */
private static char low16(int x)
{
    return (char)((x) & 0x0FFFF);
}

/* #define MUL(x,y) (x=mul(low16(x),y)) */
private static void MUL(ref char x, char y)
{
    x = mul(low16(x), y);
}

This regression is due to a necessary correctness fix for Issue #620

This item regressed with the first fix #40535
And the second fix #40871 also has the same regression
In both cases the only code in that has a regression is the ByteMark IDEA benchmark
Since 5.0 is locked down we will investigate this further and fix it in .Net 6.0

This regression is due to a necessary correctness fix for Issue #620

This item regressed with the first fix #40535
And the second fix #40871 also has the same regression

@briansull I don't think your conclusion was correct. I re-ran ByteMark.BenchIDEAEncryption benchmark with the following versions of .NET Core:

1) 3.1.303
2) 5.0.100-rc.1.20454.5 that includes #40535
3) 6.0.0-alpha.1.20453.28 that is based on 09c9d10db3ee63a3719870b5433191b9466fbad5 and includes #40871
4) Private build of #41838 merged onto a916def25585f474712c7e2cbd763977c3c37ff6

The results show that the performance with my fix in #40871 is no worse than 3.1.303. In fact, the performance with the fix is slightly better.

3.1.303

BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.450 (2004/May2020Update/20H1)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.303
  [Host]     : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT
  Job-UVDUWY : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Runtime=.NET Core 3.1  Arguments=/p:DebugType=portable
Toolchain=netcoreapp3.1  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |---------:|---------:|---------:|---------:|---------:|---------:|------:|------:|------:|----------:|
| BenchIDEAEncryption | 952.8 ms | 11.77 ms | 10.43 ms | 955.8 ms | 928.0 ms | 966.4 ms | - | - | - | 610.2 KB |

5.0.100-rc.1.20454.5

BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.450 (2004/May2020Update/20H1)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.1.20454.5
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.45114, CoreFX 5.0.20.45114), X64 RyuJIT
  Job-HDRRWB : .NET Core 5.0.0 (CoreCLR 5.0.20.45114, CoreFX 5.0.20.45114), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Runtime=.NET Core 5.0  Arguments=/p:DebugType=portable
Toolchain=netcoreapp5.0  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |--------:|---------:|---------:|--------:|--------:|--------:|------:|------:|------:|----------:|
| BenchIDEAEncryption | 1.445 s | 0.0110 s | 0.0098 s | 1.444 s | 1.425 s | 1.459 s | - | - | - | 610.2 KB |

6.0.0-alpha.1.20453.28

BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.450 (2004/May2020Update/20H1)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=6.0.100-alpha.1.20454.7
  [Host]     : .NET Core 6.0.0 (CoreCLR 6.0.20.45328, CoreFX 6.0.20.45328), X64 RyuJIT
  Job-YABALU : .NET Core 6.0.0 (CoreCLR 6.0.20.45328, CoreFX 6.0.20.45328), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Runtime=.NET Core 5.0  Arguments=/p:DebugType=portable
Toolchain=netcoreapp5.0  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|------:|------:|----------:|
| BenchIDEAEncryption | 905.3 ms | 6.77 ms | 5.65 ms | 905.5 ms | 894.4 ms | 917.6 ms | - | - | - | 610.2 KB |

#41838 merged onto release/5.0-rc2

BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.450 (2004/May2020Update/20H1)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=6.0.100-alpha.1.20454.7
  [Host]     : .NET Core 6.0.0 (CoreCLR 6.0.20.45328, CoreFX 6.0.20.45328), X64 RyuJIT
  Job-OWYMAW : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  Toolchain=CoreRun
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15
WarmupCount=1

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |---------:|---------:|---------:|---------:|---------:|---------:|------:|------:|------:|----------:|
| BenchIDEAEncryption | 924.3 ms | 13.43 ms | 12.57 ms | 921.3 ms | 900.6 ms | 946.2 ms | - | - | - | 611.04 KB |

If #41838 is merged it will close this issue as well.

Closed by #41838

cc @JulieLeeMSFT

Great that we fixed this issue with 41848.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

chunseoklee picture chunseoklee  路  3Comments

jzabroski picture jzabroski  路  3Comments

sahithreddyk picture sahithreddyk  路  3Comments

btecu picture btecu  路  3Comments

omariom picture omariom  路  3Comments