Runtime: [Perf -16%] System.Collections.CopyTo<String> (6)

Created on 12 Jun 2020  路  18Comments  路  Source: dotnet/runtime

Run Information

Architecture | x64
-- | --
OS | Windows 10.0.18362
Changes | diff

Regressions in System.Collections.CopyTo

Benchmark | Baseline | Test | Test/Base | Modality | Baseline Outlier
-- | -- | -- | -- | -- | --
Span | 593.19 ns | 644.50 ns | 1.09 | | False
Memory | 584.93 ns | 641.29 ns | 1.10 | | False
Array | 493.59 ns | 592.57 ns | 1.20 | | False
ReadOnlySpan | 563.26 ns | 628.13 ns | 1.12 | | False
ReadOnlyMemory | 595.64 ns | 657.34 ns | 1.10 | | False
List | 487.56 ns | 650.31 ns | 1.33 | | False

graph
graph
graph
graph
graph
Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Collections.CopyTo<String>*';

Histogram

System.Collections.CopyTo.Span(Size: 2048)

[524.780 ; 546.348) | @@@
[546.348 ; 573.011) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[573.011 ; 597.536) | @@@@@@
[597.536 ; 627.739) | @
[627.739 ; 654.402) | @@@@@@@@@@@@@@@@@@

System.Collections.CopyTo.Memory(Size: 2048)

[565.045 ; 589.110) | @@@@@@@@@@@@@@@@@@@@
[589.110 ; 614.506) | @@@@@@@@@@@@@@
[614.506 ; 641.785) | @@@@@@@@@
[641.785 ; 677.056) | @@@@@@@@@@@@@@@@

System.Collections.CopyTo.Array(Size: 2048)

[482.924 ; 507.846) | @@@@@@
[507.846 ; 529.187) | @@@@@@@@@@@@@@@@@@@@@@
[529.187 ; 557.054) | @@@@@@@@@@@
[557.054 ; 579.833) | @@@@@@@
[579.833 ; 601.175) | @@@@@@@@@@@@@

System.Collections.CopyTo.ReadOnlySpan(Size: 2048)

[509.340 ; 528.944) | @
[528.944 ; 558.954) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[558.954 ; 591.488) | @@@@@
[591.488 ; 619.556) | @
[619.556 ; 649.566) | @@@@@@@@@@@@@@@@@@

System.Collections.CopyTo.ReadOnlyMemory(Size: 2048)

[522.769 ; 539.054) | @
[539.054 ; 569.696) | @@@@@@@@@@@@@@@@@@@@@@@@@
[569.696 ; 598.854) | @@@@@@@@@@@@@
[598.854 ; 634.010) | @@
[634.010 ; 664.652) | @@@@@@@@@@@@@@@@@
[664.652 ; 686.400) | @

System.Collections.CopyTo.List(Size: 2048)

[463.079 ; 493.989) | @@
[493.989 ; 537.133) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[537.133 ; 560.837) | @
[560.837 ; 603.982) | 
[603.982 ; 626.781) | 
[626.781 ; 669.926) | @@@@@@@@@@@@@@@@@@@

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

arch-x64 area-System.Collections os-windows tenet-performance tenet-performance-benchmarks

Most helpful comment

GC.KeepAlive(new byte[r.Next(32)]); will make string arrays aligned randomly. It assumes that objects are allocated in memory next to each other (internal implementation details of the current allocators).

if (r.Next(10) == 0) GC.Collect(); should move the buffers out of Gen0. Maybe that line can be if (r.Next(10) == 0) for (int i = 0; i < 5; i++) GC.Collect(); to make sure that these buffers are really moved to Gen2.

All 18 comments

Tagging subscribers to this area: @eiriktsarpalis
Notify danmosemsft if you want to be subscribed.

@adamsitnik

Commit range as best I can tell. @DrewScoggins the interactive graph will not set a commit for "before" for the runtime repo -- it just says "0". So I bounded the start by the date/time of that datapoint. Is that right, or am I reading it wrong?

C:\git\runtime>git log --color --graph --pretty=format:"%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset %cD" --abbrev-commit 7b4a9e69a6d77159fcb1462e7db320291699582a...c07be36e3ce00b06e99d02669fd658a813cdce11 src/coreclr/src/vm/** src/libraries/** src/coreclr/src/System.Private.CoreLib/**
* f1dd1d36c77 - Change location of some source files (#1376) (7 months ago) <Adeel Mujahid> Tue, 7 Jan 2020 16:27:29 -0800
* 9de4006e296 - Fix Reading Time Formats with  quotes  from ICU (#1370) (7 months ago) <Tarek Mahmoud Sayed> Tue, 7 Jan 2020 16:10:26 -0800
* 5da85d0c4b6 - Delete Tests.props (#1236) (7 months ago) <Jan Kotas> Tue, 7 Jan 2020 12:15:13 -0800
* 690542e3a9a - Another PAL cleanup (#1359) (7 months ago) <Jan Vorlicek> Tue, 7 Jan 2020 08:21:34 -0800
* 66cb5f9a1bd - New PhysicalAddress.TryParse methods taking span and string (#1057) (7 months ago) <Alexander Nikolaev> Tue, 7 Jan 2020 14:39:46 +0100
* c1a51e2c538 - Make sure default ALC is initialized before resolving satellite assembly (#1032) (7 months ago) <Elinor Fung> Mon, 6 Jan 2020 17:26:10 -0800
* e82f56f2fc4 - Fix broken docs links (#1005) (7 months ago) <Jeff Handley> Mon, 6 Jan 2020 17:17:21 -0800
* 578fe57d906 - PAL unused functions cleanup (#1313) (7 months ago) <Jan Vorlicek> Mon, 6 Jan 2020 16:48:58 -0800
* 1ffda0f0ab9 - Force Windows SDK to be 10.0 (#1331) (7 months ago) <Jan Kotas> Mon, 6 Jan 2020 16:48:03 -0800
* dfed9e24f18 - Fix compMatchedVM by using osType to detect cross target compilation in case of AOT scenario (#1243) (7 months ago) <Alexandre Mutel> Mon, 6 Jan 2020 16:03:14 -0800
* 8cc2aa35933 - Optimize AdjustMaxWorkersActive (#1103) (7 months ago) <Bill Wert> Mon, 6 Jan 2020 14:31:11 -0800
* 6425b2d1c25 - Copy change to XsdDateTime (#1319) (7 months ago) <ts2do> Mon, 6 Jan 2020 16:57:32 -0500
* 5afe409d594 - Remove STREAM_MEMORY_OVERLOADS_AVAILABLE (#1328) (7 months ago) <Eric StJohn> Mon, 6 Jan 2020 12:49:43 -0800
* de6f41e37f0 - CoreFX Renames in System.IO PathInternal.Tests (#1017) (7 months ago) <Erhan Atesoglu> Mon, 6 Jan 2020 13:34:08 -0500
* f07bf3523a2 - Fix typo: availabe -> available (#1324) (7 months ago) <Andrew Arnott> Mon, 6 Jan 2020 10:29:28 -0800
* 92603a6e11c - implement GetScopeId() on Linux (#1038) (7 months ago) <Tomas Weinfurt> Mon, 6 Jan 2020 12:17:50 -0500
* f6d14eeaf29 - Add asserts to catch unintentional use of generic overloads (#1200) (7 months ago) <Jan Kotas> Mon, 6 Jan 2020 09:15:24 -0800
* 25d316cefaf - Move Corelib shared files under Common (#1307) (7 months ago) <Jan Kotas> Mon, 6 Jan 2020 09:07:16 -0800
* 4d3ec15a0e1 - Implement HttpContent and HttpClient cancellation APIs (#686) (7 months ago) <Miha Zupan> Mon, 6 Jan 2020 12:28:15 +0100
* fd5625087f1 - Share compiler config code between libs and coreclr (#1072) (7 months ago) <Adeel Mujahid> Mon, 6 Jan 2020 10:19:58 +0100
* 5b968685ce8 - Consolidate .netcoreapp.cs files because System.Numerics.Vectors project is no longer cross-compiled (#1310) (7 months ago) <Roman Marusyk> Sun, 5 Jan 2020 18:11:45 -0800
* 2c6ec170197 - Fix Environment.UserInteractive on Windows (#1234) (7 months ago) <Dan Moseley> Fri, 3 Jan 2020 23:11:04 -0800
* 5bba5aa36a3 - Removing ArrayTypeDesc (#1201) (7 months ago) <Vladimir Sadov> Fri, 3 Jan 2020 21:53:15 -0800
* 680487ffa4d - Revert "Revert "Support GetCultureInfo with predefinedOnly flag (#654)" (#1257)" (#1261) (7 months ago) <Tarek Mahmoud Sayed> Fri, 3 Jan 2020 15:12:18 -0800
* 4bd7e3accac - Make corelib work in VS (#1284) (7 months ago) <Dan Moseley> Fri, 3 Jan 2020 14:59:58 -0800
* 5e2c61bc2a0 - Improve message for missing C++ workload (#1253) (7 months ago) <Dan Moseley> Fri, 3 Jan 2020 09:29:50 -0800
* cb1e7be56ea - Fix CompareInfo and SQL tests (#1254) (7 months ago) <Tarek Mahmoud Sayed> Fri, 3 Jan 2020 08:47:11 -0800
* f1c7ed882a9 - Move most of CoreLib interop under Common (#1247) (7 months ago) <Jan Kotas> Thu, 2 Jan 2020 20:21:27 -0800
* c3260404f41 - Arm64: Implement AbsoluteCompare* hardware intrinsics (#1223) (7 months ago) <Egor Chesakov> Thu, 2 Jan 2020 16:41:13 -0800
* 4f27799093d - Revert "Support GetCultureInfo with predefinedOnly flag (#654)" (#1257) (7 months ago) <Tarek Mahmoud Sayed> Thu, 2 Jan 2020 15:31:34 -0800
* 5272637ab8a - Support GetCultureInfo with predefinedOnly flag (#654) (7 months ago) <Tarek Mahmoud Sayed> Thu, 2 Jan 2020 15:25:42 -0800
* c4f221c9c31 - Combine pal_compiler.h definitions (#1240) (7 months ago) <Adeel Mujahid> Wed, 1 Jan 2020 17:01:57 -0800

Nothing jumps out there. I guess next step is to profile the scenario for 5.0 vs 3.1 all up. This would also eliminate a configuration/microcode change. @DrewScoggins do we have any way eg more datapoints to bound this better? Also, how can I see the full trend from 3.1 to now?

Reproduces locally. Note that ImmutableArray scenario didn't regress, presumably because nothing is actually copied 馃槃

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.421 (2004/?/20H1)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.1.20401.3
[Host] : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT
Job-LFQMVW : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT
Job-GJJUUG : .NET Core 5.0.0 (CoreCLR 5.0.20.40101, CoreFX 5.0.20.40101), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method | Job | Runtime | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |----------- |-------------- |-------------- |----- |---------:|---------:|---------:|---------:|---------:|---------:|------:|--------:|------:|------:|------:|----------:|
| Array | Job-LFQMVW | .NET Core 3.1 | netcoreapp3.1 | 2048 | 452.9 ns | 4.30 ns | 4.02 ns | 452.1 ns | 446.5 ns | 460.0 ns | 1.00 | 0.00 | - | - | - | - |
| Array | Job-GJJUUG | .NET Core 5.0 | netcoreapp5.0 | 2048 | 494.8 ns | 2.72 ns | 2.27 ns | 493.7 ns | 491.5 ns | 498.6 ns | 1.09 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | | | | | |
| Span | Job-LFQMVW | .NET Core 3.1 | netcoreapp3.1 | 2048 | 496.5 ns | 10.11 ns | 11.65 ns | 499.2 ns | 468.1 ns | 513.5 ns | 1.00 | 0.00 | - | - | - | - |
| Span | Job-GJJUUG | .NET Core 5.0 | netcoreapp5.0 | 2048 | 560.5 ns | 3.50 ns | 3.28 ns | 559.5 ns | 556.1 ns | 568.0 ns | 1.13 | 0.03 | - | - | - | - |
| | | | | | | | | | | | | | | | | |
| ReadOnlySpan | Job-LFQMVW | .NET Core 3.1 | netcoreapp3.1 | 2048 | 472.6 ns | 5.42 ns | 4.80 ns | 471.7 ns | 466.5 ns | 482.5 ns | 1.00 | 0.00 | - | - | - | - |
| ReadOnlySpan | Job-GJJUUG | .NET Core 5.0 | netcoreapp5.0 | 2048 | 554.0 ns | 2.76 ns | 2.58 ns | 553.3 ns | 549.5 ns | 557.8 ns | 1.17 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | | | | | |
| Memory | Job-LFQMVW | .NET Core 3.1 | netcoreapp3.1 | 2048 | 520.6 ns | 6.19 ns | 5.79 ns | 522.0 ns | 504.5 ns | 528.1 ns | 1.00 | 0.00 | - | - | - | - |
| Memory | Job-GJJUUG | .NET Core 5.0 | netcoreapp5.0 | 2048 | 573.6 ns | 6.42 ns | 5.69 ns | 572.9 ns | 563.9 ns | 587.1 ns | 1.10 | 0.02 | - | - | - | - |
| | | | | | | | | | | | | | | | | |
| ReadOnlyMemory | Job-LFQMVW | .NET Core 3.1 | netcoreapp3.1 | 2048 | 489.6 ns | 9.34 ns | 10.76 ns | 492.8 ns | 455.5 ns | 500.1 ns | 1.00 | 0.00 | - | - | - | - |
| ReadOnlyMemory | Job-GJJUUG | .NET Core 5.0 | netcoreapp5.0 | 2048 | 574.8 ns | 3.51 ns | 3.28 ns | 575.3 ns | 569.5 ns | 581.8 ns | 1.18 | 0.03 | - | - | - | - |
| | | | | | | | | | | | | | | | | |
| List | Job-LFQMVW | .NET Core 3.1 | netcoreapp3.1 | 2048 | 444.1 ns | 4.83 ns | 4.52 ns | 442.8 ns | 437.9 ns | 455.7 ns | 1.00 | 0.00 | - | - | - | - |
| List | Job-GJJUUG | .NET Core 5.0 | netcoreapp5.0 | 2048 | 541.4 ns | 4.29 ns | 3.58 ns | 540.7 ns | 536.9 ns | 548.4 ns | 1.22 | 0.02 | - | - | - | - |
| | | | | | | | | | | | | | | | | |
| ImmutableArray | Job-LFQMVW | .NET Core 3.1 | netcoreapp3.1 | 2048 | 656.6 ns | 3.69 ns | 3.45 ns | 655.6 ns | 651.0 ns | 662.6 ns | 1.00 | 0.00 | - | - | - | - |
| ImmutableArray | Job-GJJUUG | .NET Core 5.0 | netcoreapp5.0 | 2048 | 442.7 ns | 3.16 ns | 2.95 ns | 442.3 ns | 437.2 ns | 447.5 ns | 0.67 | 0.00 | - | - | - | - |

Note the Array.Copy regression was in November (https://github.com/dotnet/runtime/issues/39112)

*Memory, *Span use Buffer.Memmove
List, Array use Array.Copy which uses Buffer.BulkMoveWithWriteBarrier not Memmove (since I assume pMT->ContainsGCPointers is true)

@jkotas thoughts about this one, since we were discussing earlier Array.Copy change? Also, why is it OK for Span to always use Memmove?

Also https://github.com/dotnet/runtime/issues/37808 is about a regression (on Ubuntu) for 1000 element Buffer.BlockCopy which seems to be some validation then a call to Buffer.Memmove.

@jkotas thoughts about this one

Memory copying performance depends a lot on alignment of the source and destination arrays. In the case of object references that we have here, it also depends on the generation that the destination array lives in. I do not see the benchmarks controlling for these variables. It makes the benchmark many modal by construction.

Notice that the trend changed in January, but it has several flips to baseline in February. It confirms the hypothesis.

Here is a small program that demonstrates the many modal nature of memory copying:

using System;
using System.Diagnostics;

class Program
{
    static void Work(ReadOnlySpan<string> from, Span<string> to)
    {
        for (int i = 0; i < 1000000; i++) from.CopyTo(to);
    }

    static void Main(string[] args)
    {
        Random r = new Random();
        for (;;)
        {
            var sw = new Stopwatch();
            GC.KeepAlive(new byte[r.Next(32)]);
            var from = new string[2048];
            GC.KeepAlive(new byte[r.Next(32)]);
            var to = new string[2048];
            if (r.Next(10) == 0) GC.Collect();
            sw.Start();
            Work(from, to);       
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}

On my machine, it shows that the performance is oscilating between ~460 and ~800.

What do the GC.KeepAlive(new byte[r.Next(32)]); lines achieve? Is your sample causing either buffer to get promoted?

GC.KeepAlive(new byte[r.Next(32)]); will make string arrays aligned randomly. It assumes that objects are allocated in memory next to each other (internal implementation details of the current allocators).

if (r.Next(10) == 0) GC.Collect(); should move the buffers out of Gen0. Maybe that line can be if (r.Next(10) == 0) for (int i = 0; i < 5; i++) GC.Collect(); to make sure that these buffers are really moved to Gen2.

OK thanks I think next action here for us is to make changes like that in the benchmarks and remeasure 3.1 and 5.0

I agree. This looks like a place where the test is not measuring the right thing.

Also to answer your questions Dan:

These 3.1->5.0 comparisons are weird because they span the coreclr/corefx->runtime switch. As a result I never really built in a way to do commit level comparisons across the repos, but I would say that what you did was correct. It would be hard to go bak and get any more data points to try and bound this, but it would be possible. Going forward we should not have this problem as we have more reliable ways of triggering runs out the installer repo, and running perf tests in the runtime repo directly.

GC.KeepAlive(new byte[r.Next(32)]); will make string arrays aligned randomly.

Nice "trick"! I wonder if BenchmarkDotNet should do something like this between the iterations to try to ensure more random memory alignments (cc @AndreyAkinshin @billwert )

Nice "trick"! I wonder if BenchmarkDotNet should do something like this between the iterations to try to ensure more random memory alignments

Would be an interesting experiment! Perhaps we should take some random sample of benchmarks (1/type?) and see what stability this gives us over repeated runs compared to not doing it?

Does BDN have an opportunity to do this? In this case it needs to influence the alignment of arrays that are allocated within the setup method:

c# [GlobalSetup] public void Setup() { _array = ValuesGenerator.ArrayOfUniqueValues<T>(Size); _list = new List<T>(_array); _immutablearray = Immutable.ImmutableArray.CreateRange(_array); _destination = new T[Size]; }
If I'm understanding correctly, the GC.KeepAlive(new byte[r.Next(32)]); needs to be sprinkled within there. But the allocations are within the benchmark itself perhaps BDN could help as you propose.

@layomia @eiriktsarpalis Should we move this to 6.0.0?

@jeffhandley Based on the comments above I think it makes sense to move.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

sahithreddyk picture sahithreddyk  路  3Comments

jkotas picture jkotas  路  3Comments

matty-hall picture matty-hall  路  3Comments

yahorsi picture yahorsi  路  3Comments

omariom picture omariom  路  3Comments