Runtime: Do loop unrolling for slow span copyto (slow path)

Created on 14 Mar 2017  Â·  20Comments  Â·  Source: dotnet/runtime

area-System.Memory easy enhancement up-for-grabs

Most helpful comment

@WinCPP, I opened Issue dotnet/corefx#18242 just for tracking purposes.

You can submit a fix separate or with this issue if you want.

All 20 comments

@shiftylogic can you please provide some hints for every up-for-grabs? Where to start, estimated complexity. Thanks!

Where to start
The implementation of Span.CopyTo method uses simple for / while loops for the slow path (which is used for any overlapping span structures or spans containing reference types). There are 2 loops in the CopyTo method (one forwards and one backwards depending on how the structures are overlapped).

For a reference on what loop unrolling might look like, have a peek at the Span.Fill implementation.

Time / Complexity
This is likely a small task (maybe a couple hours with testing). For testing, ensure that portable span is being used. With this work item, it might be useful to add a few more tests to the test suite as well to make sure all the overlapped scenarios are covered.

Marking as 'easy' (for new contributors) - @shiftylogic please adjust if you disagree.

May I :-) ?

@WinCPP Sure. Just submit a PR when you are ready.

Assigning to @WinCPP to capture he is working on it ...

I have started working on this now...

@shiftylogic One quick question about the spans in general and the logic in CopyTo (hope I don't sound stupid)... given the logic in CopyTo at this line mentioned below...

```C#
uint blockSize = byteCount > uint.MaxValue ? uint.MaxValue : (uint)byteCount;

Is it that the memory is always allocated in multiples of uint.MaxValue size? Sounds too large though... I was not able to understand the logic in while loop. Assuming `byteCount` were `uint.MaxValue + 256 bytes`, in first iteration it would copy `uint.MaxValue` number of bytes and in second iteration it should copy just 256 bytes. But as per the above logic, since `byteCount` didn't change inside the loop (can't change because it is also participating in exit condition), would `blockSize` not be `uint.MaxValue` again resulting into buffer overrun in the `CopyBlock`.

I think that the intention of using `(uint)byteCount` in the `else` part of `? :` was legit but just that it is not reachable...? Perhaps following was intended?

```C#
   uint blockSize = (byteCount - index) > uint.MaxValue ? uint.MaxValue : (byteCount - index);

Alternatively, declare blockSize as ulong outside loop and initializing it to byteCount and then use it inside loop...?

```C#
// Before while
ulong blockSize = byteCount;

while (index < byteCount)
{
blockSize = blockSize > uint.MaxValue ? uint.MaxValue : (uint)byteCount;
// rest of the logic...
}
```
I am confused here... Appreciate inputs.

This looks like a legitimate bug. The test cases for this are not part of CI, mostly because it requires allocating a very large chunk of memory to hit. The test cases I used for testing the functionality universally used a multiple of MaxValue, so we never hit the bad case.

Nice catch.

@shiftylogic. Thanks for the feedback. Should I fix it in this issue or we want to create separate issue?

@WinCPP, I opened Issue dotnet/corefx#18242 just for tracking purposes.

You can submit a fix separate or with this issue if you want.

Nice one @WinCPP

An unrelated question about Fill method... would it be an optimization if the primary fill loop ran with segments increasing as power of 2 until the filled size >= (total length / 2), and then copy remainder in single shot from the already filled segment of the span; both using Unsafe.CopyBlock? Such an algorithm would fill faster than running the primary fill loop as multiple of 8 which still assigns values one-by-one. The example below is simple one to illustrate what I want to convey, but real algorithm may have to balance out the profitability of single assignments in 8-jump loop against that of exponentially filling path which involves multiple more instructions due to extra logic within CopyBlock, but may turn out to be be more profitable for initializing large size contiguous memory segments. All this would be possible because we are initializing each element with same value. Just thought of sharing...

Between, why does Span.CopyTo take slow path if the type is a reference or contains references...?

E.g. Consider initialization of 46 elements as below where the numbers at end of each line indicate the counts after the step has been taken.

I = Increase
T = Total filled
^ = source marker for the iteration
| = fill place marker for the iteration
. = empty
0 = filled

0         1         2         3         4         5         6
0123456789012345678901234567890123456789012345678901234567890123456789
0............................................. I = 1   T = 1

 |
0............................................. I = 1   T = 2
^
  ||
00............................................ I = 2   T = 4
^^
    ||||
0000.......................................... I = 4   T = 8
^^^^
        ||||||||
00000000...................................... I = 8   T = 16
^^^^^^^^
                ||||||||||||||||
0000000000000000.............................. I = 16  T = 32
^^^^^^^^^^^^^^^^
                                ||||||||||||||
00000000000000000000000000000000.............. I = 14  T = 46
^^^^^^^^^^^^^^

would it be an optimization if the primary fill loop ran with segments increasing as power of 2 until the filled size

This was discussed in length in https://github.com/dotnet/coreclr/pull/8498 that had implementation very much like what you are proposing. The summary is: It is guaranteed to be slower for large number of elements because of processor cache polution. It may be faster for certain types of T, for some ranges of elements but figuring out the formula for that is rocket science. Not worth it.

@jkotas Thanks for giving the insight and sharing the issue dotnet/SqlClient#101. That answers my question about references as well! Just to summarize my understanding w.r.t. references... to avoid pinning, we use item by item copy, hence the slow path...

to avoid pinning, we use item by item copy

It is not done to avoid pinning. References need GC bookeping, and the item by item copy is the only way to achieve that in the general case.

Hi All! I'm stuck. Even with syntax errors in the file, the compilation succeeds. I worked on the same file for dotnet/corefx#18242 and things used to compile. I am not able to figure out what is happening.

I tried doing msbuild /t:Clean followed by msbuild in folder .\src\System.Memory (with purposeful syntax errors) but no use. I am not sure if the SpanHelpers.cs file is even being picked up. Also there is no build log file in .\src\System.Memory. Any directions are much appreciated! Thanks!

If it helps, these are the few lines after I fire msbuild.

M:\corefx\src\System.Memory [Issue-17118 ↑80 +0 ~1 -0 !]> msbuild
Microsoft (R) Build Engine version 14.0.25420.1
Copyright (C) Microsoft Corporation. All rights reserved.

Building the projects in this solution one at a time. To enable parallel build, please add the "/m" switch.
Build started 4/15/2017 9:41:56 PM.
Project "M:\corefx\src\System.Memory\System.Memory.sln" on node 1 (default targets).
ValidateSolutionConfiguration:
  Building solution configuration "Debug|Any CPU".
The target "ExpandProjectReferences" listed in a BeforeTargets attribute at "M:\corefx\buildvertical.targets (42,11)" does not exist in the project, and will be ignored.
The target "Sync" listed in a BeforeTargets attribute at "M:\corefx\Tools\codeOptimization.targets (85,11)" does not exist in the project, and will be ignored.
The target "BuildAllProjects" listed in a BeforeTargets attribute at "M:\corefx\Tools\versioning.targets (218,11)" does not exist in the project, and will be ignored.
The target "Sync" listed in a BeforeTargets attribute at "M:\corefx\Tools\OptionalTooling.targets (13,11)" does not exist in the project, and will be ignored.
Project "M:\corefx\src\System.Memory\System.Memory.sln" (1) is building "M:\corefx\src\System.Memory\tests\System.Memory.Tests.csproj.metaproj" (2) on node 1 (default targets).
Project "M:\corefx\src\System.Memory\tests\System.Memory.Tests.csproj.metaproj" (2) is building "M:\corefx\src\System.Memory\src\System.Memory.csproj.metaproj" (3) on node 1 (default targets).
Project "M:\corefx\src\System.Memory\src\System.Memory.csproj.metaproj" (3) is building "M:\corefx\src\System.Memory\ref\System.Memory.csproj" (4) on node 1 (default targets).
CoreCompile:
  C:\Program Files (x86)\MSBuild\14.0\bin\csc.exe /noconfig /unsafe+ /nowarn:1701,1702 /nostdlib+ /warn:4 /define:DEBUG;TRACE;DEBUGRESOURCES;SIGNED /reference:M:\corefx\bin/ref/netstandard/netstandard.dll /debug+ /debug:full /delaysign+
   /keyfile:M:\corefx\Tools/Open.snk /optimize- /out:M:\corefx\bin/obj/ref/System.Memory/4.0.0.0/netstandard/System.Memory.dll /ruleset:M:\corefx\CodeAnalysis.ruleset /target:library /warnaserror+ /utf8output /analyzer:M:\corefx\Tools/n
  et46/analyzers/Microsoft.DotNet.CodeAnalysis.dll System.Memory.cs M:\corefx\bin/obj/ref/System.Memory/4.0.0.0/netstandard/_AssemblyInfo.cs M:\corefx\Tools\/BlockReflectionAttribute.cs @M:\corefx\Tools\checksum.rsp
  Using shared compilation with compiler from directory: M:\corefx\Tools\net46\roslyn\tools
OpenSourceSign:
  Creating "M:\corefx\bin/obj/ref/System.Memory/4.0.0.0/netstandard/System.Memory.dll.oss_signed" because "AlwaysCreate" was specified.
CopyFilesToOutputDirectory:
  Copying file from "M:\corefx\bin/obj/ref/System.Memory/4.0.0.0/netstandard/System.Memory.dll" to "M:\corefx\bin/ref/System.Memory/4.0.0.0/netstandard/System.Memory.dll".
  System.Memory -> M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.dll
  Copying file from "M:\corefx\bin/obj/ref/System.Memory/4.0.0.0/netstandard/System.Memory.pdb" to "M:\corefx\bin/ref/System.Memory/4.0.0.0/netstandard/System.Memory.pdb".
BinPlaceFiles:
  Creating hard link to copy "M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.dll" to "M:\corefx\bin/ref/netcoreapp/System.Memory.dll".
  Creating hard link to copy "M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.pdb" to "M:\corefx\bin/ref/netcoreapp/System.Memory.pdb".
  Creating hard link to copy "M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.dll" to "M:\corefx\bin/ref/netstandard/System.Memory.dll".
  Creating hard link to copy "M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.pdb" to "M:\corefx\bin/ref/netstandard/System.Memory.pdb".
Done Building Project "M:\corefx\src\System.Memory\ref\System.Memory.csproj" (default targets).

Don't know if I should suspect these subset of lines from above... where System.Memory.dll is being copied from netstandard... I haven't changed anything in my sln / csproj files...

CopyFilesToOutputDirectory:
  Copying file from "M:\corefx\bin/obj/ref/System.Memory/4.0.0.0/netstandard/System.Memory.dll" to "M:\corefx\bin/ref/System.Memory/4.0.0.0/netstandard/System.Memory.dll".
  System.Memory -> M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.dll
  Copying file from "M:\corefx\bin/obj/ref/System.Memory/4.0.0.0/netstandard/System.Memory.pdb" to "M:\corefx\bin/ref/System.Memory/4.0.0.0/netstandard/System.Memory.pdb".
BinPlaceFiles:
  Creating hard link to copy "M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.dll" to "M:\corefx\bin/ref/netcoreapp/System.Memory.dll".
  Creating hard link to copy "M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.pdb" to "M:\corefx\bin/ref/netcoreapp/System.Memory.pdb".
  Creating hard link to copy "M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.dll" to "M:\corefx\bin/ref/netstandard/System.Memory.dll".
  Creating hard link to copy "M:\corefx\bin\ref\System.Memory\4.0.0.0\netstandard\System.Memory.pdb" to "M:\corefx\bin/ref/netstandard/System.Memory.pdb".
Done Building Project "M:\corefx\src\System.Memory\ref\System.Memory.csproj" (default targets).

The CopyTo implementation that you are fixing is for "slow Span". "slow Span" is not used in netcoreapp. The best way to test it is using .NET Framework: First build from the root using build.cmd -framework:netfx, and then specify TargetGroup=netfx when testing your changes, like src\System.Memory\tests>msbuild /p:TargetGroup=netfx /t:BuildAndTest.

Thanks @jkotas. It helped!

@shiftylogic @jkotas I've filed PR dotnet/corefx#18435. Kindly check.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

noahfalk picture noahfalk  Â·  3Comments

nalywa picture nalywa  Â·  3Comments

matty-hall picture matty-hall  Â·  3Comments

aggieben picture aggieben  Â·  3Comments

Timovzl picture Timovzl  Â·  3Comments