Runtime: Buffer::BlockCopy may spend too long without GC polling

Created on 9 Oct 2019 · 9 Comments · Source: dotnet/runtime

Buffer::BlockCopy is an FCALL that delegates to the native memmove, which can take a long time depending on how much data is being copied.

If the GC needs to sync with user threads at an inconvenient time, everything stops until the memmove is done: the GC waits for the memmove, and everything else waits for the GC.
There was an actual reported scenario in which GC pauses took up to a minute because of this
(dealing with very large streams, potentially swapped out, ...).

We should either "chunk" large copies into smaller pieces with intermittent GC polling, or just move the whole thing to managed code.

area-VM-coreclr

VSadov
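
The "chunking" option from the issue text can be sketched roughly as follows. This is a minimal illustration, not the runtime's code: `ChunkedMemmove`, `kChunkBytes`, and the `poll` callback (a stand-in for a GC safepoint poll) are hypothetical names, and the chunk size is an arbitrary assumption. The copy direction is chosen so that overlapping buffers still get memmove semantics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

// Hypothetical chunk size, chosen only for illustration.
constexpr std::size_t kChunkBytes = 64 * 1024;

// Copies `len` bytes with memmove semantics, but in pieces of at most `chunk`
// bytes, invoking `poll` between pieces. `poll` stands in for a GC safepoint
// poll; it is an assumption of this sketch, not a runtime API.
inline void ChunkedMemmove(void* dst, const void* src, std::size_t len,
                           const std::function<void()>& poll,
                           std::size_t chunk = kChunkBytes)
{
    assert(chunk > 0);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    if (d == s || len == 0) return;

    if (reinterpret_cast<std::uintptr_t>(d) < reinterpret_cast<std::uintptr_t>(s)) {
        // Destination is below the source: copy front to back, so a chunk
        // never overwrites source bytes that a later chunk still needs.
        for (std::size_t off = 0; off < len; ) {
            const std::size_t n = std::min(chunk, len - off);
            std::memmove(d + off, s + off, n);
            off += n;
            if (off < len) poll();   // give the GC a chance between chunks
        }
    } else {
        // Destination is above the source: copy back to front for the same reason.
        for (std::size_t remaining = len; remaining > 0; ) {
            const std::size_t n = std::min(chunk, remaining);
            remaining -= n;
            std::memmove(d + remaining, s + remaining, n);
            if (remaining > 0) poll();
        }
    }
}
```

With this shape, the longest stretch without a poll is bounded by the time to copy one chunk rather than the whole buffer. A caller would pass whatever polling hook the host provides, for example `ChunkedMemmove(dst, src, len, [] { /* poll here */ });`.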

All 9 comments

move the whole thing to managed code.

This. We have prior art in CoreRT. I will give it a shot

jkotas on 9 Oct 2019

👍5

It would be interesting to compare: in mono we emit the @llvm.memmove intrinsic for Buffer.BlockCopy (and we ask llvm to place safepoints for us)

EgorBo on 9 Oct 2019

and we ask llvm to place safepoints for us

Do you have safepoints inside the loop that @llvm.memmove expands into when the length is not constant?

jkotas on 9 Oct 2019

@jkotas I just checked, and unfortunately we don't. LLVM is able to unroll it for small constants, but otherwise it converts it into a libc call, so the problem remains.

UPD: However, it seems LLVM can expand memmove into loops at the IR level (expandMemMoveAsLoop), and then -place-safepoints should be able to insert safepoint placeholders for us. I will check.

EgorBo on 9 Oct 2019

This. We have prior art in CoreRT. I will give it a shot

The memcpy routine that is distributed with MSVC (C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28202\crt\src\x64\memcpy.asm) currently defines a "large block" as 128 bytes (for non-overlapping buffers) and uses prefetching and non-temporal stores for this scenario.

Is this something that CoreRT was handling?

tannergooding on 9 Oct 2019

Is this something that CoreRT was handling?

That is handled in both CoreCLR and CoreRT. Both delegate to the CRT for blocks over a certain size (with a proper PInvoke frame that avoids the GC starvation problem).

jkotas on 9 Oct 2019

👍2
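
The size-based split described in the comment above can be modeled roughly like this. It is a toy sketch under stated assumptions: `GcMode`, `t_mode`, `PreemptiveScope`, `CopyBlock`, and `kInlineThreshold` are all hypothetical names, and the threshold value is arbitrary. A thread-local flag stands in for the state a PInvoke frame would publish to the GC. A real runtime must also keep the buffers from moving while the thread is in preemptive mode (for example by pinning them), which is not modeled here.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a runtime's per-thread GC mode.
enum class GcMode { Cooperative, Preemptive };
thread_local GcMode t_mode = GcMode::Cooperative;

// RAII guard that marks the current thread as preemptive for its lifetime,
// loosely modeling what a PInvoke transition does. Not a runtime API.
class PreemptiveScope {
public:
    PreemptiveScope() : saved_(t_mode) { t_mode = GcMode::Preemptive; }
    ~PreemptiveScope() { t_mode = saved_; }
    PreemptiveScope(const PreemptiveScope&) = delete;
    PreemptiveScope& operator=(const PreemptiveScope&) = delete;
private:
    GcMode saved_;
};

// Hypothetical cutoff between "copy inline" and "delegate to the CRT".
constexpr std::size_t kInlineThreshold = 2048;

// Copies `len` bytes and returns the mode the copy ran in, so the
// behavior of the sketch is observable.
inline GcMode CopyBlock(void* dst, const void* src, std::size_t len)
{
    if (len <= kInlineThreshold) {
        // Small copy: short enough that the GC is not kept waiting for long.
        std::memmove(dst, src, len);
        return t_mode;
    }
    // Large copy: switch modes first, so a GC does not have to wait
    // for the memmove to finish before it can proceed.
    PreemptiveScope scope;
    std::memmove(dst, src, len);
    return t_mode;
}
```

The design point is that the cost of the mode switch is only paid for large blocks, where the copy itself dominates.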

Keeping this open to fix other places where we do large copies in cooperative mode.

jkotas on 18 Oct 2019

This is fixed for all memory copy variants exposed by the framework now.

We have other similar problems for sure. I have opened dotnet/coreclr#27683 for one found via code review. Also, @adamsitnik is going to run an experiment to see whether BenchmarkDotNet can be used to find these types of issues.

jkotas on 5 Nov 2019