.NET has had access to memory-mapped files for a long time, but using them requires either unsafe pointers, a BinaryReader/BinaryWriter-like API, or a Stream. With the advent of Span&lt;T&gt;, one could access them more easily and pass them directly, without intermediate copies or buffers, to a larger set of APIs.
We would add:
```diff
namespace System.IO.MemoryMappedFiles
{
    public class MemoryMappedFile : IDisposable
    {
+       public MemoryMappedMemoryManager CreateMemoryManager();
+       public MemoryMappedMemoryManager CreateMemoryManager(long offset, int size);
+       public MemoryMappedMemoryManager CreateMemoryManager(long offset, int size, MemoryMappedFileAccess access);
    }

+   public class MemoryMappedMemoryManager : MemoryManager<byte>
+   {
+   }
}
```
Unlike most other Span APIs, we allow passing a `long` offset together with an `int` size in order to work with files larger than 2 GB, which we cannot slice into directly because all Span-related types are `int`-length based. To work with a file larger than 2 GB, call CreateMemoryManager repeatedly with increasing offsets.
```C#
using MemoryMappedFile file = MemoryMappedFile.CreateFromFile("test.txt");
using MemoryMappedMemoryManager manager = file.CreateMemoryManager();
Memory<byte> memory = manager.Memory;
Console.WriteLine(Encoding.UTF8.GetString(memory.Span));
```
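For files larger than 2 GB, the repeated mapping with increasing offsets described above could look roughly like the following sketch against the proposed API (`ProcessWindow` is a hypothetical placeholder for the caller's per-window logic, and `CreateMemoryManager` does not exist yet):

```C#
// Sketch only: CreateMemoryManager is the proposed API, not an existing one.
long fileLength = new FileInfo("big.bin").Length;
using MemoryMappedFile file = MemoryMappedFile.CreateFromFile("big.bin");

const int WindowSize = 1 << 30; // map at most 1 GiB at a time, well under int.MaxValue
for (long offset = 0; offset < fileLength; offset += WindowSize)
{
    int size = (int)Math.Min(WindowSize, fileLength - offset);
    using MemoryMappedMemoryManager manager = file.CreateMemoryManager(offset, size);
    ProcessWindow(manager.Memory.Span); // hypothetical per-window processing
}
```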
## Alternative Designs
We could also add a `string.Create`-like API to `MemoryMappedViewAccessor`, where `MemoryMappedViewAccessor` manages the lifecycle and the design ensures that the Span does not outlive the `MemoryMappedViewAccessor`.
```diff
namespace System.IO.MemoryMappedFiles
{
    public class MemoryMappedViewAccessor : IDisposable
    {
+       public void UseSpan<TState>(TState state, SpanAction<byte, TState> action);
+       public TResult UseSpan<TState, TResult>(TState state, SpanFunc<byte, TState, TResult> func);
    }
}
```
(`SpanFunc` would be a new delegate: the result-returning overload cannot use `SpanAction`, and `Func<>` cannot take a `Span<byte>` parameter.)
which would be used like:
```C#
using MemoryMappedFile file = MemoryMappedFile.CreateFromFile("test.txt");
using MemoryMappedViewAccessor accessor = file.CreateViewAccessor();
accessor.UseSpan((object)null, (span, state) =>
{
    Console.WriteLine(Encoding.UTF8.GetString(span));
});
```
## Risks
Low risk as far as only adding APIs is concerned.
Designs that allow the Span to outlive the MemoryMappedFile (or MemoryMappedViewAccessor) are riskier: the Span could be used after disposal, leading to an access violation on the next access. That's a _substantial_ risk.
Edit: Consider your earlier example, which I've copied again below.
```C#
using MemoryMappedFile file = MemoryMappedFile.CreateFromFile("test.txt");
ReadOnlySpan<byte> span = file.CreateReadOnlySpan();
Console.WriteLine(Encoding.UTF8.GetString(span));
```
If you forget to put the _using_ keyword in front of that, your application could AV, due to a single missing keyword. Nobody reviewing this code would have any indication that they're entering a world of manually managed memory. See also https://github.com/dotnet/runtime/issues/33768, which any UnmanagedMemoryAccessor that returns a Memory&lt;T&gt; or Span&lt;T&gt; instance would immediately run afoul of.
@GrabYourPitchforks An alternative design that ensures the Span does not outlive the MemoryMappedFile would be similar to `string.Create`; I've added that to Alternative Designs.
Now that you mention it though, MemoryManager&lt;T&gt; is the type I had in mind for this design but couldn't remember.
@GrabYourPitchforks I've revised the proposal to use MemoryManager and removed the variants that seem too dangerous. MemoryManager does not alleviate the danger of an access violation, though, I presume? I've only read about it in passing.
Your "alternative design" callback-based proposal doesn't run the same risk of AVs as returning the span directly. The implementation of MemoryMappedFile can ensure that the span remains alive during the duration of the callback.
The MemoryManager part would still remain dangerous since it's a wrapper around unmanaged memory. At that point, I'm not opposed to forcing the caller to drop down to unsafe C# code, obtain the raw _void*_, and create the Memory<T> or Span<T> themselves. Once the caller writes the word "unsafe", it's on them to ensure proper memory management.
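The unsafe escape hatch described here already works today with the existing API surface. A minimal sketch (reading a small UTF-8 file; no proposed APIs involved, and it must be compiled with unsafe code allowed):

```C#
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Works on current .NET: acquire the raw pointer from the view's SafeBuffer
// and construct the span manually inside an unsafe block.
File.WriteAllText("test.txt", "hello");
int length = (int)new FileInfo("test.txt").Length;

using MemoryMappedFile file = MemoryMappedFile.CreateFromFile("test.txt");
using MemoryMappedViewAccessor accessor = file.CreateViewAccessor();

unsafe
{
    byte* ptr = null;
    accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
    try
    {
        var span = new ReadOnlySpan<byte>(ptr, length);
        Console.WriteLine(Encoding.UTF8.GetString(span)); // prints "hello"
    }
    finally
    {
        accessor.SafeMemoryMappedViewHandle.ReleasePointer();
    }
}
```

Note that the span length is taken from the file rather than the view, since the underlying view may be rounded up to page granularity.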
I really hope we don't end up with this callback design everywhere. I understand we don't have the means to track lifetime like this outside of the stack, but I fear we'll end up with a set of overloads like this.
@GrabYourPitchforks @VSadov @Maoni0 what if we had a Pinned Byte Array Heap backed by memory mapped files ...
@Suchiman, you can use a third-party implementation like this. The library is on NuGet.
My concern with memory-mapped files is different. I would like to iterate through mapped segments using the ReadOnlySequence&lt;byte&gt; value type. To do that, ReadOnlySequenceSegment&lt;T&gt; should provide "activation" and "deactivation" methods:
```C#
public abstract class ReadOnlySequenceSegment<T>
{
    internal protected virtual void Activate();
    internal protected virtual void Deactivate();
}
```
The Activate method would be called by ReadOnlySequence&lt;T&gt; when the segment becomes active, i.e. is selected as the current segment. With these two methods, it's possible to write lazily loading and unloading segments over a memory-mapped file.
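Under that proposal, a lazily mapping segment might look roughly like this sketch. The Activate/Deactivate hooks are hypothetical, and producing the Memory&lt;byte&gt; from the view is elided since it would itself need a MemoryManager wrapper:

```C#
// Hypothetical: assumes the proposed Activate/Deactivate hooks exist on
// ReadOnlySequenceSegment<T>.
sealed class MappedFileSegment : ReadOnlySequenceSegment<byte>
{
    private readonly MemoryMappedFile _file;
    private readonly long _offset;
    private readonly int _length;
    private MemoryMappedViewAccessor? _view; // live only while this segment is active

    public MappedFileSegment(MemoryMappedFile file, long offset, int length, long runningIndex)
    {
        _file = file;
        _offset = offset;
        _length = length;
        RunningIndex = runningIndex;
    }

    protected internal override void Activate()
    {
        // Map this segment's slice of the file on demand.
        _view = _file.CreateViewAccessor(_offset, _length);
        // Memory = ... (wrap the view's pointer in a MemoryManager<byte>; elided)
    }

    protected internal override void Deactivate()
    {
        // Unmap as soon as the sequence moves past this segment.
        Memory = default;
        _view?.Dispose();
        _view = null;
    }
}
```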
@davidfowl I agree. One thing I'd like to see the runtime maintain is an arbitrary memory range to object mapping. For example:
```
0x00007000_10000000 .. 0x00007000_1FFFFFFF -> [object A]
0x00007000_20000000 .. 0x00007000_200007FF -> [object B]
```
When the GC is running, for any T& seen (not T* since they're not GC-tracked), any object associated with the range will be seen as "live" and won't be collected. This means that you can associate arbitrary memory ranges with explicit SafeHandles.
It doesn't solve all problems. For example, it could make GC more expensive because now it's _yet another_ location that needs to be queried during GC. Additionally, it won't solve AVs resulting from use-after-free. We still don't want to end up with an API surface where this is possible:
```C#
var obj = GetSomeObject();
var span = obj.GetSpan();
obj.Dispose();
var value = span[0]; // AV?
```
In general, AVs should _only_ be possible within an unsafe block or when using unsafe-equivalent code. For the above pattern, it might be fixed by deferring the dispose as shown below. This pattern would still rely on the memory range sidecar table mentioned earlier.
```C#
var obj = GetSomeObject();
var span = obj.GetSpan();
obj.Dispose(); // no-ops since GetSpan() was called and the Span was returned
var value = span[0]; // no AV
// we now just have to wait for obj's finalizer to kick in
```
@GrabYourPitchforks Frozen Objects do that by setting up GC segments with arbitrary ranges and setting a flag that marks the heap as read-only.
I don't know if much of that could be repurposed. Also I don't know how these segment additions scale. What happens if you have 1000 segments?
What happens if you have 1000 segments?
Definitely a good question for GC folks. @Maoni0, thoughts? I'm just pulling ideas out of thin air and haven't thought any of this through. :)
@GrabYourPitchforks assuming checking the X (1000 in the above comment) ranges in addition to stack scan that already occurs for by-ref types is prohibitive, one mitigation could be that this is only done for Gen2 GCs or something. This would of course be in addition to the basic pay for play check where we wouldn't even enter this code path if there are no active memory mapped files that were mapped using this API.
@richlander @jkotas I said I would tag you on the issue
This is a sorely missing capability that only the runtime can solve: provably safe memory-mapped files, very much like by-ref types, or a mechanism similar to Frozen Objects' setting up of new segments. This needs an assessment of how bad the perf impact is if there are 1000s of such things.
```C#
obj.Dispose(); // no-ops since GetSpan() was called and the Span was returned
```
The check to turn Dispose into no-op here would require suspending the runtime and doing stack walk on all threads. It would make the Dispose very expensive.
I am not convinced that this design works.