Runtime: Span<T>

Created on 17 Jun 2016 · 88 Comments · Source: dotnet/runtime

Turn https://github.com/dotnet/corefxlab/tree/master/src/System.Slices into an efficient first-class feature.

area-Meta

All 88 comments

  • [ ] Marshaler support (as with arrays)?

I just love it, hope for some up-for-grabs issues!

Does "Add Span to framework contracts" include items like adding to Stream Read/Write etc? Or is it making Span available as a type? (Assuming latter, and others will be new issues after)

The CoreFxLab's prototype has useful extensions methods for Span<byte> and Span<char> like Read/Write and StartsWith/EndsWith. Will CoreCLR's Span get them?

Would there be any sane method to allow Span<> to integrate with List<> that it could operate on the underlying array?

@omariom, we need the corefxlab API and the coreclr one to be the same (portable). The reason is that I think we need to use the corefxlab implementation for previously shipped runtimes/frameworks, e.g. .NET Framework 4.5. Otherwise, the adoption of Span will be very much inhibited by the inability to write portable code. Of course, the out-of-band span will be slower, but as long as the API is available in portable code, we should be good.

@benaadams, we need some existing APIs, e.g. Stream.Read/Write, to accept spans, but I think it's a separate work item from the one @jkotas listed.

@dotnetchris, what do you consider "sane" in this context? If you mean wrapping Span into List then no (Span is stack only, and does not always have "underlying array").

@adamsitnik, I am not sure we want to completely open this up for grabs so early (as there is lots of tricky work here), but you have done lots of good work in corefxlab, so if you (personally) would like to take dotnet/coreclr#5857, it would be great! Same for @omariom :-)

@KrzysztofCwalina Thanks! consider it done ;)

@KrzysztofCwalina i meant for Span to access List's internal array

I think we might also need to constrain T to primitive type (i.e. type with no managed object references). Otherwise, it will be possible to create Span over native memory without GC being aware of the roots.

@KrzysztofCwalina
Only certain operations seem unsafe in this sense: Cast, BlockEquals and Span<byte>.Read/Write.

@dotnetchris, yes, it would be great to add List<T>.Slice that returns Span<T>. Same with other slicable types, e.g. String.Slice -> Span<char>.

@omariom, it seems like accessors and copy operations too, e.g.
var nativeSpan = new Span(nativeMemory, len);
nativeSpan[0] = new object();
// will GC know that the object is rooted? will it change the native memory when the object is rooted and moves?

Similarly when an array-backed span of objects is copied to a native span.

@KrzysztofCwalina
Then I think it is worth adding a new constraint to the CLR and C#: blittable (or some other name), along with the ability to apply it at the member level.
With such constraint we could have Spans of any type but some operations would be only allowed on Spans of types having no references.

For these kinds of operations, I think blittability should be enforced via a dynamic check where needed, which the JIT can eliminate by treating it as an intrinsic, or by a Roslyn analyzer. I do not think it makes sense to have a first-class blittable constraint - it is too much complexity for the value that it provides.

Also, casting between different blittable Span types is a pretty questionable operation. It has portability problems because it lets you create misaligned Spans. I do not think it should be in the "safe" Span API set. (Having it in the "unsafe" Span API set is fine.)

var nativeSpan = new Span(nativeMemory, len);
nativeSpan[0] = new object();
// will GC know that the object is rooted? will it change the native memory when the object is rooted and moves?

When you are doing interop with unmanaged pointers, you have to know what you are doing and you have to get it right. If you do not get it right, your program will crash in a very bad way.

In your example, the GC will not track the native memory and your program will crash in a very bad way. It is no different from if you pass wrong len to Span(void * p, int len) that will crash your program in a very bad way as well.

I know that we have historically tried to prevent some of the bad uses, but I always had mixed feelings about it.

Maybe we should have the dynamic check for blittability in the regular Span(void * p, int len), but also have an extension method without any checks in the "unsafe" span API set for power users?

I think it would be really good to have a "unsafe" span API set for high perf scenarios. As long as there will be a code path without unnecessary checks in cases where you know what you are doing.

Also I hope it will still be possible to query the span whether it is based on native or managed memory and have access to native pointer or managed array/pinning depending on which memory is used.

@jkotas

If Type had ContainsReferences property treated by JIT as a constant then it would be no-op in the dangerous methods when T is blittable and throw otherwise.

For CoreFxLab's Span another solution would have to be found. Maybe reflection? Though it would be slow.

For corefx one might use the solution shown here: https://github.com/AndreyAkinshin/BlittableStructs/blob/master/BlittableStructs/BlittableHelper.cs

It is probably pretty slow (haven't benchmarked it) on first call, but should then be fast perhaps even inlineable. The helper is also discussed in a blog post: http://aakinshin.net/en/blog/dotnet/blittable/
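A minimal sketch of the kind of dynamic check being discussed. It assumes a runtime where `RuntimeHelpers.IsReferenceOrContainsReferences<T>()` is available (the method is mentioned further down this thread); the `ReferenceFreeGuard` class and the two sample structs are hypothetical names used only for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper, not part of any library.
public static class ReferenceFreeGuard
{
    // True when T is a value type that transitively holds no object references.
    public static bool IsReferenceFree<T>() =>
        !RuntimeHelpers.IsReferenceOrContainsReferences<T>();

    // Throws for types whose contents the GC would need to track.
    public static void ThrowIfContainsReferences<T>()
    {
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            throw new ArgumentException($"{typeof(T)} is or contains a reference type.");
    }
}

// Sample types for illustration only.
public struct Point { public int X; public int Y; }          // no references
public struct Named { public string Name; public int Id; }   // contains a reference
```

A "dangerous" operation such as wrapping native memory could call `ThrowIfContainsReferences<T>()` up front; whether the JIT folds the check into a constant depends on the runtime.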

Using ReadOnlySpan<char> as the span type for substrings seems to preclude the addition of a lot of interesting/useful text-specific members (intuitive ToString(), StartsWith(string), etc.). Is there room for a separate StringSpan type for this case?

I do not think it makes sense to have a first-class blittable constraint - it is too much complexity for the value that it provides.

The ability to convert from say Span<int> to Span<byte> proved really useful in Midori. It's something I'd love to preserve with CLR as well.

The language already has this exact concept in the spec under the guise unmanaged type. Essentially a struct that transitively contains no reference types. The compiler is already making this check for other scenarios. Why not expose it as a real language feature and generic constraint?

struct Point { int x; int y }
void M<T>() where T : blittable // implies struct
M<Point>() // Okay 
M<object>() // Nope, not blittable 
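For readers arriving later: C# 7.3 did eventually ship a generic constraint along these lines, spelled `unmanaged` rather than `blittable`. The sketch below shows the shape of it; the `UnmanagedDemo` class is a hypothetical name, and `Unsafe.SizeOf<T>()` is assumed to be available (it is in recent .NET versions).

```csharp
using System;
using System.Runtime.CompilerServices;

public struct Point { public int X; public int Y; }

// Hypothetical demo class, for illustration only.
public static class UnmanagedDemo
{
    // 'unmanaged' implies struct and rejects, at compile time, any type
    // that transitively contains a reference.
    public static int SizeInBytes<T>(ReadOnlySpan<T> items) where T : unmanaged =>
        items.Length * Unsafe.SizeOf<T>();
}

// UnmanagedDemo.SizeInBytes<Point>(new Point[3]);   // compiles
// UnmanagedDemo.SizeInBytes<object>(...);           // does not compile
```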

The ability to convert from say Span<int> to Span<byte> proved really useful in Midori.

Could even have a safe convert where it checks (length * sizeof(T1)) % sizeof(T2) == 0 (might want also an unsafe convert where it doesn't - as may be pre-checking and re-slicing).

You can already have explicit-layout overlapping arrays, but the length property gets confused as it's based on the instantiation type (which makes sense); the span approach could have the appropriate length for each sized datatype.

Example use reading from stream/disk/network from byte[] => Vector3[] or saving Vector3[] => byte[]
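A sketch of the "safe convert" idea from this comment, built on `MemoryMarshal.Cast`, the reinterpreting API that later shipped in `System.Runtime.InteropServices`. `MemoryMarshal.Cast` itself rounds the resulting length down; the hypothetical `CastExact` wrapper below adds the divisibility check described above and throws instead.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only.
public static class SpanCastDemo
{
    // Reinterprets the same memory as TTo, but refuses when the byte count
    // is not a whole multiple of the target element size.
    public static Span<TTo> CastExact<TFrom, TTo>(Span<TFrom> source)
        where TFrom : struct
        where TTo : struct
    {
        long byteCount = (long)source.Length * Unsafe.SizeOf<TFrom>();
        if (byteCount % Unsafe.SizeOf<TTo>() != 0)
            throw new ArgumentException("Source length is not a whole number of target elements.");

        // No copy is made: the result aliases the source memory.
        return MemoryMarshal.Cast<TFrom, TTo>(source);
    }
}
```

Note that `MemoryMarshal.Cast` does nothing about alignment, which is the portability concern raised earlier in the thread.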

@nblumhardt, the operations you listed could be extension methods over ROS, couldn't they?

@KrzysztofCwalina yes, though the ergonomics of extension methods are still second-rate because of the need to add a using statement before they're discoverable.

Tooling helps, and is improving, but at present it's still easier to find members on a type than it is to find extensions. The prevalence of "remove unused usings" as an automated refactoring means using System; is not reliably present in every code file. Perhaps a minor issue.

Even if the extension method route is preferred, could the character-oriented extensions like StartsWith() be included in this PR, just in case the experience of implementing them reveals any opportunities?

(Stretching a little further, would implementing and optimizing the most common string-like methods on Span<char> open the door to providing them in the future on Span<byte> with UTF-8 or user-specified encoding?)

Thanks for the reply, apologies if I missed earlier discussions around this BTW :-)

I've been (somewhat) following development of the C# Slicing issue, but I'm having trouble understanding how exactly these two issues are related. A few questions to clear things up:

  1. Is this the CLR-part of the actual array slicing feature in C#, or considered something entirely different?
  2. As far as I followed slicing, a core feature of an array slice was that it could be treated the same as an array. Otherwise - it would be impossible to integrate with existing libraries and APIs and essentially turn into another ArraySegment<T> that exists but is barely used. Will this be possible with a Span<T>?
  3. It seems that Span<T> is currently considered stack-only. That does rule out a few use cases. Is this a temporary state or considered a final decision?

Overall, I'm in huge favor of having a mechanism (like Span<T>?) to provide differently typed, offset or truncated views on a memory block, mostly for performance reasons. However, I'm also worried that it might stop halfway to its full potential, see also this CoreCLR issue.

Sorry to interrupt - as a mostly passive observer, I might have missed some details.

@ilexp it's a bit confusing because the ability to have efficient slices requires both runtime and language work.

  1. Yes. The CLR portion is making the necessary runtime changes to properly handle the "ref" field used in Span<T>.
  2. No. Having Span<T> == T[] is not a goal of this particular approach. This approach is solving a slightly different problem: uniform representation for contiguous memory allocations. Whether that memory be arrays, native memory or strings. Unifying T[] and Span<T> is possible and has been explored in great detail. But it has a number of negative tradeoffs associated with it.
  3. This particular type is indeed stack only. The design includes mechanisms for heap storage though when needed (so yes you will be able to use it in async methods).

Has unifying T[*] and Span<T> been considered and if so what are the negative trade-offs associated with that?

I assume by T[*] you mean arrays? If yes, then unifying Span and T[] would make T[] slower, i.e. an additional indirection when accessing array items in cases where the compiler cannot optimize the indirection out. In addition, arrays would become one word/pointer larger.

... and the unified Span would be slow and expensive too.

If yes, then unifying Span and T[] would make T[] slower,

Depends on which design. The design where Span<T> is a class will have the behavior you outlined. In the version where it is a struct the perf should be equivalent. The downside of course is tearing issues.

My original question was poorly thought out. What I meant was that the Span<T> proposal seems to be along the lines of:

  1. slicing into array/strings
  2. reinterpret_casting structs as arrays e.g. struct Point3d { int x, y, z; }; int[] arr = Span.Reinterpret<int[]>(ref some_Point3d);

But what people also want is the ability to reinterpret single/multidimensional arrays as single/multidimensional arrays with custom strides/starting indices, e.g.

``` C#
int[] arr = new int[100];
var sub_arr = Span.Slice(arr, 10, 10); // reinterpret arr as a 10x10 matrix

int [,] arr2 = new int[10, 10];
var col = arr2.Column(0); // get a column vector. This is a slice with a stride of sizeof(int[10]) between elements

var arr3 = Span.Reversed(arr); // This is a slice with a stride -sizeof(int) to simulate reversal without allocating
```

The original slice proposal is biased towards 0 indexing, adjacent elements, and single-dimensional arrays. It would be nice if the slicing proposal could be slightly tweaked to support the above kind of use cases.

Depends on which design. The design where Span<T> is a class will have the behavior you outlined. In the version where it is a struct the perf should be equivalent. The downside of course is tearing issues.

From a performance-oriented perspective, I'd be against any solution that would make T[] slower. That just seems like it isn't worth it. However, if there is a (struct-based) approach without negative performance implications, why not explore that direction in a prototype?

As far as tearing issues go, can you explain in a few words what's behind that? If these issues only occur in certain cases, is it possible to solve it by implementing a validation when creating the affected Span<T>?

@ilexp

As far as tearing issues go, can you explain in a few words what's behind that?

Imagine for a second that Span<T> was defined roughly as the following:

struct Span<T> { 
  object DataPointer;
  int Length;
  int Index;
}

This is a multi-word struct hence assignment of Span<T> is not guaranteed to be atomic. This means that when a field of Span<T> is assigned on one thread, it's possible to view only part of the assignment on another thread. For instance the other thread may see the change in DataPointer but not Length or Index.

For user defined types this can create odd combinations of values but doesn't threaten the safety of the system. For types like Span<T> though this is not the case. The consistency of the data is important to maintaining type safety. Consider this example:

// Thread1:
sharedObject.Field = (new int[] {}).AsSpan();

// Thread2 
sharedObject.Field = (new int[] {1, 2, 3}).AsSpan();
Console.WriteLine(sharedObject.Field[1]);

Since assignments of Span<T> are not atomic it's possible for the read of sharedObject.Field[1] to see parts of the value assigned in Thread1 and parts of the value assigned in Thread2. That means it could potentially see: DataPointer from Thread1, and Length + Index from Thread2.

In that case the bounds check would succeed because 1 <= 2 but the pointer being read from has a real length of 0. The user is now reading (or worse writing) random memory in the system. Type safety is gone.
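The hazard can be illustrated deterministically, without threads, by constructing by hand the mixed value a racing reader might observe. This is only a sketch: `FakeSpan` and `TearingDemo` are hypothetical stand-ins, and a managed `int[]` is used for the data so the example stays safe to run.

```csharp
using System;

// Hypothetical stand-in for a heap-storable, multi-word span.
public struct FakeSpan
{
    public int[] Data;
    public int Length;

    public FakeSpan(int[] data) { Data = data; Length = data.Length; }

    // The bounds check trusts Length, which is exactly the invariant tearing breaks.
    public bool IndexLooksValid(int index) => (uint)index < (uint)Length;
}

public static class TearingDemo
{
    // Builds the value a reader could see if it observed Data from one write
    // and Length from another (simulated here; no threads are involved).
    public static FakeSpan SimulateTornRead(FakeSpan first, FakeSpan second) =>
        new FakeSpan { Data = first.Data, Length = second.Length };
}
```

With `first` wrapping an empty array and `second` wrapping `{1, 2, 3}`, `IndexLooksValid(1)` returns true even though the data has no elements. In this sketch the array's own bounds check would still catch the access; a span holding a raw interior pointer has no such second line of defense.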

@jaredpar Ah, that's bad. Hypothetically, are there ways to make the assignment of Span<T> an atomic operation? Does the CLR have an atomic-assign instruction, or a way to JIT certain assignments in a way that they are essentially atomic?

Does the CLR have an atomic-assign instruction, or a way to JIT certain assignments in a way that they are essentially atomic?

CLR more or less offers whatever the hardware offers and the hardware doesn't offer a lot when the data exceeds the machine word size (8 bytes in the case of x64). x64 has cmpxchg16b which may be useful here but it may also turn out to be quite expensive.

And it's not only the assignment itself that is a problem. When you're calling a member of a struct you're passing the address of the struct value as this, you're not making a copy/assignment. The called Span<T> method would need to make an atomic copy of itself and use that copy. That means that not only normal field assignments are potentially slow but so are all uses of a Span<T> field.

@jaredpar @jkotas

The language already has this exact concept in the spec under the guise unmanaged type. Essentially a struct that transitively contains no reference types.

So the language has it, as well as the runtime.
Let's just unleash the beast to the public! )

Add Span and ReadOnlySpan as yet another ByRef-like type in Roslyn, so that unsupported operations on Span like boxing or creating Span[] are compile errors

Instead of the hardcoded list hack, having an attribute for this is long overdue.

It is already decided that Span is on stack thing because of torn writes issues.
But do torn writes completely forbid non atomic structs - like slices or delegate as struct - to be located on heap? What if the field is readonly?

class Foo { private readonly Span<byte> bytes; }

Wrote and then realized this can be assigned to a visible location right in the ctor.
Or ctors can be inlined and their code can be reordered with the publication of the object ref :disappointed:

How does a stack only based Span<T> work with auto heap promotion?

e.g.
async - it's stack but on the heap (would probably be ok)
closure - it's a lambda but it captures (would probably be bad)

Can they work with async?

It does not work well with these. Async code will have to hold on to a reference type "request context", and only get spans from such a context.
An example of such context is HttpRequest in https://github.com/dotnet/corefxlab/tree/master/demos/LowAllocationWebServer

Sort of serialize and re-hydrate the span? Could that be built into the async machinery in some way, as async shouldn't see torn writes since it's "acting" as sync stack code? Might be something for Roslyn?

I don't think we can allow "serializing" ("boxing") spans. The conversion is one way: from context to span.
And so in all prototypes I developed, I ended up with the following flow:

  1. Server rents buffer from a pool (can be lazy)
  2. Server wraps the buffer into "request context"
  3. The context is passed to user code to process the request.
  4. If user code needs to get access to the request bytes, it gets a span from the context. The context never gives raw buffer (e.g. array) to user code.
  5. If user code needs to suspend the thread (for async call), it stores the context for later when it's woken up.
  6. When the request is completed (i.e. response sent), the server returns the buffer to the pool, knowing that user code cannot be holding on to it, as the server only ever gave spans (stack only) to user code.
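The flow above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the corefxlab implementation: `RequestContext` and `Handler` are hypothetical names, and `ArrayPool<byte>.Shared` stands in for whatever pool the server uses.

```csharp
using System;
using System.Buffers;
using System.Threading.Tasks;

// Hypothetical "request context": owns the pooled buffer and only hands out spans.
public sealed class RequestContext : IDisposable
{
    private readonly byte[] _buffer;
    private readonly int _length;
    private bool _returned;

    public RequestContext(int length)
    {
        _buffer = ArrayPool<byte>.Shared.Rent(length); // may be larger than requested
        _length = length;
    }

    // Callers get a stack-only view, never the raw array.
    public Span<byte> Bytes => _buffer.AsSpan(0, _length);

    // Step 6: hand the buffer back to the pool once the request is done.
    public void Dispose()
    {
        if (_returned) return;
        _returned = true;
        ArrayPool<byte>.Shared.Return(_buffer);
    }
}

public static class Handler
{
    // Async code keeps the context (a heap object) across awaits and
    // re-acquires a span inside synchronous helpers.
    public static async Task<int> HandleAsync(RequestContext context)
    {
        Fill(context, 1);
        await Task.Yield();
        return Sum(context);
    }

    private static void Fill(RequestContext context, byte value) => context.Bytes.Fill(value);

    private static int Sum(RequestContext context)
    {
        int total = 0;
        foreach (byte b in context.Bytes) total += b;
        return total;
    }
}
```

The span never appears as a local in the async method itself; only the synchronous helpers touch it, which matches the "code that works on spans is synchronous" pattern described in the thread.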

@benaadams the implementation of compiler features async, await, lambdas, etc ... use heap storage for lifted values. This causes issues with Span<T> as it can't be stored in the heap directly due to the use of interior pointers and tearing issues.

It's possible for the compiler to work around this by using a different type for heap storage. Essentially think of HeapSpan<T> which removed the interior pointer and didn't have tearing issues. The compiler could do translation under the hood to make Span<T> work with async.

This only works though if there is a way to translate between Span<T> and HeapSpan<T>. Such a translation has to solve a couple of problems:

  1. Determine if the span memory is native or managed.
  2. In the managed case, find the head of the object to which the interior pointer refers.

Once both of these are understood, it's easy to construct a safe HeapSpan<T> type which looks essentially like the following:

class HeapSpan<T> { 
  object Data;  // Head of object if managed, null if native memory
  IntPtr Offset; // Offset from Data where Span<T> starts
  int Length;
}

Unfortunately getting the above information is quite tricky and the viability of the solution depends on the implementation of Span<T>. There are a couple to consider:

Span (ref T, int)

In this case the design of Span<T> is essentially:

struct Span<T> { 
  ref T Data;
  int Length;
}

Determining if this object points to native memory is practically impossible. Sure it's possible with a lot of very slow heap scanning to determine at a specific point in time if the memory is managed or native. But the moment that determination is made it can be invalidated by actions on other threads.

Finding the head of Data is similarly expensive. The result is more stable because the result refers to a managed object.

These taken together mean that this design really can't have a safe heap translation.

Span (object, ref T, int)

In this case the design of Span<T> is essentially:

struct Span<T> { 
  object Head;
  ref T Data;
  int Length;
}

This type works by using Head to answer both of the pertinent questions. In the case of managed memory Head points to the object for which the interior pointer exists. In the case of native memory it's null. Hence it's trivial to convert this implementation to HeapSpan<T> safely.

There are a few downsides of HeapSpan<T> to consider:

  • Generally requires a fatter Span<T> type.
  • The implementation trick can bleed into presentation in some corner case language scenarios.
  • Makes 100% safe native allocations over Span<T> impossible. Consumer can always store your native allocation in the heap.

When the request is completed (i.e. response sent), the server returns the buffer to the pool, knowing that user code cannot be holding on to it, as the server only ever gave spans (stack only) to user code.

This is an interesting twist :+1:

It would mean though that a span couldn't be passed into an async function (like a ref can't)?

Can we somehow have Span fields but only in unsafe context? Like pointers..

@benaadams, correct. In code I wrote, the thing that flows across threads is a reference type "context". The context holds to buffers, but never hands the buffers to separate components, but rather wraps them into spans first. The code that works on spans is synchronous.

@omariom, we have toyed with the idea, but it would require complex work in the runtime, compilers, language syntax, etc. I wonder if we cannot start without it (and live with the stack-only hard limitations) and then add this feature later when the limitations prove too large.

Please see dotnet/coreclr#7636 for my concerns regarding the name "Span". I think that this type should be renamed or moved into a different namespace to avoid collisions with existing Span types.

I created a separate issue to keep this issue clean. Looking forward to your feedback. :)

/cc @russellhadley

Is the problem of the smart field pointer ("object or native memory pointer") solved, or what is the current plan?

I was wondering if we could introduce a new kind of "smart" pointer type that the GC would recognize and scan safely and efficiently. For example, could we reserve 1 bit in this smart pointer (say the lowest bit, if we don't allow pointing to an odd address) to indicate whether it points to managed or unmanaged memory? The downside is that it would slow down indexer access (clearing the bit to reach the real memory behind it), and I'm not sure such an approach is entirely feasible with how the GC works (e.g. it may be impossible to have a non-interruptible intrinsic cast from the smart pointer to managed/unmanaged)... thoughts?
The nice thing is that it would be possible to store this smart pointer into the heap and allow Span<T> on the heap.

If it was possible, that would open the usage of such smart pointer for similar scenarios outside of span, that would be amazing!

Something like this:

``` c#
// Special type recognized by the GC that can point to either a managed
// or unmanaged memory and has a bit stored directly in the lower bits to detect it
struct SmartPointer { IntPtr ptr; }

struct Span<T> {
    SmartPointer smartPtr;
    int length;
}
```

([Edit] After thinking about it, not sure it plays well with pointer into an array element so that we could not avoid having to store an index along the length and store the root array in the smart pointer?[/Edit])

@xoofx I think the Memory<T> component, which can live on the heap and from which you can surface a Span<T>, might cover this? It's still in corefxlab as the design isn't finalised.

Currently the source for Span.cs doesn't have the following ctor overload, which is included in the API Review for Span<T>

public Span(T[] array, int start);

It has public Span(T[] array) and public Span(T[] array, int start, int length), but not this one, should it be added?

Yes, the public surface of the coreclr builtin Span needs to be updated for the most recent design iteration. There are a few more methods in addition to the constructor you have mentioned. @AtsushiKan is OOF till end of the year. If somebody would like to do it sooner, PR with the update will be welcomed.

@jkotas, I just sent a PR, see https://github.com/dotnet/coreclr/pull/8354

Just throwing this out there... are memory-mapped files enough of a use case to justify long instead of int for the "length" values at this point?

(edit: and obviously "offset" values as well where applicable)

We decided against long length. See https://github.com/dotnet/apireviews/tree/master/2016/11-04-SpanOfT#spant-and-64-bit

@KrzysztofCwalina: thanks for the link. I came to roughly the same conclusion as you guys did when thinking about it myself, until I realized that memory-mapped files could be exposed as a Span<byte> the "unsafe" way and everything would "just work"...

...unless the file happens to be over 2 GiB in size (not uncommon, I imagine), in which case it's pretty much impossible to make that interoperate nicely with Span<T>-based APIs without building extra infrastructure around it. Even then, the limitation seems much more likely to come up in these situations than when dealing with more "traditional" types of contiguous virtual memory blocks.

I just wanted to make sure this use case was at least briefly considered and deemed not worth the cost of changing Span<T> APIs to 64-bit now (as opposed to later, once there's a bunch of production code actively using the 32-bit APIs and expecting them to keep working).

Span is a stack-only window on the data; it's easy enough to create a span window beyond 2GiB on it:

new Span<byte>(memStream.PositionPointer + longOffset, length)

The extra interop would be in creating a new span every 2GiB. Bear in mind that the Span is ephemeral and is a stack-only window, so the question of how much of an issue this would cause comes down to how often >2GiB is processed in a single loop, with no async, I/O waits, etc.
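The windowing arithmetic can be sketched without any unsafe code. The helper below only computes (offset, length) pairs; `Windowing` is a hypothetical name, and turning each pair into an actual span over a mapped region would still need a pointer-based constructor like the one shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class Windowing
{
    // Splits a region of totalLength bytes into (offset, length) windows,
    // each small enough to be described by an int-length span.
    public static IEnumerable<(long Offset, int Length)> Windows(long totalLength, int maxWindow = int.MaxValue)
    {
        if (totalLength < 0) throw new ArgumentOutOfRangeException(nameof(totalLength));
        if (maxWindow <= 0) throw new ArgumentOutOfRangeException(nameof(maxWindow));

        for (long offset = 0; offset < totalLength; offset += maxWindow)
        {
            // The last window may be shorter than maxWindow.
            int length = (int)Math.Min(maxWindow, totalLength - offset);
            yield return (offset, length);
        }
    }
}
```

This covers sequential processing; as the follow-up comments point out, it does not by itself help when data at one offset refers to data more than one window away.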

@benaadams agreed that creating windows of up to 2GiB each is how that use case would have to work; this kind of thing is what I was referring to when I said "building extra infrastructure around it".

At the risk of sounding overly contrarian or even particularly attached to this point, you don't have to actually process >2GiB in the stack-only lifetime of a single Span<T> instance to run into problems; all it takes is a file format where data at one point in the file lists an offset to a different point in the file, perhaps very far away. I imagine that some database applications are examples of things that might work like this.

Heck, a persistent developer could work around this, of course; it may just require some redesign if they start with having everything accept Span<T> and then the application outgrows that limitation. Whether or not that usability hit is acceptable, compared to the cost of making all Span<T> instances "jumbo" for everyone at this point in time, is something I have no way to even begin to answer.

I just wanted to raise up a reason why a >2GiB Span<T> might actually be a more commonly desired use case than anticipated, since it doesn't imply someone actually allocating a >2GiB buffer in memory with the intention of actually using all or most of the bytes.

(of course, when I say 2GiB, I'm under the impression that Span<short> would probably be able to support up to 4GiB, Span<int> would probably be able to support up to 8GiB, and so on...)

How will Span work from F# when it is implemented natively in the CLR and becomes stack-only? E.g. I have a C# library that already uses Span in some places (and I am preparing to use it much more), but that library is consumed by F# libraries. One important use case is a lambda taking a span: Reduce(Func<Span<T>,U> reducer), transforming a span without the risk of returning a reference to the underlying array. (So far I assume that a ByRef Span could be used inside such a lambda.)

Will there be any issues with using the native Span from F#, or will it just work at the CLR level with no changes needed from the F# compiler (or are they already planned)? Or will it fall back to a manual Span implementation as a struct?

Span<T> as of right now will have all of the same restrictions as T* (so it can't be used in generic arguments). It will also require a new version of C# to properly use it and will likely be blocked in older versions of the compiler. I assume F# would need to do similar work for it to work there.

/cc @jaredpar

Func<Span<T>,U>

Span<T> cannot be used as generic argument. Func<Span<T>,U> won't compile or run once the Span is fully implemented.

You may want to look at Memory<T> https://github.com/dotnet/corefxlab/blob/master/docs/specs/memory.md and use that for generics or interop with F#.
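A rough sketch of what the suggested alternative looks like for the `Reduce` case: `ReadOnlyMemory<T>` is an ordinary struct, so it can appear as a generic argument, and the callee takes the span only inside its synchronous body. `Reducer` and its members are hypothetical names, and this assumes the `Memory<T>` API as it later shipped.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class Reducer
{
    // ReadOnlyMemory<T> is legal as a generic argument, unlike Span<T>.
    public static U Reduce<T, U>(ReadOnlyMemory<T> data, Func<ReadOnlyMemory<T>, U> reducer) =>
        reducer(data);

    // Example reducer: obtains the span locally and never stores it.
    public static int Sum(ReadOnlyMemory<int> memory)
    {
        int total = 0;
        foreach (int value in memory.Span) total += value;
        return total;
    }
}
```

Usage would look like `Reducer.Reduce<int, int>(new[] { 1, 2, 3 }, Reducer.Sum)`. The trade-off is that the callee can hold on to the `ReadOnlyMemory<T>`, so the "cannot escape" guarantee of a stack-only span is lost.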

Thanks a lot! Good I asked before going too far with existing struct implementation and wrong assumptions.

Memory is a very useful thing, especially in combination with Reserved/OwnedMemory and buffer pools. Since Memory has a public API with Span, won't it have the same issues as Span itself when used from F#?

Why would F# have problems with Memory<T>?

@KrzysztofCwalina Memory<T> has a public property of type Span<T>. If Span<T> is not supposed to work (at least initially) from F# (as @davidfowl said above), how will Memory<T> work? Or will it work just fine as long as I do not touch that property from code?

@davidfowl did not say that Span will not work in F#. Today, C# does not implement these checks either, and Span works in C#. The checks just make it easier for the programmer using it to use it correctly. In the absence of the checks, the runtime will throw at runtime.

Performance optimizations - dotnet/coreclr#9161 completed.

When will Span land in .NET Framework?

@WillSullivan Updated - thanks.

@HFadeel, Span\

Updated the milestone to 2.1. We need to go through the list above and make sure that there is nothing major falling through the cracks for Span<T> end-to-end experience for .NET Core 2.1.

cc @ahsonkhan @KrzysztofCwalina @stephentoub @joshfree

thanks for giving this a nudge @jkotas

Besides the VM correctness and stress work items, I still worry that debuggability of code using Spans is a pretty big issue. It will turn lots of people off spans.

Why so negative? This API is for scenarios that are not exactly everyone's cup of tea. I don't see anything bad about the API and design.

Reflection?

Is this work item related to IsReferenceOrContainsReferences?

If so, we already have the implementation of that check in place (here and here). Is there any additional context for this?

It is about calling methods on Span or that take Span arguments via reflection:

  • It is not possible to do it via existing reflection methods. We should have a test to verify that e.g. typeof(SpanExtensions).GetMethod("AsReadOnlySpan", new Type[] { typeof(string) }).Invoke(null, new object[] { "Hello" }); fails gracefully.
  • We may want to look into adding new reflection APIs that allow calling these methods via reflection. This does not need to be for 2.1.

Marking "Debugging" and "APIs for working with specialty spans" as complete.

Also "VM - Correctness" and "VM - Stress" are done.

What's left:

  • JIT - Optimizations – apply same sort of optimizations to Span as it does for Strings or Arrays (treat indexers as intrinsics, range check elimination, etc.)

    • Better optimization for CORINFO_HELP_RUNTIMEHANDLE_METHOD dotnet/coreclr#10051

  • Reflection?
  • Marshaler support (as with arrays) dotnet/coreclr#14460

What is the possibility of adding a Span/Memory constructor for working with memory mapped files? Currently, it looks like I have to have unsafe code in order to do this:

```C#
// Requires: using System; using System.IO.MemoryMappedFiles;
var dbPath = "test.txt";
var initialSize = 1024;
var mmf = MemoryMappedFile.CreateFromFile(dbPath);
var mma = mmf.CreateViewAccessor(0, initialSize).SafeMemoryMappedViewHandle;
Span<byte> bytes;
unsafe
{
    byte* ptrMemMap = (byte*)0;
    mma.AcquirePointer(ref ptrMemMap); // pair with mma.ReleasePointer() when done
    bytes = new Span<byte>(ptrMemMap, (int)mma.ByteLength);
}
```

Also, it seems like I can only create `Span`s, as there aren't public constructors for `Memory` that take a pointer (maybe I'm missing the reason for this). But since the view accessors have safe memory handles that implement `System.Runtime.InteropServices.SafeBuffer` (i.e., they have a pointer and a length), it seems natural to be able to leverage this for `Span`/`Memory`. So what would be nice is something like this:

```C#
var dbPath = "test.txt";
var initialSize = 1024;
var mmf = MemoryMappedFile.CreateFromFile(dbPath);
var mma = mmf.CreateViewAccessor(0, initialSize).SafeMemoryMappedViewHandle;
var mem = new Memory<byte>(mma); // proposed constructor; it does not exist today
var span = mem.Span.Slice(0, 512);
```

I also noticed that the indexer and internal length of Span uses int. With memory mapped files (especially for database scenarios) it is reasonable that the target file will exceed the upper limit for int. I'm not sure about the performance impact of long based indexing or if there is some magic way to have it both ways, but it would be convenient for certain scenarios.

https://github.com/dotnet/corefx/issues/26603 has the discussion about Span & memory mapped files.

Unfortunately, looking at https://github.com/dotnet/corefx/issues/26603 along with the referenced code in the benchmarks didn't clear things up for me. It seems like that particular use case is geared to copying small bits of the memory mapped files into Spans and ReadOnlySegments. It looks like the solution still involves unsafe code with OwnedMemory<T>, which is what I'd like to avoid. I don't have experience with manual memory management in C#, so some of this is a little difficult to grasp. That's what I found appealing about Span/Memory is that I could now access additional performance and reduce/eliminate copying data around without the headache of manual memory management and the issues that come with it. It seems memory mapped files fit into target paradigm of Span/Memory (unifying the APIs around contiguous random access memory), so hopefully some type of integration of memory mapped files and Span/Memory makes it in at some point.

@KrzysztofCwalina I think we should create something first class with Memory mapped files and the new buffer primitives (ReadOnlySequence).

@kstewart83 all we have right now are extremely low level primitives that you have to string together to make something work. That specific issue was about the performance gap between using Span directly and using the ReadOnlySequence (the gap has been reduced for that specific scenario).

Dealing with anything bigger than an int you'll need to use ReadOnlySequence<T> which is just a view over a linked list of ReadOnlyMemory<T>.

I think we should create something first class with Memory mapped files and the new buffer primitives (ReadOnlySequence).

I have re-opened the dotnet/corefx#26603 issue. Feel free to move the discussion there (See https://github.com/dotnet/corefx/issues/26603#issuecomment-370306008).

@ahsonkhan Is there more work on this issue for 2.1?

Is there more work on this issue for 2.1?

There is no other work for 2.1.

There are two items left from the original list, but both are marked as future. They can be tracked separately (and outside of 2.1):

  • Marshaler support (as with arrays) dotnet/coreclr#14460
  • Reflection dotnet/coreclr#17296

Closing this issue!
