Runtime: Need a method similar to S.R.CS.RuntimeHelpers.InitializeArray, but for spans

Created on 8 Feb 2018 · 42 comments · Source: dotnet/runtime

Initialization of literal arrays like new int[]{1,2,3,4,5,6} takes a special path. The blittable data is stored directly in the PE data. At runtime, instead of assigning every element of the array from a corresponding constant, we call InitializeArray with the target array instance and the token for the field that represents the data blob.

A similar technique would be very useful when initializing spans.
We already have two scenarios:

stack allocated spans - stackalloc int[] {1,2,3,4,5}
optimized conversion of literal arrays to ReadOnlySpan - (ReadOnlySpan<int>)new int[]{1,2,3,4,5}

In fact, in the ReadOnlySpan conversion case it would be possible, and desirable, to refer to the PE data directly.

The preferred form of the API would be:

```C#
ReadOnlySpan<T> GetReadOnlySpanFromTemplate<T>(RuntimeFieldHandle fldHandle);
```

Another acceptable alternative is the following. The form above feels more convenient, though, because verifiability is less of a problem with it:

```C#
ref T GetRefToTemplateData<T>(RuntimeFieldHandle fldHandle);
```

A valid question to ask here is: "If it is possible to just load a reference to the field in the first place, why is there a need for the API?"

The problem is that the blob data is always stored in little-endian format. On a big-endian machine, the blob data is therefore valid only for 1-byte elements.
As with InitializeArray, this API would allow the runtime to abstract away the endianness of the blob.
In a little-endian context, which is by far the most common case, the implementation could trivially forward to the field data. In the big-endian case it could fix up the data, either by making a byte-swapped copy or by performing the fixup in place.
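
A minimal C# sketch of the fixup described above, restricted to int elements. BlobFixup.ToHostEndianInt32 is a hypothetical helper, not the proposed API. On a little-endian host it reinterprets the blob bytes without copying. On a big-endian host it makes a byte-swapped copy. The in-place variant is not shown.

```C#
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Hypothetical helper illustrating the endianness fixup; not a real runtime API.
public static class BlobFixup
{
    // 'blob' holds int32 values stored little-endian, as in the PE file.
    public static ReadOnlySpan<int> ToHostEndianInt32(ReadOnlySpan<byte> blob)
    {
        if (blob.Length % sizeof(int) != 0)
            throw new ArgumentException("Blob length must be a multiple of 4.", nameof(blob));

        if (BitConverter.IsLittleEndian)
        {
            // Little-endian host: the bytes are already in host order, so just reinterpret them.
            return MemoryMarshal.Cast<byte, int>(blob);
        }

        // Big-endian host: build a copy, decoding each element from little-endian bytes.
        var copy = new int[blob.Length / sizeof(int)];
        for (int i = 0; i < copy.Length; i++)
            copy[i] = BinaryPrimitives.ReadInt32LittleEndian(blob.Slice(i * sizeof(int), sizeof(int)));
        return copy;
    }
}
```

On the common little-endian path nothing is allocated. This is the property that makes the API attractive compared with wrapping a static array.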

NOTE: an in-place fixup would only be possible if the same blob is not used to initialize span data with different element sizes, say shorts and longs.
Such a restriction would be acceptable on the C# side. The runtime could either validate that no such "sharing" happened, or make it undefined behavior if it does.
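
The sharing restriction can be illustrated with a small sketch. SharedBlobDemo.SwapInPlace is a hypothetical helper that reverses the bytes of each element in place. Swapping the bytes {1, 2, 3, 4} as 2-byte elements yields {2, 1, 4, 3}. The correct big-endian layout for a single 4-byte element is {4, 3, 2, 1}. A blob fixed up for shorts can therefore no longer serve a reader that expects ints.

```C#
using System;

// Hypothetical helper: illustrates why an in-place fixup ties a blob to one element size.
public static class SharedBlobDemo
{
    // Reverses the byte order of each element of size 'elementSize', in place.
    public static void SwapInPlace(Span<byte> data, int elementSize)
    {
        if (elementSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(elementSize));

        for (int offset = 0; offset + elementSize <= data.Length; offset += elementSize)
            data.Slice(offset, elementSize).Reverse();
    }
}
```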

NOTE: the presence of the API is statically known to the compiler. It would therefore be OK if some runtimes do not have it right away, or ever. In that case, the optimization would simply not be applied to elements larger than 1 byte.

api-approved area-System.Memory


VSadov


VSadov on 8 Feb 2018

CC: @jkotas, @marek-safar, @alrz, @jcouv, @jaredpar, @stephentoub

VSadov on 8 Feb 2018

Having this in .NET Core 2.1 would be very useful.
Is there any chance of this API making it through at this point?

VSadov on 8 Feb 2018

How about stackalloc on pointers? For example: int* p = stackalloc[] { 1,2,3 };

alrz on 8 Feb 2018

@alrz - the resulting type does not matter here. From the big-endian point of view, we store int32 elements with their bytes in the wrong order. On a big-endian machine the data will not look like consecutive 1,2,3.

VSadov on 8 Feb 2018
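
To make the byte-order point in the comment above concrete, here is an illustrative sketch. The ByteOrderDemo class is hypothetical. It reads the little-endian blob for {1, 2, 3} twice: once as a little-endian host sees it, and once as a big-endian host would if it used the bytes unmodified. In the second case the first element reads as 16777216 (0x01000000) instead of 1.

```C#
using System;
using System.Buffers.Binary;

// Illustrative only: the blob for {1, 2, 3} as stored in the PE file (little-endian int32s).
public static class ByteOrderDemo
{
    public static readonly byte[] Blob = { 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0 };

    // Element i as a little-endian host sees it in memory.
    public static int ReadAsLittleEndianHost(int i) =>
        BinaryPrimitives.ReadInt32LittleEndian(Blob.AsSpan(i * 4, 4));

    // Element i as a big-endian host would see it without any fixup.
    public static int ReadAsBigEndianHost(int i) =>
        BinaryPrimitives.ReadInt32BigEndian(Blob.AsSpan(i * 4, 4));
}
```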

I understand that, but we're going to use this for stackalloc initializers on pointers as well. I guess the second alternative, ref T GetRefToTemplateData<T>(RuntimeFieldHandle fldHandle), could work, though.

In fact, stackalloc initializers "only" work on pointer types; there is no special codegen for Span. We're just passing an already initialized localloc to the Span constructor.

alrz on 8 Feb 2018

@alrz - right, GetRefToTemplateData would work equally well for stackalloc that results in a pointer or in a span.

VSadov on 8 Feb 2018

👍1

It is fairly easy to get a pointer from a span. For example “(void*)&s[0]” in IL

VSadov on 8 Feb 2018

@VSadov I just don't see why there has to be an indirection here. We can construct both Span and ReadOnlySpan directly from a pointer (or a ref, for that matter). GetReadOnlySpanFromTemplate, by contrast, needs a conversion for anything other than ReadOnlySpan.

alrz on 8 Feb 2018
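
alrz's point, that both span types can be built directly from a ref, can be sketched with MemoryMarshal.CreateReadOnlySpan and MemoryMarshal.CreateSpan. In this sketch, GetRefToTemplateDataStandIn is a hypothetical stand-in for the proposed GetRefToTemplateData<T>. The "template data" is an ordinary static array rather than a PE data blob. The ref-based form also makes a writable Span over the template data possible. The ReadOnlySpan-returning form would not allow that.

```C#
using System;
using System.Runtime.InteropServices;

public static class RefToSpanDemo
{
    // Stand-in for real template data: an ordinary static array, not a PE data blob.
    private static readonly int[] s_template = { 1, 2, 3 };

    // Hypothetical stand-in for the proposed GetRefToTemplateData<T>.
    public static ref int GetRefToTemplateDataStandIn() => ref s_template[0];

    // Either span type can be created from the same ref plus a length.
    public static ReadOnlySpan<int> AsReadOnlySpan(int length) =>
        MemoryMarshal.CreateReadOnlySpan(ref GetRefToTemplateDataStandIn(), length);

    public static Span<int> AsSpan(int length) =>
        MemoryMarshal.CreateSpan(ref GetRefToTemplateDataStandIn(), length);
}
```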

One reason is that the span-returning API can be used safely in the ReadOnlySpan case. That case would not require dealing with pointers and would be verifiable.

Since a span is basically a range-checked reference, it is safer to use. Conceptually, it is the right way to represent a chunk of data with a known length.

We will see though what CLR/FX guys will say when it gets to actually implementing that API.

VSadov on 8 Feb 2018

Currently none of the Span tests are verifiable, due to the unsafe nature of the constructor... I am more biased towards the current implementation of the stackalloc initializers; I think that approach needs more changes compared to just returning a ref here. We should see how things work out in either case.

alrz on 8 Feb 2018

@VSadov, what's the conclusion here? Do we need this API? If yes, please mark the issue as ready for review and let's discuss it ASAP.

KrzysztofCwalina on 23 Feb 2018

We need this API to implement optimizations.
It is not strictly _needed_ - it is not blocking, but it is a "very good to have" one.

Considering that this will have to be implemented as a JIT intrinsic and be platform-specific, I am very doubtful that it can make it for 7.3.

Let's put it up for review, though.

VSadov on 23 Feb 2018

@jkotas what are your thoughts on this?

terrajobst on 27 Feb 2018

This method is the Span equivalent of the existing RuntimeHelpers.InitializeArray method. It is only ever expected to be called by C# compiler-generated code, never directly by users. The proposed design (GetReadOnlySpanFromTemplate) looks fine to me.

The end-to-end story for this requires changes in C# compiler, runtime, JIT, and potentially debugger expression evaluator to come together. We should do this post .NET Core 2.1 when we have a runway for doing the work and testing that it all fits together well.

jkotas on 27 Feb 2018

> The end-to-end story for this requires changes in C# compiler, runtime, JIT, and potentially debugger expression evaluator to come together. We should do this post .NET Core 2.1 when we have a runway for doing the work and testing that it all fits together well.

Moving to future.

ahsonkhan on 28 Feb 2018

How about this:

```C#
public static class RuntimeHelpers
{
    public static ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle);
}
```

terrajobst on 12 Jun 2018

Use an unmanaged constraint, so it can't refer to objects or to types containing objects?

```C#
public static class RuntimeHelpers
{
    public static ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle) where T : unmanaged;
}
```

benaadams on 12 Jun 2018

How expensive is that method? Is it possible to cache the return value somehow if it’s expensive?

ektrah on 12 Jun 2018

The following looks good.

```C#
public static class RuntimeHelpers
{
    public static ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle);
}
```

Not sure if unmanaged would be very useful. T must be blittable, in the sense that an array of bytes can be trivially mapped to T[]. Unmanaged structs may still have their fields aligned, packed, or reordered, and may contain IntPtr...

It basically means that T must be a primitive value type of a known size like int, char or an enum. Runtime will need to validate that.

@ektrah Yes, the runtime will have to cache the result if it made any transformations to the data, for example to correct endianness or alignment. The original field is technically accessible from user code, so the transformation, if needed, cannot be done in place.
If no changes are needed, it would make sense to just wrap the field in a span and return it. This should generally be the case on little-endian machines.

VSadov on 13 Jun 2018
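
A sketch of the kind of validation the runtime might perform on T. BlobElementCheck.IsSupported<T> is hypothetical. Its allow-list is an assumption based on the comment above: fixed-size numeric primitives, char, and enums over them. IntPtr is rejected because its size depends on the platform. Arbitrary structs are rejected because of the layout concerns mentioned.

```C#
using System;

// Hypothetical validation helper; the allow-list below is an assumption, not a spec.
public static class BlobElementCheck
{
    public static bool IsSupported<T>()
    {
        Type t = typeof(T);

        // An enum is checked through its underlying integral type.
        if (t.IsEnum)
            t = Enum.GetUnderlyingType(t);

        // Fixed-size primitives only; IntPtr/UIntPtr and arbitrary structs are not listed.
        return t == typeof(byte) || t == typeof(sbyte) || t == typeof(char)
            || t == typeof(short) || t == typeof(ushort)
            || t == typeof(int) || t == typeof(uint)
            || t == typeof(long) || t == typeof(ulong)
            || t == typeof(float) || t == typeof(double);
    }
}
```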


Alright, this is it:

```C#
public static class RuntimeHelpers
{
    public static ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle);
}
```

terrajobst on 26 Jun 2018

Why not put it on MemoryMarshal? That is where a number of unsafe Span helpers already live. I'd prefer we keep all of those in one place, so the entirety of the feature is easy to reason about.

jaredpar on 26 Jun 2018

It is the Span equivalent of InitializeArray, which lives on RuntimeHelpers. There is nothing unsafe about either InitializeArray or CreateSpan.

jkotas on 26 Jun 2018


Is this still required since the compiler has used ldflda for a long time?

huoyaoyuan on 12 Oct 2020

@huoyaoyuan, the compiler is only able to do that where the element size is 1 byte, anything larger would require adjusting for potential differences in endianness. The compiler uses RuntimeHelpers.InitializeArray to make the necessary adjustments when initializing an array, I believe this issue is about getting something similar for ReadOnlySpan<> so that the compiler can avoid the need to create an array when the element size > 1 byte.

https://sharplab.io/#v2:EYLgHgbALANALiAlgGwD4AEBMBGAsAKHQGYACLEgYRIG8CT6zSAlAUwEMATAeQDtkBPAMoAHNjwA8wfnBYA+EgFkAFAEoSAXnk8WAdxJSZAbQC6NEgAYYJbFcxWiV2CQCsViCQC+AbjoNiJVk5eARExcUQeOHkAOVUNLV0SCLgTM0trW3tHK1cSd28CDyA==

brian-reichle on 13 Oct 2020

Some questions I found while implementing it:

The storage of RVA fields appears to be endianness-aware, and InitializeArray has code to handle it.
On a little-endian runtime, returning the address of the RVA field directly is OK. This also works for bytes.
On a big-endian runtime, the endianness would have to be reversed. But where should the reversed data be stored? Creating an array every time would completely negate the performance improvement.

huoyaoyuan on 18 Oct 2020

In CoreCLR, it would need to allocate the copy with the right endianness on the loader heap. It is not that important to implement the big-endian support for CoreCLR. CoreCLR does not run on big-endian systems today, and there are a number of issues that would need to be fixed before it can.

In Mono, implementing the endian swapping for Mono is more important. Mono runs on big-endian systems today. @lambdageek Could you please provide guidance for Mono?

jkotas on 18 Oct 2020

It looks like the reversed form depends on the type of T, so it may cause some architectural issues, although in practice every field is supposed to have only one usage.

huoyaoyuan on 18 Oct 2020
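
One way to handle the dependency on T raised above is to key any byte-swapped copy on both the field and the element size. The sketch below is hypothetical. blobId stands in for the field handle, and a ConcurrentDictionary stands in for the loader heap or class mempool. SwapCount exists only to show that, in single-threaded use, each (blob, element size) pair is swapped once.

```C#
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: caches byte-swapped copies keyed by (blob id, element size),
// so a blob read both as shorts and as ints gets two separate copies.
public static class SwappedBlobCache
{
    private static readonly ConcurrentDictionary<(int BlobId, int ElementSize), byte[]> s_cache = new();

    // Counts how many swapped copies were built (for illustration only).
    public static int SwapCount;

    public static ReadOnlySpan<byte> GetSwapped(int blobId, byte[] blob, int elementSize)
    {
        if (elementSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(elementSize));

        // GetOrAdd may run the factory more than once under contention; every run
        // produces identical bytes, and only one result ends up cached.
        byte[] swapped = s_cache.GetOrAdd((blobId, elementSize), _ =>
        {
            Interlocked.Increment(ref SwapCount);
            var copy = (byte[])blob.Clone(); // never modify the original field data
            for (int i = 0; i + elementSize <= copy.Length; i += elementSize)
                copy.AsSpan(i, elementSize).Reverse();
            return copy;
        });
        return swapped;
    }
}
```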

@jaredpar, if the runtime were to expose this soon, would C# be able to target it for .NET 6? I expect there'd be a bit of a back and forth to get all the ducks in a row, e.g. runtime exposes the API, C# takes advantage of it, runtime updates all places that could utilize it to do so (e.g. changing some internal array static fields to be span static props).

stephentoub on 23 Nov 2020

Seems pretty reasonable and probably low / med-low costing.

Curious: how much benefit is this expected to provide here?

jaredpar on 23 Nov 2020

> In CoreCLR, it would need to allocate the copy with the right endianness on the loader heap. It is not that important to implement the big-endian support for CoreCLR. CoreCLR does not run on big-endian systems today and there are number of issues that would need to be fixed before it can.
>
> In Mono, implementing the endian swapping for Mono is more important. Mono runs on big-endian systems today. @lambdageek Could you please provide guidance for Mono?

I think for Mono it will be very similar to how ves_icall_System_Runtime_CompilerServices_RuntimeHelpers_InitializeArray works today, with the following changes:

for LE we can just return a span over the result of mono_field_get_rva
for BE, we may need to add a second field to MonoFieldDefaultValue something like char *host_data; /* RVA swapped to host endianness using the type of MonoClassField* for sizes */
in mono_field_get_rva on BE we would allocate from the class mempool and write the byte-swapped data to host_data if it's not already initialized.
(extra credit) for AOT it would be nice to store the byte-swapped data in the AOT image and refer to that instead of doing a copy at runtime. I don't think this will be easy.

lambdageek on 23 Nov 2020

> how much benefit is this expected to provide here?

@VSadov, can you comment on this, and how the costs would compare to just using a static T[] field?

stephentoub on 23 Nov 2020

An interesting case is (ReadOnlySpan<int>)new int[]{1,2,3,4,5}. On a little-endian machine, that can be a direct reference to the metadata without copying or allocating anything. We can already do this as long as the element type is 1 byte in size. I am not sure how commonly this pattern is used, since requiring 1-byte elements seems quite limiting.

Ultimately, compared to wrapping a static array this saves the allocation and initialization of that array. I would say the savings are not huge.
The issue was logged on the assumption that supporting this might be fairly cheap too, especially on little-endian hardware, which is nowadays almost everything.

If there are other reasons why this would require nontrivial work or extra copy even on little-endian (for example due to alignment requirements), then it might not be worth it.

VSadov on 23 Nov 2020


> Ultimately, compared to wrapping a static array this saves the allocation and initialization of that array.

What about after the array has already been initialized? This API for span will need to be called on every access, right? How does that compare to the cost of accessing the static array field?

stephentoub on 23 Nov 2020

This API would have to be implemented as a JIT intrinsic that turns it into an address constant.

The current InitializeArray is also implemented as a JIT intrinsic, for a similar reason.

jkotas on 23 Nov 2020


If the API takes a handle, there could be some extra work to figure out the location of the field, unless it is an intrinsic that does it at JIT time.

Alternatively, the API could take a ref to the field, which the C# compiler could provide. Then it might be possible to just use the ref, making it roughly the same as wrapping a static field (on a little-endian machine).

VSadov on 23 Nov 2020

@jkotas - are there alignment guarantees for metadata blobs?

VSadov on 23 Nov 2020

It is up to IL producers to guarantee the alignment for RVA statics.

For example, managed C++ does emit the RVA statics with the right alignment. The IL rewriters (at least the ones we own - e.g. crossgen) do preserve it.

I think we can make the API throw when the blob is not sufficiently aligned.

jkotas on 23 Nov 2020
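
A hypothetical sketch of the throwing check suggested above. It assumes that the required alignment equals the element size, a power of two. It also assumes the caller already has the blob's address as a nuint. BlobAlignment.ThrowIfMisaligned is not part of any proposed API.

```C#
using System;

// Hypothetical sketch: reject a blob whose address is not aligned for its elements.
// Assumes the required alignment equals the element size, which must be a power of two.
public static class BlobAlignment
{
    public static void ThrowIfMisaligned(nuint address, int elementSize)
    {
        if (elementSize <= 0 || (elementSize & (elementSize - 1)) != 0)
            throw new ArgumentOutOfRangeException(nameof(elementSize), "Element size must be a power of two.");

        // For a power of two n, address % n equals address & (n - 1).
        if ((address & (nuint)(elementSize - 1)) != 0)
            throw new ArgumentException($"Blob at 0x{(ulong)address:X} is not {elementSize}-byte aligned.", nameof(address));
    }
}
```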

Then, in terms of impact, and assuming it's as fast or faster to access one of these, there are a bunch of places where we'd use them. Likewise, there are a bunch of places where we used the support that was added for ReadOnlySpan<OneBytePrimitive>. A quick survey suggests there are dozens of places with private static readonly char[] and private static readonly int[] fields where we'd use this. We could avoid the array allocation/copy and potentially get faster access to boot. The access sites either just index into these fields or pass them to APIs that support spans.

stephentoub on 24 Nov 2020
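
The kind of rewrite described above might look like the following sketch. The Lookup class and its members are illustrative only. With a compiler and runtime that support RuntimeHelpers.CreateSpan, the span property can refer to the data blob directly. On other toolchains the same source still compiles, but it may allocate an array.

```C#
using System;

// Illustrative only: the "before" and "after" shapes of a private lookup table.
public static class Lookup
{
    // Before: an array allocated and initialized once, when the type is initialized.
    private static readonly int[] s_primesArray = { 2, 3, 5, 7, 11 };

    // After: a span property over a literal array. With CreateSpan support this can point
    // at the PE data blob; without it, the compiler may emit an array allocation instead.
    private static ReadOnlySpan<int> Primes => new int[] { 2, 3, 5, 7, 11 };

    public static bool IsSmallPrimeArray(int value) => Array.IndexOf(s_primesArray, value) >= 0;

    public static bool IsSmallPrime(int value) => Primes.IndexOf(value) >= 0;
}
```

Access sites such as IsSmallPrime keep the same shape. Only the declaration changes from a field to a span-returning property.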

> An interesting case is (ReadOnlySpan<int>)new int[]{1,2,3,4,5} - on little-endian machine that can be a direct reference to the metadata without copying or allocating anything

How does returning pointers into data section of the executable mix with unloadability? Would we need to make a copy of the data if the assembly is part of an unloadable load context? I assume there's no good way to track the reference within the span.

MichalStrehovsky on 24 Nov 2020

In CoreCLR, a span is similar to a byref parameter in terms of GC tracking. It should keep the context alive.
Spans are also stack-only and thus relatively short-lived, so keeping the context alive for too long is unlikely to be a problem.

VSadov on 24 Nov 2020