Initialization of literal arrays like new int[]{1,2,3,4,5,6} has a special path where the blittable data is stored directly in the PE data and at runtime, instead of assigning every element of the array to a corresponding constant, we call InitializeArray with the target array instance and the token for the field that represents the data blob.
Similar technique would be very useful when initializing spans.
We already have two scenarios:
stackalloc int[] {1,2,3,4,5}(ReadOnlySpan<int>)new int[]{1,2,3,4,5}in fact in the case with ReadOnlySpan conversion, it would be possible and desirable to just refer to the PE data directly.
The preferred form of the API would be:
```C#
ReadOnlySpan
Another acceptable alternative is:
(but the one above feels more convenient since verifiability is less of a problem)
```C#
ref T GetRefToTemplateData<T>(RuntimeFieldHandle fldHandle);
A valid question to be asked here - "if it is possible to just load a reference to the field in the first place, why there is a need for the API?"
The problem is that the blob data is always stored in littleendian format, so on a bigendian machine the blob data is valid only for 1-byte sized elements.
Similarly to the case of InitializeArray, this API would allow the runtime to abstract away the endianness of the blob.
In a littleendian context (which is the most common case) the implementation could trivially forward to the field data and in bigendian case it may do fixups by either making a copy of the data while changing the endianness or even by performing the fixup in-place.
NOTE: possibility of in-place fixup would require that the same blob is not used to initialize span data of different sizes - say shorts and longs.
Such restriction would be acceptable on the C# side and runtime could validate that such "sharing" did not happen, or make it undefined behavior if that happens.
NOTE: the presence of the API is statically known to the compiler, so it would be ok if some runtimes do not have it right away or never. Then optimization will simply not consider 1+ element sizes.
CC:@jkotas, @marek-safar, @alrz, @jcouv , @jaredpar, @stephentoub
Having this in .NET Core 2.1 would be very useful.
Is there any chance for this API making it though at this point?
how about stackalloc on pointers? e.g. int* p = stackalloc[] { 1,2,3 };
@alrz - the resulting type does not matter here. From the bigendian point of view we store int32 elements with their bytes in a wrong order. The data will not look like consecutive 1,2,3 on bigendian machine.
I understand that, but we're going to use this for stackalloc initializers on pointers as well, I guess the second alternative could work though ref T GetRefToTemplateData<T>(RuntimeFieldHandle fldHandle);
In fact, stackalloc initializers "only" work on pointer types, no special codegen for Span. We're just passing an already initialized localloc to the Span constructor.
@alrz - right, GetRefToTemplateData would work equally well for stackalloc that results in a pointer or in a span.
It is fairly easy to get a pointer from a span. For example “(void*)&s[0]” in IL
@VSadov I just don't see why there has be an indirection here, we can construct both Span and ReadOnlySpan directly from a pointer (or ref for that matter), but GetReadOnlySpanFromTemplate needs a conversion for anything other than ReadOnlySpan
One reason is that ReadOnlySpan can be used safely in the ReadOnlySpan case, which would not require dealing with pointers, and would be verifiable.
Since span is basically a range-checking reference is is safer to use. Conceptually, it is the right thing to represent a chunk of data with known length.
We will see though what CLR/FX guys will say when it gets to actually implementing that API.
currently none of Span tests are verifiable due to the unsafe nature of the constructor... I am more biased towards the current implementation of the stackalloc inits, I think that approach need more changes compared to just returning a ref here. We should see how things work out in either of these cases.
@VSadov, what's the conclusion here? Do we need this API? If yes, please mark the issue are ready for review and let's discuss it asap.
We need this API to implement optimizations.
It is not strictly _needed_ - it is not blocking, but it is a "very good to have" one.
Considering that this will have to be implemented as JIT intrinsic and be platform specific, I am very doubtful that it can make it for 7.3
Lets put it for the review though.
@jkotas what are your thoughts on this?
This method is Span equivalent of the existing RuntimeHelpers.InitializeArray method. It is expected to be only ever be called by C# compiler generated code, never called directly by user. The proposed design (GetReadOnlySpanFromTemplate) looks fine to me.
The end-to-end story for this requires changes in C# compiler, runtime, JIT, and potentially debugger expression evaluator to come together. We should do this post .NET Core 2.1 when we have a runway for doing the work and testing that it all fits together well.
The end-to-end story for this requires changes in C# compiler, runtime, JIT, and potentially debugger expression evaluator to come together. We should do this post .NET Core 2.1 when we have a runway for doing the work and testing that it all fits together well.
Moving to future.
How about this:
class RuntimeHelpers
{
ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle);
}
Use unmanaged constraint, so it can't refer to objects or types containing object?
class RuntimeHelpers
{
ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle) where T : unmanaged;
}
How expensive is that method? Is it possible to cache the return value somehow if it’s expensive?
The following looks good.
C#
class RuntimeHelpers
{
ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle)
}
Not sure if unmanaged would be very useful. T must be blittable - in a sense that an array of bytes can be trivially mapped to T[]. unmanaged structs may still get their fields aligned, packed, reordered. May contain IntPtr...
It basically means that T must be a primitive value type of a known size like int, char or an enum. Runtime will need to validate that.
@ektrah Yes, runtime will have to cache the result if it did any transformations to the data - to correct endianness or alignment. The original field is technically accessible from the user code, so the transformation, if needed, cannot be done in-place.
If no changes are needed, it would make sense to just wrap the field in a span and return. This should generally be a case on little endian machines..
Alright, this is it:
class RuntimeHelpers
{
ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle);
}
Why not put it on MemoryMarshal? This where a number of unsafe Span helpers already exist. Prefer we keep all of those in one place so the entirety of the feature is easy to rationalize about.
It is Span equivalent of InitializeArray that lives on RuntimeHelpers. There is nothing unsafe about either InitializeArray nor CreateSpan.
Is this still required since the compiler has used ldflda for a long time?
@huoyaoyuan, the compiler is only able to do that where the element size is 1 byte, anything larger would require adjusting for potential differences in endianness. The compiler uses RuntimeHelpers.InitializeArray to make the necessary adjustments when initializing an array, I believe this issue is about getting something similar for ReadOnlySpan<> so that the compiler can avoid the need to create an array when the element size > 1 byte.
https://sharplab.io/#v2:EYLgHgbALANALiAlgGwD4AEBMBGAsAKHQGYACLEgYRIG8CT6zSAlAUwEMATAeQDtkBPAMoAHNjwA8wfnBYA+EgFkAFAEoSAXnk8WAdxJSZAbQC6NEgAYYJbFcxWiV2CQCsViCQC+AbjoNiJVk5eARExcUQeOHkAOVUNLV0SCLgTM0trW3tHK1cSd28CDyA==
Some question found while implementing it.
The storage of RVA field looks to be endianess aware, and InitializeArray has code to handle them.
Under LE runtime, returning the address of RVA field directly is OK. This also works for bytes.
Under BE runtime, reversing of endianess should be required. But where to store the reversed data? Creating an array every time regresses the performance improvements totally.
In CoreCLR, it would need to allocate the copy with the right endianness on the loader heap. It is not that important to implement the big-endian support for CoreCLR. CoreCLR does not run on big-endian systems today and there are number of issues that would need to be fixed before it can.
In Mono, implementing the endian swapping for Mono is more important. Mono runs on big-endian systems today. @lambdageek Could you please provide guidance for Mono?
It looks like that the reversed form depends on the type of T, so it may cause some architectural issue, although every field is supposed to have only one usage in practice.
@jaredpar, if the runtime were to expose this soon, would C# be able to target it for .NET 6? I expect there'd be a bit of a back and forth to get all the ducks in a row, e.g. runtime exposes the API, C# takes advantage of it, runtime updates all places that could utilize it to do so (e.g. changing some internal array static fields to be span static props).
Seems pretty reasonable and probably low / med-low costing.
Curious: how much benefit is this expected to provide here?
In CoreCLR, it would need to allocate the copy with the right endianness on the loader heap. It is not that important to implement the big-endian support for CoreCLR. CoreCLR does not run on big-endian systems today and there are number of issues that would need to be fixed before it can.
In Mono, implementing the endian swapping for Mono is more important. Mono runs on big-endian systems today. @lambdageek Could you please provide guidance for Mono?
I think for Mono it will be very similar to how ves_icall_System_Runtime_CompilerServices_RuntimeHelpers_InitializeArray works today, with the following changes:
mono_field_get_rvaMonoFieldDefaultValue something like char *host_data; /* RVA swapped to host endianness using the type of MonoClassField* for sizes */mono_field_get_rva on BE we would allocate from the class mempool and write the byte-swapped data to host_data if it's not already initialized.how much benefit is this expected to provide here?
@VSadov, can you comment on this, and how the costs would compare to just using a static T[] field?
An interesting case is (ReadOnlySpan<int>)new int[]{1,2,3,4,5} - on little-endian machine that can be a direct reference to the metadata without copying or allocating anything. We already can do this as long as the element type is 1-byte size. I am not sure how commonly this pattern is used, since requiring 1-byte elements seems quite limiting.
Ultimately, compared to wrapping a static array this saves the allocation and initialization of that array. I would say the savings are not huge.
The issue was logged with an assumption that supporting this might be fairly cheap too, especially on little-endian HW, which is nowdays almost everything.
If there are other reasons why this would require nontrivial work or extra copy even on little-endian (for example due to alignment requirements), then it might not be worth it.
Ultimately, compared to wrapping a static array this saves the allocation and initialization of that array.
What about after the array has already been initialized? This API for span will need to be called on every access, right? How does that compare to the cost of accessing the static array field?
This API would have to be implemented as JIT intrinsic that turns it into address constant.
The current InitializeArray is also implemented as JIT intrinsic, for similar reason.
If the API takes a handle, there could be some extra work with figuring the location of the field, unless it is an intrinsic that does it at JIT-time.
Alternatively the API may also take a ref to the field (C# compiler could provide it), then it might be possible to just use the ref and make it roughly the same as wrapping a static field (on little endian machine).
@jkotas - are there alignment guarantees for metadata blobs?
It is up to IL producers to guarantee the alignment for RVA statics.
For example, managed C++ does emit the RVA statics with the right alignment. The IL rewriters (at least the ones we own - e.g. crossgen) do preserve it.
I think we can make the API throw when the blob is not sufficiently aligned.
Then in terms of impact, assuming it's as-fast-or-faster to access one of these, there are a bunch of places we'd used them, just as there are a bunch of places we used the support that was added for ReadOnlySpan<OneBytePrimitive>. A quick survey suggests there are dozens of places we'd use this with private static readonly char[] and private static readonly int[] fields, where we could avoid the array allocation / copy and get potentially faster access to boot, with access sites either just indexing into them or using them with APIs that support spans.
An interesting case is (ReadOnlySpan
)new int[]{1,2,3,4,5} - on little-endian machine that can be a direct reference to the metadata without copying or allocating anything
How does returning pointers into data section of the executable mix with unloadability? Would we need to make a copy of the data if the assembly is part of an unloadable load context? I assume there's no good way to track the reference within the span.
In coreclr span is similar to a byref parameter in terms of GC tracking. It should keep the context alive.
Spans are also stack-only and thus relatively short-lived, so keeping the context alive for too long is unlikely to be a problem.
In coreclr span is similar to a byref parameter in terms of GC tracking
Most helpful comment
This API would have to be implemented as JIT intrinsic that turns it into address constant.
The current
InitializeArrayis also implemented as JIT intrinsic, for similar reason.