Runtime: Consuming static readonly Vector128<T> fields results in non-optimal codegen

Created on 2 Oct 2019  路  16Comments  路  Source: dotnet/runtime

Sample code:

private static readonly Vector128<sbyte> Vector0x0F = Vector128.Create((sbyte)0x0F);

public static int GetMask(Vector128<sbyte> data)
{
    return Sse2.MoveMask(Sse2.Add(data, Vector0x0F));
}

Codegen:

00007ffc`19901370 c5f877          vzeroupper
00007ffc`19901373 c5f91001        vmovupd xmm0,xmmword ptr [rcx]
00007ffc`19901377 48b8482ca75ea8010000 mov rax,1A85EA72C48h
00007ffc`19901381 488b00          mov     rax,qword ptr [rax]  ; this dereference could be avoided
00007ffc`19901384 c5f9fc4008      vpaddb  xmm0,xmm0,xmmword ptr [rax+8]
00007ffc`19901389 c5f9d7c0        vpmovmskb eax,xmm0
00007ffc`1990138d c3              ret

Since the static ctor has already been run, the JIT could be optimized to burn into the codegen the exact memory location of where the Vector0x0F field lives. This would allow eliding the dereference highlighted in the above codegen sample.

/cc @tannergooding

category:cq
theme:basic-cq
skill-level:expert
cost:medium

JitUntriaged area-CodeGen-coreclr

Most helpful comment

FixedAddressValueTypeAttribute was introduced for safe and pure managed C++. Managed C++ compiler emitted this attribute on your behalf in pure/safe modes.

The runtime does not take advantage of this attribute for optimizations and this attribute is actually bad for performance. This attribute introduces permanent pinning in the GC heap that prevents full GC heap compaction and that the GC has to allocate around. Back when safe and pure managed C++ was popular, we had multiple situations where we traced down GC-related performance problems to this attribute.

I do not think we should count on FixedAddressValueTypeAttribute to be a part of the solution for this. If we wanted the statics to have fixed addresses, we would want to redo how and where the statics are allocated.

All 16 comments

It feels a bit like this from @mikedn https://github.com/dotnet/coreclr/pull/21711 (looking at the code gen sample diffs he provides) though I may be wrong?

This is not related to dotnet/coreclr#21711 (though I happen to have noticed this pattern while working on that). My understanding is that static struct fields actually contain references to boxed structs so there's no way to burn the exact address in code because GC can move the boxed object.

so there's no way to burn the exact address in code because GC can move the boxed object.

Would System.Runtime.CompilerServices.FixedAddressValueTypeAttribute solve this?

Would System.Runtime.CompilerServices.FixedAddressValueTypeAttribute solve this?

Doesn't seem to have any effect on the JIT side. It could be that the JIT doesn't know about it or simply ignores it.

The bigger question is if it's a good idea to use FixedAddressValueTypeAttribute. It looks like it makes the assembly uncollectible, it's not obvious that it's a good trade off.

@jkotas @janvorli ?

I don't think that using the FixedAddressValueTypeAttribute would help at all. Even if the JIT somehow knew the value was already initialized and could read the value out, it would not be a correct transformation. readonly variables can still be modified using reflection.

@janvorli dotnet/coreclr#20886 blocked changing the value of initialized readonly statics via reflection.

FixedAddressValueTypeAttribute was introduced for safe and pure managed C++. Managed C++ compiler emitted this attribute on your behalf in pure/safe modes.

The runtime does not take advantage of this attribute for optimizations and this attribute is actually bad for performance. This attribute introduces permanent pinning in the GC heap that prevents full GC heap compaction and that the GC has to allocate around. Back when safe and pure managed C++ was popular, we had multiple situations where we traced down GC-related performance problems to this attribute.

I do not think we should count on FixedAddressValueTypeAttribute to be a part of the solution for this. If we wanted the statics to have fixed addresses, we would want to redo how and where the statics are allocated.

Oh, I wish I could just use const keyword for them directly in methods (or at least static readonly fields inside methods, just like inner methods), e.g. if you port this C/C++ to C#'s S.R.I.X86 you will have a lot of single-use fields 馃檨 https://github.com/lemire/simdjson/blob/master/src/haswell/stage1_find_marks.h#L24-L86

I'm not sure why you would need const/static to begin with, you can use normal local variables. The bigger problem with that simdjson code might be that it relies on lambdas...

you can use normal local variables.

One issue here is that Vector128.Create(constant) is not yet handled (unlike Vector2/3/4 which are).

You fixed the latter in https://github.com/dotnet/coreclr/pull/14393 but we still have https://github.com/dotnet/coreclr/issues/22388 tracking the former.

One issue here is that Vector128.Create(constant) is not yet handled (unlike Vector2/3/4 which are).

I was thinking that if you use these values in a loop then it's sufficient to just store the "constant" vector in a local variable outside of the loop. That will be similar to what we can achieve by special JIT constant SIMD support or by using static readonly fields. But yes, if we're not dealing with loops (or if the loop trip count is so small that the constant SIMD creation is significant) then we'll probably need some JIT support.

@jkotas, given that we are exposing GC.AllocateUninitializedArray, users could explicitly do something like the below to create some an array of static readonly "constants" that is both pinned and aligned for the lifetime of the program, correct?

public static readonly Vector128[] FixedData = CreateFixedData();

private static Vector128[] CreateFixedData()
{
    var fixedData = GC.AllocateUninitializedArray<Vector128<float>>(count, alignment: 16, pinned: true);

    fixedData[0] = Vector128.Create(...);
    // ...

    return fixedData;
}

Would it possibly be worthwhile exposing an attribute that makes this "easier" to do and which would allow the JIT to automatically optimize access to said data (my initial guess would be this pattern isn't common enough to warrant a JIT optimization; and people could make do by explicitly doing the above and using Unsafe.AsPointer)?

This attribute introduces permanent pinning in the GC heap that prevents full GC heap compaction and that the GC has to allocate around.

Btw, isn't GC going to have a separate heap for pinned objects so it will no longer be a problem?

Btw, isn't GC going to have a separate heap for pinned objects so it will no longer be a problem?

That's kind of what I was getting at.

With the introduction of the new GC.AllocateUninitializedArray and the features that come with it; it should be possible for us to actually allocate said data directly to the pinned heap, for example.

Additionally, it might be possible for the JIT to recognize that such static readonly fields exist in the pinned heap and emit more efficient code (which might also be useful with the C++/CLI support that is being brought to core).

However, it might not be worth the investment when you could reasonably achieve the same thing via manual mechanisms (considering it is likely a rare usecase).

users could explicitly do something like the below to create some an array of static readonly "constants" that is both pinned and aligned for the lifetime of the program, correct?

That would again mean redoing how statics are dealt with by the runtime. If we spent time going that refactoring, we can make this "just work" without any special hints.

Was this page helpful?
0 / 5 - 0 ratings