Runtime: API change SequenceReader.ctor pass ReadOnlySequence via in

Created on 1 Mar 2019  路  14Comments  路  Source: dotnet/runtime

 public ref partial struct SequenceReader<T> where T : unmanaged, System.IEquatable<T>
 {
-    public SequenceReader(System.Buffers.ReadOnlySequence<T> sequence);
+    public SequenceReader(in System.Buffers.ReadOnlySequence<T> sequence);
 }

Drops 32-bytes of stack zeroing and this from call site

vmovdqu  xmm0, qword ptr [rbx]
vmovdqu  qword ptr [rbp-1D0H], xmm0
vmovdqu  xmm0, qword ptr [rbx+16]
vmovdqu  qword ptr [rbp-1C0H], xmm0

Its a 4 element struct so its unlikely to be covered by calling convention; however it is also an Aggressive Inline (albeit a frighteningly large one) and there has been speculation that the Jit could eventually elide the copy (@mikedn mentioned it on the ValueTaskAwaiter PR); though it doesn't currently.

@stephentoub raised the following concern https://github.com/dotnet/corefx/pull/35678#discussion_r261459129

I would, however, urge us to avoid polluting methods like this with in unless we find that a) it makes a meaningful difference and b) we expect that difference will never be able to captured by JIT-level changes instead.

/cc @stephentoub @jkotas @AndyAyersMS

area-System.Memory

Most helpful comment

It seems that the SequenceReader already has AggressiveInlining and if it inlines, the fact that the parameter is passed by value or ref shouldn't matter. If it matters then it's likely because of something that the JIT doesn't do today but could probably do later.

In general I advise caution about passing parameters by reference. If the method doesn't end up being inlined then you might just make things worse. You save a copy at the expense of potentially making a local variable address exposed and killing pretty much optimizations around it. And handling address exposed variables in the JIT is something more difficult to do. The readonly part of in may help a bit with that but then I don't know if the JIT should actually rely on that because it cannot be enforced, much like C++'s const.

All 14 comments

It seems that the SequenceReader already has AggressiveInlining and if it inlines, the fact that the parameter is passed by value or ref shouldn't matter. If it matters then it's likely because of something that the JIT doesn't do today but could probably do later.

In general I advise caution about passing parameters by reference. If the method doesn't end up being inlined then you might just make things worse. You save a copy at the expense of potentially making a local variable address exposed and killing pretty much optimizations around it. And handling address exposed variables in the JIT is something more difficult to do. The readonly part of in may help a bit with that but then I don't know if the JIT should actually rely on that because it cannot be enforced, much like C++'s const.

Can always add both overloads?

 public ref partial struct SequenceReader<T> where T : unmanaged, System.IEquatable<T>
 {
    public SequenceReader(ReadOnlySequence<T> sequence) : this(in sequence);
    public SequenceReader(in ReadOnlySequence<T> sequence);
 }

@jkotas what's our architectural stance on this? We could offer both overloads and let developers decide; it's an advanced API. If we offer both, the default will be pass by value, with advanced folks being able to force in via call side modifier.

You can't overload on the basis of in/out/ref parameters iirc. The others could be a static factory method perhaps?

It turns out you can overload ref/out/in and none of them, but not between them - just learned this as well.

what's our architectural stance on this?

The proposed API is working around deficiency in JIT optimizations. The general stance is that we do not want to be tweaking the public API surface for point-in-time deficiencies in JIT optimizations. Doing so would lead to nonsensical API surface over time. @mikedn 's comment above goes into more details on this: https://github.com/dotnet/corefx/issues/35679#issuecomment-468562023

We can make exceptions to this rule if the perf loss is too big to be left on the table, and the JIT fix is too hard. It does not seem to be the case here.

@jkotas, what about the case when an API is not AggressiveInlining. It seems like the JIT should also be able to rationalize an in parameter that is then inlined and do the efficient thing.

ReadOnlySequence<T> is large enough that all currently supported ABIs will cause an implicit shadow copy and pass by reference in the case that the method isn't inlined, so it would normally be the efficient thing to pass these parameters by in to avoid the shadow copy.

what about the case when an API is not AggressiveInlining

It is not the case with this proposal - I expect that the proposed API would be marked with AggressiveInlining as well.

@jkotas

So that means we shouldn't make it in then and improve the JIT. Presumably, if the loss is too high, would you be opposed to doing the overload?

if the loss is too high, would you be opposed to doing the overload?

Correct. As I have said above, I do not think it is the case here.

Sounds good, so I'm closing.

The proposed API is working around deficiency in JIT optimizations.

Presumably the Jit not being able to drop refs when inlining is also a deficiency in JIT optimization?

Presumably the Jit not being able to drop refs when inlining is also a deficiency in JIT optimization?

Yes. The jit can often fix this, but not always, even when it arguably should.

And some cases can't ever be fixed, as @mikedn notes. Say the ref propagates to a method that does not get inlined -- in cases like that all accesses to the struct can be pessimized, even if that call is rare.

With "by value" semantics the jit has some natural points at where it can safely change struct representation for best optimization. With "by ref" the jit needs to be able to see and possibly modify all touches to do this, or allow for switching representations of the same struct variable over regions of code. Going forward we're more likely to invest in by-value opts (eg struct copy prop & reverse copy prop) than in some kind of region-based promotion to make the messy "by ref" cases more tractable. Admittedly we're not great at by-value opts today, but we are slowly chipping away at it.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

matty-hall picture matty-hall  路  3Comments

GitAntoinee picture GitAntoinee  路  3Comments

jamesqo picture jamesqo  路  3Comments

btecu picture btecu  路  3Comments

yahorsi picture yahorsi  路  3Comments