Runtime: [Jit] Improve struct passing (reduce copies)

Created on 18 Feb 2018  路  18Comments  路  Source: dotnet/runtime

Before they are out there and it becomes a breaking change (byval => byref)

ReadOnlyMemory is a readonly struct; currently a copy is made and that copy is then passed by reference; but there is an advantage to passing it in and no-disadvantage (as call-site doesn't need to be annotated with in) and since it is readonly and passed by ref anyway due to its size.

e.g. changing

static int CompareOrdinalIgnoreCase(ReadOnlySpan<char> strA, ReadOnlySpan<char> strB)

to

static int CompareOrdinalIgnoreCase(in ReadOnlySpan<char> strA, in ReadOnlySpan<char> strB)

Changes the call site from

G_M57521_IG05:
       lea      rcx, bword ptr [rsp+38H]
       mov      bword ptr [rcx], r9
       mov      dword ptr [rcx+8], edx
       lea      rcx, bword ptr [rsp+28H]
       mov      bword ptr [rcx], r8
       mov      dword ptr [rcx+8], edx
       lea      rcx, bword ptr [rsp+38H]
       lea      rdx, bword ptr [rsp+28H]
       call     CompareInfo:CompareOrdinalIgnoreCase(struct,struct):int

to

G_M57521_IG05:
       mov      bword ptr [rsp+28H], r8
       mov      dword ptr [rsp+30H], eax
       lea      rcx, bword ptr [rsp+28H]
       call     CompareInfo:CompareOrdinalIgnoreCase(byref,byref):int

/cc @jkotas @stephentoub @KrzysztofCwalina @mikedn

Design Discussion area-System.Memory

Most helpful comment

I would prefer we pass small structs by-value (implicitly converted to by-ref in some ABIs) as this gives us the best opportunity to optimize down the road, and an additional incentive to do so.

Staying with by-value would mean sacrificing some perf in the meantime -- most of the "selectively jit better" technologies are in early stages of development.

@CarolEidt can you take a look at this too?

All 18 comments

and no-disadvantage

There is a disadvantage, if not a perf one: it looks strange and unnecessary and requires explanation. A significant point of a readonly struct is to avoid such copies needing to be made, e.g. the C# compiler avoiding emitting copies when calling methods on a readonly struct stored in a readonly field. If anything, this just highlights a discrepancy between what contract the C# compiler adheres to and what the JIT adheres to, and while it may be frowned upon as the readonly-ness is only conveyed in an attribute, I'd prefer to see the JIT simply recognize readonly struct and if it's going to pass the struct byref anyway, not make a copy first.

cc: @AndyAyersMS

I'd prefer to see the JIT simply recognize readonly struct and if it's going to pass the struct byref anyway, not make a copy first.

That would definitely be a bonus

Interesting idea, but typically these language guarantees don't hold up at the IL level, so the jit can't rely on them for correctness.

Another thought is to use this attribute as an inlining hint and boost profitability. Maybe not crank things all the way up like aggressive inlining does, but just be somewhat more aggressive as inlining ought to eliminate the copy cost and that cost only matters when the callees are relatively small. Won't help if the call can't be inlined but perhaps worthwhile anyways.

It could potentially also be an illink optimization (changing the callee signature and the call site), at least for internal and private methods.

cc: @sbomer, @JosephTremoulet, @jaredpar

It wouldn't help if the calleer is (string, string) and passes to a non-inlining (ROM, ROM) to do the work via conversion as it's not in the original signature?

Might not be a huge cost the difference is currently is as follows (going via implicit operator ReadOnlySpan<char>(string value)):

       lea      r8, bword ptr [rbx+12]
       lea      r9, bword ptr [rsp+38H]
       mov      bword ptr [r9], rax
       mov      dword ptr [r9+8], ecx
       lea      rcx, bword ptr [rsp+28H]
       mov      bword ptr [rcx], r8
       mov      dword ptr [rcx+8], edx
       lea      rcx, bword ptr [rsp+38H]
       lea      rdx, bword ptr [rsp+28H]
       call     CompareInfo:CompareOrdinalIgnoreCase(struct,struct):int
       mov      bword ptr [rsp+38H], rax
       mov      dword ptr [rsp+40H], ecx
       lea      rcx, bword ptr [rbx+12]
       mov      bword ptr [rsp+28H], rcx
       mov      dword ptr [rsp+30H], edx
       lea      rcx, bword ptr [rsp+38H]
       lea      rdx, bword ptr [rsp+28H]
       call     CompareInfo:CompareOrdinalIgnoreCase(byref,byref):int

The title and description talks about ReadOnlyMemory, but the example shows a Span. Which one did you meant? The tradeoff is not the same for Span and Memory.

Changes the call site from / to:

But then you pay for it in the method being called. Also, the tradeoff is calling convention specific. For example, there are common cases where passing Span as in would hurt on Unix.

For Span, I think that this is really a question of what the best calling convention for Span is. AFAIK, Midori has done number of experiments with different custom calling conventions for Span and there was not a clear winner.

The title and description talks about ReadOnlyMemory, but the example shows a Span. Which one did you meant?

Ah, was looking a ReadOnlySpan so meant that; however ReadOnlyMemory would be similar.

But then you pay for it in the method being called.

Due to non-enregistration of the sub-fields in the callee?

For example, there are common cases where passing Span as in would hurt on Unix.

Is this due to wider registers in the calling convention as it can be passed in a single register? Span contains a ref field; can this be passed as joined data in a register even if a wider one is available?

Due to non-enregistration of the sub-fields in the callee?

The sub-fields can still get en-registered, but there is always going to be extra de-reference at least.

Is this due to wider registers in the calling convention as it can be passed in a single register?

Structs up to 16 bytes are passed in registers on Unix x64. Span is passed in two regular registers if they are available on Unix x64.

Last I looked the JIT was not doing a great job taking advantage of the SysV conventions for struct passing. dotnet/coreclr#12865 talks about this in the context of returns but I believe I've seen this for arguments too.

The upshot of that is that if you were to go measure the actual perf impact of switching to explicit by-ref args on Unix x64, you might see perf wins, but that result would be somewhat misleading. And using explicit by-ref would indeed be an impediment to getting perf to where it ultimately could be.

As far as a linker solution goes -- the jit needs to know that the callee does not modify the struct in order to avoid the caller side copy., so the analysis is possibly more general than just looking for read-only struct types.

There's still an issue of how the JIT knows the annotations can be trusted, given that method bodies can be switched at runtime. I suppose if we can create the right trust mechanism then the jit can start trusting the C# compiler too....

Assuming the annotations can be trusted and the jit relies on them when generating caller code, they'd introduce a dependence between caller and callee, We could track this at runtime like we do for inlining dependence, so that if say some profiler steps in and updates the callee we have a way of knowing that the caller's code is now potentially invalid and must be updated.

Interesting idea, but typically these language guarantees don't hold up at the IL level, so the jit can't rely on them for correctness.

This has correctness issues even at the C# level. Even in the cases where the struct is readonly it can still insert observable changes when a parameter is changed from "by value" to "by (in) reference". Consider the following code:

``` c#
readonly struct Point {
internal int X, Y;
}

class Example {
Point field;

void M1() {
M2(field);
}

void M2(Point p) {
field = default;
Console.WriteLine(filed.X);
}
}
```

Consider here if M2 is changed to have a in Point here. The effect will be observable because the write of field will now have an effect on the parameter p.

Even though the struct is readonly that doesn't mean the location it's stored in is readonly as well.

@jaredpar, @AndyAyersMS, what's your advice? What should we do here? Should we scrub the APIs and selectively add "in" into places where it makes sense, or we can selectively JIT better code where we know it's safe?

I would prefer we pass small structs by-value (implicitly converted to by-ref in some ABIs) as this gives us the best opportunity to optimize down the road, and an additional incentive to do so.

Staying with by-value would mean sacrificing some perf in the meantime -- most of the "selectively jit better" technologies are in early stages of development.

@CarolEidt can you take a look at this too?

I would love to see us (me?) spend some time eliminating the unnecessary and redundant copies that are generated for both incoming and outgoing struct args, especially on x64/ux. I would prefer to see us spend time on that rather than cluttering our APIs with unnecessary workarounds for this weakness. That said, I confess that I haven't analyzed the case of interest here, so I don't know to what extent this is the underlying issue. Also, these issues have been known since support was initially added for this ABI, and it has not yet bubbled to top priority.

OK, thanks. In this case I will assign the issue to you @CarolEidt. We might also want to change the issue's title.

Shall we open this issue in coreclr instead and track it there (and close this one)?

Makes sense -- there are already meta issues around structs over in coreclr, for instance: first class structs and in the optimization roadmap but it is important not to lose track of specific examples.

Agree with @AndyAyersMS that it makes sense to open a separate issue for this (I was actually surprised that there wasn't one already addressing this). Opened https://github.com/dotnet/coreclr/issues/16619 and assigned it to myself. I will attempt to at least do an assessment of how difficult this will be in the near term.

Closing this issue in favor of the one in coreclr: https://github.com/dotnet/coreclr/issues/16619

Thanks Carol.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

EgorBo picture EgorBo  路  3Comments

bencz picture bencz  路  3Comments

nalywa picture nalywa  路  3Comments

jchannon picture jchannon  路  3Comments

aggieben picture aggieben  路  3Comments