**Version Used:** Master (11 May 2019)

**Steps to Reproduce:**

```c#
public void M(object o)
{
    if (o is (int x, int y))
    {
    }
}
```
**Expected Behavior:**

Generated code attempts to unbox `o` as a `ValueTuple<int, int>`, and does not box `x` and `y`.
**Actual Behavior:**

Generated code uses `ITuple`, and boxes both `x` and `y`.
**Discussion:**

I understand why the compiler has to emit code which uses `ITuple`.

As a developer, if I want to try and unbox a `ValueTuple`, I instinctively reach for the syntax shown above. Given that it works, I probably won't be aware that both `x` and `y` get boxed, and that I should instead have written something like `if (o is ValueTuple<int, int>(var x, var y))`.
The proposal is to optimize the specific case where the input is an `object` and the pattern looks like something which could be used to create a `ValueTuple` (i.e. no sub-patterns, all types are known, etc.). In this case, the compiler would try to unbox `o` to a `ValueTuple` of the relevant kind before trying the `ITuple` route.
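Roughly, the idea is a lowering along the following lines. This is only a sketch of the proposal, not what the compiler emits today, and the local names are invented:

```c#
// Sketch only; assumes "using System;" and "using System.Runtime.CompilerServices;" (for ValueTuple and ITuple).
public void M(object o)
{
    // Hypothetical lowering of "if (o is (int x, int y))":
    // try an exact unbox first, then fall back to the ITuple-based check used today.
    if (o is ValueTuple<int, int> vt)                  // fast path: one unbox, no per-element boxing
    {
        int x = vt.Item1;
        int y = vt.Item2;
        // body of the original "if" goes here
    }
    else if (o is ITuple t && t.Length == 2
             && t[0] is int x2 && t[1] is int y2)      // slow path: the indexer returns object, so each int element gets boxed
    {
        // body of the original "if" goes here (using x2/y2)
    }
}
```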
This would add some additional code size (although using `ITuple` is fairly wordy already), and some additional cost for the cases where `o` is not a `ValueTuple` (I'm not sure how often this would occur in practice). It would probably take a bit of effort to share common code between the `ValueTuple` path and the `ITuple` path for cases like `o is (int x, 3)`, if that's a worthwhile thing to do.
This should have no visible impact on the user (other than reduced allocations).

I asked this on Gitter, and opinion was mixed on whether this was worthwhile. The suggestion was that I open the issue here and tag you, @gafter.
Seems like a worthwhile optimization for the compiler to first check if the instance is `ValueTuple<...>`, if the compiler can infer the generic type arguments from the recursive patterns, but if that test fails to still fall back to using `ITuple`, which would support more scenarios. That would make it slightly more expensive in the failure case, but much more efficient in the successful case.
I guess the question is how often someone might try to match against the exact type of the tuple and how frequently we expect that match to fail. I understand that the spec might allow for some wiggle room here to add this as an optimization in the future?
Could we get a BenchmarkDotNet comparison of the cost difference between `x is (int a, int b)` and `x is ValueTuple<int, int>`? I'm curious what sort of difference we see here. I imagine it could be significant, as one will avoid boxing altogether, but I would like to see what it comes out as.
```
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.16299.1087 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
Frequency=2062499 Hz, Resolution=484.8487 ns, Timer=TSC
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.3394.0
  DefaultJob : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.3394.0
```
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |----------:|----------:|----------:|-------:|------:|------:|----------:|
| TestValueTuple | 1.261 ns | 0.0585 ns | 0.0650 ns | - | - | - | - |
| TestPattern | 21.364 ns | 0.3384 ns | 0.3165 ns | 0.0057 | - | - | 24 B |
From:
```c#
[MemoryDiagnoser]
public class Benchmark
{
    private readonly object input = (3, 4);

    [Benchmark]
    public int TestValueTuple()
    {
        if (input is ValueTuple<int, int> (var x, var y))
        {
            return x + y;
        }
        return 0;
    }

    [Benchmark]
    public int TestPattern()
    {
        if (input is (int x, int y))
        {
            return x + y;
        }
        return 0;
    }
}
```
I wonder whether the JIT's outsmarted me though, since the Gen 0 count is much lower than I would have expected.
Yeah, I'm surprised by that as well. Can you have a narrower test of:

```c#
if (input is ValueTuple<int, int> v)
{
    return v.Item1 + v.Item2;
}
return 0;
```

?
Thanks!
Sure, I'll put that together tomorrow.
While you're at it, it would be interesting to know what the runtime degradation is when the input is `ValueTuple<object, object>` containing two `int` values.
The results seem really weird to me.
https://gist.github.com/HaloFour/8b435c4adc7a7832ac02e4c306131d95
```
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17763.55 (1809/October2018Update/Redstone5)
Intel Xeon CPU E5405 2.00GHz, 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=2.2.202
  [Host] : .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT
  Core   : .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT

Job=Core  Runtime=Core
```
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------------- |----------:|----------:|----------:|-------:|------:|------:|----------:|
| MatchArityOnly | 20.399 ns | 0.0190 ns | 0.0178 ns | - | - | - | - |
| MatchValueTuple | 3.527 ns | 0.0134 ns | 0.0125 ns | - | - | - | - |
| MatchValueTupleAndDeconstruct | 3.518 ns | 0.0028 ns | 0.0022 ns | - | - | - | - |
| MatchITuple | 65.040 ns | 0.5763 ns | 0.5391 ns | 0.0151 | - | - | 48 B |
| MatchITuplePartially | 44.820 ns | 0.3003 ns | 0.2663 ns | 0.0076 | - | - | 24 B |
| MatchITupleInexact | 41.850 ns | 0.0766 ns | 0.0716 ns | - | - | - | - |
| MatchValueTupleWithITupleFallback | 44.345 ns | 0.0863 ns | 0.0807 ns | - | - | - | - |
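The benchmark field declarations are in the gist rather than quoted below; based on how `t1` and `t2` are described further down (a boxed `ValueTuple<int, int>` and a boxed `ValueTuple<object, object>` holding two ints), they are presumably along these lines:

```c#
// Assumed declarations; see the gist for the real code. The values are illustrative.
private readonly object t1 = (3, 4);                  // boxed ValueTuple<int, int>
private readonly object t2 = ((object)3, (object)4);  // boxed ValueTuple<object, object> holding two ints
```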
> The results seem really weird to me.
`MatchArityOnly`:

```c#
ITuple tuple = t1 as ITuple;
if (tuple != null && tuple.Length == 2)
{
    return 1;
}
return 0;
```

Time: 20ns
Gen 0: -
Casts to `ITuple`: 1
Boxes: 0
Virtual method calls: 1
`MatchValueTuple`:

```c#
object obj = t1;
if (obj is ValueTuple<int, int>)
{
    ValueTuple<int, int> valueTuple = (ValueTuple<int, int>)obj;
    return valueTuple.Item1 + valueTuple.Item2;
}
return 0;
```

Time: 3.5ns
Gen 0: -
Casts to `ITuple`: 0
Boxes: 0
Virtual method calls: 0
`MatchValueTupleAndDeconstruct`:

```c#
object obj = t1;
if (obj is ValueTuple<int, int>)
{
    ValueTuple<int, int> obj2 = (ValueTuple<int, int>)obj;
    int item = obj2.Item1;
    int item2 = obj2.Item2;
    return item + item2;
}
return 0;
```

Time: 3.5ns
Gen 0: -
Casts to `ITuple`: 0
Boxes: 0
Virtual method calls: 0
`MatchITuple`:

```c#
ITuple tuple = t1 as ITuple;
if (tuple != null && tuple.Length == 2)
{
    object obj = tuple[0];
    if (obj is int)
    {
        int num = (int)obj;
        object obj2 = tuple[1];
        if (obj2 is int)
        {
            int num2 = (int)obj2;
            return num + num2;
        }
    }
}
return 0;
```

Time: 65ns
Gen 0: 0.0151
Casts to `ITuple`: 1
Boxes: 2
Virtual method calls: 3
`MatchITuplePartially`:

```c#
ITuple tuple = t1 as ITuple;
if (tuple != null && tuple.Length == 2)
{
    object obj = tuple[0];
    if (obj is int)
    {
        return (int)obj;
    }
}
return 0;
```

Time: 45ns
Gen 0: 0.0076
Casts to `ITuple`: 1
Boxes: 1
Virtual method calls: 2
`MatchITupleInexact`:

```c#
ITuple tuple = t2 as ITuple;
if (tuple != null && tuple.Length == 2)
{
    object obj = tuple[0];
    if (obj is int)
    {
        int num = (int)obj;
        object obj2 = tuple[1];
        if (obj2 is int)
        {
            int num2 = (int)obj2;
            return num + num2;
        }
    }
}
return 0;
```

Time: 42ns
Gen 0: -
Casts to `ITuple`: 1
Boxes: 0 (the boxes were created when `t2` was instantiated)
Virtual method calls: 3
`MatchValueTupleWithITupleFallback`:

```c#
object obj = t2;
if (obj is ValueTuple<int, int>)
{
    ValueTuple<int, int> obj2 = (ValueTuple<int, int>)obj;
    int item = obj2.Item1;
    int item2 = obj2.Item2;
    return item + item2;
}
ITuple tuple = t2 as ITuple;
if (tuple != null && tuple.Length == 2)
{
    obj = tuple[0];
    if (obj is int)
    {
        int num = (int)obj;
        object obj3 = tuple[1];
        if (obj3 is int)
        {
            int num2 = (int)obj3;
            return num + num2;
        }
    }
}
return 0;
```

Time: 44ns
Gen 0: -
Casts to `ITuple`: 1
Boxes: 0 (the boxes were created when `t2` was instantiated)
Virtual method calls: 3
If you solve that as a set of linear equations, you get (approximately):

Cast to `ITuple`: 8ns
Box: 12ns
Virtual method call: 12ns

Each box is also responsible for 24B (which is the minimum object size on x64).
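As a rough cross-check of those per-operation estimates against the measured numbers above (ignoring the few nanoseconds of fixed overhead each benchmark carries):

| Benchmark | Decomposition | Estimate | Measured |
|---------------------|----------------------------|-----------------------|----------|
| MatchArityOnly | 1 cast + 1 call | 8 + 12 = 20 ns | 20.4 ns |
| MatchITuple | 1 cast + 3 calls + 2 boxes | 8 + 36 + 24 = 68 ns | 65.0 ns |
| MatchITuplePartially | 1 cast + 2 calls + 1 box | 8 + 24 + 12 = 44 ns | 44.8 ns |
| MatchITupleInexact | 1 cast + 3 calls | 8 + 36 = 44 ns | 41.9 ns |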
I'm sure that there are factors beyond these which invalidate those results to some degree, but they are roughly consistent across all of those benchmarks.
So it looks like unboxing the `ValueTuple` directly isn't just quicker because it avoids boxing its members: it also avoids a cast to `ITuple` and some virtual method calls, which are equally expensive.
@canton7

What's weird to me is that `MatchValueTupleWithITupleFallback` appears to be consistently faster than `MatchITuple` despite the fact that the former is doing more work, given that the `is` operation would always fail so it would have to fall back to using `ITuple` directly. Must be some JIT weirdness.
@HaloFour

> What's weird to me is that MatchValueTupleWithITupleFallback appears to be consistently faster than MatchITuple despite the fact that the former is doing more work given that the is operation would always fail so it would have to fall back to using ITuple directly.

`MatchITuple` looks at `t1`. `MatchValueTupleWithITupleFallback` looks at `t2`. `t1` is a `ValueTuple<int, int>`, and so calls to `ITuple[]` need to box. `t2` is a `ValueTuple<object, object>`, and so the ints are already boxed, and calls to `ITuple[]` don't need to box again. This is where the cost difference comes from: two boxes, at ~12ns each. The (failed) attempt to unbox as a `ValueTuple<int, int>` probably costs around 3ns (44 + 2*12 - 65).
A better comparison would be to have another test which is the same as `MatchITuple`, but uses `t2`. Then neither this new test nor `MatchValueTupleWithITupleFallback` would have to box.
@canton7
Oh yeah, that makes sense. The benchmark wouldn't capture the cost of that upfront boxing.
So, struct type patterns (or any patterns that would match specifically on a struct) would only match if the target is exactly that struct type. No coercions are permitted. Because of that it seems worthwhile to test against `ValueTuple<...>` at least when all of the recursive patterns would require that all of the elements be some kind of struct. The only time that would not match but the `ITuple` approach would match is if the target is some type that implements that interface but isn't `ValueTuple<...>`, such as `System.Tuple<...>`. So, my question would be how often we expect to see non-`ValueTuple<...>` implementations of `ITuple` in pattern matching, and if it's worth taking that small hit up front in those cases vs. avoiding boxing.
@HaloFour

> The only time that would not match but the `ITuple` approach would match is if the target is some type that implements that interface but isn't `ValueTuple<...>`, ...

What about `ValueTuple<IComparable, IFormattable>` containing `0` and `1`? That would match the pattern too.
@gafter

> What about `ValueTuple<IComparable, IFormattable>` containing `0` and `1`? That would match the pattern too.

Or `ValueTuple<object, object>`, but I'm talking about the cases where the recursive type patterns are for structs, e.g. `foo is (int x, int y)`, in which case it could only make a direct match. You could still match through `ITuple` against `int` in those cases, and it would still probably be less expensive, but you're not saving yourself the cost of the additional boxing.
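To make that concrete, here are some illustrative inputs (the variable names and values are invented for the example). Against `o is (int x, int y)`, only the first could take the proposed `ValueTuple<int, int>` fast path; the others can only match via `ITuple`:

```c#
// Assumes "using System;". All four match "o is (int x, int y)" on the runtimes benchmarked above.
object a = (1, 2);                            // ValueTuple<int, int>: the exact unbox would succeed
object b = ((object)1, (object)2);            // ValueTuple<object, object>: only matches via ITuple
object c = ((IComparable)0, (IFormattable)1); // ValueTuple<IComparable, IFormattable>: only matches via ITuple
object d = Tuple.Create(1, 2);                // System.Tuple<int, int>: implements ITuple, only matches via ITuple
```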
The reason I propose optimising this specific corner of pattern matching is that it's the corner people trying to unbox a ValueTuple are going to reach for, and they're going to be surprised or unaware that it's ~20x more expensive than they expected (more for larger ValueTuples).

There are going to be loads of cases where this optimization won't be applicable, and that's fine. It's only emitted when it looks like someone is trying to unbox an `object` to a ValueTuple, and it's got a ~5% cost if it's emitted but not applicable (ignoring code size, less for larger ValueTuples).
Even if it turns out that the majority of people who write patterns which look like they're trying to unbox a ValueTuple aren't actually trying to do that, there's an argument to say we should support those who are, given the relatively small runtime cost.
(Of course, we're still talking about additional code size, compiler complexity and developer time, for what amounts to tens of nanoseconds.)
I've tagged this as a "Code Gen Quality" issue, but it is not clear to me that this would improve things more often than it would make things worse.