It would be very convenient to have a CLR instruction for switching on the type of an object. The C# compiler could use this to generate code for a pattern-matching switch statement. The IL instruction would be followed by a series of (type, label) pairs. At runtime it would consume its operand, determine the first type in the list that the object inherits from, and jump to the corresponding label with a reference of the appropriate type on the stack. It would be required to operate in constant amortized time, no matter the number of types and whether or not those types are sealed.
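For concreteness, a sketch of the kind of C# source the compiler could lower to such an instruction (the `typeswitch` mnemonic is hypothetical and nothing here is an existing runtime feature):

```csharp
using System;

static class Example
{
    // A pattern-matching switch the C# compiler could lower to a single
    // hypothetical typeswitch instruction followed by (type, label) pairs,
    // rather than a chain of sequential type tests.
    static string Describe(object o) => o switch
    {
        string s    => $"string of length {s.Length}",
        int[] a     => $"array of {a.Length} ints",
        Exception e => $"exception: {e.Message}",
        _           => "something else",
    };
}
```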
We currently generate a series of type tests and branches for this case.
It is intended that this instruction would run in constant time for user-written types.
Why would this be better than a sequence of if/else pairs? Is this about reducing IL size? Do you expect the runtime to have a more optimal implementation than the C# compiler generating the equivalent of the code below?
```csharp
if (o.GetType() == typeof(string))
{
    // ...
}
else if (o.GetType() == typeof(object))
{
    // ...
}
else
{
    // ...
}
```
@davidwrighton For literature on compiler optimization of pattern matching, the following two OCaml papers provide the requisite background:
Compiling Pattern Matching to Good Decision Trees
Optimizing Pattern Matching
Further, by hoisting this to a logical IL instruction, it is conceivable that really large F#/C# data-science algorithms using complex pattern matching could benefit from hot-path optimization of the compiled decision tree. I realize C# only implements basic switch-on-type pattern matching right now, but the benefits of further optimization are widespread. For OOP type matching like switch-on-type, the compiler can specifically reason that:
a) there are significantly more "default case" type matches that fall-through to the default case
b) it is probably more reasonable to optimize for the default case to be evaluated first
The only remaining question is: if you're going to add an instruction, why take only one type parameter? The IL instruction should allow stacking, so it works as a decision tree. For a view into how OCaml does this, see the free chapter 23 of Real World OCaml: https://v1.realworldocaml.org/v1/en/html/the-compiler-backend-byte-code-and-native-code.html
@gafter What do you think?
@davidwrighton Here are two reasons:

1. The IL sequence you provided is only correct for sealed types.
2. Only the runtime can know which concrete types occur most frequently at runtime.

[edited 2019-11-04 to add]

It is intended that the instruction run in amortized constant time. A straightforward translation of the sequence of type tests would not do that.
@jzabroski Thanks for those references!
> The IL sequence you provided is only correct for sealed types

Ok, you can flip the sequence to use `isinst` instead. (The JIT has an optimization that turns `isinst` into a simple type check for sealed types.)
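In C# terms, the flipped sequence corresponds to `is`-pattern tests rather than exact `GetType()` comparisons — a minimal sketch:

```csharp
using System.Collections;

static class IsinstSketch
{
    static string Test(object o)
    {
        // Each `is` test compiles to isinst: it matches the type and its
        // subtypes, unlike o.GetType() == typeof(T), which matches only
        // the exact runtime type.
        if (o is string s) return $"string: {s}";
        else if (o is IList l) return $"list of {l.Count}";
        else return "something else";
    }
}
```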
> Only the runtime can know which concrete types occur most frequently at runtime

The most efficient way to do a type switch over a large number of types is a hashtable. The runtime can make this work well by providing an efficient way to get a type hash code, if it is not efficient enough today.

Other than that, the compiler or the compiler support library can build about as efficient a hashtable as the runtime itself. I think it is likely that building the hashtable in the compiler or the compiler support library would give you more flexibility for implementing more complex matching patterns efficiently.
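A minimal sketch of such a hashtable-based dispatch (the `TypeSwitchTable` name and shape are hypothetical; this is roughly the strategy the prototype linked later in this thread uses — a linear scan the first time a concrete type is seen, then a cached O(1) lookup):

```csharp
using System;
using System.Collections.Concurrent;

sealed class TypeSwitchTable
{
    private readonly Type[] _cases;   // the case types, in declaration order
    private readonly ConcurrentDictionary<Type, int> _cache = new();

    public TypeSwitchTable(Type[] cases) => _cases = cases;

    // Returns the index of the first case that o is an instance of, or -1.
    // The linear subtype scan runs once per distinct concrete type; after
    // that, dispatch is a single hashtable lookup.
    public int GetIndex(object o) =>
        _cache.GetOrAdd(o.GetType(), t =>
        {
            for (int i = 0; i < _cases.Length; i++)
                if (_cases[i].IsAssignableFrom(t))
                    return i;
            return -1;
        });
}
```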
> The most efficient way to do a type switch over large number of types is hashtable.

Are you suggesting that we build a hashtable dynamically and incrementally? Wouldn't that prevent the unloading of types once they appear as keys in that table?

> I think it is likely that building the hashtable in the compiler or the compiler support library would give you more flexibility for implementing more complex matching patterns efficiently

That would remove all flexibility that the runtime would otherwise have to implement type switches in the most efficient manner for the platform.
> Wouldn't that prevent the unloading of types once they appear as keys in that table?

Yes, the hashtable would need to be aware of type unloading. Hashtables aware of type unloading are not unusual, but the primitives available for building them are not great. It is something that we should improve.
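A minimal sketch of an unloading-aware cache built on an existing primitive, `ConditionalWeakTable` (the `CaseIndexCache` name is hypothetical; whether this primitive alone is sufficient in practice is exactly the question raised above):

```csharp
using System;
using System.Runtime.CompilerServices;

// ConditionalWeakTable holds its keys weakly, so an entry (and its cached
// index) disappears when the Type itself becomes unreachable, e.g. when a
// collectible AssemblyLoadContext unloads.
static class CaseIndexCache
{
    private sealed class Boxed { public int Index; }

    private static readonly ConditionalWeakTable<Type, Boxed> s_cache = new();

    public static bool TryGet(Type t, out int index)
    {
        if (s_cache.TryGetValue(t, out var b)) { index = b.Index; return true; }
        index = -1;
        return false;
    }

    public static void Set(Type t, int index) =>
        s_cache.AddOrUpdate(t, new Boxed { Index = index });
}
```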
> That would remove all flexibility that the runtime would otherwise have to implement type switches in the most efficient manner for the platform.

As C# pattern matching is getting more complex, I think it is starting to look pretty similar to `dynamic` from the runtime's point of view. The runtime support for `dynamic` is in a separate library, separate from the JIT and core runtime. It is pretty efficient, it tunes itself for the most frequently used signatures where possible, etc. In theory, it could have been a bit more efficient if it was all built into the JIT. In practice, I do not think it would actually be more efficient if it was all built into the JIT. We would spend all our time dealing with the complexity of building it into the JIT, and there would be no time or appetite left to fine-tune the performance. Note that we have done work in the runtime for `dynamic` by introducing and improving the primitives that it is built out of; e.g. we have done a lot of performance work on delegates.

We should consider taking a similar route for pattern matching as we have taken for the `dynamic` keyword: identify the key primitive operations that the core runtime needs to provide in order to make it work great, but keep the overall runtime support for pattern matching in a separate library.
@jkotas You may have convinced me.
It looks like the break-even point for using a hashtable is about 20 types in a switch. See https://github.com/dotnet/roslyn/issues/31515#issuecomment-473630695 for a benchmark.
I'm not sure how the JIT could do better. Without evidence to the contrary, perhaps this issue should be closed?
Improved the prototype library so that the break-even point for using a hashtable is about 10 types in a switch. See https://github.com/gafter/TypeSwitch/blob/master/TypeSwitch/TypeSwitchDispatch.cs
I believe it is more appropriate for this kind of support to be directly in the runtime, where it can be optimized dynamically based on the program's actual behavior. The library approach fixes the time-space tradeoff at compile-time and makes it nearly impossible for the runtime to do any better.
Maybe we can teach the JIT to recognize the pattern:

```csharp
if (type == typeof(A)) { /* ... */ }
else if (type == typeof(B)) { /* ... */ }
// ...
```

No IL changes, and the IL works on older runtimes.
They are subtype tests, not exact type tests.
The runtime/JIT does recognize many framework types as intrinsics and optimizes them in custom ways. The library approach does not prevent the time-space tradeoff from getting optimized based on the actual program behavior if the type is recognized as intrinsic.
@jkotas Can you please give me an example of a case in which the runtime eliminates a static field due to replacing it and its use with intrinsic behavior?
The JIT is able to inline the value of readonly static fields. It does not eliminate it. The field is still there, but it is not used on the hot path.
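A hedged illustration of that behavior (the names are made up; this sketches how a `static readonly` field's value can be folded into optimized code while the field itself remains):

```csharp
using System;

static class FeatureFlags
{
    // Initialized when the class constructor runs.
    public static readonly bool UseHashDispatch =
        Environment.GetEnvironmentVariable("USE_HASH") == "1";
}

static class Dispatcher
{
    public static int Dispatch(object o)
    {
        // Once FeatureFlags' static constructor has run, the JIT's optimized
        // recompile can treat UseHashDispatch as a constant and delete the
        // dead branch. The field is not eliminated; it is just unused on the
        // hot path.
        if (FeatureFlags.UseHashDispatch)
            return 1; // hash-based path
        return 0;     // linear path
    }
}
```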
@jkotas Making an appropriate time-space tradeoff would require eliminating the static field (which uses a lot of space to get reasonable performance). If the runtime does not do that kind of thing, then a new instruction would be an appropriate approach.
@jkotas Is there any way I can check whether the JIT has inlined something?
For example, from some code I have written, how would I tell that this has been inlined:

```csharp
using System;
using System.Linq.Expressions;

static class Constructor<T>
    where T : class, new()
{
    public static readonly Func<IScope, T> Empty =
        ((Expression<Func<IScope, T>>)(scope => scope.Exists<T>() ?? new T())).Compile();
}
```
Similarly, is there any "Auditing Machine" I can query inside the runtime to determine how far away the code is/was from being JIT'ed?
@jzabroski Your question is not relevant to this issue. Please open a new issue on this.
> Making an appropriate time-space tradeoff would require eliminating the static field
@gafter I do not think elimination of the static field would be required for a decent time-space tradeoff.
Designs that require low-level implementation are harder to evolve and improve. They may be better in theory, but we find them to be worse in practice because of the experience and amount of work required to optimize them.
I do not have a problem with this issue being open as a new IL instruction proposal. But it is not obvious to me that implementing this as a new IL instruction will achieve the best results.
Dumping my notes for future reference
A code sequence
```
push e
typeswitch 3
    type1 label1
    type2 label2
    type3 label3
fallthrough: // original e on stack
...
label1:      // type1 on stack
...
```
Would be defined to be equivalent to the following, except required to execute in (amortized) constant time:
```
push e
dup
isinst type1
brfalse next1
isinst type1
br label1
next1:
dup
isinst type2
brfalse next2
isinst type2
br label2
next2:
dup
isinst type3
brfalse next3
isinst type3
br label3
next3:
fallthrough: // original e on stack
...
label1:      // type1 on stack
...
```
Let's make the JIT recognize that IL pattern then. No need for new opcodes.
> Let's make the JIT recognize that IL pattern then. No need for new opcodes.

Then all the patterns the JIT recognizes must be documented, as C# is not the only language generating IL. And that will make IL the RISC and the recognized patterns the CISC.
> Let's make the JIT recognize that IL pattern then. No need for new opcodes.

The problem is that this pattern must be reliably recognized by all JITs, not just one of them under some circumstances. Otherwise programmers cannot rely on the performance in writing their algorithms. Adding a new instruction adds confidence that JITs recognize it.
It seems like this is something that might fit fairly well as an intrinsic. No new IL or IL pattern matching would be needed, and it could be engineered so that jit/runtime optimization was possible but also optional.
We'd agree on some BCL method(s) for this operation, e.g. `int TypeSwitch(Type t, Type x_0, Type x_1, ...)`, and find an efficient way to cope with the variadic cases (one can always just implement up to some N, and chain calls to handle cases larger than N, I suppose). This would return the index `i` of the lowest-numbered `x_i` that satisfies `t isinst x_i`, or -1 if none satisfy.
This method (or set of methods) would be targeted by C#, which when doing pattern matching would follow this up with an actual switch to direct control to the appropriate label. There might be some complexities fitting this in with side effect ordering -- I don't know how much code can run during the matching phase currently.
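A sketch of what such a helper and lowering might look like (nothing here exists in the BCL today; the names and the fixed three-type arity are illustrative only):

```csharp
using System;
using System.Collections;

static class TypeSwitchHelpers
{
    // Hypothetical BCL helper: index of the first x_i that t is assignable
    // to, or -1 if none match.
    public static int TypeSwitch(Type t, Type x0, Type x1, Type x2)
    {
        if (x0.IsAssignableFrom(t)) return 0;
        if (x1.IsAssignableFrom(t)) return 1;
        if (x2.IsAssignableFrom(t)) return 2;
        return -1;
    }

    // A type-pattern switch could then lower to the helper call plus an
    // ordinary integer switch that directs control to the right arm.
    public static string Describe(object o) =>
        TypeSwitch(o.GetType(), typeof(string), typeof(IList), typeof(object)) switch
        {
            0 => "a string",
            1 => "a list",
            2 => "some other object",
            _ => "no match",
        };
}
```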
The `TypeSwitch` method(s) would be fully implemented in the BCL by default, so no jit or runtime work would be required. This implementation could be via hashtable, linear search, etc., whatever seems suitable.
The `TypeSwitch` method(s) could also be marked with `[Intrinsic]` so that they could be recognized by the jit as methods with known behavior. Doing this allows the possibility of special handling, but does not create an obligation. Then, when the jit encountered a call to `TypeSwitch` where the `x_i` are all jit-time known types, the jit could interact with the runtime to see if it was possible to optimize the test away entirely (say, if `t`'s type is known at jit time), or if it made sense to peel off some of the cases for eager testing, or build a full decision tree based on what is known about the likely types of `t` and the class relationships of the `x_i`, or fuse the call and the following switch into a hash-based indirect jump sequence. If the `x_i` are not jit-time constants, or the evidence for benefit from specialization is weak, the jit could just defer to calling the helper.
That is a very pragmatic middle ground. Also, by hoisting `TypeSwitch` up to the intrinsic level, it might even be possible to do a fancy C#-to-FPGA synthesizer (a completely different target architecture), as you have not yet dipped into the needless abstraction of CIL registers (and even so, the registers become fields in a special component named device in the FPGA). So your approach is likely the most portable, and it meets the specific use case of "emit an event if not optimized", so that developers can see in BenchmarkDotNet or whatever what's going on and what basis they have for optimizer assumptions.
@AndyAyersMS
> ... could be engineered so that jit/runtime optimization was possible but also optional ...
The "non-optimized" path is likely to be significantly worse than the straightforward IL sequence when not optimized (for a few reasons*). That is very unfortunate, though possibly not fatal. The fatal flaw in this approach is that it relies on an optional optimization for an asymptotic performance improvement. A straightforward complexity analysis demonstrates that you cannot reliably get that asymptotic improvement in the program as a whole unless the optimization is reliably applied (nearly) everywhere it is applicable. That is why I believe a new IL instruction is the best approach: it guarantees that the runtime recognizes the "pattern". I'll explain in more detail when we meet on Monday afternoon.
If the performance is linear time (in the size of the data to the call) when not optimized but constant (amortized) time (which is what we want) when "optimized", then it is not something that we would be likely to want to rely on in designing language features (though, as I will explain, we need to rely on that in designing upcoming features), and probably not something we would benefit from using in normal C# code-gen scenarios.
*Some of the reasons that your proposed intrinsic is likely to be significantly slower than the (also unfortunately linear-time) straightforward IL sequence:
`TypeSwitchDispatch`, which is about the best I believe a library can do.

It is fine to add a guarantee that the generated code will always be amortized constant time and never linear (for a large number of types; the difference is not observable for a small number of types), for either the API or the IL instruction design.
For non-optimized code, the JIT is going to call a method to implement this, even if it is a special IL instruction. For optimized code, the JIT can recognize the method call as an intrinsic and achieve the same outcomes as an IL instruction — again, no fundamental difference between the two designs.
I believe that the interesting optimizations for the API vs. IL instruction design discussion are:
The question to ask is whether introducing a new IL instruction is worth the trouble and whether it will make the above set of optimizations significantly easier to do.
Adding @dsyme to this thread.
A primitive or intrinsic to aid efficient compilation of pattern matching would be very useful and clearly is part of an improved overall design for .NET in the 21st century. I thought a lot about proposing such an intrinsic around the time 2001-03 as we worked through the foundations necessary for F# (generics etc.) but didn't iterate to a concrete design.
It would be great if the switch were over a compact set of integers. I thought about a design which would dynamically augment the method table with extra slots for a compact integer tag (e.g. one slot for each commonly used switch) to avoid any hashing, but there are obvious tradeoffs and difficulties with that (method table size, NGEN, etc.). The runtime gurus on the thread would make much better decisions than me on this.
On the same topic, for large union switches F# stores an extra integer tag in the object to allow `switch`-based switching.
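An illustrative C# rendering of that representation (a sketch; F#'s actual generated layout differs in detail):

```csharp
// A compact integer tag stored in the object lets matching compile to an
// integer switch (a jump table) rather than a chain of type tests.
abstract class Shape
{
    public readonly int Tag;
    protected Shape(int tag) => Tag = tag;
}
sealed class Circle : Shape { public Circle() : base(0) { } }
sealed class Square : Shape { public Square() : base(1) { } }

static class Matcher
{
    public static string Name(Shape s) => s.Tag switch
    {
        0 => "circle",
        1 => "square",
        _ => "unknown",
    };
}
```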
In advanced ML-family languages supporting GADTs, each branch of a switch in pattern matching can potentially introduce localized type information; e.g. the type system means you "know" that on one particular branch a type variable `T` has definite type `string`. It's possible that F#, C# and the CLR should eventually support this. Introducing this contextual type information is much more suited to a new opcode, though it could also be done by recognizing an intrinsic.
How would you guarantee constant-time switches for n types, each of which has thousands of implementors/derived types? E.g.:
```csharp
private string Switch(object o) {
    return o switch {
        IReadOnlyList<object> _ => ...,
        IList _ => ...,
        IEnumerable _ => ...,
        ISerializable _ => ...,
        // etc.
    };
}
```
Ah, I see that we cache the index that a type was found at, so that after first usage it's constant time.
This issue tracks the same request as https://github.com/dotnet/runtime/issues/29002. I believe the current consensus is that it can be done as an API in which the set of types to match are passed as generic type arguments (e.g. the elements of a tuple type). See https://github.com/gafter/TypeSwitch/blob/master/TypeSwitch/TypeSwitchDispatch.cs for a prototype that does not address the issue mentioned in https://github.com/dotnet/runtime/issues/29002#issuecomment-502904329