/cc @jkotas
There are certain scenarios today - largely involving activation, serialization, and DI - where library authors perform codegen in order to perform operations on arbitrary types. The primary reason for this is performance. The standard Reflection APIs are too slow to be used in the code paths targeted by these library authors, and though codegen has a large upfront cost it performs considerably better when amortized over the lifetime of the application.
This approach generally works well, but the .NET Framework is considering scenarios where it must operate in environments which do not allow codegen. This renders ineffective the existing performance improvement techniques used by these library authors.
We are uniquely positioned to provide a set of APIs which can cover the majority of scenarios traditionally involving reflection-based codegen. The general idea is that library authors can rely on the APIs we provide to work correctly _both_ in codegen-enabled _and_ in codegen-disallowed environments. Alternatively, the library authors can detect at runtime whether codegen is enabled, and if so they can use their existing highly-optimized codegen logic, falling back to the new API surface if codegen is disallowed.
namespace System.Reflection {
    public delegate ref TField FieldAccessor<TField>(object target);
    public delegate ref TField FieldAccessor<TTarget, TField>(ref TTarget target);

    /// <summary>
    /// Provides factories that can be used by serializers, formatters, and DI systems
    /// to perform reflection-like activities in performance-critical code paths.
    /// </summary>
    [SecurityCritical]
    public static class ReflectionServices
    {
        public static bool IsCodegenAllowed { get; }

        /*
         * FIELD ACCESSORS, GETTERS, AND SETTERS
         */
        public static FieldAccessor<TField> CreateFieldAccessor<TField>(FieldInfo fieldInfo);
        public static FieldAccessor<TTarget, TField> CreateFieldAccessor<TTarget, TField>(FieldInfo fieldInfo);
        public static Func<object, object> CreateFieldGetter(FieldInfo fieldInfo);
        public static Func<object, TField> CreateFieldGetter<TField>(FieldInfo fieldInfo);
        public static Func<TTarget, TField> CreateFieldGetter<TTarget, TField>(FieldInfo fieldInfo);
        public static Action<object, object> CreateFieldSetter(FieldInfo fieldInfo);
        public static Action<object, TField> CreateFieldSetter<TField>(FieldInfo fieldInfo);
        // TTarget must not be value type; field must be an instance field.
        public static Action<TTarget, TField> CreateFieldSetter<TTarget, TField>(FieldInfo fieldInfo);

        /*
         * PARAMETERLESS OBJECT CREATION
         */
        public static Func<object> CreateInstanceFactory(Type type);
        public static Func<T> CreateInstanceFactory<T>();

        /*
         * PARAMETERFUL OBJECT CREATION
         */
        public static Func<object[], object> CreateInstanceFactory(ConstructorInfo constructorInfo);
        public static Func<object[], T> CreateInstanceFactory<T>(ConstructorInfo constructorInfo);
        public static Delegate CreateInstanceFactoryTyped(ConstructorInfo constructorInfo, Type delegateType);

        /*
         * PROPERTY GETTERS AND SETTERS
         * TODO: How would indexed properties be represented? Using the normal method invocation routines?
         */
        public static Func<object, object> CreatePropertyGetter(PropertyInfo propertyInfo);
        public static Func<object, TProperty> CreatePropertyGetter<TProperty>(PropertyInfo propertyInfo);
        public static Func<TTarget, TProperty> CreatePropertyGetter<TTarget, TProperty>(PropertyInfo propertyInfo);
        public static Action<object, object> CreatePropertySetter(PropertyInfo propertyInfo);
        public static Action<object, TProperty> CreatePropertySetter<TProperty>(PropertyInfo propertyInfo);
        // TTarget must not be value type; property must be an instance property.
        public static Action<TTarget, TProperty> CreatePropertySetter<TTarget, TProperty>(PropertyInfo propertyInfo);

        /*
         * METHODS
         */
        public static Func<object, object[], object> CreateMethodInvoker(MethodInfo methodInfo);
        // If instance method, 'delegateType' must be open over 'this' parameter.
        public static Delegate CreateMethodInvoker(MethodInfo methodInfo, Type delegateType);

        /*
         * EVENTS
         */
        public static Action<object, object> CreateEventSubscriber(EventInfo eventInfo);
        // Event must be an instance event.
        public static Action<TTarget, TDelegate> CreateEventSubscriber<TTarget, TDelegate>(EventInfo eventInfo);
        public static Action<object, object> CreateEventUnsubscriber(EventInfo eventInfo);
        // Event must be an instance event.
        public static Action<TTarget, TDelegate> CreateEventUnsubscriber<TTarget, TDelegate>(EventInfo eventInfo);
    }
}
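The "detect at runtime and fall back" option from the rationale can be approximated with APIs that exist today. The sketch below is only an illustration: `FieldGetterSketch` and `Point` are hypothetical names, `RuntimeFeature.IsDynamicCodeSupported` (available in .NET Core 3.0 and later) stands in for the proposed `IsCodegenAllowed`, and plain `FieldInfo.GetValue` stands in for whatever faster non-codegen path the proposal would provide.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical helper, not part of the proposal.
public static class FieldGetterSketch
{
    // Picks a strategy once, when the delegate is created, not on every call.
    public static Func<TTarget, TField> Create<TTarget, TField>(FieldInfo field)
    {
        if (RuntimeFeature.IsDynamicCodeSupported)
        {
            // Codegen-enabled path: compile a small accessor from an expression tree.
            ParameterExpression target = Expression.Parameter(typeof(TTarget), "target");
            return Expression
                .Lambda<Func<TTarget, TField>>(Expression.Field(target, field), target)
                .Compile();
        }

        // Codegen-disallowed path: ordinary reflection (boxes value-type fields).
        return target => (TField)field.GetValue(target);
    }
}

// Hypothetical sample type used only to exercise the helper.
public sealed class Point
{
    public int X = 42;
}
```

A caller would create the delegate once, for example `FieldGetterSketch.Create<Point, int>(typeof(Point).GetField("X"))`, and reuse it for every instance.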
These APIs are not geared toward standard application developers who are already comfortable using the existing Reflection API surface. They are instead geared toward advanced library developers who need to perform Reflection operations in performance-sensitive code paths.
These APIs must work in a codegen-disallowed execution environment. (Are there exceptions?)
These APIs do not need to cover all scenarios currently allowed by the existing methods on `MethodInfo` and related types. For example, constructors that take `ref` or `out` parameters are sufficiently rare that we don't need to account for them. They can be invoked via the standard Reflection APIs.
These APIs do not need to have the same observable behavior as using the Reflection APIs; e.g., we may determine that these APIs should not throw `TargetInvocationException` on failure. But these APIs must provide consistent behavior regardless of whether they're running within a codegen-enabled or a codegen-disallowed environment.
Delegate creation does not need to be particularly optimized since there will be many checks performed upfront and we will ask callers to cache the returned delegate instances. However, once the delegates are created their invocation must be faster than calling the existing Reflection APIs. (Exception: if codegen is disallowed, then delegate invocation should be faster than calling the existing Reflection APIs wherever possible, and it must not be slower.)
It is an explicit goal to get serialization library authors to prefer this system over hand-rolling codegen for most member access scenarios. The selling points of this API would be ease of use (compared to hand-rolling codegen), performance, and the ability to work in a wide variety of execution environments.
It is an explicit non-goal to have performance characteristics equal to or better than a library's own custom codegen. For example, a DI system might choose to codegen a single method that both queries a service provider to get dependency instances and calls `newobj` on the target constructor. Such a system will always outperform these generalized APIs, but the API performance should be good enough that library authors would be generally satisfied using them over Reflection as a fallback in these scenarios.
These APIs do not need to support custom implementations of `MemberInfo`. Only support for CLR-backed members is required.
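Because delegate creation is expected to be comparatively expensive and callers are asked to cache the results, a library would typically keep one delegate per member. The following is a minimal sketch of such a cache. `GetterCache` is a hypothetical name, and the private `CreateGetter` method is a reflection-based stand-in for the proposed `ReflectionServices.CreateFieldGetter(FieldInfo)`, which does not exist yet.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;
using System.Threading;

// Hypothetical cache, not part of the proposal.
public static class GetterCache
{
    private static readonly ConcurrentDictionary<FieldInfo, Func<object, object>> s_getters =
        new ConcurrentDictionary<FieldInfo, Func<object, object>>();

    // Counts how often the expensive creation step ran (for illustration only).
    public static int CreationCount;

    // Returns the cached delegate for a field, creating it on first use.
    // Under contention GetOrAdd may run the factory more than once,
    // but only one delegate is ever stored and handed out.
    public static Func<object, object> GetOrCreate(FieldInfo field) =>
        s_getters.GetOrAdd(field, CreateGetter);

    // Stand-in for the proposed ReflectionServices.CreateFieldGetter(FieldInfo).
    private static Func<object, object> CreateGetter(FieldInfo field)
    {
        Interlocked.Increment(ref CreationCount);
        return target => field.GetValue(target);
    }
}

// Hypothetical sample type used only to exercise the cache.
public sealed class Sample
{
    public string Name = "pi";
}
```

The design choice here is to pay the validation and setup cost once per member and keep the per-call path down to a dictionary-free delegate invocation, since callers hold on to the returned delegate.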
Something I forgot to mention above - there's also a consideration for making the existing Reflection APIs faster. However, this requires further thought, and there is likely only so much we can do because each invocation of the APIs would need to perform _both_ setup _and_ invocation. We're also not able to change the observable behavior of the existing Reflection APIs.
Sounds promising! I need to think more about how this fits all of our uses in Orleans (Serialization, RPC, Activation). This current proposal wouldn't be able to replace our existing codegen for creating proxy objects (where an interface whose methods return `Task<T>` is implemented via codegen to make remote calls; currently implemented here).
Will the field setters work on `initonly` fields? Being able to serialize/deserialize immutable types is an important use case, especially since get-only properties are backed by `initonly` fields.
EDIT: currently the CLR seems to be relatively loose about how it enforces `initonly`, since we are able to write to `initonly` fields from generated IL, although that IL is unverifiable. We never use IL to write to `initonly` fields once an object is logically constructed (deserialized / deep-copied), anyhow. A blessed path here would be greatly appreciated.
How about a getter for `ref`-returning properties?
class Ref
{
    int[] _items;
    public ref int First => ref _items[0];
}
I want APIs like the following:
public delegate ref TResult RefFunc<TArg, TResult>(TArg arg);
public static RefFunc<object, TProperty> CreateRefPropertyGetter<TProperty>(PropertyInfo propertyInfo);
- Could `CreateMethodInvoker` use generics to specify the delegate type? In similar APIs, I always find it annoying that I have to write e.g. `(Action)CreateMethodInvoker(methodInfo, typeof(Action))`, when it could be just `CreateMethodInvoker<Action>(methodInfo)`.
- Isn't `CreateMethodInvoker` effectively just `Delegate.CreateDelegate`? Though most of the other methods don't have similar equivalents, so I think it makes sense to provide it for symmetry.
- For the overloads that use `object[]` for parameters, does it make sense to use a custom delegate type with `params` instead of `Func`? That way, I could write e.g. `CreateInstanceFactory<Foo>(constructorInfo).Invoke(3.14, "pi", null)` instead of `CreateInstanceFactory<Foo>(constructorInfo).Invoke(new object[] { 3.14, "pi", null })`.

Unless there's a glaring technical reason why it can't be done, shouldn't there be a mass of generic overloads for `CreateMethodInvoker` and other methods currently showing an `object[]` parameter?
Otherwise there's an array creation and an arbitrary number of boxing allocations every time the provided delegate is called.
public static Func<TInstance, T1, T2, TReturn> CreateMethodInvoker<TInstance, T1, T2, TReturn>(MethodInfo methodInfo);
Useful for situations where you know a given type has a method that conforms to a signature, but it doesn't have an interface you can use to access it directly.
At least until Shapes are ready :)
The proposal so far shows an API that looks convenient to use. I like it, especially as it would remove human error from the equation.
Two questions:
- What makes this faster than the existing system? It is not clear to me how this API and this proposal make dynamically generated code faster.
- You mention that this should work in places where no code generation is allowed. Is the idea that we would run a slow code path, or that we would have an off-line code generator? The latter would probably call for a different design - see for example the way this one works: https://github.com/neuecc/Utf8Json
Rather than a "mass of generics", why not just:
public static D CreateMethodInvoker<D>(MethodInfo) where D : Delegate
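For a sense of what the single generic signature buys the caller, here is a minimal sketch built on the existing `MethodInfo.CreateDelegate(Type)`. `MethodInvokerSketch` and `Demo` are hypothetical names, and this is not how the proposed API would necessarily be implemented; it only shows that the `where D : Delegate` constraint (C# 7.3 and later) removes the cast and the `typeof` at the call site.

```csharp
using System;
using System.Reflection;

// Hypothetical helper, not part of the proposal.
public static class MethodInvokerSketch
{
    // The delegate type is supplied as a type argument, so no cast is needed by the caller.
    public static D CreateMethodInvoker<D>(MethodInfo method) where D : Delegate =>
        (D)method.CreateDelegate(typeof(D));

    // Usage: an open instance delegate, where the first parameter is 'this'.
    public static string Demo()
    {
        MethodInfo toUpper = typeof(string).GetMethod("ToUpperInvariant", Type.EmptyTypes);
        Func<string, string> invoke = CreateMethodInvoker<Func<string, string>>(toUpper);
        return invoke("pi");
    }
}
```

Since the delegate is strongly typed, invoking it involves no `object[]` allocation and no boxing of arguments.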
These APIs do not need to have the same observable behavior as using the Reflection APIs; e.g., we may determine that these APIs should not throw TargetInvocationException on failure. But these APIs must provide consistent behavior regardless of whether they're running within a codegen-enabled or a codegen-disallowed environment
I'm assuming (and it would be good to be explicit) that other old-style behaviors we don't want to support are:
- For the non-generic APIs, automatic conversion of every type under the sun to the required type, as `MethodInfo.Invoke()` does (i.e. `BindingFlags.ExactBinding` is implied).
- Automatic replacement of `Missing` with the default parameter value.
From the comment on `CreateMethodInvoker`, I assume you also don't want to support the ability to create open delegates over static methods or closed delegates over instance methods. (The existing `CreateDelegate` methods have become increasingly permissive about this over time, a fact about which I have mixed feelings.)
Also, I'd find it more convenient for these to be extension methods over the `*Info` parameter.
I suspect `IsCodegenAllowed` will be questioned - do we have other precedents of "quality of service" querying APIs in corefx? Also, the name makes it sound like the other APIs won't work if it returns `false`, even though the opposite is true here.
The Event and Property cases just boil down to retrieving the correct accessor method and calling `CreateMethodInvoker` on that, right? Given the specialized nature of this API, maybe that should just be the guidance for all events and properties, not just the ones that aren't index/return or pass refs?
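The "retrieve the accessor method and bind a delegate to it" approach can already be tried with existing APIs. The sketch below uses `PropertyInfo.GetMethod`, `PropertyInfo.SetMethod`, and `MethodInfo.CreateDelegate` in place of the proposed `CreateMethodInvoker`. `PropertyAccessorSketch` and `Person` are hypothetical names, and the `class` constraint is an assumption made to keep the open instance delegates simple.

```csharp
using System;
using System.Reflection;

// Hypothetical sample type.
public sealed class Person
{
    public string Name { get; set; } = "";
}

// Hypothetical helper, not part of the proposal.
public static class PropertyAccessorSketch
{
    // Binds an open instance delegate to the property's get accessor.
    // Assumes the property has a getter; GetMethod is null otherwise.
    public static Func<TTarget, TProperty> CreateGetter<TTarget, TProperty>(PropertyInfo property)
        where TTarget : class =>
        (Func<TTarget, TProperty>)property.GetMethod
            .CreateDelegate(typeof(Func<TTarget, TProperty>));

    // Binds an open instance delegate to the property's set accessor.
    // Assumes the property has a setter; SetMethod is null otherwise.
    public static Action<TTarget, TProperty> CreateSetter<TTarget, TProperty>(PropertyInfo property)
        where TTarget : class =>
        (Action<TTarget, TProperty>)property.SetMethod
            .CreateDelegate(typeof(Action<TTarget, TProperty>));
}
```

Events would follow the same pattern through `EventInfo.AddMethod` and `EventInfo.RemoveMethod`.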
This is an area where I play a lot. I've been down this road many times, and have switched "engines" many times - it is very time consuming to do so. For me, frankly the "real" answer here is to get better compile-time codegen tooling - so that our libraries hook into the build chain painlessly and emit appropriate code then, without consumers needing to jump through magic hoops and arcane incantations.
In the absence of that... well, I can kinda see some benefit for greenfield scenarios, but except for the full and proper compile-time emit, personally I wouldn't feel overly compelled to try to change engine another time on an existing library.
If this is a suggestion for a new MS / corefx API: frankly I'd much rather that time was spent giving us compile-time codegen. Same target scenario, better (IMO) result.
Just my tuppence.
Additionally: unless I'm mistaken, everything here is already possible via "expression trees" - which IIRC spoof `Compile()` via interpreted (reflection) trees when full compilation isn't available. What would be the scenario where this new API would be a compelling improvement over expression trees?
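For comparison, this is roughly what the expression-tree route looks like for a property getter. The sketch explicitly requests the interpreter through the existing `Compile(bool preferInterpretation)` overload, which approximates what happens on a runtime without compilation support; `InterpretedGetterSketch` is a hypothetical name, and nothing here measures or implies relative performance.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper, not part of the proposal.
public static class InterpretedGetterSketch
{
    public static Func<TTarget, TProperty> Create<TTarget, TProperty>(PropertyInfo property)
    {
        // Builds the tree for: target => target.Property
        ParameterExpression target = Expression.Parameter(typeof(TTarget), "target");
        Expression<Func<TTarget, TProperty>> lambda =
            Expression.Lambda<Func<TTarget, TProperty>>(
                Expression.Property(target, property), target);

        // Asks for the interpreter instead of emitting IL where the runtime allows the choice.
        return lambda.Compile(preferInterpretation: true);
    }
}
```

The same tree compiles to IL when `preferInterpretation` is `false` and the runtime supports it, which is the single-API behavior the comment above is pointing at.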
additional additional:
The API exposed is too basic and simplistic. It isn't sufficient to just provide delegates that implement property accessors. That's enough for casual usage, but so is regular unoptimized reflection :) Given your stated audience, the typical scenario is much closer to "emit a complex single method body that accesses 12 properties on 4 inputs, performs a series of complex operations on all those things (including several custom loops), then does 3 further operations with the results - and includes exception handling". The API proposed above doesn't even begin to touch on that.
And then comes the killer word: `async`. Yeah, I want to be able to emit an `async` method that uses `await`s. Oh, and it must support value-task-T as well as task-T. And support the kind of insane optimizations that library authors love, like checking task statuses to avoid an `await` (and the resulting state machine) when possible. Granted, these things are already incredibly hard to do via (say) `ILGenerator` - but are absurdly easy to do if we just emit code (whether raw C# or a Roslyn tree). The compiler is really, really good at doing this stuff. I say: let's build on the excellent compiler to solve these problems.
edit: oh, and support `ref` returns and ref-like types and ref-locals. No, I'm serious here. I've been in the planning stage of a new "core" for protobuf-net that uses compile-time emit, fully async, and plugs into the "channels" or "pipelines" or "streams v2" (or whatever it is called this week) API, and this is the kind of stuff that is involved.
I've built three serializers: ZeroFormatter (original format), MessagePack for C# (binary), and Utf8Json (JSON).
My company creates mobile games for iOS/Android with Unity, so we have to support both .NET (Core) and Unity (AOT/IL2CPP).
Therefore, all of the serializers support both runtime codegen and pre-compiled codegen.
For serializer optimization, the proposed API is not sufficient.
I've shown the decompiled code here:
https://github.com/neuecc/Utf8Json#performance-of-serialize
For example, in serialization:
// proposed design
// cost of the outer accessor loop and of each delegate call; boxing cannot be avoided.
foreach (var getterAccessor in accessors)
{
    writer.Write(getterAccessor.Invoke(value));
}
// Current Utf8Json design, call member directly.
writer.WriteInt32(value.foo);
writer.WriteString(value.bar);
In my area (games), performance deterioration is not acceptable.
Therefore, I do code generation instead of falling back to reflection.
I currently analyze the `.csproj` (or `.cs`) with Roslyn and generate the source code.
I have to maintain two code bases (runtime codegen with TypeInfo, pre-compiled codegen with Roslyn), which is a burden.
I'd be very happy if a single codebase could do both.
By the way, in my case, runtime codegen performs better than pre-compiled codegen,
because x32/x64 and endianness can be determined at runtime, so I generate code optimized for them, which pre-compiled codegen cannot do.
For example, at deserialization, an endian-dependent ulong is embedded inline:
https://github.com/neuecc/Utf8Json#performance-of-deserialize
@migueldeicaza The idea is that for codegen-disallowed scenarios we'd run down a different code path that's still significantly faster than what's otherwise available using standard Reflection APIs. Consider creating an object using a simple parameterless ctor. In a codegen-disallowed world, we could implement this via a `calli` into the allocator followed by a `calli` into the constructor. This is basically what the `newobj` instruction gets JITted into anyway, so it would have similar performance to a `newobj`, but with the slight additional overhead of an indirection or two.
@neuecc "but pre-compiled can not." well, there's only two to choose from... it probably wouldn't hurt badly to emit both; just a consideration from someone who feels the same pain points
@neuecc Thanks for the insight into your scenario! I want to point out that one assumption you had is incorrect; these APIs do not require values to be boxed if you really want to avoid that. There are overloads that take and return non-object. You'd still incur the cost of the delegate indirection once per member instead of once per type, however. This shouldn't show up as too bad a thing in profiler runs considering things like String-to-UTF8 conversion are far heavier than a simple virtual dispatch.
@GrabYourPitchforks yes, but my sample assumes `accessors` is `Func<object, object>[]`.
Without IL emit, it is difficult to deal with a different type for each field.
Ah, but `Action<Writer, TTarget>[]` would solve it.
foreach (var writeAction in writeActions)
{
    // writeAction, created via an expression tree (? how to create it?), uses a Func<TTarget, TField> accessor
    writeAction(writer, value);
}
@neuecc
// TContainer = type that contains the properties / fields to serialize
static class SerializationFactories<TContainer> {
    public static Action<Writer, TContainer> CreateSerializerForField(FieldInfo fieldInfo) {
        if (fieldInfo.FieldType == typeof(int)) {
            return CreateIntSerializer(fieldInfo);
        } else if (fieldInfo.FieldType == typeof(string)) {
            // ...
        } else {
            return CreateObjectSerializer(fieldInfo);
        }
    }

    private static Action<Writer, TContainer> CreateIntSerializer(FieldInfo fieldInfo) {
        Utf8String fieldNameAsUtf8 = ...;
        Func<TContainer, int> getter = ReflectionServices.CreateFieldGetter<TContainer, int>(fieldInfo);
        return (writer, @this) => writer.WriteInt(fieldNameAsUtf8, getter(@this));
    }

    private static Action<Writer, TContainer> CreateLongSerializer(FieldInfo fieldInfo) { /* ... */ }
    private static Action<Writer, TContainer> CreateStringSerializer(FieldInfo fieldInfo) { /* ... */ }

    private static Action<Writer, TContainer> CreateObjectSerializer(FieldInfo fieldInfo) {
        // CreateObjectSerializerCore is private and static, so the lookup needs explicit binding flags.
        // The one-time reflection call closes the generic method over the field's runtime type.
        return (Action<Writer, TContainer>)typeof(SerializationFactories<TContainer>)
            .GetMethod("CreateObjectSerializerCore", BindingFlags.NonPublic | BindingFlags.Static)
            .MakeGenericMethod(fieldInfo.FieldType)
            .Invoke(null, new object[] { fieldInfo });
    }

    private static Action<Writer, TContainer> CreateObjectSerializerCore<TField>(FieldInfo fieldInfo) {
        Utf8String fieldNameAsUtf8 = ...;
        Func<TContainer, TField> getter = ReflectionServices.CreateFieldGetter<TContainer, TField>(fieldInfo);
        return (writer, @this) => writer.WriteObject<TField>(fieldNameAsUtf8, getter(@this));
    }
}
@GrabYourPitchforks This only works if the set of types you care about is finite and fixed.
@jkotas
"This only works if the set of types you care about is finite and fixed."
Which is the case in the majority of the scenarios addressed by this proposal. Remember: this isn't trying to replace Reflection. (If you're trying to improve Reflection all-up, just make changes directly to the Reflection APIs and ignore this proposal.) The goal of this proposal is to make certain scenarios (namely, serialization and a small handful of others) easier for library authors to write in a high-performance manner that works both in codegen-allowed and in codegen-disallowed environments.
For serialization, there is only a fixed, finite set of primitive types supported by any given protocol. Consider integers, strings, possibly binary and `Guid` and `DateTimeOffset`. Serializers generally treat all other data as complex types which are constructed from one or more underlying primitives, so it's really just a matter of recursion at that point.
If you're trying to improve Reflection all-up, just make changes directly to the Reflection APIs and ignore this proposal
I see this proposal as a start of discussion how to improve Reflection all-up - we should attempt to cover as many holes in the existing reflection APIs as possible.
@ufcpp `RefFunc` is an interesting proposal and will probably have great utility going forward. :) Your scenario can be accomplished today via `Delegate.CreateDelegate`, which also means it can be accomplished via the proposed `ReflectionServices.CreateMethodInvoker`. I'm curious whether there's large demand for adding a convenience overload of `CreatePropertyGetter` for this.
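A sketch of the `Delegate.CreateDelegate` route for the `ref`-returning property from earlier in the thread follows. It relies on the statement above that this binding works today; `RefGetterSketch` is a hypothetical name, the `class` constraint is an assumption, and the sample `Ref` class is given an initialized array (unlike the original snippet) so that it can run.

```csharp
using System;
using System.Reflection;

// Delegate shape suggested earlier in the thread.
public delegate ref TResult RefFunc<TArg, TResult>(TArg arg);

// Sample type from the thread, with the array initialized for illustration.
public sealed class Ref
{
    private readonly int[] _items = { 1, 2, 3 };
    public ref int First => ref _items[0];
}

// Hypothetical helper, not part of the proposal.
public static class RefGetterSketch
{
    // Binds an open instance delegate to the get accessor of a ref-returning property.
    public static RefFunc<TTarget, TProperty> Create<TTarget, TProperty>(PropertyInfo property)
        where TTarget : class =>
        (RefFunc<TTarget, TProperty>)Delegate.CreateDelegate(
            typeof(RefFunc<TTarget, TProperty>), property.GetMethod);
}
```

Because the delegate returns a reference, a caller can both read and write through it, e.g. `first(instance) = 10;` after `var first = RefGetterSketch.Create<Ref, int>(typeof(Ref).GetProperty("First"));`.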
"The idea is that for codegen-disallowed scenarios we'd run down a different code path that's still significantly faster than what's otherwise available using standard Reflection APIs."
Why not just implement this improved code path in the existing "compile unavailable" expression tree path? This gives you an established rich API that already covers everything cited in the example API, and would improve the performance of a wide range of existing code including expression trees emitted directly from the compiler via IQueryable-T. "Expression trees are now much faster even on runtimes that don't allow compilation" would be a great release note - much better than "a very few niche folks might make use of a new and barely tested API".
If we had to choose between this and better support for compile time codegen (see Marc's linked thread), I would choose better compile time codegen support.
As pointed out in other comments, this doesn't add anything which we don't already do today - unless I'm mistaken?
Expression trees have a couple of issues when used for things like serializers that I think are worth improving.
My hope is that this API will provide performance reasonably close to compiled Expressions that's also consistently pretty good across runtimes.
@GrabYourPitchforks I accidentally implemented the second part of the proposal (about properties) while working on dotnet/corefx#36506.
Re a high-performance API to use instead of `System.Reflection.MethodBase.Invoke(object obj, object[] parameters)` or `System.Delegate.DynamicInvoke(params object[] args)`, how about the following?
public interface IDynamicInvocationParameters
{
    void SetRParameter<T>(int parameterOrdinal, T newValue) where T : class;
    void SetSParameter<T>(int parameterOrdinal, T newValue) where T : struct;
    T GetRParameter<T>(int parameterOrdinal) where T : class;
    T GetSParameter<T>(int parameterOrdinal) where T : struct;
    int Count { get; }
}

class MethodBase // == System.Reflection.MethodBase
{
    public void DynamicInvoke(object instance, IDynamicInvocationParameters parameters);
}
The above-proposed method `System.Reflection.MethodBase.DynamicInvoke(object, IDynamicInvocationParameters)` would be implemented in CoreFX and written to accept any implementation of `IDynamicInvocationParameters`. A class that implements `IDynamicInvocationParameters` would _not_ be provided by CoreFX. In your own app, you'd write your own class that implements `IDynamicInvocationParameters` in whatever way best suits whatever kind of dynamic scenario you're trying to create in your app. It's also your own responsibility to cache and reuse your instances of your class that implements `IDynamicInvocationParameters`.
In the case of normal input parameters, you must use `IDynamicInvocationParameters.SetRParameter` or `SetSParameter` before `MethodBase.DynamicInvoke`. However, in the case of a C# `out` parameter, it is of course the other way around: `MethodBase.DynamicInvoke` calls `IDynamicInvocationParameters.SetRParameter`/`SetSParameter`, and afterwards your app calls `IDynamicInvocationParameters.GetRParameter`/`GetSParameter` to retrieve the value of the output parameter.
In the case of a C# `ref` parameter (meaning both input and output), the sequence is:
1. Your app calls `IDynamicInvocationParameters.SetRParameter` or `SetSParameter` to provide the input value for each parameter that is either normal input or `ref` (input+output) but not `out`.
2. Your app calls `System.Reflection.MethodBase.DynamicInvoke(object, IDynamicInvocationParameters)`.
3. `MethodBase.DynamicInvoke` executes `IDynamicInvocationParameters.SetRParameter` or `SetSParameter` for each `ref` and `out` parameter.
4. Your app calls `IDynamicInvocationParameters.GetRParameter` or `GetSParameter` for each `ref` and `out` parameter.

If the invoked method has a return value, then it would also be transferred via `IDynamicInvocationParameters`, but I'll leave this topic for another message. Multiple return values should be supported because the latest version of C# does indeed support multiple return values with the help of `System.ValueTuple<>`. This could be supported by treating each return value as if it is an `out` parameter, and assigning each return value an ordinal that can be used with `IDynamicInvocationParameters.SetRParameter` etc.
I would like to offer a proof-of-concept implementation of this proposal, which is described here in detail; the code is here. The implementation allows any member to be reflected as a delegate instance of the following types:
- `Func`, `Action`, etc.
- Delegate types whose signatures include `out` or `ref` parameters.

Quick example:
internal delegate MemoryStream MemoryStreamConstructor(byte[] buffer, bool writable);
//using custom delegate
MemoryStreamConstructor ctor = typeof(MemoryStream).GetConstructor(new[] { typeof(byte[]), typeof(bool) }).Unreflect<MemoryStreamConstructor>();
//or
ctor = Type<MemoryStream>.Constructor.Get<MemoryStreamConstructor>();
//using delegate from .NET
Func<byte[], bool, MemoryStream> ctor = typeof(MemoryStream).GetConstructor(new[] { typeof(byte[]), typeof(bool) }).Unreflect<Func<byte[], bool, MemoryStream>>();
//or
ctor = Type<MemoryStream>.Constructor<byte[], bool>.Get();
//using Function
Function<(byte[] buffer, bool writable), MemoryStream> ctor = typeof(MemoryStream).GetConstructor(new[] { typeof(byte[]), typeof(bool) }).Unreflect<Function<(byte[], bool), MemoryStream>>();
//or
ctor = Type<MemoryStream>.GetConstructor<(byte[] buffer, bool writable)>();
Benchmarks are here. The big advantage of using the `Function` or `Procedure` delegate is that the parameters in the signature can be partially typed. As a result, you can reflect a method into a delegate even if the parameter types are not known at compile time. For instance:
Function<(object text, object result), bool> tryParse = typeof(decimal).GetMethod(nameof(decimal.TryParse), new[] { typeof(string), typeof(decimal).MakeByRefType() }).Unreflect<Function<(object, object), bool>>();
Moving to Future - not high enough priority at this time for the 5.0 schedule.