Currently there is no way to pass around a heterogeneous set of .NET value types without boxing them into objects or creating a custom wrapper struct. To facilitate low allocation exchange of value types we should provide a struct that allows passing the information without heap allocations. The canonical example of where this would be useful is in String.Format.
Related proposals and sample PRs
Goals
Non Goals
Nice to Have
General Approach
Variant is a struct that contains an object pointer and a "union" struct that allows stashing of arbitrary blittable value types (i.e. those satisfying the `unmanaged` constraint) that are within a specific size constraint.
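To make the layout idea concrete, here is a minimal sketch of an object reference sitting next to an explicit-layout "union". All names (`SketchType`, `SketchUnion`, `VariantSketch`, `LayoutDemo`) are hypothetical and only three payload kinds are shown; it makes no attempt to hit the size target discussed later and is not the proposed implementation.

```C#
#nullable enable
using System;
using System.Runtime.InteropServices;

// Hypothetical names throughout; this is not the proposed surface area.
public enum SketchType { Object, Int32, Guid }

// The "union": blittable values share the same 16 bytes of inline storage.
[StructLayout(LayoutKind.Explicit, Size = 16)]
public struct SketchUnion
{
    [FieldOffset(0)] public int Int32;
    [FieldOffset(0)] public Guid Guid;
}

public readonly struct VariantSketch
{
    private readonly SketchUnion _union; // inline value storage
    private readonly object? _object;    // only meaningful when Type == Object
    public readonly SketchType Type;

    public VariantSketch(int value)
    {
        var union = default(SketchUnion);
        union.Int32 = value;
        _union = union; _object = null; Type = SketchType.Int32;
    }

    public VariantSketch(Guid value)
    {
        var union = default(SketchUnion);
        union.Guid = value;
        _union = union; _object = null; Type = SketchType.Guid;
    }

    public VariantSketch(object? value)
    {
        _union = default; _object = value; Type = SketchType.Object;
    }

    public bool TryGetValue(out int value)
    {
        bool match = Type == SketchType.Int32;
        value = match ? _union.Int32 : default;
        return match;
    }

    public bool TryGetValue(out Guid value)
    {
        bool match = Type == SketchType.Guid;
        value = match ? _union.Guid : default;
        return match;
    }

    public bool TryGetValue(out object? value)
    {
        bool match = Type == SketchType.Object;
        value = match ? _object : null;
        return match;
    }
}

public static class LayoutDemo
{
    // Reads a value back without boxing the int or Guid payloads.
    public static string Describe(in VariantSketch item)
    {
        if (item.TryGetValue(out int i)) return "int " + i;
        if (item.TryGetValue(out Guid g)) return "guid " + g;
        item.TryGetValue(out object? o);
        return "object " + o;
    }

    public static void Main()
    {
        VariantSketch[] data = { new VariantSketch(42), new VariantSketch(Guid.Empty), new VariantSketch("Wow") };
        foreach (VariantSketch item in data) Console.WriteLine(Describe(item));
    }
}
```

The explicit constructors mirror the reasoning given in the surface area: only types whose bit patterns are always valid are allowed into the union.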
Sample Usage
``` C#
// Consuming method
public void Foo(ReadOnlySpan<Variant> data)
{
    foreach (Variant item in data)
    {
        switch (item.Type)
        {
            case VariantType.Int32:
                // ...
                break;
        }
    }
}

// Calling method
public void Bar()
{
    var data = Variant.Create(42, true, "Wow");
    Foo(data.ToSpan());

    // Only needed if running on .NET Framework
    data.KeepAlive();
}
```
Surface Area
``` C#
namespace System
{
    ///
    ///
    ///
    public readonly struct Variant
    {
        public readonly VariantType Type;

        /// <summary>
        /// Get the value as an object if the value is stored as an object.
        /// </summary>
        /// <param name="value">The value, if an object, or null.</param>
        /// <returns>True if the value is actually an object.</returns>
        public bool TryGetValue(out object value);

        /// <summary>
        /// Get the value as the requested type <typeparamref name="T"/> if actually stored as that type.
        /// </summary>
        /// <param name="value">The value if stored as (T), or default.</param>
        /// <returns>True if the <see cref="Variant"/> is of the requested type.</returns>
        public unsafe bool TryGetValue<T>(out T value) where T : unmanaged;

        // We have explicit constructors for each of the supported types for performance
        // and to restrict Variant to "safe" types. Allowing any struct that would fit
        // into the Union would expose users to issues where bad struct state could cause
        // hard failures like buffer overruns etc.
        public Variant(bool value);
        public Variant(byte value);
        public Variant(sbyte value);
        public Variant(short value);
        public Variant(ushort value);
        public Variant(int value);
        public Variant(uint value);
        public Variant(long value);
        public Variant(ulong value);
        public Variant(float value);
        public Variant(double value);
        public Variant(decimal value);
        public Variant(DateTime value);
        public Variant(DateTimeOffset value);
        public Variant(Guid value);
        public Variant(object value);

        /// <summary>
        /// Get the value as an object, boxing if necessary.
        /// </summary>
        public object Box();

        // Idea is that you can cast to whatever supported type you want if you're explicit.
        // Worst case is you get default or nonsense values.
        public static explicit operator bool(in Variant variant);
        public static explicit operator byte(in Variant variant);
        public static explicit operator char(in Variant variant);
        public static explicit operator DateTime(in Variant variant);
        public static explicit operator DateTimeOffset(in Variant variant);
        public static explicit operator decimal(in Variant variant);
        public static explicit operator double(in Variant variant);
        public static explicit operator Guid(in Variant variant);
        public static explicit operator short(in Variant variant);
        public static explicit operator int(in Variant variant);
        public static explicit operator long(in Variant variant);
        public static explicit operator sbyte(in Variant variant);
        public static explicit operator float(in Variant variant);
        public static explicit operator TimeSpan(in Variant variant);
        public static explicit operator ushort(in Variant variant);
        public static explicit operator uint(in Variant variant);
        public static explicit operator ulong(in Variant variant);

        public static implicit operator Variant(bool value);
        public static implicit operator Variant(byte value);
        public static implicit operator Variant(char value);
        public static implicit operator Variant(DateTime value);
        public static implicit operator Variant(DateTimeOffset value);
        public static implicit operator Variant(decimal value);
        public static implicit operator Variant(double value);
        public static implicit operator Variant(Guid value);
        public static implicit operator Variant(short value);
        public static implicit operator Variant(int value);
        public static implicit operator Variant(long value);
        public static implicit operator Variant(sbyte value);
        public static implicit operator Variant(float value);
        public static implicit operator Variant(TimeSpan value);
        public static implicit operator Variant(ushort value);
        public static implicit operator Variant(uint value);
        public static implicit operator Variant(ulong value);

        // Common object types
        public static implicit operator Variant(string value);

        public static Variant Create(in Variant variant) => variant;
        public static Variant2 Create(in Variant first, in Variant second) => new Variant2(in first, in second);
        public static Variant3 Create(in Variant first, in Variant second, in Variant third) => new Variant3(in first, in second, in third);
    }

    // Here we could use values where we leverage bit flags to categorize quickly (such as integer values, floating point, etc.)
    public enum VariantType
    {
        Object,
        Byte,
        SByte,
        Char,
        Boolean,
        Int16,
        UInt16,
        Int32,
        UInt32,
        Int64,
        UInt64,
        DateTime,
        DateTimeOffset,
        TimeSpan,
        Single,
        Double,
        Decimal,
        Guid
    }

    // This is an "advanced" pattern we can use to create stack based spans of Variant. Would also create at least a Variant3.
    public readonly struct Variant2
    {
        public readonly Variant First;
        public readonly Variant Second;

        public Variant2(in Variant first, in Variant second);

        // This is for keeping objects rooted on .NET Framework once turned into a Span (similar to GC.KeepAlive(), but avoiding boxing).
        [MethodImpl(MethodImplOptions.NoInlining)]
        public void KeepAlive();

        public ReadOnlySpan<Variant> ToSpan();
    }
}
```
FAQ
Why "Variant"?
Why isn't Variant a ref struct?
Span of ref structs.
What about variadic argument support (__arglist, ArgIterator, etc.)?
What about TypedReference and __makeref, etc.?
TypedReference is a ref struct (see above). Variant gives us more implementation flexibility, doesn't rely on undocumented keywords, and is actually faster. (In a simple test of wrapping/unwrapping an int, it is roughly 10-12% faster depending on inlining.)
Why not support anything that fits?
How about enums?
cc: @jaredpar, @vancem, @danmosemsft, @jkotas, @davidwrighton, @stephentoub
Would this be a 16 byte (Guid/Decimal) + enum sized struct? (24 bytes with padding on x64)
TypedReference is a ref struct (see above). Variant gives us more implementation flexibility, doesn't rely on undocumented keywords, and is actually faster.
These all can be fixed, without too much work. TypedReference has been neglected, but that does not mean it is a useless type. (Some of this is described in https://github.com/dotnet/corefx/issues/29736.)
I think fixing TypedReference would be a better choice than introducing a new Variant type, if everything else is equal.
Allow all types by falling back to boxing
I think the design should allow all types without falling back to boxing.
Work on .NET Framework
This should be a non-goal. It is fine if the winning design that we pick happens to work on .NET Framework, but trying to make it work on .NET Framework should be an explicit non-goal. We have made a conscious decision not to restrict our design choices to what works on .NET Framework.
Would this be a 16 byte (Guid/Decimal) + enum sized struct? (24 bytes with padding on x64)
Goal is 24 bytes. We've looked at a lot of different ways of packing that in. A pointer and 16 bytes of data. It might involve some contortions or dropping down to 12 bytes of data.
but that does not mean it is a useless type.
Not trying to imply it is useless, just not appropriate in this case. I'm not sure how you'd make it a non-ref struct or make it as fast as something targeted at key types.
This should be a non-goal.
Fair enough, I've changed it to nice-to-have. There are, however, real business needs for mitigating formatting inefficiencies on .NET Framework.
I think the design should allow all types without falling back to boxing.
I think we should have some design that does this but I don't think we can provide a solution that solves everything for all scenarios well. Having multiple approaches doesn't seem like a terrible thing to me, particularly given that we could make this sort of solution available much much sooner than full varargs support.
FWIW, this approach feels very limited to me, in that I see supporting every value type as a key scenario. I would rather see, for example, a simple unsafe annotation/attribute that would let the API tell the JIT that it promises wholeheartedly an argument won't escape, and then add an overload that takes a [UnsafeWontEscape] params ReadOnlySpan<object> args, where the JIT would stack-allocate the boxes for any value types provided. Just an example.
To be super clear, I don't see this as a solves-all-boxing solution. I absolutely think we can benefit from broader approaches, but I have a concern about being efficient with core types. Being able to quickly tell that you have an int and extract it is super valuable I think. Certainly for the String.Format case, for example. :)
Being able to quickly tell that you have an int and extract it is super valuable I think.
Depends on how the actual formatting is implemented. If you can dispatch a virtual formatting method, ability to switch over a primitive type does not seem super valuable.
[UnsafeWontEscape] params ReadOnlySpan<object> args
Something like this would work too. It is pretty similar to ReadOnlySpan<TypedReference> on the surface, with different tradeoffs and low-level building blocks.
It is pretty similar to ReadOnlySpan<TypedReference> on the surface, with different tradeoffs and low-level building blocks.
I'd be fine with that as well if it was similarly seamless to a caller.
Rather than an attribute and a promise I'd like to leverage the type system if possible here. 😄
What if instead we added a JIT intrinsic that "boxes" value types into a ref struct named Boxed. This type would have just enough information to allow manipulation of the boxed value:
ref struct Boxed {
Type GetBoxedType();
T GetBoxedValue<T>();
}
The JIT could choose to make this a heap or stack allocation depending on the scenario. The important part is that it would move the boxing operation into a type whose lifetime we need to carefully monitor. The compiler will do it for us.
That doesn't completely solve the problem because you can't have ReadOnlySpan<Boxed> as a ref struct can't be a generic argument. That's not because of a fundamental limitation of the type system but more because we didn't have a motivating scenario. Suppose this scenario was enough and we went through the work in C# to allow it. Then we could have the signature of the method be params ReadOnlySpan<Boxed>. No promises needed here, the compiler will be happy to make developers do the right thing 😉
That also sounds reasonable.
(Though the [UnsafeWontEscape] approach could also work on the existing APIs: we just attribute the existing object arguments in the existing methods, and apps just get better.)
How would struct Boxed differ from existing TypedReference (with extra methods added to make it useful)?
Either way, it sounds reasonable too.
I do think that if our goal is just to solve the parameter passing problem, something based on references (which can work uniformly on all types) is worth thinking about (this is Jan's TypedReference approach).
However, that leaves out the ability to have something that can represent anything (and all primitives efficiently, without extra allocation) and that you can put into objects, which is what Variant is.
I think the fact that we don't have a standard 'Variant' type in the framework is rather unfortunate. Ultimately it is an 'obvious' type to have in the system (even if ultimately you solve the parameter passing issue with some magic stack allocated array of types references).
I also am concerned that we are solving a 'simple' problem (passing parameters) with a more complex one (tricky reference-based classes whose safety is at best subtle).
I think we should have a Variant class; it is straightforward, and does solve some immediate problems without having to design a rather advanced feature (that probably would not make V3.0).
For what it is worth...
we don't have a standard 'Variant' type in the framework is rather unfortunate.
I agree with that and the Variant proposal would look reasonable to me if the Variant was optimized for primitive types only. The proposal makes it optimized for primitive types and set of value types that we think are important for logging today. It does not feel like a design that will survive over time. I suspect that there will be need to optimize more types, but it won't be possible to extend the design to fit them.
Note that generally speaking, a Variant is a chunk of memory that holds things in-line and a pointer that allows you to hold 'anything'.
Semantically it is always the case that a variant can hold 'anything', so that is nice in that there is not a 'semantic' cliff, only a performance cliff (thus as long as the new types that we might want to add in the future are not perf critical, things are OK). I note that the list of types that really are perf-critical is pretty small and likely to not change over time (int, string; second tier are long, and maybe DateTime(Offset)). So I don't think we are taking a huge risk there.
And there are things you can do 'after the fact'. Let's assume we only allotted 16 bytes for in-line data but we wanted something bigger. If there is any 'skew' to the values (there would be for most types, but not for randomly generated IDs), you could at least store the 'likely' values inline and box the rest. It would probably be OK, and frankly it really is probably the right tradeoff (it would be surprising to me if a new type in the future so dominated the perf landscape over existing types that it was the right call to make the struct bigger to allow it to be stored inline). That has NEVER happened so far.
Indeed, from a cost-benefit point of view, we really should be skewing things to the int and string case because these are so much more likely to dominate hot paths. We certainly don't want this to be bigger than 3 pointers, and it would be nice to get it down to 2 (but that does require heroics for any 8-byte-sized things (long, double, DateTime ...)), so I think we are probably doing 3.
But it does feel like a 'stable' design (5 years from now we would not feel like we made a mistake). Sure, bigger types will be slow, but I don't think we would want to make the type bigger even if we could. It would be the wrong tradeoff.
So, I think Variant does have a reasonably stable design point that can stand the test of time.
From my point of view, I would prefer that the implementation be tuned for the overwhelmingly likely case (int and string). My ideal implementation would be 8 bytes of inline data / discriminator, and 1 object pointer. This is a pro
One of the main use cases this is being proposed for is around string interpolation and string formatting.
I realize there are other uses cases, so not necessarily instead of a something Variant-like, but specifically to address the case of string interpolation, I had another thought on an approach….
Today, you can define a method like:
```C#
AppendFormat(FormattableString s);
```
and use that as the target of string interpolation, e.g.
```C#
AppendFormat($"My type is {GetType()}. My value is {_value:x}.");
```
Imagine we had a pattern (or an interface, though that adds challenge for ref structs) the compiler could recognize where a type could expose a method of the form:
```C#
AppendFormat(object value, ReadOnlySpan<char> format);
```
The type could expose additional overloads as well, and the compiler would use normal overload resolution when determining which method to call, but the above would be sufficient to allow string interpolation to be used with the type in the new way. We could add this method to StringBuilder, for example, along with additional overloads for efficiency, e.g.
```C#
public class StringBuilder
{
    public void AppendFormat(object value, ReadOnlySpan<char> format);
    public void AppendFormat(int value, ReadOnlySpan<char> format);
    public void AppendFormat(long value, ReadOnlySpan<char> format);
    public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format);
    // ... etc.
}
```
We could also define new types (as could anyone), as long as they implemented this pattern, e.g.
```C#
public ref struct ValueStringBuilder
{
    public ValueStringBuilder(Span<char> initialBuffer);

    public void AppendFormat(FormattableString s);
    public void AppendFormat(object value, ReadOnlySpan<char> format);
    public void AppendFormat(int value, ReadOnlySpan<char> format);
    public void AppendFormat(long value, ReadOnlySpan<char> format);
    public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format);
    // ... etc.

    public Span<char> Value { get; }
}
```
Now, when you call:
```C#
ValueStringBuilder vsb = ...;
vsb.AppendFormat($"My type is {GetType()}. My value is {_value:x}.");
```
rather than generating what it would generate today if this took a FormattableString:
```C#
vsb.AppendFormat(FormattableStringFactory.Create("My type is {0}. My value is {1:x}.", new object[] { GetType(), (object)_value }));
```
or if it took a string:
```C#
vsb.AppendFormat(string.Format("My type is {0}. My value is {1:x}.", GetType(), (object)_value));
```
it would instead generate:
```C#
vsb.AppendFormat("My type is ", default);
vsb.AppendFormat(GetType(), default);
vsb.AppendFormat(". My value is ", default);
vsb.AppendFormat(_value, "x");
vsb.AppendFormat(".", default);
```
There are more calls here, but most of the parsing is done at compile time rather than at run time, and a type can expose overloads to allow any type T to avoid boxing, including one that takes a generic T if so desired.
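The lowering described above can be tried by hand today. The following is a rough sketch, assuming a hypothetical `AppendTarget` class that follows the pattern; the `string` overload is an addition of this sketch so that string literals bind unambiguously, and `LoweringDemo.Lowered` is simply the hand-written equivalent of what the compiler would emit under this idea.

```C#
#nullable enable
using System;
using System.Text;

// Hypothetical type following the proposed AppendFormat pattern.
public sealed class AppendTarget
{
    private readonly StringBuilder _sb = new StringBuilder();

    // Fallback overload: value types passed here are boxed.
    public void AppendFormat(object? value, ReadOnlySpan<char> format)
    {
        if (value is IFormattable formattable)
            _sb.Append(formattable.ToString(format.IsEmpty ? null : format.ToString(), null));
        else
            _sb.Append(value);
    }

    // Specialized overload: no boxing for int.
    public void AppendFormat(int value, ReadOnlySpan<char> format)
        => _sb.Append(value.ToString(format.IsEmpty ? null : format.ToString()));

    // Assumption of this sketch: a string overload so literals pick an exact match.
    public void AppendFormat(string value, ReadOnlySpan<char> format) => _sb.Append(value);

    public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format) => _sb.Append(value);

    public override string ToString() => _sb.ToString();
}

public static class LoweringDemo
{
    // Hand-lowered form of $"My type is {type}. My value is {value:x}."
    public static string Lowered(Type type, int value)
    {
        var target = new AppendTarget();
        target.AppendFormat("My type is ", default);
        target.AppendFormat(type, default);          // binds to the object overload
        target.AppendFormat(". My value is ", default);
        target.AppendFormat(value, "x");             // binds to the int overload, no boxing
        target.AppendFormat(".", default);
        return target.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Lowered(typeof(int), 255)); // My type is System.Int32. My value is ff.
    }
}
```

Because each hole is bound with ordinary overload resolution, adding a new overload later changes which method is picked without touching call sites.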
If you throw out Guid and Decimal (as they are 16 bytes); then you could use the object pointer as the discriminator; rather than enum.
e.g.
public readonly struct Variant : IFormattable
{
private readonly IntPtr _data;
private readonly object _typeOrData;
public unsafe bool TryGetValue<T>(out T value) where T : IFormattable
{
if (typeof(T) == typeof(int))
{
if ((object)typeof(T) == _typeOrData)
{
value = Unsafe.As<IntPtr, int>(in _data);
return true;
}
value = default;
return false;
}
// etc.
}
public override string ToString()
{
return ToString(null, null);
}
public string ToString(string format, IFormatProvider formatProvider)
{
if ((object)typeof(int) == _typeOrData)
{
return Unsafe.As<IntPtr, int>(in _data).ToString(format, formatProvider);
}
// etc.
}
}
And box others to _typeOrData, not ideal though
@benaadams - Generally I like the kind of approach you are suggesting.
In my ideal world, Variant would be an object reference and 8 bytes of buffer. It should be super-fast on int and string, and non-allocating on data types 8 bytes or smaller (by using the object as a discriminator for 8 byte types). For data types larger than 8 bytes, either box, or encode the common values into 8 bytes or less and box the uncommon values.
This has the effect of skewing the perf toward the overwhelmingly common cases of int and string (and they don't pay too much extra bloat for the rarer cases).
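As a rough illustration of the "object reference plus 8 bytes" shape, the sketch below uses private sentinel objects as the discriminator instead of an enum. The names (`CompactVariant`, `CompactDemo`) are hypothetical, only `int` and `double` are stored inline, and anything else (including 16-byte types such as Guid) falls back to the object slot.

```C#
#nullable enable
using System;

// Hypothetical two-field variant: 8 bytes of inline data plus one object reference.
public readonly struct CompactVariant
{
    // Sentinels mark which inline type is stored; they are never handed out.
    private static readonly object s_int32 = new object();
    private static readonly object s_double = new object();

    private readonly long _bits;                // inline storage for values of 8 bytes or less
    private readonly object? _sentinelOrObject; // a sentinel, or the stored (possibly boxed) object

    public CompactVariant(int value) { _bits = value; _sentinelOrObject = s_int32; }

    public CompactVariant(double value)
    {
        _bits = BitConverter.DoubleToInt64Bits(value);
        _sentinelOrObject = s_double;
    }

    // Everything else, including larger value types, lands here and is boxed by the caller.
    public CompactVariant(object? value) { _bits = 0; _sentinelOrObject = value; }

    public bool TryGetValue(out int value)
    {
        bool match = ReferenceEquals(_sentinelOrObject, s_int32);
        value = match ? (int)_bits : 0;
        return match;
    }

    public bool TryGetValue(out double value)
    {
        bool match = ReferenceEquals(_sentinelOrObject, s_double);
        value = match ? BitConverter.Int64BitsToDouble(_bits) : 0;
        return match;
    }

    // Get the value as an object, boxing inline values on demand.
    public object? Box()
    {
        if (ReferenceEquals(_sentinelOrObject, s_int32)) return (int)_bits;
        if (ReferenceEquals(_sentinelOrObject, s_double)) return BitConverter.Int64BitsToDouble(_bits);
        return _sentinelOrObject;
    }
}

public static class CompactDemo
{
    public static void Main()
    {
        var items = new[] { new CompactVariant(42), new CompactVariant(1.5), new CompactVariant("Wow"), new CompactVariant(Guid.Empty) };
        foreach (CompactVariant item in items)
        {
            if (item.TryGetValue(out int i)) Console.WriteLine("int " + i);
            else if (item.TryGetValue(out double d)) Console.WriteLine("double " + d);
            else Console.WriteLine("object " + item.Box());
        }
    }
}
```

Private sentinels are used here rather than `typeof(int)` so that a caller storing a `Type` object as an ordinary value cannot be mistaken for an inline int.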
@stephentoub Generally speaking I like the idea of moving parsing to compile time. I'll play around to see what sort of perf implications it has.
One thing I'd want to make sure we have an answer for is how we fit ValueFormattableString (or something similar) into this picture. Ideally we can add just one overload to Console.WriteLine() that will magically suck $"" away from Console.WriteLine(string). Could we leverage ValueStringBuilder for this?
``` C#
int count = 42;
Console.WriteLine($"The count is {count}.");

// And we have the following overload
void WriteLine(in ValueStringBuilder builder);

// Then C# generates:
ValueStringBuilder vsb = new ValueStringBuilder();
// ... the series of Appends() ...
WriteLine(vsb);
vsb.Dispose(); // Note that this isn't critical, it just returns any rented space to the ArrayPool
```
We could also add overloads that take `IFormatProvider, ValueStringBuilder`? Or possibly just add an optional `IFormatProvider` on `ValueStringBuilder`? Then something like this could happen:
``` C#
Console.WriteLine(myFormatProvider, $"The count is {count}.");

// Creates the following
ValueStringBuilder vsb = new ValueStringBuilder(myFormatProvider);
// ... the series of Appends() ...
WriteLine(vsb);
vsb.Dispose();
```
@benaadams, @vancem
If you throw out Guid and Decimal (as they are 16 bytes); then you could use the object pointer as the discriminator; rather than enum.
Pulling DateTimeOffset along for the ride is kind of important if we support DateTime, as it is now the preferred replacement. That pushes us over 8 bytes. The way I would squish that and Guid/Decimal into 24 bytes is to use sentinel objects for Guid/Decimal and squeeze a 4 byte enum into the "union". (Which is the same sort of thing @vancem is talking about, but with a bigger bit bucket.) Ultimately we're stuck with some multiple of 8 due to struct packing; if we dial down to 16 (the absolute smallest), it would require making 8 byte items slow and putting anything larger into a box.
It would be cool if we could borrow bits from the object pointer (much like an ATOM is used in Win32 APIs), but that obviously would require runtime support.
It is pretty common to pass around strings as Span<char> in modern high performance C#. It would be really nice if the high-performance formatting supported consuming Span<char> items.
It would be really nice if the high-performance formatting supported consuming Span<char> items.
This is one of the advantages I see to the aforementioned AppendFormat approach. In theory you just have another AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format) overload, and then you could do $"This contains a {string.AsSpan(3, 7)}" and have that "just work".
@stephentoub
This is one of the advantages I see to the aforementioned AppendFormat approach.
Indeed. In the AppendFormat approach the compiler would simply translate every value in the interpolated string to valueFormattableStringBuilder.AppendFormat(theValue) and then bind the expression exactly as it would be bound if typed out. That means you can add specialized overloads like AppendFormat(ReadOnlySpan<char>) now or years down the road and the compiler would just pick them up.
I'm going to break out a separate proposal for "interpolated string -> Append sequence" and do a bit of prototyping to examine the performance.
@JeremyKuhne, I opened https://github.com/dotnet/corefx/issues/35986.
Just to add my 2 cents here: storing heterogeneous data whose types are not known at compile time has a lot more uses than just string interpolation. Take our old friend DataTable for example; to this day it remains the only way in the BCL to hold dynamic tabular data (until and unless a modern DataFrame type is ever added). And 100% of everything that you put in a DataTable is boxed.
Having a true Variant type could bring great performance benefits in such a scenario.
I'd even say it's a far more important scenario than string interpolation. Most metrics have shown the popularity of Python exploding to make it one of the most-used languages in the last few years. And the reason is the great libraries it has for working with data. The market is clearly saying it wants better and more efficient ways of working with data and .NET should oblige.
@MgSam do you think avoiding boxing on common types is good enough? The initial proposal doesn't handle everything, but allows putting data on the heap (e.g. creating Variant[]). There are ways to create references to anything already (__makeref() and TypedReference), but:
Stashing arbitrary struct data in Variant isn't safe, so we have to restrict it to types that are known to have no ill side effects if their backing fields have random data. We're also constrained by the size of what we can stash.
Yes, I think common types likely cover 95% of the use cases. You don't often have nested objects when working with large tables of data.
I would like to propose a different approach without introducing a new type in the BCL. Somewhere in this repo I saw a proposal introducing a ValueFormattableString type, which is the value type equivalent of the FormattableString class. Let's start from it. In .NET we actually already have a stack-based representation of values of different types: the family of value tuple types. So, ValueFormattableString can be created as a generic value type:
public readonly struct ValueFormattableString<TArgs>
where TArgs : struct, ITuple
{
public ValueFormattableString(string format, TArgs args);
}
With tuples, we can avoid boxing of arbitrary value type passed for formatting. Generally, we have two situations here:
1. TArgs is a value tuple type
2. TArgs is a custom value type implementing the System.Runtime.CompilerServices.ITuple interface

The second case is rare and can be handled easily but without optimizations:
string[] formattingArgs = new string[args.Length];
for (int i = 0; i < args.Length; i++)
{
object item = args[i];
formattingArgs[i] = item is IFormattable formattable ? formattable.ToString(null, null) : item?.ToString();
}
Assume that using tuple types (the first case) is the more common way to represent the arguments for the formattable string (moreover, this way can be natively supported by the C# compiler). Now we need to solve the problem of converting an individual tuple element to a string without unnecessary allocations.
Proposal # 1:
Introduce JIT intrinsic method like this:
internal static string TupleItemToString<T>(in T tuple, int index, IFormatProvider? provider) where T : struct, ITuple;
JIT can easily replace this method with pure IL implementation for each generic argument T represented by the tuple type. The following example demonstrates transformation of this method for ValueTuple<int, object> tuple type:
internal static string TupleItemToString(in ValueTuple<int, object> tuple, int index, IFormatProvider? provider) => index switch
{
0 => tuple.Item1.ToString(provider), // because type Int32 implements IFormattable interface
1 => tuple.Item2.ToString(), // type Object doesn't implement IFormattable interface
_ => throw new ArgumentOutOfRangeException(nameof(index))
};
Proposal # 2:
It requires #26186. Each field can be extracted using TypedReference without boxing and converted to string.
Proposal # 3:
Introduce IFormattingArgumentsSupplier public interface in BCL:
public interface IFormattingArgumentsSupplier
{
string ToString(int index, IFormatProvider? provider = null);
int Length { get; }
}
Now this interface can be implemented explicitly by each value tuple type. Also, we need to replace ITuple with this interface in constraints:
public readonly struct ValueFormattableString<TArgs>
where TArgs : struct, IFormattingArgumentsSupplier
{
public ValueFormattableString(string format, TArgs args);
}
The implementation of such formattable string is trivial because it's possible to obtain string representation of individual tuple item without boxing.
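A minimal sketch of how Proposal #3 could fit together is shown below. `Args2<T1, T2>` is a hypothetical stand-in for a value tuple that implements the interface, and `SketchFormatter.Format` is a deliberately simplified formatter that understands only plain `{N}` holes; whether the runtime avoids boxing inside `FormatItem` for every value type is an assumption, not something this sketch proves.

```C#
#nullable enable
using System;
using System.Globalization;
using System.Text;

public interface IFormattingArgumentsSupplier
{
    string ToString(int index, IFormatProvider? provider = null);
    int Length { get; }
}

// Hypothetical stand-in for a two-element value tuple implementing the interface.
public readonly struct Args2<T1, T2> : IFormattingArgumentsSupplier
{
    private readonly T1 _item1;
    private readonly T2 _item2;

    public Args2(T1 item1, T2 item2) { _item1 = item1; _item2 = item2; }

    public int Length => 2;

    public string ToString(int index, IFormatProvider? provider = null) => index switch
    {
        0 => FormatItem(_item1, provider),
        1 => FormatItem(_item2, provider),
        _ => throw new ArgumentOutOfRangeException(nameof(index))
    };

    // Generic helper: specialized per value type, so no object[] is ever created.
    private static string FormatItem<T>(T item, IFormatProvider? provider)
    {
        if (item is IFormattable) return ((IFormattable)item).ToString(null, provider);
        return item?.ToString() ?? string.Empty;
    }
}

public static class SketchFormatter
{
    // Simplified: supports only "{N}" holes, with no alignment, format strings, or escaping.
    public static string Format<TArgs>(string format, TArgs args, IFormatProvider? provider = null)
        where TArgs : struct, IFormattingArgumentsSupplier
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < format.Length)
        {
            char c = format[i];
            if (c == '{')
            {
                int close = format.IndexOf('}', i);
                if (close < 0) throw new FormatException("Unclosed hole.");
                int index = int.Parse(format.Substring(i + 1, close - i - 1), CultureInfo.InvariantCulture);
                sb.Append(args.ToString(index, provider)); // constrained call on the struct
                i = close + 1;
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }
        return sb.ToString();
    }
}

public static class SupplierDemo
{
    public static void Main()
    {
        string result = SketchFormatter.Format("{0} + 2 = {1}", new Args2<int, int>(40, 42), CultureInfo.InvariantCulture);
        Console.WriteLine(result); // 40 + 2 = 42
    }
}
```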
Additionally, with such an interface the string.Format method can be overloaded to avoid heap allocations:
public sealed class String
{
public static string Format<TArgs>(IFormatProvider provider, string format, TArgs args)
where TArgs : struct, IFormattingArgumentsSupplier;
public static string Format<TArgs>(string format, TArgs args)
where TArgs : struct, IFormattingArgumentsSupplier;
}
Usage of this method is very convenient from C# because of its native support for tuple types:
string result = string.Format("{0} + {1} = {2}", (40, 2, 42));