Runtime: Migrate new convenience APIs into extension methods

Created on 18 Mar 2017 · 39Comments · Source: dotnet/runtime

Highly used generic types (mostly Collections) pay high cost for every API (method) we add to it.

We should revisit the decisions of adding new APIs (esp. convenience APIs) directly on the type and consider moving them into extension methods instead (in future and potentially revisit those we added during 2.0 and did not publish yet).

Context: see @jkotas's comment and guidance here: https://github.com/dotnet/coreclr/pull/10203#issuecomment-287526223

area-System.Collections

Source

karelz

Most helpful comment

We've finalized our set of principles to this list:

CONSIDER making rare methods on generic types extension methods to avoid forcing the code gen to instantiate all these methods
CONSIDER making methods on generic types extension methods when they should only apply to a specific instances, e.g. Span<byte> vs. Span<int>
DO prefer extension methods on interfaces over extension methods on concrete types, e.g. prefer IDictionary<K,V>.GetFoo() over Dictionary<K, V>.GetFoo()
AVOID exposing implementation details of concrete types to extension methods. However, if absolutely necessary it is a viable option.
AVOID defining extension methods just because it's possible to do so. They create complexity, noise, and are often not as performant.

For the APIs @safern posted earlier this means:

BitArray looks good as proposed
CollectionExtensions looks good as proposed
Constructors on Dictionary<K,V> look good as proposed
Remove Dictionary<K,V>.GetValueOrDefault - dotnet/corefx#17917
TryAdd should remain
TryRemove (i.e. Remove) should follow TryAdd - https://github.com/dotnet/corefx/issues/15622 (PR in progress)
TryAdd/TryRemove (i.e. Remove): we also add extension methods for the interfaces on CollectionExtensions - dotnet/corefx#17918
TryGetValue should remain, impossible to implement reasonably
Queue/Stack look good as proposed as we don't have interfaces

terrajobst on 4 Apr 2017

👍4

All 39 comments

@terrajobst @stephentoub @weshaggard @danmosemsft @KrzysztofCwalina we should discuss it on Tuesday the latest ...

@jkotas might want to join us if we expect difference in opinions during the discussion ;-)

karelz on 18 Mar 2017

cc @ianhays @safern @AlexGhiondea @joperezr for Collections and Runtime areas which are potentially impacted the most in CoreFX.

karelz on 18 Mar 2017

Remove -- https://github.com/dotnet/corefx/issues/15622
TryAdd https://github.com/dotnet/corefx/issues/1942 -- this one I think we were OK with the perf and code size

danmosemsft on 18 Mar 2017

It is not a high cost for every API. It becomes high cost once the per-API cost gets multiplied by non-trivial number of APIs. Death by thousand paper cuts sort of problem.

Also, this is not a black and white. Some new additions are fine - in particular the ones that make a significant dent in the usability of the type (e.g. new APIS on ArraySegment), but there is balance to strike.

jkotas on 18 Mar 2017

Here's a proposal (for common generic classes):

If it is primarily convenience API (usability), make it extension method, unless it would have really bad perf which we believe matters.
- Examples: Dictionaty<,>.GetValueOrDefault(), Remove, etc.
If it is core API for the type, add it into the type - any recent examples?
If it is perf-sensitive API which cannot be implemented in extension method efficiently, add it on the type
- Note: When unsure about perf-sensitivity, make it extension method - we can always move it into the type later when the value is clear.
- Examples: TryAdd (see https://github.com/dotnet/corefx/issues/17275#issuecomment-287564658)

karelz on 18 Mar 2017

Examples: Dictionaty<,>.GetValueOrDefault(), Remove, TryAdd, etc.

TryAdd was convenience but also to avoid two lookups when one would suffice, in a commonly-used pattern. The latter part can't be done externally to the type.

stephentoub on 18 Mar 2017

Some apis get a benefit from being a member e.g. Dictionary<TKey,TValue>.TryAdd not doing a double check

Others just use the public interface so shouldn't loose much from being an extension
e.g.

Span<T>.Clear() -> .Clear(this Span<T>) (though struct copy overhead?)
Span<T>.Fill(T value) -> .Fill(this Span<T>, T value)
Array.Fill<T>(T[] array, T value) -> .Fill<T>(this T[] array, T value)
Array.Fill<T>(T[] array, T value, int startIndex, int count) -> .Fill<T>(this T[] array, T value, int startIndex, int count)

Slightly tangential, but related does ValueTuple need to implement 7 interfaces? (especially the ones that end up boxing)

benaadams on 18 Mar 2017

Thanks @stephentoub @benaadams for pointing out that TryAdd belongs to category [3] and not [1]. I didn't go deeper on the examples - I probably should have. My proposal is now updated to avoid further confusion.

karelz on 19 Mar 2017

Another reason for using extensions on generic types is it allows for specialisation of the methods based on type (e.g. Span.IndexOf(this Span<T>, ...) that also has Span.IndexOf(this Span<byte>, ...))

benaadams on 20 Mar 2017

We discussed this. Conclusion:

Let's create a list of all the convenience APIs like the Remove with an out, GetValueOrDefault and decided what to do with it.
We don't want to "hack" the APIs to optimize the code gen, but practically speaking we need to keep those constraints in mind.
In this instance, we already have an extension home (CollectionExtensions) and we can design most of the convenience target the interfaces in which case extension methods are the most logical design point.

terrajobst on 21 Mar 2017

@ianhays can you please list the APIs we added in 2.0 on any collection? @weshaggard can help if needed.
Thanks!

karelz on 21 Mar 2017

Is a principle established for API's that provide perf benefits, like TryAdd? I assume those can continue to go on the real types so long as they don't excessively bloat the code nor have a perf impact on existing API.

danmosemsft on 21 Mar 2017

It will be case by case decisions, but high-level principles are:

If it is overload of existing instance method, it should be instance method
If the primary purpose of the API is convenience, it should be extension method
If the primary purpose of the API is perf and it has/will have non-trivial usage, it should be instance method
- Note: The fact that something can be implemented faster (e.g. one dictionary lookup vs. two) as instance methods than extension method is NOT justification enough to claim it's primary purpose is perf. Perf reasons should be backed by real usage data/scenarios. They will get scrutiny.
- Overall we do not want to penalize 90% scenarios for "small" improvement in 1% scenarios.

karelz on 21 Mar 2017

Huh, I just noticed that TryAdd is used as perf method: https://github.com/dotnet/corefx/pull/17201 ... I think we will be easier on that one :(

karelz on 21 Mar 2017

Sure thing @karelz. Here are all the APIs that are exclusive to netcoreapp2.0 added in System.Collections:

```c#
namespace System.Collections
{
public sealed partial class BitArray : System.Collections.ICollection, System.Collections.IEnumerable, System.ICloneable
{
public System.Collections.BitArray RightShift(int count) { throw null; }
public System.Collections.BitArray LeftShift(int count) { throw null; }
}
}
namespace System.Collections.Generic
{
public static class CollectionExtensions
{
public static TValue GetValueOrDefault(this IDictionary dictionary, TKey key) { throw null; }
public static TValue GetValueOrDefault(this IDictionary dictionary, TKey key, TValue defaultValue) { throw null; }
public static TValue GetValueOrDefault(this IReadOnlyDictionary dictionary, TKey key) { throw null; }
public static TValue GetValueOrDefault(this IReadOnlyDictionary dictionary, TKey key, TValue defaultValue) { throw null; }
}

public partial class Dictionary<TKey, TValue> : System.Collections.Generic.ICollection<System.Collections.Generic.KeyValuePair<TKey, TValue>>, System.Collections.Generic.IDictionary<TKey, TValue>, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<TKey, TValue>>, System.Collections.Generic.IReadOnlyCollection<System.Collections.Generic.KeyValuePair<TKey, TValue>>, System.Collections.Generic.IReadOnlyDictionary<TKey, TValue>, System.Collections.ICollection, System.Collections.IDictionary, System.Collections.IEnumerable, System.Runtime.Serialization.IDeserializationCallback, System.Runtime.Serialization.ISerializable
{
    public Dictionary(System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<TKey, TValue>> collection) { }
    public Dictionary(System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<TKey, TValue>> collection, System.Collections.Generic.IEqualityComparer<TKey> comparer) { }
    public bool TryAdd(TKey key, TValue value) { throw null; }
    public TValue GetValueOrDefault(TKey key) { throw null; }
    public TValue GetValueOrDefault(TKey key, TValue defaultValue) { throw null; }
}
public partial class HashSet<T> : System.Collections.Generic.ICollection<T>, System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.Generic.ISet<T>, System.Collections.IEnumerable, System.Runtime.Serialization.IDeserializationCallback, System.Runtime.Serialization.ISerializable
{
    public HashSet(int capacity) { }
    public HashSet(int capacity, System.Collections.Generic.IEqualityComparer<T> comparer) { }
    public bool TryGetValue(T equalValue, out T actualValue) { throw null; }
}
public partial class Queue<T> : System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.ICollection, System.Collections.IEnumerable
{
    public bool TryDequeue(out T result) { throw null; }
    public bool TryPeek(out T result) { throw null; }
}
public partial class SortedSet<T> : System.Collections.Generic.ICollection<T>, System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.Generic.ISet<T>, System.Collections.ICollection, System.Collections.IEnumerable, System.Runtime.Serialization.IDeserializationCallback, System.Runtime.Serialization.ISerializable
{
    public bool TryGetValue(T equalValue, out T actualValue) { throw null; }
}
public partial class Stack<T> : System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.ICollection, System.Collections.IEnumerable
{
    public bool TryPeek(out T result) { throw null; }
    public bool TryPop(out T result) { throw null; }
}

}
```

To get this list i diffed our current System.Collections ref file with the oldest one on Github, then removed API's that were present in both or that were present in the recent ref file but a part of the last netfx release (i.e. netstandard2.0 APIs).

ianhays on 23 Mar 2017

👍1

@karelz do you think this is going to make it into 2.0.0 or should we probably move it to 2.1.0/future?

safern on 29 Mar 2017

@safern moving it doesn't make sense. Either we do it, or we close it.
I was not marked as api-ready-for-review (and we didn't have time anyway yesterday). Pushing to next week.

karelz on 29 Mar 2017

@safern moving it doesn't make sense. Either we do it, or we close it.
I was not marked as api-ready-for-review (and we didn't have time anyway yesterday). Pushing to next week.

So with the list that @ianhays provided is it ready to be marked as api-ready-for-review so that it can be reviewed next week?

safern on 29 Mar 2017

Yes it is. I just did it (got distracted from doing that after my previous reply, sorry)

karelz on 29 Mar 2017

🎉1

No problem. Just wanted to make sure that it was good to go so that it is reviewed next Tuesday. Thanks!

safern on 29 Mar 2017

Yep, thanks for chasing it down, appreciate it!

karelz on 29 Mar 2017

👍1

We've finalized our set of principles to this list:

CONSIDER making rare methods on generic types extension methods to avoid forcing the code gen to instantiate all these methods
CONSIDER making methods on generic types extension methods when they should only apply to a specific instances, e.g. Span<byte> vs. Span<int>
DO prefer extension methods on interfaces over extension methods on concrete types, e.g. prefer IDictionary<K,V>.GetFoo() over Dictionary<K, V>.GetFoo()
AVOID exposing implementation details of concrete types to extension methods. However, if absolutely necessary it is a viable option.
AVOID defining extension methods just because it's possible to do so. They create complexity, noise, and are often not as performant.

For the APIs @safern posted earlier this means:

BitArray looks good as proposed
CollectionExtensions looks good as proposed
Constructors on Dictionary<K,V> look good as proposed
Remove Dictionary<K,V>.GetValueOrDefault - dotnet/corefx#17917
TryAdd should remain
TryRemove (i.e. Remove) should follow TryAdd - https://github.com/dotnet/corefx/issues/15622 (PR in progress)
TryAdd/TryRemove (i.e. Remove): we also add extension methods for the interfaces on CollectionExtensions - dotnet/corefx#17918
TryGetValue should remain, impossible to implement reasonably
Queue/Stack look good as proposed as we don't have interfaces

terrajobst on 4 Apr 2017

👍4

@safern please make sure we have appropriate issues for any work

danmosemsft on 4 Apr 2017

@terrajobst could you consider an addition to the list?

DO prefer extension methods to instance methods on generic types if they gain no extra benefit from having access to non-public members.

This is because the extension methods allow additional specialisation by type whereas generic instance methods allow no further specialization (from https://github.com/dotnet/corefxlab/issues/1314#issuecomment-291645150)

As shown by

class Program
{
    static void Main(string[] args)
    {
        var value = new MyType<byte>();

        value.Output();
    }
}

public class MyType<T>
{
    public void Output()
    {
        Console.WriteLine("Generic instance method called");
    }
}

public static class MyTypeExtensions
{
    public static void Output(this MyType<byte> myType)
    {
        Console.WriteLine("MyType<byte> Extension method called");
    }
}

Outputs

Generic instance method called

benaadams on 5 Apr 2017

This is because the extension methods allow additional specialisation by type

For value types, instance methods can specialize implementation via guarded type checks that'll result in dead branches being trimmed by the JIT. But for reference types it's true. And extensions let you add methods that only apply to certain instantiations, e.g. TaskExtensions.Unwrap, which can't be done with instance methods... and that may have been your main point (though at that point, it's not really a choice, since there's no way to do it with instance methods, unless you want to throw from every other usage).

stephentoub on 5 Apr 2017

For value types, instance methods can specialize implementation via guarded type checks that'll result in dead branches being trimmed by the JIT

Only for ahead of time decided types. If a library author creates their own struct type that can't be added to the type check chain in the instance method and will always take the general object path.

Obviously if there is a gain from access to internals (like the dictionary TryAdd method) that's one thing. Or their is a specific reason (like Vector which are JIT/CPU recognised types)

However, if its a method that is just using the public api and being an instance method gains no measurable benefit over an extension method; then it should be implemented as an extension method over the generic type; which then allows future specialisations for other types and future types to be written.

benaadams on 5 Apr 2017

Sure, in general, but this thread is about types in the BCL... types like Dictionary aren't going to provide methods that operate on BensCoolType, even as an extension method. :smile:

stephentoub on 5 Apr 2017

You say that but... you could do

public static GetValueOrDefault(this Dictionary<K,V> dict, K key)
{
    if (dict.TryGet(out var value)
    {
        return value;
    }
    return default(V)
}

Then the CoolType library could add

public static GetValueOrDefault(this Dictionary<K,BensCoolType> dict, K key)
{
    if (dict.TryGet(out var value)
    {
        return value;
    }
    return BensCoolType.DefaultValue;
}

😉

benaadams on 5 Apr 2017

Or maybe you hate string.IsNullOrEmpty and nulls in general for strings

public static GetValueOrDefault(this Dictionary<K,string> dict, K key)
{
    if (dict.TryGet(out var value)
    {
        return value;
    }
    return string.Empty;
}

Rather than writing var value = dict.GetValueOrDefault(key) ?? string.Empty;

benaadams on 5 Apr 2017

Then the CoolType library could add

I think what you're really pointing out is that the C# compiler only considers extension methods if it can't find a matching instance method, and thus defining an instance method prevents others from providing their own replacement extensions with a more specialized signature.

stephentoub on 5 Apr 2017

Yes exactly

benaadams on 5 Apr 2017

I see @benaadams's point, however, I am a little bit hesitant to optimize for method overloading in the compiler. IMO it creates quite a lot of complexity people need to be aware of ... (imagine you call GetValueOrDefault and the library you depend on overloaded it, but you don't know it and all you look at is public docs on MSDN and you are puzzled why it returns something else ...)

If a type wants specialized extension method overload, I would suggest to the author to create a brand new extension method ...

karelz on 5 Apr 2017

@terrajobst is it worth saving the rules in some API design doc in the repo?

karelz on 5 Apr 2017

@terrajobst is it worth saving the rules in some API design doc in the repo?

@karelz I think that is a good idea as for the future we can have that in a doc instead of looking for this specific issue/comment.

@safern please make sure we have appropriate issues for any work

Taking care of that.

safern on 5 Apr 2017

imagine you call GetValueOrDefault and the library you depend on overloaded it...

My examples might have been poor; though the counter argument equally applies to virtual class methods; interfaces, operator overloading etc. The extension methods allow for a kind of inheritance of the generic type when it is an of specific type.

Perhaps a real example would have been better; Span<T> is a good example. So its shipping with specializations for byte and char maybe later it will have int; however they have not been written yet. So as a user I can add specilizations for int right now at a faster cadence than corefx will ship them; then contribute back to they will get rolled in at a later date.

If these were written as a single instance method with typeof(T) specialization; then as a user I'd have to add a different method to plug the gap; then search and replace when the corefx library provided them, rather than just deleting my extension methods.

Anything that's a "delete a single method" I see as a better option than search and replace over your code base.

Also extension method changes are easier to shim back onto desktop than instance method changes; as desktop moves at an even slower cadence than corefx (which is relatively quick)

benaadams on 5 Apr 2017

@benaadams your new example falls under [2] in @terrajobst's list. I agree with that one.
I am just hesitant to go overboard with that guidance (see my previous concern). Do you have other examples which would not fall under [2]?

karelz on 5 Apr 2017

I've opened issues:

To track the work needed for this issue. So I will go ahead and close this issue.

safern on 5 Apr 2017

Do you have other examples which would not fall under [2]?

No. 😄

benaadams on 5 Apr 2017

@terrajobst I have updated your approval post https://github.com/dotnet/corefx/issues/17275#issuecomment-291636863 with links to tracking issues (easier to check all are covered) and I added explanation that by TryRemove, you actually meant Remove (it confused me for few minutes). Please let me know if that's incorrect.

karelz on 5 Apr 2017

Was this page helpful?

0 / 5 - 0 ratings