Runtime: Migrate new convenience APIs into extension methods

Created on 18 Mar 2017  路  39Comments  路  Source: dotnet/runtime

Highly used generic types (mostly Collections) pay high cost for every API (method) we add to it.

We should revisit the decisions of adding new APIs (esp. convenience APIs) directly on the type and consider moving them into extension methods instead (in future and potentially revisit those we added during 2.0 and did not publish yet).

Context: see @jkotas's comment and guidance here: https://github.com/dotnet/coreclr/pull/10203#issuecomment-287526223

area-System.Collections

Most helpful comment

We've finalized our set of principles to this list:

  1. CONSIDER making rare methods on generic types extension methods to avoid forcing the code gen to instantiate all these methods
  2. CONSIDER making methods on generic types extension methods when they should only apply to a specific instances, e.g. Span<byte> vs. Span<int>
  3. DO prefer extension methods on interfaces over extension methods on concrete types, e.g. prefer IDictionary<K,V>.GetFoo() over Dictionary<K, V>.GetFoo()
  4. AVOID exposing implementation details of concrete types to extension methods. However, if absolutely necessary it is a viable option.
  5. AVOID defining extension methods just because it's possible to do so. They create complexity, noise, and are often not as performant.

For the APIs @safern posted earlier this means:

  • BitArray looks good as proposed
  • CollectionExtensions looks good as proposed
  • Constructors on Dictionary<K,V> look good as proposed
  • Remove Dictionary<K,V>.GetValueOrDefault - dotnet/corefx#17917
  • TryAdd should remain
  • TryRemove (i.e. Remove) should follow TryAdd - https://github.com/dotnet/corefx/issues/15622 (PR in progress)
  • TryAdd/TryRemove (i.e. Remove): we also add extension methods for the interfaces on CollectionExtensions - dotnet/corefx#17918
  • TryGetValue should remain, impossible to implement reasonably
  • Queue/Stack look good as proposed as we don't have interfaces

All 39 comments

@terrajobst @stephentoub @weshaggard @danmosemsft @KrzysztofCwalina we should discuss it on Tuesday the latest ...

@jkotas might want to join us if we expect difference in opinions during the discussion ;-)

cc @ianhays @safern @AlexGhiondea @joperezr for Collections and Runtime areas which are potentially impacted the most in CoreFX.

Remove -- https://github.com/dotnet/corefx/issues/15622
TryAdd https://github.com/dotnet/corefx/issues/1942 -- this one I think we were OK with the perf and code size

It is not a high cost for every API. It becomes high cost once the per-API cost gets multiplied by non-trivial number of APIs. Death by thousand paper cuts sort of problem.

Also, this is not a black and white. Some new additions are fine - in particular the ones that make a significant dent in the usability of the type (e.g. new APIS on ArraySegment), but there is balance to strike.

Here's a proposal (for common generic classes):

  1. If it is primarily convenience API (usability), make it extension method, unless it would have really bad perf which we believe matters.

    • Examples: Dictionaty<,>.GetValueOrDefault(), Remove, etc.

  2. If it is core API for the type, add it into the type - any recent examples?
  3. If it is perf-sensitive API which cannot be implemented in extension method efficiently, add it on the type

    • Note: When unsure about perf-sensitivity, make it extension method - we can always move it into the type later when the value is clear.

    • Examples: TryAdd (see https://github.com/dotnet/corefx/issues/17275#issuecomment-287564658)

Examples: Dictionaty<,>.GetValueOrDefault(), Remove, TryAdd, etc.

TryAdd was convenience but also to avoid two lookups when one would suffice, in a commonly-used pattern. The latter part can't be done externally to the type.

Some apis get a benefit from being a member e.g. Dictionary<TKey,TValue>.TryAdd not doing a double check

Others just use the public interface so shouldn't loose much from being an extension
e.g.

  • Span<T>.Clear() -> .Clear(this Span<T>) (though struct copy overhead?)
  • Span<T>.Fill(T value) -> .Fill(this Span<T>, T value)
  • Array.Fill<T>(T[] array, T value) -> .Fill<T>(this T[] array, T value)
  • Array.Fill<T>(T[] array, T value, int startIndex, int count) -> .Fill<T>(this T[] array, T value, int startIndex, int count)

Slightly tangential, but related does ValueTuple need to implement 7 interfaces? (especially the ones that end up boxing)

Thanks @stephentoub @benaadams for pointing out that TryAdd belongs to category [3] and not [1]. I didn't go deeper on the examples - I probably should have. My proposal is now updated to avoid further confusion.

Another reason for using extensions on generic types is it allows for specialisation of the methods based on type (e.g. Span.IndexOf(this Span<T>, ...) that also has Span.IndexOf(this Span<byte>, ...))

We discussed this. Conclusion:

  • Let's create a list of all the convenience APIs like the Remove with an out, GetValueOrDefault and decided what to do with it.
  • We don't want to "hack" the APIs to optimize the code gen, but practically speaking we need to keep those constraints in mind.
  • In this instance, we already have an extension home (CollectionExtensions) and we can design most of the convenience target the interfaces in which case extension methods are the most logical design point.

@ianhays can you please list the APIs we added in 2.0 on any collection? @weshaggard can help if needed.
Thanks!

Is a principle established for API's that provide perf benefits, like TryAdd? I assume those can continue to go on the real types so long as they don't excessively bloat the code nor have a perf impact on existing API.

It will be case by case decisions, but high-level principles are:

  1. If it is overload of existing instance method, it should be instance method
  2. If the primary purpose of the API is convenience, it should be extension method
  3. If the primary purpose of the API is perf and it has/will have non-trivial usage, it should be instance method

    • Note: The fact that something can be implemented faster (e.g. one dictionary lookup vs. two) as instance methods than extension method is NOT justification enough to claim it's primary purpose is perf. Perf reasons should be backed by real usage data/scenarios. They will get scrutiny.

    • Overall we do not want to penalize 90% scenarios for "small" improvement in 1% scenarios.

Huh, I just noticed that TryAdd is used as perf method: https://github.com/dotnet/corefx/pull/17201 ... I think we will be easier on that one :(

Sure thing @karelz. Here are all the APIs that are exclusive to netcoreapp2.0 added in System.Collections:

```c#
namespace System.Collections
{
public sealed partial class BitArray : System.Collections.ICollection, System.Collections.IEnumerable, System.ICloneable
{
public System.Collections.BitArray RightShift(int count) { throw null; }
public System.Collections.BitArray LeftShift(int count) { throw null; }
}
}
namespace System.Collections.Generic
{
public static class CollectionExtensions
{
public static TValue GetValueOrDefault(this IDictionary dictionary, TKey key) { throw null; }
public static TValue GetValueOrDefault(this IDictionary dictionary, TKey key, TValue defaultValue) { throw null; }
public static TValue GetValueOrDefault(this IReadOnlyDictionary dictionary, TKey key) { throw null; }
public static TValue GetValueOrDefault(this IReadOnlyDictionary dictionary, TKey key, TValue defaultValue) { throw null; }
}

public partial class Dictionary<TKey, TValue> : System.Collections.Generic.ICollection<System.Collections.Generic.KeyValuePair<TKey, TValue>>, System.Collections.Generic.IDictionary<TKey, TValue>, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<TKey, TValue>>, System.Collections.Generic.IReadOnlyCollection<System.Collections.Generic.KeyValuePair<TKey, TValue>>, System.Collections.Generic.IReadOnlyDictionary<TKey, TValue>, System.Collections.ICollection, System.Collections.IDictionary, System.Collections.IEnumerable, System.Runtime.Serialization.IDeserializationCallback, System.Runtime.Serialization.ISerializable
{
    public Dictionary(System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<TKey, TValue>> collection) { }
    public Dictionary(System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<TKey, TValue>> collection, System.Collections.Generic.IEqualityComparer<TKey> comparer) { }
    public bool TryAdd(TKey key, TValue value) { throw null; }
    public TValue GetValueOrDefault(TKey key) { throw null; }
    public TValue GetValueOrDefault(TKey key, TValue defaultValue) { throw null; }
}
public partial class HashSet<T> : System.Collections.Generic.ICollection<T>, System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.Generic.ISet<T>, System.Collections.IEnumerable, System.Runtime.Serialization.IDeserializationCallback, System.Runtime.Serialization.ISerializable
{
    public HashSet(int capacity) { }
    public HashSet(int capacity, System.Collections.Generic.IEqualityComparer<T> comparer) { }
    public bool TryGetValue(T equalValue, out T actualValue) { throw null; }
}
public partial class Queue<T> : System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.ICollection, System.Collections.IEnumerable
{
    public bool TryDequeue(out T result) { throw null; }
    public bool TryPeek(out T result) { throw null; }
}
public partial class SortedSet<T> : System.Collections.Generic.ICollection<T>, System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.Generic.ISet<T>, System.Collections.ICollection, System.Collections.IEnumerable, System.Runtime.Serialization.IDeserializationCallback, System.Runtime.Serialization.ISerializable
{
    public bool TryGetValue(T equalValue, out T actualValue) { throw null; }
}
public partial class Stack<T> : System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.ICollection, System.Collections.IEnumerable
{
    public bool TryPeek(out T result) { throw null; }
    public bool TryPop(out T result) { throw null; }
}

}
```

To get this list i diffed our current System.Collections ref file with the oldest one on Github, then removed API's that were present in both or that were present in the recent ref file but a part of the last netfx release (i.e. netstandard2.0 APIs).

@karelz do you think this is going to make it into 2.0.0 or should we probably move it to 2.1.0/future?

@safern moving it doesn't make sense. Either we do it, or we close it.
I was not marked as api-ready-for-review (and we didn't have time anyway yesterday). Pushing to next week.

@safern moving it doesn't make sense. Either we do it, or we close it.
I was not marked as api-ready-for-review (and we didn't have time anyway yesterday). Pushing to next week.

So with the list that @ianhays provided is it ready to be marked as api-ready-for-review so that it can be reviewed next week?

Yes it is. I just did it (got distracted from doing that after my previous reply, sorry)

No problem. Just wanted to make sure that it was good to go so that it is reviewed next Tuesday. Thanks!

Yep, thanks for chasing it down, appreciate it!

We've finalized our set of principles to this list:

  1. CONSIDER making rare methods on generic types extension methods to avoid forcing the code gen to instantiate all these methods
  2. CONSIDER making methods on generic types extension methods when they should only apply to a specific instances, e.g. Span<byte> vs. Span<int>
  3. DO prefer extension methods on interfaces over extension methods on concrete types, e.g. prefer IDictionary<K,V>.GetFoo() over Dictionary<K, V>.GetFoo()
  4. AVOID exposing implementation details of concrete types to extension methods. However, if absolutely necessary it is a viable option.
  5. AVOID defining extension methods just because it's possible to do so. They create complexity, noise, and are often not as performant.

For the APIs @safern posted earlier this means:

  • BitArray looks good as proposed
  • CollectionExtensions looks good as proposed
  • Constructors on Dictionary<K,V> look good as proposed
  • Remove Dictionary<K,V>.GetValueOrDefault - dotnet/corefx#17917
  • TryAdd should remain
  • TryRemove (i.e. Remove) should follow TryAdd - https://github.com/dotnet/corefx/issues/15622 (PR in progress)
  • TryAdd/TryRemove (i.e. Remove): we also add extension methods for the interfaces on CollectionExtensions - dotnet/corefx#17918
  • TryGetValue should remain, impossible to implement reasonably
  • Queue/Stack look good as proposed as we don't have interfaces

@safern please make sure we have appropriate issues for any work

@terrajobst could you consider an addition to the list?

DO prefer extension methods to instance methods on generic types if they gain no extra benefit from having access to non-public members.

This is because the extension methods allow additional specialisation by type whereas generic instance methods allow no further specialization (from https://github.com/dotnet/corefxlab/issues/1314#issuecomment-291645150)

As shown by

class Program
{
    static void Main(string[] args)
    {
        var value = new MyType<byte>();

        value.Output();
    }
}

public class MyType<T>
{
    public void Output()
    {
        Console.WriteLine("Generic instance method called");
    }
}

public static class MyTypeExtensions
{
    public static void Output(this MyType<byte> myType)
    {
        Console.WriteLine("MyType<byte> Extension method called");
    }
}

Outputs

Generic instance method called

This is because the extension methods allow additional specialisation by type

For value types, instance methods can specialize implementation via guarded type checks that'll result in dead branches being trimmed by the JIT. But for reference types it's true. And extensions let you add methods that only apply to certain instantiations, e.g. TaskExtensions.Unwrap, which can't be done with instance methods... and that may have been your main point (though at that point, it's not really a choice, since there's no way to do it with instance methods, unless you want to throw from every other usage).

For value types, instance methods can specialize implementation via guarded type checks that'll result in dead branches being trimmed by the JIT

Only for ahead of time decided types. If a library author creates their own struct type that can't be added to the type check chain in the instance method and will always take the general object path.

Obviously if there is a gain from access to internals (like the dictionary TryAdd method) that's one thing. Or their is a specific reason (like Vector which are JIT/CPU recognised types)

However, if its a method that is just using the public api and being an instance method gains no measurable benefit over an extension method; then it should be implemented as an extension method over the generic type; which then allows future specialisations for other types and future types to be written.

Sure, in general, but this thread is about types in the BCL... types like Dictionary aren't going to provide methods that operate on BensCoolType, even as an extension method. :smile:

You say that but... you could do

public static GetValueOrDefault(this Dictionary<K,V> dict, K key)
{
    if (dict.TryGet(out var value)
    {
        return value;
    }
    return default(V)
}

Then the CoolType library could add

public static GetValueOrDefault(this Dictionary<K,BensCoolType> dict, K key)
{
    if (dict.TryGet(out var value)
    {
        return value;
    }
    return BensCoolType.DefaultValue;
}

馃槈

Or maybe you hate string.IsNullOrEmpty and nulls in general for strings

public static GetValueOrDefault(this Dictionary<K,string> dict, K key)
{
    if (dict.TryGet(out var value)
    {
        return value;
    }
    return string.Empty;
}

Rather than writing var value = dict.GetValueOrDefault(key) ?? string.Empty;

Then the CoolType library could add

I think what you're really pointing out is that the C# compiler only considers extension methods if it can't find a matching instance method, and thus defining an instance method prevents others from providing their own replacement extensions with a more specialized signature.

Yes exactly

I see @benaadams's point, however, I am a little bit hesitant to optimize for method overloading in the compiler. IMO it creates quite a lot of complexity people need to be aware of ... (imagine you call GetValueOrDefault and the library you depend on overloaded it, but you don't know it and all you look at is public docs on MSDN and you are puzzled why it returns something else ...)

If a type wants specialized extension method overload, I would suggest to the author to create a brand new extension method ...

@terrajobst is it worth saving the rules in some API design doc in the repo?

@terrajobst is it worth saving the rules in some API design doc in the repo?

@karelz I think that is a good idea as for the future we can have that in a doc instead of looking for this specific issue/comment.

@safern please make sure we have appropriate issues for any work

Taking care of that.

imagine you call GetValueOrDefault and the library you depend on overloaded it...

My examples might have been poor; though the counter argument equally applies to virtual class methods; interfaces, operator overloading etc. The extension methods allow for a kind of inheritance of the generic type when it is an of specific type.

Perhaps a real example would have been better; Span<T> is a good example. So its shipping with specializations for byte and char maybe later it will have int; however they have not been written yet. So as a user I can add specilizations for int right now at a faster cadence than corefx will ship them; then contribute back to they will get rolled in at a later date.

If these were written as a single instance method with typeof(T) specialization; then as a user I'd have to add a different method to plug the gap; then search and replace when the corefx library provided them, rather than just deleting my extension methods.

Anything that's a "delete a single method" I see as a better option than search and replace over your code base.

Also extension method changes are easier to shim back onto desktop than instance method changes; as desktop moves at an even slower cadence than corefx (which is relatively quick)

@benaadams your new example falls under [2] in @terrajobst's list. I agree with that one.
I am just hesitant to go overboard with that guidance (see my previous concern). Do you have other examples which would not fall under [2]?

I've opened issues:

To track the work needed for this issue. So I will go ahead and close this issue.

Do you have other examples which would not fall under [2]?

No. 馃槃

@terrajobst I have updated your approval post https://github.com/dotnet/corefx/issues/17275#issuecomment-291636863 with links to tracking issues (easier to check all are covered) and I added explanation that by TryRemove, you actually meant Remove (it confused me for few minutes). Please let me know if that's incorrect.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

v0l picture v0l  路  3Comments

jamesqo picture jamesqo  路  3Comments

chunseoklee picture chunseoklee  路  3Comments

bencz picture bencz  路  3Comments

omariom picture omariom  路  3Comments