We have Enumerable.To* methods for many common collections. HashSet is sorely missing. This feature should be cheap to implement and not be problematic in any way that I can see.
Evidence that this feature is sorely missing: https://www.google.com/webhp?complete=1&hl=en&gws_rd=cr,ssl&ei=#complete=1&hl=en&q=.net+tohashset People are writing this method over and over again.
And I've done so myself often enough. It's pretty trivial:
// Extension methods must live in a static, non-generic class.
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer)
{
    if (source == null) throw new ArgumentNullException("source");
    // A null comparer makes HashSet<T> fall back to EqualityComparer<T>.Default.
    return new HashSet<T>(source, comparer);
}
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source)
{
    return source.ToHashSet(null);
}
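For illustration, here is a minimal, self-contained sketch of how the two proposed overloads might be used. The class name `HashSetExtensions` and the namespace `ToHashSetDemo` are hypothetical; the namespace is only there so that these methods are picked ahead of any `ToHashSet` that the framework in use may already provide.

```csharp
using System;
using System.Collections.Generic;

namespace ToHashSetDemo
{
    // Hypothetical container class for the proposed extension methods.
    public static class HashSetExtensions
    {
        public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer)
        {
            if (source == null) throw new ArgumentNullException("source");
            // A null comparer means "use EqualityComparer<T>.Default".
            return new HashSet<T>(source, comparer);
        }

        public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source)
        {
            return source.ToHashSet(null);
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var words = new[] { "a", "A", "b" };

            // Default comparer: "a" and "A" are different strings.
            HashSet<string> ordinal = words.ToHashSet();
            Console.WriteLine(ordinal.Count); // 3

            // Case-insensitive comparer: "a" and "A" collapse into one entry.
            HashSet<string> ignoreCase = words.ToHashSet(StringComparer.OrdinalIgnoreCase);
            Console.WriteLine(ignoreCase.Count); // 2
        }
    }
}
```

The comparer overload is the one that carries the behavior; the parameterless overload just forwards `null`, which `HashSet<T>` treats as the default equality comparer.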
:+1: This method is extremely useful and is always in my library of extensions as well.
Too early for a PR, but see the branch on my fork for an implementation.
Seems like a good idea, we should review the API, then we can mark it "up for grabs".
We just reviewed it and it looks good as proposed.
Our apologies that it took us 1+ year to approve. We are trying to get better ;-)
Does anyone want to submit a PR? @GSPP?
Does the PR have to go against future or master?
master. We do not use future at this point. And hopefully never will have to again.
Do you want to grab it @JonHanna?
I've an approach on my machine somewhere, but @GSPP might want to be the one to implement their proposal, so I'll wait and see if they shout for it first. (Also, I need to learn how API versioning works. Is there a doc on what to do about the ref files etc.? I've never touched that.)
@weshaggard can comment on API versioning.
Take a look at https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/adding-api-guidelines.md. You can also look at some examples of recent changes to the System.Runtime ref as well.
I think that was a respectable amount of time to give @GSPP first dibs.
Sure, just go ahead.
Gah. As a noob to versioning the API, I'm having noob problems versioning the API :(
@JonHanna yeah, sorry, adding APIs is not pretty right now :( ... I want to clean it up and make it simple for mainline cases (hopefully during December).
In the meantime: if you have questions, ask. If you know where our docs fall short, let me know or just submit a PR updating them (if it is a larger shake-up of the docs, we can create a docs branch right away and tune it there).
@karelz if you can see why dotnet/corefx#13726 is failing that'd be great. It's most likely the sort of silly thing one gets stuck on at the first go and that I'll kick myself about later, but I can't figure it out right now.
@mellinoe @weshaggard @ericstj can you please help advise here?
(sadly I haven't made a change of this type myself, so I don't know yet - it's my homework for December)
Nice, didn't know about this proposal! :tada: Hopefully it'll get some people off of Distinct which is slower.
> Hopefully it'll get some people off of Distinct which is slower
I don't think it will get people off that -- it is a non-trivial mental leap to think about using ToHashSet when you need unique/distinct members, even if all you have to think about is what happens with dupes in HashSet.
Also note that 99% of code does not care about subtle performance differences, because it is not on hot path. Only performance-sensitive code should care about things like these. And even there -- the best approach is to measure first, then find out what is slow rather than do premature optimizations.
Using Linq on a performance-sensitive code path is IMO dangerous on its own, because you have to really understand what is going on underneath. In such cases it is just easier to not rely on such things ...
Distinct is certainly going to be faster to first element, and should compare well otherwise (the internal Set doesn't bother with some housekeeping that HashSet has to do, including cheating in its Remove putting it into a state that would break on another Add because it knows that other Add will never happen). That you're finding HashSet faster is a bit disconcerting.
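The "faster to first element" point can be sketched with a source that counts how many items have been pulled from it. This is only an illustration of deferred versus eager evaluation, not a benchmark; the `Demo` class, the `Pulled` counter, and the sample data are assumptions made up for the example, and `new HashSet<int>(...)` stands in for what the proposed `ToHashSet` does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace DistinctVsHashSetDemo
{
    public static class Demo
    {
        // Number of elements the source has yielded so far (illustrative counter).
        public static int Pulled;

        // A lazy source with duplicates; each yielded element bumps the counter.
        public static IEnumerable<int> Source()
        {
            foreach (var n in new[] { 3, 1, 3, 2, 1 })
            {
                Pulled++;
                yield return n;
            }
        }

        public static void Main()
        {
            // Distinct is deferred: asking for the first element pulls one item.
            Pulled = 0;
            int first = Source().Distinct().First();
            Console.WriteLine($"Distinct().First() = {first}, pulled {Pulled}"); // first = 3, pulled 1

            // Building a HashSet is eager: the whole source is consumed up front.
            Pulled = 0;
            var set = new HashSet<int>(Source());
            Console.WriteLine($"HashSet count = {set.Count}, pulled {Pulled}"); // count = 3, pulled 5
        }
    }
}
```

Which of the two is cheaper over a full enumeration is a separate question that this sketch does not answer; that still needs measuring on the actual workload.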