Sometimes I find myself needing to retrieve a value from a HashSet. You may wonder why that's needed when you already have the value when you might want to retrieve it. Well, that's not always true depending on the Comparer you're using as a value may evaluate to being contained within the set but is not strictly equal to the value stored in the set.
namespace System.Collections.Generic {
public partial class HashSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
public partial class SortedSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
}
My use case is sometimes I want to create a case-insensitive string mapping to the actual string value. So for example instead of having to use a Dictionary<string, string> for the mapping as below
```c#
var dictionary = new Dictionary
dictionary["A"] = "A";
string value = "a";
string actualValue;
if (dictionary.TryGetValue(value, out actualValue))
{
Assert.AreEqual("A", actualValue);
}
I could use a `HashSet<string>` like this
```c#
var hashSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "A" };
string value = "a";
string actualValue;
if (hashSet.TryGetValue(value, out actualValue))
{
Assert.AreEqual("A", actualValue);
}
With this implemented users would save on memory usage, would be able to use set operations, and be able to populate the collection more easily.
Other possible use cases (from discussion below)
TryGetValue returns an indication if the HashSet contains the equalValue and the output parameter actualValue contains the HashSet's value that equals equalValue with the set Comparer if contained in the HashSet otherwise the default value of T.I've wanted this.
I have rolled my own hash sets solely for this use. (A while back I started on something that hit just this use and not other set purposes, with different versions for concurrent and single-threaded use, etc. as it can be very useful for memory-saving pools of objects, but then I got distracted by something shiny or something).
This would be useful for some string interning schemes, as @JonHanna mentions. I've copied and modified the HasSet code myself for this purpose in a previous product.
What's wrong with the Dictionary approach that was suggested in the top post? I think adding a TryGetValue method to HashSet would confuse 99% of users of the API who aren't using it for object pooling. Perhaps efforts should be dedicated to better pooling APIs instead.
@jamesqo it's wasteful in space - an extra pointer in every entry.
How would it be confusing? (Not disagreeing just trying to understand)
it's wasteful in space - an extra pointer in every entry.
Maybe it would be better to have specialized string interning APIs then that mitigated this problem then?
How would it be confusing? (Not disagreeing just trying to understand)
People not using the API for object pooling would ask "what's the point of this method if we already know the key?"
Any successful string interning API needs to have customizable policy (I guess that was why String.Intern failed). We could devise some such API but it's quite feasible to build something reasonable on HashSet or some other existing collection. I suspect if you're doing interning you're a relatively sophisticated developer and would be comfortable doing that.
/cc @jkotas
Any successful string interning API needs to have customizable policy
Agree. Check the Roslyn string interning facility as a case study: https://github.com/dotnet/roslyn/blob/acc053dbd9d3cb25f086f86f36f80896c8379466/src/Compilers/Core/Portable/MetadataReader/PEModule.cs#L2986
The interesting points:
(edit: @danmosemsft - also see https://github.com/dotnet/roslyn/blob/872202543f4be3033d39d0737426b9663a444c05/src/Compilers/Core/Portable/InternalUtilities/StringTable.cs )
Here's another one that I worked on
https://github.com/Microsoft/msbuild/blob/xplat/src/Shared/OpportunisticIntern.cs
It has some hard coded strings it does a fast check for, then uses several linked lists stored in priority order, each limited in the number of entries, and dedicated to a different size range. It also has a bunch of diagnostic logging to help tune it. Seems over complex now, but we got there by iterating with real work loads.
Upshot is I think it would be difficult to devise an interning API that would be sufficiently configurable for folks who want to go beyond basically a layer over a hashset.
Another potential use case for the API proposed is a hashtable where each entry knows its own key (some object that has a name property for example). If you provide a comparer that compares only against the name, you can retrieve the object without the overhead of storing a separate key.
Oddly enough in that previous life I had to do that also
https://github.com/Microsoft/msbuild/blob/xplat/src/XMakeBuildEngine/Collections/RetrievableEntryHashSet/HashSet.cs
An interning API can grow in some very non-set-related ways; concurrency, weak references, matching stored strings with character array ranges, hash-consing factories, layered stores, generalising the store of string.Intern to use as a layer. Which is all possibly useful, and all quite possibly of no value to the original request, depending on just why they wanted hash sets to offer this.
Just adding TryGetValue could perhaps raise some philosophical objections on the grounds of what it means to be in a set (the difference between a.Union(b) and b.Union(a) etc. can become more noticeable), but fits in well with some current practical uses, is easy to add (the last time I needed something like this I copied the source of HashSet from here and added it, and it was straightforward to do). It's probably therefore worth doing. There's a lot less design work needed than considering the wider needs of an ideal interning API.
Worth doing, I think.
I've formalized this request.
@ianhays, @safern any thoughts?
I think adding TryGetValue would be useful and as you guys already mentioned there are some practical uses where this can be useful.
About the index getter, why should this be valuable if adding TryGetValue would already give users the tool to get the value in a safe way without throwing an exception if the key doesn't exist and knowing if the key did exist while trying to get the value without a need of a try - catch block. @TylerBrinkley
@safern I think you're probably right. The index getter is probably not that useful. I guess I was trying to mirror Dictionary and added it as an afterthought. I've removed it from the proposal.
I updated the rationale to capture the use cases discussed above.
Thanks for updating that @danmosemsft.
API proposal looks good for me, so I will go ahead and add the label api-ready-for-review so that the members in charge of approving APIs can review it. Thanks @TylerBrinkley for your proposal, the review process should take one or a couple of weeks.
Looks good as proposed, but we should also add the same API on SortedSet<T> for consistency:
namespace System.Collections.Generic {
public partial class HashSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
public partial class SortedSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
}
By the way, we already exposed these APIs on the immutable collections for the same reason. We should be consistent with the parameter names used there (equalValue, and actualValue).
I have updated the top-post with the new API shape (@terrajobst's note above):
namespace System.Collections.Generic {
public class HashSet<T> : ICollection<T>, IDeserializationCallback, IEnumerable, IEnumerable<T>, IReadOnlyCollection<T>, ISerializable, ISet<T> {
+ public bool TryGetValue(T key, out T actualValue);
}
}
namespace System.Collections.Generic {
public partial class HashSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
public partial class SortedSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
}
I'll take my stab at implementing this if that's alright.
I was just about to write it is straightforward and available :).
Assigned yo you @TylerBrinkley.
That is great @TylerBrinkley thanks for contributing!!
Thanks @karelz @terrajobst
Most helpful comment
Another potential use case for the API proposed is a hashtable where each entry knows its own key (some object that has a name property for example). If you provide a comparer that compares only against the name, you can retrieve the object without the overhead of storing a separate key.
Oddly enough in that previous life I had to do that also
https://github.com/Microsoft/msbuild/blob/xplat/src/XMakeBuildEngine/Collections/RetrievableEntryHashSet/HashSet.cs