Getting the First element of a SortedDictionary causes allocation of a stack that is log sized in the number of elements in the dictionary. This can be done without any allocation.
Currently, First is only an extension method for SortedDictionary. This means it builds an Enumerator, and then uses that to get the first element. The enumerator creates a stack to aid its traversal of the underlying RedBlack tree. This seems the correct thing for the enumerator to do, but in the context of getting the single first element it is very costly. There does not seem to be a sensible way to alter the enumerator to special case this use, without introducing conditionals and hence costs elsewhere.
Would it be possible to make First a method of SortedDictionary, rather than an extension method, then it would be able to call SortedSet.Min directly? I am not familiar with the repurcussions of making an extension method into an actual method. Is there a way for this to break existing code, or not?
I can submit a Pull Request if that is the right way forward.
@Priya91 @ianhays any suggestions?
@mjp41, can you share examples of where adding this API would be very beneficial?
I only have one example of where it would be useful for me, priority queues. There is a Network Simulator I have been looking at that has memory performance issues. It uses a SortedDictionary as a priority queue. The system needs to preserve the order things are added with the same key, so it uses
SortedDictionary<timestamp, List<Packets>>
It is possible to refactor the code to use SortedSet, by instantiating it in the same way SortedDictionary does. But that leaves future clients of SortedDictionary open to the unexpected performance.
I only have one example of where it would be useful for me, priority queues.
Wouldn't an actual priority queue type (https://github.com/dotnet/corefx/issues/574) be better for that?
Example: I believe the Office DRM licensing server uses an LRU cache that uses a SortedDictionary to track when an AD item expires. It had a thread that checks the first item in the dictionary every few seconds to see if something should be removed from the cache. If it has expired it removes it, then removes it from the main lookup dictionary and then continues to check the First time until there is nothing else expired. At least that's how it worked when it was first written...back in the day.
@svick, a proper priority queue would be good for some aspects of the simulator I was looking at. But for the bit that uses the SortedDictionary it actually requires removal of an arbitrary element as well, something to do with reordering network flows.
@mjp41 Can you come up with an api speclet, with the api shape and explaining the usage scenarios.
@mjp41 do you have interest in doing such formal API proposal to keep this moving?
Docs on the API review process and see Jon's API request for a great example of a strong proposal. The more concrete the proposal is (e.g. has examples of usage, real-world scenarios, fleshed out API), the more discussion it will start and the better the chances of us being able to push the addition through the review process.
I'm going to close this since it's been stale for a few months now and there hasn't been much interest in it.
Personally I don't see much value in implementing directly on SortedDictionary over just using the existing extension method. I don't think the use-case justifies the additional API and potential compat issues (though I'm not sure how extensive those would be).
Re-opening issue since we have a similar proposal recently and we would like to have a proposal spec with performance benchmarks. (#22591)
Min/Max extraction in O(log n) is the fundamental perf expectation for a typed backed by a BST.
The original case of Priority Queue extraction in O(log n) with extraction of arbitrary element in O(log n) is needed, for instance, in a common solution for "The Skylline Problem" in LeetCode :
https://leetcode.com/problems/the-skyline-problem
Where you can follow one common solution here:
https://www.youtube.com/watch?v=GSBLe8cKu0s&t=927s
The guy in the video uses Java TreeMap that, not surprisingly given it is backed by a tree, has what is missing here (firstKey and lastKey).
That particular problem cannot use a priority key due to an arbitrary extraction issue. In a priority key, we can remove an element in O(log n) but we need a pointer to the element (say an index for a priority queue backed by array), which can be invalidated by an arbitrary extraction of another element (say before it in the array). Removing an item by value in a priority key becomes O(n) which blows away the intended perf.
The only structure that really fits the problem is a BST with O(log n) Min, Remove and Add.
Additionally, take a look at this StackOverflow thread asking for the same thing:
https://stackoverflow.com/questions/1613004/get-last-element-in-a-sorteddictionary
Finally, @ianhays meantioned"potential compat issues". Can you be more specific regarding this particular request? Is the issue still valid if something is added to the type?
@mjp41 So what you really wanted was a priority queue right? What did you do in the end, roll your own?
Most helpful comment
Min/Max extraction in O(log n) is the fundamental perf expectation for a typed backed by a BST.
The original case of Priority Queue extraction in O(log n) with extraction of arbitrary element in O(log n) is needed, for instance, in a common solution for "The Skylline Problem" in LeetCode :
https://leetcode.com/problems/the-skyline-problem
Where you can follow one common solution here:
https://www.youtube.com/watch?v=GSBLe8cKu0s&t=927s
The guy in the video uses Java TreeMap that, not surprisingly given it is backed by a tree, has what is missing here (firstKey and lastKey).
That particular problem cannot use a priority key due to an arbitrary extraction issue. In a priority key, we can remove an element in O(log n) but we need a pointer to the element (say an index for a priority queue backed by array), which can be invalidated by an arbitrary extraction of another element (say before it in the array). Removing an item by value in a priority key becomes O(n) which blows away the intended perf.
The only structure that really fits the problem is a BST with O(log n) Min, Remove and Add.
Additionally, take a look at this StackOverflow thread asking for the same thing:
https://stackoverflow.com/questions/1613004/get-last-element-in-a-sorteddictionary
Finally, @ianhays meantioned"potential compat issues". Can you be more specific regarding this particular request? Is the issue still valid if something is added to the type?