See LATEST Proposal in corefxlab repo.
Proposal from https://github.com/dotnet/corefx/issues/574#issuecomment-307971397
Elements in priority queue are unique. If they are not, we would have to introduce 'handles' of items to enable their update/remove. Or the update/remove semantics would have to apply to first/all, which is weird.
Modeled after Queue<T>
(MSDN link)
```c#
public class PriorityQueue
: IEnumerable,
IEnumerable<(TElement element, TPriority priority)>,
IReadOnlyCollection<(TElement element, TPriority priority)>
// ICollection not included on purpose
{
public PriorityQueue();
public PriorityQueue(IComparer
public IComparer<TPriority> Comparer { get; }
public int Count { get; }
public bool IsEmpty { get; }
public bool Contains(TElement element);
// Peek & Dequeue
public (TElement element, TPriority priority) Peek(); // Throws if empty
public (TElement element, TPriority priority) Dequeue(); // Throws if empty
public bool TryPeek(out TElement element, out TPriority priority); // Returns false if empty
public bool TryDequeue(out TElement element, out TPriority priority); // Returns false if empty
// Enqueue & Update
public void Enqueue(TElement element, TPriority priority); // Throws if it is duplicate
public void Update(TElement element, TPriority priority); // Throws if element does not exist
public void EnqueueOrUpdate(TElement element, TPriority priority);
public bool TryEnqueue(TElement element, TPriority priority); // Returns false if it is duplicate (does NOT update it)
public bool TryUpdate(TElement element, TPriority priority); // Returns false if element does not exist (does NOT add it)
public void Remove(TElement element); // Throws if element does not exist
public bool TryRemove(TElement element); // Returns false if element does not exist
public void Clear();
public IEnumerator<(TElement element, TPriority priority)> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
//
// Selector part
//
public PriorityQueue(Func
public PriorityQueue(Func
public Func<TElement, TPriority> PrioritySelector { get; }
public void Enqueue(TElement element);
public void Update(TElement element);
}
````
Open questions:
PriorityQueue
vs. Heap
IHeap
and constructor overload? (Should we wait for later?)IPriorityQueue
? (Should we wait for later - IDictionary
example)(TElement element, TPriority priority)
vs. KeyValuePair<TPriority, TElement>
Peek
and Dequeue
rather have out
argument instead of tuple?Peek
and Dequeue
throwing useful at all?Issue https://github.com/dotnet/corefx/issues/163 requested the addition of a priority queue to the core .NET collection data structures.
This post, while a duplicate, is intended to act the formal submission to the corefx API Review Process. The issue contents are the _speclet_ for a new System.Collections.Generic.PriorityQueue
I will be contributing the PR, if approved.
The .NET Base Class Libraries (BCL) currently lacks support for ordered producer-consumer collections. A common requirement of many software applications is the ability generate a list of items over time and process them in an order different from the order they were received in.
There are three generic data structures within the System.Collections hierarchy of namespaces that supported a sorted collection of items; System.Collections.Generic.SortedList, System.Collections.Generic.SortedSet, and System.Collections.Generic.SortedDictionary.
Of these, SortedSet and SortedDictionary are not appropriate for producer-consumer patterns that generate duplicate values. The complexity of SortedList is Θ(n) worst case for both Add and Remove.
A much more memory and time efficient data structure for ordered collections with producer-consumer usage patterns is a priority queue. Other than when capacity resizing is necessary, worse case insertion (enqueue) and remove top (dequeue) performance is Θ(log n) - far better than the existing options that exist in the BCL.
Priority queues have a wide degree of applicability across different classes of applications. The Wikipedia page on Priority Queues offers a list of many different well understand use cases. While highly specialized implementations may still require custom priority queue implementations, a standard implementation would cover a broad range of usage scenarios. Priority queues are particularly useful in scheduling the output of multiple producers, which is an important pattern in highly parallelized software.
It's worth noting that both the C++ standard library and Java offer priority queue functionality as part of their basic APIs.
``` C#
namespace System.Collections.Generic
{
///
/// Represents a collection of objects that are removed in a sorted order.
///
///
[DebuggerDisplay("Count = {count}")]
[DebuggerTypeProxy(typeof(System_PriorityQueueDebugView<>))]
public class PriorityQueue
{
///
/// Initializes a new instance of the
/// that uses a default comparer.
///
public PriorityQueue();
/// <summary>
/// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class
/// that has the specified initial capacity.
/// </summary>
/// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
/// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
public PriorityQueue(int capacity);
/// <summary>
/// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class
/// that uses a specified comparer.
/// </summary>
/// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
/// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
public PriorityQueue(IComparer<T> comparer);
/// <summary>
/// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class
/// that contains elements copied from the specified collection and uses a default comparer.
/// </summary>
/// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
/// <exception cref="T:System.ArgumentNullException"><paramref name="collection"/> is null.</exception>
public PriorityQueue(IEnumerable<T> collection);
/// <summary>
/// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class
/// that contains elements copied from the specified collection and uses a specified comparer.
/// </summary>
/// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
/// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
/// <exception cref="T:System.ArgumentNullException">
/// <paramref name="collection"/> is null. -or-
/// <paramref name="comparer"/> is null.
/// </exception>
public PriorityQueue(IEnumerable<T> collection, IComparer<T> comparer);
/// <summary>
/// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class that is empty,
/// has the specified initial capacity, and uses a specified comparer.
/// </summary>
/// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
/// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
/// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
/// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
public PriorityQueue(int capacity, IComparer<T> comparer);
/// <summary>
/// Gets the <see cref="IComparer{T}"/> for the <see cref="PriorityQueue{T}"/>.
/// </summary>
/// <value>
/// The <see cref="T:System.Collections.Generic.IComparer{T}"/> that is used when
/// comparing elements in the <see cref="PriorityQueue{T}"/>.
/// </value>
public IComparer<T> Comparer
{
get;
}
/// <summary>
/// Gets the number of elements contained in the <see cref="PriorityQueue{T}"/>.
/// </summary>
/// <value>The number of elements contained in the <see cref="PriorityQueue{T}"/>.</value>
public int Count
{
get;
}
/// <summary>
/// Adds an object to the into the <see cref="PriorityQueue{T}"/> by its priority.
/// </summary>
/// <param name="item">
/// The object to add to the <see cref="PriorityQueue{T}"/>.
/// The value can be null for reference types.
/// </param>
public void Enqueue(T item);
/// <summary>
/// Removes and returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
/// </summary>
/// <returns>The object with the lowest priority that is removed from the <see cref="PriorityQueue{T}"/>.</returns>
/// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
public T Dequeue();
/// <summary>
/// Returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
/// </summary>
/// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
public T Peek();
/// <summary>
/// Removes all elements from the <see cref="PriorityQueue{T}"/>.
/// </summary>
public void Clear();
/// <summary>
/// Determines whether an element is in the <see cref="PriorityQueue{T}"/>.
/// </summary>
/// <param name="item">
/// The object to add to the end of the <see cref="PriorityQueue{T}"/>.
/// The value can be null for reference types.
/// </param>
/// <returns>
/// true if item is found in the <see cref="PriorityQueue{T}"/>; otherwise, false.
/// </returns>
public bool Contains(T item);
/// <summary>
/// Copies the elements of the <see cref="PriorityQueue{T}"/> to an <see cref="T:System.Array"/>,
/// starting at a particular <see cref="T:System.Array"/> index.
/// </summary>
/// <param name="array">
/// The one-dimensional <see cref="T:System.Array">Array</see> that is the
/// destination of the elements copied from the <see cref="PriorityQueue{T}"/>.
/// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
/// </param>
/// <param name="arrayIndex">The zero-based index in <paramref name="array"/> at which copying begins.</param>
/// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
/// <exception cref="T:System.ArgumentOutOfRangeException">
/// <paramref name="arrayIndex"/> is less than zero. -or-
/// <paramref name="arrayIndex"/> is equal to or greater than the length of the <paramref name="array"/>
/// </exception>
/// <exception cref="ArgumentException">
/// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
/// greater than the available space from <paramref name="index"/> to the end of the destination
/// <paramref name="array"/>.
/// </exception>
public void CopyTo(T[] array, int arrayIndex);
/// <summary>
/// Copies the elements of the <see cref="T:System.Collections.ICollection"/> to an
/// <see cref="T:System.Array"/>, starting at a particular <see cref="T:System.Array"/> index.
/// </summary>
/// <param name="array">
/// The one-dimensional <see cref="T:System.Array">Array</see> that is the
/// destination of the elements copied from the <see cref="PriorityQueue{T}"/>.
/// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
/// </param>
/// <param name="index">The zero-based index in <paramref name="array"/> at which copying begins.</param>
/// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
/// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="index"/> is less than zero.</exception>
/// <exception cref="ArgumentException">
/// <paramref name="array"/> is multidimensional. -or-
/// <paramref name="array"/> does not have zero-based indexing. -or-
/// <paramref name="index"/> is equal to or greater than the length of the <paramref name="array"/> -or-
/// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
/// greater than the available space from <paramref name="index"/> to the end of the destination
/// <paramref name="array"/>. -or-
/// The type of the source <see cref="T:System.Collections.ICollection"/> cannot be cast automatically
/// to the type of the destination <paramref name="array"/>.
/// </exception>
void ICollection.CopyTo(Array array, int index);
/// <summary>
/// Copies the elements stored in the <see cref="PriorityQueue{T}"/> to a new array.
/// </summary>
/// <returns>
/// A new array containing a snapshot of elements copied from the <see cref="PriorityQueue{T}"/>.
/// </returns>
public T[] ToArray();
/// <summary>
/// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
/// </summary>
/// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
public Enumerator GetEnumerator();
/// <summary>
/// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
/// </summary>
/// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
IEnumerator<T> IEnumerable<T>.GetEnumerator();
/// <summary>
/// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>.
/// </summary>
/// <returns>An <see cref="T:System.Collections.IEnumerator"/> that can be used to iterate through the collection.</returns>
IEnumerator IEnumerable.GetEnumerator();
/// <summary>
/// Sets the capacity to the actual number of elements in the <see cref="PriorityQueue{T}"/>,
/// if that number is less than than a threshold value.
/// </summary>
public void TrimExcess();
/// <summary>
/// Gets a value that indicates whether access to the <see cref="ICollection"/> is
/// synchronized with the SyncRoot.
/// </summary>
/// <value>true if access to the <see cref="T:System.Collections.ICollection"/> is synchronized
/// with the SyncRoot; otherwise, false. For <see cref="PriorityQueue{T}"/>, this property always
/// returns false.</value>
bool ICollection.IsSynchronized
{
get;
}
/// <summary>
/// Gets an object that can be used to synchronize access to the
/// <see cref="T:System.Collections.ICollection"/>.
/// </summary>
/// <value>
/// An object that can be used to synchronize access to the
/// <see cref="T:System.Collections.ICollection"/>.
/// </value>
object ICollection.SyncRoot
{
get;
}
public struct Enumerator : IEnumerator<T>
{
public T Current { get; }
object IEnumerator.Current { get; }
public bool MoveNext();
public void Reset();
public void Dispose();
}
}
}
```
| Operation | Complexity | Notes |
| --- | --- | --- |
| Construct | Θ(1) | |
| Construct Using IEnumerable | Θ(n) | |
| Enqueue | Θ(log n) | |
| Dequeue | Θ(log n) | |
| Peek | Θ(1) | |
| Count | Θ(1) | |
| Clear | Θ(N) | |
| Contains | Θ(N) | |
| CopyTo | Θ(N) | Uses Array.Copy, actual complexity may be lower |
| ToArray | Θ(N) | Uses Array.Copy, actual complexity may be lower |
| GetEnumerator | Θ(1) | |
| Enumerator.MoveNext | Θ(1) | |
Should the collection be 'stable'? In other words, should two items with equal IComparison
Fixed complexity of 'Construct Using IEnumerable' to Θ(n). Thanks @svick.
| Operation | Complexity |
| --- | --- |
| Construct Using IEnumerable | Θ(log n) |
I think this should be Θ(n). You need to at least iterate the input.
+1
Rx has a highly production-tested priority queue class:
Should a nested public enumerator structure be used to avoid an additional heap allocation during calls to GetEnumerator and when using foreach? My assumption is no, since enumerating over a queue is an uncommon operation.
I'd lean towards using the struct enumerator to be consistent with Queue<T>
which uses a struct enumerator. Also, if a struct enumerator isn't used now, we won't be able to change PriorityQueue<T>
to use one in the future.
Also perhaps a method for batch inserts? Can always then sort and continue from previous insertion point rather than starting at the beginning if that would help?:
public void Enqueue(List<T> items) {
items.Sort(_comparer);
... insertions ...
}
I've tossed a copy of the initial implementation here. Test coverage is far from complete, but if anyone is curious please take a look and let me know what you think. I've tried to follow the existing coding conventions from the System.Collections classes as much as possible.
Cool. Some initial feedback:
Queue<T>.Enumerator
implements IEnumerator.Reset
explicitly. Should PriorityQueue<T>.Enumerator
do the same?Queue<T>.Enumerator
uses _index == -2
to indicate the enumerator has been disposed. PriorityQueue<T>.Enumerator
has the same comment but has an extra _disposed
field. Consider getting rid of the extra _disposed
field and use _index == -2
to indicate it's been disposed to make the struct smaller and to be consistent with Queue<T>
_emptyArray
field can be removed and usage replaced with Array.Empty<T>()
instead.Also...
Dictionary<TKey, TValue>
, HashSet<T>
, SortedList<TKey, TValue>
, SortedDictionary<TKey, TValue>
, SortedSet<T>
, etc.) allow null to be passed in for the comparer, in which case Comparer<T>.Default
is used.Also...
ToArray
can be optimized by checking for _size == 0
before allocating the new array, in which case just return Array.Empty<T>()
.@justinvp Great feedback, thanks!
@ebickle
Array.Empty is a F# feature, so can't use it here sadly!
Not anymore: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Array.cs#L1060-L1069
I've migrated the code over to ebickle/corefx in the issue-574 branch.
Implemented the Array.Empty
Will be fixed once issue dotnet/corefx#966 is complete.
One key area I'm looking for feedback is how ToArray, CopyTo, and the enumerator should be handled. Currently they're optimized for performance, which means the target array is a heap and not sorted by the comparer.
There are three options:
1) Leave it as-is, and document that the returned array is not sorted. (it's a priority queue, not a sorted queue)
2) Modify the methods to sort the returned items/arrays. They will no longer be O(n) operations.
3) Remove the methods and enumerable support altogether. This is the "purist" option but removes the ability to quickly grab the remaining queued items when the priority queue is no longer needed.
Another thing I'd like feedback on is whether the queue should be stable for two items with the same priority (compare result of 0). Generally priority queues make no guarantees that two items with the same priority will be dequeued in the order they were enqueued, but I noticed that internal priority queue implementation used in System.Reactive.Core took some additional overhead to ensure this property. My preference would be to not do this, but I'm not completely sure which option is better in terms developers' expectations.
I happened upon this PR because I too was interested in adding a priority queue to .NET. Glad to see someone went to the effort of making this proposal :). After reviewing the code, I noticed the following:
IComparer
ordering is not consistent with Equals
, the behavior of this Contains
implementation (which uses the IComparer
) may be surprising to some users, as it is essentially _contains an item with equal priority_.Dequeue
. Shrinking the heap array by half when quarter full is typical.Enqueue
method accept null
arguments?Clear
should be Θ(n), since that is the complexity of System.Array.Clear
, which it uses. https://msdn.microsoft.com/en-us/library/system.array.clear%28v=vs.110%29.aspxI didn't see code to shrink the array in
Dequeue
. Shrinking the heap array by half when quarter full is typical.
Queue<T>
and Stack<T>
also don't shrink their arrays (based on Reference Source, they're not in CoreFX yet).
@mbeidler I'd considered adding some form of automatic array shrinking in dequeue, but as @svick pointed out it doesn't exist in the reference implementations of similar data structures. I'd be curious to hear from anyone on the .NET Core/BCL team whether there's any particular reason they chose that style of implementation.
Update: I checked List
Enqueue should support nulls, and is documented as allowing them. Did you run into a bug? I can't remember how robust the unit tests area in the area yet.
Interesting, I noticed in the reference source linked by @svick that Queue<T>
has an unused private constant named _ShrinkThreshold
. Perhaps that behavior existed in a previous version.
Concerning the use of IComparer
instead of Equals
in the implementation of Contains
, I wrote up the following unit test, which would fail currently: https://gist.github.com/mbeidler/9e9f566ba7356302c57e
@mbeidler Good point. According to MSDN, IComparer
However, it looks as though the same issue exists in the other collection classes. If I modify the code to operate on SortedList
Arguably it's a bug in the existing collection classes as well. I'll dug through the rest of the collections classes and see if there are any other data structures that accept an IComparer but test for equality separately. You're right though, for a priority queue you would expect custom ordering behavior that's completely independent from equality.
Committed a fix and the test case into my fork branch. The new implementation of Contains is based directly on the behavior of List
When I get some time later today, I'll likely submit bug reports for the other built-in collections. The probably can't be fixed due to regression behavior, but at very least the documentation needs to be updated.
I think it makes sense for ContainsKey
to use the IComparer<TKey>
implementation, since that is what specifies the key. However, I think it would be more logical for ContainsValue
to use Equals
instead of IComparable<TValue>
in its linear search; though in this case the scope is significantly reduced since a type's natural ordering is much less likely to be inconsistent with equals.
It appears that in the MSDN documentation for SortedList<TKey, TValue>
, the remarks section for ContainsValue
does indicate that the TValue's sort ordering is used in lieu of equality.
@terrajobst How do you feel about the API proposal so far? Do you feel this a good fit for CoreFX?
:+1:
Thanks for filing this. I believe we've enough data to make a formal review of this proposal, hence I labelled it as 'ready for api review'
As Dequeue
and Peek
are throwing methods the caller needs to check count before each call. Would it make sense to instead (or in addition) provide TryDequeue
and TryPeek
following the pattern of the concurrent collections? There are issues open to add non-throwing methods to existing generic collections so adding a new collection which doesn't have these methods seems counterproductive.
@andrewgmorris related https://github.com/dotnet/corefx/issues/4316 "Add TryDequeue to Queue"
We had a basic review of this and we agree that we want a ProrityQueue in the framework. However, we need to get someone to help drive the design and implementation for it. Whoever grabs the issue can work @terrajobst on finalizing the API.
So what work is left on the API?
This is missing from the API proposal above: PriorityQueue<T>
should implement IReadOnlyCollection<T>
to match Queue<T>
(Queue<T>
now implements IReadOnlyCollection<T>
as of .NET 4.6).
I don't know that array-based priority queues are best. Memory allocation in .NET is really fast. We don't have the same search-small-blocks issue that the old malloc dealt with. You are welcome to use my priority queue code from here: https://github.com/BrannonKing/Kts.Astar/tree/master/Kts.AStar
@ebickle One tiny nit on the _speclet_. It says:
/// Adds an object to the end of the <see cref="PriorityQueue{T}"/>.
...
public void Enqueue(T item);
Shouldn't it say instead, /// Inserts object into the <see cref="PriorityQueue{T}"> by its priority.
@SunnyWar Fixed the documentation of the Enqueue method, thank you!
Some time ago, I created a data structure with complexities similar to a priority queue based on a Skip List data structure which I've decided at this point to share: https://gist.github.com/bbarry/5e0f3cc1ac7f7521fe6ea25947f48ace
https://en.wikipedia.org/wiki/Skip_list
The skip list matches the complexities of a priority queue above in average cases, except that Contains is an average case O(log(n))
. In addition, access to either first or last elements are constant time operations and iteration in both forward and reverse order matches the forward order complexities of a PQ.
There obviously are downsides to such a structure in the form of higher costs on memory, and it does devolve to O(n)
worst case inserts and removals, so it has its trade offs...
Is this already implemented somewhere? When is the expected release?
Also whats about updating the priority of an existing item?
@Priya91 @ianhays is this ready to get marked ready for review?
This is missing from the API proposal above: PriorityQueue
should implement IReadOnlyCollection to match Queue (Queue now implements IReadOnlyCollection as of .NET 4.6).
I agree with @justinvp here.
@Priya91 @ianhays is this ready to get marked ready for review?
I'd say so. This has been sitting for a while; lets make moves on it.
@justinvp @ianhays I've updated the spec to implement IReadOnlyCollection
I have a full implementation of the class and it's associated PriorityQueueDebugView that uses an array-based implementation. Unit tests aren't at 100% coverage yet, but if there's some interest I can do a bit of work and dust off my fork.
@NKnusperer made a good point about updating the priority of an existing item. I'll leave it out for now but it's something to consider during spec review.
There are 2 implementations in full framework, that are possible alternatives.
https://referencesource.microsoft.com/#q=priorityqueue
For reference here is a question about the Java PriorityQueue on stackoverflow http://stackoverflow.com/questions/683041/java-how-do-i-use-a-priorityqueue. It is interesting that the priority is handled by a comparer instead of just a simple int priority wrapper object. For example it doesn't make it easy to change the priority of an item in the queue already.
API review:
We agree that it is useful to have this type in CoreFX, because we expect CoreFX to use it.
For final review of the API shape, we would like to see sample code: PriorityQueue<Thread>
and PriorityQueue<MyClass>
.
T
.Notes:
Remove
and Add
on the queue.This would be a really useful type to have in CoreFX. Is anyone interested in grabbing this one?
I don't like the idea of fixing the priority queue to a binary heap. Have a look at my AlgoKit wiki page for details. Quick thoughts:
Last year I implemented a few heap types. In particular, there is IHeap
interface that could be used as the priority queue.
I'm all for introducing an IHeap
interface and some most performant implementations (at least those array based). The API and implementations are in the repository I linked above.
So no _priority queues_. Heaps.
What do you think?
@karelz @safern @danmosemsft
@pgolebiowski Don't forget we are designing easy-to-use-while-pretty-performant API. If you want PriorityQueue
(which is established computer-science term), you should be able to find one in docs / via internet search.
If the underlying implementation is heap (which I personally wonder why), or something else, it doesn't matter too much. Assuming you can demonstrate the implementation is measurably better than simpler alternatives (complexity of code is also a metric that somewhat matters).
So if you still believe heap-based implementation (without IHeap
API) is better than some simple list-based or list-of-array-chunks-based implementation, please explain why (ideally in a few sentences/paragraphs), so that we can get agreement on the implementation approach (and avoid wasting time on your side on complex implementation which may be turned down at PR review time).
Drop ICollection
, IEnumerable
? Just have the generic versions (though generic IEnumerable<T>
will bring in IEnumerable
)
@pgolebiowski how its implemented doesn't change the external api. PriorityQueue
defines the behaviour/contract; whereas a Heap
is a specific implementation.
If the underlying implementation is heap (which I personally wonder why), or something else, it doesn't matter too much. Assuming you can demonstrate the implementation is measurably better than simpler alternatives (complexity of code is also a metric that somewhat matters).
So if you still believe heap-based implementation is better than some simple list-based or list-of-array-chunks-based implementation, please explain why (ideally in a few sentences/paragraphs), so that we can get agreement on the implementation approach [...]
OK. A priority queue is an abstract data structure that can be then implemented in some way. Of course you can implement it with a data structure different than a heap. But no way is more efficient. As a result:
To back my words let's start with the theoretical support. Introduction to Algorithms, Cormen:
[…] priority queues come in two forms: max-priority queues and min-priority queues. We will focus here on how to implement max-priority queues, which are in turn based on max- heaps.
Stated clearly that priority queues are heaps. This is a shortcut of course, but you get the idea. Now, more importantly, let's have a look at how other languages and their standard libraries add support for the operations we are discussing:
PriorityQueue<T>
— priority queue implemented with a heap.BinaryHeap
— heap explicitly in the API. It says in the docs that it's a priority queue implemented with a binary heap. But the API is very clear on the structure — a heap.CFBinaryHeap
— again, explicitly says what the data structure is, avoiding the usage of the abstract term "priority queue". The docs describing the class: Binary heaps can be useful as priority queues. I like the approach.priority_queue
— once again, implemented canonically with a binary heap built on the top of an array.heapq
— heap is explicitly exposed in the API. The priority queue is only mentioned in the docs: This module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. Heaps are binary trees. Also notice the huge shortcut to say that a heap equals a priority queue and that a heap equals a binary tree.heap package
— there is even a heap interface. No explicit priority queue, but again only in docs: A heap is a common way to implement a priority queue. To build a priority queue, implement the Heap interface.I strongly believe we should go the Rust / Swift / Python / Go way and expose a heap explicitly while clearly stating in the docs that it can be used as a priority queue. I strongly believe this approach is very clean and simple.
I hope you all like the approach of just clearly exposing a heap data structure — and making a priority queue reference in the docs only.
We need to agree upon the matter above, but let me just start another topic — how to implement this. I can see two solutions here:
ArrayHeap
class. Seeing the name you can immediately tell what kind of heap you're dealing with (again, there are dozens of heap types). You see it is array-based. You immediately know the beast you're dealing with.IHeap
interface and providing one or more implementations. The customer could write code that depends on interfaces — this would allow providing really clear and readable implementations of complex algorithms. Imagine writing a DijkstraAlgorithm
class. It could simply depend on the IHeap
interface (one parametrized constructor) or just use the ArrayHeap
(default constructor). Clean, simple, explicit, no ambiguity involved due to using a "priority queue" term. And a wonderful interface that makes very much sense theoretically.In the approach above, the ArrayHeap
represents an implicit heap-ordered complete d-ary tree, stored as an array. This can be used to create for example a BinaryHeap
or a QuaternaryHeap
.
For further discussion I strongly encourage you to have a look at this paper. You will know that:
4-ary heaps (quaternary) are simply faster than 2-ary heaps (binary). There are many tests done in the paper — you would be interested in comparing the performance of implicit_4
and implicit_2
(sometimes implicit_simple_4
and implicit_simple_2
).
The optimal choice of implementation is strongly input-dependent. Out of implicit d-ary heaps, pairing heaps, Fibonacci heaps, binomial heaps, explicit d-ary heaps, rank-pairing heaps, quake heaps, violation heaps, rank-relaxed weak heaps, and strict Fibonacci heaps, the following types seem to cover almost all needs for various scenarios:
IHeap
approach, it would be worth adding this, as in many cases it is really fast and faster than the array-based solution.For both of them the programming effort is surprisingly low. Have a look at my implementations.
@karelz @benaadams
This would be a really useful type to have in CoreFX. Is anyone interested in grabbing this one?
@safern I'd be very happy to grab this one from now on.
OK, it was a problem between my keyboard and my chair - PriorityQueue
is of course based on Heap
-- I was thinking about just Queue
where it doesn't make sense and forgot that heap is 'sorted' -- very embarrassing thinking-process-outage for someone like me, who loves logic, algorithms, Turing machines, etc., my apologies. (BTW: As soon as I read couple of sentences in your Java docs link, the discrepancy immediately clicked)
From that perspective it makes sense to build the API on top of Heap
. We should not make that class public yet though - it will require its own API review and its own discussion if it is something we need in CoreFX. We do not want API surface creep due to implementation, but it may be the right thing to do -- hence the discussion needed.
From that perspective I don't think we need to create IHeap
just yet. It may be good decision later.
If there is research that specific heap (e.g. 4-ary as you mention above) is best for general random input, then we should choose that. Let's wait for @safern @ianhays @stephentoub to confirm/voice opinion.
The parametrization of underlying heap with multiple implemented options is something which IMO does not belong into CoreFX (I may be wrong here, again - let's see what others think).
My reason is that IMO we would just soon ship gazillions of specialized collections which would be very hard for people (average developer without strong background in nuances of algorithms) to choose from. Such library, however, would make a great NuGet package, for experts in the field - owned by you/community. In future, we may consider adding it into PowerCollections (we are actively discussing for last 4 months where on GitHub to put this library and if we should own it, or if we should encourage community to own it - there are different opinions on the matter, I expect we will finalize its fate post 2.0)
Assigning to you as you want to work on it ...
@pgolebiowski collaborator invite sent - when you accept ping me, I will be able to assign it to you then (GitHub limitations).
@benaadams I would keep ICollection
(mild preference). For consistency with other d.s. in CoreFX. IMO it is not worth to have one odd beast here ... if we were adding a handful new ones (e.g. PowerCollections even to another repo), we should not include the non-generic ones ... thoughts?
OK, it was a problem between my keyboard and my chair.
Haha 😄 No worries.
it makes sense to build the API on top of Heap. We should not make that class public yet though [...] We do not want API surface creep due to implementation, but it may be the right thing to do -- hence the discussion needed. [...] I don't think we need to create IHeap just yet. It may be good decision later.
If the decision of the group is to go with PriorityQueue
, I'll just help with the design and implement this. However, please take into consideration the fact that if we add a PriorityQueue
now, it will be messy in the API later to add Heap
-- as both behave basically the same. It would be sort of redundancy IMO. This would be a design smell for me. I wouldn't add the priority queue. It doesn't help.
Also, one more thought. Actually, the pairing heap data structure could come in handy fairly often. Array-based heaps are horrible at merging them. This operation is basically linear. When you're having lots of merging heaps, you're killing the performance. However, if you use a pairing heap instead of an array heap -- the merging operation is constant (amortized). This is another argument why I would like to provide a nice interface and two implementations. One for general input, second for some specific scenarios, especially when merging of heaps is involved.
Voting for IHeap
+ ArrayHeap
+ PairingHeap
! 😄 (like in Rust / Swift / Python / Go)
If the pairing heap is too much -- OK. But let's at least go with IHeap
+ ArrayHeap
. Don't you guys feel that going with a PriorityQueue
class is locking the possibilities in the future and making the API less clear?
But as I said -- if you all vote for a PriorityQueue
class over the proposed solution -- OK.
collaborator invite sent - when you accept ping me, I will be able to assign it to you then (GitHub limitations).
@karelz ping :)
please take into consideration the fact that if we add a PriorityQueue now, it will be messy in the API later to add Heap -- as both behave basically the same. It would be sort of redundancy IMO. This would be a design smell for me. I wouldn't add the priority queue. It doesn't help.
Can you explain more details about why it will be messy later? What are your concerns?
PriorityQueue
is concept people use. Having a type named that way is useful, right?
I think the logical operations (at least their names) on Heap
might be different. If they are the same, we can have 2 different implementations of the same code in the worst case (not ideal, but not the end of the world). Or we can insert Heap
class as parent of PriorityQueue
, right? (assuming it is allowed from API review perspective - right now I don't see reason for not, but I don't have that many years of experience with API reviews, so will wait on others to confirm)
Let's see how the voting & further design discussion go ... I am slowly warming up to the idea of IHeap
+ ArrayHeap
, but not fully convinced yet ...
if we were adding a handful new ones ... we should not include the non-generic ones
Red rag to a bull. Anyone got some other collections to add so we can drop ICollection
?
Circular/Ring buffer; Generic and Concurrent?
@karelz A solution for the naming issue could be something like IPriorityQueue
like DataFlow does for produce/consumer patterns. Many ways to implement a priority queue and if you don't care about that use the interface. Care about the implementation or are creating an instance use implementing class.
Can you explain more details about why it will be messy later? What are your concerns?
PriorityQueue
is concept people use. Having a type named that way is useful, right? […] I am slowly warming up to the idea ofIHeap
+ArrayHeap
, but not fully convinced yet ...
@karelz From my experience, I find it really important to have an abstraction (IPriorityQueue
or IHeap
). Thanks to such an approach a developer can write code that is decoupled. Because it is written against an interface (rather than a specific implementation), there is more flexibility and IoC spirit. It is very easy to write unit tests for such a code (having Dependency Injection one can inject their own mocked IPriorityQueue
or IHeap
and see what methods are called at what time and with what arguments). Abstractions are good.
It is true that the term "priority queue" is used commonly. The problem is that there is just one way to implement a priority queue effectively — with a heap. Lots of types of heaps. So we could have:
class ArrayHeap : IPriorityQueue {}
class PairingHeap : IPriorityQueue {}
class FibonacciHeap : IPriorityQueue {}
class BinomialHeap : IPriorityQueue {}
or
class ArrayHeap : IHeap {}
class PairingHeap : IHeap {}
class FibonacciHeap : IHeap {}
class BinomialHeap : IHeap {}
For me, the second approach looks better. How do you feel about it?
Regarding the PriorityQueue
class — I think it would be too vague and am totally against such a class. A vague interface is good, but not an implementation. If it's a binary heap, just call it BinaryHeap
. If it's something else, name it accordingly.
I am all for naming classes clearly because of the problems with classes like SortedDictionary
. There is a lot of confusion among developers for what it stands for and how it differs from SortedList
. If it was just called BinarySearchTree
, life would be simpler and we could go home and see our children earlier.
Let's name a heap a heap.
@benaadams each of them has to make it into CoreFX (i.e. it has to be valuable for CoreFX code itself, or it has to be as basic and widely used as the ones we already have) -- we discussed those things lately quite a lot. Still not 100% consensus, but no one is eager to just add more collections into CoreFX.
The most likely outcome (biased claim, because I like the solution) is that we will create another repo and prime it with PowerCollections and then let community extend it with some basic guidance/oversight from our team.
BTW: @terrajobst thinks it is not worth it ("we have better and more impactful things in the ecosystem to do") and we should encourage community to fully drive it (incl. start with existing PowerCollections) and not make it one of our repos - some though discussions & decisions in front of us.
// I guess there's opportunity for community to jump on this solution before we make our mind ;-). That would make the discussion (and my preference) mute ;-)
@pgolebiowski you are slowly convincing me that having Heap
is better than PriorityQueue
-- we would just need strong guidance & docs "This is how you do PriorityQueue
- use Heap
" ... that could work.
However, I am very hesitant to include more than 1 heap implementation into CoreFX. 98%+ of 'normal' C# developers out there do not care. They don't even want to think which one is best, they just need something that gets the job done. Not every piece of every SW is designed with high-performance in mind, rightfully so. Think about all the one off tools, UI apps, etc. Unless you are designing high-performance scaling-out system where this d.s. is on the critical path, you should NOT care.
The same way, I don't care how SortedDictionary
or ArrayList
or other d.s. are implemented -- they do their job decently. I (as many others) understand that if I need high-perf out of these d.s for my scenarios, I need to measure performance and/or check implementation and have to decide if it is good enough for my scenarios, or if I need to roll out my own special implementation to get the best perf, tuned for my needs.
We should optimize usability for 98% use cases, not for the 2%. If we roll out too many options (implementations) and force everyone to decide, we just caused unnecessary confusion for 98% use cases. I don't think it's worth it ...
IMO .NET ecosystem has great value in offering single-ish choice of many APIs (not just collections) with very decent performance characteristics, useful to most use cases. And offering ecosystem which allows high-performance extensions for those who need it and are willing to dig deeper and learn more / make educated choices and tradeoffs.
That said, having an interface IHeap
(like IDictionary
and IReadOnlyDictionary
), may make sense - I have to think about it a bit more / ask the API review experts in the space ...
We already (to some degree) have what @pgolebiowski is talking about with ISet<T>
and HashSet<T>
. I say just mirror it. So the API above is changed to an interface (IPriorityQueue<T>
) and then we have an implementation (HeapPriorityQueue<T>
) which internally uses a heap that may or may not be publicly exposed as it's own class.
Should it (PriorityQueue<T>
) also implement IList<T>
?
@karelz my issue with ICollection
is SyncRoot
and IsSynchronized
; either they are implemented which means there is an extra allocation for the lock object; or they throw, when its a bit pointless having them.
@benaadams This would be misleading. Since 99.99% of implementations of priority queues are heaps based on arrays (and as I can see we are going with one here as well), would it mean exposing access to the internal structure of the array?
Let's say we have a heap with elements 4, 8, 10, 13, 30, 45. Considering the order, they would be accessed by indexes 0, 1, 2, 3, 4, 5. However, the internal structure of the heap is [4, 8, 30, 10, 13, 45] (in binary, in quaternary it would be different).
i
doesn't really make sense from the user's point of view, as it is almost arbitrary.i
elements, getting the i
-th element, and then pushing them once again.IList<T>
is normally the cheeky workaround for: I want be flexible with collections my api accepts, and I want to Enumerate them but don't want to allocate via IEnumerable<T>
Just realized there is no generic interface for
ICollection<T>
{
int Count
CopyTo(T[], int index)
}
So never mind about that 😢 (Though its sort of IReadOnlyCollection)
But Reset on Enumerator should be explicitly implemented because its bad and should just throw.
So my suggested changes
public bool TryDequeue(out T item); // Add
public bool TryPeek(out T item); // Add
public struct Enumerator : IEnumerator<T>
{
public T Current { get; }
object IEnumerator.Current { get; }
public bool MoveNext();
void IEnumerator.Reset(); // Explicit
public void Dispose();
}
Since you're discussing the methods... We haven't yet agreed upon the IHeap
vs IPriorityQueue
thingy -- it impacts the names of the methods and the logic a bit. However, either way, I find the current API proposal missing the following:
These operations are pretty crucial, especially the possibility of updating an element. Without this, many algorithms simply cannot be implemented. We need to introduce a handle type. In the API here, the handle is IHeapNode
. Which is another argument for going the IHeap
way because otherwise we would have to introduce PriorityQueueHandle
type that would always just be a heap node... 😜 Plus it's just vague what it means... Whereas, a heap node -- everyone knows the stuff and can imagine the thingy they're dealing with.
Actually, shortly speaking, for the API proposal, please have a look at this directory. We would probably need only a subset of it. But nevertheless -- it just contains what we need IMO, so it might be worth having a look at it as a starting point.
What are your thoughts, guys?
IHeapNode
isn't much different than the clr type KeyValuePair?
However that's then separating the priority and type so now its a PriorityQueue<TKey, TValue>
with a IComparer<TKey> comparer
?
KeyValuePair is not only a struct but its properties are read-only. It would basically equal creating a new object every time the structure is updated.
Using a key only doesn't work with equal keys -- more information is needed to know which element to update / remove.
From IHeapNode
and ArrayHeapNode
:
/// <summary>
/// Represents a heap node. It is a wrapper around a key and a value.
/// </summary>
public interface IHeapNode<out TKey, out TValue>
{
/// <summary>
/// Gets the key for the value.
/// </summary>
TKey Key { get; }
/// <summary>
/// Gets the value contained in the node.
/// </summary>
TValue Value { get; }
}
/// <summary>
/// Represents a node of an <see cref="ArrayHeap{TKey,TValue}"/>.
/// </summary>
public class ArrayHeapNode<TKey, TValue> : IHeapNode<TKey, TValue>
{
/// <inheritdoc cref="IHeapNode{TKey,TValue}.Key"/>
public TKey Key { get; internal set; }
/// <inheritdoc cref="IHeapNode{TKey,TValue}.Value"/>
public TValue Value { get; }
/// <summary>
/// The index of the node within an <see cref="ArrayHeap{TKey,TValue}"/>.
/// </summary>
public int Index { get; internal set; }
/// <summary>
/// Initializes a new instance of the <see cref="ArrayHeapNode{TKey,TValue}"/> class,
/// containing the specified value.
/// </summary>
public ArrayHeapNode(TKey key, TValue value, int index)
{
this.Key = key;
this.Value = value;
this.Index = index;
}
}
And the update method within an ArrayHeap
:
/// <inheritdoc cref="IHeap{TKey,TValue}.Update"/>
public override void Update(ArrayHeapNode<TKey, TValue> node, TKey key)
{
if (node == null)
throw new ArgumentNullException(nameof(node));
var relation = this.Comparer.Compare(key, node.Key);
node.Key = key;
if (relation < 0)
this.MoveUp(node);
else
this.MoveDown(node);
}
But now every element in the Heap is an additional object; what if I want a PriorityQueue behavior but I don't want these extra allocations?
Then you would not be able to update / remove elements. It would be a bare implementation not allowing you to implement any algorithm dependent on DecreaseKey
operation (which is hugely common). For example Dijkstra's algorithm, Prim's algorithm. Or if you're writing some sort of a scheduler and some process (or whatever) has changed its priority, you cannot address that.
Also, in every other heap implementation -- elements are just nodes, explicitly. It's perfectly natural in other cases, here it's a bit artificial, but necessary for updating / removing.
In Java the priority queue does not have the option to update elements. Result:
When I implemented heaps for the AlgoKit project and came across all these design problems I thought that this is the reason why the authors of .NET decided not to add such a basic data structure as a heap (or a priority queue). Because neither design is good. Each has its shortcomings.
Shortly speaking -- if we want to add a data structure supporting efficient ordering of elements by their priority, we better do it the right way and add a functionality like updating / removing elements from it.
If you feel bad about wrapping elements in an array with another object -- this is not the only problem. Another is merging. With array-based heaps, this is totally inefficient. However... if we use the pairing heap data structure (The pairing heap: A new form of self-adjusting heap), then:
Actually, we could solve this the following way:
IHeap
interface that supports all the methodsArrayHeap
and PairingHeap
with all those handles, updating, removing, mergingPriorityQueue
which is just a wrapper around an ArrayHeap
, simplifying the APIPriorityQueue
would be in System.Collections.Generic
and all the heaps in System.Collections.Specialized
.
Works?
It's fairly unlikely we'll get three new data structures through API review. In most cases, a smaller amount of API is better. We can always add more later if it ends up being insufficient, but we cannot remove API.
That's one of the reasons I'm not a fan of the HeapNode class. Imo that kind of thing should be internal-only and the API should expose an already existing type if at all possible - in this case KVP probably.
@ianhays If this is kept internal-only, users would not be able to update priorities of elements in the data structure. This would be pretty much useless and we would end up with all the Java problems -- people re-implementing the data structure that is already in the native library... Sounds bad to me. Far worse than having a simple class representing a node.
BTW: a linked list has a node class so that users can use the proper functionality. This is pretty much mirroring.
users would not be able to update priorities of elements in the data structure.
That's not necessarily true. The priority could be exposed in a way that doesn't require an additional data structure such that instead of
public override void Update(ArrayHeapNode<TKey, TValue> node, TKey key)
{
}
you would have e.g.
public override void Update(TKey item, TValue priority)
{
}
I'm not convinced yet on exposing the priority as a type parameter either. Imo the majority of people are going to be using the default priority type and the TValue will be an unnecessary specificity.
void Update(TKey item, TValue priority)
First -- what is TKey
and TValue
in your code? How is that supposed to work? The convention in computer science is that:
Second:
Imo the majority of people are going to be using the default priority type and the TValue will be an unnecessary specificity.
Please define "the default priority type". I can feel that you just want to have PriorityQueue<T>
, is that right? If that is the case, consider the fact that a user will likely have to create a new class, a wrapper around his priority and value, plus implement something like IComparable
or provide a custom comparator. Pretty poor.
First -- what is TKey and TValue in your code?
The item and the priority of the item. You can switch them to be the priority and the item associated with that priority, but then you have a collection in which the TKeys aren't necessarily unique (i.e. allowing duplicate priorities). I'm not against that, but to me TKey usually implies uniqueness.
The point is that exposing a Node class is not a requirement to exposing an Update method.
Please define "the default priority type". I can feel that you just want to have PriorityQueue
, is that right?
Yes. Though reading through this thread again, I'm not completely convinced that's the case.
If that is the case, consider the fact that a user will likely have to create a new class, a wrapper around his priority and value, plus implement something like IComparable or provide a custom comparator. Pretty poor.
You're not wrong. I imagine a fair amount of users will have to create a wrapper type with custom comparator logic. I imagine there are also a fair amount of users that already have a comparable type they want to put in the priorityqueue or a type with a defined comparator. The question is which camp is the larger of the two.
I think the type should be called PriorityQueue<T>
, not Heap<T>
, ArrayHeap<T>
or even QuaternaryHeap<T>
to stay consistent with the rest of .Net:
List<T>
, not ArrayList<T>
.Dictionary<K, V>
, not HashTable<K, V>
.ImmutableList<T>
, not ImmutableAVLTreeList<T>
.Array.Sort()
, not Array.IntroSort()
.Users of such types usually don't care how they are implemented, it certainly shouldn't be the most prominent thing about the type.
If that is the case, consider the fact that a user will likely have to create a new class, a wrapper around his priority and value, plus implement something like IComparable or provide a custom comparator.
You're not wrong. I imagine a fair amount of users will have to create a wrapper type with custom comparator logic.
In the summary api the compare is provided by the provided comparer IComparer<T> comparer
, no wrapper is required. Often the priority will be part of the type e.g. a time scheduler will have the execution time as a property of the type.
Using a KeyValuePair
with a custom IComparer
adds no extra allocations; though it adds one extra indirection for an update as values need to be compared rather than the item.
Updates would be problematic for struct items unless the item was obtained via a ref return and then updated and passed back into an update method by ref and refs were compared. But that's a bit horrible.
@svick
I think the type should be called
PriorityQueue<T>
, notHeap<T>
,ArrayHeap<T>
or evenQuaternaryHeap<T>
to stay consistent with the rest of .Net.
I'm losing my faith in humanity.
PriorityQueue
is being consistent with the rest of .NET and calling it Heap
is not? Something wrong with heaps? Same should apply to stacks and queues then.ArrayHeap
-> base class for QuaternaryHeap
. Huge difference.List
is a list, but it's not a list. 😜Wanna stay consistent with the rest of .NET and run into problems like this one?
And what do I see there... Jon Skeet saying that SortedDictionary
should be called SortedTree
as that reflects the implementation more closely.
@pgolebiowski
Something wrong with heaps?
Yes, it describes the implementation, not usage.
Same should apply to stacks and queues then.
Neither really describes the implementation, for example the name does not tell you they are array-based.
ArrayHeap
-> base class forQuaternaryHeap
. Huge difference.
Yes, I understand your design. What I'm saying is none of this should be exposed to the user.
There is a whole lot of stuff coming as a consequence of following a certain design path. Re-read the thread please.
I did read the thread. I don't think that design belongs to the BCL. I think it should contain something that is simple to use and understand, not something that results in people wondering what "quaternary heap" is or whether they should use it.
If the default implementation is not good enough for them, that's where other libraries (like your own) come in.
Hashtable, ArrayList. They seem to exist.
Yes, those are .Net Framework 1.0 classes that nobody uses anymore. As far as I can tell, their names were copied from Java and .Net Framework 2.0 designers decided not to follow that convention. In my opinion, that was the right decision.
BTW, a "list" is a pretty bad choice of naming IMO, as the first thought of a user is that
List
is a list, but it's not a list.
It is. It's not a linked list, but that's not the same thing. And I like not having to write ArrayList
or ResizeArray
(F#'s name for List<T>
) everywhere.
It is about giving meaningful names that show users immediately what they are dealing with.
Most people will have no idea what they are dealing with if they see QuaternaryHeap
. On the other hand, if they see PriorityQueue
, it should be clear what they are dealing with, even if they don't have any CS background. They will not know what the implementation is, but that's what documentation is for.
@ianhays
First -- what is TKey and TValue in your code?
The item and the priority of the item. You can switch them to be the priority and the item associated with that priority, but then you have a collection in which the TKeys aren't necessarily unique (i.e. allowing duplicate priorities). I'm not against that, but to me TKey usually implies uniqueness.
Keys -- thingies used to do the prioritization. Keys don't have to be unique. We cannot make such an assumption.
The point is that exposing a Node class is not a requirement to exposing an Update method.
I believe it is: Decrease-key/Increase-key support in native libraries. Also on this topic, so there are helper classes involved for dealing with data structures:
I imagine a fair amount of users will have to create a wrapper type with custom comparator logic. I imagine there are also a fair amount of users that already have a comparable type they want to put in the priorityqueue or a type with a defined comparator. The question is which camp is the larger of the two.
I don't know about the others, but I find creating a wrapper with custom comparator logic every single time I want to use a priority queue to be pretty horrible...
Another thought -- let's say I created a wrapper:
class Wrapper
{
public int Priority { get; set; }
public string SomeStuff ...
}
I feed PriorityQueue<T>
with this type. So it prioritizes its elements based on it. Let's say it has defined some key selector. Such a design can crash the internal work of a heap. You can change priorities any time as you are the owner of the element. There is no notification mechanism for the heap to update. You would be responsible for handling all of that and calling it. Pretty dangerous and not straight-forward to me.
In case of the handle I proposed -- the type was immutable from the user's point of view. And there were no issues with uniqueness.
IMO I like the idea of having an IHeap
intereface and then the class API implementing that interface. I'm with @ianhays' on this that we are not going to expose 3 new apis to provide different kind of heaps to the customers as there would need to be a API Review process and we would like to focus on PriorityQueue
no matter what the name is.
Second I think there is no need of having a Node internal/public class to store the values in the queue. No need of extra allocations, we could follow something sort of what IDictionary does and when yielding the values for the Enumerable creating KeyValuePairPriorityQueue<T>
(this name is just to give it one). With this approach we could have a constructor following what the whole BCL does with an IComparer<T>
and doing the comparison with that comprarer to give the priority. No need of exposing APIs that pass a priority as a parameter.
I think having some of the APIs receive a priority would make it "less usable" or more complicated for ordinal customers that would like to be the default priority, then we are making it more complicated to understand the usare of it, when giving them the option of having a custom IComparer<T>
will be the most reasonable and will be following the guidelines we have across BCL.
The names are what they do, not how they do it.
They are named after the abstract concepts and what they achieve for the user, rather than their implementation and how they achieve it. (Which also means their implementation can be improved to use a different implementation if that proves better)
Same should apply to stacks and queues then.
Stack
is a unbounded resizing array and Queue
is implemented as an unbounded resizing circular buffer. They are named after their abstract concepts. Stack (abstract data type): Last-In-First-Out (LIFO), Queue (abstract data type): First-In-First-Out (FIFO)
Hashtable, ArrayList. They seem to exist.
DictionaryEntry
Yeah but lets pretend they don't; they are artefacts from a less civilised time. They have no type safety and box if you use primitives or structs; so allocate on every Add.
You can use @terrajobst's Platform Compatibility Analyzer and it will tell you: "Please don't"
List is a list, but it's not a list
It really is a List (abstract data type) also known as Sequence.
Equally Priority Queue is an abstract type that tells you what it achieves in use; not how it does it in implementation (as there can be many different implementations)
Lots of great discussion!
The original specification was designed with a few core principles in mind:
@karelz @pgolebiowski
Renaming to "Heap" or another term aligned to a data structure implementation wouldn't match most BCL conventions for collections. Historically, .NET collection classes have been designed to be general purpose rather than focused on the specific data structure/pattern. My original thinking was that early in the .NET ecosystem the API designers intentionally moved from "ArrayList" to "List
If we go with Heap, the same might occur - many intermediate-skilled developers will (sadly) see "Heap" and confuse it for the application's memory heap (i.e. heap and stack) instead of "heap the generalized data structure". The prevalence of System.Collections.Generic will cause it to show up in just about every .NET developer's intelligent proposal, and they'll wonder why they can allocate a new memory heap :)
PriorityQueue, by comparison, is far more discoverable and less prone to confusion. You can type "Queue" and get proposals for PriorityQueue.
A few proposals and questions asked about integer priorities or a generic parameter for priority (TKey, TPriority, etc). Adding an explicit priority would require consumers to write their own logic to map their priorities and increase the complexity of the API. Using the built in IComparer
If entries were required to be unique, Enqueue() would require a uniqueness lookup to throw an ArgumentException. In addition, there are likely valid scenarios to allow an item to be queued more than once. This non-uniqueness design makes providing an Update() operation challenging as there would be no way to tell which object is being updated. As a few comments indicated, this would start getting into APIs returning "node" references which in turn would (likely) require allocations that would need to be garbage collected. Even if that was worked around, it would increase the per-element memory consumption of the priority queue.
At one point I'd had a custom IPriorityQueue interface in the API before I posted the spec. Ultimately I decided against it - the usage pattern I was aiming for was Enqueue, Dequeue, and Iterate. Already covered by the existing interface set. Thinking of this as a Queue that's sorted internally; so long as items hold their own (initial) position in the queue based on their IComparer
I owe you all some customer code examples - my original plan was to go through the existing BCL implementations of PriorityQueue to use as a basis for the examples.
In the summary api the compare is provided by the provided comparer IComparer
comparer, no wrapper is required. Often the priority will be part of the type e.g. a time scheduler will have the execution time as a property of the type.
Under the OP API proposed, one of these needs to be met to use the class:
The dual-type API aims to lessens the burden of the second two options, yes? How does construction work of a new PriorityQueue/Heap when you have a type that already contains the priority? What does the API look like?
I believe it is: Decrease-key/Increase-key support in native libraries.
public override void Update(ArrayHeapNode<TKey, TValue> node, TKey key) {}
public override void Update(TKey key, TValue value);
In either case, the item associated with the given priority is updated. Why is the ArrayHeapNode required for updating? What does it accomplish that cannot be accomplished by taking the TKey/TValue directly?
@ianhays
In either case, the item associated with the given priority is updated. Why is the ArrayHeapNode required for updating? What does it accomplish that cannot be accomplished by taking the TKey/TValue directly?
At least with binary heap (the one I am most familiar with), if you want to update the priority of some value and you know its position in the heap, you can do it quickly (in O(log n) time).
But if you only know the value and the priority, you would first need to find the value, which is slow (O(n)).
On the other hand, if you don't need to update efficiently, that one allocation per item in the heap is pure overhead.
I would like to see a solution where you pay that overhead only when you need it, but that might not be possible to implement and the resulting API might not look good.
The dual-type API aims to lessens the burden of the second two options, yes? How does construction work of a new PriorityQueue/Heap when you have a type that already contains the priority? What does the API look like?
That's a good point, there is a gap in the original API for scenarios where the consumer already has a priority value (i.e. integer) maintained separately from the value. The flip side is that a
SortedList
To me, the same principle applies to queues. A queue, historically, is a one dimensional data structure.
My modern C++ std library knowledge is admittedly a bit rusty, but even std::priority_queue appears to have push, pop, and take a comparer as a templated (generic) parameter. The C++ standard library crew is about as performance sensitive as you can get :)
I did a very quick scan of priority queue and heap implementations in a few programming languages just now - C++, Java, Rust, and Go all work with a single type (similar to the original API posted here). A cursory glance of the most popular heap/priority queue implementations in NPM show the same.
@pgolebiowski don't get me wrong there should be specific implementations of things named explicitly after their specific implementation.
However that's for when you know what specific data structure you want that matches the performance goals you are after and has the trade offs you are willing to accept.
Generally the framework collections cover the 90% of uses where you want a general behaviour. Then if you want a very specific behaviour or implementation you'd probably go for a 3rd party library; and they'd hopefully be named after the implementation so you know it matches your need.
Just don't want to tie the general behaviour types to a specific implementation; as then its weird if the implementation changes because the type name has to stay the same and they will not match.
There are many concerns that need discussion, but let's start with the one that currently has the most impact on the parts we disagree about: updating and removing elements.
How do you want to support these operations then? We have to include them, they are basic. In Java, the designers have omitted them, and as a result:
This is just pathetic. Is there anyone who really wants to pursue such a way? I'd be ashamed to release such a disabled data structure.
@pgolebiowski rest assured that everyone here has the best intentions for the platform. No one wants to ship broken APIs. We want to learn from mistakes of others, so please keep bringing such useful relevant information (like the one abut Java story).
However, I want to point out couple of things:
I just asked how do you guys want to design the updating / removing thing if you don't like the proposed approach... This is listening to others and looking for consensus, I believe.
I don't doubt your good intentions! Sometimes it matters how you ask - it affects how people perceive the text on other side. Text is free of emotions, so things can be understood differently when written down. English as second language muddies things even more and we all need to be aware of that. I'll be happy to chat about details offline if you're interested ... let's steer the discussion over here back to technical discussion ...
My two cents on the Heap vs. PriorityQueue debate: both approaches are valid and clearly have benefits and drawbacks.
That said, "PriorityQueue" seems far more consistent with the existing .NET approach. Today the core collections are List
It would feel weird for one collection to buck the trend here.
I think PriorityQueue with additional constructor: PriorityQueue
In that case PrioriryQueue
class ArrayHeap : IHeap {}
class PairingHeap : IHeap {}
class FibonacciHeap : IHeap {}
class BinomialHeap : IHeap {}
OK. There are lots of different voices. I went through the discussion again and refined my approach. I also address common concerns. The text below makes use of quotes from the posts above.
The ultimate goal of this entire discussion is providing a specific set of functionalities to make the user happy. It is a pretty common case that a user has a bunch of elements, where some of them are with a higher priority than others. Eventually, they want to keep this bunch of elements in some specific order to be able to efficiently perform the following operations:
The authors of other standard libraries have also tried to support this functionality. In this section, I will refer to how it was solved in Python, Java, C++, Go, Swift, and Rust.
None of them support modifying elements already inserted into the collection. It is extremely surprising, because it is very probable that, during the lifetime of a collection, the priorities of its elements will change. Especially, if such a collection is intended to be used during the entire lifetime of a service. There is also a number of algorithms that utilize this very functionality of updating elements. Such a design is thus simply wrong, because as a result, our customers are lost: StackOverflow. This is one of many, many questions like that over the Internet. The designers have failed.
Another thing worth noting is that every single library provides this partial functionality through implementing a binary heap. Well, it has been shown that it not the most performant implementation in the general case (random input). The best fit for the general case is a quaternary heap (an implicit heap-ordered complete 4-ary tree, stored as an array). It is significantly less known and this is probably the reason why the designers went with a binary heap instead. But still — another poor choice, although of a lesser severity.
What do we learn from that?
I strongly feel that we should provide:
IHeap<T>
interfaceHeap<T>
classThe IHeap
interface would obviously include methods for achieving all the operations described in the beginning of this post. The Heap
class, implemented with a quaternary heap, would be the go-to solution in 98% of the cases. The ordering of elements would be based on IComparer<T>
passed into the constructor or the default ordering if a type is already comparable.
IHeap
as the input. Some examples of other heaps that behave better than the quaternary heap in some specific conditions are: pairing heap, binomial heap, and our beloved binary heap.Heap
implementation. This is optimizing towards 98% of use cases.ISet
and HashSet
IList
and List
IDictionary
and Dictionary
Heap
is indeed unambiguous and clear. It is also is in harmony with Stack
, Queue
, List
, and Array
. The same approach with naming was taken in the most modern standard libraries (Go, Swift, Rust) — they expose a heap explicitly.@pgolebiowski I'm not seeing how Heap<T>
/IHeap<T>
is named like Stack<T>
, Queue<T>
, and/or List<T>
? None of those names explain how they are internally implemented (an array of T as it happens).
@SamuelEnglard
Heap
doesn't say how it is internally implemented either. I fail to understand why for so many people a heap immediately follows a specific implementation. There are many variants of heaps that share the same API, to begin with:
Saying that we are dealing with a heap is still being very abstract. Actually, even saying that we are dealing with a quaternary heap is abstract -- it could be implemented either as an implicit data structure based on an array of T
(like our Stack<T>
, Queue<T>
, and List<T>
) or explicit (using nodes and pointers).
Shortly speaking, Heap<T>
is very similar to Stack<T>
, Queue<T>
, and List<T>
, because it is an elementary data structure, a basic abstract building block, that can be implemented in many ways. Additionally, as it happens, they are all implemented using an array of T
underneath. I find this similarity really strong.
Does that make sense?
For the record, I'm indifferent about the naming. People that are used to using the C++ standard library will, perhaps, prefer _priority_queue_. People that have been educated about data structures may prefer _Heap_. If I had to vote, I'd choose _heap_, though it's close to a coin toss for me.
@pgolebiowski I worded my question wrong, that's my bad. Yes, Heap<T>
doesn't say how it is implemented internally.
Yes Heap is a valid data structure, but Heap != Priority Queue. They both exposes different API surfaces and are used for different ideas. Heap<T>
/IHeap<T>
should be data types that are used internally by (just theoretical names) PriorityQueue<T>
/IPriorityQueue<T>
.
@SamuelEnglard
In terms of how the world of Computer Science is organized, yes. Here are the levels of abstraction:
And yes, having IHeap
and Heap
, the implementation of PriorityQueue
would basically be:
public class PriorityQueue<T>
{
private readonly IHeap<T> heap;
public PriorityQueue(IHeap<T> heap)
{
this.heap = heap;
}
public void Add<T>(T item) => this.heap.Add(item);
public void Remove<T>(T item) => this.heap.Remove(item);
// etc...
}
Let's run a decision tree here.
A priority queue could also be implemented with a data structure different than some form of a heap (theoretically). That makes the design of a PriorityQueue
like the one above pretty ugly, because it is fixed towards the family of heaps only. It is also a very thin wrapper around an IHeap
. This yields a question -- _why not simply use the family of heaps instead_?
We are left with one solution -- fixing the priority queue to a specific implementation of a quaternary heap, without a room for IHeap
interface. I feel it is passing through too many levels of abstraction and kills all the wonderful benefits of having an interface.
We are back with the design choice from the middle of the discussion -- have PriorityQueue
and IPriorityQueue
. But then we would be basically having:
class BinaryHeap : IPriorityQueue {}
class PairingHeap : IPriorityQueue {}
class FibonacciHeap : IPriorityQueue {}
class BinomialHeap : IPriorityQueue {}
Feels not only ugly, but also conceptually wrong -- these are not types of priority queues and don't share the same API (as @SamuelEnglard already noted). I think we should stick to heaps, a family huge enough to have an abstraction for them. We would get:
class BinaryHeap : IHeap {}
class PairingHeap : IHeap {}
class FibonacciHeap : IHeap {}
class BinomialHeap : IHeap {}
And, provided by us, class Heap : IHeap {}
.
BTW, someone might find the following helpful:
| Google query | Results |
| :--------------------------------------: | :-----: |
| "data structure" "priority queue" | 172,000 |
| "data structure" "heap" | 430,000 |
| "data structure" "queue" -"priority queue" | 496,000 |
| "data structure" "queue" | 530,000 |
| "data structure" "stack" | 577,000 |
@pgolebiowski I feel myself going back and forth here so I'll conceded
@karelz @safern How do you feel about the above? Can we fix ourselves on the IHeap
+ Heap
approach so that I can present a specific API proposal?
I question the need for an interface here (be it IHeap
or IPriorityQueue
). Yes, there are many different algorithms that can theoretically be used to implement this data structure. However, it seems unlikely that the framework would ship with more than one, and the idea that library writers really need a canonical interface so that various parties can coordinate in writing cross-compatible heap implementations doesn't seem that likely.
Furthermore, once an interface is released it can never be changed due to compat. That means that the interface tends to fall behind the concrete classes in terms of functionality (an issue with IList and IDictionary today). In contrast, if a Heap class were released and there was serious clamoring for an IHeap interface, I think the interface could be added down the line without issues.
@madelson I agree that the framework likely does not need to ship more than a single implementation of a heap. However, by backing that implementation by an interface, those of us that do care about other implementations can easily swap in another implementation (of our own creation or in an external library) and still be compatible with code that accepts the interface.
I don't really see any negative points to coding to an interface. Those who don't care about it can ignore it and go directly for the concrete implementation. Those who do care can use any implementation of their choosing. It's a choice. And it's a choice that I want.
@pgolebiowski now that you asked, here is my personal opinion (I am somewhat sure that other API reviewers/architects will share it as I've checked for opinion with a few):
The name should be PriorityQueue
, and we should not introduce IHeap
interface. There should be exactly one implementation (likely via some heap).
IHeap
interface is very advanced expert scenario - I would suggest to move it into PowerCollections library (eventually to be created) or any other library.
If the library and IHeap
interface become really popular, we can change our mind later and add IHeap
based on demand (via your constructor overload), but I don't think it is useful/aligned with the rest of BCL enough to justify the complication of adding new interface now. Start simple, grow to complicated only if really needed.
... just my 2 (personal) cents
Given the difference in opinions, I propose following approach to move the proposal forward (which I suggested earlier this week to @ianhays, and therefore indirectly to @safern):
... trying to be fully transparent about my opinion (which you asked for), please don't take it as discouragement or push back - let's see how the API proposal goes.
@karelz
The name should be
PriorityQueue
Any argument? It would be at least nice if you addressed what I wrote above instead of just saying no.
I think you named it pretty well previously: _If you have input, provide it, back it with data and evidence. Also listen to arguments of others, acknowledge them. Provide evidence against other's opinion if you disagree._
Plus, it is not just about the name. It is not that shallow. Please read what I wrote. It is about working between levels of abstraction and being able to enhance code / build on top of it.
Oh, there was an argument in this discussion why:
If we go with Heap, many intermediate-skilled developers will (sadly) see "Heap" and confuse it for the application's memory heap (i.e. heap and stack) instead of "heap the generalized data structure". [...] they'll wonder why they can allocate a new memory heap.
😛
we should not introduce
IHeap
interface.IHeap
interface is very advanced expert scenario - I would suggest to move it into PowerCollections library.
@bendono wrote a very nice comment on this one. @safern also wanted to back the implementation with an interface. And a few others.
Another note -- I am not sure how you imagine moving an interface into a third-party library. How could developers possibly write their code against that interface and use our functionality? They would be forced to either stick to our non-interface functionality or ignore it entirely, there is no other option, it's mutually exclusive. In other words -- our solution would not be extensible at all, resulting with people using either our disabled solution or some third-party library, instead of relying on the same architectural core in both cases. This is what we have in Java's standard library and third-party solutions.
But again, you nicely commented on this one: No one wants to ship broken APIs. We want to learn from mistakes of others, so please keep bringing such useful relevant information (like the one abut Java story).
Given the difference in opinions, I propose following approach to move the proposal forward. [...] Make 2 alternative proposals [...] let's bring it to API review, discuss it there and make a decision there. If I see at least one person in that group voting for Heaps proposal, I'll be happy to reconsider my view point.
There was a lot of discussion on various parts of the API above. It tackled:
PriorityQueue
vs Heap
Why cannot those people you have in mind just contribute to this discussion? Why should we instead start it again?
If we go with Heap, many intermediate-skilled developers will (sadly) see "Heap" and confuse it for the application's memory heap (i.e. heap and stack) instead of "heap the generalized data structure".
Yeah, this is me. I'm entirely self-taught in programming, so I have no idea what was being said when "Heap
" entered the discussion. Not to mention that even in terms of a collection, "a heap of things", to me would more intuitively imply it being unordered in every fashion.
I can't believe this is really happening...
Any argument? It would be at least nice if you addressed what I wrote above instead of just saying no.
If you read my answer, you can notice that I mentioned key arguments for my position:
Same arguments which has been repeated on the thread several times IMO. I just summarized them
. Please read what I wrote.
I was actively monitoring this thread all the time. I read all the arguments and points. This is my summarized opinion, despite reading (and understanding) all your and others' points.
You are apparently passionate about the topic. That is great. However, I have a feeling that we got into a position when we're just repeating similar arguments from each side, which is not going to lead much further - that's why I recommended 2 proposals and getting more feedback on them from larger group of experience BCL API reviewers (BTW: I don't count myself as experienced API reviewer yet).
How could developers possibly write their code against that interface and use our functionality?
The developers who care about the IHeap
advanced scenario would reference the interface and the implementation from the 3rd party library. As I said, if it proves to be popular, we could consider moving it into CoreFX later.
The good news is that adding IHeap
later is totally doable - it basically adds just one constructor overload on PriorityQueue
.
Yes, it is not ideal from your point of view, but it does not prevent the innovation you consider important in future. I think it is reasonable middle-ground.
Why cannot those people you have in mind just contribute to this discussion?
API review is a meeting with active discussion, brainstorming, weighting all angles. In many cases it is more productive/efficient than back-and-forth on GitHub issues. See dotnet/corefx#14354 and its predecessor dotnet/corefx#8034 - too long discussion, several different opinions with gazillion tweaks which are hard to track, no conclusion, while great discussion, also non-trivial waste time for quite a few people, until we sat down and talked about it and got to consensus.
Having API reviewers monitor every single API issue, or even just the chatty ones does not scale well.
Why should we instead start it again?
We won't start again. Why would you think that?
We will finish the API review on the first level (the area owner level) by sending 2 most popular proposals with pros/cons into next API review level.
It's hierarchical approvals/reviews approach. It is similar to business reviews - VPs/CEOs with decision powers don't oversee every discussion about every project in their company, they ask their teams/reports to bring proposals for the most impactful or contagious decisions to have a further discussion about them. Teams/reports have to summarize the problem and present the pros/cons of alternative solutions.
If you believe we are not ready to present the final 2 proposals with pros and cons, because there are things which haven't been said on this thread yet, let's keep discussing, until we have just a few top candidates to review at the next API review level.
I got the feeling that everything that needed to be said has been said.
Makes sense?
The name should be
PriorityQueue
Any argument?
If you read my answer, you can notice that I mentioned key arguments for my position.
Geez... I was referring to what I quoted (obviously) -- you deciding to go with a priority queue instead of a heap. And yes, I have read your answer -- it contains exactly 0% of arguments for this.
I was actively monitoring this thread all the time. I read all the arguments and points. This is my summarized opinion, despite reading (and understanding) all your and others' points.
You like to be hyperbolic, I have noticed that before. You merely acknowledge the existence of points.
How could developers possibly write their code against that interface and use our functionality?
The developers who care about the IHeap advanced scenario would reference the interface and the implementation from the 3rd party library. As I said, if it proves to be popular, we could consider moving it into CoreFX later.
Are you aware that you are just repeating my words here and not at all addressing the issue I have presented as a consequence of the above? I wrote:
They would be forced to either stick to our non-interface functionality or ignore it entirely, there is no other option, it's mutually exclusive. In other words -- our solution would not be extensible at all, resulting with people using either our disabled solution or some third-party library, instead of relying on the same architectural core in both cases.
I will make this very clear for you. There would be two disjoint groups of people:
The problem here is, as I said, that those are disjoint groups of people. And they are producing code that does not work together, because there is no common architectural core. _CODEBASE IS NOT COMPATIBLE_. You cannot undo it later.
The good news is that adding IHeap later is totally doable - it basically adds just one constructor overload on PriorityQueue.
I have already written why it sucks: see this post.
Why cannot those people you have in mind just contribute to this discussion?
Having API reviewers monitor every single API issue, or even just the chatty ones does not scale well.
Yes, I was asking why API reviewers cannot monitor every single API issue. Precisely, you indeed responded accordingly.
Makes sense?
No. I am tired of this discussion, really. Some of you guys are clearly participating in it simply because it is your job and you are told to. Some of you need guidance all the time, which is very tiresome. You even asked me to prove why a priority queue should be implemented with a heap internally, clearly lacking computer science background. Some of you don't even understand what a heap really is, making the discussion even more chaotic.
Go with your disabled PriorityQueue that does not allow updating and removing elements. Go with your design that does not allow healthy OO approach. Go with your solution that does not allow reusing the standard library when writing an extension. Go the Java way.
And this... this is just mind-blowing:
If we go with Heap, many intermediate-skilled developers will (sadly) see "Heap" and confuse it for the application's memory heap (i.e. heap and stack) instead of "heap the generalized data structure".
Present the API with your approach. I'm curious.
I can't believe this is really happening...
Well, excuuuuuuuuuuuuuuuse me for not having had the opportunity to get a proper Computer Science education to learn that Heap
is some kind of data structure different from the memory heap.
The point still stands, though. A heap of something implies nothing about it being ordered in some kind. If I were to need a collection that allows me to store objects of things to process, where some instances that come in later may need to be processed sooner, I would not be searching for something called a Heap
. PriorityQueue
on the other hand, perfectly communicates that it does exactly that.
As a backing implementation? Sure, implementation details shouldn't be my concern.
Some IHeap
abstraction? Great for API authors and people who do have a CS Major to know what it's used for, no reason not to have.
Giving something a cryptic name that doesn't state its intent all that well and subsequently limits discoverability? 👎
Well, excuuuuuuuuuuuuuuuse me for not having had the opportunity to get a proper Computer Science education to learn that Heap is some kind of data structure different from the memory heap.
This is ridiculous. At the same time you want to participate in a discussion regarding adding a functionality like that. It sounds like trolling.
A heap of something implies nothing about it being ordered in some kind.
You are wrong. It is ordered, as a heap. Like in the pictures you linked.
As a backing implementation? Sure, implementation details shouldn't be my concern.
I have already addressed that. The family of heaps is huge and there are two levels of abstraction above an implementation. Priority queue is the third abstraction layer.
If I were to need a collection that allows me to store objects of things to process, where some instances that come in later may need to be processed sooner, I would not be searching for something called a Heap. PriorityQueue on the other hand, perfectly communicates that it does exactly that.
And without any background, you would ask Google to provide you articles on priority queues? Well, we can argue what is more or less likely in our opinions. But, as has been said very nicely:
If you have input, provide it, back it with data and evidence. Also listen to arguments of others, acknowledge them. Provide evidence against other's opinion if you disagree.
And according to the data you are wrong:
Query | Hits
:----: |:----:|
| "data structure" "priority queue" | 172,000 |
| "data structure" "heap" | 430,000 |
It is almost 3 times more likely that you would come across a heap while reading about data structures. Plus, it is a name that Swift, Go, Rust, and Python developers are familiar with, because their standard libraries provide such a data structure.
Query | Hits
:----: |:----:|
| "golang" "priority queue" | 3.390 |
| "rust" "priority queue" | 8.630 |
| "swift" "priority queue" | 18.600 |
| "python" "priority queue" | 72.800 |
| "golang" "heap" | 79.000 |
| "rust" "heap" | 492.000 |
| "swift" "heap" | 551.000 |
| "python" "heap" | 555.000 |
Actually it is also similar for C++, because a heap data structure was introduced there sometime in the previous century.
Giving something a cryptic name that doesn't state its intent all that well and subsequently limits discoverability? 👎
No opinions. Data. See above. Especially no opinions from someone who has no background. You wouldn't Google a priority queue either without having some reading about data structures before. And a heap is covered in many Data Structures 101.
It's the foundation in Computer Science. It is elementary. When you have several semesters of algorithms and data structures, a heap is something you see in the beginning.
But still:
EDIT: See Google Trends.
As another self-taught developer, I have no problem with _heap_. As a developer that is always striving to improve, I've taken the time to learn and understand all about data structures. In short, I don't agree with the implication that a naming convention should target those that have not taken the time to understand the lexicon of the field of which they are a part.
And I also strongly disagree with statements like "The name should be PriorityQueue". If you don't want peoples input, then don't make it open-source and don't ask for it.
Let me provide some explanation of how we think about API naming:
We tend to favor consistency within the .NET platform above almost anything else. That's important in order to make APIs look familiar and predictable. Sometimes this means we accept that a name isn't a 100% correct if that's a term we've used before.
Our goal is to design a platform that is approachable to a wide variety of developers, some of which who haven't had a formal education in computer science. We believe that part of the reason .NET has generally been perceived as very productive and easy to use is partially due to that design point.
We generally employ the "search engine test" when it comes checking how well-known and established a name or terminology is. So I highly appreciate the research @pgolebiowski has done. I haven't done the research myself but my gut feel is that "heap" isn't a term many non-domain experts would be looking for.
Hence, I tend to agree with @karelz that PriorityQueue
looks like the better choice. It combines an existing concept (queue) and adds the twist that expresses the desired capability: ordered retrieval based on a priority. But we're not unmovably attached to that name. We've often changed names of data structures and technologies based on customer feedback.
However, I'd like to point out that this:
If you don't want peoples input, then don't make it open-source and don't ask for it.
is a false dichotomy. It's not that we don't want feedback from our ecosystem and contributors (we obviously do). But at the same time we also have to appreciate that our customer base is quite diverse and that GitHub contributors (or developers on our team) aren't always the best proxy for all of our customers. Usability is hard and it will likely take some iterations to add new concepts into .NET, especially in highly popular areas like collections.
@pgolebiowski:
I highly value your insight, data, and suggestions. But I absolutely do not appreciate your style of argumentation. You got personal towards both members of my team as well as community members on this thread. Just because you disagree with us doesn't give you permission to accuse us of not having expertise or not caring because it's "just our job". Consider that many of us have literally moved around the globe, left families and friends behind only because we wanted to do this job. Not only are comments like yours very unfair, they also don't help to advance the design.
So while I like to think I have a thick skin, I don't have a lot of tolerance for that sort of behavior. Our domain is already complex enough; we don't need to add confrontational & adversarial communication.
Please be respectful. Passionately criticizing ideas is fair game, but attacking people isn't. Thank you.
Dear everyone who feels downhearted,
I apologize for decreasing your levels of happiness by my harsh-ish attitude.
@karelz
I've made a fool of myself there by making technical mistake. I apologized. It was accepted. Yet thrown at me later. Not cool IMO.
I am sorry that what I wrote made you unhappy. Though it wasn't that bad as you described it -- I merely provided this as one of many factors that have contributed to my feeling of being tired. It's of a lesser severity, I think. But still, I am sorry.
And yes, everyone makes mistakes. It is OK. Me too, for example by getting carried away sometimes.
What got me most was "you are just being told to do this, you don't believe it" - yeah, that's exactly why I do it ALSO on weekends.
I am sorry, I can see that you work hard and I appreciate that very much. It was visible to me how especially dedicated you were by the 5/10 milestone.
@terrajobst
Just because you disagree with us doesn't give you permission to accuse us of not having expertise or not caring because it's "just our job".
Comments like yours are very unfair and also don't help to advance the design.
If there is anyone ill who would like to "lick me", please feel free.
We have encountered a problem and have dealt with that. Everyone has learned something out of it. I can see that everyone here is concerned about the quality of the framework, which is absolutely wonderful and makes me motivated to contribute. I look forward to continue working on CoreFX with you. That being said, I will address your new technical input probably tomorrow.
@pgolebiowski
Hopefully we can meet in person at some point. I honestly believe that part of the challenge of doing everything online is that personalities can sometimes mix in bad ways with no intention from either side.
I look forward to continue working on CoreFX with you. That being said, I will address your new technical input probably tomorrow.
Same here. This is an interesting space and there are plenty of amazing things we can do together :-)
@pgolebiowski first, thanks for your reply. It shows you care and you mean well (which is something I secretly hope every person/developer in the world does, all conflict is just misunderstanding / miscommunication). And that makes me really happy - it keeps me going and excited.
I would suggest to start over in our relationship. Let's turn the discussion back to technical, let's all learn from this thread, how to handle similar situations in future, let's all assume again that the other party has only best interest for the platform in mind.
BTW: This is one of a few more difficult encounters/discussions on CoreFX repo in last 9 months, and as you see, we (incl./esp. me) are still learning how to handle them well - so this particular instance is going to benefit even us and it will make us be better in future and help us understand better different view points from passionate community members. Maybe it will shape our updates to contribution docs ...
My harsh comments were a result of the inefficiency in this discussion. Inefficiency = chaotic way of discussing, where points are not addressed / resolved and we move further in spite of that => tiresome.
Understood your frustration! Interestingly similar frustration was also on the other side from the same reason 😉 ... it's almost funny how the world works :).
Unfortunately, the difficult discussion is part of the job when you drive a design decision. It is LOTS of work. Many people underestimate it. Key skill required is patience with everyone and ability to step above your own opinion and think how to drive consensus even if it doesn't go your way. That's why I suggested to have 2 proposals and "escalate" the technical discussion to API reviewer group (mainly because I don't know for sure that I am right, although I secretly hope I am right as every other developer in the world would 😉).
It is VERY hard to have opinion on a topic AND drive the discussion to CONSENSUS on the same thread. From that perspective you and me have the most common on this thread - we both have opinions, yet we both try to drive the discussion to closure & decision. So let's work together closely.
My general approach is: Whenever I think someone is attacking me, is being evil, lazy, frustrates me, or something. I ask first myself and also the particular person: Why? Why did you say that? What did you mean?
Usually it is the sign of lack of understanding/communication of motives. Or sign of reading too much between the lines and seeing insults/accusations/bad intents where they are not.
Now that I am not afraid to keep discussing technical questions, here's what I wanted to ask earlier:
Go with your disabled PriorityQueue that does not allow updating and removing elements.
This is something I don't understand. If we leave out IHeap
(in my / the original proposal here), why would that not be possible?
IMO there is no difference between the 2 proposals from the class capabilities point of view, the only difference is -- do we add the PriorityQueue(IHeap)
constructor overload or not (leaving the class name dispute aside as independent problem to resolve).
Disclaimer (to avoid miscommunication): I don't have time to read articles and do research, I expect short answer, presenting elevator-pitch argument from whoever wants to drive the technical discussion. Note: It is not me trolling. I would ask the same question to anyone on our team making this claim. If you don't have energy to explain it / drive the discussion (which would be totally understandable given the difficulties and time investment from your side), just say so, no hard feelings. Please don't feel pressured by me or anyone (and that applies to everyone on the thread).
Not trying to add one more unnecessary comments here, this thread has been WAY TOO LONG. The no.1 rule of Internet age is that avoid text communication if you care about people relationship. (Well, I coined it). I believe some other open source community would switch to a Google Hangout for this kind of discussion if the need is apparent. When you look at other people's face, you would never say anything "insulting", and people get familiar with each other very fast. Maybe we can try as well?
@karelz Due to the length of the discussion above, it is highly unlikely for anyone new to contribute if the flow is not modified. As such, I would like to propose the following approach right now:
Does that have a chance of making sense?
PriorityQueue that does not allow updating and removing elements.
It was regarding the original proposal, which lacked these capabilities :) Sorry for not making it clear.
If you don't have energy to explain it / drive the discussion (which would be totally understandable given the difficulties and time investment from your side), just say so, no hard feelings. Please don't feel pressured by me or anyone (and that applies to everyone on the thread).
I will not give up. No pain no gain xD
@xied75
I believe some other open source community would switch to a Google Hangout for this kind of discussion if the need is apparent. Maybe we can try as well?
Looks good ;)
No matter if we go with Heap
/ IHeap
or PriorityQueue
/ IPriorityQueue
(or something else), for the functionality we are about to provide...
_would you like to have an interface, along with the implementation?_
@bendono
By backing that implementation by an interface, those of us that do care about other implementations can easily swap in another implementation (of our own creation or in an external library) and still be compatible with code that accepts the interface.
Those who don't care about it can ignore it and go directly for the concrete implementation. Those who do care can use any implementation of their choosing.
@madelson
There are many different algorithms that can theoretically be used to implement this data structure. However, it seems unlikely that the framework would ship with more than one, and the idea that library writers really need a canonical interface so that various parties can coordinate in writing cross-compatible heap implementations doesn't seem that likely.
Furthermore, once an interface is released it can never be changed due to compat. That means that the interface tends to fall behind the concrete classes in terms of functionality (an issue with IList and IDictionary today).
@karelz
Interface is very advanced expert scenario.
If the library and
IHeap
interface become really popular, we can change our mind later and addIHeap
based on demand (via constructor overload), but I don't think it is useful/aligned with the rest of BCL enough to justify the complication of adding new interface now. Start simple, grow to complicated only if really needed.
Use 👍 and 👎 to vote on this one (for and against an interface, respectively). Alternatively, write a comment. Ideally, API reviewers will participate.
I would like to add that while changing interfaces is hard, with extension methods (and properties coming) interfaces are easier to extend and/or work with (see LINQ)
I would like to add that while changing interfaces is hard, with extension methods (and properties coming) interfaces are easier to extend and/or work with (see LINQ)
They can only work with the publicly defined methods on the interface; so it means getting it right first time.
I'd suggest holding off on the interface for a bit till the class is in use and has settled - then introduce an interface. (with the debate over the shape of the interface being a separate issue)
To be blunt, the only thing that I care about is the interface. A solid implementation would be nice, but I (or anyone else) could always create my own.
I remember a few years back how we had this exact same conversation with HashSet<T>
. Microsoft wanted HashSet<T>
while the community wanted ISet<T>
. If I remember correctly, we got HashSet<T>
first and ISet<T>
second. Without an interface, usage of HashSet<T>
was quite limited as it is difficult (if not often impossible) to change a public API.
I should note that there is also SortedSet<T>
now, not to mention numerous non-BCL implementations of ISet<T>
. I have used ISet<T>
in public APIs and am thankful for it. My private implementation can use any concrete implementation that I feel is right. I can also easily swap out an implementation for another without breaking anything. This would not be possible without the interface.
For those who say that we can always define our own interfaces, consider this. Assume for a moment that ISet<T>
in the BCL never happened. Now I can create my own interface IMySet<T>
as well as solid implementations. However, one day the BCL HashSet<T>
is released. It may or may not implement ISet<T>
, but it does not implement IMySet<T>
. As a result, I cannot swap in HashSet<T>
as an implementation of my IMySet<T>
.
I fear we are going to repeat this travesty again.
If you aren't ready to commit to an interface, then it is too early to introduce a concrete class.
I find the disparity in opinions significant. Just by looking at the numbers, slightly more people are for an interface, but that doesn't tell us much. I'll try to ask some other people who have participated in the discussion before but have not expressed their opinions in regards to an interface yet:
@ebickle @svick @horaciojcfilho @paulcbetts @justinvp @akoeplinger @mbeidler @SirCmpwn @andrewgmorris @weshaggard @BrannonKing @NKnusperer @danmosemsft @ianhays @safern @VisualMelon @Joe4evr @jcouv @xied75
For the notified: _guys, could you please provide your input? It would be very helpful, even if you just vote with :+1:, :-1:. You can start reading at this issue comment (4 posts above here) -- we are discussing whether providing an interface is a good idea (only this aspect for now, until it is resolved)._
Maybe some of these people are API reviewers, but I believe we need their support in this fundamental decision before we go forward. @karelz, @terrajobst, would it be possible for you to ask them to help us resolve this aspect? Their input would be very valuable, as they are the ones who will review it eventually -- it would be very helpful to know it at this point, before committing with a certain approach (or going with more than 1 proposal, which would be tiresome and a bit pointless, as we can know their decision earlier).
Personally, I am for an interface, but if the decision is different, I am happy to follow a different path.
I don't want to drag API reviewers into the discussion -- it is long and messy, it would not be efficient for API reviewers to re-read everything or even just decide what is the last important reply (I am losing myself in it).
I think we are at the point when we can create 2 formal API proposals (see the 'good' example there) and highlight pros/cons of each. We can then review them in our API review meeting and make recommendation, taking votes into consideration. Depending on the discussion there (if there are multiple opinions), we might come back and kick off Twitter poll / additional GH voting, etc.
BTW: API review meetings happen almost every Tue.
To help kick it off, here's how a proposal should look like:
```c#
namespace System.Collections.Generic
{
public class PriorityQueue
: IEnumerable
{
public PriorityQueue();
public PriorityQueue(int capacity);
public PriorityQueue(IComparer
public PriorityQueue(IEnumerable
public PriorityQueue(IEnumerable
public PriorityQueue(int capacity, IComparer
public IComparer<T> Comparer { get; }
public int Count { get; }
public void Enqueue(T item);
public T Dequeue();
public T Peek();
public void Clear();
public bool Contains(T item);
// Sets the capacity to the actual number of elements
public void TrimExcess();
public void CopyTo(T[] array, int arrayIndex);
void ICollection.CopyTo(Array array, int index);
public T[] ToArray();
public Enumerator GetEnumerator();
IEnumerator<T> IEnumerable<T>.GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
bool ICollection.IsSynchronized { get; }
object ICollection.SyncRoot { get; }
public struct Enumerator : IEnumerator<T>
{
public T Current { get; }
object IEnumerator.Current { get; }
public bool MoveNext();
public void Reset();
public void Dispose();
}
}
}
```
TODO:
UpdatePriority
scenario@karelz OK, will do it, stay tuned! :smile:
Alright. As far as I can tell, an interface or a heap will not pass the API review. As such I would like to propose a slightly different solution, forgetting about the quaternary heap for example. The data structure below is different in a few ways than what we can find in Python, Java, C++, Go, Swift, and Rust (at least those).
The main focus is on correctness, completeness in terms of functionality, and intuitiveness, while maintaining optimal complexities and great real-world performance.
@karelz @terrajobst
It is a pretty common case that a user has a bunch of elements, where some of them are with a higher priority than others. Eventually, they want to keep this bunch of elements in some specific order to be able to efficiently perform the following operations:
First we will focus an building the priority queue (only adding elements). The way it is done depends on the type of user data.
TKey
and TValue
are separate objects.TKey
is comparable.var queue = new PriorityQueue<int, string>();
queue.Enqueue(5, "five");
queue.Enqueue(1, "one");
queue.Enqueue(3, "three");
TKey
and TValue
are separate objects.TKey
is not comparable.var comparer = Comparer<MyKey>.Create(/* custom logic */);
var queue = new PriorityQueue<MyKey, string>(comparer);
queue.Enqueue(new MyKey(5), "five");
queue.Enqueue(new MyKey(1), "one");
queue.Enqueue(new MyKey(3), "three");
TKey
is contained within TValue
.TKey
is not comparable.public class MyClass
{
public MyKey Key { get; set; }
}
Apart from a key comparator, we also need a key selector:
var selector = new Func<MyClass, MyKey>(item => item.Key);
var queue = new PriorityQueue<MyKey, MyClass>(selector, comparer);
queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
Enqueue
method here. This time it only takes one argument (TValue
).Enqueue(TKey, TValue)
must throw InvalidOperationException
.Enqueue(TValue)
must throw InvalidOperationException
.TKey
is contained within TValue
.TKey
is comparable.var queue = new PriorityQueue<MyKey, MyClass>(selector);
queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
TKey
is assumed to be Comparer<TKey>.Default
, as in Scenario 1.TKey
and TValue
are separate objects, but of the same type.TKey
is comparable.var queue = new PriorityQueue<int, int>();
queue.Enqueue(5, 50);
queue.Enqueue(1, 10);
queue.Enqueue(3, 30);
public class MyClass : IComparable<MyClass>
{
public int CompareTo(MyClass other) => /* custom logic */
}
var queue = new PriorityQueue<MyClass, MyClass>();
queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
In the beginning, there is an ambiguity.
MyClass
is a separate object and the user would like to simply have the key and value separate, as it was in Scenario 5.Here is how PriorityQueue
deals with the ambiguity:
Enqueue(TValue)
is allowed. Therefore, an alternative solution to Scenario 6 is to simply define a selector and pass it to the constructor.Enqueue
method:Enqueue(TKey, TValue)
is called the first time, key and value are considered to be separate objects (Scenario 5). From then on, the Enqueue(TValue)
method must throw InvalidOperationException
.Enqueue(TValue)
is called the first time, key and value are considered to be the same object, key selector is inferred (Scenario 6). From then on, the Enqueue(TKey, TValue)
method must throw InvalidOperationException
.We have already covered the creation of a priority queue. We also know how to add elements to the collection. Now we will focus on the remaining functionality.
There are two operations we can do on the element with the highest priority — retrieve it or remove it.
var queue = new PriorityQueue<int, string>();
queue.Enqueue(5, "five");
queue.Enqueue(1, "one");
queue.Enqueue(3, "three");
// retrieve the element with the highest priority
var element = queue.Peek();
// remove that element
queue.Dequeue();
What is exactly being returned by the Peek
and Dequeue
methods will be discussed later.
For an arbitrary element, we might like to modify it. As an example, if a priority queue is used during the entire lifetime of a service, there is a high chance the developer would like to have a possibility of updating priorities.
Surprisingly, such a functionality is not provided by equivalent data structures: in Python, Java, C++, Go, Swift, and Rust. Possibly some others too, but I have only checked those. The result? Disappointed developers:
We have basically two options here:
Every time a user adds an element to the priority queue, they are given a handle:
var handle = queue.Enqueue(42, "forty two");
Handle is a class with the following public API:
public class PriorityQueueHandle<TKey, TValue>
{
public TKey Key { get; }
public TValue Value { get; }
}
It is a reference to a unique element in the priority queue. If you are concerned about the efficiency, please see FAQ (and you will not be concerned anymore).
Such an approach allows us to easily modify a unique element in the priority queue, in a very intuitive and simple way:
/*
* User wants to retrieve a server that at any given moment
* has the lowest average response time.
* He doesn't want to maintain a separate key object (it is
* inside his type) and the key is already comparable.
*/
var queue = new PriorityQueue<double, ServerStats>(selector);
/* adding some elements */
var handle = queue.Enqueue(server);
/*
* Server stats are kept along with handles (e.g. in a dictionary).
* Whenever there is a need of updating the priority of a certain
* server, the user simply updates the appropriate ServerStats object
* and then simply uses the handle associated with it:
*/
queue.Update(handle);
In the example above, because there was a specific selector defined, a user could update the object himself. The priority queue simply needed to be notified to rearrange itself (we don't want to make it observable).
Similarly to how the type of user data impacts the way a priority queue is build and filled with elements, the way it is updated also varies. I will make the scenarios shorter this time, as you already probably know what I am going to write.
TKey
and TValue
are separate objects.var queue = new PriorityQueue<int, string>();
var handle = queue.Enqueue(1, "three");
queue.Enqueue(1, "three");
queue.Enqueue(2, "three");
queue.Update(handle, 3);
As you can see, such an approach provides a simple way of referring to a unique element in the priority queue, no questions asked. It is especially helpful in scenarios where keys may be duplicated. Also, this is very efficient — we know the element to update in O(1), and the operation can be performed in O(log n).
Alternatively, the user could use additional methods that search through the entire structure in O(n) and then update the first element that matches the arguments. This is how removing an element is done in Java. It is neither fully correct nor efficient, but sometimes simpler:
var queue = new PriorityQueue<int, string>();
queue.Enqueue(1, "one");
queue.Enqueue(2, "three");
queue.Enqueue(3, "three");
queue.Update("three", 30);
The method above finds the first element with its value equal to "three"
. Its key will be updated to 30
. We don't know though which one will be updated if there is more than one that satisfies the condition.
It could be slightly safer with a method Update(TKey oldKey, TValue, TKey newKey)
. This adds another condition — the old key also has to match. Both solutions are simpler, but not 100% safe and less performant (O(1 + log n) vs O(n + log n)).
TKey
is contained within TValue
.var queue = new PriorityQueue<int, MyClass>(selector);
/* adding some elements */
queue.Update(handle);
This scenario is what was given as an example in the Handles section.
The above is achieved in O(log n). Alternatively, the user could use a method Update(TValue)
that finds the first element equal to the one specified and does an internal rearranging. Of course, this is O(n).
TKey
and TValue
are separate objects, but of the same type.Using a handle, there is no ambiguity, as always. In case of other methods that allow for updating — there is, of course. This is a trade-off between simplicity, performance, and correctness (which depends on whether data can be duplicated or not).
With a handle, there is again no issue. Updating via other methods may be simpler — but again, less performant and not always correct (equal entries).
Can be done simply and correctly via a handle. Alternatively, via methods Remove(TValue)
and Remove(TKey, TValue)
. Same issues as previously described.
var queue1 = new PriorityQueue<int, string>();
var queue2 = new PriorityQueue<int, string>();
/* add some elements to both */
queue1.Merge(queue2);
InvalidOperationException
.InvalidOperationException
.public class PriorityQueue<TKey, TValue>
: IEnumerable,
IEnumerable<PriorityQueueHandle<TKey, TValue>>,
IReadOnlyCollection<PriorityQueueHandle<TKey, TValue>>
// ICollection not included on purpose
{
public PriorityQueue();
public PriorityQueue(IComparer<TKey> comparer);
public PriorityQueue(Func<TValue, TKey> keySelector);
public PriorityQueue(Func<TValue, TKey> keySelector, IComparer<TKey> comparer);
IComparer<TKey> Comparer { get; }
Func<TValue, TKey> KeySelector { get; }
public int Count { get; }
public void Clear();
public bool Contains(PriorityQueueHandle<TKey, TValue> handle); // O(log n)
public bool Contains(TValue value); // O(n)
public bool Contains(TKey key, TValue value); // O(n)
public PriorityQueueHandle<TKey, TValue> Enqueue(TKey key, TValue value); // O(log n)
public PriorityQueueHandle<TKey, TValue> Enqueue(TValue value); // O(log n)
public PriorityQueueHandle<TKey, TValue> Peek(); // O(1)
public PriorityQueueHandle<TKey, TValue> Dequeue(); // O(log n)
public void Update(PriorityQueueHandle<TKey, TValue> handle); // O(log n)
public void Update(PriorityQueueHandle<TKey, TValue> handle, TKey newKey); // O(log n)
public void Update(TValue value, TKey newKey); // O(n)
public void Update(TKey oldKey, TValue value, TKey newKey); // O(n)
public void Remove(PriorityQueueHandle<TKey, TValue> handle); // O(log n)
public void Remove(TValue value); // O(n)
public void Remove(TKey key, TValue value); // O(n)
public void Merge(PriorityQueue<TKey, TValue> other); // O(1)
public IEnumerator<PriorityQueueHandle<TKey, TValue>> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
}
public class PriorityQueueHandle<TKey, TValue>
{
public TKey Key { get; }
public TValue Value { get; }
}
I am strongly for the approach with handles, because it is efficient, intuitive, and fully correct. The question is how to deal with simpler but potentially not that safe methods. One thing we could do is to replace those simpler methods with something like this:
UpdateFirst(Func<PriorityQueueHandle<TKey, TValue>, bool> predicate)
UpdateAll(Func<PriorityQueueHandle<TKey, TValue>, bool> predicate)
The usage would be pretty sweet and powerful. And truly intuitive. And readable (very expressible). I am for that one.
var queue = new PriorityQueue<int, string>();
/* add some elements */
queue.UpdateAll(x => x.Key < 15, 0);
// or for example
queue.UpdateFirst(x => x.Value.State == State.Idle, 100);
Setting the new key could potentially be a function as well, which takes the old key and transforms it somehow.
Same for Contains
and Remove
.
If the key selector is not defined (and thus key and value are kept separately), the method could be named UpdateKey
. It's probably more expressible. When the key selector is defined though, Update
is better, because the key is already updated and what is needed to be done is rearrangement of some elements within the priority queue.
There is no issue with efficiency in regards to using handles. A common fear is that it will require additional allocations, because we are using a heap based on an array internally. Fear not. Read on.
It would be a completely different approach to delivering a priority queue. For the first time, a standard library won't provide this functionality implemented as a binary heap, which is represented as an array underneath. It will be implemented with a pairing heap, which by design doesn't use an array — it simply represents a tree instead.
BTW, it is for the 1st API review iteration. The section with _open questions_ needs to be resolved (I need your opinion [and of API reviewers if possible]). If the 1st iteration passes at least partially, for the rest of the discussion I'd like to create a new issue (close and reference this one).
As far as I can tell, an interface or a heap will not pass the API review.
@pgolebiowski : Why not just close the issue then? Without an interface, this class is close to useless. The only place that I could use it is in private implementations. At which point I can just create (or re-use) my own priority queue when needed. I cannot expose a public API in my code with this type in the signature as it will break as soon as I need to swap it out for another implementation.
@bendono
Well, Microsoft has the final word here, not a few people commenting on the thread. And we know that:
I am somewhat sure that other API reviewers/architects will share it as I've checked for opinion with a few: the name should be PriorityQueue, and we should not introduce IHeap interface. There should be exactly one implementation (likely via some heap).
This is shared by @karelz, Software Engineer Manager, and @terrajobst, Program Manager and the owner of this repository + some API reviewers.
Although I obviously like the approach with interfaces, as clearly stated in the previous posts, I can see it is pretty tough given we don't have much power in this discussion. We have made our points, but we are just some commenters. The code does not belong to us anyway. What else can we do?
Why not just close the issue then? Without an interface, this class is close to useless.
I've done enough -- instead of hating my work, do something. Do the actual work. Why do you try blaming me for anything? Blame yourself for not succeeding in persuading others to your point of view.
And please, spare me rhetoric like that. It is really childish -- do better.
BTW, the proposal focuses on things that are common no matter if we introduce an interface or not, or whether the class is named PriorityQueue
or Heap
. So focus on what really matters here and show us some bias for action if you want something.
@pgolebiowski Of course it is Microsoft's decision. But it is better to present an API that you want to use that meets your needs. If it is rejected, then so be it. I just don't see the need to compromise the proposal.
I apologize if you interpreted my comments as blaming you. That was certainly not my intention.
@pgolebiowski why not use KeyValuePair<TKey,TValue>
for the handle?
@SamuelEnglard
Why not use
KeyValuePair<TKey,TValue>
for the handle?
PriorityQueueHandle
is in fact _pairing heap node_. It exposes two properties -- TKey Key
and TValue Value
. However, it has much more logic inside, which is just internal. Please refer to my implementation of this. It contains for example also pointers to other nodes in the tree (everything would be internal in CoreFX).KeyValuePair
is a struct, so it gets copied every time + it cannot be inherited.PriorityQueueHandle
is a rather complicated class that just happens to expose same public API as KeyValuePair
.@bendono
But it is better to present an API that you want to use that meets your needs. If it is rejected, then so be it. I just don't see the need to compromise the proposal.
TKey
and TValue
), this class is truly close to being useless then -- as it is now in Java.That was certainly not my intention.
Sorry, my mistake then.
My questions on the design (disclaimer: these questions & clarifications, no hard push backs (yet)):
PriorityQueueHandle
? What if we just expect unique values in the queue?Merge
? Do other collections have it? We should not add APIs, just because they are easy to implement, there should be some common-ish use case for the APIs.IEnumerable<KeyValuePair<TKey, TValue>>
would be sufficient? + rely on Linqcomparer
overload? Can we just always fall back to default? (disclaimer: I am missing knowledge/expertise in this case, so just asking)keySelector
overloads? I think we should decide if we want to have priority as part of the value, or a separate thing. Separate priority seems a bit more natural to me, but I don't have strong opinion. Do we know the pros vs. cons?Separate/parallel decision points:
PriorityQueue
vs. Heap
IHeap
and constructor overload?Introduce IHeap and constructor overload?
Seems like things have settled down enough for me to throw in my 2 cents.... I like the interface, personally. It abstracts the implementation details away from the API (and the core functionality described by that API) in a way that in my opinion simplifies the structure and enables the most usability.
What I don't have as strong of an opinion on is whether we do the interface at the same time as the PQueue/Heap/ILikeThisItemMoreThanThisItemList or if we add it later. The argument that the API may be "in flux" and as such we should release it as a class first until we get feedback is certainly a valid one that I don't disagree with. The question then becomes when it is considered "stable" enough to add an interface. Way above in the thread IList
and IDictionary
were mentioned as lagging behind their canonical implementations' APIs which we added a loooong time ago, so what period of time is considered an acceptable resting period?
If we can define that period with reasonable certainty and be sure it isn't unacceptably blocking, then I see no problem shipping this heapy data structure thing without an interface. Then after that period of time we can examine usage and consider adding an interface.
And yes, it is truly surprising for me that adding an interface is perceived as a disadvantage by Microsoft.
That's because in a lot of ways it is a disadvantage. If we ship an interface, it's pretty much done. There's not a lot of wiggle for room for iterating on that API, so we better be sure it's right the first time and will continue to be right for the many years to come. Missing some nice-to-have functionality is a way better place to be in than being stuck with a potentially inadequate interface where it would be oh-so-nice to have just this one little change that would make it all better.
Thank you for the input, @karelz and @ianhays!
Do we really need
PriorityQueueHandle
? What if we just expect unique values in the queue?Motivation: It seems rather involved concept. If we have it, I would like to understand why we need it? How is it helping? Or is it just specific implementation detail leaking into the API surface? Is it going to buy us that much perf to pay for the complication in the API?
No, we don't need that. The priority queue API proposed above is quite powerful and very flexible. It is based on an assumption that elements and priorities could be duplicated. Because of that assumption, having a handle is necessary to be able to remove or update the correct node. However, if we put a restriction that elements have to be unique, we could achieve the same result as above with a simpler API and without exposing an internal class (PriorityQueueHandle
), which is indeed not ideal.
Let's assume that we allow only unique elements. We could still support all the previous scenarios and keep optimal performance. Simpler API:
public class PriorityQueue<TElement, TPriority>
: IEnumerable,
IEnumerable<(TElement element, TPriority priority)>,
IReadOnlyCollection<(TElement element, TPriority priority)>
// ICollection not included on purpose
{
public PriorityQueue();
public PriorityQueue(IComparer<TPriority> comparer);
public PriorityQueue(Func<TElement, TPriority> prioritySelector);
public PriorityQueue(Func<TElement, TPriority> prioritySelector, IComparer<TPriority> comparer);
IComparer<TPriority> Comparer { get; }
Func<TElement, TPriority> PrioritySelector { get; }
public int Count { get; }
public void Clear();
public bool Contains(TElement element); // O(1)
public (TElement element, TPriority priority) Peek(); // O(1)
public (TElement element, TPriority priority) Dequeue(); // O(log n)
public void Enqueue(TElement element, TPriority priority); // O(log n)
public void Enqueue(TElement element); // O(log n)
public void Update(TElement element); // O(log n)
public void Update(TElement element, TPriority priority); // O(log n)
public void Remove(TElement element); // O(log n)
public IEnumerator<(TElement element, TPriority priority)> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
}
Shortly, there would be an underlying Dictionary<TElement, InternalNode>
. Testing whether the priority queue contains an element can be done even faster than in the previous approach. Updating and removing elements is significantly simplified, as we can always point to a direct element in the queue.
Probably allowing duplicates is not worth the hassle and the above is enough. I think I like it. What do you think?
Do we need
Merge
? Do other collections have it? We should not add APIs, just because they are easy to implement, there should be some common-ish use case for the APIs.
Agree. We don't need it. We can always add API but not remove it. I am OK with removing this method and (potentially) adding it later (if there is a need).
Do we need comparer overload? Can we just always fall back to default? (disclaimer: I am missing knowledge/expertise in this case, so just asking)
This I think is pretty important, for two reasons:
IComparable
.Plus, it is consistent with the existing API. Have a look at SortedDictionary.
Do we need keySelector overloads? I think we should decide if we want to have priority as part of the value, or a separate thing. Separate priority seems a bit more natural to me, but I don't have strong opinion. Do we know the pros vs. cons?
I also like separate priority. Apart from that, it is easier to implement this, better performance & memory usage, more intuitive API, less work for the user (no need to implement IComparable
).
Regarding the selector now...
This is what makes this priority queue flexible. It allows users to have:
IComparable
It allows pretty much every configuration I can think of. I find it useful, as users could just "plug n play". It is also rather intuitive.
Enqueue
and Update
method.Class name
PriorityQueue
vs.Heap
Introduce
IHeap
and constructor overload.
PriorityQueue
should be a part of CoreFX, rather than Heap
.IHeap
interface — because a priority queue can be implemented with something different than a heap, probably we would like not to expose it this way. However, we might need IPriorityQueue
.If we ship an interface, it's pretty much done. There's not a lot of wiggle for room for iterating on that API, so we better be sure it's right the first time and will continue to be right for the many years to come. Missing some nice-to-have functionality is a way better place to be in than being stuck with a potentially inadequate interface where it would be oh-so-nice to have just this one little change that would make it all better.
Totally agree!
If we add another assumption: that elements have to be comparable, then the API is even simpler. But again, it is less flexible.
IComparer
.Enqueue
and Update
method.IComparable
before they can use this priority queue.<TElement, InternalNode>
and have InternalNode
contain the TPriority
.public class PriorityQueue<T> : IEnumerable, IEnumerable<T>, IReadOnlyCollection<T>
where T : IComparable<T>
// ICollection not included on purpose
{
public PriorityQueue();
// some other constructors like building it from a collection, or initial capacity if we have an array beneath
public int Count { get; }
public void Clear();
public bool Contains(T element);
public T Peek();
public T Dequeue();
public void Enqueue(T element);
public void Update(T element);
public void Remove(T element);
public IEnumerator<T> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
}
Everything is a trade-off. We remove some features, lose some flexibility, but maybe we don't need that to satisfy 95% of our users.
I also like the interface approach because it is more flexible and gives more usages, but I also agree with @karelz and @ianhays that we should wait until the new class API is used and we get feedback until we actually ship the interface and then we are stocked with that version and can't change it without breaking customers that already used it.
Also about comparers I think it follows the other BCL apis and I don't like the fact of users needing to provide a new wrapper-class. I really like the constructor overload receiving a comparer approach and using that comparer inside all of the comparisons that need to be done internally, if not comparer provided or comparer is null, then use default comparer.
@pgolebiowski thanks for such detailed and descriptive API proposal and being so pro active and energetic on getting this API approved and added to CoreFX 👍 when this discussion is done and we consider it is ready for review I would merge all the inputs and final API surface into one comment and update the main issue comment at the top as that would make reviewers' life easier.
OK, I am convinced about comparer, it makes sense and it is consistent.
I am still torn on selector - IMO we should try to go without it - let's split it into 2 variants.
```c#
public class PriorityQueue
: IEnumerable,
IEnumerable<(TElement element, TPriority priority)>,
IReadOnlyCollection<(TElement element, TPriority priority)>
// ICollection not included on purpose
{
public PriorityQueue();
public PriorityQueue(IComparer
public IComparer<TPriority> Comparer { get; }
public int Count { get; }
public void Clear();
public bool Contains(TElement element); // O(1)
public (TElement element, TPriority priority) Peek(); // O(1)
public (TElement element, TPriority priority) Dequeue(); // O(log n)
public void Enqueue(TElement element, TPriority priority); // O(log n)
public void Update(TElement element, TPriority priority); // O(log n)
public void Remove(TElement element); // O(log n)
public IEnumerator<(TElement element, TPriority priority)> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
//
// Selector part
//
public PriorityQueue(Func
public PriorityQueue(Func
public Func<TElement, TPriority> PrioritySelector { get; }
public void Enqueue(TElement element); // O(log n)
public void Update(TElement element); // O(log n)
}
````
Open questions:
PriorityQueue
vs. Heap
IHeap
and constructor overload? (Should we wait for later?)IPriorityQueue
? (Should we wait for later - IDictionary
example)(TElement element, TPriority priority)
vs. KeyValuePair<TPriority, TElement>
Peek
and Dequeue
rather have out
argument instead of tuple?I'll try to run it by API review group tomorrow for early feedback.
Fixed Peek
and Dequeue
tuple field name to priority
(thanks @pgolebiowski for pointing it out).
Top-post updated with latest proposal above.
I second @safern's comment - Thanks a lot to @pgolebiowski for driving it forward!
Open questions:
I think we could have one more here:
Good point, captured as 'Assumptions' in top post, rather than open question. The opposite would uglyfy the API quite a bit.
Should we return bool
fromRemove
? And also the priority as out priority
arg? (I believe we added similar overloads recently into other data structures)
Similar to Contains
- I bet someone will want the priority out of it. We might want to add an overload with out priority
.
I remain of the opinion (though I'm yet to voice it) that Heap would imply an implementation, and would duly support calling this PriorityQueue
. (I'd even support then a proposal for a slimmer Heap
class which does allow duplicate elements, and doesn't allow updating, etc. (more in line with the original proposal) but I don't expect that to happen).
KeyValuePair<TPriority, TElement>
most certainly should not be used, as duplicate priorities are expected, and I think that KeyValuePair<TElement, TPriority>
is also confusing, so I would support not using KeyValuePair
at all, and either using plain tuples, or out parameters for the priorities (personally I like out params, but I'm not fussed).
If we disallow duplicates, we need to decide behaviour of attempting to re-add them with different/changed priorities.
I do not support the selector proposal, for the simple reason that it implies redundancy, and duly will recreate confusion. If the Priority of an Element in stored with it, then it will be stored in two places, and if they become out-of-sync (i.e. someone forgets to call the useless-looking Update(TElement)
method) then much suffering will ensure. If the selector is a computer, then we are open to people intentionally adding an element, then changing the values they are computed from, and now if they do try to re-add it, there is a wealth of things that could go wrong depending on the decision of what happens when this occurs. At a slightly higher level, trying though could result in adding a changed copy of the Element, as it is no longer equal to what it once was (this is a general issue with mutable keys, but I think that separating the Priority and Element will help to avoid potential issue).
The selector itself is liable to be change behaviour, which is another way that users can break everything without thinking. Much better, I think, to have users explicitly provide the priorities. The big issue I see with this, is that it conveys that duplicate entries are allowed, as we are declaring a pair, not an Element. However, sensible inline documentation and a bool
return value on Enqueue
should trivially remedy this. Better than a Boolean would perhaps be to return the old/new priority of the element (e.g. if Enqueue
uses the newly provided priority, or the minimum of the two, or the old priority), but I think that Enqueue
ought just to fail if you try to re-add something, and should therefore just return a bool
indicating success. This keeps Enqueue
and Update
completely separate and well defined.
I would support not using KeyValuePair at all, and either using plain tuples
I'm with you on tuples.
If we disallow duplicates, we need to decide behaviour of attempting to re-add them with different/changed priorities.
Dictionary
. There, when you do dictionary["something"] = 5
then it gets updated if "something"
was a key there previously. If it wasn't there, it just gets added.Enqueue
method is analogical for me to the Add
method in the dictionary. Which means it should throw an exception.Enqueue
method should just throw an exception if someone wants to add a duplicated element. Similarly, Update
method should throw an exception if someone wants to update the priority of an element that is not present.TryUpdate
method that indeed returns bool
.I do not support the selector proposal, for the simple reason that it implies redundancy
Isn't it that the key is not physically copied (if it exists), but the selector just remains a function that gets called when priorities need to be evaluated? Where is the redundancy?
I think that separating the Priority and Element will help to avoid potential issue
The only problem is the case when the customer doesn't have the physical priority. He cannot do much then.
Much better, I think, to have users explicitly provide the priorities. The big issue I see with this, is that it conveys that duplicate entries are allowed, as we are declaring a pair, not an Element.
I see some issues with that solution, but not necessarily why it conveys that duplicate entries are allowed. I think the "no-duplicates" logic should be applied on TElement
only -- the priority is just a value here.
var queue = new PriorityQueue<string, double>();
queue.Enqueue("first", 0.1);
queue.Enqueue("first", 0.5); // should be unsuccessful IMO
@VisualMelon does that make sense?
@pgolebiowski
I agree wholly with adding a TryUpdate
, and Update
throwing makes sense to me. I was thinking more along the lines of ISet
with regard to Enqueue
(rather than IDictionary
). Throwing would make sense also, I must have missed that point, but I do think the return bool
conveys the 'setness' of the type. Perhaps a TryEnqueue
would be in order also (with Add
throwing)? A TryRemove
would also be appreciated (fails if empty). Regarding your last point, yes, that is the behaviour I had in mind also. I suppose an analogy with IDictionary
is better than ISet
on reflection, and that ought to be clear enough. (To summarise: I'd support everything throwing per your suggestion, but having Try*
is a must if that is the case; I also agree with your statements regarding failure conditions).
Regarding the indexer, I think you're right that it doesn't really 'fit' the queue concept, I wouldn't support that. If anything, a well-named method to Queue or Update would be in order.
Isn't it that the key is not physically copied (if it exists), but the selector just remains a function that gets called when priorities need to be evaluated? Where is the redundancy?
You're right, I misread the proposal update (the bit about storing it in the Element). Taking it then that the Selector is called on demand (which would have to be made clear in any documentation, as it could have performance implications), the points still stand that result of the function can change without the data-structure responding, leaving the two out-of-sync unless Update
is called. Even worse, if a user changes the effective priority of multiple Elements, and only updates one of them, then the data structure will end up depending on 'uncommited' changes when they are selected from 'unupdated' Elements (I haven't looked into the proposed DataStructure implementation in detail, but I think this is necessarily an issue for any Update O(<n)
). Forcing the user to explicitly update any priorities remedies this, necessarily taking the data structure from one consistent state to another.
Note that all my stated gripes so far with the Selector proposal are about the robustness of the API: I think the selector makes it easy to use incorrectly. However, usability aside, conceptually, the Elements in the Queue shouldn't have to know their priority. It is potentially irrelevant to them, and if the user ends up wrapping their Elements in a struct Queable<T>{}
or something then that seems like a failure in providing a frictionless API (as much as I hurts me to use the term).
You could of course argue that without the selector, the burden is on the user to call the selector, but I think if the Elements know their priority, they will (hopefully) be exposed tidily (i.e. a property), and if they don't, then there is no selector to worry about (generating Priorities on the fly, etc.). There is a lot more meaningless plumbing required to support a selector if you don't want it, then to pass in the priorities if you already have a well-defined 'selector'. The selector entices the user to expose this information (perhaps unhelpfully), while separate priorities provide an extremely transparent interface which I can't imagine will influence design decisions.
The only argument I can really think of in favour of the selector is that it means you can use pass a PriorityQueue
around, and other parts of the program can use it without knowing how the priories are computed. I'll have to think about this a little longer, but given the 'niche' suitability, this strikes me as little reward for the reasonably heavy overhead in more general cases.
_Edit: Having though about it some more, it would rather be very nice to have self-contained PriorityQueue
s which you can just throw Elements at, but I maintain that the cost of working around this would be great, while the cost of introducing would be considerably less._
I've been doing a bit of work looking through open source .NET codebases to see real world examples of how priority queues are used. The tricky problem is filtering out any student projects and training source code.
Usage 1: Roslyn Notification Service
https://github.com/dotnet/roslyn/
The Roslyn compiler includes a private priority queue implementation called "PriorityQueue" that appears to have a very specific optimization in it - an object queue that to reuses objects in the queue to avoid them being garbage collected. The Enqueue_NoLock method performs an evaluation on current.Value.MinimumRunPointInMS < entry.Value.MinimumRunPointInMS to determine where in the queue to place the new node. Either of the two main priority queue designs proposed here (comparer function/delegate vs explicit priority) would fit this usage scenario.
Usage 2: Lucene.net
https://github.com/apache/lucenenet
This is arguably the largest usage example for a PriorityQueue I could find in .NET. Apache Lucene.net is a full .net port of the popular Lucene search engine library. We use the Java version at my company, and according to Apache's website a few big names use the .NET version. There's a large number of forks of the .NET project in Github.
Lucene includes its own PriorityQueue implementation that is subclassed by a number of "specialized" priority queues: HitQueue, TopOrdAndFloatQueue, PhraseQueue, and SuggestWordQueue. As well, the project directly instantiates PriorityQueue in a number of places.
Lucene's PriorityQueue implementation, linked to above, is very similar to the original priority queue API posted in this issue. It's defined as "PriorityQueue
Priority in Lucene's PriorityQueue is determined by the comparer passed into the constructor, and if none was passed, it assumes that the compared objects implement IComparable
There are a large number of usage examples through their code base, SloppyPhraseScorer being one of them.
SloppyPhraseScorer's constructor instantiates a new PhraseQueue (pq), which is one of their own custom subclasses of PriorityQueue. A collection of PhrasePositions is generated, which appears to be a wrapper for a set of postings, a position, and a set of terms. The FillQueue method enumerates over the phrase positions and enqueues them. PharseFreq() calls an AdvancePP function and, at a high level, appears to dequeue, update an item's priority, and then enqueue again. The priority is determined relatively (using a comparer) rather than explicitly (priority is not "passed in" as a second parameter during enqueue).
You can see that based on their implementation of PhraseQueue a comparison value passed in via the constructor (e.g. integer) might not cut it. Their comparison function ("LessThan") evaluates three different fields: PhrasePositions.doc, PhrasePositions.position, and PhrasePositions.offset.
Usage 3: Game Development
I haven't finished searching through usage examples in this space, but I saw quite a few examples of custom .NET PriorityQueue's being used in game development. In a very general sense these tended to be clustered around pathfinding as the main use case (Dijkstra's). You can find a lot of people asking how to implement pathfinding algorithms in .NET due to Unity 3D.
Still need to dig through this area; saw a few examples with explicit priority being enqueued and some examples using Comparer/IComparable.
On a separate note, there's been some discussion around unique elements, removing explicit elements, and determine whether a specific element exists.
Queues, as a data structure, generally support Enqueuing and Dequeuing. If we head down the path of providing other set/list-like operations, I wonder whether we're actually designing a different data structure altogether - something akin to a sorted list of tuples. If a caller has a need other than Enqueue, Dequeue, Peek, perhaps they need something other than a priority queue? Queue, by definition, implies insertion into a queue and ordered removal from the queue; not much else.
@ebickle
I appreciate your effort in going through other repositories and checking how the priority queue functionality was delivered there. However, IMO this discussion would benefit from providing specific design proposals. This thread is already hard to follow and telling a lengthy story without any conclusions makes it even harder.
Queues [...] generally support Enqueuing and Dequeuing. [...] If a caller has a need other than Enqueue, Dequeue, Peek, perhaps they need something other than a priority queue? Queue, by definition, implies insertion into a queue and ordered removal from the queue; not much else.
Great discussion. @ebickle's research is extremely useful IMO!
@ebickle do you have conclusion on [2] lucene.net - does our latest proposal fit the usages or not? (I hope I didn't miss it in your detailed description)
It looks like we need Try*
variants from above + IsEmpty
+ TryPeek
/TryDequeue
+ EnqueueOrUpdate
? Thoughts?
```c#
public bool IsEmpty();
public bool TryPeek(ref TElement element, ref TPriority priority); // false if empty
public bool TryDequeue(ref TElement element, ref TPriority priority); // false if empty
public bool TryEnqueue(TElement element, TPriority priority); // false if it is duplicate (doe NOT update it)
public void EnqueueOrUpdate(TElement element, TPriority priority); // TODO: Should return bool status for enqueued vs. updated?
public bool TryUpdate(TElement element, TPriority priority); // false if element does not exist (does NOT add it)
public bool TryRemove(TElement element); // false if element does not exist
```
@karelz
It looks like we need Try* variants from above
Yes, exactly.
Should return bool status for enqueued vs. updated?
If we want to return statuses everywhere, then everywhere. For this particular method, I think it should be true if either operation succeeded (Enqueue
or Update
).
For the rest -- I simply agree :smile:
Just one question -- why ref
instead of out
? I'm not sure:
If we want to return statuses everywhere, then everywhere. For this particular method, I think it should be true if either operation succeeded (
Enqueue
orUpdate
).
It would always return true, that's not a good idea. We should use void
in such case IMO. Otherwise people will be confused and will check return value and add useless never-to-be-executed code. (Unless I missed something)
ref vs. out agreed, I debated it myself. I don't have strong opinion / enough experience to make decision myself. We can ask API reviewers / wait for more comments.
It would always return true
You are right, my bad. Sorry.
I don't have strong opinion / enough experience to make decision myself.
Maybe I am missing something, but I find it pretty simple. If we use ref
, we are basically saying that Peek
and Dequeue
want to use somehow the TElement
and TPriority
passed to it (I mean read those fields). Which isn't really the case -- our methods are supposed to only assign values to those variables (and they are in fact required to do so by the compiler).
Top-post updated with my Try*
APIs
Added 2 open questions:
Peek
and Dequeue
throwing useful at all?TryPeek
and TryDequeue
- should use ref
or out
args?ref vs. out - You are right. I was optimizing for avoiding of initialization in case we return false. That was stupid & blindsided from me - premature optimization. I'll change it to out and remove the question.
6: I might be missing something, but if not throwing an exception, what should Peek
or Dequeue
do if the queue is empty? I suppose this starts to beg the question as to whether the data structure should accept null
(I'd prefer not, but no firm opinion). Even if we allow null
, Value Types have no null
(default
certainly doesn't count), so Peek
and Dequeue
have no way to convey a meaningless result, and duly I think must throw an exception (thus removing any concerns about out
parameters!). I see no reason not follow the example of the existing Queue.Dequeue
I'll just add that Dijkstra's (aka. Heuristic search without a Heuristic) shouldn't require any Priority changes. I've never wished to update the priority of anything, and I bash out heuristic searches all the time. (the point of the algorithm is that once you've explored a state you know that you've explored the best route to it, otherwise it risks being non-optimal, hence you could never improve the priority, and you certainly don't want to decrease it (i.e. consider a worse route to it). _Ignoring cases where you intentionally choose the heuristic such that it isn't optimal, in which case you can of course get a non-optimal result, but you'll still never update a priority_)
if not throwing an exception, what should Peek or Dequeue do if the queue is empty?
True.
thus removing any concerns about out parameters!
How? Well, the out
parameters are for the TryPeek
and TryDequeue
methods. The ones that throw an exception are Peek
and Dequeue
.
I'll just add that Dijkstra's shouldn't require any Priority changes.
I might be wrong, but to my knowledge, the Dijkstra's algorithm makes use of the DecreaseKey
operation. See for example this. Whether this is efficient or not is a different aspect. In fact, Fibonacci heap was designed in a way that achieves the DecreaseKey
operation asymptotically in O(1) (to improve Dijkstra).
But still, what is important in our discussion -- being able to update an element in a priority queue is very helpful and there are people who search for such a functionality (see previously linked questions on StackOverflow). I have used it myself a few times too.
Sorry, yes, I see the issue with out
params now, misreading things again. And it seems that a variant of Dijkstra's (a name which seems to be applied rather more broadly than I believed...) can be implemented perhaps more efficiently (if your queue already contains the Elements, presumably there is benefit for repeatable searches) with updateable priorities. That's what I get for using long words and names. Note that I'm not proposing we drop the Update
, it would just be nice also to have a slimmer Heap
or other (i.e. the original proposal) without the restrictions this capability imposes (as glorious a capability as I appreciate it is).
7: These should be out
. Any input could never have any meaning, so shouldn't exist (i.e. we shouldn't go with ref
). With out
params, we aren't going to improve on returning default
values. This is what Dictionary.TryGetValue
does, and I see no reason to do otherwise. _That said, you could treat the ref
as value or default, but if you don't have a meaningful default then it frustrates matters._
API review discussion:
Here are key raw notes:
ImmutableCollections
in the past (fast preview release cycle was key & helpful in shaping the APIs).MultiValueDictionary
which already is in CoreFxLab. TODO: Check if we have more candidates, we do not want to blog post each of them separately.TElement
from Peek
& Dequeue
, don't return TPriority
(users can use Try*
methods for that).Update
- user can Remove
and then Enqueue
item back with different priority.Remove
should remove just 1st found item (as List
does).IQueue
interface to abstract Queue
and PriorityQueue
(Peek
and Dequeue
returning just TElement
helps here)We had long discussion about selector - we are still not decided (may be required by IQueue
above?). Majority didn't like selector, but things may change in future API review meeting.
Other open questions were not discussed.
The latest proposal looks pretty good to me. My opinions/questions:
Peek
and Dequeue
used out
parameters for priority
, then they could also have overloads that don't return priority at all, which I think would simplify common usage. Though priority can also be ignored using a discard, which makes this less important.PriorityQueue<T>
? Or a set of static and extension methods working with PriorityQueue<T, T>
?priorityQueue.Select(t => t.element)
acceptable?Dictionary
on the element type, should there be an option to pass an IEqualityComparer<TElement>
?Dictionary
is unnecessary overhead if I never need to update priorities. Should there be an option to disable it? Though this can be added later, if it turns out that it's useful.@karelz
Stability (for same priorities) - items with same priorities should be returned in the order they were inserted into the queue (general queue behavior)
I think this would mean that internally, priority would have to be something like a pair of (priority, version)
, where version
is incremented for every addition. I think that would significantly bloat memory usage of the priority queue, especially considering that the version
would likely have to be 64 bits. I'm not sure that would be worth it.
We do not want to prevent duplicate values (similar to other collections):
Get rid ofUpdate
- user canRemove
and thenEnqueue
item back with different priority.
Depending on the implementation, Update
is likely going to be much more efficient than Remove
followed by Enqueue
. For example, with binary heap (I think quaternary heap has the same time complexities), Update
(with unique values and a dictionary) is O(log n), while Remove
(with duplicate values and no dictionary) is O(n).
@pgolebiowski
Completely agree we need to stay focused on proposals here; I made the original API proposal and original post. Back in January @karelz asked for some specific usage examples; that opened up a much broader question regarding the specific need and usage of a PriorityQueue API. I had my own PriorityQueue implementation and used it in a few projects and felt something similar would be useful in the BCL.
What I lacked in the original post was a broad survey of existing queue usage; real world examples will help keep the design grounded and ensure it can be broadly used.
@karelz
Lucene.net contains at least one priority queue comparison function (similar to Comparison
Proposal:
1) PriorityQueue
2) PriorityQueue
Both types support duplicate elements. Need to determine whether or not we guarantee FIFO for elements of the same priority; likely not if we support priority updates.
Build both variants in the CoreFxLabs as @karelz suggested and solicit feedback.
Questions:
public void Enqueue(TElement element, TPriority priority); // Throws if it is duplicate`
public bool TryEnqueue(TElement element, TPriority priority); // Returns false if it is duplicate (does NOT update it)
Why no duplicates?
public (TElement element, TPriority priority) Peek(); // Throws if empty
public (TElement element, TPriority priority) Dequeue(); // Throws if empty
public void Remove(TElement element); // Throws if element does not exist
If these methods are desirable, have them as extension methods?
public void Enqueue(TElement element);
public void Update(TElement element);
Should throw; but why no Try variant, also extension methods?
Struct based enumerator?
@benaadams throw on dupes was originally suggested (by me) to avoid the ugly handle type. Latest update from API review reverts that.
We should not add methods as extension methods when we can add them on the type. Extension methods are backup if we can't add them on the type (e.g. it is interface, or we want to deliver it to .NET Framework faster).
Dupes: If you have multiple entries that are the same you only need to update one? Then they become different?
Throwing methods: basically are wrappers and they will be?
public void Remove(TElement element)
{
if (!TryRemove(element))
{
throw new Exception();
}
}
Though its control flow via exceptions?
@benaadams check my reply with API review notes: https://github.com/dotnet/corefx/issues/574#issuecomment-308206064
Update
- user can Remove
and then Enqueue
item back with different priority.Remove
should remove just 1st found item (as List
does).Though its control flow via exceptions?
Not sure what you mean. It seems like a pretty common pattern in BCL.
Though its control flow via exceptions?
Not sure what you mean.
If you just have TryX methods
No exception involvement; but the name encourages you to make a decision to throw away the return.
If you have non-Try exception throwing methods
So you are in the anti-pattern of using exception handling for flow-control. They just seem a bit superfluous; can always just add a throw if false to recreate.
It seems like a pretty common pattern in BCL.
The TryX methods weren't a common pattern in the original BCL; until the Concurrent types (though think Dictionary.TryGetValue
in 2.0 may have been first example?)
e.g. the TryX methods have only just been added to Core for Queue and Stack and aren't part of Framework yet.
IComparer
. If the customer wants to have stability, they can easily add it themselves. It would be easy to build StablePriorityQueue
on top of our implementation — using the ordinary approach with storing the "age" of an element and using it during comparisons.There are API conflicts then. Let's now consider a simple IQueue<T>
so that it works with the existing Queue<T>
:
public interface IQueue<T> :
IEnumerable,
IEnumerable<T>,
IReadOnlyCollection<T>
{
void Enqueue(T element);
T Peek();
T Dequeue();
bool TryPeek(out T element);
bool TryDequeue(out T element);
}
We have two options for the priority queue:
IQueue<TElement>
.IQueue<(TElement, TPriority)>
.I will write implications of both paths using simply numbers (1) and (2).
In the priority queue, we need to enqueue both an element and its priority (current solution). In the ordinary heap, we add just one element.
Enqueue(TElement)
, which is pretty strange for a priority queue if we are inserting an element without a priority. We are then forced to do... what? Assume default(TPriority)
? Nah...Enqueue((TElement, TPriority) element)
. We essentially add two arguments by accepting a tuple created out of them. Not ideal API probably.You wanted those methods to return TElement
only.
(TElement, TPriority)
.We could mitigate some of the issues by providing PriorityQueue<T>
, where there is no physical separate priority.
IComparable
. Which adds lots of fairly boilerplate code (and a new file in their source code, most likely).If we deliver two priority queues, the overall solution would be more powerful and flexible. There are a few notes to take:
PriorityQueue<T>
could potentially implement IQueue<T>
.PriorityQueue<TElement, TPriority>
would expose a strange API if it tries to implement the IQueue
interface.Well… it would work. Not ideal though.
Given the assumption that we want to allow duplicates and we don't want to use the concept of a handle:
This basically leads to what is in Java, where users have to provide their own implementation of the entire data structure if they want to be able to update priorities in their priority queues without naively iterating through the entire collection every time.
With the assumption that we don't want to add a new handle type, we can do it differently to be able to have the full support for updating and removing elements (plus do it efficiently). The solution is to add alternative methods:
void Enqueue(TElement element, TPriority priority, out object handle);
void Update(object handle, TPriority priority);
void Remove(object handle);
These would be methods for those who want to have a proper control on updating and removing elements (and don't do it in O(n), as if it was a List
:stuck_out_tongue_winking_eye:).
But better though, let's consider...
Alternatively, we can remove this functionality from the priority queue entirely and add it in the heap instead. This has numerous benefits:
Heap
.PriorityQueue
-- for people who want their code to just work.PriorityQueue
stable now -- it's not a trade-off anymore.Heap
. They would also be aware of the limitations of the PriorityQueue
and thus could use the Heap
instead (for example if they want to have more control over updating/removing elements or don't want the data structure to be stable at the expense of speed and memory usage).IQueue
interface -- because the power and functionality would be thrown into the heap field. The API of PriorityQueue
could be focused on making it compatible with Queue
through an abstraction.IPriorityQueue
interface anymore (we rather focus on keeping the functionality of PriorityQueue
and Queue
similar). Instead, we can add it in the heap area -- and essentially have IHeap
there. This is great, because it allows people to build third-party libraries on top of what is in the standard library. And it feels right -- because again, we consider heaps to be more advanced than priority queues, so extensions would be provided by that area. Such extensions would also not suffer from the choices we would do in PriorityQueue
, because it would be separate.IHeap
constructor for the PriorityQueue
anymore.Heap
at our command!Basically, the priority queue would be focused primarily on ease of use, at the expense of: performance, power, and flexibility. The heap would be for those who are aware of the performance and functional implications of our decisions in the priority queue. We mitigate many issues with trade-offs.
If we are for making an experiment, I think it is possible now. Let's just ask the community. Not everyone reads this thread -- we might generate other valuable comments and usage scenarios. What do you think? Personally, I would love such a solution. I think it would make everyone happy.
Important note: if we want such an approach, we need to design priority queue and heap together. Because their purposes would be different and one solution would provide what the other one would not.
With the approach presented above, in order for the priority queue to implement IQueue<T>
(so that it makes sense), and assuming that we drop the support for the selector there, it would have to have one generic type. Although that would mean users need to provide a wrapper if they have (user data, priority)
separtely, such a solution is still intuitive. And most importantly, it allows all input formats (that's why it has to be done this way if we drop the selector). Without the selector, Enqueue(TElement, TPriority)
would not allow types that are already comparable. A single generic type is also crucial for enumeration -- so that this method can be included in IQueue<T>
.
@svick
Is it defined in what order are the elements returned when enumerating? My guess is that it's undefined, to make it efficient.
To have the order while enumerating we essentially need to sort the collection. That's why yes, it should be undefined, as the customer can simply run OrderBy
themselves and achieve about the same performance (but some people don't need it).
Idea: in the priority queue this could be ordered, in the heap unordered. It feels better. Somehow a priority queue feels like iterating it in order. A heap definitely not. Another benefit of the approach above.
@pgolebiowski
That all seems very sane. Would you mind clarifying, in the Delivering IQueue then
section, are you suggesting T : IComparable<T>
for the "one generic type" as an alternative to the selector given the allowance of duplicate elements?
I would support having the two separate types.
I don't understand the rationale behind using object
as the handle type: is this just to avoid creating a new type? Defining a new type would provide minimal implementation overhead, while making the API harder to misuse (what's stopping me trying to pass a string
to Remove(object)
?), and easier to use (what's stopping me trying to pass the Element itself to Remove(object)
, and who could blame me for trying?).
I propose the additional of a suitably named dummy type, to replace object
in the handle methods, in the interests of a more expressive interface.
If debugability trumps memory overhead, the handle type could even include information about which queue it belongs to (would have to become Generic, which would further the type safety of the interface), providing useful exceptions along the lines of ("Handle supplied was created by a different Queue"), or whether it has already been consumed ("Element referenced by Handle has already been removed").
If the handle idea goes ahead, I would propose that if this information is deemed useful, then a subset of these exception would also be thrown by the corresponding TryRemove
and TryUpdate
methods, excepting that where the element is no longer present, either because it has been dequeued or removed by handle. This would entail a less boring, generic, suitably named handle type.
@VisualMelon
Would you mind clarifying, in the
Delivering IQueue then
section, are you suggestingT : IComparable<T>
for the "one generic type" as an alternative to the selector given the allowance of duplicate elements?
Sorry for not making this clear.
PriorityQueue<T>
, with no constraint on the T
.IComparer<T>
.T
is already comparable, then simply Comparer<T>.Default
would be assumed (and you can call the default constructor of the priority queue with no arguments).The selector had a different purpose — to be able to consume all types of user data. There are several configurations:
The rare case would not be possible in PriorityQueue<T>
, but that doesn't matter much. What is important is that we can now handle (1), (2), and (3). However, if we had two generic types, we would have to have a method like Enqueue(TElement, TPrioriity)
. That would limit us to (1) only. (2) would lead to redundancy. (3) would be incredibly ugly. There is a bit more on this in the IQueue > Enqueue
section above (second Enqueue
method and default(TPriority
).
I hope it is clearer now.
BTW, assuming such a solution, designing the API of PriorityQueue<T>
and IQueue<T>
would be trivial. Just take some of the methods in Queue<T>
, throw them into IQueue<T>
and make PriorityQueue<T>
implement them. Tadaa! 😄
I don't understand the rationale behind using
object
as the handle type: is this just to avoid creating a new type?
PriorityQueue<T>
reside in System.Collections.Generic
, while the heap functionality would be in System.Collections.Specialized
. There, we would have a higher chance of introducing such a type, eventually having the wonderful compile-time error detection.If debugability trumps memory overhead, the handle type could even include information about which queue it belongs to (would have to become Generic, which would further the type safety of the interface), providing useful exceptions along the lines of ("Handle supplied was created by a different Queue"), or whether it has already been consumed ("Element referenced by Handle has already been removed").
If the handle idea goes ahead, I would propose that if this information is deemed useful...
Definitely more is possible then :wink:. Especially to extend all of that by our community in third-party libraries.
I think the idea of creating two class makes a lot of sense and solves a lot of issues. Only though I have is that it might make sense for PriorityQueue<T>
to use the Heap<T>
internally and just "hide" it's advanced functionality.
So basically...
IQueue<T>
public interface IQueue<T> :
IEnumerable,
IEnumerable<T>,
IReadOnlyCollection<T>
{
int Count { get; }
void Clear();
bool Contains(T element);
bool IsEmpty();
void Enqueue(T element);
T Peek();
T Dequeue();
bool TryPeek(out T element);
bool TryDequeue(out T element);
}
System.Collections.Generic
.Remove
and TryRemove
). It is not in Queue<T>
though. But it's not necessary.PriorityQueue<T>
public class PriorityQueue<T> : IQueue<T>
{
public PriorityQueue();
public PriorityQueue(IComparer<T> comparer);
public PriorityQueue(IEnumerable<T> collection);
public PriorityQueue(IEnumerable<T> collection, IComparer<T> comparer);
public IComparer<T> Comparer { get; }
public int Count { get; }
public bool IsEmpty();
public void Clear();
public bool Contains(T element);
public void Enqueue(T element);
public T Peek();
public T Dequeue();
public bool TryPeek(out T element);
public bool TryDequeue(out T element);
public void Remove(T element);
public bool TryRemove(T element);
public IEnumerator<T> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
}
System.Collections.Generic
.IComparer<T>
is not delivered, Comparer<T>.Default
is summoned.Remove
and TryRemove
remove only the first occurence (if they find it).Not writing everything here now, but:
System.Collections.Specialized
.Agree with the IQueue
The spec for IQueue
Consider adding "int Count { get; }" to IQueue
On the fence regarding TryPeek, TryDequeue in IQueue
IsEmpty seems like an outlier; not many other collection types in the BCL have it. To add it to the interface we'd have to assume it's going to be added to Queue
Drop TryRemove and change Remove to "bool Remove". Keeping in alignment with the other collection classes will be important here - developers will have a lot of muscle-memory that says "remove() in a collection doesn't throw". It's an area many developers won't test well and will cause a lot of surprises if the normal behavior is changed.
From your earlier quote @pgolebiowski
- User data is separate from the priority (two physical instances).
- User data contains the priority.
- User data is the priority.
- Rare case: priority is obtainable via some other logic (resides in an object different from the user data).
Also recommend considering 5. User data contains the priority in multiple fields (as we saw in Lucene.net)
On the fence regarding TryPeek, TryDequeue in IQueue considering they aren't in Queue
They are System/Collections/Generic/Queue.cs#L253-L295
On the other hand Queue doesn't have
c#
public void Remove(T element);
public bool TryRemove(T element);
Consider adding "int Count { get; }" to IQueue so that it's clear Count is desired regardless of whether or not we inherit from IReadOnlyCollection.
OK. Will modify it.
IsEmpty seems like an outlier; not many other collection types in the BCL have it.
This one was added by @karelz, I just copied it. I like it though, might be considered in the API review :)
Drop TryRemove and change Remove to "bool Remove".
I think Remove
and TryRemove
is consistent with other methods like that (Peek
and TryPeek
or Dequeue
and TryDequeue
).
- User data contains the priority in multiple fields
It's a valid point, but this in fact can also be handled with a selector (it's any function after all) -- but that's for heaps.
IsEmpty seems like an outlier; not many other collection types in the BCL have it.
FWIW, bool IsEmpty { get; }
is something I wish we'd added to IProducerConsumerCollection<T>
, and I've regretted it not being there on multiple occasions since. Without it, wrappers often need to do the equivalent of Count == 0
, which for some collections is significantly less efficient to implement, in particular on most of the concurrent collections.
@pgolebiowski How do you feel about the idea of creating a github repository to host the current API contracts and an .md file or two containing the design rationale. Once that stabalizes, it can be used as a place to build out the initial implementation before doing a PR up to CoreFxLabs when ready?
@svick
If
Peek
andDequeue
usedout
parameters for priority, then they could also have overloads that don't return priority at all, which I think would simplify common usage
Agreed. Let's add them.
Should there be some way to enumerate just the items in the priority queue, ignoring priorities?
Good question. What would you propose? IEnumerable<TElement> Elements { get; }
?
Stability - I think this would mean that internally, priority would have to be something like a pair of
(priority, version)
I think we can avoid that by considering Update
as logical Remove
+Enqueue
. We would add items always to the end of same-priority items (basically consider comparer result 0 as -1). IMO that should work.
@benaadams
The TryX methods weren't a common pattern in the original BCL; until the Concurrent types (though think
Dictionary.TryGetValue
in 2.0 may have been first example?)
e.g. the TryX methods have only just been added to Core for Queue and Stack and aren't part of Framework yet.
I admit I am still new-ish to BCL. From API review meetings and from the fact we added bunch of Try*
methods recently I was under the impression it is a common pattern for much longer 😉.
Either way, it is common pattern now and we should not be afraid to use it. The fact the pattern is not yet in .NET Framework should not stop us innovate in .NET Core -- that is its main purpose, to innovate faster.
@pgolebiowski
Alternatively, we can remove this functionality from the priority queue entirely and add it in the heap instead. This has numerous benefits
Hmm, something tells me you might have an agenda here 😆
Now seriously, it is actually good direction we were aiming for all the time - PriorityQueue
was never meant to be a reason for NOT doing Heap
. If we are all ok with the fact that Heap
might not make it into CoreFX and might stay "just" in CoreFXExtensions repo as advanced data structure alongside PowerCollections, I am fine with that,
Important note: if we want such an approach, we need to design priority queue and heap together. Because their purposes would be different and one solution would provide what the other one would not.
I don't see why we need to do it together. IMO we can focus on PriorityQueue
and add "proper" Heap
in parallel/later. I don't mind if someone does them together, but I don't see any strong reason why existence of easy-to-use PriorityQueue
should affect design of "proper/high-perf" advanced Heap
family.
IQueue
Thanks for the write up. Given your points, I don't think we should go out of our way to add IQueue
. IMO it is nice-to-have. If we had a selector, then it would be natural. However, I don't like the selector approach as it brings weirdness and complication in describing when the selector is called by the PriorityQueue
(only upon Enqueue
and Update
.
Alternative (object) handle
That is actually not a bad idea (although a bit ugly) to have such overloads IMO. We would have to be able to detect that the handle is from wrong PriorityQueue
, which is O(log n).
I have a feeling that API reviewers will reject it, but IMO it is worth a shot to experiment with it ...
Stability
I don't think stability comes with any perf/memory overhead (assuming we dropped Update
already or that we treat Update
as logical Remove
+Enqueue
, so we basically reset element's age). Just treat comparer result 0 as -1 and everything is good ... Or am I missing something?
IQueue<T>
It might be a good idea to have 2 proposals (and we may potentially take both):
PriorityQueue<T,U>
without selector and without IQueue
(which would be cumbersome)PriorityQueue<T>
with selector and IQueue
Quite a few folks hinted that above.
Re: Latest proposal from @pgolebiowski - https://github.com/dotnet/corefx/issues/574#issuecomment-308427321
IQueue<T>
- we could addRemove
andTryRemove
, butQueue<T>
doesn't have them.
Would they make sense to be added to Queue<T>
? (@ebickle agrees) If yes, we should bundle the addition as well.
Let's add it in the interface and call out they are questionable / need Queue<T>
addition as well.
Same for IsEmpty
-- whatever needs Queue<T>
and Stack<T>
additions, let's mark it so in the interface (it will be easier to review & digest).
@pgolebiowski can you please add comment to IQueue<T>
with list of classes we think it will be implemented by?
PriorityQueue<T>
Let's add struct enumerator (I think it is common BCL pattern lately). It was called out couple of times on the thread, then dropped/forgotten.
Heaps
Namespace choice: Let's not waste time on namespace decision yet. If Heap
ends up in CoreFxExtensions, we do not know yet what kind of namespaces we will allow there. Maybe Community.*
or something like that. It depends on CoreFxExtensions purpose/operation mode discussions result.
Note: One idea for CoreFxExtensions repo is to give proven community members Write permissions, and let it be driven primarily by community with .NET team providing just advice (incl. API review expertise) & .NET team/MS being the arbiter when needed. If that's where we land, we would likely not want it in System.*
or Microsoft.*
namespace. (Disclaimer: early thinking, please don't jump to conclusions yet, there are other alternative governance ideas in flight)
Drop
TryRemove
and changeRemove
tobool Remove
. Keeping in alignment with the other collection classes will be important here - developers will have a lot of muscle-memory that says "remove() in a collection doesn't throw". It's an area many developers won't test well and will cause a lot of surprises if the normal behavior is changed.
👍 We should definitely at least consider aligning it with other collections. Can someone scan other collections what is their Remove
pattern?
@ebickle How do you feel about the idea of creating a github repository to host the current API contracts and an .md file or two containing the design rationale.
Let's host it directly in CoreFxLabs. 👍
@karelz
Should there be some way to enumerate just the items in the priority queue, ignoring priorities?
Good question. What would you propose?
IEnumerable<TElement> Elements { get; }
?
Yeah, either that or a property returning some kind of ElementsCollection
is probably the best option. Especially since it's similar to Dictionary<K, V>.Values
.
We would add items always to the end of same-priority items (basically consider comparer result 0 as -1). IMO that should work.
That's not how heaps work, there is no "end of same-priority items". Items with the same priority can be spread all over the heap (unless they're close to being the current minimum).
Or, put another way, to maintain stability, you have to consider comparer result of 0 to be -1 not just when inserting, but also when items are moved around in the heap afterwards. And I think you have to store something like version
along with the priority
to do that properly.
IQueue
- we could add Remove and TryRemove, but Queue doesn't have them.
Would they make sense to be added to Queue
? (@ebickle agrees) If yes, we should bundle the addition as well.
I don't think Remove
should be added to IQueue<T>
; and I'd even suggest Contains
is dodgy; it limits the usefulness of the interface and what types of Queue it can be used for unless you start also throwing NotSupportedExceptions. i.e. what is the scope?
Is it just for vanilla queues, message queues, distributed queues, ServiceBus queues, Azure Storage Queues, ServiceFabric ReliableQueues etc... (glossing over some of them preferring async methods)
That's not how heaps work, there is no "end of same-priority items". Items with the same priority can be spread all over the heap (unless they're close to being the current minimum).
Fair point. I was thinking about it as binary search tree. Gotta brush my d.s. basics I guess :)
Well, either we implement it with different d.s. (array, binary search tree (or whatever the official name is) - @stephentoub has ideas/suggestions), or we settle for random order. I don't think maintaining version
or age
field is worth the guarantee of stability.
@ebickle How do you feel about the idea of creating a github repository to host the current API contracts and an .md file or two containing the design rationale.
@karelz Let's host it directly in CoreFxLabs.
Please have a look at this document.
It might be a good idea to have 2 proposals (and we may potentially take both):
- PriorityQueue
without selector and without IQueue (which would be cumbersome) - PriorityQueue
with selector and IQueue
I got rid of the selector altogether. It is necessary only if we want to have one unified PriorityQueue<T, U>
. If we have two classes -- well, a comparer is enough.
Please let me know if you find anything worth adding / changing / removing.
@pgolebiowski
Document looks good. Starting to feel like a solid solution I'd expect to see in the .NET core libraries :)
Should add a nested Enumerator class. Not sure if the situation has changed, but years ago having the child structure avoided a garbage collector allocation caused by boxing the result of GetEnumerator(). See https://github.com/dotnet/coreclr/issues/1579 for example.
public class PriorityQueue<T> // ...
{
public struct Enumerator : IEnumerator<T>
{
public T Current { get; }
object IEnumerator.Current { get; }
public bool MoveNext();
public void Reset();
public void Dispose();
}
}
PriorityQueue
I'm still voting to have "public bool Remove(T element)" without the TryRemove since it's a long-standing pattern and changing it makes developer mistakes highly likely. We can let the API Review crew chime in, but it's an open question in my mind.
Is there any value in specifying the initial capacity in the constructor or having a TrimExcess function, or is that a micro-optimization right now - especially considering the IEnumerable
I agree with bool Remove(T element)
and no TryRemove
. Cf. https://github.com/dotnet/corefx/blob/master/src/System.Collections/src/System/Collections/Generic/CollectionExtensions.cs#L41
@pgolebiowski thanks for putting it into a document. Can you please submit your doc as PR against CoreFxLab master branch? Tag @ianhays @safern they will be able to merge the PRs.
We can then use CoreFxLab issues for further discussions about particular design points - it should be easier than this mega-issue (I just kicked off email to create area-PriorityQueue there).
Please put a link to the CoreFxLab pull request here?
If the API is approved in its current form (two classes), I would create another PR to CoreFXLab with an implementation for both using a quaternary heap (see implementation). The PriorityQueue<T>
would just use one array beneath, while PriorityQueue<TElement, TPriority>
would use two -- and rearrange their elements together. This could spare us additional allocations and be as efficient as we can. I'll add that once there is green light.
Is there any plan of making thread-safe version like ConcurrentQueue?
I'd :heart: to see a concurrent version of this, implementing IProducerConsumerCollection<T>
, so it could be used with BlockingCollection<T>
etc.
@aobatact @khellang It sounds like a completely different thread:
I agree it is very valuable though :wink:!
public bool IsEmpty();
Why is this a method? Every other collection on the framework that has IsEmpty
has it as a property.
I updated the proposal in top-post with IsEmpty { get; }
.
I have also added link to the latest proposal doc in corefxlab repo.
Hi all,
I think there are a lot of voices arguing it would be good to support update if possible. I think nobody ended up doubting it is a useful feature for graph searches.
But there were some objections raised here and there. So I can check my understanding it seems like the main objections are:
-it is not clear how update should work when there are duplicate elements.
-there is some argument about whether duplicate elements are even desirable to support in a priority queue
-there is some concern that it might be inefficient to have an additional lookup data structure for finding elements in the backing data structure just to update their priority. Especially for scenarios where update is never performed! And/or affect worst-case performance guarantees...
Anything I missed?
OK I guess one more was - it's been claimed that update/remove semantically meaning 'updateFirst' or 'removeFirst' might be weird. :)
In which version of .Net Core that we can begin to use PriorityQueue?
@memoryfraction .NET Core itself does not have a PriorityQueue yet (there are proposals - but we have not had time to spend on it recently). There is no reason why it has to be in the Microsoft .NET Core distribution however. Anyone in the community can put one on NuGet. Perhaps someone on this issue can suggest one.
@memoryfraction .NET Core itself does not have a PriorityQueue yet (there are proposals - but we have not had time to spend on it recently). There is no reason why it has to be in the Microsoft .NET Core distribution however. Anyone in the community can put one on NuGet. Perhaps someone on this issue can suggest one.
Thanks for your reply and suggestion.
When I use C# to practice leetcode, and need to use SortedSet to handle set with duplicate elements. I have to wrap the element with a unique ID to solve the issue.
So, I prefer to have the Priority Queue in .NET Core in the future because it will be more convenient.
What's the current state of this issue? I've just found myself recently requiring a PriorityQueue
This proposal does not enable collection initialization by default. PriorityQueue<T>
does not have an Add
method. I think collection initialization is a big enough language feature to warrant the addition of a duplicate method.
If someone has a fast binary heap they'd like to share, I'd like to compare it with what I wrote.
I fully implemented the API as proposed. However, rather than a heap, it uses a simple sorted array. I keep the lowest value item at the end so it's sorted in the reverse of the usual order. When the number of items is less than 32 I use a simple linear search to find where to insert new values. After that, I used a binary search. Using random data I found this approach to be faster than a binary heap in the simple case of filling the queue and then draining it.
If I have the time, I'll put it in a public git repo so people can critique it and update this comment with the location.
@SunnyWar I'm currently using: https://github.com/BlueRaja/High-Speed-Priority-Queue-for-C-Sharp
which has good performance. I think it would make sense to test your implementation against the GenericPriorityQueue there
Please disregard my earlier post. I found an error in the test. Once fixed the binary heap does perform the best.
@karelz Where is this proposal currently at on ICollection<T>
? I didn't get why that was not going to be supported, even though the collection is obviously not intended to be readonly?
@karelz re the open question of dequeue order: I argue in favor of lowest priority value is removed first. It is convenient for shortest-path algorithms and is natural for timer queue implementations too. And those are the main two scenarios I think it'll be used for. I'm not really aware of what the opposing argument would be, unless its a linguistic one related to 'high' priorities, and 'high' numeric values.
@karelz re the enumeration order... how about having a Sort()
method on the type which will put the collection's internal data heap into sorted order in-place (in O (n log n)). And calling Enumerate()
after Sort()
enumerates the collection in O(n). And if Sort() is NOT called, it returns elements in non-sorted order in O(n)?
Triage: moving to future as there doesn't seem to be consensus on whether we should implement this.
That’s very disappointing to hear. I still have to use thirdparty solutions for this.
What are the primary reasons you don't like using a third party solution?
heap data structure is a MUST for doing leetcode
more leetcode, more c# code interview which means more c# developers.
more developers means better ecosystem.
better ecosystem means that if we can still program in c# tomorrow.
in sum: this is not only a feature, but also the future. that is why is issue is labeled 'future'.
What are the primary reasons you don't like using a third party solution?
Because when there isn't a standard, everyone invents their own, each with its own set of quirks.
There have been plenty of times when I've wished for a System.Collections.Generic.PriorityQueue<T>
because it's relevant to something I'm working on. This proposal's been around for 5 years now. Why has it still not happened?
Why has it still not happened?
Priority starvation, no pun intended. Designing collections is work because if we put them in the BCL we have to think about how they exchange, what interfaces they implement, what the semantics are etc. None of this is rocket science, but it takes a while. Furthermore, you need to a set of concrete user cases and customers to judge the design against, which gets harder as the collections get more and more specialized. So far, there has always been other work that was considered more important than that.
Let's look at a recent example: immutable collections. They were designed by someone not working on the BCL team for a use case in VS that was around immutability. We worked with him to get the APIs "BCL-ified". And when Roslyn came online, we replaced their copies with ours in as many places as we could and we tweaked the design (and implementation) a lot based on their feedback. Without a "hero" scenario this is hard.
Because when there isn't a standard, everyone invents their own, each with its own set of quirks.
@masonwheeler is this what you have seen for PriorityQueue<T>
? That there are several 3rd party options that are not interchangeable and no clearly accepted "best library for most"? (I have not done the research so I don't know the answer)
@eiriktsarpalis How is there not consensus on whether to implement?
@terrajobst Do you really need a hero scenario when this is a well-known pattern with well-known applications? Very little innovation should be required here. There's even a pretty fully-developed interface spec already. In my opinion the real reason this utility doesn't exist is not that its too hard, nor there's no demand nor use case for it. The real reason is that for any real software project its just easier to just build one yourself than to try to wage a political campaign for this to be put in the framework libraries.
What are the primary reasons you don't like using a third party solution?
@terrajobst
I’ve personally made the experience that 3rdparty solutions don’t always perform as expected/ don’t use current language features. With a standardized version i (as a user) could be quite certain that performance is the best you can get.
@danmosemsft
@masonwheeler is this what you have seen for
PriorityQueue<T>
? That there are several 3rd party options that are not interchangeable and no clearly accepted "best library for most"? (I have not done the research so I don't know the answer)
Yes. Just google "C# priority queue"; the first page is full of:
@terrajobst Do you really need a hero scenario when this is a well-known pattern with well-known applications? Very little innovation should be required here.
In my experience yes. The devil's in the details and once we shipped an API we can't really make breaking changes. And there are many implementation choices that are user visible. We can preview the API as an OOB for a while but we've also learned that while we can certainly gather feedback the lack of a hero scenario means you don't have any tie breakers, which often results in a situation where when the hero scenario arrives, the data structure isn't meeting its requirements.
Apparently we need a hero. 😛
@masonwheeler I assumed your link would be to this😄 Now it's in my head.
Although as @terrajobst says, our primary issue here has been resources/attention (and you can blame us for that if you like) we would also like to strengthen the non-Microsoft ecosystem so that it's more likely you can find strong libraries for whatever scenario.
[edited for clarity]
@danmosemsft Nah, if I was going to pick that song I'd do the Shrek 2 version.
Hero app candidate #1: how about using one in TimerQueue.Portable?
Hero app candidate #1: how about using one in TimerQueue.Portable?
Already been considered, prototyped, and discarded. It makes the very common case of a timer being quickly created and destroyed (e.g. for a timeout) less efficient.
@stephentoub I assume you mean that its less efficient for some scenarios where there's a small number of timers. But how does it scale?
I assume you mean that its less efficient for some scenarios where there's a small number of timers. But how does it scale?
No, I meant the common case is that you have lots of timers at any moment, but very few are firing. This is what results when timers are used for timeouts. And the important thing there is the speed at which you can add and remove from the data structure... you want that to be 0(1) and with very low overhead. If that becomes O(log N), it's a problem.
Having priority queue will definitely make C# more interview friendly.
Having priority queue will definitely make C# more interview friendly.
Yes, that's true.
I'm looking for it for the same reason.
@stephentoub For the timeouts that never actually happen, that makes perfect sense to me. But I wonder what happens to the system when suddenly a lot of timeouts do start happening, because suddenly there is packet loss or an unresponsive server or whatever? Also do recurring timers ala System.Timer use the same implementation? There, the timeout expiring would be the 'happy path'.
But I wonder what happens to the system when suddenly a lot of timeouts do start happening
Try it. :)
We saw a good deal of real workloads suffering with the previous implementation. In every case we've looked at, the new one (which doesn't use a priority queue but instead just has a simple split between soon and not-soon timers) has fixed the issue, and we've not seen new ones with it.
Also do recurring timers ala System.Timer use the same implementation?
Yes. But they're generally spread out, often fairly evenly, in real-world applications.
5 years later, there is still no PriorityQueue.
Rx has a highly production-tested priority queue class:
5 years later, there is still no PriorityQueue.
Must not be a high enough priority...
They changed around the repo layout. New location is https://github.com/dotnet/reactive/blob/master/Rx.NET/Source/src/System.Reactive/Internal/PriorityQueue.cs
@stephentoub but maybe actually @eiriktsarpalis do we actually have any traction on this right now? If we have a finalized API design I would be willing to work on it
I haven't seen a declaration this is settled yet and I'm not sure if there can be a final api design without a designated killer app. But...
assuming the top killer app candidate is programming contests / teaching / interviews, I think Eric's design at top looks serviceable enough... and I still have my counterproposal sitting around (newly revised, still not withdrawn!)
The main differences I'm noticing between them are whether it should try to talk like a dictionary, and whether selectors should be a thing.
I'm starting to forget exactly why I wanted it to be a dictionary. Having an indexed assigner for updating priorities, and being able to enumerate keys with their priorities, seemed like goodness and very dictionary-like to me. Being able to visualize it as a dictionary in the debugger could be sweet too.
I think selectors actually kind of drastically change the expected class interface, FWIW. Selector should be something baked in at construction time, and if you have it, it removes the need for anyone to ever pass anyone else a priority value -- they should just call the selector if they want to know the priority. So you wouldn't want to have priority parameters in any method signatures at all except the selector's pretty much. So in that case it becomes a sort of more specialized 'PrioritySet' class. [For which impure selectors are a possible bug problem!]
@TimLovellSmith I understand the desire for this behavior, but because this has been a 5 year ask, wouldn't it be reasonable to just implement a array based binary heap with a selector and sharing the same API surface area as Queue.cs
? I think that is by far what most users need. In my opinion this data structure hasn't been implemented because we can't agree on an API design, but if we are making a PriorityQueue
I do believe we should just emulate the api design of Queue
.
@Jlalond Hence the proposal here, huh? I have no objection to heap based array implementation. I now recall the biggest objection I had versus selectors, is that its not easy enough to do priority updates right. A plain old queue doesn't have a priority update operation, but updating element priority is an important operations for many priority queue algorithms.
People should be forever thankful if we go with an API which doesn't require them to debug corrupted priority queue structures.
@TimLovellSmith Sorry Tim, I should've clarified the inheritance from IDictionary
.
Do we have a use case where the priority would change?
I like your implementation, but I think we can reduce the replication of IDictionary Behavior. I think we shouldn't inherit from IDictionary<>
but only ICollection, because I don't think the dictionary access patterns are intuitive,
However I do really think returning T and it's associated priority would make sense, but I don't know a use case where I would need to know the priority of an element within the data structure when enqueueing or dequeueing it.
@Jlalond
If we're have a priority queue, then it should support all operations that it simply and efficiently can, _and_ that are
'expected' to be there by people familiar with this kind of data structure / abstraction of one.
Updating priority belongs in the API because it falls into both those categories. Updating priority is important enough consideration in many algorithms that the complexity of doing the operation with heap data structures affects the complexity of the overall algorithm, and it is regularly measured, see:
https://en.wikipedia.org/wiki/Priority_queue#Specialized_heaps
TBH mainly its decrease_key which is interesting for algorithms and efficiency, and measured, so I'm personally fine for the API only has DecreasePriority(), and doesn't actually have UpdatePriority.
But for convenience it seems nicer not to bother users of the API with worrying about whether every change is an increase or decrease. So that joe developer has 'update' right there out of the box without having to wonder why it only supports decrease and not increase, (and which to use when), I think its best to have a general update priority API, but document that when using it for decrease it has cost X, when using it for increase it has cost Y because it is implemented the same as Remove+Add.
RE dictionary I've agreed. Removed it from my proposal in the name of simpler is better.
https://github.com/TimLovellSmith/corefxlab/blob/priorityQueueSpecSolved/docs/specs/priority-queue.md
I don't know a use case where I would need to know the priority of an element within the data structure when enqueueing or dequeueing it.
I'm not sure what this comment is about. You don't need to know the priority of an element in order to dequeue it. But you do need to know its priority when you enqueue it. You do maybe want to learn the final priority of an element when you dequeue it, as that value can have meaning e.g. distance.
@TimLovellSmith
I'm not sure what this comment is about. You don't need to know the priority of an element in order to dequeue it. But you do need to know its priority when you enqueue it. You do maybe want to learn the final priority of an element when you dequeue it, as that value can have meaning e.g. distance.
Sorry, I misunderstood what TPriorty meant in your key value pair.
Okay, last two questions, why they KeyValuePair with TPriority, and the best time I can see for a reprioritization is N log N, I agree it is valuable, but also what would that API look like?
the best time I can see for a reprioritization is N log N,
That's best time for reprioritization of N items as opposed to one item, right?
I'll clarify the doc: "UpdatePriority | O(log n)" should read "UpdatePriority (single item) | O(log n)".
@TimLovellSmith Makes sense, but wouldn't any update of priority actually ne O2 * (log n), as we'd need to remove the element and then reinsert it? Basing my assumption off this being a binary heap
@TimLovellSmith Makes sense, but wouldn't any update of priority actually ne O2 * (log n), as we'd need to remove the element and then reinsert it? Basing my assumption off this being a binary heap
Constant factors like that 2 are generally ignored in complexity analysis because they become mostly irrelevant as the size of N grows.
Understood, mostly wanted to know if Tim had any exciting ideas to only do
one operation :)
On Tue, Aug 18, 2020, 23:51 masonwheeler notifications@github.com wrote:
@TimLovellSmith https://github.com/TimLovellSmith Makes sense, but
wouldn't any update of priority actually ne O2 * (log n), as we'd need to
remove the element and then reinsert it? Basing my assumption off this
being a binary heapConstant factors like that 2 are generally ignored in complexity analysis
because they become mostly irrelevant as the size of N grows.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dotnet/runtime/issues/14032#issuecomment-675887763,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AF76XTI3YK4LRUVTOIQMVHLSBNZARANCNFSM4LTSQI6Q
.
@Jlalond
No novel ideas, I was just ignoring an constant factor, but decreases are also different from increases
If the priority update is a decrease, you don't have to remove it and re-add it, it just bubbles up, so its 1 * O(log n) = O(log n).
If the priority update is an increase, you probably have to remove it and re-add it, so its 2 * O(log n) = still O (log n).
Someone else might have invented a better datastructure/algorithm for increase priority than remove + readd, but I haven't tried to find it, O(log n) seems good enough.
KeyValuePair
crept into the interface while it was a dictionary, but survived the removal of dictionary, mainly so that its possible to iterate the collection of _elements with their priorities_. And also so that you can dequeue elements with their priorities. But maybe for simplicity again, dequeue with priorities should only be in the 'advanced' version of Dequeue API. (TryDequeue :D)
I'll make that change.
@TimLovellSmith Cool, I look forward to your revised proposal. Can we submit it for review so that I can start work on this? Additionally, I know there was some impetus for a pairing queue to improve merge time, but I still think an array based binary heap would be the most overall performance. Thoughts?
@Jlalond
I've made the minor changes to simplify that Peek + Dequeue api.
Some ICollection
Same PR link. Is there another way I should submit it for review?
I don't mind which implementation it uses as long as its about as fast as array based heap or better.
@TimLovellSmith I actually know nothing about API review, but I imagine @danmosemsft can point us in the right direction.
My first question is what would
And I actually prefer binary heaps, but wanted to make sure we're all okay going in that direction
@eiriktsarpalis @layomia are the area owners and would have to shepherd this through API review. I haven't been following the discussion, have we reached a point of consensus?
As we've discussed in the past the core libraries are not the best place for all collections, especially those that are opinionated and/or niche. Eg., we ship annually - we cannot move as quickly as a library can. We have a goal to build a community around a collections pack to fill that gap. But 84 thumbs up on this one suggests there's a widespread need for this in the core libraries.
@Jlalond Sorry, I didn't quite get what the first question is supposed to be there. Were you asking why TPriority is generic? Or has to be a number? I doubt many folks will be using non-numeric priority types. But they might prefer to use a byte, int, long, float, double, or enum.
@TimLovellSmith Yep, just wanted to know if it was generic
@danmosemsft Thanks Dan. The number of unaddressed objections/comments I see to my proposal[1] is roughly zero, and it addresses any important questions that were left open by @ebickle's proposal at top (which it is becoming increasingly similar to).
So I claim it is passing sanity checks so far. There must be some review required still, we can talk about whether its useful to inherit IReadOnlyCollection (sounds not very useful but I should defer to the experts) and so on -- I guess that is what the api review process is for! @eiriktsarpalis @layomia may I ask you to give this a look?
[PS - based on thread sentiment, the current proposed killer app is coding competitions, interview questions, etc, where you just want nice simple API to use that works either classes or structs plus your favorite numeric type of the day, with the right O(n) and not too slow an implementation - and I'm happy to be guinea pig to apply it to a few problems. Sorry its not anything more exciting.]
Just speaking up because I do use priority queues all the time for 'real' projects, primarily for graph search (e.g. problem optimisation, route finding), and ordered extraction (by chunks (e.g. quad-tree nearest neighbours), scheduling (i.e. like the Timer discussion above)). I have a couple of projects I could plug an implementation into directly for sanity checking/testing, if it helps. The current proposal looks good to me (would work for all my purposes, if not an ideal fit). I have a couple of small observations:
TElement
appears a couple of times where it should be TKey
bool EnqueueOrUpdateIfHigherPriority
for ease of use (and in some cases efficiency), which often ends up being the only enqueueing/updating method that gets used (e.g. in graph search). Obviously it's non-essential and would add additional complexity to the API, but it is a very nice thing to have.Enqueue
(I don't mean Add
): is one meant to be bool TryEnqueue
?EqualityComparer
is a bit unexpected: I'd have expected KeyComparer
to go with PriorityComparer
.CopyTo
and ToArray
.@VisualMelon
Thank you very much for the review!
I like the naming suggestion of KeyComparer. The decrease priority API sounds very useful. How about we add one or both of those APIs. Since we're using in the 'least priority comes first' model, would you only use decrease? Or would you wish for increase too?
public void EnqueueOrIncreasePriority(TKey key, TPriority priority); // doesn't throw, only updates priority of keys already in the collection if new priority is higher
public void EnqueueOrDeccreasePriority(TKey key, TPriority priority); // doesn't throw, only updates priority of keys already in the collection new priroity is lower
The reason I thought enumerating returns the least element first was to be consistent with the mental model that Peek() returns the element at the front of the collection. And also no comparisons are necessary to implement it if its a binary heap, so it seemed like a freebie. But maybe it is less free than I think - unwarranted complexity? I'm happy to take it out for that reason.
Two Enqueues was accidental. We do have EnqueueOrUpdate, where priority always gets updated. I guess TryUpdate is the logical inverse of that, where priority should never be updated for items already in the collection. I can see it fills that logical gap but I'm not sure whether it is an API people will want in practice.
I can only imagine using the decreasing variant: it's a natural operation in route finding to want to enqueue a node or replace an existing node with one that has a lower priority; I couldn't immediately suggest a use for the opposite. A boolean
return parameter can be handy should you need to perform any operation when the item is queued/not queued.
I wasn't assuming a binary heap, but if it's free then I have no issue with the first element always being the minimum, it just seemed an odd constraint.
@VisualMelon @TimLovellSmith Would one API be fine?
EnqueueOrUpdatePriority
? I imagine just the ability to reposition a node within the queue is valuable for traversal algorithms even if it increases or decreases priority
@Jlalond yeah, I mention it because it can be helpful for efficiency depending on the implementation, and directly encodes the use case of only wanting to queue something if it improves on what you already have in the queue. I won't presume to say whether the additional complexity is appropriate for such a general purpose tool, but it certainly isn't necessary.
Updated to remove EnqueueOrIncreasePriority, keeping EnqueueOrUpdate and EnqueueOrDecrease. Letting them return bool when an item is newly added to the collection, like HashSet.Add().
@Jlalond I apologize for the above error (deletion of your latest comment asking when this issue will be reviewed).
I just wanted to mention that @eiriktsarpalis will be back next week and we'll discuss prioritizing this feature and reviewing the APIs then.
This is something we are considering but are not committed yet for .NET 6.
I see that this thread got revived by starting off from scratch and forking, even though we had in-depth discussions on this topic in 2017 for several months (the majority of this huge thread) and collectively produced this proposal — scroll above to see. This is also the proposal that @karelz linked in the first line of the first post for visbility:
See LATEST Proposal in corefxlab repo.
As of 2017-06-16, we were waiting for an API review that has never happened and the status was:
If the API is approved in its current form (two classes), I would create another PR to CoreFXLab with an implementation for both using a quaternary heap (see implementation). The
PriorityQueue<T>
would just use one array beneath, whilePriorityQueue<TElement, TPriority>
would use two -- and rearrange their elements together. This could spare us additional allocations and be as efficient as we can. I'll add that once there is green light.
I recommend to continue iterating on the proposal we made in 2017, starting from a formal review that has not happened yet. This will allow us to avoid forking and iterate on the back of a proposal put together after hundreds of posts exchanged by community members, also out of respect for the effort put into this by everyone involved. I am happy to discuss it if there is feedback.
@pgolebiowski Thanks for coming back to the discussion. I would like to apologize for effectively forking the proposals even further, which is not the most effective way to collaborate. It was an impulsive move on my part. (I had the impression that the discussion had completely stalled due to there being too many open questions in the proposals too far, and just needed a more opinionated proposal.)
How about if we try to move forward on this, by at least temporarily forgetting the code content of my PR, and resume discussing here the identified 'open issues'? I would like to give my opinion on all of them.
Class name PriorityQueue vs. Heap <-- I think it is discussed already and consensus is that PriorityQueue is better.
Introduce IHeap and constructor overload? <-- I don't foresee a lot of value. I think for 95% of the world there is not a compelling reason (in e.g. performance delta) to have multiple heap implementations, and the other 5% will probably have to write their own.
Introduce IPriorityQueue? <-- I also don't think its needed. I am assuming other languages and frameworks get along just fine without these interfaces in their standard library. Let me know if that is incorrect.
Use selector (of priority stored inside the value) or not (5 APIs difference) <-- I don't see a strong argument in favor of supporting selectors in the priority queue. I feel that IComparer<>
already serves as a really good minimal extensibility mechanism for comparing item priorities, and selectors doesn't offer any significant improvement. Also people are likely to be confused about how to use selectors with mutable priorities (or how NOT to use them).
Use tuples (TElement element, TPriority priority) vs. KeyValuePairKeyValuePair
is the preferred option for a priority queue where items have updatable priorities, since the items are being treated as keys in a set/dictionary.
Should Peek and Dequeue rather have out argument instead of tuple? <-- I'm not sure of all the pros and cons, especially performance. Should we just pick the one which performs better in a simple benchmark test?
Is Peek and Dequeue throwing useful at all? <-- Yes... It is what should happen for algorithms which assume incorrectly that there are still items in the queue. If you don't want algorithms to ever assume that, the best thing to do is not provide Peek and Dequeue apis at all, but just provide TryPeek and TryDequeue. [I would guess that there are algorithms which can provably safely call Peek or Dequeue in certain cases, and having Peek/Dequeue is a slight performance+usability improvement for those.]
Aside from that I also have some suggestions to consider adding to the proposal under consideration:
PriorityQueue
PriorityQueue
It would be convenient for lazy programmers if PriorityQueue
Smallest priority first is best. Because priority queues are used for a lot are optimization problems, with 'minimal cost'.
Enumeration should not provide any ordering guarantees.
Open issue - handles:
PriorityQueue
A thought to help move this faster, it may make sense to move this type to the "Community Collections" library being discussed in https://github.com/dotnet/runtime/discussions/42205
I agree that smallest priority first is best. A priority queue is also useful in the development of turn-based gaming, and being able to have the priority be a monotonically-increasing counter of when each element gets its next turn is very helpful.
We are looking at bringing this is in for API review ASAP (though we are not yet committed at delivering an implementation within the .NET 6 timeframe).
But before we send this for review, we will need to iron out some of the high-level open questions. I think @TimLovellSmith's write up is a good starting point towards reaching that goal.
A few remarks on those points:
Consensus on questions 1-3 is long established, I think we could treat those as resolved.
_Use selector (of priority stored inside the value) or not_ -- Agreed with your remarks. Another issue with that design is that selectors are optional, which means you need to be careful not to call the wrong Enqueue
overload (or risk getting an InvalidOperationException
).
I favor using a tuple over KeyValuePair
. Using a .Key
property to access a TPriority
value feels strange to me. A tuple lets you use .Priority
which has better self-documentation properties.
_Should Peek and Dequeue rather have out argument instead of tuple?_ -- I think we should just follow established convention in similar methods elsewhere. So probably use an out
argument.
_Is Peek and Dequeue throwing useful at all?_ -- Agreed 100% with your comments.
_Smallest priority first is best_ -- Agreed.
_Enumeration should not provide any ordering guarantees_ -- Might this not violate user expectations? What are the trade-offs? NB we can probably defer details like this for API review discussion.
I would like to rephrase a few other open questions as well:
Do we ship PriorityQueue<T>
, PriorityQueue<TElement, TPriority>
or both? -- Personally I think we should only implement the latter, since it seems to provide a cleaner and more general-purpose solution. In principle priority should not be a property intrinsic to the queued element, so we should not force users to encapsulate it in wrapper types.
Do we require element uniqueness, up to equality? -- Correct me if I'm wrong, but I feel uniqueness is an artificial constraint, forced on us by the requirement to support update scenaria without resorting to a handle approach. It also complicates the API surface, since we now also need to worry about elements having the right equality semantics (what's the appropriate equality when elements are large DTOs?). I can see three possible paths here:
Enqueue
method variants for users who need them. If handle allocations are a big enough concern, I'm wondering whether those could be amortized à la ValueTask?Do we support merging queues? -- Consensus seems to be no, since we're only shipping one implementation (presumably using array heaps) where merging is not efficient.
What interfaces should it implement? -- I saw a few proposals recommending IQueue<T>
, but this feels like a leaky abstraction. I would personally prefer to keep it simple and just implement ICollection<T>
.
cc @layomia @safern
@eiriktsarpalis On the contrary - there's nothing artificial about the constraint to support updates at all!
Algorithms with priority queues often update priorities of elements. The question is, do you provide an object-oriented API, which 'just works' for updating priorities of ordinary objects... or do you force people to
a) understand the handle model
b) keep additional data in memory, such as extra properties or an external dictionary, to keep track of the handles of objects, on their side, just so they can do updates (must go with the dictionary for classes they can't change? or upgrade the objects into tuples etc)
c) 'memory manage', or 'garbage collect' external structures i.e. clean up handle for items which are no longer in the queue, when using the dictionary approach
d) not confuse opaque handles in contexts with multiple queues since they are only meaningful in context of a single priority queue
Plus there is this philosophical question around the whole reason anyone would want a queue that _keeps track of objects by priority_ to behave this way: why should the same object (equals returns true) have two _different_ priorities? If they are really supposed to have different priorities, why aren't they modelled as different objects? (or upconverted to tuples, with distinguishers?)
Also for handles, there has to be some internal table of handles in the priority queue just so that handles will actually work. I think this is equivalent work to keeping a dictionary to look up objects in the queue.
PS: every .net object already supports concepts of equality/uniqueness, so its not a big ask to 'require' it.
Regarding KeyValuePair
, I'd agree it's not ideal (though in the proposal, Key
is for the element, Value
for the priority, which is consistent with the various Sorted
data-types in the BCL), and it's appropriateness would depend on a decision regarding uniqueness. Personally, I would _much_ prefer a dedicated well-named struct
over a tuple anywhere on the public API, especially so for input.
Regarding uniqueness, that is a fundamental concern, and I don't think anything else can be decided until that is. I would favour element uniqueness as defined by the (optional) comparator (as per both existing proposals and suggestion i) if the goal is a single easy-to-use and general purpose API. Unique vs non-unique is a big rift, and I use both types for different purposes. The former is 'harder' to implement, and covers the most (and more typical) use cases (just my experience) while being harder to mis-use. Those use cases that _require_ non-uniqueness should (IMO) be serviced by a different type (e.g. a plain old binary heap), and I would appreciate having both available. This is essentially what the original proposal linked by @pgolebiowski provides (as I understand it) modulo a (simple) wrapper. _Edit: Nope, that wouldn't support tied priorities_
On the contrary - there's nothing artificial about the constraint to support updates at all!
I'm sorry, I didn't mean to indicate that support for updates is artificial; rather the requirement for uniqueness is introduced artificially to support updates.
PS: every .net object already supports concepts of equality/uniqueness, so its not a big ask to 'require' it
Sure, but occasionally the equality semantics that come with the type might not be the desirable one (e.g. reference equality, full-blown structural equality, etc.). I'm just pointing out that equality is hard and forcing it into the design brings in a whole new class of potential user bugs.
@eiriktsarpalis Thanks for clarifying that. But is it really artificial? I don't think it is. Its just another natural solution.
The api has to be _well-defined_. You can't provide an update API without requiring the user to be specific about exactly what they want to update. Handles, and object equality are just two different approaches to building the well-defined API.
Handle approach: every time you add an object to the collection, you must give it a 'name', i.e. a 'handle', so that you can refer to that exact object without ambiguity later in the conversation.
Object uniqueness approach: every time you add an object to the collection, it must be a different object, or you must specify how to handle the case the object already exists.
Unfortunately, only the object approach really allows you to support some of the most useful high-level API methods property , like 'EnqueueIfNotExists' or 'EnqueueOrDecreasePriroity(item)' - they make no sense to provide in a handle-centric design, because you can't not know if the item already exists in the queue (since its your job to keep track of that, with handles).
One of the most telling criticisms of the handle approach, or dropping the uniqueness constraint to me, is that it makes all sorts of scenarios with updatable priority much more complicated to implement:
e.g.
PS
Sure, but occasionally the equality semantics that come with the type might not be the desirable one
There shall be escape hatches, like IEqualityComparer, or upconverting to a richer type.
Thanks for the feedback 🥳 Will update the proposal over the weekend, taking into account all new input, and share a new revision for another round. ETA 2020-09-20.
The .NET Core community proposes to add to the system library _priority queue_ functionality, a data structure in which each element additionally has a priority associated with it. Specifically, we propose to add PriorityQueue<TElement, TPriority>
to the System.Collections.Generic
namespace.
In our design, we were guided by the following tenets (unless you know better ones):
Conceptually, a priority queue is a collection of elements, where each element has an associated priority. The most important functionality of a priority queue is that it provides efficient access to the element with the highest priority in the collection, and an option to remove that element. Expected behavior may also include: 1) ability to modify the priority of an element that is already in the collection; 2) ability to merge multiple priority queues.
A priority queue is an abstract data structure, i.e. it is a concept with certain behavioral characteristics, as described in the previous section. Most efficient implementations of a priority queue are based on heaps. However, contrary to the general misconception, a heap is also an abstract data structure, and can be realized in various ways, each offering different benefits and disadvantages.
Most software engineers are familiar only with the array-based binary heap implementation — it is the simplest one, but unfortunately not the most efficient one. For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap. For more information on heaps, please refer to Wikipedia and this paper.
Our discussions have demonstrated that the most challenging area in the design, and at the same time with the greatest impact on the API, is the update mechanism. Specifically, the challenge is to determine whether and how the product we want to offer customers should support updating priorities of elements already present in the collection.
Such capability is necessary to implement e.g. Dijkstra’s shortest path algorithm or a job scheduler that needs to handle changing priorities. The update mechanism is missing in Java, which has shown to be disappointing for engineers, e.g. in these three StackOverflow questions viewed over 32k times: example, example, example. To avoid introducing API with such limited value, we believe that a fundamental requirement for the priority queue functionality we deliver would be to support the ability of updating priorities for elements already present in the collection.
To deliver the update mechanism, we must ensure that the customer can be specific about what exactly they want to update. We’ve identified two ways of delivering this: a) via handles; and b) via enforcing uniqueness of elements in the collection. Each of these comes with different benefits and costs.
Option (a): Handles. In this approach, every time an element is added to the queue, the data structure provides its unique handle. If the customer wants to use the update mechanism, they need to keep track of such handles so that they can later unambiguously specify which element they wish to update. The primary cost of this solution is that customers need to manage these pointers. However, it does not mean that there need to be any internal allocations to support handles within the priority queue — any non-array-based heap is based on nodes, where each node is automatically its own handle. As an example, see the API of the PairingHeap.Update method.
Option (b): Uniqueness. This approach puts two additional constraints on the customer: i) elements within the priority queue must conform to certain equality semantics, which brings a new class of potential user bugs; ii) two equal elements cannot be stored in the same queue. By paying this cost, we get the benefit of supporting the update mechanism without resorting to the handle approach. However, any implementation that leverages uniqueness/equality to determine the element to update will require an extra internal mapping, so that it’s done in O(1) and not in O(n).
We recommend to add to the system library a PriorityQueue<TElement, TPriority>
class that supports the update mechanism through handles. The underlying implementation would be a pairing heap.
public class PriorityQueueNode<TElement, TPriority>
{
public TElement Element { get; }
public TPriority Priority { get; }
}
public class PriorityQueue<TElement, TPriority> :
IEnumerable<PriorityQueueNode<TElement, TPriority>>
{
public PriorityQueue();
public PriorityQueue(IComparer<TPriority> comparer);
public IComparer<TPriority> Comparer { get; }
public int Count { get; }
public bool IsEmpty { get; }
public void Clear();
public bool Contains(TElement element); // O(n)
public bool TryGetNode(TElement element, out PriorityQueueNode<TElement, TPriority> node); // O(n)
public PriorityQueueNode<TElement, TPriority> Enqueue(TElement element, TPriority priority); //O(log n)
public PriorityQueueNode<TElement, TPriority> Peek(); // O(1)
public bool TryPeek(out PriorityQueueNode<TElement, TPriority> node); // O(1)
public void Dequeue(out TElement element); // O(log n)
public bool TryDequeue(out TElement element, out TPriority priority); // O(log n)
public void Update(PriorityQueueNode<TElement, TPriority> node, TPriority priority); // O(log n)
public void Remove(PriorityQueueNode<TElement, TPriority> node); // O(log n)
public void Merge(PriorityQueue<TElement, TPriority> other) // O(1)
public IEnumerator<PriorityQueueNode<TElement, TPriority>> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
public struct Enumerator : IEnumerator<PriorityQueueNode<TElement, TPriority>>
{
public PriorityQueueNode<TElement, TPriority> Current { get; }
object IEnumerator.Current { get; }
public bool MoveNext() => throw new NotImplementedException();
public void Reset() => throw new NotImplementedException();
public void Dispose() => throw new NotImplementedException();
}
}
var queue = new PriorityQueue<Job, double>();
queue.Enqueue(firstJob, 10);
queue.Enqueue(secondJob, 5);
queue.Enqueue(thirdJob, 40);
var withHighestPriority = queue.Peek(); // { element: secondJob, priority: 5 }
var queue = new PriorityQueue<Job, double>();
var mapping = new Dictionary<Job, PriorityQueueNode<Job, double>>();
mapping[job] = queue.Enqueue(job, priority);
queue.Update(mapping[job], newPriority);
In undefined order, so that the enumeration can happen in O(n), similarly as with a HashSet
. No efficient implementation would provide capabilities to enumerate a heap in linear time while ensuring that elements are enumerated in order — this would require O(n log n). Because ordering on a collection can be trivially achieved with .OrderBy(x => x.Priority)
and not every customer cares about enumeration with this order, we believe it is better to provide undefined enumeration order.
| Language | Type | Notes |
|:-:|:-:|:-:|
| Java | PriorityQueueAbstractQueue
and implements the interface Queue
. |
| Rust | BinaryHeap | |
| Swift | CFBinaryHeap | |
| C++ | priority_queue | |
| Python | heapq | |
| Go | heap | There is a heap interface. |
Note that when discussing data structures, the term _heap_ is used 4 times more often than _priority queue_.
"array" AND "data structure"
— 17.400.000 results"stack" AND "data structure"
— 12.100.000 results"queue" AND "data structure"
— 3.850.000 results"heap" AND "data structure"
— 1.830.000 results"priority queue" AND "data structure"
— 430.000 results"trie" AND "data structure"
— 335.000 resultsPlease review, am happy to receive feedback and keep iterating :) I feel that we are converging! 😄 Also am happy to receive questions — will add them to the FAQ!
@pgolebiowski I don't tend to use handle-based APIs (I'd expect to be wrapping this as per example 2 were I to refit any existing code) but it mostly looks good to me. I might have a go with the implementation you mention to see how it feels, but here are some comments off the bat:
TryDequeue
returns a node, because it isn't really a handle at that point (I'd prefer two separate out parameters)Merge
should be PriorityQueue'2
not PriorityQueueNode'2
, and can you clarify the behaviour? I'm not familiar with the pairing heap, but presumably the two heaps overlap thereafterContains
for the 2-parameter method: that is not a name I would guess for a TryGet
style methodIEqualityComparer<TElement>
for the purpose of Contains
?Remove
returns is a bool
; I'd expect it was called TryRemove
or that it threw (which I'm assuming Update
does if the node isn't in the heap).@VisualMelon thanks a lot for the feedback! Will quickly resolve these, definitely agree:
- It seems odd that
TryDequeue
returns a node, because it isn't really a handle at that point (I'd prefer two separate out parameters)- The parameter for
Merge
should bePriorityQueue'2
notPriorityQueueNode'2
, and can you clarify the behaviour? I'm not familiar with the pairing heap, but presumably the two heaps overlap thereafter- It's weird that
Remove
returns is abool
; I'd expect it was calledTryRemove
or that it threw (which I'm assumingUpdate
does if the node isn't in the heap).- I'm not a fan of the name
Contains
for the 2-parameter method: that is not a name I would guess for aTryGet
style method
Clarification for these two:
- Will it be stable in that if two elements are queued with the same priority they dequeue in a predicable order? (nice to have for reproducibility; can be implemented easily enough by the consumer otherwise)
- There doesn't seem to be a way to efficiently determine whether a node is still in the heap (not sure when I'd use this, thought)
For the first point, if the goal is reproducibility, then the implementation will be deterministic, yes. If the goal is _if two elements which are placed into the queue with the same priority, it follows that they will come out in the same order in which they were placed into the queue_ — I'm not sure if the implementation can be adjusted to achieve this property, a quick answer would be "probably no".
For the second point, yes, heaps are not good for checking if an element exists in the collection, a customer would have to keep track of this separately to achieve this in O(1) or reuse the mapping they use for the handles, if they use the update mechanism. Otherwise, O(n).
- Should the class support a custom
IEqualityComparer<TElement>
for the purpose ofContains
?
Hmm... I'm starting to think that perhaps putting Contains
into responsibility of this priority queue may be too much, and using the method from Linq
may suffice (Contains
has to be applied on enumeration anyway).
@pgolebiowski thanks for the clarifications
For the first point, if the goal is reproducibility, then the implementation will be deterministic, yes
Not so much deterministic, as guaranteed forever (e.g. even the implementation changes, the behaviour would not), so I think the answer I was after is 'no'. That's fine: the consumer can add a sequence ID to the priority if they need this, though at that point a SortedSet
would do the job.
For the second point, yes, heaps are not good for checking if an element exists in the collection, a customer would have to keep track of this separately to achieve this in O(1) or reuse the mapping they use for the handles, if they use the update mechanism. Otherwise, O(n).
Does it not require a subset of the work necessary for Remove
? I may not have been clear: I meant given a PriorityQueueNode
, checking if it is in the heap (not a TElement
).
Hmm... I'm starting to think that perhaps putting Contains into responsibility of this priority queue may be too much
I wouldn't complain if Contains
wasn't there: it's also a trap for people who don't realise they should be using handles.
@pgolebiowski You seem pretty strongly in favor of handles, am I right in thinking its for efficiency reasons?
From an efficiency point of view, I suspect handles are truly the best, for both scenarios with uniqueness and without, so I'm fine with that as being the primary solution offered by the framework.
At all:
From a usability point of view, I think duplicate elements is rarely what I want, which still leaves me wondering whether it is valuable for framework to provide support for both models. But... a unique-element-friendly "PrioritySet" class would at least be easy to add later as a wrapper for the proposed PriorityQueue
, e.g. in response to continued demand for a more friendly API. (if the demand exists. As I think it might!)
For the current API proposed itself, a couple thoughts/questions:
TryPeek(out TElement element, out TPriority priority)
?Remove(PriorityQueueNode)
throw if the node isn't found? or return false?TryRemove()
version that doesn't throw if Remove throws?Contains()
api is useful most of the time? 'Contains' seems to be in the eye of the beholder, especially for scenarios with 'duplicate' elements with distinct priorities, or other distinct features! In which case the end user probably must do their own search anyway. But at least it can be useful for the scenario with no duplicates.@pgolebiowski Thanks for taking the time to draft the new proposal! A few comments on my side:
Contains()
or TryGetNode()
should exist at all in the API in its currently proposed form. They imply equality for TElement
is significant, which is presumably one of the things a handle-based approach was trying to avoid.public void Dequeue(out TElement element);
as public TElement Dequeue();
TryDequeue()
method need to return a priority?ICollection<T>
or IReadOnlyCollection<T>
?Most software engineers are familiar only with the array-based binary heap implementation — it is the simplest one, but unfortunately not the most efficient one. For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap.
What are the trade-offs for choosing the latter approaches? Is it possible to use an array-based implementation for those?
It can't implement ICollection<T>
or when it doesn't have the right signatures for 'Add' and 'Remove' etc.IReadOnlyCollection<T>
Its much closer in spirit/shape to ICollection<KeyValuePair<T,Priority>>
Its much closer in spirit/shape to
ICollection<KeyValuePair<T,Priority>>
Wouldn't it be KeyValuePair<TPriority, TElement>
, since the ordering is done by TPriority, which is effectively a keying mechanism?
OK, it seems like we are overall in favor of dropping the methods Contains
and TryGet
. Will remove them in the next revision and explain the removal reason in the FAQ.
As for the interfaces implemented — isn't IEnumerable<PriorityQueueNode<TElement, TPriority>>
sufficient? What kind of functionality is missing?
On the KeyValuePair
- there were a few voices that a tuple or a struct with .Element
and .Priority
are more desirable. I think I'm in favor of these.
Wouldn't it be
KeyValuePair<TPriority, TElement>
, since the ordering is done by TPriority, which is effectively a keying mechanism?
There are good arguments for both sides. On the one hand, yes, exactly what you just said. On the other hand, keys in a KVP collection are generally expected to be unique, and it's perfectly valid to have multiple elements with the same priority.
On the other hand, keys in a KVP collection are generally expected to be unique
I'd disagree with this statement - a collection of key-value pairs is just that; any uniqueness requirements are layered on top of that.
isn't
IEnumerable<PriorityQueueNode<TElement, TPriority>>
sufficient? What kind of functionality is missing?
I'd personally expect IReadOnlyCollection<PQN<TElement, TPriority>>
to be implemented, since the provided API already mostly satisfies that interface. Plus, that would be consistent with other collection types.
Regarding interface:
```
public bool TryGetNode(TElement element, out PriorityQueueNode
What's the point of it if I can just do enumerate collection and do comparison of elements? And why only Try version?
public PriorityQueueNode<TElement, TPriority> Peek(); // O(1)
public void Dequeue(out TElement element); // O(log n)
I find it a bit strange to have discrepancy between Dequeue and Peek. They do pretty much same thing expect one is removing element from queue and other is not, it's looks weird for me if one returns priority and element and other just element.
public void Remove(PriorityQueueNode<TElement, TPriority> node); // O(log n)
`Queue` doesn't have `Remove` method, why `PriorityQueue` should ?
I would also like to see constructor
public PriorityQueue(IEnumerable
```
I would also like PriorityQueue
to implement IReadonlyCollection<>
interface.
Jumping to other things than public API surface.
Such capability is necessary to implement e.g. Dijkstra’s shortest path algorithm or a job scheduler that needs to handle changing priorities. The update mechanism is missing in Java, which has shown to be disappointing for engineers, e.g. in these three StackOverflow questions viewed over 32k times: example, example, example. To avoid introducing API with such limited value, we believe that a fundamental requirement for the priority queue functionality we deliver would be to support the ability of updating priorities for elements already present in the collection.
I would like to disagree. Last time I wrote Dijkstra in C++ std::priority_queue was sufficient enough and it doesn't handles priority update. AFAIK common consensus for that case is to add fake element into queue with altered priority and value and monitor have we process value or not. Same can be done with job scheduler.
To be honest I'm not sure how Dijkstra would look like with current queue proposal. How I'm suppose to keep track of nodes I need update priority? With TryGetNode()? Or have another collection of nodes? Would love to see code for current proposal.
If you look into Wikipedia, there is no assumption of updating priorities for priority queue. Same for all other languages which doesn't have that functionality and got away with that. I know "strive to be better", but is there actually demand for that?
For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap. For more information on heaps, please refer to Wikipedia and this paper.
I've looked into paper and this is quote from it:
The results show that the optimal choice of implementation is strongly input-dependent. Furthermore, it shows that care must be taken to optimize for cache performance, primarily at the L1-L2 barrier. This suggests that complicated, cache-oblivious structures are unlikely to perform well compared to simpler, cache-aware structures.
From what it looks current queue proposal would be locked behind tree implementation rather than array, and something tells me what there is chance what tree nodes scattered around memory would be not as performant as array of elements.
I think having benchmarks to compare simple binary heap based on array and pairing heap would be ideal to make proper decision, before that, I don't think it's smart to lock design behind specific implementation (I'm looking on you Merge
method).
Jumping to another topic, I would personally prefer to have KeyValuePair
Downside is mixed feelings looking on TPriority Key
but I think good documentation can solve that.
`
As a workaround for Dijkstra, adding fake elements into queue works, and total number of graph edges you process doesn't change. But the number of the temporary nodes staying resident in the heap generated by processing edges changes, and that can impact the memory usage and efficiency of enqueue and dequeue.
I was wrong about not being able to do IReadOnlyCollection, that should be fine. No Add() and Remove() in that interface, hah! (What was I thinking...)
@Ivanidzo4ka Your comments convince me even more that it would make sense to have two separate types: one a simple binary heap (i.e. with no update), and a second as described by one of the proposals:
In the past, I've mostly got away with a binary heap I wrote the best part of a decade ago, and a SortedSet
/Dictionary
combination, and there are scenarios where I've used both types in separate roles (KSSP springs to mind).
It's a struct rather than class so no NullReference exceptions
I'd argue it's the other way around: if someone is passing default values around, I'd love them to see an NRE; however, I think we need to be clear where we expect these things to be used: nodes/handles should probably be classes, but if we are just reading/returning a pair, then I agree it should be a struct.
I'm tempted to suggest that it should be possible to update the element as well as the priority in the handles based proposal. You can achieve the same effect by just remove a handle and adding a new one, but it is a useful operation, and may have performance benefits depending on the implementation (e.g. some heaps can reduce the priority of something relatively cheaply). Such a change would make it tidier to implement many things (e.g. this somewhat nightmare inducing example based on the existing AlgoKit ParingHeap), especially those that operate in an unknown region of the state space.
The .NET Core community proposes to add to the system library _priority queue_ functionality, a data structure in which each element additionally has a priority associated with it. Specifically, we propose to add PriorityQueue<TElement, TPriority>
to the System.Collections.Generic
namespace.
In our design, we were guided by the following tenets (unless you know better ones):
Conceptually, a priority queue is a collection of elements, where each element has an associated priority. The most important functionality of a priority queue is that it provides efficient access to the element with the highest priority in the collection, and an option to remove that element. Expected behavior may also include: 1) ability to modify the priority of an element that is already in the collection; 2) ability to merge multiple priority queues.
A priority queue is an abstract data structure, i.e. it is a concept with certain behavioral characteristics, as described in the previous section. Most efficient implementations of a priority queue are based on heaps. However, contrary to the general misconception, a heap is also an abstract data structure, and can be realized in various ways, each offering different benefits and disadvantages.
Most software engineers are familiar only with the array-based binary heap implementation — it is the simplest one, but unfortunately not the most efficient one. For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap. For more information on heaps, please refer to Wikipedia and this paper.
Our discussions have demonstrated that the most challenging area in the design, and at the same time with the greatest impact on the API, is the update mechanism. Specifically, the challenge is to determine whether and how the product we want to offer customers should support updating priorities of elements already present in the collection.
Such capability is necessary to implement e.g. Dijkstra’s shortest path algorithm or a job scheduler that needs to handle changing priorities. The update mechanism is missing in Java, which has shown to be disappointing for engineers, e.g. in these three StackOverflow questions viewed over 32k times: example, example, example. To avoid introducing API with such limited value, we believe that a fundamental requirement for the priority queue functionality we deliver would be to support the ability of updating priorities for elements already present in the collection.
To deliver the update mechanism, we must ensure that the customer can be specific about what exactly they want to update. We’ve identified two ways of delivering this: a) via handles; and b) via enforcing uniqueness of elements in the collection. Each of these comes with different benefits and costs.
Option (a): Handles. In this approach, every time an element is added to the queue, the data structure provides its unique handle. If the customer wants to use the update mechanism, they need to keep track of such handles so that they can later unambiguously specify which element they wish to update. The primary cost of this solution is that customers need to manage these pointers. However, it does not mean that there need to be any internal allocations to support handles within the priority queue — any non-array-based heap is based on nodes, where each node is automatically its own handle. As an example, see the API of the PairingHeap.Update method.
Option (b): Uniqueness. This approach puts two additional constraints on the customer: i) elements within the priority queue must conform to certain equality semantics, which brings a new class of potential user bugs; ii) two equal elements cannot be stored in the same queue. By paying this cost, we get the benefit of supporting the update mechanism without resorting to the handle approach. However, any implementation that leverages uniqueness/equality to determine the element to update will require an extra internal mapping, so that it’s done in O(1) and not in O(n).
We recommend to add to the system library a PriorityQueue<TElement, TPriority>
class that supports the update mechanism through handles. The underlying implementation would be a pairing heap.
public class PriorityQueueNode<TElement, TPriority>
{
public TElement Element { get; }
public TPriority Priority { get; }
}
public class PriorityQueue<TElement, TPriority> :
IEnumerable<PriorityQueueNode<TElement, TPriority>>,
IReadOnlyCollection<PriorityQueueNode<TElement, TPriority>>
{
public PriorityQueue();
public PriorityQueue(IComparer<TPriority> comparer);
public IComparer<TPriority> Comparer { get; }
public int Count { get; }
public bool IsEmpty { get; }
public void Clear();
public PriorityQueueNode<TElement, TPriority> Enqueue(TElement element, TPriority priority); //O(log n)
public PriorityQueueNode<TElement, TPriority> Peek(); // O(1)
public bool TryPeek(out PriorityQueueNode<TElement, TPriority> node); // O(1)
public bool TryPeek(out TElement element, out TPriority priority); // O(1)
public bool TryPeek(out TElement element); // O(1)
public PriorityQueueNode<TElement, TPriority> Dequeue(); // O(log n)
public void Dequeue(out TElement element, out TPriority priority); // O(log n)
public bool TryDequeue(out PriorityQueueNode<TElement, TPriority> node); // O(log n)
public bool TryDequeue(out TElement element, out TPriority priority); // O(log n)
public bool TryDequeue(out TElement element); // O(log n)
public void Update(PriorityQueueNode<TElement, TPriority> node, TPriority priority); // O(log n)
public void Remove(PriorityQueueNode<TElement, TPriority> node); // O(log n)
public void Merge(PriorityQueue<TElement, TPriority> other) // O(1)
public IEnumerator<PriorityQueueNode<TElement, TPriority>> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
public struct Enumerator : IEnumerator<PriorityQueueNode<TElement, TPriority>>
{
public PriorityQueueNode<TElement, TPriority> Current { get; }
object IEnumerator.Current { get; }
public bool MoveNext() => throw new NotImplementedException();
public void Reset() => throw new NotImplementedException();
public void Dispose() => throw new NotImplementedException();
}
}
var queue = new PriorityQueue<Job, double>();
queue.Enqueue(firstJob, 10);
queue.Enqueue(secondJob, 5);
queue.Enqueue(thirdJob, 40);
var withHighestPriority = queue.Peek(); // { element: secondJob, priority: 5 }
var queue = new PriorityQueue<Job, double>();
var mapping = new Dictionary<Job, PriorityQueueNode<Job, double>>();
mapping[job] = queue.Enqueue(job, priority);
queue.Update(mapping[job], newPriority);
In undefined order, so that the enumeration can happen in O(n), similarly as with a HashSet
. No efficient implementation would provide capabilities to enumerate a heap in linear time while ensuring that elements are enumerated in order — this would require O(n log n). Because ordering on a collection can be trivially achieved with .OrderBy(x => x.Priority)
and not every customer cares about enumeration with this order, we believe it is better to provide undefined enumeration order.
Contains
or TryGet
method?Providing such methods has negligible value, because finding an element in a heap requires enumeration of the entire collection, meaning that any Contains
or TryGet
method would be a wrapper around enumeration. Further, to check if an element exists in collection, the priority queue would have to be aware of how to conduct equality checks of TElement
objects, which is not something we believe should fall into the responsibility of a priority queue.
Dequeue
and TryDequeue
overloads that return PriorityQueueNode
?This is for customers who want to use the Update
or Remove
method and keep track of handles. While dequeuing an element from the priority queue, they would receive a handle that they could use to adjust the state of their handle tracking system.
Update
or Remove
method receives a node from a different queue?The priority queue will throw an exception. Each node is aware of the priority queue it belongs to, similarly as a LinkedListNode<T>
is aware of the LinkedList<T>
it belongs to.
| Language | Type | Notes |
|:-:|:-:|:-:|
| Java | PriorityQueueAbstractQueue
and implements the interface Queue
. |
| Rust | BinaryHeap | |
| Swift | CFBinaryHeap | |
| C++ | priority_queue | |
| Python | heapq | |
| Go | heap | There is a heap interface. |
Note that when discussing data structures, the term _heap_ is used 4 times more often than _priority queue_.
"array" AND "data structure"
— 17.400.000 results"stack" AND "data structure"
— 12.100.000 results"queue" AND "data structure"
— 3.850.000 results"heap" AND "data structure"
— 1.830.000 results"priority queue" AND "data structure"
— 430.000 results"trie" AND "data structure"
— 335.000 resultsThank you all for the feedback! I’ve updated the proposal to v2.1. Changelog:
Contains
and TryGet
.Contains
or TryGet
method?_IReadOnlyCollection<PriorityQueueNode<TElement, TPriority>>
.bool TryPeek(out TElement element, out TPriority priority)
.bool TryPeek(out TElement element)
.void Dequeue(out TElement element, out TPriority priority)
.void Dequeue(out TElement element)
to PriorityQueueNode<TElement, TPriority> Dequeue()
.bool TryDequeue(out TElement element)
.bool TryDequeue(out PriorityQueueNode<TElement, TPriority> node)
.Dequeue
and TryDequeue
overloads that return PriorityQueueNode
?_Update
or Remove
method receives a node from a different queue?_Thanks for the change notes ;)
Small requests for clarification:
@pgolebiowski Regarding the proposed Merge
method:
public void Merge(PriorityQueue<TElement, TPriority> other); // O(1)
Clearly such an operation would not have copy semantics, so I'm wondering whether there are would be any gotchas around making changes to this
and other
_after_ a merge has been performed (e.g. either instance failing to satisfy the heap property).
@eiriktsarpalis @VisualMelon — Thanks! Will address raised points, ETA 2020-10-04.
If others have any more feedback/questions/concerns/thoughts — please share 😊
The .NET Core community proposes to add to the system library _priority queue_ functionality, a data structure in which each element additionally has a priority associated with it. Specifically, we propose to add PriorityQueue<TElement, TPriority>
to the System.Collections.Generic
namespace.
In our design, we were guided by the following tenets (unless you know better ones):
Conceptually, a priority queue is a collection of elements, where each element has an associated priority. The most important functionality of a priority queue is that it provides efficient access to the element with the highest priority in the collection, and an option to remove that element. Expected behavior may also include: 1) ability to modify the priority of an element that is already in the collection; 2) ability to merge multiple priority queues.
A priority queue is an abstract data structure, i.e. it is a concept with certain behavioral characteristics, as described in the previous section. Most efficient implementations of a priority queue are based on heaps. However, contrary to the general misconception, a heap is also an abstract data structure, and can be realized in various ways, each offering different benefits and disadvantages.
Most software engineers are familiar only with the array-based binary heap implementation — it is the simplest one, but unfortunately not the most efficient one. For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap. For more information on heaps, please refer to Wikipedia and this paper.
Our discussions have demonstrated that the most challenging area in the design, and at the same time with the greatest impact on the API, is the update mechanism. Specifically, the challenge is to determine whether and how the product we want to offer customers should support updating priorities of elements already present in the collection.
Such capability is necessary to implement e.g. Dijkstra’s shortest path algorithm or a job scheduler that needs to handle changing priorities. The update mechanism is missing in Java, which has shown to be disappointing for engineers, e.g. in these three StackOverflow questions viewed over 32k times: example, example, example. To avoid introducing API with such limited value, we believe that a fundamental requirement for the priority queue functionality we deliver would be to support the ability of updating priorities for elements already present in the collection.
To deliver the update mechanism, we must ensure that the customer can be specific about what exactly they want to update. We’ve identified two ways of delivering this: a) via handles; and b) via enforcing uniqueness of elements in the collection. Each of these comes with different benefits and costs.
Option (a): Handles. In this approach, every time an element is added to the queue, the data structure provides its unique handle. If the customer wants to use the update mechanism, they need to keep track of such handles so that they can later unambiguously specify which element they wish to update. The primary cost of this solution is that customers need to manage these pointers. However, it does not mean that there need to be any internal allocations to support handles within the priority queue — any non-array-based heap is based on nodes, where each node is automatically its own handle. As an example, see the API of the PairingHeap.Update method.
Option (b): Uniqueness. This approach puts two additional constraints on the customer: i) elements within the priority queue must conform to certain equality semantics, which brings a new class of potential user bugs; ii) two equal elements cannot be stored in the same queue. By paying this cost, we get the benefit of supporting the update mechanism without resorting to the handle approach. However, any implementation that leverages uniqueness/equality to determine the element to update will require an extra internal mapping, so that it’s done in O(1) and not in O(n).
We recommend to add to the system library a PriorityQueue<TElement, TPriority>
class that supports the update mechanism through handles. The underlying implementation would be a pairing heap.
public class PriorityQueueNode<TElement, TPriority>
{
public TElement Element { get; }
public TPriority Priority { get; }
}
public class PriorityQueue<TElement, TPriority> :
IEnumerable<PriorityQueueNode<TElement, TPriority>>,
IReadOnlyCollection<PriorityQueueNode<TElement, TPriority>>
{
public PriorityQueue();
public PriorityQueue(IComparer<TPriority> comparer);
public IComparer<TPriority> Comparer { get; }
public int Count { get; }
public bool IsEmpty { get; }
public void Clear();
public PriorityQueueNode<TElement, TPriority> Enqueue(TElement element, TPriority priority); //O(log n)
public PriorityQueueNode<TElement, TPriority> Peek(); // O(1)
public bool TryPeek(out PriorityQueueNode<TElement, TPriority> node); // O(1)
public bool TryPeek(out TElement element, out TPriority priority); // O(1)
public bool TryPeek(out TElement element); // O(1)
public PriorityQueueNode<TElement, TPriority> Dequeue(); // O(log n)
public void Dequeue(out TElement element, out TPriority priority); // O(log n)
public bool TryDequeue(out PriorityQueueNode<TElement, TPriority> node); // O(log n)
public bool TryDequeue(out TElement element, out TPriority priority); // O(log n)
public bool TryDequeue(out TElement element); // O(log n)
public void Update(PriorityQueueNode<TElement, TPriority> node, TPriority priority); // O(log n)
public void Remove(PriorityQueueNode<TElement, TPriority> node); // O(log n)
public IEnumerator<PriorityQueueNode<TElement, TPriority>> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
public struct Enumerator : IEnumerator<PriorityQueueNode<TElement, TPriority>>
{
public PriorityQueueNode<TElement, TPriority> Current { get; }
object IEnumerator.Current { get; }
public bool MoveNext() => throw new NotImplementedException();
public void Reset() => throw new NotImplementedException();
public void Dispose() => throw new NotImplementedException();
}
}
var queue = new PriorityQueue<Job, double>();
queue.Enqueue(firstJob, 10);
queue.Enqueue(secondJob, 5);
queue.Enqueue(thirdJob, 40);
var withHighestPriority = queue.Peek(); // { element: secondJob, priority: 5 }
var queue = new PriorityQueue<Job, double>();
var mapping = new Dictionary<Job, PriorityQueueNode<Job, double>>();
mapping[job] = queue.Enqueue(job, priority);
queue.Update(mapping[job], newPriority);
In undefined order, so that the enumeration can happen in O(n), similarly as with a HashSet
. No efficient implementation would provide capabilities to enumerate a heap in linear time while ensuring that elements are enumerated in order — this would require O(n log n). Because ordering on a collection can be trivially achieved with .OrderBy(x => x.Priority)
and not every customer cares about enumeration with this order, we believe it is better to provide undefined enumeration order.
Contains
or TryGet
method?Providing such methods has negligible value, because finding an element in a heap requires enumeration of the entire collection, meaning that any Contains
or TryGet
method would be a wrapper around enumeration. Further, to check if an element exists in collection, the priority queue would have to be aware of how to conduct equality checks of TElement
objects, which is not something we believe should fall into the responsibility of a priority queue.
Dequeue
and TryDequeue
overloads that return PriorityQueueNode
?This is for customers who want to use the Update
or Remove
method and keep track of handles. While dequeuing an element from the priority queue, they would receive a handle that they could use to adjust the state of their handle tracking system.
Update
or Remove
method receives a node from a different queue?The priority queue will throw an exception. Each node is aware of the priority queue it belongs to, similarly as a LinkedListNode<T>
is aware of the LinkedList<T>
it belongs to. Further, if a node has been removed from a queue, trying to invoke Update
or Remove
on it will also result with an exception.
Merge
method?Merging two priority queues can be achieved in constant time, which makes it a tempting feature to offer customers. However, we have no data to demonstrate that there is demand for such a functionality, and we cannot justify including it in the public API. Further, the design of such a functionality is non-trivial and, given that this feature may not be needed, it could unnecessarily complicate the API surface and implementation.
Nevertheless, not including the Merge
method now is a 2-way door — if in the future customers do express interest in having the merge functionality supported, it will be possible to extend the PriorityQueue
type. As such, we recommend not to include the Merge
method yet, and to move forward with the launch.
The collection will not provide a stability guarantee out of the box, i.e. if two elements are queued with the same priority, the customer will not be able to assume that they will be dequeued in a certain order. However, if a customer would like to achieve stability using our PriorityQueue
, they could define a TPriority
and the corresponding IComparer<TPriority>
that would ensure it. Further, the data collection will be deterministic, i.e. for a given sequence of operations, it will always behave the same way, allowing for reproducibility.
| Language | Type | Notes |
|:-:|:-:|:-:|
| Java | PriorityQueueAbstractQueue
and implements the interface Queue
. |
| Rust | BinaryHeap | |
| Swift | CFBinaryHeap | |
| C++ | priority_queue | |
| Python | heapq | |
| Go | heap | There is a heap interface. |
Note that when discussing data structures, the term _heap_ is used 4 times more often than _priority queue_.
"array" AND "data structure"
— 17.400.000 results"stack" AND "data structure"
— 12.100.000 results"queue" AND "data structure"
— 3.850.000 results"heap" AND "data structure"
— 1.830.000 results"priority queue" AND "data structure"
— 430.000 results"trie" AND "data structure"
— 335.000 resultsChangelog:
void Merge(PriorityQueue<TElement, TPriority> other) // O(1)
.Merge
method?The new FAQ looks great. I played around with coding Dijkstra against the proposed API, with handle dictionary, and it seemed basically fine.
The one small learning I had from doing this was that the current set of method names/overloads doesn't work so well for implicit typing of out
variables. What I wanted to do with the C# code was TryDequeue(out var node)
- but unfortunately I needed to give the explicit type of out
variable as PriorityQueueNode<>
otherwise the compiler didn't know if I wanted a priority queue node or an element.
var shortestDistances = new Dictionary<int, int>();
var queue = new PriorityQueue<int, int>();
var handles = new Dictionary<int, PriorityQueueNode<int, int>>();
handles[startNode] = queue.Enqueue(startNode, 0);
while (queue.TryDequeue(out PriorityQueueNode<int, int> nextQueueNode))
{
int nodeToExploreFrom = nextQueueNode.Element, minDistance = nextQueueNode.Priority;
shortestDistances.Add(nodeToExploreFrom, minDistance);
handles[nodeToExploreFrom] = null; // so it is clearly already visited
foreach (int nextNode in nodes)
{
int candidatePathDistance = minDistance + edgeDistances[nodeToExploreFrom, nextNode];
if (handles.TryGetValue(nextNode, out var nextNodeHandle))
{
if (nextNodeHandle != null && candidatePathDistance < nextNodeHandle.Priority)
{
queue.Update(nextNodeHandle, candidatePathDistance);
}
// or else... we already got there
}
else
{
handles[nextNode] = queue.Enqueue(nextNode, candidatePathDistance);
}
}
}
The main unresolved design question in my view is whether or not the implementation should support priority updates. I can see three possible paths here:
These approaches are mutually exclusive and have their own sets of trade-offs, respectively:
It's important that we identify which of the above approaches could provide the best value for most of our users. So I've been going through .NET codebases, both Microsoft internal and public, for instances of Heap/PriorityQueue implementations in order to better understand what the most common usage patterns are:
Most applications implement variations of either heap sort or job scheduling. A few instances were used for algorithms such as topological sorting or Huffman coding. A smaller number was used for calculating distances in graphs. Of the 80 PriorityQueue implementations examined, only 9 were found to have implemented some form of priority updates.
At the same time, none of the Heap/PriorityQueue implementations in either Python, Java, C++, Go, Swift, or Rust core libraries support priority updates in their APIs.
In light of this data, it is clear to me that .ΝΕΤ needs a baseline PriorityQueue implementation that exposes the essential API (heapify/push/peek/pop), offers certain performance guarantees (e.g. no additional allocations per enqueue) and does not enforce uniqueness constraints. This implies that the implementation _would not support O(log n) priority updates_.
We should also consider following up with a separate heap implementation that does support _O(log n)_ updates/removals, and uses an equality-based approach. Since this would be a specialized type, the uniqueness requirement shouldn't be much of an issue.
I've been working on prototyping both implementations, and will follow up with an API proposal shortly.
Thank you very much for the analysis, @eiriktsarpalis! I appreciate in particular the time to analyze the internal Microsoft codebase to find relevant use cases and support our discussion with data.
The handle based approach would imply at least one additional allocation per enqueued element.
This assumption is false, you don't need additional allocations in node-based heaps. The pairing heap is more performant for a general random input than the array-based binary heap, meaning that you would have the incentive to use this node-based heap even for a priority queue that does not support updates. You can see benchmarks in the paper I referenced earlier.
Of the 80 PriorityQueue implementations examined, only 9 were found to have implemented some form of priority updates.
Even given this small sample, that's 11-12% of all usages. Also, this may be underrepresented in certain domains like e.g. video games, where I'd expect this percentage to be higher.
In light of this data, it is clear to me that [...]
I don't think such a conclusion is clear, given that one of the key assumptions is false and it's arguable whether 11-12% of the customers is an important enough use case or not. What I'm missing in your assessment is the evaluation of the cost impact of the "data structure supporting updates" for customers who don't care about this mechanism — which to me, this cost is negligible, as they could use the data structure without being affected by the handle mechanism.
Basically:
| | 11-12% use cases | 88-89% use cases |
|:-:|:-:|:-:|
| cares about updates | yes | no |
| is negatively affected by the handles | N/A (they are desired) | no |
| is positively affected by the handles | yes | no |
To me, this is a no-brainer in favor of supporting 100% use cases, not only 88-89%.
This assumption is false, you don't need additional allocations in node-based heaps
If both the priority and the item are value types (or if both are reference types you don't own and/or can't change the base type of), can you link to an implementation that demonstrates no additional allocations being required (or just describe how it's achieved)? That'd be helpful to see. Thanks.
It would be helpful if you could elaborate more, or just say what you're trying to say. I'd need to disambiguate, there would be ping-pong, and that would turn into a lengthy discussion. Alternatively, we could arrange a call.
I'm saying we want to avoid any Enqueue operation requiring an allocation, whether on the part of the caller or the part of the implementation (ammortized internal allocation is fine, e.g. to expand an array used in the implementation). I'm trying to understand how that's possible with a node-based heap (e.g. if those node objects are exposed to the caller, that prohibits pooling by the implementation due to concerns around inappropriate reuse / aliasing). I want to be able to write:
C#
pq.Enqueue(42, 84);
and have that not allocate. How do the implementations you refer to achieve that?
or just say what you're trying to say
I thought I was.
we want to avoid any Enqueue operation requiring an allocation [...] I want to be able to write:
pq.Enqueue(42, 84);
and have that not allocate.
Where does this desire come from? It's a nice to have side effect of a solution, not a requirement that 99.9% customers need to have satisfied. I don't see why you would choose this low-impact dimension to make design choices between solutions.
We're not making design choices based on optimizations for the 0.1% of customers if these negatively impact 12% of the customers in another dimension. "caring about no allocations" + "dealing with two value types" is an edge case.
I find the dimension of supported behavior/functionality far more important, especially when designing a general-purpose versatile data structure for a wide audience and a variety of use cases.
Where does this desire come from?
From wanting the core collection types to be usable in scenarios that care about performance. You say the node-based solution would support 100% of use cases: that is not the case if every enqueue allocates, just as List<T>
, Dictionary<TKey, TValue>
, HashSet<T>
, and so on would become unusable in many situations if they allocated on every Add.
Why do you believe it's only "0.1%" that care about the allocation overheads of these methods? From where is that data coming?
"caring about no allocations" + "dealing with two value types" is an edge case
It is not. It's also not just "two value types". As I understand it, the proposed solution would require either a) an allocation on every enqueue regardless of the Ts involved, or b) would require the element type to derive from some known base type which in turn prohibits a large number of possible uses to avoid extra allocation.
@eiriktsarpalis
So you don't forget any options, I think there is a feasible option 4 to add to options 1, 2, and 3, in your list which is a compromise:
We might still decide that because there's extra performance available to the 88% or 12% from an implementation which doesn't need an updatable data structure, or is optimized for one in the first place, its better to provide options 2 and 3, than option 4. But I thought we should not forget another option exists.
[Or I suppose you could just see this as a better option 1 and update the description of 1 to say that bookkeeping is not forced, but lazy, and correct equatable behavior is only required when updates are used...]
@stephentoub That's exactly what I had in mind about saying simply what you want to say, thank you :)
Why do you believe it's only 0.1% that care about the allocation overheads of these methods? From where is that data coming?
From intuition, i.e. the same source based on which you believe it's more important to prioritize "no additional allocations" over "ability to conduct updates". At least for the update mechanism, we have the data that 11-12% customers do need to have this behavior supported. I don't think remotely close customers would care about "no additional allocations".
In either case, you are for some reason choosing to fixate on the memory dimension, forgetting about other dimensions, e.g. raw speed, which is another trade-off for your preferred approach. An array-based implementation providing "no additional allocations" would be slower than a node-based implementation. Again, I think it's arbitrary here to prioritize memory over speed.
Let's take a step back and focus on what the customers want. We have a design choice that may or may not make the data structure unusable for 12% of the customers. I think we would need to be very careful with providing reasons for why we would choose not to support these.
An array-based implementation providing "no additional allocations" would be slower than a node-based implementation.
Please share the two C# implementations you're using to perform that comparison and the benchmarks used to come to that conclusion. Theoretical papers are certainly valuable but they are only one small piece of the puzzle. The more important thing is when the rubber meets the road, factoring in the details of the given platform and given implementations, and you're able to validate on the specific platform with the specific implementation and typical/expected data sets / usage patterns. It could very well be that your assertion is correct. It also may not be. I'd like to see the implementations / data to understand better.
Please share the two C# implementations you're using to perform that comparison and the benchmarks used to come to that conclusion
This is a valid point, the paper I quote only compares and benchmarks implementations in C++. It conducts multiple benchmarks with different data sets and usage patterns. I am quite confident this would be transferable to C#, but if you believe this is something we need to double down, I think the best course of action would be for you to ask a colleague to conduct such a study.
@pgolebiowski I would be interested in better understanding the nature of your objection. The proposal advocates for two separate types, would that not cover your requirements?
- an implementation that supports the 12% use case, while also nearly-optimizing for the other 88% by allowing updates to elements that are equatable, and only lazily building the lookup table required to do those updates the first time an update method is called (and updating it on subsequence updates+removals). Therefore incurring less cost for apps which don't use the functionality.
I would probably classify that as a performance optimization for option 1, however I do see a couple of issues with that particular approach:
@eiriktsarpalis Its only O(n) once, and O(1) afterwards, which is O(1) amortized. And you can postpone validating uniqueness until the first update. But maybe this is overly clever. Two classes is easier to explain.
I've spent the last few days prototyping two PriorityQueue implementations: a basic implementation without update support and an implementation that supports updates using element equality. I've named the former PriorityQueue
and the latter, for lack of a better name, PrioritySet
. My goal is to gauge API ergonomics and compare performance.
The implementations can be found in this repo. Both classes are implemented using array-based quad heaps. The updatable implementation also uses a dictionary that maps elements to internal heap indices.
namespace System.Collections.Generic
{
public class PriorityQueue<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)>
{
// Constructors
public PriorityQueue();
public PriorityQueue(int initialCapacity);
public PriorityQueue(IComparer<TPriority>? comparer);
public PriorityQueue(int initialCapacity, IComparer<TPriority>? comparer);
public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values);
public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer);
// Properties
public int Count { get; }
public IComparer<TPriority> Comparer { get; }
// O(log(n)) push operation
public void Enqueue(TElement element, TPriority priority);
// O(1) peek operations
public TElement Peek();
public bool TryPeek(out TElement element, out TPriority priority);
// O(log(n)) pop operations
public TElement Dequeue();
public bool TryDequeue(out TElement element, out TPriority priority);
// Combined push/pop, generally more efficient than sequential Enqueue();Dequeue() calls.
public TElement EnqueueDequeue(TElement element, TPriority priority);
public void Clear();
public Enumerator GetEnumerator();
public struct Enumerator : IEnumerator<(TElement Element, TPriority Priority)>, IEnumerator;
}
}
Here's a basic sample using the type
var queue = new PriorityQueue<string, int>();
queue.Enqueue("John", 1940);
queue.Enqueue("Paul", 1942);
queue.Enqueue("George", 1943);
queue.Enqueue("Ringo", 1940);
Assert.Equal("John", queue.Dequeue());
Assert.Equal("Ringo", queue.Dequeue());
Assert.Equal("Paul", queue.Dequeue());
Assert.Equal("George", queue.Dequeue());
namespace System.Collections.Generic
{
public class PrioritySet<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)> where TElement : notnull
{
// Constructors
public PrioritySet();
public PrioritySet(int initialCapacity);
public PrioritySet(IComparer<TPriority> comparer);
public PrioritySet(int initialCapacity, IComparer<TPriority>? priorityComparer, IEqualityComparer<TElement>? elementComparer);
public PrioritySet(IEnumerable<(TElement Element, TPriority Priority)> values);
public PrioritySet(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer, IEqualityComparer<TElement>? elementComparer);
// Members shared with baseline PriorityQueue implementation
public int Count { get; }
public IComparer<TPriority> Comparer { get; }
public void Enqueue(TElement element, TPriority priority);
public TElement Peek();
public bool TryPeek(out TElement element, out TPriority priority);
public TElement Dequeue();
public bool TryDequeue(out TElement element, out TPriority priority);
public TElement EnqueueDequeue(TElement element, TPriority priority);
// Update methods and friends
public bool Contains(TElement element); // O(1)
public bool TryRemove(TElement element); // O(log(n))
public bool TryUpdate(TElement element, TPriority priority); // O(log(n))
public void EnqueueOrUpdate(TElement element, TPriority priority); // O(log(n))
public void Clear();
public Enumerator GetEnumerator();
public struct Enumerator : IEnumerator<(TElement Element, TPriority Priority)>, IEnumerator;
}
}
I wrote a simple heapsort benchmark that compares the two implementations in their most basic application. I also included a sort benchmark that uses Linq for comparison:
BenchmarkDotNet=v0.12.1, OS=ubuntu 20.04
AMD EPYC 7452, 1 CPU, 4 logical and 2 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
| Method | Size | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------- |------ |-------------:|-----------:|-----------:|------:|--------:|--------:|--------:|--------:|----------:|
| LinqSort | 30 | 1.439 us | 0.0072 us | 0.0064 us | 1.00 | 0.00 | 0.0095 | - | - | 672 B |
| PriorityQueue | 30 | 1.450 us | 0.0085 us | 0.0079 us | 1.01 | 0.01 | - | - | - | - |
| PrioritySet | 30 | 2.778 us | 0.0217 us | 0.0192 us | 1.93 | 0.02 | - | - | - | - |
| | | | | | | | | | | |
| LinqSort | 300 | 24.727 us | 0.1032 us | 0.0915 us | 1.00 | 0.00 | 0.0305 | - | - | 3912 B |
| PriorityQueue | 300 | 29.510 us | 0.0995 us | 0.0882 us | 1.19 | 0.01 | - | - | - | - |
| PrioritySet | 300 | 47.715 us | 0.4455 us | 0.4168 us | 1.93 | 0.02 | - | - | - | - |
| | | | | | | | | | | |
| LinqSort | 3000 | 412.015 us | 1.5495 us | 1.3736 us | 1.00 | 0.00 | 0.4883 | - | - | 36312 B |
| PriorityQueue | 3000 | 491.722 us | 4.1463 us | 3.8785 us | 1.19 | 0.01 | - | - | - | - |
| PrioritySet | 3000 | 677.959 us | 3.1996 us | 2.4981 us | 1.64 | 0.01 | - | - | - | - |
| | | | | | | | | | | |
| LinqSort | 30000 | 5,223.560 us | 11.9077 us | 9.9434 us | 1.00 | 0.00 | 93.7500 | 93.7500 | 93.7500 | 360910 B |
| PriorityQueue | 30000 | 5,688.625 us | 53.0746 us | 49.6460 us | 1.09 | 0.01 | - | - | - | 2 B |
| PrioritySet | 30000 | 8,124.306 us | 39.9498 us | 37.3691 us | 1.55 | 0.01 | - | - | - | 4 B |
As can be expected, the overhead of tracking element locations adds a significant performance hit, roughly 40-50% slower compared to the baseline implementation.
I appreciate all the effort, I see it took a lot of time and energy.
tl;dr:
(priority: ComputePriority(element), element)
into a PQ, and my priority-getter function would simply be tuple => tuple.priority
.I work in high-performance search (motion-planning) & computational geometry code (e.g. sweepline algorithms) that is relevant to both robotics and games, I use a lot of custom hand-rolled priority queues. The other common use-case I have is a Top-K query where updatable priority is not useful.
Some feedback regarding the two-implementations (yes vs no update support) debate.
Naming:
Performance:
APIs:
Func<TElement, TPriority>
rather than insert with priority. If computing priority is expensive, I can simply insert (priority, element)
and provide a func tuple => tuple.priority
That being said, for a significant number of my algorithms, my priority queue items contain their priorities, and storing that twice (once in the PQ, once in the items) sounds inefficient. This is especially the case if I am ordering by numerous keys (similar use-case to OrderBy.ThenBy.ThenBy). This API also cleans a lot of the inconsistencies where Insert takes a priority but Peek doesn't return it.
Finally, it is worth noting that I often insert indices of an array into a Priority Queue, rather than array elements themselves. This is supported by all APIs discussed so far, though. As an example, if I am processing the beginning/ends of intervals on a x-axis line, I might have priority queue events (x, isStartElseEnd, intervalId)
and order by x then by isStartElseEnd. This is often because I have other data structures that map from an index to some computed data.
@pgolebiowski I took the liberty of including your proposed pairing heap implementation to the benchmarks, just so we can compare instances of all three approaches directly. Here are the results:
| Method | Size | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------- |-------- |-------------------:|-----------------:|-----------------:|-------------------:|----------:|------:|------:|-----------:|
| PriorityQueue | 10 | 774.7 ns | 3.30 ns | 3.08 ns | 773.2 ns | - | - | - | - |
| PrioritySet | 10 | 1,643.0 ns | 3.89 ns | 3.45 ns | 1,642.8 ns | - | - | - | - |
| PairingHeap | 10 | 1,660.2 ns | 14.11 ns | 12.51 ns | 1,657.2 ns | 0.0134 | - | - | 960 B |
| PriorityQueue | 50 | 6,413.0 ns | 14.95 ns | 13.99 ns | 6,409.5 ns | - | - | - | - |
| PrioritySet | 50 | 12,193.1 ns | 35.41 ns | 29.57 ns | 12,188.3 ns | - | - | - | - |
| PairingHeap | 50 | 13,955.8 ns | 193.36 ns | 180.87 ns | 13,989.2 ns | 0.0610 | - | - | 4800 B |
| PriorityQueue | 150 | 27,402.5 ns | 76.52 ns | 71.58 ns | 27,410.2 ns | - | - | - | - |
| PrioritySet | 150 | 48,485.8 ns | 160.22 ns | 149.87 ns | 48,476.3 ns | - | - | - | - |
| PairingHeap | 150 | 56,951.2 ns | 190.52 ns | 168.89 ns | 56,953.6 ns | 0.1831 | - | - | 14400 B |
| PriorityQueue | 500 | 124,933.7 ns | 429.20 ns | 380.48 ns | 124,824.4 ns | - | - | - | - |
| PrioritySet | 500 | 206,310.0 ns | 433.97 ns | 338.81 ns | 206,319.0 ns | - | - | - | - |
| PairingHeap | 500 | 229,423.9 ns | 3,213.33 ns | 2,848.53 ns | 230,398.7 ns | 0.4883 | - | - | 48000 B |
| PriorityQueue | 1000 | 284,481.8 ns | 475.91 ns | 445.16 ns | 284,445.6 ns | - | - | - | - |
| PrioritySet | 1000 | 454,989.4 ns | 3,712.11 ns | 3,472.31 ns | 455,354.0 ns | - | - | - | - |
| PairingHeap | 1000 | 459,049.3 ns | 1,706.28 ns | 1,424.82 ns | 459,364.9 ns | 0.9766 | - | - | 96000 B |
| PriorityQueue | 10000 | 3,788,802.4 ns | 11,715.81 ns | 10,958.98 ns | 3,787,811.9 ns | - | - | - | 1 B |
| PrioritySet | 10000 | 5,963,100.4 ns | 26,669.04 ns | 22,269.86 ns | 5,950,915.5 ns | - | - | - | 2 B |
| PairingHeap | 10000 | 6,789,719.0 ns | 134,453.01 ns | 265,397.13 ns | 6,918,392.9 ns | 7.8125 | - | - | 960002 B |
| PriorityQueue | 1000000 | 595,059,170.7 ns | 4,001,349.38 ns | 3,547,092.00 ns | 595,716,610.5 ns | - | - | - | 4376 B |
| PrioritySet | 1000000 | 1,592,037,780.9 ns | 13,925,896.05 ns | 12,344,944.12 ns | 1,591,051,886.5 ns | - | - | - | 288 B |
| PairingHeap | 1000000 | 1,858,670,560.7 ns | 36,405,433.20 ns | 59,815,170.76 ns | 1,838,721,629.0 ns | 1000.0000 | - | - | 96000376 B |
~In light of the above I still believe there are valid trade-offs between the baseline array heap and the pairing heap approach~.
EDIT: updated the results after a bugfix in my benchmarks, thanks @VisualMelon
@eiriktsarpalis Your benchmark for the PairingHeap
is, I think, wrong: the parameters to Add
are the wrong way around. When you swap them around, it's a different story: https://gist.github.com/VisualMelon/00885fe50f7ab0f4ae5cd1307312109f
(I did exactly the same thing when I first implemented it)
Note that this doesn't mean the pairing heap is faster or slower, rather it seems to depend heavily on the distribution/order of the data supplied.
@eiriktsarpalis re: the usefulness of PrioritySet...
We shouldn't expect updatable to be anything other than slower for heapsort, since it has no priority updates in the scenario. (Also for heapsort, its likely you even want to keep duplicates, a set is just not appropriate.)
The litmus test for seeing whether PrioritySet is useful should be benchmarking algorithms which use priority updates, versus a non-updating implementation of same algorithm, enqueueing the duplicate values and ignoring duplicates when dequeueing.
Thanks @VisualMelon, I updated my results and comments after your suggested fix.
rather it seems to depend heavily on the distribution/order of the data supplied.
I believe it might have benefited from the fact that the enqueued priorities were monotonic.
The litmus test for seeing whether PrioritySet is useful should be benchmarking algorithms which use priority updates, versus a non-updating implementation of same algorithm, enqueueing the duplicate values and ignoring duplicates when dequeueing.
@TimLovellSmith my goal here was to measure performance for the most common PriorityQueue application: rather than measure the performance of updates, I wanted to see the impact for the case where updates are not needed at all. It might however make sense to produce a separate benchmark that compares pairing heap with "PrioritySet" updates.
@miyu thanks for your detailed feedback, it is very much appreciated!
@TimLovellSmith I wrote a simple benchmark that uses updates:
| Method | Size | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------ |-------- |---------------:|---------------:|---------------:|---------------:|-------:|------:|------:|-----------:|
| PrioritySet | 10 | 1.052 us | 0.0106 us | 0.0099 us | 1.055 us | - | - | - | - |
| PairingHeap | 10 | 1.055 us | 0.0042 us | 0.0035 us | 1.055 us | 0.0057 | - | - | 480 B |
| PrioritySet | 50 | 7.394 us | 0.0527 us | 0.0493 us | 7.380 us | - | - | - | - |
| PairingHeap | 50 | 8.587 us | 0.1678 us | 0.1570 us | 8.634 us | 0.0305 | - | - | 2400 B |
| PrioritySet | 150 | 27.522 us | 0.0459 us | 0.0359 us | 27.523 us | - | - | - | - |
| PairingHeap | 150 | 32.045 us | 0.1076 us | 0.1007 us | 32.019 us | 0.0610 | - | - | 7200 B |
| PrioritySet | 500 | 109.097 us | 0.6548 us | 0.6125 us | 109.162 us | - | - | - | - |
| PairingHeap | 500 | 131.647 us | 0.5401 us | 0.4510 us | 131.588 us | 0.2441 | - | - | 24000 B |
| PrioritySet | 1000 | 238.184 us | 1.0282 us | 0.9618 us | 238.457 us | - | - | - | - |
| PairingHeap | 1000 | 293.236 us | 0.9396 us | 0.8789 us | 293.257 us | 0.4883 | - | - | 48000 B |
| PrioritySet | 10000 | 3,035.982 us | 12.2952 us | 10.8994 us | 3,036.985 us | - | - | - | 1 B |
| PairingHeap | 10000 | 3,388.685 us | 16.0675 us | 38.1861 us | 3,374.565 us | - | - | - | 480002 B |
| PrioritySet | 1000000 | 841,406.888 us | 16,788.4775 us | 15,703.9522 us | 840,888.389 us | - | - | - | 288 B |
| PairingHeap | 1000000 | 989,966.501 us | 19,722.6687 us | 30,705.8191 us | 996,075.410 us | - | - | - | 48000448 B |
On a separate note, has their been discussion/feedback on the lack of stability being an issue (or non-issue) for people's usecases?
has their been discussion/feedback on the lack of stability being an issue (or non-issue) for people's usecase
None of the implementations guarantee stability, however it should be pretty straightforward for users to obtain stability by augmenting the ordinal with insertion order:
var pq = new PriorityQueue<string, (int priority, int insertionCount)>();
int insertionCount = 0;
foreach (string element in elements)
{
int priority = 42;
pq.Enqueue(element, (priority, insertionCount++));
}
To summarize some of my previous posts, I've been trying to identify what a popular .NET priority queue would look like, so I went through the following data:
Which yielded the following takeaways:
Going forward, I propose we take the following actions for .NET 6:
Introduce a System.Collections.Generic.PriorityQueue
class that is simple, caters to the majority of our user requirements and is as efficient as possible. It'll use an array-backed quaternary heap and will not support priority updates. A prototype of the implementation can be found here. I will be creating a separate issue detailing the API proposal shortly.
We recognize the need for heaps that support efficient priority updates, so will keep working towards introducing a specialized class that addresses this requirement. We are evaluating a couple of prototypes [1,2] each with their own sets of trade-offs. My recommendation would be to introduce this type at a later stage, since more work is needed to finalize the design.
At this moment, I would like to thank the contributors to this thread, in particular @pgolebiowski and @TimLovellSmith. Your feedback has played a huge role in guiding our design process. I hope to keep receiving your input as we iron out the design for the updatable priority queue.
Going forward, I propose we take the following actions for .NET 6: [...]
Sounds good :)
Introduce a
System.Collections.Generic.PriorityQueue
class that is simple, caters to the majority of our user requirements and is as efficient as possible. It'll use an array-backed quaternary heap and will not support priority updates.
If we have the decision by the codebase owners that this direction is approved and desired, can I continue leading the API design for that bit and provide the final implementation?
At this moment, I would like to thank the contributors to this thread, in particular @pgolebiowski and @TimLovellSmith. Your feedback has played a huge role in guiding our design process. I hope to keep receiving your input as we iron out the design for the updatable priority queue.
It was quite a journey :D
The API for System.Collections.Generic.PriorityQueue<TElement, TPriority>
has just been approved. I've created a separate issue to continue our conversation on a potential heap implementation that supports priority updates.
I'm going to close this issue, thank you all for your contributions!
Maybe someone can do a write up on this journey! A whole 6 years for one API. :) Any chance to win a Guinness?
Most helpful comment
heap data structure is a MUST for doing leetcode
more leetcode, more c# code interview which means more c# developers.
more developers means better ecosystem.
better ecosystem means that if we can still program in c# tomorrow.
in sum: this is not only a feature, but also the future. that is why is issue is labeled 'future'.