Adding a priority queue has long been requested by the community; however, so far we've fallen short of finalizing a design, primarily due to a lack of consensus on certain high-level features: namely, support for priority updates and the API and performance considerations this feature entails.
To better identify what a popular .NET priority queue should look like, I have been going through .NET codebases for common usage patterns as well as running benchmarks against various prototypes. The key takeaway is that _90% of the use cases examined do not require priority updates_. It also turns out that implementations without update support can be 2-3x faster compared to ones that do support it. Please refer to the original issue for more details on my investigations.
This issue presents a finalized priority queue API proposal, informed by the above findings. The goal is to introduce a PriorityQueue class that has a simple design, caters to the majority of our users' requirements and is as efficient as possible.
A prototype implementing the proposal can be found here.
Naming: PriorityQueue instead of Heap. Unlike Queue&lt;T&gt;, PriorityQueue cannot efficiently enumerate elements by ascending priority. We therefore expose the enumerable as a separate UnorderedItems property, to more effectively communicate this behaviour.

namespace System.Collections.Generic
{
public class PriorityQueue<TElement, TPriority> // NB does not implement IEnumerable or ICollection
{
/// <summary>
/// Creates an empty PriorityQueue instance.
/// </summary>
public PriorityQueue();
/// <summary>
/// Creates a PriorityQueue instance with specified initial capacity in its backing array.
/// </summary>
public PriorityQueue(int initialCapacity);
/// <summary>
/// Creates a PriorityQueue instance with specified priority comparer.
/// </summary>
public PriorityQueue(IComparer<TPriority>? comparer);
public PriorityQueue(int initialCapacity, IComparer<TPriority>? comparer);
/// <summary>
/// Creates a PriorityQueue populated with the specified values and priorities.
/// </summary>
public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values);
public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer);
/// <summary>
/// Gets the current element count in the queue.
/// </summary>
public int Count { get; }
/// <summary>
/// Gets the priority comparer of the queue.
/// </summary>
public IComparer<TPriority> Comparer { get; }
/// <summary>
/// Enqueues the element with specified priority.
/// </summary>
public void Enqueue(TElement element, TPriority priority);
/// <summary>
/// Gets the element with minimal priority, if it exists.
/// </summary>
/// <exception cref="InvalidOperationException">The queue is empty.</exception>
public TElement Peek();
/// <summary>
/// Dequeues the element with minimal priority, if it exists.
/// </summary>
/// <exception cref="InvalidOperationException">The queue is empty.</exception>
public TElement Dequeue();
/// <summary>
/// Try-variants of Dequeue and Peek methods.
/// </summary>
public bool TryDequeue([MaybeNullWhen(false)] out TElement element, [MaybeNullWhen(false)] out TPriority priority);
public bool TryPeek([MaybeNullWhen(false)] out TElement element, [MaybeNullWhen(false)] out TPriority priority);
/// <summary>
/// Combined enqueue/dequeue operation, generally more efficient than sequential Enqueue/Dequeue calls.
/// </summary>
public TElement EnqueueDequeue(TElement element, TPriority priority);
/// <summary>
/// Enqueues a sequence of element/priority pairs to the queue.
/// </summary>
public void EnqueueRange(IEnumerable<(TElement Element, TPriority Priority)> values);
/// <summary>
/// Enqueues a sequence of elements with provided priority.
/// </summary>
public void EnqueueRange(IEnumerable<TElement> values, TPriority priority);
/// <summary>
/// Removes all objects from the PriorityQueue.
/// </summary>
public void Clear();
/// <summary>
/// Ensures that the PriorityQueue can hold the specified number of elements, resizing its underlying buffer if necessary.
/// </summary>
public void EnsureCapacity(int capacity);
/// <summary>
/// Sets capacity to the actual number of elements in the queue, if that is less than 90 percent of current capacity.
/// </summary>
public void TrimExcess();
/// <summary>
/// Gets a collection that enumerates the elements of the queue.
/// </summary>
public UnorderedItemsCollection UnorderedItems { get; }
public class UnorderedItemsCollection : IReadOnlyCollection<(TElement Element, TPriority Priority)>, ICollection
{
public struct Enumerator : IEnumerator<(TElement Element, TPriority Priority)>, IEnumerator { }
public Enumerator GetEnumerator();
}
}
}
``` C#
var pq = new PriorityQueue<string, int>();
pq.Enqueue("John", 1940);
pq.Enqueue("Paul", 1942);
pq.Enqueue("George", 1943);
pq.Enqueue("Ringo", 1940);
Assert.Equal("John", pq.Dequeue());
Assert.Equal("Ringo", pq.Dequeue());
Assert.Equal("Paul", pq.Dequeue());
Assert.Equal("George", pq.Dequeue());
```
We recognize the need for heaps that support efficient priority updates, so we will consider introducing a separate specialized class that addresses this requirement at a later stage. Please refer to the original issue for a more in-depth analysis on the alternatives.
Open questions:
- KeyValuePair<TPriority, TElement> instead of (TPriority Priority, TElement Element)?
- bool Contains(TElement element); method?
- void Remove(TElement element); method?

Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley
See info in area-owners.md if you want to be subscribed.
Priority ordinals are passed as separate values (rather than being encapsulated by the element)
How come? Is it common to use a heap without having a custom node type? I'm just surprised by this one as I've always had the node perform comparisons.
public class PriorityQueue<TElement, TPriority>
    : IReadOnlyCollection<(TElement Element, TPriority Priority)>
Are there any constraints on the type parameters? e.g. where TPriority : notnull?
public bool TryDequeue(out TElement element, out TPriority priority);
public bool TryPeek(out TElement element, out TPriority priority);
These should have [MaybeNullWhen(false)] on both arguments.
EnqueueDequeue
Can you share an example where this is used? And it's more efficient because, if it's based on a heap, it only has to sift once instead of twice?
: IReadOnlyCollection<(TElement Element, TPriority Priority)>
Note that LINQ has some optimizations based on the non-generic ICollection.Count. I think this would be the (or one of the) first prominent collection types that didn't implement it. Queue<T>, for example, implements both IReadOnlyCollection<T> and the non-generic ICollection. Did you consider implementing it? Or as part of this are we going to revisit IReadOnlyCollection<T> specialization in LINQ? Or... maybe it doesn't matter?
Should enumeration of elements be sorted by priority?
Can that be done without O(N) additional space?
Priority ordinals are passed as separate values (rather than being encapsulated by the element)
How come? Is it common to use a heap without having a custom node type? I'm just surprised by this one as I've always had the node perform comparisons.
I think it is pretty common actually. When investigating our codebases I found many PQ implementations passing priority as a separate parameter. However even for queues ordering on the elements themselves, it was very common for consumers to populate the type parameter with a tuple containing an element and a priority, then use a custom comparer that projected to the priority. To me this seems like we might be forcing additional boilerplate just to encapsulate the priority.
I think the real issue here is performance, since you end up copying more stuff. In practice however I found the perf impact to be negligible, but it could still be a problem if your ordinal is a large struct.
public class PriorityQueue<TElement, TPriority>
    : IReadOnlyCollection<(TElement Element, TPriority Priority)>
Are there any constraints on the type parameters? e.g. where TPriority : notnull?
Probably not. Comparisons are done using the provided IComparer<TPriority> instance, and the default comparers seem perfectly capable of handling null parameters.
EnqueueDequeue
Can you share an example where this is used? And it's more efficient because, if it's based on a heap, it only has to sift once instead of twice?
Correct. It's also _O(1)_ if the enqueued element compares less than or equal to the min element, and it is always guaranteed to succeed. The standard term for that operation is push-pop and is available in the Python heap implementation.
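To make the push-pop semantics concrete, here is a small sketch against the proposed API surface (the element and priority values are illustrative):

```C#
using System;
using System.Collections.Generic;

var pq = new PriorityQueue<string, int>();
pq.Enqueue("b", 2);
pq.Enqueue("c", 3);

// The incoming priority (1) compares <= the current minimum (2), so the new
// element bounces straight back without touching the heap: O(1).
Console.WriteLine(pq.EnqueueDequeue("a", 1)); // prints "a"

// Otherwise the root is replaced and sifted down once, which is cheaper than
// a separate Enqueue followed by a Dequeue (one sift instead of two).
Console.WriteLine(pq.EnqueueDequeue("d", 4)); // prints "b"

Console.WriteLine(pq.Count); // still 2
```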
: IReadOnlyCollection<(TElement Element, TPriority Priority)>
Note that LINQ has some optimizations based on the non-generic ICollection.Count. I think this would be the (or one of the) first prominent collection types that didn't implement it. Queue<T>, for example, implements both IReadOnlyCollection<T> and the non-generic ICollection. Did you consider implementing it? Or as part of this are we going to revisit IReadOnlyCollection<T> specialization in LINQ? Or... maybe it doesn't matter?
I was considering ICollection<T> but it was not clear to me how to best implement the Contains() and Remove() methods. ICollection is probably fine though, will update the API proposal.
Should enumeration of elements be sorted by priority?
Can that be done without O(N) additional space?
Not to my knowledge. I thought I'd raise this point though, since we had customer feedback stating that unsorted enumeration felt counterintuitive (given how Queue<T> behaves).
The standard term for that operation is push-pop and is available in the Python heap implementation.
Can you share an example of an algorithm that uses it?
Can that be done without O(N) additional space?
Not to my knowledge. I thought I'd raise this point though, since we had customer feedback stating that unsorted enumeration felt counterintuitive (given how Queue<T> behaves).
I think a struct-based Enumerator allocating O(N) space would also be counterintuitive ;-)
The standard term for that operation is push-pop and is available in the Python heap implementation.
Can you share an example of an algorithm that uses it?
In Python it is commonly used in algorithms that need to keep the heap within a given capacity, for example when calculating the _k_ closest points.
Transcribed to C# it might look as follows:
```C#
public static IEnumerable<TValue> KLargest<TValue>(TValue[] inputs, int k, Func<TValue, long> calculateSize)
{
    // Seed the heap with the first k elements.
    var pq = new PriorityQueue<TValue, long>(inputs.Take(k).Select(v => (v, calculateSize(v))));
    for (int i = k; i < inputs.Length; i++)
    {
        TValue value = inputs[i];
        // Replaces the current minimum only if the new element is larger,
        // keeping the heap at capacity k.
        pq.EnqueueDequeue(value, calculateSize(value));
    }
    return pq.UnorderedItems.Select(x => x.Element);
}
```
I see. Such use doesn't actually care about the result of the dequeue, just that the min is replaced.
Correct. Happy to discuss a better method name; I simply adapted push-pop to our terminology.
Happy to discuss a better method name
Seems ok
Do you envision a future ConcurrentPriorityQueue ?
Do you envision a future ConcurrentPriorityQueue ?
Not at present. We'd need significant evidence that there was both a) a strong need for it and b) a significantly better implementation than just lock'ing around each operation (and not just for machines with a bazillion cores).
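For reference, the "just lock around each operation" baseline mentioned here only takes a few lines (a hypothetical wrapper sketch, not a proposed API):

```C#
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

var q = new LockedPriorityQueue<string, int>();
q.Enqueue("a", 2);
q.Enqueue("b", 1);
q.TryDequeue(out var element, out _);
Console.WriteLine(element); // prints "b"

// Hypothetical baseline: a coarse lock around every operation of a plain
// PriorityQueue<TElement, TPriority>. A dedicated ConcurrentPriorityQueue
// would need to beat this meaningfully to justify its existence.
public sealed class LockedPriorityQueue<TElement, TPriority>
{
    private readonly PriorityQueue<TElement, TPriority> _inner = new();
    private readonly object _gate = new();

    public void Enqueue(TElement element, TPriority priority)
    {
        lock (_gate) _inner.Enqueue(element, priority);
    }

    public bool TryDequeue([MaybeNullWhen(false)] out TElement element, [MaybeNullWhen(false)] out TPriority priority)
    {
        lock (_gate) return _inner.TryDequeue(out element, out priority);
    }
}
```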
Sets capacity to the actual number of elements in the queue, if that is less than 90 percent of current capacity.
Should we not allow the user to pick that policy? They can easily perform this check themselves with a percentage of their liking. With this design, it is not possible to use a percentage above 90%.
Would sorted enumeration be more efficient than the user sorting after enumeration? If yes, it might be worth providing an EnumerateOrdered method.
A ToArray method seems useful.
Should we not allow the user to pick that policy? They can easily perform this check themselves with a percentage of their liking. With this design, it is not possible to use a percentage above 90%.
That's literally replicating the logic of the Queue<T>.TrimExcess method. I'm not sure why the 90% threshold was chosen there.
Would sorted enumeration be more efficient than the user sorting after enumeration? If yes, it might be worth providing an EnumerateOrdered method.
Maybe, but users might assume it's a cheap operation. I'd rather they explicitly pay the price by sorting the elements.
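For example, the explicit version a user would write might look like this (a sketch using LINQ over UnorderedItems; the element values are illustrative):

```C#
using System;
using System.Collections.Generic;
using System.Linq;

var pq = new PriorityQueue<string, int>();
pq.EnqueueRange(new[] { ("low", 3), ("high", 1), ("mid", 2) });

// Snapshot the unordered contents and sort explicitly; the O(N log N)
// cost is visible at the call site rather than hidden in an enumerator.
string[] ordered = pq.UnorderedItems
    .OrderBy(pair => pair.Priority, pq.Comparer)
    .Select(pair => pair.Element)
    .ToArray();

Console.WriteLine(string.Join(",", ordered)); // prints "high,mid,low"
```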
A ToArray method seems useful.
~~Sounds reasonable, will update the proposal.~~
On second thought, it is not common for key/value pair collections to have a ToArray() method.
If your TElement is a class/struct that contains its priority as a field, is there a suggested pattern to avoid storing the priority twice? (is it dumb/slow to have TPriority = TElement and then use a custom comparer to project the priority field?)
If your TElement is a class/struct that contains its priority as a field, is there a suggested pattern to avoid storing the priority twice? (is it dumb/slow to have TPriority = TElement and then use a custom comparer to project the priority field?)
Good question. You wouldn't be able to avoid storing the priority twice. However, in my benchmarks I found that this duplication has negligible performance impact (I presume though that the story might be different if the priority type is a large struct).
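A sketch of the TPriority = TElement pattern raised in the question (Job and JobPriorityComparer are hypothetical names):

```C#
using System;
using System.Collections.Generic;

// TElement and TPriority are the same type, so the priority lives only inside
// the element; the cost is that each heap entry stores the Job reference twice.
var pq = new PriorityQueue<Job, Job>(new JobPriorityComparer());
foreach (var j in new[] { new Job("deploy", 2), new Job("compile", 1) })
    pq.Enqueue(j, j);

Console.WriteLine(pq.Dequeue().Name); // prints "compile"

// Hypothetical element type that carries its own priority.
record Job(string Name, int Priority);

// Comparer that projects the comparison onto the embedded priority field.
class JobPriorityComparer : IComparer<Job>
{
    public int Compare(Job x, Job y) => x.Priority.CompareTo(y.Priority);
}
```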
I recently ran an investigation of codebases that employ priority queues. The design proposed here is therefore a trade-off between duplication (for the cases where elements do contain their priorities) and API simplicity (not requiring users to write wrapper types and custom comparers).
Can I ask why not having both types?
I mean, one as PriorityQueue<TElement, TPriority> and other as PriorityQueue<T>, where T is both the element and the priority.
Both implementations would be quite similar, so it wouldn't be much effort to have both.
Can I ask why not having both types?
I don't think the benefits outweigh the effort of designing and maintaining two almost identical types.
Is it considered acceptable and "a good idea" to put value tuples in the public surface area? ((TElement Element, TPriority Priority)). An alternative could be a custom struct. It would be a pretty simple DTO type.
Is it considered acceptable and "a good idea" to put value tuples in the public surface area?
There is precedent, e.g. the Enumerable.Zip method. Alternatively we could just use KeyValuePair<TPriority, TElement>, but I felt that "Key" might not be a name users necessarily associate with priority (even though it is sometimes referred to as that in certain heap implementations).
Not a stable queue (elements enqueued with equal priority not guaranteed to be dequeued in the same order).
I don't think you can call it a queue then. Even in your sample usage you asserted that the item inserted at a given priority (or birth year, whatever) first came out first.
I don't think you can call it a queue then.
Does the name 'queue' necessarily imply FIFO? There is precedent in the priority queues defined in the Java and C++ standard libraries which are also not FIFO.
Does the name 'queue' necessarily imply FIFO?
Wikipedia says so :smile::
In computer science, a queue is a collection of entities that are maintained in a sequence and can be modified by the addition of entities at one end of the sequence and the removal of entities from the other end of the sequence. ... The operations of a queue make it a first-in-first-out (FIFO) data structure.
The Oxford Languages dictionary also agrees:
- (Computing) a list of data items, commands, etc., stored so as to be retrievable in a definite order, usually the order of insertion.
While that one tries to avoid locking it to FIFO with "usually" it does say that the order is "definite".
Admittedly, Wikipedia for Priority queue is less prescriptive:
In some implementations, if two elements have the same priority, they are served according to the order in which they were enqueued, while in other implementations, ordering of elements with the same priority is undefined.
In a system where the queue gets reliably drained to empty there's not a problem, but in a "we got a surge and then maintain steady state for a long time" situation having one of the oldest items sitting around and aging while newer items at the same priority get processed feels bad to me. Since I was just dealing with phone-based customer service, I'll say "I certainly hope no customer service phone system uses a priority queue that doesn't maintain ordering at relative priorities".
Is it considered acceptable and "a good idea" to put value tuples in the public surface area?
There is precedent, e.g. the Enumerable.Zip method. Alternatively we could just use KeyValuePair<TPriority, TElement>, but I felt that "Key" might not be a name users necessarily associate with priority (even though it is sometimes referred to as that in certain heap implementations).
Seems like it should be OK now, since this isn't going to be used with .NET Framework etc.; it's certainly more intuitive.
Is this library going to multitarget netstandard2.0/2.1, or purely .NET 5?
My point is that the term 'queue' doesn't seem to carry strong connotations of FIFO-like behaviour. For example, the javadoc for queues contains a rather loose interpretation:
Queues typically, but do not necessarily, order elements in a FIFO (first-in-first-out) manner. Among the exceptions are priority queues, which order elements according to a supplied comparator, or the elements' natural ordering, and LIFO queues (or stacks) which order the elements LIFO (last-in-first-out).
Bottom line, I don't think we would be able to guarantee this without some performance hit. At the same time, it is fairly trivial for users to obtain stability if they need it.
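For example, one way users can obtain stability is to break priority ties with a monotonically increasing sequence number (a sketch; the tuple-shaped priority is just one option):

```C#
using System;
using System.Collections.Generic;

// Composite priority: ties on Priority are broken by insertion order, which
// restores FIFO behaviour among equal priorities. The default tuple comparer
// orders by Priority first, then by Order.
var pq = new PriorityQueue<string, (int Priority, long Order)>();
long order = 0;

void EnqueueStable(string element, int priority) => pq.Enqueue(element, (priority, order++));

EnqueueStable("first", 1);
EnqueueStable("second", 1);
EnqueueStable("third", 0);

Console.WriteLine(pq.Dequeue()); // prints "third"  (lower priority value wins)
Console.WriteLine(pq.Dequeue()); // prints "first"  (FIFO among equal priorities)
Console.WriteLine(pq.Dequeue()); // prints "second"
```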
Is it considered acceptable and "a good idea" to put value tuples in the public surface area?
Hand-waving a lot, the guideline is "don't do it if there's any other sensible approach".
If there's any possibility at all that the API would want to expose an enqueued item with its priority and use that item to update the priority (which would require a method) then it would need to use a custom type now. But since it's really just "you gave me two things, I'm giving you back those two things" (which is what it was for Enumerable.Zip) then creating a custom type is really just noise, and using KeyValuePair is using worse names just to avoid ValueTuple.
My point is that the term 'queue' doesn't seem to carry strong connotations of FIFO-like behaviour.
Something in my background has built "FIFO" and "queue" as literal synonyms, similar with "FILO" and "stack" (data structure). Various Wikipedia authors seem to have similar backgrounds, since the FIFO and Queue pages seem to arbitrarily use both terms.
it is fairly trivial for users to obtain stability if they need it.
That stability approach has an overflow problem; but I suppose if it uses ulong then it's effectively immune; though if someone builds one with load/save(//serialize/deserialize) mechanics they'd want to avoid saving the order value and just rebuilding that on load.
I was figuring that a real implementation of stability might decrement when popping to help with overflow? It'd be good to have a decent example in the eventual docs.
Naming is always bikesheddy :) if it wasn't for the Java/C++ precedent, a more natural/parallel fit with the existing C# data structures might have been PriorityBag :D
Feedback from API review:
- EnqueueRange(IEnumerable<TElement> values, TPriority priority)
- IEnumerable might be problematic because consumers are likely expecting it to return the elements in the order in which Dequeue would (or at least priority order); the same applies to Elements, so we should remove that
- public UnorderedItemsCollection UnorderedItems { get; } (and should not allocate on each get)
- Contains
- Namespace: System.Collections.Generic
namespace System.Collections.Generic
{
public class PriorityQueue<TElement, TPriority>
{
public PriorityQueue();
public PriorityQueue(int initialCapacity);
public PriorityQueue(IComparer<TPriority>? comparer);
public PriorityQueue(int initialCapacity, IComparer<TPriority>? comparer);
public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values);
public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer);
public int Count { get; }
public IComparer<TPriority> Comparer { get; }
public void Enqueue(TElement element, TPriority priority);
public TElement Peek();
public TElement Dequeue();
public bool TryDequeue([MaybeNullWhen(false)] out TElement element, [MaybeNullWhen(false)] out TPriority priority);
public bool TryPeek([MaybeNullWhen(false)] out TElement element, [MaybeNullWhen(false)] out TPriority priority);
public TElement EnqueueDequeue(TElement element, TPriority priority);
public void EnqueueRange(IEnumerable<TElement> values, TPriority priority);
public void EnqueueRange(IEnumerable<(TElement Element, TPriority Priority)> values);
public void Clear();
public void EnsureCapacity(int capacity);
public void TrimExcess();
public UnorderedItemsCollection UnorderedItems { get; }
public class UnorderedItemsCollection : IReadOnlyCollection<(TElement, TPriority)>, ICollection
{
public struct Enumerator : IEnumerator<(TElement, TPriority)>, IEnumerator { }
public Enumerator GetEnumerator();
}
}
}
Is UnorderedItemsCollection a snapshot or an iterator?
(TElement, TPriority)[] ToArray() instead? There's now no "free" iteration over PQ elements. (Dictionary.Keys/Values are the same.)
Is UnorderedItemsCollection a snapshot or an iterator?
Clarified.
Updated the OP and updated the prototype to incorporate API review feedback.
@pgolebiowski would you still be interested in contributing the implementation?
EnsureCapacity should return an int representing the actual new capacity. This is how EnsureCapacity works in StringBuilder, HashSet<T>, and Dictionary<TKey, TValue>.
EnsureCapacity should return an int representing the actual new capacity. This is how EnsureCapacity works in StringBuilder, HashSet<T>, and Dictionary<TKey, TValue>.
Good point, the signatures should be consistent. @terrajobst should I just update the proposal?
@pgolebiowski would you still be interested in contributing the implementation?
Yes, would love to! Will submit a pull request over the next two weeks 😊 Could you add me to the assignees of this issue?
Will any concurrent modifications invalidate the enumerator? For example in Dictionary<K,V> just Add/Insert, Trim, and Grow invalidate the enumerator, because those may cause enumeration to misbehave. Remove does not, you just may or may not see the remove depending on where you enumerated to.
Will any concurrent modifications invalidate the enumerator? For example in Dictionary<K,V> just Add/Insert, Trim, and Grow invalidate the enumerator, because those may cause enumeration to misbehave. Remove does not, you just may or may not see the remove depending on where you enumerated to.
In heaps, update operations involve shuffling elements around to preserve the heap property; for example, in Dequeue you actually pick the last element from the tree and sift it downward from the top. So I don't think we could avoid invalidating the enumeration without potentially skipping elements.
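To illustrate why, here is a rough sketch of the Dequeue shape described above over a flat array (illustrative only, not the actual implementation):

```C#
using System;
using System.Collections.Generic;

int[] heap = { 1, 3, 2, 7, 5 }; // a valid binary min-heap over a flat array
int count = heap.Length;
Console.WriteLine(DequeueSketch(heap, ref count, Comparer<int>.Default)); // prints 1
Console.WriteLine(DequeueSketch(heap, ref count, Comparer<int>.Default)); // prints 2

// Removing the root moves the *last* element to index 0 and sifts it down,
// so several array slots change in a single operation — which is why an
// in-flight enumerator has to be invalidated.
static T DequeueSketch<T>(T[] heap, ref int count, IComparer<T> comparer)
{
    T root = heap[0];
    heap[0] = heap[--count]; // last element replaces the root
    int i = 0;
    while (true)
    {
        int left = 2 * i + 1, smallest = i;
        if (left < count && comparer.Compare(heap[left], heap[smallest]) < 0) smallest = left;
        int right = left + 1;
        if (right < count && comparer.Compare(heap[right], heap[smallest]) < 0) smallest = right;
        if (smallest == i) break;
        (heap[i], heap[smallest]) = (heap[smallest], heap[i]); // sift down one level
        i = smallest;
    }
    return root;
}
```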
Hi @pgolebiowski, I've updated the OP with a "Definition of Done" checklist. Note that this is not work we are asking you to complete in a single PR, rather, it enumerates what is needed for the feature to be considered ready for shipping. It is an effort we will be iterating on, and I will be happy to take on some of the more menial tasks on that list.
@eiriktsarpalis, do other languages/libraries offer the equivalent of:
public TElement Dequeue(TPriority priority);
public TElement Peek(TPriority priority);
@ericsampson I'm not sure I understand what these methods should do, find elements whose priority matches (or is greater than) the supplied priority? If so, they are remove operations rather than dequeue operations, and would require a linear time scan of the heap contents.
Sorry for the terminology confusion, yeah you have the idea.
A motivation would be to allow a form of priority boosting; say you're writing a job scheduler, and want to ensure that lower-priority jobs get run "sometime", even if the queue is continually full of higher priority jobs.
I believe that Java's implementation has something like Remove(TElement element), which could be used along with the iterator by the end user to achieve this functionality (although with two scans).
Or also offering Remove(TPriority priority).
Am I thinking right that the originally described use case can't be accomplished with the current proposed API surface? If so that could be a good motivation to add Remove to the API surface, like Java has. Or maybe I just need more coffee ☕ :)
We discussed adding a Remove method like Java, however we eventually decided against it due to poor performance and ambiguity in the presence of duplicate elements.
I can understand that.
How then could a person achieve this use case (effectively 'dequeue an item with a given priority') given the current API surface? Thanks!
How then could a person achieve this use case (effectively 'dequeue an item with a given priority') given the current API surface? Thanks!
Such an operation would not be supported. I have created a separate issue for adding a heap implementation that would support priority update and removal operations (however you could only remove by specifying the element _not_ the priority).
Thanks, I had a feeling the answer would be "wait for the priority-update supporting version". Cheers
Note that it would still not be possible to remove elements _by priority_, at least not without it requiring a linear-time scan.
Yup. I'd be ok with a scan. Or more likely, to obtain the behavior I originally described, I would bump/increment the priority of lower priority items every N dequeues or something, and let the natural dequeuing pop them off eventually. Thanks Erik.
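That periodic-boost workaround might be sketched as follows (the drain-and-rebuild strategy and the decrement-by-one policy are illustrative assumptions):

```C#
using System;
using System.Collections.Generic;
using System.Linq;

var pq = new PriorityQueue<string, int>();
pq.EnqueueRange(new[] { ("urgent", 0), ("background", 9) });

// Periodically (e.g. every N dequeues) drain and re-enqueue with boosted
// (numerically lower) priorities, so that long-waiting low-priority items
// eventually surface through normal dequeuing.
void BoostAll(PriorityQueue<string, int> queue)
{
    var snapshot = queue.UnorderedItems.ToArray();
    queue.Clear();
    queue.EnqueueRange(snapshot.Select(p => (p.Element, Math.Max(0, p.Priority - 1))));
}

BoostAll(pq);
Console.WriteLine(pq.Dequeue()); // prints "urgent"
```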