Runtime: Add PriorityQueue<T> to Collections

Created on 30 Jan 2015  ·  318Comments  ·  Source: dotnet/runtime

See LATEST Proposal in corefxlab repo.

Second Proposal options

Proposal from https://github.com/dotnet/corefx/issues/574#issuecomment-307971397

Assumptions

Elements in priority queue are unique. If they are not, we would have to introduce 'handles' of items to enable their update/remove. Or the update/remove semantics would have to apply to first/all, which is weird.

Modeled after Queue<T> (MSDN link)

API

```c#
public class PriorityQueue
: IEnumerable,
IEnumerable<(TElement element, TPriority priority)>,
IReadOnlyCollection<(TElement element, TPriority priority)>
// ICollection not included on purpose
{
public PriorityQueue();
public PriorityQueue(IComparer comparer);

public IComparer<TPriority> Comparer { get; }
public int Count { get; }
public bool IsEmpty { get; }

public bool Contains(TElement element);

// Peek & Dequeue
public (TElement element, TPriority priority) Peek(); // Throws if empty
public (TElement element, TPriority priority) Dequeue(); // Throws if empty
public bool TryPeek(out TElement element, out TPriority priority); // Returns false if empty
public bool TryDequeue(out TElement element, out TPriority priority); // Returns false if empty

// Enqueue & Update
public void Enqueue(TElement element, TPriority priority); // Throws if it is duplicate
public void Update(TElement element, TPriority priority); // Throws if element does not exist
public void EnqueueOrUpdate(TElement element, TPriority priority);
public bool TryEnqueue(TElement element, TPriority priority); // Returns false if it is duplicate (does NOT update it)
public bool TryUpdate(TElement element, TPriority priority); // Returns false if element does not exist (does NOT add it)

public void Remove(TElement element); // Throws if element does not exist
public bool TryRemove(TElement element); // Returns false if element does not exist

public void Clear();

public IEnumerator<(TElement element, TPriority priority)> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();

//
// Selector part
//
public PriorityQueue(Func prioritySelector);
public PriorityQueue(Func prioritySelector, IComparer comparer);

public Func<TElement, TPriority> PrioritySelector { get; }

public void Enqueue(TElement element);
public void Update(TElement element);

}
````

Open questions:

  1. Class name PriorityQueue vs. Heap
  2. Introduce IHeap and constructor overload? (Should we wait for later?)
  3. Introduce IPriorityQueue? (Should we wait for later - IDictionary example)
  4. Use selector (of priority stored inside the value) or not (5 APIs difference)
  5. Use tuples (TElement element, TPriority priority) vs. KeyValuePair<TPriority, TElement>

    • Should Peek and Dequeue rather have out argument instead of tuple?

  6. Is Peek and Dequeue throwing useful at all?

Original Proposal

Issue https://github.com/dotnet/corefx/issues/163 requested the addition of a priority queue to the core .NET collection data structures.

This post, while a duplicate, is intended to act the formal submission to the corefx API Review Process. The issue contents are the _speclet_ for a new System.Collections.Generic.PriorityQueue type.

I will be contributing the PR, if approved.

Rationale and Usage

The .NET Base Class Libraries (BCL) currently lacks support for ordered producer-consumer collections. A common requirement of many software applications is the ability generate a list of items over time and process them in an order different from the order they were received in.

There are three generic data structures within the System.Collections hierarchy of namespaces that supported a sorted collection of items; System.Collections.Generic.SortedList, System.Collections.Generic.SortedSet, and System.Collections.Generic.SortedDictionary.

Of these, SortedSet and SortedDictionary are not appropriate for producer-consumer patterns that generate duplicate values. The complexity of SortedList is Θ(n) worst case for both Add and Remove.

A much more memory and time efficient data structure for ordered collections with producer-consumer usage patterns is a priority queue. Other than when capacity resizing is necessary, worse case insertion (enqueue) and remove top (dequeue) performance is Θ(log n) - far better than the existing options that exist in the BCL.

Priority queues have a wide degree of applicability across different classes of applications. The Wikipedia page on Priority Queues offers a list of many different well understand use cases. While highly specialized implementations may still require custom priority queue implementations, a standard implementation would cover a broad range of usage scenarios. Priority queues are particularly useful in scheduling the output of multiple producers, which is an important pattern in highly parallelized software.

It's worth noting that both the C++ standard library and Java offer priority queue functionality as part of their basic APIs.

Proposed API

``` C#
namespace System.Collections.Generic
{
///


/// Represents a collection of objects that are removed in a sorted order.
///

/// Specifies the type of elements in the queue.
[DebuggerDisplay("Count = {count}")]
[DebuggerTypeProxy(typeof(System_PriorityQueueDebugView<>))]
public class PriorityQueue : IEnumerable, ICollection, IEnumerable, IReadOnlyCollection
{
///
/// Initializes a new instance of the class
/// that uses a default comparer.
///

public PriorityQueue();

    /// <summary>
    /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
    /// that has the specified initial capacity.
    /// </summary>
    /// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
    /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
    public PriorityQueue(int capacity);

    /// <summary>
    /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
    /// that uses a specified comparer.
    /// </summary>
    /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
    /// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
    public PriorityQueue(IComparer<T> comparer);

    /// <summary>
    /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
    /// that contains elements copied from the specified collection and uses a default comparer.
    /// </summary>
    /// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
    /// <exception cref="T:System.ArgumentNullException"><paramref name="collection"/> is null.</exception>
    public PriorityQueue(IEnumerable<T> collection);

    /// <summary>
    /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
    /// that contains elements copied from the specified collection and uses a specified comparer.
    /// </summary>
    /// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
    /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
    /// <exception cref="T:System.ArgumentNullException">
    /// <paramref name="collection"/> is null. -or-
    /// <paramref name="comparer"/> is null.
    /// </exception>
    public PriorityQueue(IEnumerable<T> collection, IComparer<T> comparer);

    /// <summary>
    /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class that is empty,
    /// has the specified initial capacity, and uses a specified comparer.
    /// </summary>
    /// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
    /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
    /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
    /// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
    public PriorityQueue(int capacity, IComparer<T> comparer);

    /// <summary>
    /// Gets the <see cref="IComparer{T}"/> for the <see cref="PriorityQueue{T}"/>. 
    /// </summary>
    /// <value>
    /// The <see cref="T:System.Collections.Generic.IComparer{T}"/> that is used when
    /// comparing elements in the <see cref="PriorityQueue{T}"/>. 
    /// </value>
    public IComparer<T> Comparer 
    { 
        get;
    }

    /// <summary>
    /// Gets the number of elements contained in the <see cref="PriorityQueue{T}"/>.
    /// </summary>
    /// <value>The number of elements contained in the <see cref="PriorityQueue{T}"/>.</value>
    public int Count 
    { 
        get;
    }

    /// <summary>
    /// Adds an object to the into the <see cref="PriorityQueue{T}"/> by its priority.
    /// </summary>
    /// <param name="item">
    /// The object to add to the <see cref="PriorityQueue{T}"/>. 
    /// The value can be null for reference types.
    /// </param>
    public void Enqueue(T item);

    /// <summary>
    /// Removes and returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
    /// </summary>
    /// <returns>The object with the lowest priority that is removed from the <see cref="PriorityQueue{T}"/>.</returns>
    /// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
    public T Dequeue();

    /// <summary>
    /// Returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
    /// </summary>
    /// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
    public T Peek();

    /// <summary>
    /// Removes all elements from the <see cref="PriorityQueue{T}"/>.
    /// </summary>
    public void Clear();

    /// <summary>
    /// Determines whether an element is in the <see cref="PriorityQueue{T}"/>.
    /// </summary>
    /// <param name="item">
    /// The object to add to the end of the <see cref="PriorityQueue{T}"/>. 
    /// The value can be null for reference types.
    /// </param>
    /// <returns>
    /// true if item is found in the <see cref="PriorityQueue{T}"/>;  otherwise, false.
    /// </returns>
    public bool Contains(T item);

    /// <summary>
    /// Copies the elements of the <see cref="PriorityQueue{T}"/> to an  <see cref="T:System.Array"/>, 
    /// starting at a particular <see cref="T:System.Array"/> index.
    /// </summary>
    /// <param name="array">
    /// The one-dimensional <see cref="T:System.Array">Array</see> that is the
    /// destination of the elements copied from the <see cref="PriorityQueue{T}"/>. 
    /// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
    /// </param>
    /// <param name="arrayIndex">The zero-based index in <paramref name="array"/> at which copying begins.</param>
    /// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
    /// <exception cref="T:System.ArgumentOutOfRangeException">
    /// <paramref name="arrayIndex"/> is less than zero. -or- 
    /// <paramref name="arrayIndex"/> is equal to or greater than the length of the <paramref name="array"/>
    /// </exception>
    /// <exception cref="ArgumentException">
    /// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
    /// greater than the available space from <paramref name="index"/> to the end of the destination
    /// <paramref name="array"/>.
    /// </exception>
    public void CopyTo(T[] array, int arrayIndex);

    /// <summary>
    /// Copies the elements of the <see cref="T:System.Collections.ICollection"/> to an 
    /// <see cref="T:System.Array"/>, starting at a particular <see cref="T:System.Array"/> index.
    /// </summary>
    /// <param name="array">
    /// The one-dimensional <see cref="T:System.Array">Array</see> that is the
    /// destination of the elements copied from the <see cref="PriorityQueue{T}"/>. 
    /// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
    /// </param>
    /// <param name="index">The zero-based index in <paramref name="array"/> at which copying begins.</param>
    /// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
    /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="index"/> is less than zero.</exception>
    /// <exception cref="ArgumentException">
    /// <paramref name="array"/> is multidimensional. -or-
    /// <paramref name="array"/> does not have zero-based indexing. -or-
    /// <paramref name="index"/> is equal to or greater than the length of the <paramref name="array"/> -or- 
    /// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
    /// greater than the available space from <paramref name="index"/> to the end of the destination
    /// <paramref name="array"/>. -or- 
    /// The type of the source <see cref="T:System.Collections.ICollection"/> cannot be cast automatically 
    /// to the type of the destination <paramref name="array"/>.
    /// </exception>
    void ICollection.CopyTo(Array array, int index);

    /// <summary>
    /// Copies the elements stored in the <see cref="PriorityQueue{T}"/> to a new array.
    /// </summary>
    /// <returns>
    /// A new array containing a snapshot of elements copied from the <see cref="PriorityQueue{T}"/>.
    /// </returns>
    public T[] ToArray();

    /// <summary>
    /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
    /// </summary>
    /// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
    public Enumerator GetEnumerator();

    /// <summary>
    /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
    /// </summary>
    /// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
    IEnumerator<T> IEnumerable<T>.GetEnumerator();

    /// <summary>
    /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>.
    /// </summary>
    /// <returns>An <see cref="T:System.Collections.IEnumerator"/> that can be used to iterate through the collection.</returns>
    IEnumerator IEnumerable.GetEnumerator();

    /// <summary>
    /// Sets the capacity to the actual number of elements in the <see cref="PriorityQueue{T}"/>, 
    /// if that number is less than than a threshold value.
    /// </summary>
    public void TrimExcess();

    /// <summary>
    /// Gets a value that indicates whether access to the <see cref="ICollection"/> is 
    /// synchronized with the SyncRoot.
    /// </summary>
    /// <value>true if access to the <see cref="T:System.Collections.ICollection"/> is synchronized
    /// with the SyncRoot; otherwise, false. For <see cref="PriorityQueue{T}"/>, this property always
    /// returns false.</value>
    bool ICollection.IsSynchronized
    {
        get;
    }

    /// <summary>
    /// Gets an object that can be used to synchronize access to the 
    /// <see cref="T:System.Collections.ICollection"/>.
    /// </summary>
    /// <value>
    /// An object that can be used to synchronize access to the 
    /// <see cref="T:System.Collections.ICollection"/>.
    /// </value>
    object ICollection.SyncRoot
    {
        get;
    }

    public struct Enumerator : IEnumerator<T>
    {
        public T Current { get; }
        object IEnumerator.Current { get; }
        public bool MoveNext();
        public void Reset();
        public void Dispose();
    }
}

}
```

Details

  • Implementation data structure will be a binary heap. Items with a greater comparison value will be returned first. (descending order)
  • Time complexities:

| Operation | Complexity | Notes |
| --- | --- | --- |
| Construct | Θ(1) | |
| Construct Using IEnumerable | Θ(n) | |
| Enqueue | Θ(log n) | |
| Dequeue | Θ(log n) | |
| Peek | Θ(1) | |
| Count | Θ(1) | |
| Clear | Θ(N) | |
| Contains | Θ(N) | |
| CopyTo | Θ(N) | Uses Array.Copy, actual complexity may be lower |
| ToArray | Θ(N) | Uses Array.Copy, actual complexity may be lower |
| GetEnumerator | Θ(1) | |
| Enumerator.MoveNext | Θ(1) | |

  • Additional constructor overloads that take the System.Comparison delegate were intentionally omitted in favor of a simplified API surface area. Callers can use Comparer.Create to convert a function or Lambda expression to an IComparer interface if necessary. This does require the caller to incur a one-time heap allocation.
  • Although System.Collections.Generic is not yet part of corefx, I propose that this class be added to corefxlab in the meantime. It can be moved to the primary corefx repository once System.Collections.Generic are added and there is consensus that its status should be elevated from experimental to an official API.
  • An IsEmpty property was not included, since there is no additional performance penalty calling Count. The majority of the collection data structures do not include IsEmpty.
  • The IsSynchronized and SyncRoot properties of ICollection were implemented explicitly as they are effectively obsolete. This also follows the pattern used for the other System.Collection.Generic data structures.
  • Dequeue and Peek throw an InvalidOperationException when the queue is empty to match the established behavior of System.Collections.Queue.
  • IProducerConsumerCollection was not implemented as its documentation states that it is only intended for thread-safe collections.

Open Questions

  • Is avoiding an additional heap allocation during calls to GetEnumerator when using foreach a strong enough rationale for including the nested public enumerator structure?
  • Should CopyTo, ToArray, and GetEnumerator return results in prioritized (sorted) order, or the internal order used by the data structure? My assumption is that the internal order should be returned, as it doesn't incur any additional performance penalties. However, this is a potential usability issue if a developer thinks of the class as a "sorted queue" rather a priority queue.
  • Does adding a type named PriorityQueue to System.Collections.Generic cause a potentially breaking change? The namespace is heavily used, and could cause a source compatibility problem for projects that include their own priority queue type.
  • Should items be dequeued in ascending or descending order, based on the output of IComparer? (my assumption is ascending order, to match the normal sorting convention of IComparer).
  • Should the collection be 'stable'? In other words, should two items with equal IComparison results be dequeued in the exact same order they are enqueued in? (my assumption is this isn't needed)

    Updates

  • Fixed complexity of 'Construct Using IEnumerable' to Θ(n). Thanks @svick.

  • Added another option question regarding whether the priority queue should be ordered in ascending or descending order compared to the IComparer.
  • Removed NotSupportedException from explicit SyncRoot property to match behavior of other System.Collection.Generic types instead of using the newer pattern.
  • Made the public GetEnumerator method return a nested Enumerator struct instead of IEnumerable, similar to the existing System.Collections.Generic types. This is an optimization to avoid a heap (GC) allocation when using a foreach loop.
  • Removed ComVisible attribute.
  • Changed complexity of Clear to Θ(n). Thanks @mbeidler.
api-needs-work area-System.Collections wishlist

Most helpful comment

heap data structure is a MUST for doing leetcode
more leetcode, more c# code interview which means more c# developers.
more developers means better ecosystem.
better ecosystem means that if we can still program in c# tomorrow.

in sum: this is not only a feature, but also the future. that is why is issue is labeled 'future'.

All 318 comments

| Operation | Complexity |
| --- | --- |
| Construct Using IEnumerable | Θ(log n) |

I think this should be Θ(n). You need to at least iterate the input.

+1

Should a nested public enumerator structure be used to avoid an additional heap allocation during calls to GetEnumerator and when using foreach? My assumption is no, since enumerating over a queue is an uncommon operation.

I'd lean towards using the struct enumerator to be consistent with Queue<T> which uses a struct enumerator. Also, if a struct enumerator isn't used now, we won't be able to change PriorityQueue<T> to use one in the future.

Also perhaps a method for batch inserts? Can always then sort and continue from previous insertion point rather than starting at the beginning if that would help?:

    public void Enqueue(List<T> items) {
         items.Sort(_comparer);
         ... insertions ...
    }

I've tossed a copy of the initial implementation here. Test coverage is far from complete, but if anyone is curious please take a look and let me know what you think. I've tried to follow the existing coding conventions from the System.Collections classes as much as possible.

Cool. Some initial feedback:

  • Queue<T>.Enumerator implements IEnumerator.Reset explicitly. Should PriorityQueue<T>.Enumerator do the same?
  • Queue<T>.Enumerator uses _index == -2 to indicate the enumerator has been disposed. PriorityQueue<T>.Enumerator has the same comment but has an extra _disposed field. Consider getting rid of the extra _disposed field and use _index == -2 to indicate it's been disposed to make the struct smaller and to be consistent with Queue<T>
  • I think the static _emptyArray field can be removed and usage replaced with Array.Empty<T>() instead.

Also...

  • Other collections that take a comparer (e.g. Dictionary<TKey, TValue>, HashSet<T>, SortedList<TKey, TValue>, SortedDictionary<TKey, TValue>, SortedSet<T>, etc.) allow null to be passed in for the comparer, in which case Comparer<T>.Default is used.

Also...

  • ToArray can be optimized by checking for _size == 0 before allocating the new array, in which case just return Array.Empty<T>().

@justinvp Great feedback, thanks!

  • I implemented Enumerator.Reset explicitly since it's core, non-deprecated functionality of an enumerator. Whether or not it's exposed seems inconsistent across the collection types, and only a few use the explicit variant.
  • Removed the _disposed field in favor of _index, thanks! Tossed that in at the last minute that night and missed the obvious. Decided to keep the ObjectDisposedException for correctness with the newer collection types, even though the old System.Collections.Generic types don't use it.
  • Array.Empty is a F# feature, so can't use it here sadly!
  • Modified the comparer parameters to accept null, good find!
  • The ToArray optimization is tricky. _Technically_ speaking arrays are mutable in C#, even when they have a length of zero. In reality you're right, the allocation isn't needed and it can be optimized. I'm leaning towards a more cautious implementation, in case there are side-effects I'm not thinking of. Semantically the caller will still expect that allocation, and it's a minor one.

@ebickle

Array.Empty is a F# feature, so can't use it here sadly!

Not anymore: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Array.cs#L1060-L1069

I've migrated the code over to ebickle/corefx in the issue-574 branch.

Implemented the Array.Empty() change and plugged everything into the regular build pipeline. One slight temporary kludge I had to introduce was having the System.Collections project depend on the System.Collections nuget package, as Comparer isn't open source yet.

Will be fixed once issue dotnet/corefx#966 is complete.

One key area I'm looking for feedback is how ToArray, CopyTo, and the enumerator should be handled. Currently they're optimized for performance, which means the target array is a heap and not sorted by the comparer.

There are three options:
1) Leave it as-is, and document that the returned array is not sorted. (it's a priority queue, not a sorted queue)
2) Modify the methods to sort the returned items/arrays. They will no longer be O(n) operations.
3) Remove the methods and enumerable support altogether. This is the "purist" option but removes the ability to quickly grab the remaining queued items when the priority queue is no longer needed.

Another thing I'd like feedback on is whether the queue should be stable for two items with the same priority (compare result of 0). Generally priority queues make no guarantees that two items with the same priority will be dequeued in the order they were enqueued, but I noticed that internal priority queue implementation used in System.Reactive.Core took some additional overhead to ensure this property. My preference would be to not do this, but I'm not completely sure which option is better in terms developers' expectations.

I happened upon this PR because I too was interested in adding a priority queue to .NET. Glad to see someone went to the effort of making this proposal :). After reviewing the code, I noticed the following:

  • When the IComparer ordering is not consistent with Equals, the behavior of this Contains implementation (which uses the IComparer) may be surprising to some users, as it is essentially _contains an item with equal priority_.
  • I didn't see code to shrink the array in Dequeue. Shrinking the heap array by half when quarter full is typical.
  • Should the Enqueue method accept null arguments?
  • I think the complexity of Clear should be Θ(n), since that is the complexity of System.Array.Clear, which it uses. https://msdn.microsoft.com/en-us/library/system.array.clear%28v=vs.110%29.aspx

I didn't see code to shrink the array in Dequeue. Shrinking the heap array by half when quarter full is typical.

Queue<T> and Stack<T> also don't shrink their arrays (based on Reference Source, they're not in CoreFX yet).

@mbeidler I'd considered adding some form of automatic array shrinking in dequeue, but as @svick pointed out it doesn't exist in the reference implementations of similar data structures. I'd be curious to hear from anyone on the .NET Core/BCL team whether there's any particular reason they chose that style of implementation.

Update: I checked List, Queue, Queue, and ArrayList - none of them shrink the size of the internal array on a remove/dequeue.

Enqueue should support nulls, and is documented as allowing them. Did you run into a bug? I can't remember how robust the unit tests area in the area yet.

Interesting, I noticed in the reference source linked by @svick that Queue<T> has an unused private constant named _ShrinkThreshold. Perhaps that behavior existed in a previous version.

Concerning the use of IComparer instead of Equals in the implementation of Contains, I wrote up the following unit test, which would fail currently: https://gist.github.com/mbeidler/9e9f566ba7356302c57e

@mbeidler Good point. According to MSDN, IComparer / IComparable only guarantees that a value of zero has the same sort order.

However, it looks as though the same issue exists in the other collection classes. If I modify the code to operate on SortedList, the test case still fails on ContainsKey. The implementation of SortedList.ContainsKey calls Array.BinarySearch, which relies on IComparer to check for equality. The same holds true for SortedSet.

Arguably it's a bug in the existing collection classes as well. I'll dug through the rest of the collections classes and see if there are any other data structures that accept an IComparer but test for equality separately. You're right though, for a priority queue you would expect custom ordering behavior that's completely independent from equality.

Committed a fix and the test case into my fork branch. The new implementation of Contains is based directly on the behavior of List.Contains. Since List doesn't accept an IEqualityComparer, the behavior is functionally equivalent.

When I get some time later today, I'll likely submit bug reports for the other built-in collections. The probably can't be fixed due to regression behavior, but at very least the documentation needs to be updated.

I think it makes sense for ContainsKey to use the IComparer<TKey> implementation, since that is what specifies the key. However, I think it would be more logical for ContainsValue to use Equals instead of IComparable<TValue> in its linear search; though in this case the scope is significantly reduced since a type's natural ordering is much less likely to be inconsistent with equals.

It appears that in the MSDN documentation for SortedList<TKey, TValue>, the remarks section for ContainsValue does indicate that the TValue's sort ordering is used in lieu of equality.

@terrajobst How do you feel about the API proposal so far? Do you feel this a good fit for CoreFX?

:+1:

Thanks for filing this. I believe we've enough data to make a formal review of this proposal, hence I labelled it as 'ready for api review'

As Dequeue and Peek are throwing methods the caller needs to check count before each call. Would it make sense to instead (or in addition) provide TryDequeue and TryPeek following the pattern of the concurrent collections? There are issues open to add non-throwing methods to existing generic collections so adding a new collection which doesn't have these methods seems counterproductive.

@andrewgmorris related https://github.com/dotnet/corefx/issues/4316 "Add TryDequeue to Queue"

We had a basic review of this and we agree that we want a ProrityQueue in the framework. However, we need to get someone to help drive the design and implementation for it. Whoever grabs the issue can work @terrajobst on finalizing the API.

So what work is left on the API?

This is missing from the API proposal above: PriorityQueue<T> should implement IReadOnlyCollection<T> to match Queue<T> (Queue<T> now implements IReadOnlyCollection<T> as of .NET 4.6).

I don't know that array-based priority queues are best. Memory allocation in .NET is really fast. We don't have the same search-small-blocks issue that the old malloc dealt with. You are welcome to use my priority queue code from here: https://github.com/BrannonKing/Kts.Astar/tree/master/Kts.AStar

@ebickle One tiny nit on the _speclet_. It says:

/// Adds an object to the end of the <see cref="PriorityQueue{T}"/>. ... public void Enqueue(T item);

Shouldn't it say instead, /// Inserts object into the <see cref="PriorityQueue{T}"> by its priority.

@SunnyWar Fixed the documentation of the Enqueue method, thank you!

Some time ago, I created a data structure with complexities similar to a priority queue based on a Skip List data structure which I've decided at this point to share: https://gist.github.com/bbarry/5e0f3cc1ac7f7521fe6ea25947f48ace

https://en.wikipedia.org/wiki/Skip_list

The skip list matches the complexities of a priority queue above in average cases, except that Contains is an average case O(log(n)). In addition, access to either first or last elements are constant time operations and iteration in both forward and reverse order matches the forward order complexities of a PQ.

There obviously are downsides to such a structure in the form of higher costs on memory, and it does devolve to O(n) worst case inserts and removals, so it has its trade offs...

Is this already implemented somewhere? When is the expected release?
Also whats about updating the priority of an existing item?

@Priya91 @ianhays is this ready to get marked ready for review?

This is missing from the API proposal above: PriorityQueue should implement IReadOnlyCollection to match Queue (Queue now implements IReadOnlyCollection as of .NET 4.6).

I agree with @justinvp here.

@Priya91 @ianhays is this ready to get marked ready for review?

I'd say so. This has been sitting for a while; lets make moves on it.

@justinvp @ianhays I've updated the spec to implement IReadOnlyCollection. Thank you!

I have a full implementation of the class and it's associated PriorityQueueDebugView that uses an array-based implementation. Unit tests aren't at 100% coverage yet, but if there's some interest I can do a bit of work and dust off my fork.

@NKnusperer made a good point about updating the priority of an existing item. I'll leave it out for now but it's something to consider during spec review.

There are 2 implementations in full framework, that are possible alternatives.
https://referencesource.microsoft.com/#q=priorityqueue

For reference here is a question about the Java PriorityQueue on stackoverflow http://stackoverflow.com/questions/683041/java-how-do-i-use-a-priorityqueue. It is interesting that the priority is handled by a comparer instead of just a simple int priority wrapper object. For example it doesn't make it easy to change the priority of an item in the queue already.

API review:
We agree that it is useful to have this type in CoreFX, because we expect CoreFX to use it.

For final review of the API shape, we would like to see sample code: PriorityQueue<Thread> and PriorityQueue<MyClass>.

  1. How do we maintain the priority? Right now it is only implied by T.
  2. Do we want it to be possible to pass the priority when you add an entry? (which seems pretty handy)

Notes:

  • We expect that priority does not mutate on its own - we would need either API for it, or we would expect Remove and Add on the queue.
  • Given that we do not have a clear customer code here (just general wish that we want the type), it is hard to decide what to optimize for - perf, vs. usability vs. something else?

This would be a really useful type to have in CoreFX. Is anyone interested in grabbing this one?

I don't like the idea of fixing the priority queue to a binary heap. Have a look at my AlgoKit wiki page for details. Quick thoughts:

  • If we are fixing the implementation to a certain type of a heap, make it a 4-ary heap.
  • We shouldn't fix the implementation of the heap, as in some scenarios some types of heaps are more performant than others. Thus, given we fix the implementation to a certain heap type and the customer needs a different one for more performance, he would need to implement the entire priority queue from the ground up. That's wrong. We should be more flexible here and let him re-use some of the CoreFX code (at least some interfaces).

Last year I implemented a few heap types. In particular, there is IHeap interface that could be used as the priority queue.

  • Why not just call it a heap...? You know any sane way to implement a priority queue other than using a heap?

I'm all for introducing an IHeap interface and some most performant implementations (at least those array based). The API and implementations are in the repository I linked above.

So no _priority queues_. Heaps.

What do you think?

@karelz @safern @danmosemsft

@pgolebiowski Don't forget we are designing easy-to-use-while-pretty-performant API. If you want PriorityQueue (which is established computer-science term), you should be able to find one in docs / via internet search.
If the underlying implementation is heap (which I personally wonder why), or something else, it doesn't matter too much. Assuming you can demonstrate the implementation is measurably better than simpler alternatives (complexity of code is also a metric that somewhat matters).

So if you still believe heap-based implementation (without IHeap API) is better than some simple list-based or list-of-array-chunks-based implementation, please explain why (ideally in a few sentences/paragraphs), so that we can get agreement on the implementation approach (and avoid wasting time on your side on complex implementation which may be turned down at PR review time).

Drop ICollection, IEnumerable? Just have the generic versions (though generic IEnumerable<T> will bring in IEnumerable)

@pgolebiowski how its implemented doesn't change the external api. PriorityQueue defines the behaviour/contract; whereas a Heap is a specific implementation.

Priority queues vs heaps

If the underlying implementation is heap (which I personally wonder why), or something else, it doesn't matter too much. Assuming you can demonstrate the implementation is measurably better than simpler alternatives (complexity of code is also a metric that somewhat matters).

So if you still believe heap-based implementation is better than some simple list-based or list-of-array-chunks-based implementation, please explain why (ideally in a few sentences/paragraphs), so that we can get agreement on the implementation approach [...]

OK. A priority queue is an abstract data structure that can be then implemented in some way. Of course you can implement it with a data structure different than a heap. But no way is more efficient. As a result:

  • priority queue and heap are often used as synonyms (see Python doc below as the clearest example of this)
  • whenever you have a "priority queue" support in some library, it uses a heap (see all the examples below)

To back my words let's start with the theoretical support. Introduction to Algorithms, Cormen:

[…] priority queues come in two forms: max-priority queues and min-priority queues. We will focus here on how to implement max-priority queues, which are in turn based on max- heaps.

Stated clearly that priority queues are heaps. This is a shortcut of course, but you get the idea. Now, more importantly, let's have a look at how other languages and their standard libraries add support for the operations we are discussing:

  • Java: PriorityQueue<T> — priority queue implemented with a heap.
  • Rust: BinaryHeap — heap explicitly in the API. It says in the docs that it's a priority queue implemented with a binary heap. But the API is very clear on the structure — a heap.
  • Swift: CFBinaryHeap — again, explicitly says what the data structure is, avoiding the usage of the abstract term "priority queue". The docs describing the class: Binary heaps can be useful as priority queues. I like the approach.
  • C++: priority_queue — once again, implemented canonically with a binary heap built on the top of an array.
  • Python: heapq — heap is explicitly exposed in the API. The priority queue is only mentioned in the docs: This module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. Heaps are binary trees. Also notice the huge shortcut to say that a heap equals a priority queue and that a heap equals a binary tree.
  • Go: heap package — there is even a heap interface. No explicit priority queue, but again only in docs: A heap is a common way to implement a priority queue. To build a priority queue, implement the Heap interface.

I strongly believe we should go the Rust / Swift / Python / Go way and expose a heap explicitly while clearly stating in the docs that it can be used as a priority queue. I strongly believe this approach is very clean and simple.

  • We are clear about the data structure and let the API be enhanced in the future — if someone comes up with a new, revolutionary way of implementing a priority queue that is better in some aspects (and the choice heap vs the new type would depend on the scenario), our API can still be intact. We could simply add the new revolutionary type enhancing the library — and the heap class would still exist there.
  • Imagine a user wonders whether the data structure we choose is stable. When the solution we follow is a priority queue, this is not obvious. The term is abstract and anything can be beneath. Thus some time is lost by the user to search the docs and find out that it uses a heap internally and as such is not stable. This could've been avoided by just explicitly stating through the API that this is a heap.

I hope you all like the approach of just clearly exposing a heap data structure — and making a priority queue reference in the docs only.

API & implementation

We need to agree upon the matter above, but let me just start another topic — how to implement this. I can see two solutions here:

  • We just provide an ArrayHeap class. Seeing the name you can immediately tell what kind of heap you're dealing with (again, there are dozens of heap types). You see it is array-based. You immediately know the beast you're dealing with.
  • Another possibility that I like much more is providing an IHeap interface and providing one or more implementations. The customer could write code that depends on interfaces — this would allow providing really clear and readable implementations of complex algorithms. Imagine writing a DijkstraAlgorithm class. It could simply depend on the IHeap interface (one parametrized constructor) or just use the ArrayHeap (default constructor). Clean, simple, explicit, no ambiguity involved due to using a "priority queue" term. And a wonderful interface that makes very much sense theoretically.

In the approach above, the ArrayHeap represents an implicit heap-ordered complete d-ary tree, stored as an array. This can be used to create for example a BinaryHeap or a QuaternaryHeap.

Bottom line

For further discussion I strongly encourage you to have a look at this paper. You will know that:

  • 4-ary heaps (quaternary) are simply faster than 2-ary heaps (binary). There are many tests done in the paper — you would be interested in comparing the performance of implicit_4 and implicit_2 (sometimes implicit_simple_4 and implicit_simple_2).

  • The optimal choice of implementation is strongly input-dependent. Out of implicit d-ary heaps, pairing heaps, Fibonacci heaps, binomial heaps, explicit d-ary heaps, rank-pairing heaps, quake heaps, violation heaps, rank-relaxed weak heaps, and strict Fibonacci heaps, the following types seem to cover almost all needs for various scenarios:

    • Implicit d-ary heaps (ArrayHeap), <- surely we need this
    • Pairing heaps, <- if we go the nice and clean IHeap approach, it would be worth adding this, as in many cases it is really fast and faster than the array-based solution.

    For both of them the programming effort is surprisingly low. Have a look at my implementations.


@karelz @benaadams

This would be a really useful type to have in CoreFX. Is anyone interested in grabbing this one?

@safern I'd be very happy to grab this one from now on.

OK, it was a problem between my keyboard and my chair - PriorityQueue is of course based on Heap -- I was thinking about just Queue where it doesn't make sense and forgot that heap is 'sorted' -- very embarrassing thinking-process-outage for someone like me, who loves logic, algorithms, Turing machines, etc., my apologies. (BTW: As soon as I read couple of sentences in your Java docs link, the discrepancy immediately clicked)

From that perspective it makes sense to build the API on top of Heap. We should not make that class public yet though - it will require its own API review and its own discussion if it is something we need in CoreFX. We do not want API surface creep due to implementation, but it may be the right thing to do -- hence the discussion needed.
From that perspective I don't think we need to create IHeap just yet. It may be good decision later.
If there is research that specific heap (e.g. 4-ary as you mention above) is best for general random input, then we should choose that. Let's wait for @safern @ianhays @stephentoub to confirm/voice opinion.

The parametrization of underlying heap with multiple implemented options is something which IMO does not belong into CoreFX (I may be wrong here, again - let's see what others think).
My reason is that IMO we would just soon ship gazillions of specialized collections which would be very hard for people (average developer without strong background in nuances of algorithms) to choose from. Such library, however, would make a great NuGet package, for experts in the field - owned by you/community. In future, we may consider adding it into PowerCollections (we are actively discussing for last 4 months where on GitHub to put this library and if we should own it, or if we should encourage community to own it - there are different opinions on the matter, I expect we will finalize its fate post 2.0)

Assigning to you as you want to work on it ...

@pgolebiowski collaborator invite sent - when you accept ping me, I will be able to assign it to you then (GitHub limitations).

@benaadams I would keep ICollection (mild preference). For consistency with other d.s. in CoreFX. IMO it is not worth to have one odd beast here ... if we were adding a handful new ones (e.g. PowerCollections even to another repo), we should not include the non-generic ones ... thoughts?

OK, it was a problem between my keyboard and my chair.

Haha 😄 No worries.

it makes sense to build the API on top of Heap. We should not make that class public yet though [...] We do not want API surface creep due to implementation, but it may be the right thing to do -- hence the discussion needed. [...] I don't think we need to create IHeap just yet. It may be good decision later.

If the decision of the group is to go with PriorityQueue, I'll just help with the design and implement this. However, please take into consideration the fact that if we add a PriorityQueue now, it will be messy in the API later to add Heap -- as both behave basically the same. It would be sort of redundancy IMO. This would be a design smell for me. I wouldn't add the priority queue. It doesn't help.

Also, one more thought. Actually, the pairing heap data structure could come in handy fairly often. Array-based heaps are horrible at merging them. This operation is basically linear. When you're having lots of merging heaps, you're killing the performance. However, if you use a pairing heap instead of an array heap -- the merging operation is constant (amortized). This is another argument why I would like to provide a nice interface and two implementations. One for general input, second for some specific scenarios, especially when merging of heaps is involved.

Voting for IHeap + ArrayHeap + PairingHeap! 😄 (like in Rust / Swift / Python / Go)

If the pairing heap is too much -- OK. But let's at least go with IHeap + ArrayHeap. Don't you guys feel that going with a PriorityQueue class is locking the possibilities in the future and making the API less clear?

But as I said -- if you all vote for a PriorityQueue class over the proposed solution -- OK.

collaborator invite sent - when you accept ping me, I will be able to assign it to you then (GitHub limitations).

@karelz ping :)

please take into consideration the fact that if we add a PriorityQueue now, it will be messy in the API later to add Heap -- as both behave basically the same. It would be sort of redundancy IMO. This would be a design smell for me. I wouldn't add the priority queue. It doesn't help.

Can you explain more details about why it will be messy later? What are your concerns?
PriorityQueue is concept people use. Having a type named that way is useful, right?
I think the logical operations (at least their names) on Heap might be different. If they are the same, we can have 2 different implementations of the same code in the worst case (not ideal, but not the end of the world). Or we can insert Heap class as parent of PriorityQueue, right? (assuming it is allowed from API review perspective - right now I don't see reason for not, but I don't have that many years of experience with API reviews, so will wait on others to confirm)

Let's see how the voting & further design discussion go ... I am slowly warming up to the idea of IHeap + ArrayHeap, but not fully convinced yet ...

if we were adding a handful new ones ... we should not include the non-generic ones

Red rag to a bull. Anyone got some other collections to add so we can drop ICollection?

Circular/Ring buffer; Generic and Concurrent?

@karelz A solution for the naming issue could be something like IPriorityQueue like DataFlow does for produce/consumer patterns. Many ways to implement a priority queue and if you don't care about that use the interface. Care about the implementation or are creating an instance use implementing class.

Can you explain more details about why it will be messy later? What are your concerns?
PriorityQueue is concept people use. Having a type named that way is useful, right? […] I am slowly warming up to the idea of IHeap + ArrayHeap, but not fully convinced yet ...

@karelz From my experience, I find it really important to have an abstraction (IPriorityQueue or IHeap). Thanks to such an approach a developer can write code that is decoupled. Because it is written against an interface (rather than a specific implementation), there is more flexibility and IoC spirit. It is very easy to write unit tests for such a code (having Dependency Injection one can inject their own mocked IPriorityQueue or IHeap and see what methods are called at what time and with what arguments). Abstractions are good.

It is true that the term "priority queue" is used commonly. The problem is that there is just one way to implement a priority queue effectively — with a heap. Lots of types of heaps. So we could have:

class ArrayHeap : IPriorityQueue {}
class PairingHeap : IPriorityQueue {}
class FibonacciHeap : IPriorityQueue {}
class BinomialHeap : IPriorityQueue {}

or

class ArrayHeap : IHeap {}
class PairingHeap : IHeap {}
class FibonacciHeap : IHeap {}
class BinomialHeap : IHeap {}

For me, the second approach looks better. How do you feel about it?

Regarding the PriorityQueue class — I think it would be too vague and am totally against such a class. A vague interface is good, but not an implementation. If it's a binary heap, just call it BinaryHeap. If it's something else, name it accordingly.

I am all for naming classes clearly because of the problems with classes like SortedDictionary. There is a lot of confusion among developers for what it stands for and how it differs from SortedList. If it was just called BinarySearchTree, life would be simpler and we could go home and see our children earlier.

Let's name a heap a heap.

@benaadams each of them has to make it into CoreFX (i.e. it has to be valuable for CoreFX code itself, or it has to be as basic and widely used as the ones we already have) -- we discussed those things lately quite a lot. Still not 100% consensus, but no one is eager to just add more collections into CoreFX.

The most likely outcome (biased claim, because I like the solution) is that we will create another repo and prime it with PowerCollections and then let community extend it with some basic guidance/oversight from our team.
BTW: @terrajobst thinks it is not worth it ("we have better and more impactful things in the ecosystem to do") and we should encourage community to fully drive it (incl. start with existing PowerCollections) and not make it one of our repos - some though discussions & decisions in front of us.
// I guess there's opportunity for community to jump on this solution before we make our mind ;-). That would make the discussion (and my preference) mute ;-)

@pgolebiowski you are slowly convincing me that having Heap is better than PriorityQueue -- we would just need strong guidance & docs "This is how you do PriorityQueue - use Heap" ... that could work.

However, I am very hesitant to include more than 1 heap implementation into CoreFX. 98%+ of 'normal' C# developers out there do not care. They don't even want to think which one is best, they just need something that gets the job done. Not every piece of every SW is designed with high-performance in mind, rightfully so. Think about all the one off tools, UI apps, etc. Unless you are designing high-performance scaling-out system where this d.s. is on the critical path, you should NOT care.

The same way, I don't care how SortedDictionary or ArrayList or other d.s. are implemented -- they do their job decently. I (as many others) understand that if I need high-perf out of these d.s for my scenarios, I need to measure performance and/or check implementation and have to decide if it is good enough for my scenarios, or if I need to roll out my own special implementation to get the best perf, tuned for my needs.

We should optimize usability for 98% use cases, not for the 2%. If we roll out too many options (implementations) and force everyone to decide, we just caused unnecessary confusion for 98% use cases. I don't think it's worth it ...
IMO .NET ecosystem has great value in offering single-ish choice of many APIs (not just collections) with very decent performance characteristics, useful to most use cases. And offering ecosystem which allows high-performance extensions for those who need it and are willing to dig deeper and learn more / make educated choices and tradeoffs.

That said, having an interface IHeap (like IDictionary and IReadOnlyDictionary), may make sense - I have to think about it a bit more / ask the API review experts in the space ...

We already (to some degree) have what @pgolebiowski is talking about with ISet<T> and HashSet<T>. I say just mirror it. So the API above is changed to an interface (IPriorityQueue<T>) and then we have an implementation (HeapPriorityQueue<T>) which internally uses a heap that may or may not be publicly exposed as it's own class.

Should it (PriorityQueue<T>) also implement IList<T>?

@karelz my issue with ICollection is SyncRoot and IsSynchronized; either they are implemented which means there is an extra allocation for the lock object; or they throw, when its a bit pointless having them.

@benaadams This would be misleading. Since 99.99% of implementations of priority queues are heaps based on arrays (and as I can see we are going with one here as well), would it mean exposing access to the internal structure of the array?

Let's say we have a heap with elements 4, 8, 10, 13, 30, 45. Considering the order, they would be accessed by indexes 0, 1, 2, 3, 4, 5. However, the internal structure of the heap is [4, 8, 30, 10, 13, 45] (in binary, in quaternary it would be different).

  • Returning internal number at index i doesn't really make sense from the user's point of view, as it is almost arbitrary.
  • Returning a number in order (by priority) is too costly -- this is linear.
  • Returning either of these doesn't really make sense in other implementations. Often it is just popping i elements, getting the i-th element, and then pushing them once again.

IList<T> is normally the cheeky workaround for: I want be flexible with collections my api accepts, and I want to Enumerate them but don't want to allocate via IEnumerable<T>

Just realized there is no generic interface for

ICollection<T>
{
    int Count
    CopyTo(T[], int index)
}

So never mind about that 😢 (Though its sort of IReadOnlyCollection)

But Reset on Enumerator should be explicitly implemented because its bad and should just throw.

So my suggested changes

public bool TryDequeue(out T item);     // Add
public bool TryPeek(out T item);        // Add

public struct Enumerator : IEnumerator<T>
{
    public T Current { get; }
    object IEnumerator.Current { get; }
    public bool MoveNext();
    void IEnumerator.Reset();             // Explicit
    public void Dispose();
}

Since you're discussing the methods... We haven't yet agreed upon the IHeap vs IPriorityQueue thingy -- it impacts the names of the methods and the logic a bit. However, either way, I find the current API proposal missing the following:

  • Ability to remove a certain element from the queue
  • Ability to update the priority of an element
  • Ability to merge these structures

These operations are pretty crucial, especially the possibility of updating an element. Without this, many algorithms simply cannot be implemented. We need to introduce a handle type. In the API here, the handle is IHeapNode. Which is another argument for going the IHeap way because otherwise we would have to introduce PriorityQueueHandle type that would always just be a heap node... 😜 Plus it's just vague what it means... Whereas, a heap node -- everyone knows the stuff and can imagine the thingy they're dealing with.

Actually, shortly speaking, for the API proposal, please have a look at this directory. We would probably need only a subset of it. But nevertheless -- it just contains what we need IMO, so it might be worth having a look at it as a starting point.

What are your thoughts, guys?

IHeapNode isn't much different than the clr type KeyValuePair?

However that's then separating the priority and type so now its a PriorityQueue<TKey, TValue> with a IComparer<TKey> comparer?

KeyValuePair is not only a struct but its properties are read-only. It would basically equal creating a new object every time the structure is updated.

Using a key only doesn't work with equal keys -- more information is needed to know which element to update / remove.

From IHeapNode and ArrayHeapNode:

    /// <summary>
    /// Represents a heap node. It is a wrapper around a key and a value.
    /// </summary>
    public interface IHeapNode<out TKey, out TValue>
    {
        /// <summary>
        /// Gets the key for the value.
        /// </summary>
        TKey Key { get; }

        /// <summary>
        /// Gets the value contained in the node.
        /// </summary>
        TValue Value { get; }
    }

    /// <summary>
    /// Represents a node of an <see cref="ArrayHeap{TKey,TValue}"/>.
    /// </summary>
    public class ArrayHeapNode<TKey, TValue> : IHeapNode<TKey, TValue>
    {
        /// <inheritdoc cref="IHeapNode{TKey,TValue}.Key"/>
        public TKey Key { get; internal set; }

        /// <inheritdoc cref="IHeapNode{TKey,TValue}.Value"/>
        public TValue Value { get; }

        /// <summary>
        /// The index of the node within an <see cref="ArrayHeap{TKey,TValue}"/>.
        /// </summary>
        public int Index { get; internal set; }

        /// <summary>
        /// Initializes a new instance of the <see cref="ArrayHeapNode{TKey,TValue}"/> class,
        /// containing the specified value.
        /// </summary>
        public ArrayHeapNode(TKey key, TValue value, int index)
        {
            this.Key = key;
            this.Value = value;
            this.Index = index;
        }
    }

And the update method within an ArrayHeap:

        /// <inheritdoc cref="IHeap{TKey,TValue}.Update"/>
        public override void Update(ArrayHeapNode<TKey, TValue> node, TKey key)
        {
            if (node == null)
                throw new ArgumentNullException(nameof(node));

            var relation = this.Comparer.Compare(key, node.Key);
            node.Key = key;

            if (relation < 0)
                this.MoveUp(node);
            else
                this.MoveDown(node);
        }

But now every element in the Heap is an additional object; what if I want a PriorityQueue behavior but I don't want these extra allocations?

Then you would not be able to update / remove elements. It would be a bare implementation not allowing you to implement any algorithm dependent on DecreaseKey operation (which is hugely common). For example Dijkstra's algorithm, Prim's algorithm. Or if you're writing some sort of a scheduler and some process (or whatever) has changed its priority, you cannot address that.

Also, in every other heap implementation -- elements are just nodes, explicitly. It's perfectly natural in other cases, here it's a bit artificial, but necessary for updating / removing.

In Java the priority queue does not have the option to update elements. Result:

When I implemented heaps for the AlgoKit project and came across all these design problems I thought that this is the reason why the authors of .NET decided not to add such a basic data structure as a heap (or a priority queue). Because neither design is good. Each has its shortcomings.

Shortly speaking -- if we want to add a data structure supporting efficient ordering of elements by their priority, we better do it the right way and add a functionality like updating / removing elements from it.

If you feel bad about wrapping elements in an array with another object -- this is not the only problem. Another is merging. With array-based heaps, this is totally inefficient. However... if we use the pairing heap data structure (The pairing heap: A new form of self-adjusting heap), then:

  • handles to the elements are wonderful nodes -- these should be allocated anyway, so no messy stuff
  • merging is constant (vs linear in array-based solution)

Actually, we could solve this the following way:

  • Add IHeap interface that supports all the methods
  • Add ArrayHeap and PairingHeap with all those handles, updating, removing, merging
  • Add PriorityQueue which is just a wrapper around an ArrayHeap, simplifying the API

PriorityQueue would be in System.Collections.Generic and all the heaps in System.Collections.Specialized.

Works?

It's fairly unlikely we'll get three new data structures through API review. In most cases, a smaller amount of API is better. We can always add more later if it ends up being insufficient, but we cannot remove API.

That's one of the reasons I'm not a fan of the HeapNode class. Imo that kind of thing should be internal-only and the API should expose an already existing type if at all possible - in this case KVP probably.

@ianhays If this is kept internal-only, users would not be able to update priorities of elements in the data structure. This would be pretty much useless and we would end up with all the Java problems -- people re-implementing the data structure that is already in the native library... Sounds bad to me. Far worse than having a simple class representing a node.

BTW: a linked list has a node class so that users can use the proper functionality. This is pretty much mirroring.

users would not be able to update priorities of elements in the data structure.

That's not necessarily true. The priority could be exposed in a way that doesn't require an additional data structure such that instead of

        public override void Update(ArrayHeapNode<TKey, TValue> node, TKey key)
        {

        }

you would have e.g.

        public override void Update(TKey item, TValue priority)
        {

        }

I'm not convinced yet on exposing the priority as a type parameter either. Imo the majority of people are going to be using the default priority type and the TValue will be an unnecessary specificity.

void Update(TKey item, TValue priority)

First -- what is TKey and TValue in your code? How is that supposed to work? The convention in computer science is that:

  • key = priority
  • value = whatever you want to store in an element

Second:

Imo the majority of people are going to be using the default priority type and the TValue will be an unnecessary specificity.

Please define "the default priority type". I can feel that you just want to have PriorityQueue<T>, is that right? If that is the case, consider the fact that a user will likely have to create a new class, a wrapper around his priority and value, plus implement something like IComparable or provide a custom comparator. Pretty poor.

First -- what is TKey and TValue in your code?

The item and the priority of the item. You can switch them to be the priority and the item associated with that priority, but then you have a collection in which the TKeys aren't necessarily unique (i.e. allowing duplicate priorities). I'm not against that, but to me TKey usually implies uniqueness.

The point is that exposing a Node class is not a requirement to exposing an Update method.

Please define "the default priority type". I can feel that you just want to have PriorityQueue, is that right?

Yes. Though reading through this thread again, I'm not completely convinced that's the case.

If that is the case, consider the fact that a user will likely have to create a new class, a wrapper around his priority and value, plus implement something like IComparable or provide a custom comparator. Pretty poor.

You're not wrong. I imagine a fair amount of users will have to create a wrapper type with custom comparator logic. I imagine there are also a fair amount of users that already have a comparable type they want to put in the priorityqueue or a type with a defined comparator. The question is which camp is the larger of the two.

I think the type should be called PriorityQueue<T>, not Heap<T>, ArrayHeap<T> or even QuaternaryHeap<T> to stay consistent with the rest of .Net:

  • We have List<T>, not ArrayList<T>.
  • We have Dictionary<K, V>, not HashTable<K, V>.
  • We have ImmutableList<T>, not ImmutableAVLTreeList<T>.
  • We have Array.Sort(), not Array.IntroSort().

Users of such types usually don't care how they are implemented, it certainly shouldn't be the most prominent thing about the type.

If that is the case, consider the fact that a user will likely have to create a new class, a wrapper around his priority and value, plus implement something like IComparable or provide a custom comparator.

You're not wrong. I imagine a fair amount of users will have to create a wrapper type with custom comparator logic.

In the summary api the compare is provided by the provided comparer IComparer<T> comparer, no wrapper is required. Often the priority will be part of the type e.g. a time scheduler will have the execution time as a property of the type.

Using a KeyValuePair with a custom IComparer adds no extra allocations; though it adds one extra indirection for an update as values need to be compared rather than the item.

Updates would be problematic for struct items unless the item was obtained via a ref return and then updated and passed back into an update method by ref and refs were compared. But that's a bit horrible.

@svick

I think the type should be called PriorityQueue<T>, not Heap<T>, ArrayHeap<T> or even QuaternaryHeap<T> to stay consistent with the rest of .Net.

I'm losing my faith in humanity.

  • Calling a data structure PriorityQueue is being consistent with the rest of .NET and calling it Heap is not? Something wrong with heaps? Same should apply to stacks and queues then.
  • ArrayHeap -> base class for QuaternaryHeap. Huge difference.
  • The discussion is not a matter of picking up a cool name. There is a whole lot of stuff coming as a consequence of following a certain design path. Re-read the thread please.
  • Hashtable, ArrayList. They seem to exist. BTW, a "list" is a pretty bad choice of naming IMO, as the first thought of a user is that List is a list, but it's not a list. 😜
  • It's not about having fun and saying how a thing is implemented. It is about giving meaningful names that show users immediately what they are dealing with.

Wanna stay consistent with the rest of .NET and run into problems like this one?

And what do I see there... Jon Skeet saying that SortedDictionary should be called SortedTree as that reflects the implementation more closely.

@pgolebiowski

Something wrong with heaps?

Yes, it describes the implementation, not usage.

Same should apply to stacks and queues then.

Neither really describes the implementation, for example the name does not tell you they are array-based.

ArrayHeap -> base class for QuaternaryHeap. Huge difference.

Yes, I understand your design. What I'm saying is none of this should be exposed to the user.

There is a whole lot of stuff coming as a consequence of following a certain design path. Re-read the thread please.

I did read the thread. I don't think that design belongs to the BCL. I think it should contain something that is simple to use and understand, not something that results in people wondering what "quaternary heap" is or whether they should use it.

If the default implementation is not good enough for them, that's where other libraries (like your own) come in.

Hashtable, ArrayList. They seem to exist.

Yes, those are .Net Framework 1.0 classes that nobody uses anymore. As far as I can tell, their names were copied from Java and .Net Framework 2.0 designers decided not to follow that convention. In my opinion, that was the right decision.

BTW, a "list" is a pretty bad choice of naming IMO, as the first thought of a user is that List is a list, but it's not a list.

It is. It's not a linked list, but that's not the same thing. And I like not having to write ArrayList or ResizeArray (F#'s name for List<T>) everywhere.

It is about giving meaningful names that show users immediately what they are dealing with.

Most people will have no idea what they are dealing with if they see QuaternaryHeap. On the other hand, if they see PriorityQueue, it should be clear what they are dealing with, even if they don't have any CS background. They will not know what the implementation is, but that's what documentation is for.

@ianhays

First -- what is TKey and TValue in your code?

The item and the priority of the item. You can switch them to be the priority and the item associated with that priority, but then you have a collection in which the TKeys aren't necessarily unique (i.e. allowing duplicate priorities). I'm not against that, but to me TKey usually implies uniqueness.

Keys -- thingies used to do the prioritization. Keys don't have to be unique. We cannot make such an assumption.

The point is that exposing a Node class is not a requirement to exposing an Update method.

I believe it is: Decrease-key/Increase-key support in native libraries. Also on this topic, so there are helper classes involved for dealing with data structures:

I imagine a fair amount of users will have to create a wrapper type with custom comparator logic. I imagine there are also a fair amount of users that already have a comparable type they want to put in the priorityqueue or a type with a defined comparator. The question is which camp is the larger of the two.

I don't know about the others, but I find creating a wrapper with custom comparator logic every single time I want to use a priority queue to be pretty horrible...

Another thought -- let's say I created a wrapper:

class Wrapper
{
    public int Priority { get; set; }
    public string SomeStuff ...
}

I feed PriorityQueue<T> with this type. So it prioritizes its elements based on it. Let's say it has defined some key selector. Such a design can crash the internal work of a heap. You can change priorities any time as you are the owner of the element. There is no notification mechanism for the heap to update. You would be responsible for handling all of that and calling it. Pretty dangerous and not straight-forward to me.

In case of the handle I proposed -- the type was immutable from the user's point of view. And there were no issues with uniqueness.

IMO I like the idea of having an IHeapintereface and then the class API implementing that interface. I'm with @ianhays' on this that we are not going to expose 3 new apis to provide different kind of heaps to the customers as there would need to be a API Review process and we would like to focus on PriorityQueueno matter what the name is.

Second I think there is no need of having a Node internal/public class to store the values in the queue. No need of extra allocations, we could follow something sort of what IDictionary does and when yielding the values for the Enumerable creating KeyValuePair objects if we go for the option of having a priority for each item we store (which I don't think it is the best way to go, I'm not fully convinced of storing a priority per item). The best aproach I would like would be to have PriorityQueue<T> (this name is just to give it one). With this approach we could have a constructor following what the whole BCL does with an IComparer<T> and doing the comparison with that comprarer to give the priority. No need of exposing APIs that pass a priority as a parameter.

I think having some of the APIs receive a priority would make it "less usable" or more complicated for ordinal customers that would like to be the default priority, then we are making it more complicated to understand the usare of it, when giving them the option of having a custom IComparer<T> will be the most reasonable and will be following the guidelines we have across BCL.

The names are what they do, not how they do it.

They are named after the abstract concepts and what they achieve for the user, rather than their implementation and how they achieve it. (Which also means their implementation can be improved to use a different implementation if that proves better)

Same should apply to stacks and queues then.

Stack is a unbounded resizing array and Queue is implemented as an unbounded resizing circular buffer. They are named after their abstract concepts. Stack (abstract data type): Last-In-First-Out (LIFO), Queue (abstract data type): First-In-First-Out (FIFO)

Hashtable, ArrayList. They seem to exist.

DictionaryEntry

Yeah but lets pretend they don't; they are artefacts from a less civilised time. They have no type safety and box if you use primitives or structs; so allocate on every Add.

You can use @terrajobst's Platform Compatibility Analyzer and it will tell you: "Please don't"

List is a list, but it's not a list

It really is a List (abstract data type) also known as Sequence.

Equally Priority Queue is an abstract type that tells you what it achieves in use; not how it does it in implementation (as there can be many different implementations)

Lots of great discussion!

The original specification was designed with a few core principles in mind:

  • General purpose - Cover most use cases with a balance between CPU and Memory consumption.
  • Align, as much as possible, with the existing patterns and conventions in System.Collections.Generic.
  • Allow multiple entries of the same element.
  • Utilize the existing BCL comparison interfaces and classes (i.e. Comparer).*
  • High level of abstraction - The specification is designed to shield consumers from the underlying implementation technology.

@karelz @pgolebiowski
Renaming to "Heap" or another term aligned to a data structure implementation wouldn't match most BCL conventions for collections. Historically, .NET collection classes have been designed to be general purpose rather than focused on the specific data structure/pattern. My original thinking was that early in the .NET ecosystem the API designers intentionally moved from "ArrayList" to "List". The change was likely due to a confusion with an Array - your average developer would have thought "ArrayList? I just want a list, not an array".

If we go with Heap, the same might occur - many intermediate-skilled developers will (sadly) see "Heap" and confuse it for the application's memory heap (i.e. heap and stack) instead of "heap the generalized data structure". The prevalence of System.Collections.Generic will cause it to show up in just about every .NET developer's intelligent proposal, and they'll wonder why they can allocate a new memory heap :)

PriorityQueue, by comparison, is far more discoverable and less prone to confusion. You can type "Queue" and get proposals for PriorityQueue.

A few proposals and questions asked about integer priorities or a generic parameter for priority (TKey, TPriority, etc). Adding an explicit priority would require consumers to write their own logic to map their priorities and increase the complexity of the API. Using the built in IComparer leverages existing BCL functionality, and I've also considered adding overloads to Comparer into the spec to make it easier to provide ad-hoc lambda expressions/anonymous functions as priority comparisons. That's not a common convention in the BCL, sadly.

If entries were required to be unique, Enqueue() would require a uniqueness lookup to throw an ArgumentException. In addition, there are likely valid scenarios to allow an item to be queued more than once. This non-uniqueness design makes providing an Update() operation challenging as there would be no way to tell which object is being updated. As a few comments indicated, this would start getting into APIs returning "node" references which in turn would (likely) require allocations that would need to be garbage collected. Even if that was worked around, it would increase the per-element memory consumption of the priority queue.

At one point I'd had a custom IPriorityQueue interface in the API before I posted the spec. Ultimately I decided against it - the usage pattern I was aiming for was Enqueue, Dequeue, and Iterate. Already covered by the existing interface set. Thinking of this as a Queue that's sorted internally; so long as items hold their own (initial) position in the queue based on their IComparer, the caller never needs to be concerned with how priority is represented. In the old reference implementation I did, (if I remember right!) priority isn't represented at all. It's all relative based on IComparer or Comparison.

I owe you all some customer code examples - my original plan was to go through the existing BCL implementations of PriorityQueue to use as a basis for the examples.

In the summary api the compare is provided by the provided comparer IComparer comparer, no wrapper is required. Often the priority will be part of the type e.g. a time scheduler will have the execution time as a property of the type.

Under the OP API proposed, one of these needs to be met to use the class:

  • Have a type which is already comparable the way you want it to be
  • Wrap a type with another class that contains the priority value and compares the way you want it to
  • Create a custom IComparer for your type and pass it via the constructor. Assumes that the value you want to represent priority is already publicly exposed by your type.

The dual-type API aims to lessens the burden of the second two options, yes? How does construction work of a new PriorityQueue/Heap when you have a type that already contains the priority? What does the API look like?

I believe it is: Decrease-key/Increase-key support in native libraries.

public override void Update(ArrayHeapNode<TKey, TValue> node, TKey key) {}
public override void Update(TKey key, TValue value);

In either case, the item associated with the given priority is updated. Why is the ArrayHeapNode required for updating? What does it accomplish that cannot be accomplished by taking the TKey/TValue directly?

@ianhays

In either case, the item associated with the given priority is updated. Why is the ArrayHeapNode required for updating? What does it accomplish that cannot be accomplished by taking the TKey/TValue directly?

At least with binary heap (the one I am most familiar with), if you want to update the priority of some value and you know its position in the heap, you can do it quickly (in O(log n) time).

But if you only know the value and the priority, you would first need to find the value, which is slow (O(n)).

On the other hand, if you don't need to update efficiently, that one allocation per item in the heap is pure overhead.

I would like to see a solution where you pay that overhead only when you need it, but that might not be possible to implement and the resulting API might not look good.

The dual-type API aims to lessens the burden of the second two options, yes? How does construction work of a new PriorityQueue/Heap when you have a type that already contains the priority? What does the API look like?

That's a good point, there is a gap in the original API for scenarios where the consumer already has a priority value (i.e. integer) maintained separately from the value. The flip side is that a style of API would complicate all intrinsically comparable types. Years ago I wrote very lightweight scheduler component that had it's own internal ScheduledTask class. Each ScheduledTask implemented IComparer. Toss them in a priority queue, good to go.

SortedList uses a Key/Value design and I've found myself avoiding using it as a result. It requires keys to be unique, and my usual requirement is "I have N values I need to keep sorted in a list, and ensure occasional values I add remain sorted". A "key" isn't part of that equation, a list isn't a dictionary.

To me, the same principle applies to queues. A queue, historically, is a one dimensional data structure.

My modern C++ std library knowledge is admittedly a bit rusty, but even std::priority_queue appears to have push, pop, and take a comparer as a templated (generic) parameter. The C++ standard library crew is about as performance sensitive as you can get :)

I did a very quick scan of priority queue and heap implementations in a few programming languages just now - C++, Java, Rust, and Go all work with a single type (similar to the original API posted here). A cursory glance of the most popular heap/priority queue implementations in NPM show the same.

@pgolebiowski don't get me wrong there should be specific implementations of things named explicitly after their specific implementation.

However that's for when you know what specific data structure you want that matches the performance goals you are after and has the trade offs you are willing to accept.

Generally the framework collections cover the 90% of uses where you want a general behaviour. Then if you want a very specific behaviour or implementation you'd probably go for a 3rd party library; and they'd hopefully be named after the implementation so you know it matches your need.

Just don't want to tie the general behaviour types to a specific implementation; as then its weird if the implementation changes because the type name has to stay the same and they will not match.

There are many concerns that need discussion, but let's start with the one that currently has the most impact on the parts we disagree about: updating and removing elements.

How do you want to support these operations then? We have to include them, they are basic. In Java, the designers have omitted them, and as a result:

  1. There are plenty of questions on forums for how to do workarounds due to missing features.
  2. There are third-party implementations of a heap / priority queue to replace the first-party implementation, because it's pretty much useless.

This is just pathetic. Is there anyone who really wants to pursue such a way? I'd be ashamed to release such a disabled data structure.

@pgolebiowski rest assured that everyone here has the best intentions for the platform. No one wants to ship broken APIs. We want to learn from mistakes of others, so please keep bringing such useful relevant information (like the one abut Java story).

However, I want to point out couple of things:

  • Don't expect changes overnight. This is a design discussion. We need to find consensus. We do not rush APIs. Every opinion should be heard and considered, but there is no guarantee everyone's opinion will be implemented/accepted. If you have input, provide it, back it with data and evidence. Also listen to arguments of others, acknowledge them. Provide evidence against other's opinion if you disagree. Sometimes, agree on the fact there is disagreement on some points. Keep in mind that SW, incl. API design is not black-or-white / right-or-wrong thing.
  • Let's keep the discussion civil. Let's not use strong words and statements. Let's disagree with grace and let's keep discussion technical. Everyone can refer also to contributor code of conduct. We recognize and encourage passion about .NET, but let's make sure we don't offend each other.
  • If you have any concerns/questions regarding speed of the design discussion, reactions, etc., feel free to reach out to me directly (my email is on my GH profile). I can help clarify expectations, assumptions and concerns publicly or offline if needed.

I just asked how do you guys want to design the updating / removing thing if you don't like the proposed approach... This is listening to others and looking for consensus, I believe.

I don't doubt your good intentions! Sometimes it matters how you ask - it affects how people perceive the text on other side. Text is free of emotions, so things can be understood differently when written down. English as second language muddies things even more and we all need to be aware of that. I'll be happy to chat about details offline if you're interested ... let's steer the discussion over here back to technical discussion ...

My two cents on the Heap vs. PriorityQueue debate: both approaches are valid and clearly have benefits and drawbacks.

That said, "PriorityQueue" seems far more consistent with the existing .NET approach. Today the core collections are List, Dictionary, Stack, Queue, HashSet, SortedDictionary, and SortedSet. These are named after the functionality and semantics, not the algorithm. HashSet is the only outlier, but even this can be rationalized as relating to the set equality semantics (compared to SortedSet). We do, after all, now have ImmutableHashSet which is based on a tree under the hood.

It would feel weird for one collection to buck the trend here.

I think PriorityQueue with additional constructor: PriorityQueue (IHeap) can be a solution. Constructors without IHeap parameter can use default Heap.
In that case PrioriryQueue will represent abstract data type (as most of the C# collections) and implement IPriorityQueue interface but can use different heap implementations like @pgolebiowski suggested:

class ArrayHeap : IHeap {}
class PairingHeap : IHeap {}
class FibonacciHeap : IHeap {}
class BinomialHeap : IHeap {}

OK. There are lots of different voices. I went through the discussion again and refined my approach. I also address common concerns. The text below makes use of quotes from the posts above.

Our goal

The ultimate goal of this entire discussion is providing a specific set of functionalities to make the user happy. It is a pretty common case that a user has a bunch of elements, where some of them are with a higher priority than others. Eventually, they want to keep this bunch of elements in some specific order to be able to efficiently perform the following operations:

  • Retrieve the element with the highest priority (and be able to remove it).
  • Add a new element to the collection.
  • Remove an element from the collection.
  • Modify an element in the collection.
  • Merge two collections.

Other standard libraries

The authors of other standard libraries have also tried to support this functionality. In this section, I will refer to how it was solved in Python, Java, C++, Go, Swift, and Rust.

None of them support modifying elements already inserted into the collection. It is extremely surprising, because it is very probable that, during the lifetime of a collection, the priorities of its elements will change. Especially, if such a collection is intended to be used during the entire lifetime of a service. There is also a number of algorithms that utilize this very functionality of updating elements. Such a design is thus simply wrong, because as a result, our customers are lost: StackOverflow. This is one of many, many questions like that over the Internet. The designers have failed.

Another thing worth noting is that every single library provides this partial functionality through implementing a binary heap. Well, it has been shown that it not the most performant implementation in the general case (random input). The best fit for the general case is a quaternary heap (an implicit heap-ordered complete 4-ary tree, stored as an array). It is significantly less known and this is probably the reason why the designers went with a binary heap instead. But still — another poor choice, although of a lesser severity.

What do we learn from that?

  • If we want our customers to be happy and keep them from implementing this functionality themselves while ignoring our wonderful data structure, we should provide a support for modifying elements already inserted into the collection.
  • We should not assume that because some functionality was delivered in some way in some standard library, we should do the same, because now it is widely known and users are used to that. Note the last part of the sentence.

Proposed approach

I strongly feel that we should provide:

  • IHeap<T> interface
  • Heap<T> class

The IHeap interface would obviously include methods for achieving all the operations described in the beginning of this post. The Heap class, implemented with a quaternary heap, would be the go-to solution in 98% of the cases. The ordering of elements would be based on IComparer<T> passed into the constructor or the default ordering if a type is already comparable.

Justification

  • Developers can write their logic against an interface. I assume everyone knows how important that is and will not go into details. Read: dependency inversion principle, dependency injection, design by contract, inversion of control.
  • Developers can extend this functionality to meet their custom needs by providing other heap implementations. Such implementations could be included in third-party libraries like PowerCollections. By just referencing such a library, you could simply inject your custom heap into any logic that takes an IHeap as the input. Some examples of other heaps that behave better than the quaternary heap in some specific conditions are: pairing heap, binomial heap, and our beloved binary heap.
  • If a developer just needs a tool that gets the job done without the need of them thinking about which type is best, they can simply use the general-purpose Heap implementation. This is optimizing towards 98% of use cases.
  • We add to the great value of the .NET ecosystem in regards to offering single-ish choice with very decent performance characteristics, useful to most use cases, at the same time allowing high-performance extensions for those who need it and are willing to dig deeper and learn more / make educated choices and tradeoffs.
  • The proposed approach reflects current conventions:

    • ISet and HashSet

    • IList and List

    • IDictionary and Dictionary

  • Some people have stated that we should name classes based on what their instances do, not how they do it. This is not entirely true. It is a common shortcut to say that we should name classes after their behavior. It indeed applies in numerous cases. However, there are cases where this is not the appropriate approach. The most notable examples are basic building blocks — like primitive types, enum types, or data structures. The principle is simply to choose names that are meaningful (i.e. unambiguous for the user). Take into account the fact that the functionality we are discussing is always provided as a heap — be it Python, Java, C++, Go, Swift, or Rust. Heap is one of the most elementary data structures. Heap is indeed unambiguous and clear. It is also is in harmony with Stack, Queue, List, and Array. The same approach with naming was taken in the most modern standard libraries (Go, Swift, Rust) — they expose a heap explicitly.

@pgolebiowski I'm not seeing how Heap<T>/IHeap<T> is named like Stack<T>, Queue<T>, and/or List<T>? None of those names explain how they are internally implemented (an array of T as it happens).

@SamuelEnglard

Heap doesn't say how it is internally implemented either. I fail to understand why for so many people a heap immediately follows a specific implementation. There are many variants of heaps that share the same API, to begin with:

  • d-ary heaps,
  • 2-3 heaps,
  • leftist heaps,
  • soft heaps,
  • weak heaps,
  • B-heaps,
  • radix heaps,
  • skew heaps,
  • pairing heaps,
  • Fibonacci heaps,
  • binomial heaps,
  • rank-pairing heaps,
  • quake heaps,
  • violation heaps.

Saying that we are dealing with a heap is still being very abstract. Actually, even saying that we are dealing with a quaternary heap is abstract -- it could be implemented either as an implicit data structure based on an array of T (like our Stack<T>, Queue<T>, and List<T>) or explicit (using nodes and pointers).

Shortly speaking, Heap<T> is very similar to Stack<T>, Queue<T>, and List<T>, because it is an elementary data structure, a basic abstract building block, that can be implemented in many ways. Additionally, as it happens, they are all implemented using an array of T underneath. I find this similarity really strong.

Does that make sense?

For the record, I'm indifferent about the naming. People that are used to using the C++ standard library will, perhaps, prefer _priority_queue_. People that have been educated about data structures may prefer _Heap_. If I had to vote, I'd choose _heap_, though it's close to a coin toss for me.

@pgolebiowski I worded my question wrong, that's my bad. Yes, Heap<T> doesn't say how it is implemented internally.

Yes Heap is a valid data structure, but Heap != Priority Queue. They both exposes different API surfaces and are used for different ideas. Heap<T>/IHeap<T> should be data types that are used internally by (just theoretical names) PriorityQueue<T>/IPriorityQueue<T>.

@SamuelEnglard
In terms of how the world of Computer Science is organized, yes. Here are the levels of abstraction:

  • implementation: implicit quaternary heap based on an array
  • abstraction: quaternary heap
  • abstraction: family of heaps
  • abstraction: family of priority queues

And yes, having IHeap and Heap, the implementation of PriorityQueue would basically be:

public class PriorityQueue<T>
{
    private readonly IHeap<T> heap;

    public PriorityQueue(IHeap<T> heap)
    {
        this.heap = heap;
    }

    public void Add<T>(T item) => this.heap.Add(item);

    public void Remove<T>(T item) => this.heap.Remove(item);

    // etc...
}

Let's run a decision tree here.

A priority queue could also be implemented with a data structure different than some form of a heap (theoretically). That makes the design of a PriorityQueue like the one above pretty ugly, because it is fixed towards the family of heaps only. It is also a very thin wrapper around an IHeap. This yields a question -- _why not simply use the family of heaps instead_?

We are left with one solution -- fixing the priority queue to a specific implementation of a quaternary heap, without a room for IHeap interface. I feel it is passing through too many levels of abstraction and kills all the wonderful benefits of having an interface.

We are back with the design choice from the middle of the discussion -- have PriorityQueue and IPriorityQueue. But then we would be basically having:

class BinaryHeap : IPriorityQueue {}
class PairingHeap : IPriorityQueue {}
class FibonacciHeap : IPriorityQueue {}
class BinomialHeap : IPriorityQueue {}

Feels not only ugly, but also conceptually wrong -- these are not types of priority queues and don't share the same API (as @SamuelEnglard already noted). I think we should stick to heaps, a family huge enough to have an abstraction for them. We would get:

class BinaryHeap : IHeap {}
class PairingHeap : IHeap {}
class FibonacciHeap : IHeap {}
class BinomialHeap : IHeap {}

And, provided by us, class Heap : IHeap {}.


BTW, someone might find the following helpful:

| Google query | Results |
| :--------------------------------------: | :-----: |
| "data structure" "priority queue" | 172,000 |
| "data structure" "heap" | 430,000 |
| "data structure" "queue" -"priority queue" | 496,000 |
| "data structure" "queue" | 530,000 |
| "data structure" "stack" | 577,000 |

@pgolebiowski I feel myself going back and forth here so I'll conceded

@karelz @safern How do you feel about the above? Can we fix ourselves on the IHeap + Heap approach so that I can present a specific API proposal?

I question the need for an interface here (be it IHeap or IPriorityQueue). Yes, there are many different algorithms that can theoretically be used to implement this data structure. However, it seems unlikely that the framework would ship with more than one, and the idea that library writers really need a canonical interface so that various parties can coordinate in writing cross-compatible heap implementations doesn't seem that likely.

Furthermore, once an interface is released it can never be changed due to compat. That means that the interface tends to fall behind the concrete classes in terms of functionality (an issue with IList and IDictionary today). In contrast, if a Heap class were released and there was serious clamoring for an IHeap interface, I think the interface could be added down the line without issues.

@madelson I agree that the framework likely does not need to ship more than a single implementation of a heap. However, by backing that implementation by an interface, those of us that do care about other implementations can easily swap in another implementation (of our own creation or in an external library) and still be compatible with code that accepts the interface.

I don't really see any negative points to coding to an interface. Those who don't care about it can ignore it and go directly for the concrete implementation. Those who do care can use any implementation of their choosing. It's a choice. And it's a choice that I want.

@pgolebiowski now that you asked, here is my personal opinion (I am somewhat sure that other API reviewers/architects will share it as I've checked for opinion with a few):
The name should be PriorityQueue, and we should not introduce IHeap interface. There should be exactly one implementation (likely via some heap).
IHeap interface is very advanced expert scenario - I would suggest to move it into PowerCollections library (eventually to be created) or any other library.
If the library and IHeap interface become really popular, we can change our mind later and add IHeap based on demand (via your constructor overload), but I don't think it is useful/aligned with the rest of BCL enough to justify the complication of adding new interface now. Start simple, grow to complicated only if really needed.
... just my 2 (personal) cents

Given the difference in opinions, I propose following approach to move the proposal forward (which I suggested earlier this week to @ianhays, and therefore indirectly to @safern):

  • Make 2 alternative proposals - one simple as I described above and one with the Heaps as you proposed, let's bring it to API review, discuss it there and make a decision there.
  • If I see at least one person in that group voting for Heaps proposal, I'll be happy to reconsider my view point.

... trying to be fully transparent about my opinion (which you asked for), please don't take it as discouragement or push back - let's see how the API proposal goes.

@karelz

The name should be PriorityQueue

Any argument? It would be at least nice if you addressed what I wrote above instead of just saying no.

I think you named it pretty well previously: _If you have input, provide it, back it with data and evidence. Also listen to arguments of others, acknowledge them. Provide evidence against other's opinion if you disagree._

Plus, it is not just about the name. It is not that shallow. Please read what I wrote. It is about working between levels of abstraction and being able to enhance code / build on top of it.

Oh, there was an argument in this discussion why:

If we go with Heap, many intermediate-skilled developers will (sadly) see "Heap" and confuse it for the application's memory heap (i.e. heap and stack) instead of "heap the generalized data structure". [...] they'll wonder why they can allocate a new memory heap.

😛

we should not introduce IHeap interface. IHeap interface is very advanced expert scenario - I would suggest to move it into PowerCollections library.

@bendono wrote a very nice comment on this one. @safern also wanted to back the implementation with an interface. And a few others.

Another note -- I am not sure how you imagine moving an interface into a third-party library. How could developers possibly write their code against that interface and use our functionality? They would be forced to either stick to our non-interface functionality or ignore it entirely, there is no other option, it's mutually exclusive. In other words -- our solution would not be extensible at all, resulting with people using either our disabled solution or some third-party library, instead of relying on the same architectural core in both cases. This is what we have in Java's standard library and third-party solutions.

But again, you nicely commented on this one: No one wants to ship broken APIs. We want to learn from mistakes of others, so please keep bringing such useful relevant information (like the one abut Java story).

Given the difference in opinions, I propose following approach to move the proposal forward. [...] Make 2 alternative proposals [...] let's bring it to API review, discuss it there and make a decision there. If I see at least one person in that group voting for Heaps proposal, I'll be happy to reconsider my view point.

There was a lot of discussion on various parts of the API above. It tackled:

  • PriorityQueue vs Heap
  • Adding an interface
  • Supporting updating / removing elements from the collection

Why cannot those people you have in mind just contribute to this discussion? Why should we instead start it again?

If we go with Heap, many intermediate-skilled developers will (sadly) see "Heap" and confuse it for the application's memory heap (i.e. heap and stack) instead of "heap the generalized data structure".

Yeah, this is me. I'm entirely self-taught in programming, so I have no idea what was being said when "Heap" entered the discussion. Not to mention that even in terms of a collection, "a heap of things", to me would more intuitively imply it being unordered in every fashion.

I can't believe this is really happening...

Any argument? It would be at least nice if you addressed what I wrote above instead of just saying no.

If you read my answer, you can notice that I mentioned key arguments for my position:

  • IHeap interface is very advanced expert scenario
  • I don't think it is useful/aligned with the rest of BCL enough to justify the complication of adding new interface now

Same arguments which has been repeated on the thread several times IMO. I just summarized them

. Please read what I wrote.

I was actively monitoring this thread all the time. I read all the arguments and points. This is my summarized opinion, despite reading (and understanding) all your and others' points.
You are apparently passionate about the topic. That is great. However, I have a feeling that we got into a position when we're just repeating similar arguments from each side, which is not going to lead much further - that's why I recommended 2 proposals and getting more feedback on them from larger group of experience BCL API reviewers (BTW: I don't count myself as experienced API reviewer yet).

How could developers possibly write their code against that interface and use our functionality?

The developers who care about the IHeap advanced scenario would reference the interface and the implementation from the 3rd party library. As I said, if it proves to be popular, we could consider moving it into CoreFX later.
The good news is that adding IHeap later is totally doable - it basically adds just one constructor overload on PriorityQueue.
Yes, it is not ideal from your point of view, but it does not prevent the innovation you consider important in future. I think it is reasonable middle-ground.

Why cannot those people you have in mind just contribute to this discussion?

API review is a meeting with active discussion, brainstorming, weighting all angles. In many cases it is more productive/efficient than back-and-forth on GitHub issues. See dotnet/corefx#14354 and its predecessor dotnet/corefx#8034 - too long discussion, several different opinions with gazillion tweaks which are hard to track, no conclusion, while great discussion, also non-trivial waste time for quite a few people, until we sat down and talked about it and got to consensus.
Having API reviewers monitor every single API issue, or even just the chatty ones does not scale well.

Why should we instead start it again?

We won't start again. Why would you think that?
We will finish the API review on the first level (the area owner level) by sending 2 most popular proposals with pros/cons into next API review level.
It's hierarchical approvals/reviews approach. It is similar to business reviews - VPs/CEOs with decision powers don't oversee every discussion about every project in their company, they ask their teams/reports to bring proposals for the most impactful or contagious decisions to have a further discussion about them. Teams/reports have to summarize the problem and present the pros/cons of alternative solutions.

If you believe we are not ready to present the final 2 proposals with pros and cons, because there are things which haven't been said on this thread yet, let's keep discussing, until we have just a few top candidates to review at the next API review level.
I got the feeling that everything that needed to be said has been said.
Makes sense?

The name should be PriorityQueue

Any argument?

If you read my answer, you can notice that I mentioned key arguments for my position.

Geez... I was referring to what I quoted (obviously) -- you deciding to go with a priority queue instead of a heap. And yes, I have read your answer -- it contains exactly 0% of arguments for this.

I was actively monitoring this thread all the time. I read all the arguments and points. This is my summarized opinion, despite reading (and understanding) all your and others' points.

You like to be hyperbolic, I have noticed that before. You merely acknowledge the existence of points.

How could developers possibly write their code against that interface and use our functionality?

The developers who care about the IHeap advanced scenario would reference the interface and the implementation from the 3rd party library. As I said, if it proves to be popular, we could consider moving it into CoreFX later.

Are you aware that you are just repeating my words here and not at all addressing the issue I have presented as a consequence of the above? I wrote:

They would be forced to either stick to our non-interface functionality or ignore it entirely, there is no other option, it's mutually exclusive. In other words -- our solution would not be extensible at all, resulting with people using either our disabled solution or some third-party library, instead of relying on the same architectural core in both cases.

I will make this very clear for you. There would be two disjoint groups of people:

  1. One group that is using our functionality. It doesn't have an interface and cannot be extended, thus it is not connected to what is provided in third-party libraries.
  2. Second group that is completely ignoring our functionality and using purely third-party solutions.

The problem here is, as I said, that those are disjoint groups of people. And they are producing code that does not work together, because there is no common architectural core. _CODEBASE IS NOT COMPATIBLE_. You cannot undo it later.

The good news is that adding IHeap later is totally doable - it basically adds just one constructor overload on PriorityQueue.

I have already written why it sucks: see this post.

Why cannot those people you have in mind just contribute to this discussion?

Having API reviewers monitor every single API issue, or even just the chatty ones does not scale well.

Yes, I was asking why API reviewers cannot monitor every single API issue. Precisely, you indeed responded accordingly.

Makes sense?

No. I am tired of this discussion, really. Some of you guys are clearly participating in it simply because it is your job and you are told to. Some of you need guidance all the time, which is very tiresome. You even asked me to prove why a priority queue should be implemented with a heap internally, clearly lacking computer science background. Some of you don't even understand what a heap really is, making the discussion even more chaotic.

Go with your disabled PriorityQueue that does not allow updating and removing elements. Go with your design that does not allow healthy OO approach. Go with your solution that does not allow reusing the standard library when writing an extension. Go the Java way.

And this... this is just mind-blowing:

If we go with Heap, many intermediate-skilled developers will (sadly) see "Heap" and confuse it for the application's memory heap (i.e. heap and stack) instead of "heap the generalized data structure".

Present the API with your approach. I'm curious.

I can't believe this is really happening...

Well, excuuuuuuuuuuuuuuuse me for not having had the opportunity to get a proper Computer Science education to learn that Heap is some kind of data structure different from the memory heap.

The point still stands, though. A heap of something implies nothing about it being ordered in some kind. If I were to need a collection that allows me to store objects of things to process, where some instances that come in later may need to be processed sooner, I would not be searching for something called a Heap. PriorityQueue on the other hand, perfectly communicates that it does exactly that.

As a backing implementation? Sure, implementation details shouldn't be my concern.
Some IHeap abstraction? Great for API authors and people who do have a CS Major to know what it's used for, no reason not to have.
Giving something a cryptic name that doesn't state its intent all that well and subsequently limits discoverability? 👎

Well, excuuuuuuuuuuuuuuuse me for not having had the opportunity to get a proper Computer Science education to learn that Heap is some kind of data structure different from the memory heap.

This is ridiculous. At the same time you want to participate in a discussion regarding adding a functionality like that. It sounds like trolling.

A heap of something implies nothing about it being ordered in some kind.

You are wrong. It is ordered, as a heap. Like in the pictures you linked.

As a backing implementation? Sure, implementation details shouldn't be my concern.

I have already addressed that. The family of heaps is huge and there are two levels of abstraction above an implementation. Priority queue is the third abstraction layer.

If I were to need a collection that allows me to store objects of things to process, where some instances that come in later may need to be processed sooner, I would not be searching for something called a Heap. PriorityQueue on the other hand, perfectly communicates that it does exactly that.

And without any background, you would ask Google to provide you articles on priority queues? Well, we can argue what is more or less likely in our opinions. But, as has been said very nicely:

If you have input, provide it, back it with data and evidence. Also listen to arguments of others, acknowledge them. Provide evidence against other's opinion if you disagree.

And according to the data you are wrong:

Query | Hits
:----: |:----:|
| "data structure" "priority queue" | 172,000 |
| "data structure" "heap" | 430,000 |

It is almost 3 times more likely that you would come across a heap while reading about data structures. Plus, it is a name that Swift, Go, Rust, and Python developers are familiar with, because their standard libraries provide such a data structure.

Query | Hits
:----: |:----:|
| "golang" "priority queue" | 3.390 |
| "rust" "priority queue" | 8.630 |
| "swift" "priority queue" | 18.600 |
| "python" "priority queue" | 72.800 |
| "golang" "heap" | 79.000 |
| "rust" "heap" | 492.000 |
| "swift" "heap" | 551.000 |
| "python" "heap" | 555.000 |

Actually it is also similar for C++, because a heap data structure was introduced there sometime in the previous century.

Giving something a cryptic name that doesn't state its intent all that well and subsequently limits discoverability? 👎

No opinions. Data. See above. Especially no opinions from someone who has no background. You wouldn't Google a priority queue either without having some reading about data structures before. And a heap is covered in many Data Structures 101.

It's the foundation in Computer Science. It is elementary. When you have several semesters of algorithms and data structures, a heap is something you see in the beginning.

But still:

  • First -- look at the numbers above.
  • Second -- think about all the other languages where a heap is a part of the standard library.

EDIT: See Google Trends.

As another self-taught developer, I have no problem with _heap_. As a developer that is always striving to improve, I've taken the time to learn and understand all about data structures. In short, I don't agree with the implication that a naming convention should target those that have not taken the time to understand the lexicon of the field of which they are a part.

And I also strongly disagree with statements like "The name should be PriorityQueue". If you don't want peoples input, then don't make it open-source and don't ask for it.

Let me provide some explanation of how we think about API naming:

  1. We tend to favor consistency within the .NET platform above almost anything else. That's important in order to make APIs look familiar and predictable. Sometimes this means we accept that a name isn't a 100% correct if that's a term we've used before.

  2. Our goal is to design a platform that is approachable to a wide variety of developers, some of which who haven't had a formal education in computer science. We believe that part of the reason .NET has generally been perceived as very productive and easy to use is partially due to that design point.

We generally employ the "search engine test" when it comes checking how well-known and established a name or terminology is. So I highly appreciate the research @pgolebiowski has done. I haven't done the research myself but my gut feel is that "heap" isn't a term many non-domain experts would be looking for.

Hence, I tend to agree with @karelz that PriorityQueue looks like the better choice. It combines an existing concept (queue) and adds the twist that expresses the desired capability: ordered retrieval based on a priority. But we're not unmovably attached to that name. We've often changed names of data structures and technologies based on customer feedback.

However, I'd like to point out that this:

If you don't want peoples input, then don't make it open-source and don't ask for it.

is a false dichotomy. It's not that we don't want feedback from our ecosystem and contributors (we obviously do). But at the same time we also have to appreciate that our customer base is quite diverse and that GitHub contributors (or developers on our team) aren't always the best proxy for all of our customers. Usability is hard and it will likely take some iterations to add new concepts into .NET, especially in highly popular areas like collections.

@pgolebiowski:

I highly value your insight, data, and suggestions. But I absolutely do not appreciate your style of argumentation. You got personal towards both members of my team as well as community members on this thread. Just because you disagree with us doesn't give you permission to accuse us of not having expertise or not caring because it's "just our job". Consider that many of us have literally moved around the globe, left families and friends behind only because we wanted to do this job. Not only are comments like yours very unfair, they also don't help to advance the design.

So while I like to think I have a thick skin, I don't have a lot of tolerance for that sort of behavior. Our domain is already complex enough; we don't need to add confrontational & adversarial communication.

Please be respectful. Passionately criticizing ideas is fair game, but attacking people isn't. Thank you.

Dear everyone who feels downhearted,

I apologize for decreasing your levels of happiness by my harsh-ish attitude.

@karelz

I've made a fool of myself there by making technical mistake. I apologized. It was accepted. Yet thrown at me later. Not cool IMO.

I am sorry that what I wrote made you unhappy. Though it wasn't that bad as you described it -- I merely provided this as one of many factors that have contributed to my feeling of being tired. It's of a lesser severity, I think. But still, I am sorry.

And yes, everyone makes mistakes. It is OK. Me too, for example by getting carried away sometimes.

What got me most was "you are just being told to do this, you don't believe it" - yeah, that's exactly why I do it ALSO on weekends.

I am sorry, I can see that you work hard and I appreciate that very much. It was visible to me how especially dedicated you were by the 5/10 milestone.

@terrajobst

Just because you disagree with us doesn't give you permission to accuse us of not having expertise or not caring because it's "just our job".

  • Not having expertise -- addressed towards people without computer science background and not understanding the concept of heaps / priority queues. If such a description applies to someone -- well, it applies, it's not my fault.
  • Not caring -- addressed towards those with a tendency of ignoring some of the technical points, thus enforcing repeating arguments, as such making the discussion chaotic and harder to follow by other people (which in turn leads to less input).

Comments like yours are very unfair and also don't help to advance the design.

  • My harsh comments were a result of the inefficiency in this discussion. Inefficiency = chaotic way of discussing, where points are not addressed / resolved and we move further in spite of that => tiresome.
  • Also, as one of the key drivers in the discussion, I strongly feel to have done a lot to help advance the design. Please don't demonize me, as you try to do here and in the social media.

If there is anyone ill who would like to "lick me", please feel free.


We have encountered a problem and have dealt with that. Everyone has learned something out of it. I can see that everyone here is concerned about the quality of the framework, which is absolutely wonderful and makes me motivated to contribute. I look forward to continue working on CoreFX with you. That being said, I will address your new technical input probably tomorrow.

@pgolebiowski

Hopefully we can meet in person at some point. I honestly believe that part of the challenge of doing everything online is that personalities can sometimes mix in bad ways with no intention from either side.

I look forward to continue working on CoreFX with you. That being said, I will address your new technical input probably tomorrow.

Same here. This is an interesting space and there are plenty of amazing things we can do together :-)

@pgolebiowski first, thanks for your reply. It shows you care and you mean well (which is something I secretly hope every person/developer in the world does, all conflict is just misunderstanding / miscommunication). And that makes me really happy - it keeps me going and excited.
I would suggest to start over in our relationship. Let's turn the discussion back to technical, let's all learn from this thread, how to handle similar situations in future, let's all assume again that the other party has only best interest for the platform in mind.
BTW: This is one of a few more difficult encounters/discussions on CoreFX repo in last 9 months, and as you see, we (incl./esp. me) are still learning how to handle them well - so this particular instance is going to benefit even us and it will make us be better in future and help us understand better different view points from passionate community members. Maybe it will shape our updates to contribution docs ...

My harsh comments were a result of the inefficiency in this discussion. Inefficiency = chaotic way of discussing, where points are not addressed / resolved and we move further in spite of that => tiresome.

Understood your frustration! Interestingly similar frustration was also on the other side from the same reason 😉 ... it's almost funny how the world works :).
Unfortunately, the difficult discussion is part of the job when you drive a design decision. It is LOTS of work. Many people underestimate it. Key skill required is patience with everyone and ability to step above your own opinion and think how to drive consensus even if it doesn't go your way. That's why I suggested to have 2 proposals and "escalate" the technical discussion to API reviewer group (mainly because I don't know for sure that I am right, although I secretly hope I am right as every other developer in the world would 😉).

It is VERY hard to have opinion on a topic AND drive the discussion to CONSENSUS on the same thread. From that perspective you and me have the most common on this thread - we both have opinions, yet we both try to drive the discussion to closure & decision. So let's work together closely.

My general approach is: Whenever I think someone is attacking me, is being evil, lazy, frustrates me, or something. I ask first myself and also the particular person: Why? Why did you say that? What did you mean?
Usually it is the sign of lack of understanding/communication of motives. Or sign of reading too much between the lines and seeing insults/accusations/bad intents where they are not.


Now that I am not afraid to keep discussing technical questions, here's what I wanted to ask earlier:

Go with your disabled PriorityQueue that does not allow updating and removing elements.

This is something I don't understand. If we leave out IHeap (in my / the original proposal here), why would that not be possible?
IMO there is no difference between the 2 proposals from the class capabilities point of view, the only difference is -- do we add the PriorityQueue(IHeap) constructor overload or not (leaving the class name dispute aside as independent problem to resolve).

Disclaimer (to avoid miscommunication): I don't have time to read articles and do research, I expect short answer, presenting elevator-pitch argument from whoever wants to drive the technical discussion. Note: It is not me trolling. I would ask the same question to anyone on our team making this claim. If you don't have energy to explain it / drive the discussion (which would be totally understandable given the difficulties and time investment from your side), just say so, no hard feelings. Please don't feel pressured by me or anyone (and that applies to everyone on the thread).

Not trying to add one more unnecessary comments here, this thread has been WAY TOO LONG. The no.1 rule of Internet age is that avoid text communication if you care about people relationship. (Well, I coined it). I believe some other open source community would switch to a Google Hangout for this kind of discussion if the need is apparent. When you look at other people's face, you would never say anything "insulting", and people get familiar with each other very fast. Maybe we can try as well?

@karelz Due to the length of the discussion above, it is highly unlikely for anyone new to contribute if the flow is not modified. As such, I would like to propose the following approach right now:

  • I will conduct voting on fundamental aspects, one after another. We will have clear input from the community. Ideally, API reviewers will come here as well and comment shortly.
  • "Voting posts" will contain enough information to be able to ignore the entire wall of text above.
  • After this session of voting is done, we will know what to expect from API reviewers and will be able to proceed with a certain approach. When we agree upon the fundamental aspects, this issue will be closed, and another one opened (which will reference this one). In the new issue, I will summarize our conclusions and provide API proposal that reflects these decisions. And we will take it from there.

Does that have a chance of making sense?

PriorityQueue that does not allow updating and removing elements.

It was regarding the original proposal, which lacked these capabilities :) Sorry for not making it clear.

If you don't have energy to explain it / drive the discussion (which would be totally understandable given the difficulties and time investment from your side), just say so, no hard feelings. Please don't feel pressured by me or anyone (and that applies to everyone on the thread).

I will not give up. No pain no gain xD

@xied75

I believe some other open source community would switch to a Google Hangout for this kind of discussion if the need is apparent. Maybe we can try as well?

Looks good ;)

Providing an interface

No matter if we go with Heap / IHeap or PriorityQueue / IPriorityQueue (or something else), for the functionality we are about to provide...

_would you like to have an interface, along with the implementation?_

For

@bendono

By backing that implementation by an interface, those of us that do care about other implementations can easily swap in another implementation (of our own creation or in an external library) and still be compatible with code that accepts the interface.

Those who don't care about it can ignore it and go directly for the concrete implementation. Those who do care can use any implementation of their choosing.

Against

@madelson

There are many different algorithms that can theoretically be used to implement this data structure. However, it seems unlikely that the framework would ship with more than one, and the idea that library writers really need a canonical interface so that various parties can coordinate in writing cross-compatible heap implementations doesn't seem that likely.

Furthermore, once an interface is released it can never be changed due to compat. That means that the interface tends to fall behind the concrete classes in terms of functionality (an issue with IList and IDictionary today).

@karelz

Interface is very advanced expert scenario.

If the library and IHeap interface become really popular, we can change our mind later and add IHeap based on demand (via constructor overload), but I don't think it is useful/aligned with the rest of BCL enough to justify the complication of adding new interface now. Start simple, grow to complicated only if really needed.

Potential decision impact

  • Including the interface means we cannot change it in the future.
  • Not including the interface means people write code that either uses our standard library solution or is written against a solution delivered by a third-party library (there is no common interface that would enable cross-compatibility).

Use 👍 and 👎 to vote on this one (for and against an interface, respectively). Alternatively, write a comment. Ideally, API reviewers will participate.

I would like to add that while changing interfaces is hard, with extension methods (and properties coming) interfaces are easier to extend and/or work with (see LINQ)

I would like to add that while changing interfaces is hard, with extension methods (and properties coming) interfaces are easier to extend and/or work with (see LINQ)

They can only work with the publicly defined methods on the interface; so it means getting it right first time.

I'd suggest holding off on the interface for a bit till the class is in use and has settled - then introduce an interface. (with the debate over the shape of the interface being a separate issue)

To be blunt, the only thing that I care about is the interface. A solid implementation would be nice, but I (or anyone else) could always create my own.

I remember a few years back how we had this exact same conversation with HashSet<T>. Microsoft wanted HashSet<T> while the community wanted ISet<T>. If I remember correctly, we got HashSet<T> first and ISet<T> second. Without an interface, usage of HashSet<T> was quite limited as it is difficult (if not often impossible) to change a public API.

I should note that there is also SortedSet<T> now, not to mention numerous non-BCL implementations of ISet<T>. I have used ISet<T> in public APIs and am thankful for it. My private implementation can use any concrete implementation that I feel is right. I can also easily swap out an implementation for another without breaking anything. This would not be possible without the interface.

For those who say that we can always define our own interfaces, consider this. Assume for a moment that ISet<T> in the BCL never happened. Now I can create my own interface IMySet<T> as well as solid implementations. However, one day the BCL HashSet<T> is released. It may or may not implement ISet<T>, but it does not implement IMySet<T>. As a result, I cannot swap in HashSet<T> as an implementation of my IMySet<T>.

I fear we are going to repeat this travesty again.
If you aren't ready to commit to an interface, then it is too early to introduce a concrete class.

I find the disparity in opinions significant. Just by looking at the numbers, slightly more people are for an interface, but that doesn't tell us much. I'll try to ask some other people who have participated in the discussion before but have not expressed their opinions in regards to an interface yet:

@ebickle @svick @horaciojcfilho @paulcbetts @justinvp @akoeplinger @mbeidler @SirCmpwn @andrewgmorris @weshaggard @BrannonKing @NKnusperer @danmosemsft @ianhays @safern @VisualMelon @Joe4evr @jcouv @xied75

For the notified: _guys, could you please provide your input? It would be very helpful, even if you just vote with :+1:, :-1:. You can start reading at this issue comment (4 posts above here) -- we are discussing whether providing an interface is a good idea (only this aspect for now, until it is resolved)._

Maybe some of these people are API reviewers, but I believe we need their support in this fundamental decision before we go forward. @karelz, @terrajobst, would it be possible for you to ask them to help us resolve this aspect? Their input would be very valuable, as they are the ones who will review it eventually -- it would be very helpful to know it at this point, before committing with a certain approach (or going with more than 1 proposal, which would be tiresome and a bit pointless, as we can know their decision earlier).

Personally, I am for an interface, but if the decision is different, I am happy to follow a different path.

I don't want to drag API reviewers into the discussion -- it is long and messy, it would not be efficient for API reviewers to re-read everything or even just decide what is the last important reply (I am losing myself in it).
I think we are at the point when we can create 2 formal API proposals (see the 'good' example there) and highlight pros/cons of each. We can then review them in our API review meeting and make recommendation, taking votes into consideration. Depending on the discussion there (if there are multiple opinions), we might come back and kick off Twitter poll / additional GH voting, etc.

BTW: API review meetings happen almost every Tue.

To help kick it off, here's how a proposal should look like:

Proposal Example / Seed

```c#
namespace System.Collections.Generic
{
public class PriorityQueue
: IEnumerable, ICollection, IEnumerable, IReadOnlyCollection
{
public PriorityQueue();
public PriorityQueue(int capacity);
public PriorityQueue(IComparer comparer);
public PriorityQueue(IEnumerable collection);
public PriorityQueue(IEnumerable collection, IComparer comparer);
public PriorityQueue(int capacity, IComparer comparer);

    public IComparer<T> Comparer { get; }
    public int Count { get; }

    public void Enqueue(T item);
    public T Dequeue();
    public T Peek();
    public void Clear();
    public bool Contains(T item);

    // Sets the capacity to the actual number of elements
    public void TrimExcess();

    public void CopyTo(T[] array, int arrayIndex);
    void ICollection.CopyTo(Array array, int index);
    public T[] ToArray();

    public Enumerator GetEnumerator();
    IEnumerator<T> IEnumerable<T>.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();

    bool ICollection.IsSynchronized { get; }
    object ICollection.SyncRoot { get; }
    public struct Enumerator : IEnumerator<T>
    {
        public T Current { get; }
        object IEnumerator.Current { get; }
        public bool MoveNext();
        public void Reset();
        public void Dispose();
    }
}

}
```

TODO:

  • Missing usage example (it's unclear to me how I express the priority of items) -- maybe we should redesign it to have 'int' as priority value input? Maybe some abstraction on it? (I think it was discussed above, but the thread is way too long to read, that's why we need concrete proposals and not more discussion)
  • Missing UpdatePriority scenario
  • Anything else discussed above that is missing here?

@karelz OK, will do it, stay tuned! :smile:

Alright. As far as I can tell, an interface or a heap will not pass the API review. As such I would like to propose a slightly different solution, forgetting about the quaternary heap for example. The data structure below is different in a few ways than what we can find in Python, Java, C++, Go, Swift, and Rust (at least those).

The main focus is on correctness, completeness in terms of functionality, and intuitiveness, while maintaining optimal complexities and great real-world performance.

@karelz @terrajobst

Proposal

Rationale

It is a pretty common case that a user has a bunch of elements, where some of them are with a higher priority than others. Eventually, they want to keep this bunch of elements in some specific order to be able to efficiently perform the following operations:

  1. Add a new element to the collection.
  2. Retrieve the element with the highest priority (and be able to remove it).
  3. Remove an element from the collection.
  4. Modify an element in the collection.
  5. Merge two collections.

Usage

Glossary

  • Value — user data.
  • Key — an object that is used for ordering purposes.

Various types of user data

First we will focus an building the priority queue (only adding elements). The way it is done depends on the type of user data.

Scenario 1

  • TKey and TValue are separate objects.
  • TKey is comparable.
var queue = new PriorityQueue<int, string>();

queue.Enqueue(5, "five");
queue.Enqueue(1, "one");
queue.Enqueue(3, "three");

Scenario 2

  • TKey and TValue are separate objects.
  • TKey is not comparable.
var comparer = Comparer<MyKey>.Create(/* custom logic */);

var queue = new PriorityQueue<MyKey, string>(comparer);

queue.Enqueue(new MyKey(5), "five");
queue.Enqueue(new MyKey(1), "one");
queue.Enqueue(new MyKey(3), "three");

Scenario 3

  • TKey is contained within TValue.
  • TKey is not comparable.
public class MyClass
{
    public MyKey Key { get; set; }
}

Apart from a key comparator, we also need a key selector:

var selector = new Func<MyClass, MyKey>(item => item.Key);

var queue = new PriorityQueue<MyKey, MyClass>(selector, comparer);

queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
Notes
  • We use a different Enqueue method here. This time it only takes one argument (TValue).
  • If the key selector is defined, method Enqueue(TKey, TValue) must throw InvalidOperationException.
  • If the key selector is not defined, method Enqueue(TValue) must throw InvalidOperationException.

Scenario 4

  • TKey is contained within TValue.
  • TKey is comparable.
var queue = new PriorityQueue<MyKey, MyClass>(selector);

queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
Notes
  • The comparer for TKey is assumed to be Comparer<TKey>.Default, as in Scenario 1.

Scenario 5

  • TKey and TValue are separate objects, but of the same type.
  • TKey is comparable.
var queue = new PriorityQueue<int, int>();

queue.Enqueue(5, 50);
queue.Enqueue(1, 10);
queue.Enqueue(3, 30);

Scenario 6

  • User data is a single object, which is comparable.
  • There is no physical key or a user doesn't want to use it.
public class MyClass : IComparable<MyClass>
{
    public int CompareTo(MyClass other) => /* custom logic */
}
var queue = new PriorityQueue<MyClass, MyClass>();

queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
queue.Enqueue(new MyClass( /* args */ ));
Notes

In the beginning, there is an ambiguity.

  • It is possible that MyClass is a separate object and the user would like to simply have the key and value separate, as it was in Scenario 5.
  • However, it may also be a single object (like in this case).

Here is how PriorityQueue deals with the ambiguity:

  • If the key selector is defined, then there is no ambiguity. Only Enqueue(TValue) is allowed. Therefore, an alternative solution to Scenario 6 is to simply define a selector and pass it to the constructor.
  • If the key selector is not defined, ambiguity is resolved with the first usage of an Enqueue method:

    • If Enqueue(TKey, TValue) is called the first time, key and value are considered to be separate objects (Scenario 5). From then on, the Enqueue(TValue) method must throw InvalidOperationException.

    • If Enqueue(TValue) is called the first time, key and value are considered to be the same object, key selector is inferred (Scenario 6). From then on, the Enqueue(TKey, TValue) method must throw InvalidOperationException.

Other functionality

We have already covered the creation of a priority queue. We also know how to add elements to the collection. Now we will focus on the remaining functionality.

Highest priority element

There are two operations we can do on the element with the highest priority — retrieve it or remove it.

var queue = new PriorityQueue<int, string>();

queue.Enqueue(5, "five");
queue.Enqueue(1, "one");
queue.Enqueue(3, "three");

// retrieve the element with the highest priority
var element = queue.Peek();

// remove that element
queue.Dequeue();

What is exactly being returned by the Peek and Dequeue methods will be discussed later.

Modifying an element

For an arbitrary element, we might like to modify it. As an example, if a priority queue is used during the entire lifetime of a service, there is a high chance the developer would like to have a possibility of updating priorities.

Surprisingly, such a functionality is not provided by equivalent data structures: in Python, Java, C++, Go, Swift, and Rust. Possibly some others too, but I have only checked those. The result? Disappointed developers:

We have basically two options here:

  • Do it the Java way and don't provide this functionality. I am strongly against that. It forces the user to remove an element from the collection (which alone doesn't work well, but I will get to that later) and then add it once again. It is very ugly, doesn't work in all cases, and is inefficient.
  • Introduce a new concept of handles.

Handles

Every time a user adds an element to the priority queue, they are given a handle:

var handle = queue.Enqueue(42, "forty two");

Handle is a class with the following public API:

public class PriorityQueueHandle<TKey, TValue>
{
    public TKey Key { get; }
    public TValue Value { get; }
}

It is a reference to a unique element in the priority queue. If you are concerned about the efficiency, please see FAQ (and you will not be concerned anymore).

Such an approach allows us to easily modify a unique element in the priority queue, in a very intuitive and simple way:

/*
 * User wants to retrieve a server that at any given moment
 * has the lowest average response time.
 * He doesn't want to maintain a separate key object (it is
 * inside his type) and the key is already comparable.
 */

var queue = new PriorityQueue<double, ServerStats>(selector);

/* adding some elements */

var handle = queue.Enqueue(server);

/*
 * Server stats are kept along with handles (e.g. in a dictionary).
 * Whenever there is a need of updating the priority of a certain
 * server, the user simply updates the appropriate ServerStats object
 * and then simply uses the handle associated with it:
 */

queue.Update(handle);

In the example above, because there was a specific selector defined, a user could update the object himself. The priority queue simply needed to be notified to rearrange itself (we don't want to make it observable).

Similarly to how the type of user data impacts the way a priority queue is build and filled with elements, the way it is updated also varies. I will make the scenarios shorter this time, as you already probably know what I am going to write.

Scenarios

Scenario 7
  • TKey and TValue are separate objects.
var queue = new PriorityQueue<int, string>();

var handle = queue.Enqueue(1, "three");
queue.Enqueue(1, "three");
queue.Enqueue(2, "three");

queue.Update(handle, 3);

As you can see, such an approach provides a simple way of referring to a unique element in the priority queue, no questions asked. It is especially helpful in scenarios where keys may be duplicated. Also, this is very efficient — we know the element to update in O(1), and the operation can be performed in O(log n).

Alternatively, the user could use additional methods that search through the entire structure in O(n) and then update the first element that matches the arguments. This is how removing an element is done in Java. It is neither fully correct nor efficient, but sometimes simpler:

var queue = new PriorityQueue<int, string>();

queue.Enqueue(1, "one");
queue.Enqueue(2, "three");
queue.Enqueue(3, "three");

queue.Update("three", 30);

The method above finds the first element with its value equal to "three". Its key will be updated to 30. We don't know though which one will be updated if there is more than one that satisfies the condition.

It could be slightly safer with a method Update(TKey oldKey, TValue, TKey newKey). This adds another condition — the old key also has to match. Both solutions are simpler, but not 100% safe and less performant (O(1 + log n) vs O(n + log n)).

Scenario 8
  • TKey is contained within TValue.
var queue = new PriorityQueue<int, MyClass>(selector);

/* adding some elements */

queue.Update(handle);

This scenario is what was given as an example in the Handles section.

The above is achieved in O(log n). Alternatively, the user could use a method Update(TValue) that finds the first element equal to the one specified and does an internal rearranging. Of course, this is O(n).

Scenario 9
  • TKey and TValue are separate objects, but of the same type.

Using a handle, there is no ambiguity, as always. In case of other methods that allow for updating — there is, of course. This is a trade-off between simplicity, performance, and correctness (which depends on whether data can be duplicated or not).

Scenario 10
  • User data is a single object.

With a handle, there is again no issue. Updating via other methods may be simpler — but again, less performant and not always correct (equal entries).

Removing an element

Can be done simply and correctly via a handle. Alternatively, via methods Remove(TValue) and Remove(TKey, TValue). Same issues as previously described.

Merging two collections

var queue1 = new PriorityQueue<int, string>();
var queue2 = new PriorityQueue<int, string>();

/* add some elements to both */

queue1.Merge(queue2);

Notes

  • After merging is done, queues share the same internal representation. The user can use any of the two.
  • The types have to match (checked statically).
  • Comparers have to be equal, otherwise InvalidOperationException.
  • Selectors have to be equal, otherwise InvalidOperationException.

Proposed API

public class PriorityQueue<TKey, TValue>
    : IEnumerable,
    IEnumerable<PriorityQueueHandle<TKey, TValue>>,
    IReadOnlyCollection<PriorityQueueHandle<TKey, TValue>>
    // ICollection not included on purpose
{
    public PriorityQueue();
    public PriorityQueue(IComparer<TKey> comparer);
    public PriorityQueue(Func<TValue, TKey> keySelector);
    public PriorityQueue(Func<TValue, TKey> keySelector, IComparer<TKey> comparer);

    IComparer<TKey> Comparer { get; }
    Func<TValue, TKey> KeySelector { get; }
    public int Count { get; }

    public void Clear();

    public bool Contains(PriorityQueueHandle<TKey, TValue> handle); // O(log n)
    public bool Contains(TValue value); // O(n)
    public bool Contains(TKey key, TValue value); // O(n)

    public PriorityQueueHandle<TKey, TValue> Enqueue(TKey key, TValue value); // O(log n)
    public PriorityQueueHandle<TKey, TValue> Enqueue(TValue value); // O(log n)

    public PriorityQueueHandle<TKey, TValue> Peek(); // O(1)
    public PriorityQueueHandle<TKey, TValue> Dequeue(); // O(log n)

    public void Update(PriorityQueueHandle<TKey, TValue> handle); // O(log n)
    public void Update(PriorityQueueHandle<TKey, TValue> handle, TKey newKey); // O(log n)
    public void Update(TValue value, TKey newKey); // O(n)
    public void Update(TKey oldKey, TValue value, TKey newKey); // O(n)

    public void Remove(PriorityQueueHandle<TKey, TValue> handle); // O(log n)
    public void Remove(TValue value); // O(n)
    public void Remove(TKey key, TValue value); // O(n)

    public void Merge(PriorityQueue<TKey, TValue> other); // O(1)

    public IEnumerator<PriorityQueueHandle<TKey, TValue>> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();
}

public class PriorityQueueHandle<TKey, TValue>
{
    public TKey Key { get; }
    public TValue Value { get; }
}

Open questions

Predicates instead

I am strongly for the approach with handles, because it is efficient, intuitive, and fully correct. The question is how to deal with simpler but potentially not that safe methods. One thing we could do is to replace those simpler methods with something like this:

  • UpdateFirst(Func<PriorityQueueHandle<TKey, TValue>, bool> predicate)
  • UpdateAll(Func<PriorityQueueHandle<TKey, TValue>, bool> predicate)

The usage would be pretty sweet and powerful. And truly intuitive. And readable (very expressible). I am for that one.

var queue = new PriorityQueue<int, string>();

/* add some elements */

queue.UpdateAll(x => x.Key < 15, 0);

// or for example

queue.UpdateFirst(x => x.Value.State == State.Idle, 100);

Setting the new key could potentially be a function as well, which takes the old key and transforms it somehow.

Same for Contains and Remove.

UpdateKey

If the key selector is not defined (and thus key and value are kept separately), the method could be named UpdateKey. It's probably more expressible. When the key selector is defined though, Update is better, because the key is already updated and what is needed to be done is rearrangement of some elements within the priority queue.

FAQ

Aren't handles inefficient?

There is no issue with efficiency in regards to using handles. A common fear is that it will require additional allocations, because we are using a heap based on an array internally. Fear not. Read on.

How are you going to implement it then?

It would be a completely different approach to delivering a priority queue. For the first time, a standard library won't provide this functionality implemented as a binary heap, which is represented as an array underneath. It will be implemented with a pairing heap, which by design doesn't use an array — it simply represents a tree instead.

How performant would it be?

  • For general random input, a quaternary heap would be slightly faster.
  • However, it would have to be based on an array to be faster. Then, handles could not be based on nodes simply — additional allocations would be needed. Then — we couldn't update and remove elements reasonably.
  • The way it is designed right now, we have an easy-to-use API while pretty performant implementation.
  • An additional benefit of having a pairing heap beneath is the ability to merge two priority queues in O(1), instead of O(n) as in binary/quaternary heaps.
  • Pairing heap is still blazingly fast. See references. Sometimes it's faster than quaternary heaps (depends on input data and operations performed, not only merging).

References

  • Michael L. Fredman, Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan (1986), The pairing heap: A new form of self-adjusting heap, Algorithmica 1:111-129.
  • Daniel H. Larkin, Siddhartha Sen, and Robert E. Tarjan (2014), A back-to-basics empirical study of priority queues, arXiv:1403.0252v1 [cs.DS].

BTW, it is for the 1st API review iteration. The section with _open questions_ needs to be resolved (I need your opinion [and of API reviewers if possible]). If the 1st iteration passes at least partially, for the rest of the discussion I'd like to create a new issue (close and reference this one).

As far as I can tell, an interface or a heap will not pass the API review.

@pgolebiowski : Why not just close the issue then? Without an interface, this class is close to useless. The only place that I could use it is in private implementations. At which point I can just create (or re-use) my own priority queue when needed. I cannot expose a public API in my code with this type in the signature as it will break as soon as I need to swap it out for another implementation.

@bendono
Well, Microsoft has the final word here, not a few people commenting on the thread. And we know that:

I am somewhat sure that other API reviewers/architects will share it as I've checked for opinion with a few: the name should be PriorityQueue, and we should not introduce IHeap interface. There should be exactly one implementation (likely via some heap).

This is shared by @karelz, Software Engineer Manager, and @terrajobst, Program Manager and the owner of this repository + some API reviewers.

Although I obviously like the approach with interfaces, as clearly stated in the previous posts, I can see it is pretty tough given we don't have much power in this discussion. We have made our points, but we are just some commenters. The code does not belong to us anyway. What else can we do?

Why not just close the issue then? Without an interface, this class is close to useless.

I've done enough -- instead of hating my work, do something. Do the actual work. Why do you try blaming me for anything? Blame yourself for not succeeding in persuading others to your point of view.

And please, spare me rhetoric like that. It is really childish -- do better.

BTW, the proposal focuses on things that are common no matter if we introduce an interface or not, or whether the class is named PriorityQueue or Heap. So focus on what really matters here and show us some bias for action if you want something.

@pgolebiowski Of course it is Microsoft's decision. But it is better to present an API that you want to use that meets your needs. If it is rejected, then so be it. I just don't see the need to compromise the proposal.

I apologize if you interpreted my comments as blaming you. That was certainly not my intention.

@pgolebiowski why not use KeyValuePair<TKey,TValue> for the handle?

@SamuelEnglard

Why not use KeyValuePair<TKey,TValue> for the handle?

  • Well, the PriorityQueueHandle is in fact _pairing heap node_. It exposes two properties -- TKey Key and TValue Value. However, it has much more logic inside, which is just internal. Please refer to my implementation of this. It contains for example also pointers to other nodes in the tree (everything would be internal in CoreFX).
  • KeyValuePair is a struct, so it gets copied every time + it cannot be inherited.
  • But the main thing is that PriorityQueueHandle is a rather complicated class that just happens to expose same public API as KeyValuePair.

@bendono

But it is better to present an API that you want to use that meets your needs. If it is rejected, then so be it. I just don't see the need to compromise the proposal.

  • It is true, I will have that in mind and see what happens. @karelz, along with the proposal, could you also pass some of the previous post (they're right before the proposal), where we voted for the interfaces? We might come back to this later, after the 1st iteration of API review.
  • Still, no matter what solution we go, the functionality will be very similar and it would be helpful to have it reviewed. Because if the idea for handles is rejected and we cannot really update / remove elements properly (or we can't have separate TKey and TValue), this class is truly close to being useless then -- as it is now in Java.
  • In particular, if we don't have an interface, my AlgoKit library will not be able to have common core for heaps with CoreFX, which would be truly sad for me.
  • And yes, it is truly surprising for me that adding an interface is perceived as a disadvantage by Microsoft.

That was certainly not my intention.

Sorry, my mistake then.

My questions on the design (disclaimer: these questions & clarifications, no hard push backs (yet)):

  1. Do we really need PriorityQueueHandle? What if we just expect unique values in the queue?

    • Motivation: It seems rather involved concept. If we have it, I would like to understand why we need it? How is it helping? Or is it just specific implementation detail leaking into the API surface? Is it going to buy us that much perf to pay for the complication in the API?

  2. Do we need Merge? Do other collections have it? We should not add APIs, just because they are easy to implement, there should be some common-ish use case for the APIs.

    • Maybe just adding some initialization from IEnumerable<KeyValuePair<TKey, TValue>> would be sufficient? + rely on Linq

  3. Do we need comparer overload? Can we just always fall back to default? (disclaimer: I am missing knowledge/expertise in this case, so just asking)
  4. Do we need keySelector overloads? I think we should decide if we want to have priority as part of the value, or a separate thing. Separate priority seems a bit more natural to me, but I don't have strong opinion. Do we know the pros vs. cons?

Separate/parallel decision points:

  1. Class name PriorityQueue vs. Heap
  2. Introduce IHeap and constructor overload?

Introduce IHeap and constructor overload?

Seems like things have settled down enough for me to throw in my 2 cents.... I like the interface, personally. It abstracts the implementation details away from the API (and the core functionality described by that API) in a way that in my opinion simplifies the structure and enables the most usability.

What I don't have as strong of an opinion on is whether we do the interface at the same time as the PQueue/Heap/ILikeThisItemMoreThanThisItemList or if we add it later. The argument that the API may be "in flux" and as such we should release it as a class first until we get feedback is certainly a valid one that I don't disagree with. The question then becomes when it is considered "stable" enough to add an interface. Way above in the thread IList and IDictionary were mentioned as lagging behind their canonical implementations' APIs which we added a loooong time ago, so what period of time is considered an acceptable resting period?

If we can define that period with reasonable certainty and be sure it isn't unacceptably blocking, then I see no problem shipping this heapy data structure thing without an interface. Then after that period of time we can examine usage and consider adding an interface.

And yes, it is truly surprising for me that adding an interface is perceived as a disadvantage by Microsoft.

That's because in a lot of ways it is a disadvantage. If we ship an interface, it's pretty much done. There's not a lot of wiggle for room for iterating on that API, so we better be sure it's right the first time and will continue to be right for the many years to come. Missing some nice-to-have functionality is a way better place to be in than being stuck with a potentially inadequate interface where it would be oh-so-nice to have just this one little change that would make it all better.

Thank you for the input, @karelz and @ianhays!

Allowing duplicates

Do we really need PriorityQueueHandle? What if we just expect unique values in the queue?

Motivation: It seems rather involved concept. If we have it, I would like to understand why we need it? How is it helping? Or is it just specific implementation detail leaking into the API surface? Is it going to buy us that much perf to pay for the complication in the API?

No, we don't need that. The priority queue API proposed above is quite powerful and very flexible. It is based on an assumption that elements and priorities could be duplicated. Because of that assumption, having a handle is necessary to be able to remove or update the correct node. However, if we put a restriction that elements have to be unique, we could achieve the same result as above with a simpler API and without exposing an internal class (PriorityQueueHandle), which is indeed not ideal.

Let's assume that we allow only unique elements. We could still support all the previous scenarios and keep optimal performance. Simpler API:

public class PriorityQueue<TElement, TPriority>
    : IEnumerable,
    IEnumerable<(TElement element, TPriority priority)>,
    IReadOnlyCollection<(TElement element, TPriority priority)>
    // ICollection not included on purpose
{
    public PriorityQueue();
    public PriorityQueue(IComparer<TPriority> comparer);
    public PriorityQueue(Func<TElement, TPriority> prioritySelector);
    public PriorityQueue(Func<TElement, TPriority> prioritySelector, IComparer<TPriority> comparer);

    IComparer<TPriority> Comparer { get; }
    Func<TElement, TPriority> PrioritySelector { get; }
    public int Count { get; }

    public void Clear();
    public bool Contains(TElement element); // O(1)

    public (TElement element, TPriority priority) Peek(); // O(1)
    public (TElement element, TPriority priority) Dequeue(); // O(log n)

    public void Enqueue(TElement element, TPriority priority); // O(log n)
    public void Enqueue(TElement element); // O(log n)

    public void Update(TElement element); // O(log n)
    public void Update(TElement element, TPriority priority); // O(log n)

    public void Remove(TElement element); // O(log n)

    public IEnumerator<(TElement element, TPriority priority)> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();
}

Shortly, there would be an underlying Dictionary<TElement, InternalNode>. Testing whether the priority queue contains an element can be done even faster than in the previous approach. Updating and removing elements is significantly simplified, as we can always point to a direct element in the queue.

Probably allowing duplicates is not worth the hassle and the above is enough. I think I like it. What do you think?

Merge

Do we need Merge? Do other collections have it? We should not add APIs, just because they are easy to implement, there should be some common-ish use case for the APIs.

Agree. We don't need it. We can always add API but not remove it. I am OK with removing this method and (potentially) adding it later (if there is a need).

Comparer

Do we need comparer overload? Can we just always fall back to default? (disclaimer: I am missing knowledge/expertise in this case, so just asking)

This I think is pretty important, for two reasons:

  • A user would like to have their elements to be ordered in an ascending order or a descending order. The highest priority does not tell us how ordering should be done.
  • If we skip a comparer, we enforce the user to always deliver us a class that implements IComparable.

Plus, it is consistent with the existing API. Have a look at SortedDictionary.

Selector

Do we need keySelector overloads? I think we should decide if we want to have priority as part of the value, or a separate thing. Separate priority seems a bit more natural to me, but I don't have strong opinion. Do we know the pros vs. cons?

I also like separate priority. Apart from that, it is easier to implement this, better performance & memory usage, more intuitive API, less work for the user (no need to implement IComparable).

Regarding the selector now...

Pros

This is what makes this priority queue flexible. It allows users to have:

  • elements and their priorities as separate elements
  • elements (complex classes) that have priorities somewhere inside them
  • an external logic that retrieves priority for a given element
  • elements that implement IComparable

It allows pretty much every configuration I can think of. I find it useful, as users could just "plug n play". It is also rather intuitive.

Cons

  • There is more to learn.
  • More API. Two additional constructors, one additional Enqueue and Update method.
  • If we decide to have the element and priority separate or joined, we enforce some of the users (who have their data in a different format) to adapt their code to use this data structure.

Separate/parallel decision points

Class name PriorityQueue vs. Heap

Introduce IHeap and constructor overload.

  • It feels that PriorityQueue should be a part of CoreFX, rather than Heap.
  • Regarding the IHeap interface — because a priority queue can be implemented with something different than a heap, probably we would like not to expose it this way. However, we might need IPriorityQueue.

If we ship an interface, it's pretty much done. There's not a lot of wiggle for room for iterating on that API, so we better be sure it's right the first time and will continue to be right for the many years to come. Missing some nice-to-have functionality is a way better place to be in than being stuck with a potentially inadequate interface where it would be oh-so-nice to have just this one little change that would make it all better.

Totally agree!

Comparable elements

If we add another assumption: that elements have to be comparable, then the API is even simpler. But again, it is less flexible.

Pros

  • No need for IComparer.
  • No need for the selector.
  • One Enqueue and Update method.
  • One generic type instead of two.

Cons

  • We cannot have element and priority as separate objects. Users need to provide a new wrapper-class if they have it in that format.
  • Users always need to implement IComparable before they can use this priority queue.
  • If we have elements and priorities separate, we could have it implemented easier, with better performance and memory usage -- internal dictionary <TElement, InternalNode> and have InternalNode contain the TPriority.
public class PriorityQueue<T> : IEnumerable, IEnumerable<T>, IReadOnlyCollection<T>
    where T : IComparable<T>
    // ICollection not included on purpose
{
    public PriorityQueue();
    // some other constructors like building it from a collection, or initial capacity if we have an array beneath

    public int Count { get; }

    public void Clear();
    public bool Contains(T element);

    public T Peek();
    public T Dequeue();

    public void Enqueue(T element);
    public void Update(T element);
    public void Remove(T element);

    public IEnumerator<T> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();
}

Everything is a trade-off. We remove some features, lose some flexibility, but maybe we don't need that to satisfy 95% of our users.

I also like the interface approach because it is more flexible and gives more usages, but I also agree with @karelz and @ianhays that we should wait until the new class API is used and we get feedback until we actually ship the interface and then we are stocked with that version and can't change it without breaking customers that already used it.

Also about comparers I think it follows the other BCL apis and I don't like the fact of users needing to provide a new wrapper-class. I really like the constructor overload receiving a comparer approach and using that comparer inside all of the comparisons that need to be done internally, if not comparer provided or comparer is null, then use default comparer.

@pgolebiowski thanks for such detailed and descriptive API proposal and being so pro active and energetic on getting this API approved and added to CoreFX 👍 when this discussion is done and we consider it is ready for review I would merge all the inputs and final API surface into one comment and update the main issue comment at the top as that would make reviewers' life easier.

OK, I am convinced about comparer, it makes sense and it is consistent.
I am still torn on selector - IMO we should try to go without it - let's split it into 2 variants.

```c#
public class PriorityQueue
: IEnumerable,
IEnumerable<(TElement element, TPriority priority)>,
IReadOnlyCollection<(TElement element, TPriority priority)>
// ICollection not included on purpose
{
public PriorityQueue();
public PriorityQueue(IComparer comparer);

public IComparer<TPriority> Comparer { get; }
public int Count { get; }

public void Clear();
public bool Contains(TElement element); // O(1)

public (TElement element, TPriority priority) Peek(); // O(1)
public (TElement element, TPriority priority) Dequeue(); // O(log n)

public void Enqueue(TElement element, TPriority priority); // O(log n)
public void Update(TElement element, TPriority priority); // O(log n)

public void Remove(TElement element); // O(log n)

public IEnumerator<(TElement element, TPriority priority)> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();

//
// Selector part
//
public PriorityQueue(Func prioritySelector);
public PriorityQueue(Func prioritySelector, IComparer comparer);

public Func<TElement, TPriority> PrioritySelector { get; }

public void Enqueue(TElement element); // O(log n)
public void Update(TElement element); // O(log n)

}
````

Open questions:

  1. Class name PriorityQueue vs. Heap
  2. Introduce IHeap and constructor overload? (Should we wait for later?)
  3. Introduce IPriorityQueue? (Should we wait for later - IDictionary example)
  4. Use selector (of priority stored inside the value) or not (5 APIs difference)
  5. Use tuples (TElement element, TPriority priority) vs. KeyValuePair<TPriority, TElement>

    • Should Peek and Dequeue rather have out argument instead of tuple?

I'll try to run it by API review group tomorrow for early feedback.

Fixed Peek and Dequeue tuple field name to priority (thanks @pgolebiowski for pointing it out).
Top-post updated with latest proposal above.

I second @safern's comment - Thanks a lot to @pgolebiowski for driving it forward!

Open questions:

I think we could have one more here:

  1. Limit ourselves to unique elements only (don't allow duplicates).

Good point, captured as 'Assumptions' in top post, rather than open question. The opposite would uglyfy the API quite a bit.

Should we return bool fromRemove? And also the priority as out priority arg? (I believe we added similar overloads recently into other data structures)

Similar to Contains - I bet someone will want the priority out of it. We might want to add an overload with out priority.

I remain of the opinion (though I'm yet to voice it) that Heap would imply an implementation, and would duly support calling this PriorityQueue. (I'd even support then a proposal for a slimmer Heap class which does allow duplicate elements, and doesn't allow updating, etc. (more in line with the original proposal) but I don't expect that to happen).

KeyValuePair<TPriority, TElement> most certainly should not be used, as duplicate priorities are expected, and I think that KeyValuePair<TElement, TPriority> is also confusing, so I would support not using KeyValuePair at all, and either using plain tuples, or out parameters for the priorities (personally I like out params, but I'm not fussed).

If we disallow duplicates, we need to decide behaviour of attempting to re-add them with different/changed priorities.

I do not support the selector proposal, for the simple reason that it implies redundancy, and duly will recreate confusion. If the Priority of an Element in stored with it, then it will be stored in two places, and if they become out-of-sync (i.e. someone forgets to call the useless-looking Update(TElement) method) then much suffering will ensure. If the selector is a computer, then we are open to people intentionally adding an element, then changing the values they are computed from, and now if they do try to re-add it, there is a wealth of things that could go wrong depending on the decision of what happens when this occurs. At a slightly higher level, trying though could result in adding a changed copy of the Element, as it is no longer equal to what it once was (this is a general issue with mutable keys, but I think that separating the Priority and Element will help to avoid potential issue).

The selector itself is liable to be change behaviour, which is another way that users can break everything without thinking. Much better, I think, to have users explicitly provide the priorities. The big issue I see with this, is that it conveys that duplicate entries are allowed, as we are declaring a pair, not an Element. However, sensible inline documentation and a bool return value on Enqueue should trivially remedy this. Better than a Boolean would perhaps be to return the old/new priority of the element (e.g. if Enqueue uses the newly provided priority, or the minimum of the two, or the old priority), but I think that Enqueue ought just to fail if you try to re-add something, and should therefore just return a bool indicating success. This keeps Enqueue and Update completely separate and well defined.

I would support not using KeyValuePair at all, and either using plain tuples

I'm with you on tuples.

If we disallow duplicates, we need to decide behaviour of attempting to re-add them with different/changed priorities.

  • I see a similarity with the indexer within a Dictionary. There, when you do dictionary["something"] = 5 then it gets updated if "something" was a key there previously. If it wasn't there, it just gets added.
  • However, the Enqueue method is analogical for me to the Add method in the dictionary. Which means it should throw an exception.
  • Taking the above points into account, we could consider adding an indexer to the priority queue to support the behavior you have in mind.
  • But in turn, an indexer might not be something that works with the concept of queues.
  • Which leads us to the conclusion that the Enqueue method should just throw an exception if someone wants to add a duplicated element. Similarly, Update method should throw an exception if someone wants to update the priority of an element that is not present.
  • Which leads us to a new solution -- add TryUpdate method that indeed returns bool.

I do not support the selector proposal, for the simple reason that it implies redundancy

Isn't it that the key is not physically copied (if it exists), but the selector just remains a function that gets called when priorities need to be evaluated? Where is the redundancy?

I think that separating the Priority and Element will help to avoid potential issue

The only problem is the case when the customer doesn't have the physical priority. He cannot do much then.

Much better, I think, to have users explicitly provide the priorities. The big issue I see with this, is that it conveys that duplicate entries are allowed, as we are declaring a pair, not an Element.

I see some issues with that solution, but not necessarily why it conveys that duplicate entries are allowed. I think the "no-duplicates" logic should be applied on TElement only -- the priority is just a value here.

var queue = new PriorityQueue<string, double>();

queue.Enqueue("first", 0.1);
queue.Enqueue("first", 0.5); // should be unsuccessful IMO

@VisualMelon does that make sense?

@pgolebiowski

I agree wholly with adding a TryUpdate, and Update throwing makes sense to me. I was thinking more along the lines of ISet with regard to Enqueue (rather than IDictionary). Throwing would make sense also, I must have missed that point, but I do think the return bool conveys the 'setness' of the type. Perhaps a TryEnqueue would be in order also (with Add throwing)? A TryRemove would also be appreciated (fails if empty). Regarding your last point, yes, that is the behaviour I had in mind also. I suppose an analogy with IDictionary is better than ISet on reflection, and that ought to be clear enough. (To summarise: I'd support everything throwing per your suggestion, but having Try* is a must if that is the case; I also agree with your statements regarding failure conditions).

Regarding the indexer, I think you're right that it doesn't really 'fit' the queue concept, I wouldn't support that. If anything, a well-named method to Queue or Update would be in order.

Isn't it that the key is not physically copied (if it exists), but the selector just remains a function that gets called when priorities need to be evaluated? Where is the redundancy?

You're right, I misread the proposal update (the bit about storing it in the Element). Taking it then that the Selector is called on demand (which would have to be made clear in any documentation, as it could have performance implications), the points still stand that result of the function can change without the data-structure responding, leaving the two out-of-sync unless Update is called. Even worse, if a user changes the effective priority of multiple Elements, and only updates one of them, then the data structure will end up depending on 'uncommited' changes when they are selected from 'unupdated' Elements (I haven't looked into the proposed DataStructure implementation in detail, but I think this is necessarily an issue for any Update O(<n)). Forcing the user to explicitly update any priorities remedies this, necessarily taking the data structure from one consistent state to another.

Note that all my stated gripes so far with the Selector proposal are about the robustness of the API: I think the selector makes it easy to use incorrectly. However, usability aside, conceptually, the Elements in the Queue shouldn't have to know their priority. It is potentially irrelevant to them, and if the user ends up wrapping their Elements in a struct Queable<T>{} or something then that seems like a failure in providing a frictionless API (as much as I hurts me to use the term).

You could of course argue that without the selector, the burden is on the user to call the selector, but I think if the Elements know their priority, they will (hopefully) be exposed tidily (i.e. a property), and if they don't, then there is no selector to worry about (generating Priorities on the fly, etc.). There is a lot more meaningless plumbing required to support a selector if you don't want it, then to pass in the priorities if you already have a well-defined 'selector'. The selector entices the user to expose this information (perhaps unhelpfully), while separate priorities provide an extremely transparent interface which I can't imagine will influence design decisions.

The only argument I can really think of in favour of the selector is that it means you can use pass a PriorityQueue around, and other parts of the program can use it without knowing how the priories are computed. I'll have to think about this a little longer, but given the 'niche' suitability, this strikes me as little reward for the reasonably heavy overhead in more general cases.

_Edit: Having though about it some more, it would rather be very nice to have self-contained PriorityQueues which you can just throw Elements at, but I maintain that the cost of working around this would be great, while the cost of introducing would be considerably less._

I've been doing a bit of work looking through open source .NET codebases to see real world examples of how priority queues are used. The tricky problem is filtering out any student projects and training source code.

Usage 1: Roslyn Notification Service
https://github.com/dotnet/roslyn/
The Roslyn compiler includes a private priority queue implementation called "PriorityQueue" that appears to have a very specific optimization in it - an object queue that to reuses objects in the queue to avoid them being garbage collected. The Enqueue_NoLock method performs an evaluation on current.Value.MinimumRunPointInMS < entry.Value.MinimumRunPointInMS to determine where in the queue to place the new node. Either of the two main priority queue designs proposed here (comparer function/delegate vs explicit priority) would fit this usage scenario.

Usage 2: Lucene.net
https://github.com/apache/lucenenet
This is arguably the largest usage example for a PriorityQueue I could find in .NET. Apache Lucene.net is a full .net port of the popular Lucene search engine library. We use the Java version at my company, and according to Apache's website a few big names use the .NET version. There's a large number of forks of the .NET project in Github.

Lucene includes its own PriorityQueue implementation that is subclassed by a number of "specialized" priority queues: HitQueue, TopOrdAndFloatQueue, PhraseQueue, and SuggestWordQueue. As well, the project directly instantiates PriorityQueue in a number of places.

Lucene's PriorityQueue implementation, linked to above, is very similar to the original priority queue API posted in this issue. It's defined as "PriorityQueue" and accepts an IComparer parameter in its constructor. Methods and Properties include Count, Clear, Offer (enqueue/push), Poll (dequeue/pop), Peek, Remove (removes the first matching item found in the queue), Add (synonym for Offer). Interestingly, they also provide enumeration support for the queue.

Priority in Lucene's PriorityQueue is determined by the comparer passed into the constructor, and if none was passed, it assumes that the compared objects implement IComparable and uses that interface to do the comparison. The original API design posted here is similar, except that it works with value types as well.

There are a large number of usage examples through their code base, SloppyPhraseScorer being one of them.

SloppyPhraseScorer's constructor instantiates a new PhraseQueue (pq), which is one of their own custom subclasses of PriorityQueue. A collection of PhrasePositions is generated, which appears to be a wrapper for a set of postings, a position, and a set of terms. The FillQueue method enumerates over the phrase positions and enqueues them. PharseFreq() calls an AdvancePP function and, at a high level, appears to dequeue, update an item's priority, and then enqueue again. The priority is determined relatively (using a comparer) rather than explicitly (priority is not "passed in" as a second parameter during enqueue).

You can see that based on their implementation of PhraseQueue a comparison value passed in via the constructor (e.g. integer) might not cut it. Their comparison function ("LessThan") evaluates three different fields: PhrasePositions.doc, PhrasePositions.position, and PhrasePositions.offset.

Usage 3: Game Development
I haven't finished searching through usage examples in this space, but I saw quite a few examples of custom .NET PriorityQueue's being used in game development. In a very general sense these tended to be clustered around pathfinding as the main use case (Dijkstra's). You can find a lot of people asking how to implement pathfinding algorithms in .NET due to Unity 3D.

Still need to dig through this area; saw a few examples with explicit priority being enqueued and some examples using Comparer/IComparable.

On a separate note, there's been some discussion around unique elements, removing explicit elements, and determine whether a specific element exists.

Queues, as a data structure, generally support Enqueuing and Dequeuing. If we head down the path of providing other set/list-like operations, I wonder whether we're actually designing a different data structure altogether - something akin to a sorted list of tuples. If a caller has a need other than Enqueue, Dequeue, Peek, perhaps they need something other than a priority queue? Queue, by definition, implies insertion into a queue and ordered removal from the queue; not much else.

@ebickle

I appreciate your effort in going through other repositories and checking how the priority queue functionality was delivered there. However, IMO this discussion would benefit from providing specific design proposals. This thread is already hard to follow and telling a lengthy story without any conclusions makes it even harder.

Queues [...] generally support Enqueuing and Dequeuing. [...] If a caller has a need other than Enqueue, Dequeue, Peek, perhaps they need something other than a priority queue? Queue, by definition, implies insertion into a queue and ordered removal from the queue; not much else.

  • What is the conclusion? Proposal?
  • It is very far from the truth. Even Dijkstra's algorithm that you mentioned makes use of updating priorities of elements. And what is needed to update specific elements is also needed to remove specific elements.

Great discussion. @ebickle's research is extremely useful IMO!
@ebickle do you have conclusion on [2] lucene.net - does our latest proposal fit the usages or not? (I hope I didn't miss it in your detailed description)

It looks like we need Try* variants from above + IsEmpty + TryPeek/TryDequeue + EnqueueOrUpdate? Thoughts?
```c#
public bool IsEmpty();

public bool TryPeek(ref TElement element, ref TPriority priority); // false if empty
public bool TryDequeue(ref TElement element, ref TPriority priority); // false if empty

public bool TryEnqueue(TElement element, TPriority priority); // false if it is duplicate (doe NOT update it)
public void EnqueueOrUpdate(TElement element, TPriority priority); // TODO: Should return bool status for enqueued vs. updated?
public bool TryUpdate(TElement element, TPriority priority); // false if element does not exist (does NOT add it)

public bool TryRemove(TElement element); // false if element does not exist

```

@karelz

It looks like we need Try* variants from above

Yes, exactly.

Should return bool status for enqueued vs. updated?

If we want to return statuses everywhere, then everywhere. For this particular method, I think it should be true if either operation succeeded (Enqueue or Update).

For the rest -- I simply agree :smile:

Just one question -- why ref instead of out? I'm not sure:

  • We don't need to have it initialized before entering the function.
  • It is not used for the method (it just goes out).

If we want to return statuses everywhere, then everywhere. For this particular method, I think it should be true if either operation succeeded (Enqueue or Update).

It would always return true, that's not a good idea. We should use void in such case IMO. Otherwise people will be confused and will check return value and add useless never-to-be-executed code. (Unless I missed something)

ref vs. out agreed, I debated it myself. I don't have strong opinion / enough experience to make decision myself. We can ask API reviewers / wait for more comments.

It would always return true

You are right, my bad. Sorry.

I don't have strong opinion / enough experience to make decision myself.

Maybe I am missing something, but I find it pretty simple. If we use ref, we are basically saying that Peek and Dequeue want to use somehow the TElement and TPriority passed to it (I mean read those fields). Which isn't really the case -- our methods are supposed to only assign values to those variables (and they are in fact required to do so by the compiler).

Top-post updated with my Try* APIs
Added 2 open questions:

  • [6] Is Peek and Dequeue throwing useful at all?
  • [7] TryPeek and TryDequeue - should use ref or out args?

ref vs. out - You are right. I was optimizing for avoiding of initialization in case we return false. That was stupid & blindsided from me - premature optimization. I'll change it to out and remove the question.

6: I might be missing something, but if not throwing an exception, what should Peek or Dequeue do if the queue is empty? I suppose this starts to beg the question as to whether the data structure should accept null (I'd prefer not, but no firm opinion). Even if we allow null, Value Types have no null (default certainly doesn't count), so Peek and Dequeue have no way to convey a meaningless result, and duly I think must throw an exception (thus removing any concerns about out parameters!). I see no reason not follow the example of the existing Queue.Dequeue

I'll just add that Dijkstra's (aka. Heuristic search without a Heuristic) shouldn't require any Priority changes. I've never wished to update the priority of anything, and I bash out heuristic searches all the time. (the point of the algorithm is that once you've explored a state you know that you've explored the best route to it, otherwise it risks being non-optimal, hence you could never improve the priority, and you certainly don't want to decrease it (i.e. consider a worse route to it). _Ignoring cases where you intentionally choose the heuristic such that it isn't optimal, in which case you can of course get a non-optimal result, but you'll still never update a priority_)

if not throwing an exception, what should Peek or Dequeue do if the queue is empty?

True.

thus removing any concerns about out parameters!

How? Well, the out parameters are for the TryPeek and TryDequeue methods. The ones that throw an exception are Peek and Dequeue.

I'll just add that Dijkstra's shouldn't require any Priority changes.

I might be wrong, but to my knowledge, the Dijkstra's algorithm makes use of the DecreaseKey operation. See for example this. Whether this is efficient or not is a different aspect. In fact, Fibonacci heap was designed in a way that achieves the DecreaseKey operation asymptotically in O(1) (to improve Dijkstra).

But still, what is important in our discussion -- being able to update an element in a priority queue is very helpful and there are people who search for such a functionality (see previously linked questions on StackOverflow). I have used it myself a few times too.

Sorry, yes, I see the issue with out params now, misreading things again. And it seems that a variant of Dijkstra's (a name which seems to be applied rather more broadly than I believed...) can be implemented perhaps more efficiently (if your queue already contains the Elements, presumably there is benefit for repeatable searches) with updateable priorities. That's what I get for using long words and names. Note that I'm not proposing we drop the Update, it would just be nice also to have a slimmer Heap or other (i.e. the original proposal) without the restrictions this capability imposes (as glorious a capability as I appreciate it is).

7: These should be out. Any input could never have any meaning, so shouldn't exist (i.e. we shouldn't go with ref). With out params, we aren't going to improve on returning default values. This is what Dictionary.TryGetValue does, and I see no reason to do otherwise. _That said, you could treat the ref as value or default, but if you don't have a meaningful default then it frustrates matters._

API review discussion:

  • We will need to have a proper design meeting (2h) to discuss all the pros/cons, the 30min we invested today was not enough to get to consensus / close on open questions.

    • We will likely invite most invested community members - @pgolebiowski , anyone else?

Here are key raw notes:

  • Experiment - we should do experiment (in CoreFxLabs), release it as pre-release NuGet package and ask consumers for feedback (via blog post). We do not believe we can nail the API without feedback loop.

    • We did similar thing for ImmutableCollections in the past (fast preview release cycle was key & helpful in shaping the APIs).

    • We can bundle this experiment together with MultiValueDictionary which already is in CoreFxLab. TODO: Check if we have more candidates, we do not want to blog post each of them separately.

    • The experimental NuGet package will be nuked after the experiment ends and we will move the source code into CoreFX or CoreFXExtensions (to be decided later).

  • Idea: Return just TElement from Peek & Dequeue, don't return TPriority (users can use Try* methods for that).
  • Stability (for same priorities) - items with same priorities should be returned in the order they were inserted into the queue (general queue behavior)
  • We want to enable duplicates (similar to other collections):

    • Get rid of Update - user can Remove and then Enqueue item back with different priority.

    • Remove should remove just 1st found item (as List does).

  • Idea: Can we abstract IQueue interface to abstract Queue and PriorityQueue (Peek and Dequeue returning just TElement helps here)

    • Note: It may prove as impossible, but we should explore it before settling on the API

We had long discussion about selector - we are still not decided (may be required by IQueue above?). Majority didn't like selector, but things may change in future API review meeting.

Other open questions were not discussed.

The latest proposal looks pretty good to me. My opinions/questions:

  1. If Peek and Dequeue used out parameters for priority, then they could also have overloads that don't return priority at all, which I think would simplify common usage. Though priority can also be ignored using a discard, which makes this less important.
  2. I don't like that the selector version is distinguished by using a different constructor and enables a different set of methods. Maybe there should be a separate PriorityQueue<T>? Or a set of static and extension methods working with PriorityQueue<T, T>?
  3. Is it defined in what order are the elements returned when enumerating? My guess is that it's undefined, to make it efficient.
  4. Should there be some way to enumerate just the items in the priority queue, ignoring priorities? Or is having to use something like priorityQueue.Select(t => t.element) acceptable?
  5. If the priority queue is internally using a Dictionary on the element type, should there be an option to pass an IEqualityComparer<TElement>?
  6. Tracking the elements using a Dictionary is unnecessary overhead if I never need to update priorities. Should there be an option to disable it? Though this can be added later, if it turns out that it's useful.

@karelz

Stability (for same priorities) - items with same priorities should be returned in the order they were inserted into the queue (general queue behavior)

I think this would mean that internally, priority would have to be something like a pair of (priority, version), where version is incremented for every addition. I think that would significantly bloat memory usage of the priority queue, especially considering that the version would likely have to be 64 bits. I'm not sure that would be worth it.

We do not want to prevent duplicate values (similar to other collections):
Get rid of Update - user can Remove and then Enqueue item back with different priority.

Depending on the implementation, Update is likely going to be much more efficient than Remove followed by Enqueue. For example, with binary heap (I think quaternary heap has the same time complexities), Update (with unique values and a dictionary) is O(log n), while Remove (with duplicate values and no dictionary) is O(n).

@pgolebiowski
Completely agree we need to stay focused on proposals here; I made the original API proposal and original post. Back in January @karelz asked for some specific usage examples; that opened up a much broader question regarding the specific need and usage of a PriorityQueue API. I had my own PriorityQueue implementation and used it in a few projects and felt something similar would be useful in the BCL.

What I lacked in the original post was a broad survey of existing queue usage; real world examples will help keep the design grounded and ensure it can be broadly used.

@karelz
Lucene.net contains at least one priority queue comparison function (similar to Comparison) that evaluates multiple fields to determine priority. Lucene's "implicit" comparison pattern won't map well into the current API proposal's explicit TPriority parameter. Some sort of mapping would be required - combining the multiple fields into one or into a 'comparable' data structure that can be passed as the TPriority.

Proposal:
1) PriorityQueue class based on my original proposal above (listed under the Original Proposal heading). Potentially add Update(T), Remove(T), and Contains(T) convenience functions. Fits majority of existing usage examples from open source community.
2) PriorityQueue variant of dotnet/corefx#1. Dequeue() and Peek() return TElement instead of tuple. No Try* functiuons, Remove() returns bool instead of throwing to fit List pattern. Acts as a 'convenience type' so developers that have an explicit priority value do not need to create their own comparable type.

Both types support duplicate elements. Need to determine whether or not we guarantee FIFO for elements of the same priority; likely not if we support priority updates.

Build both variants in the CoreFxLabs as @karelz suggested and solicit feedback.

Questions:

public void Enqueue(TElement element, TPriority priority); // Throws if it is duplicate`
public bool TryEnqueue(TElement element, TPriority priority); // Returns false if it is duplicate (does NOT update it)

Why no duplicates?

public (TElement element, TPriority priority) Peek(); // Throws if empty
public (TElement element, TPriority priority) Dequeue(); // Throws if empty
public void Remove(TElement element); // Throws if element does not exist

If these methods are desirable, have them as extension methods?

public void Enqueue(TElement element);
public void Update(TElement element);

Should throw; but why no Try variant, also extension methods?

Struct based enumerator?

@benaadams throw on dupes was originally suggested (by me) to avoid the ugly handle type. Latest update from API review reverts that.

We should not add methods as extension methods when we can add them on the type. Extension methods are backup if we can't add them on the type (e.g. it is interface, or we want to deliver it to .NET Framework faster).

Dupes: If you have multiple entries that are the same you only need to update one? Then they become different?

Throwing methods: basically are wrappers and they will be?

public void Remove(TElement element)
{
    if (!TryRemove(element))
    {
        throw new Exception();
    }
}

Though its control flow via exceptions?

@benaadams check my reply with API review notes: https://github.com/dotnet/corefx/issues/574#issuecomment-308206064

  • We want to enable duplicates (similar to other collections):

    • Get rid of Update - user can Remove and then Enqueue item back with different priority.

    • Remove should remove just 1st found item (as List does).

Though its control flow via exceptions?

Not sure what you mean. It seems like a pretty common pattern in BCL.

Though its control flow via exceptions?

Not sure what you mean.

If you just have TryX methods

  • if you care about the result; you check it
  • if you don't care about the result you don't check it

No exception involvement; but the name encourages you to make a decision to throw away the return.

If you have non-Try exception throwing methods

  • if you care about the result; you either have to pre-check or use a try/catch to detect
  • if you don't care about the result; you either have to empty catch the method, or get unexpected exceptions

So you are in the anti-pattern of using exception handling for flow-control. They just seem a bit superfluous; can always just add a throw if false to recreate.

It seems like a pretty common pattern in BCL.

The TryX methods weren't a common pattern in the original BCL; until the Concurrent types (though think Dictionary.TryGetValue in 2.0 may have been first example?)

e.g. the TryX methods have only just been added to Core for Queue and Stack and aren't part of Framework yet.

Stability

  • Stability does not come for free. This comes with an overhead in performance and in memory. For an ordinary usage, the stability in priority queues is not important.
  • We expose an IComparer. If the customer wants to have stability, they can easily add it themselves. It would be easy to build StablePriorityQueue on top of our implementation — using the ordinary approach with storing the "age" of an element and using it during comparisons.

IQueue

There are API conflicts then. Let's now consider a simple IQueue<T> so that it works with the existing Queue<T>:

public interface IQueue<T> :
    IEnumerable,
    IEnumerable<T>,
    IReadOnlyCollection<T>
{
    void Enqueue(T element);

    T Peek();
    T Dequeue();

    bool TryPeek(out T element);
    bool TryDequeue(out T element);
}

We have two options for the priority queue:

  1. Implement IQueue<TElement>.
  2. Implement IQueue<(TElement, TPriority)>.

I will write implications of both paths using simply numbers (1) and (2).

Enqueue

In the priority queue, we need to enqueue both an element and its priority (current solution). In the ordinary heap, we add just one element.

  1. In the first case, we expose Enqueue(TElement), which is pretty strange for a priority queue if we are inserting an element without a priority. We are then forced to do... what? Assume default(TPriority)? Nah...
  2. In the second case, we expose Enqueue((TElement, TPriority) element). We essentially add two arguments by accepting a tuple created out of them. Not ideal API probably.

Peek & Dequeue

You wanted those methods to return TElement only.

  1. Works with what you want to achieve.
  2. Doesn't work with what you want to achieve — returns (TElement, TPriority).

Enumerate

  1. The customer cannot write any LINQ query that makes use of the priorities of the elements. This is wrong.
  2. It works.

We could mitigate some of the issues by providing PriorityQueue<T>, where there is no physical separate priority.

  • The user is essentially forced to write a wrapper class for their code to be able to use our class. This means that in many cases they would also want to implement IComparable. Which adds lots of fairly boilerplate code (and a new file in their source code, most likely).
  • We can obviously re-discuss this, if you want. I also provide an alternative approach below.

Two priority queues

If we deliver two priority queues, the overall solution would be more powerful and flexible. There are a few notes to take:

  • We expose two classes to provide the same functionality, but only for various types of input format. Doesn't feel best.
  • PriorityQueue<T> could potentially implement IQueue<T>.
  • PriorityQueue<TElement, TPriority> would expose a strange API if it tries to implement the IQueue interface.

Well… it would work. Not ideal though.

Updating and removing

Given the assumption that we want to allow duplicates and we don't want to use the concept of a handle:

  • It is impossible to provide the functionality of updating priorities of elements fully correctly (update the specific node only). Analogically, it applies to removing elements.
  • Additionally, both operations have to be done in O(n), which is pretty sad, because it is possible to do both in O(log n).

This basically leads to what is in Java, where users have to provide their own implementation of the entire data structure if they want to be able to update priorities in their priority queues without naively iterating through the entire collection every time.

Alternative handle

With the assumption that we don't want to add a new handle type, we can do it differently to be able to have the full support for updating and removing elements (plus do it efficiently). The solution is to add alternative methods:

void Enqueue(TElement element, TPriority priority, out object handle);

void Update(object handle, TPriority priority);

void Remove(object handle);

These would be methods for those who want to have a proper control on updating and removing elements (and don't do it in O(n), as if it was a List :stuck_out_tongue_winking_eye:).

But better though, let's consider...

Alternative approach

Alternatively, we can remove this functionality from the priority queue entirely and add it in the heap instead. This has numerous benefits:

  • The more powerful and efficient operations are available by using a Heap.
  • Simple API and straightforward usage is available by using a PriorityQueue -- for people who want their code to just work.
  • We don't run into the problems of Java.
  • We could make the PriorityQueue stable now -- it's not a trade-off anymore.
  • The solution is in harmony with the feeling that people with stronger computer science background would be aware of the existence of Heap. They would also be aware of the limitations of the PriorityQueue and thus could use the Heap instead (for example if they want to have more control over updating/removing elements or don't want the data structure to be stable at the expense of speed and memory usage).
  • Both priority queue and heap could easily allow duplicates while not compromising on their functionalities (because their purposes are different).
  • It would be easier to make a common IQueue interface -- because the power and functionality would be thrown into the heap field. The API of PriorityQueue could be focused on making it compatible with Queue through an abstraction.
  • We don't need to provide the IPriorityQueue interface anymore (we rather focus on keeping the functionality of PriorityQueue and Queue similar). Instead, we can add it in the heap area -- and essentially have IHeap there. This is great, because it allows people to build third-party libraries on top of what is in the standard library. And it feels right -- because again, we consider heaps to be more advanced than priority queues, so extensions would be provided by that area. Such extensions would also not suffer from the choices we would do in PriorityQueue, because it would be separate.
  • We don't need to consider the IHeap constructor for the PriorityQueue anymore.
  • Priority queue would be a helpful class to use internally in CoreFX. However, if we add features like stability and drop some other functionalities, we are likely to end up with the need for something more powerful than that solution. Fortunately, there would be the more powerful and performant Heap at our command!

Basically, the priority queue would be focused primarily on ease of use, at the expense of: performance, power, and flexibility. The heap would be for those who are aware of the performance and functional implications of our decisions in the priority queue. We mitigate many issues with trade-offs.

If we are for making an experiment, I think it is possible now. Let's just ask the community. Not everyone reads this thread -- we might generate other valuable comments and usage scenarios. What do you think? Personally, I would love such a solution. I think it would make everyone happy.

Important note: if we want such an approach, we need to design priority queue and heap together. Because their purposes would be different and one solution would provide what the other one would not.

Delivering IQueue then

With the approach presented above, in order for the priority queue to implement IQueue<T> (so that it makes sense), and assuming that we drop the support for the selector there, it would have to have one generic type. Although that would mean users need to provide a wrapper if they have (user data, priority) separtely, such a solution is still intuitive. And most importantly, it allows all input formats (that's why it has to be done this way if we drop the selector). Without the selector, Enqueue(TElement, TPriority) would not allow types that are already comparable. A single generic type is also crucial for enumeration -- so that this method can be included in IQueue<T>.

Miscellaneous

@svick

Is it defined in what order are the elements returned when enumerating? My guess is that it's undefined, to make it efficient.

To have the order while enumerating we essentially need to sort the collection. That's why yes, it should be undefined, as the customer can simply run OrderBy themselves and achieve about the same performance (but some people don't need it).

Idea: in the priority queue this could be ordered, in the heap unordered. It feels better. Somehow a priority queue feels like iterating it in order. A heap definitely not. Another benefit of the approach above.

@pgolebiowski
That all seems very sane. Would you mind clarifying, in the Delivering IQueue then section, are you suggesting T : IComparable<T> for the "one generic type" as an alternative to the selector given the allowance of duplicate elements?

I would support having the two separate types.

I don't understand the rationale behind using object as the handle type: is this just to avoid creating a new type? Defining a new type would provide minimal implementation overhead, while making the API harder to misuse (what's stopping me trying to pass a string to Remove(object)?), and easier to use (what's stopping me trying to pass the Element itself to Remove(object), and who could blame me for trying?).

I propose the additional of a suitably named dummy type, to replace object in the handle methods, in the interests of a more expressive interface.

If debugability trumps memory overhead, the handle type could even include information about which queue it belongs to (would have to become Generic, which would further the type safety of the interface), providing useful exceptions along the lines of ("Handle supplied was created by a different Queue"), or whether it has already been consumed ("Element referenced by Handle has already been removed").

If the handle idea goes ahead, I would propose that if this information is deemed useful, then a subset of these exception would also be thrown by the corresponding TryRemove and TryUpdate methods, excepting that where the element is no longer present, either because it has been dequeued or removed by handle. This would entail a less boring, generic, suitably named handle type.

@VisualMelon

Would you mind clarifying, in the Delivering IQueue then section, are you suggesting T : IComparable<T> for the "one generic type" as an alternative to the selector given the allowance of duplicate elements?

Sorry for not making this clear.

  • I meant delivering PriorityQueue<T>, with no constraint on the T.
  • It would still make use of an IComparer<T>.
  • If it happens that the T is already comparable, then simply Comparer<T>.Default would be assumed (and you can call the default constructor of the priority queue with no arguments).
  • The selector had a different purpose — to be able to consume all types of user data. There are several configurations:

    1. User data is separate from the priority (two physical instances).
    2. User data contains the priority.
    3. User data is the priority.
    4. Rare case: priority is obtainable via some other logic (resides in an object different from the user data).

    The rare case would not be possible in PriorityQueue<T>, but that doesn't matter much. What is important is that we can now handle (1), (2), and (3). However, if we had two generic types, we would have to have a method like Enqueue(TElement, TPrioriity). That would limit us to (1) only. (2) would lead to redundancy. (3) would be incredibly ugly. There is a bit more on this in the IQueue > Enqueue section above (second Enqueue method and default(TPriority).

I hope it is clearer now.

BTW, assuming such a solution, designing the API of PriorityQueue<T> and IQueue<T> would be trivial. Just take some of the methods in Queue<T>, throw them into IQueue<T> and make PriorityQueue<T> implement them. Tadaa! 😄

I don't understand the rationale behind using object as the handle type: is this just to avoid creating a new type?

  • Yes, precisely. Given an assumption that we don't want to expose such a type, it's the only solution to still use the concept of a handle (and as such have more power and speed). I agree it's not ideal though -- it was rather a clarification what would have to happen if we stick to the current approach and want to have more power & efficiency (which I am against).
  • Given that we would go with the alternative approach (separate priority queue and heaps), this is easier. We could let the PriorityQueue<T> reside in System.Collections.Generic, while the heap functionality would be in System.Collections.Specialized. There, we would have a higher chance of introducing such a type, eventually having the wonderful compile-time error detection.
  • But once again -- it is very important to design priority queue and heap functionality together if we would like to have such an approach. Because one solution provides what the other one doesn't.

If debugability trumps memory overhead, the handle type could even include information about which queue it belongs to (would have to become Generic, which would further the type safety of the interface), providing useful exceptions along the lines of ("Handle supplied was created by a different Queue"), or whether it has already been consumed ("Element referenced by Handle has already been removed").

If the handle idea goes ahead, I would propose that if this information is deemed useful...

Definitely more is possible then :wink:. Especially to extend all of that by our community in third-party libraries.

@karelz @safern @ianhays @terrajobst @bendono @svick @alexey-dvortsov @SamuelEnglard @xied75 and others -- do you think such an approach would make sense (as described in this and this post)? Would it satisfy all of your expectations and needs?

I think the idea of creating two class makes a lot of sense and solves a lot of issues. Only though I have is that it might make sense for PriorityQueue<T> to use the Heap<T> internally and just "hide" it's advanced functionality.

So basically...

IQueue<T>

public interface IQueue<T> :
    IEnumerable,
    IEnumerable<T>,
    IReadOnlyCollection<T>
{
    int Count { get; }

    void Clear();
    bool Contains(T element);
    bool IsEmpty();

    void Enqueue(T element);

    T Peek();
    T Dequeue();

    bool TryPeek(out T element);
    bool TryDequeue(out T element);
}

Notes

  • In System.Collections.Generic.
  • Idea: we could also add methods for removing elements here (Remove and TryRemove). It is not in Queue<T> though. But it's not necessary.

PriorityQueue<T>

public class PriorityQueue<T> : IQueue<T>
{
    public PriorityQueue();
    public PriorityQueue(IComparer<T> comparer);
    public PriorityQueue(IEnumerable<T> collection);
    public PriorityQueue(IEnumerable<T> collection, IComparer<T> comparer);

    public IComparer<T> Comparer { get; }
    public int Count { get; }

    public bool IsEmpty();
    public void Clear();
    public bool Contains(T element);

    public void Enqueue(T element);

    public T Peek();
    public T Dequeue();

    public bool TryPeek(out T element);
    public bool TryDequeue(out T element);

    public void Remove(T element);
    public bool TryRemove(T element);

    public IEnumerator<T> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();
}

Notes

  • In System.Collections.Generic.
  • If the IComparer<T> is not delivered, Comparer<T>.Default is summoned.
  • It is stable.
  • Allows duplicates.
  • Remove and TryRemove remove only the first occurence (if they find it).
  • Enumeration is ordered.

Heaps

Not writing everything here now, but:

  • In System.Collections.Specialized.
  • It would not be stable (and as such faster and more efficient in terms of memory).
  • Handle support, proper updating and removing

    • done quickly in O(log n) instead of O(n)

    • done correctly

  • Enumeration is not ordered (faster).
  • Allows duplicates.

Agree with the IQueue proposal. I was going to pitch the same thing today, it feels like the right level of abstraction to have at the interface level. "An interface to a data structure that can have items added to it and ordered items removed from it."

  • The spec for IQueue looks good.

  • Consider adding "int Count { get; }" to IQueue so that it's clear Count is desired regardless of whether or not we inherit from IReadOnlyCollection.

  • On the fence regarding TryPeek, TryDequeue in IQueue considering they aren't in Queue, but those helpers should probably be added to Queue and Stack as well.

  • IsEmpty seems like an outlier; not many other collection types in the BCL have it. To add it to the interface we'd have to assume it's going to be added to Queue, and it seems a bit strange adding it to Queue and nothing else. Recommending we drop it from the interface, and perhaps the class as well.

  • Drop TryRemove and change Remove to "bool Remove". Keeping in alignment with the other collection classes will be important here - developers will have a lot of muscle-memory that says "remove() in a collection doesn't throw". It's an area many developers won't test well and will cause a lot of surprises if the normal behavior is changed.

From your earlier quote @pgolebiowski

  1. User data is separate from the priority (two physical instances).
  2. User data contains the priority.
  3. User data is the priority.
  4. Rare case: priority is obtainable via some other logic (resides in an object different from the user data).

Also recommend considering 5. User data contains the priority in multiple fields (as we saw in Lucene.net)

On the fence regarding TryPeek, TryDequeue in IQueue considering they aren't in Queue

They are System/Collections/Generic/Queue.cs#L253-L295

On the other hand Queue doesn't have
c# public void Remove(T element); public bool TryRemove(T element);

Consider adding "int Count { get; }" to IQueue so that it's clear Count is desired regardless of whether or not we inherit from IReadOnlyCollection.

OK. Will modify it.

IsEmpty seems like an outlier; not many other collection types in the BCL have it.

This one was added by @karelz, I just copied it. I like it though, might be considered in the API review :)

Drop TryRemove and change Remove to "bool Remove".

I think Remove and TryRemove is consistent with other methods like that (Peek and TryPeek or Dequeue and TryDequeue).

  1. User data contains the priority in multiple fields

It's a valid point, but this in fact can also be handled with a selector (it's any function after all) -- but that's for heaps.

IsEmpty seems like an outlier; not many other collection types in the BCL have it.

FWIW, bool IsEmpty { get; } is something I wish we'd added to IProducerConsumerCollection<T>, and I've regretted it not being there on multiple occasions since. Without it, wrappers often need to do the equivalent of Count == 0, which for some collections is significantly less efficient to implement, in particular on most of the concurrent collections.

@pgolebiowski How do you feel about the idea of creating a github repository to host the current API contracts and an .md file or two containing the design rationale. Once that stabalizes, it can be used as a place to build out the initial implementation before doing a PR up to CoreFxLabs when ready?

@svick

If Peek and Dequeue used out parameters for priority, then they could also have overloads that don't return priority at all, which I think would simplify common usage

Agreed. Let's add them.

Should there be some way to enumerate just the items in the priority queue, ignoring priorities?

Good question. What would you propose? IEnumerable<TElement> Elements { get; }?

Stability - I think this would mean that internally, priority would have to be something like a pair of (priority, version)

I think we can avoid that by considering Update as logical Remove+Enqueue. We would add items always to the end of same-priority items (basically consider comparer result 0 as -1). IMO that should work.


@benaadams

The TryX methods weren't a common pattern in the original BCL; until the Concurrent types (though think Dictionary.TryGetValue in 2.0 may have been first example?)
e.g. the TryX methods have only just been added to Core for Queue and Stack and aren't part of Framework yet.

I admit I am still new-ish to BCL. From API review meetings and from the fact we added bunch of Try* methods recently I was under the impression it is a common pattern for much longer 😉.
Either way, it is common pattern now and we should not be afraid to use it. The fact the pattern is not yet in .NET Framework should not stop us innovate in .NET Core -- that is its main purpose, to innovate faster.


@pgolebiowski

Alternatively, we can remove this functionality from the priority queue entirely and add it in the heap instead. This has numerous benefits

Hmm, something tells me you might have an agenda here 😆
Now seriously, it is actually good direction we were aiming for all the time - PriorityQueue was never meant to be a reason for NOT doing Heap. If we are all ok with the fact that Heap might not make it into CoreFX and might stay "just" in CoreFXExtensions repo as advanced data structure alongside PowerCollections, I am fine with that,

Important note: if we want such an approach, we need to design priority queue and heap together. Because their purposes would be different and one solution would provide what the other one would not.

I don't see why we need to do it together. IMO we can focus on PriorityQueue and add "proper" Heap in parallel/later. I don't mind if someone does them together, but I don't see any strong reason why existence of easy-to-use PriorityQueue should affect design of "proper/high-perf" advanced Heap family.

IQueue

Thanks for the write up. Given your points, I don't think we should go out of our way to add IQueue. IMO it is nice-to-have. If we had a selector, then it would be natural. However, I don't like the selector approach as it brings weirdness and complication in describing when the selector is called by the PriorityQueue (only upon Enqueue and Update.

Alternative (object) handle

That is actually not a bad idea (although a bit ugly) to have such overloads IMO. We would have to be able to detect that the handle is from wrong PriorityQueue, which is O(log n).
I have a feeling that API reviewers will reject it, but IMO it is worth a shot to experiment with it ...

Stability

I don't think stability comes with any perf/memory overhead (assuming we dropped Update already or that we treat Update as logical Remove+Enqueue, so we basically reset element's age). Just treat comparer result 0 as -1 and everything is good ... Or am I missing something?

Selector and IQueue<T>

It might be a good idea to have 2 proposals (and we may potentially take both):

  • PriorityQueue<T,U> without selector and without IQueue (which would be cumbersome)
  • PriorityQueue<T> with selector and IQueue

Quite a few folks hinted that above.

Re: Latest proposal from @pgolebiowski - https://github.com/dotnet/corefx/issues/574#issuecomment-308427321

IQueue<T> - we could add Remove and TryRemove, but Queue<T> doesn't have them.

Would they make sense to be added to Queue<T>? (@ebickle agrees) If yes, we should bundle the addition as well.
Let's add it in the interface and call out they are questionable / need Queue<T> addition as well.
Same for IsEmpty -- whatever needs Queue<T> and Stack<T> additions, let's mark it so in the interface (it will be easier to review & digest).
@pgolebiowski can you please add comment to IQueue<T> with list of classes we think it will be implemented by?

PriorityQueue<T>

Let's add struct enumerator (I think it is common BCL pattern lately). It was called out couple of times on the thread, then dropped/forgotten.

Heaps

Namespace choice: Let's not waste time on namespace decision yet. If Heap ends up in CoreFxExtensions, we do not know yet what kind of namespaces we will allow there. Maybe Community.* or something like that. It depends on CoreFxExtensions purpose/operation mode discussions result.
Note: One idea for CoreFxExtensions repo is to give proven community members Write permissions, and let it be driven primarily by community with .NET team providing just advice (incl. API review expertise) & .NET team/MS being the arbiter when needed. If that's where we land, we would likely not want it in System.* or Microsoft.* namespace. (Disclaimer: early thinking, please don't jump to conclusions yet, there are other alternative governance ideas in flight)

Drop TryRemove and change Remove to bool Remove. Keeping in alignment with the other collection classes will be important here - developers will have a lot of muscle-memory that says "remove() in a collection doesn't throw". It's an area many developers won't test well and will cause a lot of surprises if the normal behavior is changed.

👍 We should definitely at least consider aligning it with other collections. Can someone scan other collections what is their Remove pattern?

@ebickle How do you feel about the idea of creating a github repository to host the current API contracts and an .md file or two containing the design rationale.

Let's host it directly in CoreFxLabs. 👍

@karelz

Should there be some way to enumerate just the items in the priority queue, ignoring priorities?

Good question. What would you propose? IEnumerable<TElement> Elements { get; }?

Yeah, either that or a property returning some kind of ElementsCollection is probably the best option. Especially since it's similar to Dictionary<K, V>.Values.

We would add items always to the end of same-priority items (basically consider comparer result 0 as -1). IMO that should work.

That's not how heaps work, there is no "end of same-priority items". Items with the same priority can be spread all over the heap (unless they're close to being the current minimum).

Or, put another way, to maintain stability, you have to consider comparer result of 0 to be -1 not just when inserting, but also when items are moved around in the heap afterwards. And I think you have to store something like version along with the priority to do that properly.

IQueue - we could add Remove and TryRemove, but Queue doesn't have them.

Would they make sense to be added to Queue? (@ebickle agrees) If yes, we should bundle the addition as well.

I don't think Remove should be added to IQueue<T>; and I'd even suggest Contains is dodgy; it limits the usefulness of the interface and what types of Queue it can be used for unless you start also throwing NotSupportedExceptions. i.e. what is the scope?

Is it just for vanilla queues, message queues, distributed queues, ServiceBus queues, Azure Storage Queues, ServiceFabric Reliable​Queues etc... (glossing over some of them preferring async methods)

That's not how heaps work, there is no "end of same-priority items". Items with the same priority can be spread all over the heap (unless they're close to being the current minimum).

Fair point. I was thinking about it as binary search tree. Gotta brush my d.s. basics I guess :)
Well, either we implement it with different d.s. (array, binary search tree (or whatever the official name is) - @stephentoub has ideas/suggestions), or we settle for random order. I don't think maintaining version or age field is worth the guarantee of stability.

@ebickle How do you feel about the idea of creating a github repository to host the current API contracts and an .md file or two containing the design rationale.

@karelz Let's host it directly in CoreFxLabs.

Please have a look at this document.

It might be a good idea to have 2 proposals (and we may potentially take both):

  • PriorityQueue without selector and without IQueue (which would be cumbersome)
  • PriorityQueue with selector and IQueue

I got rid of the selector altogether. It is necessary only if we want to have one unified PriorityQueue<T, U>. If we have two classes -- well, a comparer is enough.


Please let me know if you find anything worth adding / changing / removing.

@pgolebiowski

Document looks good. Starting to feel like a solid solution I'd expect to see in the .NET core libraries :)

  • Should add a nested Enumerator class. Not sure if the situation has changed, but years ago having the child structure avoided a garbage collector allocation caused by boxing the result of GetEnumerator(). See https://github.com/dotnet/coreclr/issues/1579 for example.
    public class PriorityQueue<T> // ... { public struct Enumerator : IEnumerator<T> { public T Current { get; } object IEnumerator.Current { get; } public bool MoveNext(); public void Reset(); public void Dispose(); } }

  • PriorityQueue should have a nested Enumerator struct too.

  • I'm still voting to have "public bool Remove(T element)" without the TryRemove since it's a long-standing pattern and changing it makes developer mistakes highly likely. We can let the API Review crew chime in, but it's an open question in my mind.

  • Is there any value in specifying the initial capacity in the constructor or having a TrimExcess function, or is that a micro-optimization right now - especially considering the IEnumerable constructor parameters?

@pgolebiowski thanks for putting it into a document. Can you please submit your doc as PR against CoreFxLab master branch? Tag @ianhays @safern they will be able to merge the PRs.
We can then use CoreFxLab issues for further discussions about particular design points - it should be easier than this mega-issue (I just kicked off email to create area-PriorityQueue there).

Please put a link to the CoreFxLab pull request here?

  • @ebickle @jnm2 -- added
  • @karelz @SamuelEnglard -- referenced

If the API is approved in its current form (two classes), I would create another PR to CoreFXLab with an implementation for both using a quaternary heap (see implementation). The PriorityQueue<T> would just use one array beneath, while PriorityQueue<TElement, TPriority> would use two -- and rearrange their elements together. This could spare us additional allocations and be as efficient as we can. I'll add that once there is green light.

Is there any plan of making thread-safe version like ConcurrentQueue?

I'd :heart: to see a concurrent version of this, implementing IProducerConsumerCollection<T>, so it could be used with BlockingCollection<T> etc.

@aobatact @khellang It sounds like a completely different thread:

I agree it is very valuable though :wink:!

public bool IsEmpty();

Why is this a method? Every other collection on the framework that has IsEmpty has it as a property.

I updated the proposal in top-post with IsEmpty { get; }.
I have also added link to the latest proposal doc in corefxlab repo.

Hi all,
I think there are a lot of voices arguing it would be good to support update if possible. I think nobody ended up doubting it is a useful feature for graph searches.

But there were some objections raised here and there. So I can check my understanding it seems like the main objections are:
-it is not clear how update should work when there are duplicate elements.
-there is some argument about whether duplicate elements are even desirable to support in a priority queue
-there is some concern that it might be inefficient to have an additional lookup data structure for finding elements in the backing data structure just to update their priority. Especially for scenarios where update is never performed! And/or affect worst-case performance guarantees...

Anything I missed?

OK I guess one more was - it's been claimed that update/remove semantically meaning 'updateFirst' or 'removeFirst' might be weird. :)

In which version of .Net Core that we can begin to use PriorityQueue?

@memoryfraction .NET Core itself does not have a PriorityQueue yet (there are proposals - but we have not had time to spend on it recently). There is no reason why it has to be in the Microsoft .NET Core distribution however. Anyone in the community can put one on NuGet. Perhaps someone on this issue can suggest one.

@memoryfraction .NET Core itself does not have a PriorityQueue yet (there are proposals - but we have not had time to spend on it recently). There is no reason why it has to be in the Microsoft .NET Core distribution however. Anyone in the community can put one on NuGet. Perhaps someone on this issue can suggest one.

Thanks for your reply and suggestion.
When I use C# to practice leetcode, and need to use SortedSet to handle set with duplicate elements. I have to wrap the element with a unique ID to solve the issue.
So, I prefer to have the Priority Queue in .NET Core in the future because it will be more convenient.

What's the current state of this issue? I've just found myself recently requiring a PriorityQueue

This proposal does not enable collection initialization by default. PriorityQueue<T> does not have an Add method. I think collection initialization is a big enough language feature to warrant the addition of a duplicate method.

If someone has a fast binary heap they'd like to share, I'd like to compare it with what I wrote.
I fully implemented the API as proposed. However, rather than a heap, it uses a simple sorted array. I keep the lowest value item at the end so it's sorted in the reverse of the usual order. When the number of items is less than 32 I use a simple linear search to find where to insert new values. After that, I used a binary search. Using random data I found this approach to be faster than a binary heap in the simple case of filling the queue and then draining it.
If I have the time, I'll put it in a public git repo so people can critique it and update this comment with the location.

@SunnyWar I'm currently using: https://github.com/BlueRaja/High-Speed-Priority-Queue-for-C-Sharp
which has good performance. I think it would make sense to test your implementation against the GenericPriorityQueue there

Please disregard my earlier post. I found an error in the test. Once fixed the binary heap does perform the best.

@karelz Where is this proposal currently at on ICollection<T>? I didn't get why that was not going to be supported, even though the collection is obviously not intended to be readonly?

@karelz re the open question of dequeue order: I argue in favor of lowest priority value is removed first. It is convenient for shortest-path algorithms and is natural for timer queue implementations too. And those are the main two scenarios I think it'll be used for. I'm not really aware of what the opposing argument would be, unless its a linguistic one related to 'high' priorities, and 'high' numeric values.

@karelz re the enumeration order... how about having a Sort() method on the type which will put the collection's internal data heap into sorted order in-place (in O (n log n)). And calling Enumerate() after Sort() enumerates the collection in O(n). And if Sort() is NOT called, it returns elements in non-sorted order in O(n)?

Triage: moving to future as there doesn't seem to be consensus on whether we should implement this.

That’s very disappointing to hear. I still have to use thirdparty solutions for this.

What are the primary reasons you don't like using a third party solution?

heap data structure is a MUST for doing leetcode
more leetcode, more c# code interview which means more c# developers.
more developers means better ecosystem.
better ecosystem means that if we can still program in c# tomorrow.

in sum: this is not only a feature, but also the future. that is why is issue is labeled 'future'.

What are the primary reasons you don't like using a third party solution?

Because when there isn't a standard, everyone invents their own, each with its own set of quirks.

There have been plenty of times when I've wished for a System.Collections.Generic.PriorityQueue<T> because it's relevant to something I'm working on. This proposal's been around for 5 years now. Why has it still not happened?

Why has it still not happened?

Priority starvation, no pun intended. Designing collections is work because if we put them in the BCL we have to think about how they exchange, what interfaces they implement, what the semantics are etc. None of this is rocket science, but it takes a while. Furthermore, you need to a set of concrete user cases and customers to judge the design against, which gets harder as the collections get more and more specialized. So far, there has always been other work that was considered more important than that.

Let's look at a recent example: immutable collections. They were designed by someone not working on the BCL team for a use case in VS that was around immutability. We worked with him to get the APIs "BCL-ified". And when Roslyn came online, we replaced their copies with ours in as many places as we could and we tweaked the design (and implementation) a lot based on their feedback. Without a "hero" scenario this is hard.

Because when there isn't a standard, everyone invents their own, each with its own set of quirks.

@masonwheeler is this what you have seen for PriorityQueue<T>? That there are several 3rd party options that are not interchangeable and no clearly accepted "best library for most"? (I have not done the research so I don't know the answer)

@eiriktsarpalis How is there not consensus on whether to implement?

@terrajobst Do you really need a hero scenario when this is a well-known pattern with well-known applications? Very little innovation should be required here. There's even a pretty fully-developed interface spec already. In my opinion the real reason this utility doesn't exist is not that its too hard, nor there's no demand nor use case for it. The real reason is that for any real software project its just easier to just build one yourself than to try to wage a political campaign for this to be put in the framework libraries.

What are the primary reasons you don't like using a third party solution?

@terrajobst
I’ve personally made the experience that 3rdparty solutions don’t always perform as expected/ don’t use current language features. With a standardized version i (as a user) could be quite certain that performance is the best you can get.

@danmosemsft

@masonwheeler is this what you have seen for PriorityQueue<T>? That there are several 3rd party options that are not interchangeable and no clearly accepted "best library for most"? (I have not done the research so I don't know the answer)

Yes. Just google "C# priority queue"; the first page is full of:

  1. Priority queue implementations on Github and other hosting sites
  2. People asking why in the world there's no official priority queue in Collections.Generic
  3. Tutorials on how to build your own priority queue implementation

@terrajobst Do you really need a hero scenario when this is a well-known pattern with well-known applications? Very little innovation should be required here.

In my experience yes. The devil's in the details and once we shipped an API we can't really make breaking changes. And there are many implementation choices that are user visible. We can preview the API as an OOB for a while but we've also learned that while we can certainly gather feedback the lack of a hero scenario means you don't have any tie breakers, which often results in a situation where when the hero scenario arrives, the data structure isn't meeting its requirements.

Apparently we need a hero. 😛

@masonwheeler I assumed your link would be to this😄 Now it's in my head.

Although as @terrajobst says, our primary issue here has been resources/attention (and you can blame us for that if you like) we would also like to strengthen the non-Microsoft ecosystem so that it's more likely you can find strong libraries for whatever scenario.

[edited for clarity]

@danmosemsft Nah, if I was going to pick that song I'd do the Shrek 2 version.

Hero app candidate #1: how about using one in TimerQueue.Portable?

Already been considered, prototyped, and discarded. It makes the very common case of a timer being quickly created and destroyed (e.g. for a timeout) less efficient.

@stephentoub I assume you mean that its less efficient for some scenarios where there's a small number of timers. But how does it scale?

I assume you mean that its less efficient for some scenarios where there's a small number of timers. But how does it scale?

No, I meant the common case is that you have lots of timers at any moment, but very few are firing. This is what results when timers are used for timeouts. And the important thing there is the speed at which you can add and remove from the data structure... you want that to be 0(1) and with very low overhead. If that becomes O(log N), it's a problem.

Having priority queue will definitely make C# more interview friendly.

Having priority queue will definitely make C# more interview friendly.

Yes, that's true.
I'm looking for it for the same reason.

@stephentoub For the timeouts that never actually happen, that makes perfect sense to me. But I wonder what happens to the system when suddenly a lot of timeouts do start happening, because suddenly there is packet loss or an unresponsive server or whatever? Also do recurring timers ala System.Timer use the same implementation? There, the timeout expiring would be the 'happy path'.

But I wonder what happens to the system when suddenly a lot of timeouts do start happening

Try it. :)

We saw a good deal of real workloads suffering with the previous implementation. In every case we've looked at, the new one (which doesn't use a priority queue but instead just has a simple split between soon and not-soon timers) has fixed the issue, and we've not seen new ones with it.

Also do recurring timers ala System.Timer use the same implementation?

Yes. But they're generally spread out, often fairly evenly, in real-world applications.

5 years later, there is still no PriorityQueue.

5 years later, there is still no PriorityQueue.

Must not be a high enough priority...

@stephentoub but maybe actually @eiriktsarpalis do we actually have any traction on this right now? If we have a finalized API design I would be willing to work on it

I haven't seen a declaration this is settled yet and I'm not sure if there can be a final api design without a designated killer app. But...
assuming the top killer app candidate is programming contests / teaching / interviews, I think Eric's design at top looks serviceable enough... and I still have my counterproposal sitting around (newly revised, still not withdrawn!)

https://github.com/TimLovellSmith/corefxlab/blob/priorityQueueSpecSolved/docs/specs/priority-queue.md

The main differences I'm noticing between them are whether it should try to talk like a dictionary, and whether selectors should be a thing.

I'm starting to forget exactly why I wanted it to be a dictionary. Having an indexed assigner for updating priorities, and being able to enumerate keys with their priorities, seemed like goodness and very dictionary-like to me. Being able to visualize it as a dictionary in the debugger could be sweet too.

I think selectors actually kind of drastically change the expected class interface, FWIW. Selector should be something baked in at construction time, and if you have it, it removes the need for anyone to ever pass anyone else a priority value -- they should just call the selector if they want to know the priority. So you wouldn't want to have priority parameters in any method signatures at all except the selector's pretty much. So in that case it becomes a sort of more specialized 'PrioritySet' class. [For which impure selectors are a possible bug problem!]

@TimLovellSmith I understand the desire for this behavior, but because this has been a 5 year ask, wouldn't it be reasonable to just implement a array based binary heap with a selector and sharing the same API surface area as Queue.cs? I think that is by far what most users need. In my opinion this data structure hasn't been implemented because we can't agree on an API design, but if we are making a PriorityQueue I do believe we should just emulate the api design of Queue.

@Jlalond Hence the proposal here, huh? I have no objection to heap based array implementation. I now recall the biggest objection I had versus selectors, is that its not easy enough to do priority updates right. A plain old queue doesn't have a priority update operation, but updating element priority is an important operations for many priority queue algorithms.
People should be forever thankful if we go with an API which doesn't require them to debug corrupted priority queue structures.

@TimLovellSmith Sorry Tim, I should've clarified the inheritance from IDictionary.

Do we have a use case where the priority would change?

I like your implementation, but I think we can reduce the replication of IDictionary Behavior. I think we shouldn't inherit from IDictionary<> but only ICollection, because I don't think the dictionary access patterns are intuitive,

However I do really think returning T and it's associated priority would make sense, but I don't know a use case where I would need to know the priority of an element within the data structure when enqueueing or dequeueing it.

@Jlalond
If we're have a priority queue, then it should support all operations that it simply and efficiently can, _and_ that are
'expected' to be there by people familiar with this kind of data structure / abstraction of one.

Updating priority belongs in the API because it falls into both those categories. Updating priority is important enough consideration in many algorithms that the complexity of doing the operation with heap data structures affects the complexity of the overall algorithm, and it is regularly measured, see:
https://en.wikipedia.org/wiki/Priority_queue#Specialized_heaps

TBH mainly its decrease_key which is interesting for algorithms and efficiency, and measured, so I'm personally fine for the API only has DecreasePriority(), and doesn't actually have UpdatePriority.
But for convenience it seems nicer not to bother users of the API with worrying about whether every change is an increase or decrease. So that joe developer has 'update' right there out of the box without having to wonder why it only supports decrease and not increase, (and which to use when), I think its best to have a general update priority API, but document that when using it for decrease it has cost X, when using it for increase it has cost Y because it is implemented the same as Remove+Add.

RE dictionary I've agreed. Removed it from my proposal in the name of simpler is better.
https://github.com/TimLovellSmith/corefxlab/blob/priorityQueueSpecSolved/docs/specs/priority-queue.md

I don't know a use case where I would need to know the priority of an element within the data structure when enqueueing or dequeueing it.

I'm not sure what this comment is about. You don't need to know the priority of an element in order to dequeue it. But you do need to know its priority when you enqueue it. You do maybe want to learn the final priority of an element when you dequeue it, as that value can have meaning e.g. distance.

@TimLovellSmith

I'm not sure what this comment is about. You don't need to know the priority of an element in order to dequeue it. But you do need to know its priority when you enqueue it. You do maybe want to learn the final priority of an element when you dequeue it, as that value can have meaning e.g. distance.

Sorry, I misunderstood what TPriorty meant in your key value pair.

Okay, last two questions, why they KeyValuePair with TPriority, and the best time I can see for a reprioritization is N log N, I agree it is valuable, but also what would that API look like?

the best time I can see for a reprioritization is N log N,

That's best time for reprioritization of N items as opposed to one item, right?
I'll clarify the doc: "UpdatePriority | O(log n)" should read "UpdatePriority (single item) | O(log n)".

@TimLovellSmith Makes sense, but wouldn't any update of priority actually ne O2 * (log n), as we'd need to remove the element and then reinsert it? Basing my assumption off this being a binary heap

@TimLovellSmith Makes sense, but wouldn't any update of priority actually ne O2 * (log n), as we'd need to remove the element and then reinsert it? Basing my assumption off this being a binary heap

Constant factors like that 2 are generally ignored in complexity analysis because they become mostly irrelevant as the size of N grows.

Understood, mostly wanted to know if Tim had any exciting ideas to only do
one operation :)

On Tue, Aug 18, 2020, 23:51 masonwheeler notifications@github.com wrote:

@TimLovellSmith https://github.com/TimLovellSmith Makes sense, but
wouldn't any update of priority actually ne O2 * (log n), as we'd need to
remove the element and then reinsert it? Basing my assumption off this
being a binary heap

Constant factors like that 2 are generally ignored in complexity analysis
because they become mostly irrelevant as the size of N grows.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dotnet/runtime/issues/14032#issuecomment-675887763,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AF76XTI3YK4LRUVTOIQMVHLSBNZARANCNFSM4LTSQI6Q
.

@Jlalond
No novel ideas, I was just ignoring an constant factor, but decreases are also different from increases

If the priority update is a decrease, you don't have to remove it and re-add it, it just bubbles up, so its 1 * O(log n) = O(log n).
If the priority update is an increase, you probably have to remove it and re-add it, so its 2 * O(log n) = still O (log n).

Someone else might have invented a better datastructure/algorithm for increase priority than remove + readd, but I haven't tried to find it, O(log n) seems good enough.

KeyValuePair crept into the interface while it was a dictionary, but survived the removal of dictionary, mainly so that its possible to iterate the collection of _elements with their priorities_. And also so that you can dequeue elements with their priorities. But maybe for simplicity again, dequeue with priorities should only be in the 'advanced' version of Dequeue API. (TryDequeue :D)

I'll make that change.

@TimLovellSmith Cool, I look forward to your revised proposal. Can we submit it for review so that I can start work on this? Additionally, I know there was some impetus for a pairing queue to improve merge time, but I still think an array based binary heap would be the most overall performance. Thoughts?

@Jlalond
I've made the minor changes to simplify that Peek + Dequeue api.
Some ICollection apis related to KeyValuePair still remain, because I don't see anything obviously better to replace them with.
Same PR link. Is there another way I should submit it for review?

https://github.com/TimLovellSmith/corefxlab/blob/priorityQueueSpecSolved/docs/specs/priority-queue.md

I don't mind which implementation it uses as long as its about as fast as array based heap or better.

@TimLovellSmith I actually know nothing about API review, but I imagine @danmosemsft can point us in the right direction.

My first question is what would be? I assume it would be an int, or a floating point number, but for me declaring the number that is "internal" to the queue seems odd to me

And I actually prefer binary heaps, but wanted to make sure we're all okay going in that direction

@eiriktsarpalis @layomia are the area owners and would have to shepherd this through API review. I haven't been following the discussion, have we reached a point of consensus?

As we've discussed in the past the core libraries are not the best place for all collections, especially those that are opinionated and/or niche. Eg., we ship annually - we cannot move as quickly as a library can. We have a goal to build a community around a collections pack to fill that gap. But 84 thumbs up on this one suggests there's a widespread need for this in the core libraries.

@Jlalond Sorry, I didn't quite get what the first question is supposed to be there. Were you asking why TPriority is generic? Or has to be a number? I doubt many folks will be using non-numeric priority types. But they might prefer to use a byte, int, long, float, double, or enum.

@TimLovellSmith Yep, just wanted to know if it was generic

@danmosemsft Thanks Dan. The number of unaddressed objections/comments I see to my proposal[1] is roughly zero, and it addresses any important questions that were left open by @ebickle's proposal at top (which it is becoming increasingly similar to).

So I claim it is passing sanity checks so far. There must be some review required still, we can talk about whether its useful to inherit IReadOnlyCollection (sounds not very useful but I should defer to the experts) and so on -- I guess that is what the api review process is for! @eiriktsarpalis @layomia may I ask you to give this a look?

[1] https://github.com/TimLovellSmith/corefxlab/blob/priorityQueueSpecSolved/docs/specs/priority-queue.md

[PS - based on thread sentiment, the current proposed killer app is coding competitions, interview questions, etc, where you just want nice simple API to use that works either classes or structs plus your favorite numeric type of the day, with the right O(n) and not too slow an implementation - and I'm happy to be guinea pig to apply it to a few problems. Sorry its not anything more exciting.]

Just speaking up because I do use priority queues all the time for 'real' projects, primarily for graph search (e.g. problem optimisation, route finding), and ordered extraction (by chunks (e.g. quad-tree nearest neighbours), scheduling (i.e. like the Timer discussion above)). I have a couple of projects I could plug an implementation into directly for sanity checking/testing, if it helps. The current proposal looks good to me (would work for all my purposes, if not an ideal fit). I have a couple of small observations:

  • TElement appears a couple of times where it should be TKey
  • My own implementations often include a bool EnqueueOrUpdateIfHigherPriority for ease of use (and in some cases efficiency), which often ends up being the only enqueueing/updating method that gets used (e.g. in graph search). Obviously it's non-essential and would add additional complexity to the API, but it is a very nice thing to have.
  • There are two copies of Enqueue (I don't mean Add): is one meant to be bool TryEnqueue?
  • "// enumerates the collection in arbitrary order, but with the least element first": I don't think that last bit is useful; I would prefer the enumerator didn't have to perform any comparisons, but I presume this will be 'free' so it won't bother me
  • The naming of EqualityComparer is a bit unexpected: I'd have expected KeyComparer to go with PriorityComparer.
  • I don't understand the notes on the complexity of CopyTo and ToArray.

@VisualMelon
Thank you very much for the review!

I like the naming suggestion of KeyComparer. The decrease priority API sounds very useful. How about we add one or both of those APIs. Since we're using in the 'least priority comes first' model, would you only use decrease? Or would you wish for increase too?

    public void EnqueueOrIncreasePriority(TKey key, TPriority priority); // doesn't throw, only updates priority of keys already in the collection if new priority is higher
    public void EnqueueOrDeccreasePriority(TKey key, TPriority priority); // doesn't throw, only updates priority of keys already in the collection new priroity is lower

The reason I thought enumerating returns the least element first was to be consistent with the mental model that Peek() returns the element at the front of the collection. And also no comparisons are necessary to implement it if its a binary heap, so it seemed like a freebie. But maybe it is less free than I think - unwarranted complexity? I'm happy to take it out for that reason.

Two Enqueues was accidental. We do have EnqueueOrUpdate, where priority always gets updated. I guess TryUpdate is the logical inverse of that, where priority should never be updated for items already in the collection. I can see it fills that logical gap but I'm not sure whether it is an API people will want in practice.

I can only imagine using the decreasing variant: it's a natural operation in route finding to want to enqueue a node or replace an existing node with one that has a lower priority; I couldn't immediately suggest a use for the opposite. A boolean return parameter can be handy should you need to perform any operation when the item is queued/not queued.

I wasn't assuming a binary heap, but if it's free then I have no issue with the first element always being the minimum, it just seemed an odd constraint.

@VisualMelon @TimLovellSmith Would one API be fine?
EnqueueOrUpdatePriority? I imagine just the ability to reposition a node within the queue is valuable for traversal algorithms even if it increases or decreases priority

@Jlalond yeah, I mention it because it can be helpful for efficiency depending on the implementation, and directly encodes the use case of only wanting to queue something if it improves on what you already have in the queue. I won't presume to say whether the additional complexity is appropriate for such a general purpose tool, but it certainly isn't necessary.

Updated to remove EnqueueOrIncreasePriority, keeping EnqueueOrUpdate and EnqueueOrDecrease. Letting them return bool when an item is newly added to the collection, like HashSet.Add().

@Jlalond I apologize for the above error (deletion of your latest comment asking when this issue will be reviewed).

I just wanted to mention that @eiriktsarpalis will be back next week and we'll discuss prioritizing this feature and reviewing the APIs then.

This is something we are considering but are not committed yet for .NET 6.

I see that this thread got revived by starting off from scratch and forking, even though we had in-depth discussions on this topic in 2017 for several months (the majority of this huge thread) and collectively produced this proposal — scroll above to see. This is also the proposal that @karelz linked in the first line of the first post for visbility:

See LATEST Proposal in corefxlab repo.

As of 2017-06-16, we were waiting for an API review that has never happened and the status was:

If the API is approved in its current form (two classes), I would create another PR to CoreFXLab with an implementation for both using a quaternary heap (see implementation). The PriorityQueue<T> would just use one array beneath, while PriorityQueue<TElement, TPriority> would use two -- and rearrange their elements together. This could spare us additional allocations and be as efficient as we can. I'll add that once there is green light.

I recommend to continue iterating on the proposal we made in 2017, starting from a formal review that has not happened yet. This will allow us to avoid forking and iterate on the back of a proposal put together after hundreds of posts exchanged by community members, also out of respect for the effort put into this by everyone involved. I am happy to discuss it if there is feedback.

@pgolebiowski Thanks for coming back to the discussion. I would like to apologize for effectively forking the proposals even further, which is not the most effective way to collaborate. It was an impulsive move on my part. (I had the impression that the discussion had completely stalled due to there being too many open questions in the proposals too far, and just needed a more opinionated proposal.)

How about if we try to move forward on this, by at least temporarily forgetting the code content of my PR, and resume discussing here the identified 'open issues'? I would like to give my opinion on all of them.

  1. Class name PriorityQueue vs. Heap <-- I think it is discussed already and consensus is that PriorityQueue is better.

  2. Introduce IHeap and constructor overload? <-- I don't foresee a lot of value. I think for 95% of the world there is not a compelling reason (in e.g. performance delta) to have multiple heap implementations, and the other 5% will probably have to write their own.

  3. Introduce IPriorityQueue? <-- I also don't think its needed. I am assuming other languages and frameworks get along just fine without these interfaces in their standard library. Let me know if that is incorrect.

  4. Use selector (of priority stored inside the value) or not (5 APIs difference) <-- I don't see a strong argument in favor of supporting selectors in the priority queue. I feel that IComparer<> already serves as a really good minimal extensibility mechanism for comparing item priorities, and selectors doesn't offer any significant improvement. Also people are likely to be confused about how to use selectors with mutable priorities (or how NOT to use them).

  5. Use tuples (TElement element, TPriority priority) vs. KeyValuePair <-- Personally I think KeyValuePair is the preferred option for a priority queue where items have updatable priorities, since the items are being treated as keys in a set/dictionary.

  6. Should Peek and Dequeue rather have out argument instead of tuple? <-- I'm not sure of all the pros and cons, especially performance. Should we just pick the one which performs better in a simple benchmark test?

  7. Is Peek and Dequeue throwing useful at all? <-- Yes... It is what should happen for algorithms which assume incorrectly that there are still items in the queue. If you don't want algorithms to ever assume that, the best thing to do is not provide Peek and Dequeue apis at all, but just provide TryPeek and TryDequeue. [I would guess that there are algorithms which can provably safely call Peek or Dequeue in certain cases, and having Peek/Dequeue is a slight performance+usability improvement for those.]

Aside from that I also have some suggestions to consider adding to the proposal under consideration:

  1. PriorityQueue should only work with unique items, which are their own handle. It should support custom IEqualityComparer, in case it wants to compare non-extensible objects (like strings) in specific ways for doing priority updates.

  2. PriorityQueueshould support a variety of operations like 'remove', 'decrease priority if smaller', and especially 'enqueue item or decrease priority if smaller operation', all in O(log n) - for efficiently and simply implementing graph search scenarios.

  3. It would be convenient for lazy programmers if PriorityQueue provides index operator[] for get/set priority of existing items. It should be O(1) for get, O(log n) for set.

  4. Smallest priority first is best. Because priority queues are used for a lot are optimization problems, with 'minimal cost'.

  5. Enumeration should not provide any ordering guarantees.

Open issue - handles:
PriorityQueue items and priorities seems to be intended to allow working with duplicate values. Those values either need to be considered as 'immutable', or there must be a method to call to notify the collection when item priorities changed, possibly with handles, so it can be specific about priority of which duplicates changed (what scenario needs to do this??)... I have doubts about all useful this is, and whether this is a good idea, but if we have update priorities in such a collection it might be nice if costs incurred by that method can be incurred lazily, so that when you're just working immutable items, and don't want to use there's no extra cost... how to resolve? Can it be by having overloaded methods that use handles and ones which do not? Or should we just not have any item handles that are distinct from items themselves (used as hash key to lookup)?

A thought to help move this faster, it may make sense to move this type to the "Community Collections" library being discussed in https://github.com/dotnet/runtime/discussions/42205

I agree that smallest priority first is best. A priority queue is also useful in the development of turn-based gaming, and being able to have the priority be a monotonically-increasing counter of when each element gets its next turn is very helpful.

We are looking at bringing this is in for API review ASAP (though we are not yet committed at delivering an implementation within the .NET 6 timeframe).

But before we send this for review, we will need to iron out some of the high-level open questions. I think @TimLovellSmith's write up is a good starting point towards reaching that goal.

A few remarks on those points:

  • Consensus on questions 1-3 is long established, I think we could treat those as resolved.

  • _Use selector (of priority stored inside the value) or not_ -- Agreed with your remarks. Another issue with that design is that selectors are optional, which means you need to be careful not to call the wrong Enqueue overload (or risk getting an InvalidOperationException).

  • I favor using a tuple over KeyValuePair. Using a .Key property to access a TPriority value feels strange to me. A tuple lets you use .Priority which has better self-documentation properties.

  • _Should Peek and Dequeue rather have out argument instead of tuple?_ -- I think we should just follow established convention in similar methods elsewhere. So probably use an out argument.

  • _Is Peek and Dequeue throwing useful at all?_ -- Agreed 100% with your comments.

  • _Smallest priority first is best_ -- Agreed.

  • _Enumeration should not provide any ordering guarantees_ -- Might this not violate user expectations? What are the trade-offs? NB we can probably defer details like this for API review discussion.

I would like to rephrase a few other open questions as well:

  • Do we ship PriorityQueue<T>, PriorityQueue<TElement, TPriority> or both? -- Personally I think we should only implement the latter, since it seems to provide a cleaner and more general-purpose solution. In principle priority should not be a property intrinsic to the queued element, so we should not force users to encapsulate it in wrapper types.

  • Do we require element uniqueness, up to equality? -- Correct me if I'm wrong, but I feel uniqueness is an artificial constraint, forced on us by the requirement to support update scenaria without resorting to a handle approach. It also complicates the API surface, since we now also need to worry about elements having the right equality semantics (what's the appropriate equality when elements are large DTOs?). I can see three possible paths here:

    1. Require uniqueness/equality and support update scenaria by passing the original element (up to equality).
    2. Do not require uniqueness/equality and support update scenaria using handles. These could be obtained using optional Enqueue method variants for users who need them. If handle allocations are a big enough concern, I'm wondering whether those could be amortized à la ValueTask?
    3. Do not require uniqueness/equality and do not support updating priorities.
  • Do we support merging queues? -- Consensus seems to be no, since we're only shipping one implementation (presumably using array heaps) where merging is not efficient.

  • What interfaces should it implement? -- I saw a few proposals recommending IQueue<T>, but this feels like a leaky abstraction. I would personally prefer to keep it simple and just implement ICollection<T>.

cc @layomia @safern

@eiriktsarpalis On the contrary - there's nothing artificial about the constraint to support updates at all!

Algorithms with priority queues often update priorities of elements. The question is, do you provide an object-oriented API, which 'just works' for updating priorities of ordinary objects... or do you force people to
a) understand the handle model
b) keep additional data in memory, such as extra properties or an external dictionary, to keep track of the handles of objects, on their side, just so they can do updates (must go with the dictionary for classes they can't change? or upgrade the objects into tuples etc)
c) 'memory manage', or 'garbage collect' external structures i.e. clean up handle for items which are no longer in the queue, when using the dictionary approach
d) not confuse opaque handles in contexts with multiple queues since they are only meaningful in context of a single priority queue

Plus there is this philosophical question around the whole reason anyone would want a queue that _keeps track of objects by priority_ to behave this way: why should the same object (equals returns true) have two _different_ priorities? If they are really supposed to have different priorities, why aren't they modelled as different objects? (or upconverted to tuples, with distinguishers?)

Also for handles, there has to be some internal table of handles in the priority queue just so that handles will actually work. I think this is equivalent work to keeping a dictionary to look up objects in the queue.

PS: every .net object already supports concepts of equality/uniqueness, so its not a big ask to 'require' it.

Regarding KeyValuePair, I'd agree it's not ideal (though in the proposal, Key is for the element, Value for the priority, which is consistent with the various Sorted data-types in the BCL), and it's appropriateness would depend on a decision regarding uniqueness. Personally, I would _much_ prefer a dedicated well-named struct over a tuple anywhere on the public API, especially so for input.

Regarding uniqueness, that is a fundamental concern, and I don't think anything else can be decided until that is. I would favour element uniqueness as defined by the (optional) comparator (as per both existing proposals and suggestion i) if the goal is a single easy-to-use and general purpose API. Unique vs non-unique is a big rift, and I use both types for different purposes. The former is 'harder' to implement, and covers the most (and more typical) use cases (just my experience) while being harder to mis-use. Those use cases that _require_ non-uniqueness should (IMO) be serviced by a different type (e.g. a plain old binary heap), and I would appreciate having both available. This is essentially what the original proposal linked by @pgolebiowski provides (as I understand it) modulo a (simple) wrapper. _Edit: Nope, that wouldn't support tied priorities_

On the contrary - there's nothing artificial about the constraint to support updates at all!

I'm sorry, I didn't mean to indicate that support for updates is artificial; rather the requirement for uniqueness is introduced artificially to support updates.

PS: every .net object already supports concepts of equality/uniqueness, so its not a big ask to 'require' it

Sure, but occasionally the equality semantics that come with the type might not be the desirable one (e.g. reference equality, full-blown structural equality, etc.). I'm just pointing out that equality is hard and forcing it into the design brings in a whole new class of potential user bugs.

@eiriktsarpalis Thanks for clarifying that. But is it really artificial? I don't think it is. Its just another natural solution.

The api has to be _well-defined_. You can't provide an update API without requiring the user to be specific about exactly what they want to update. Handles, and object equality are just two different approaches to building the well-defined API.

Handle approach: every time you add an object to the collection, you must give it a 'name', i.e. a 'handle', so that you can refer to that exact object without ambiguity later in the conversation.

Object uniqueness approach: every time you add an object to the collection, it must be a different object, or you must specify how to handle the case the object already exists.

Unfortunately, only the object approach really allows you to support some of the most useful high-level API methods property , like 'EnqueueIfNotExists' or 'EnqueueOrDecreasePriroity(item)' - they make no sense to provide in a handle-centric design, because you can't not know if the item already exists in the queue (since its your job to keep track of that, with handles).

One of the most telling criticisms of the handle approach, or dropping the uniqueness constraint to me, is that it makes all sorts of scenarios with updatable priority much more complicated to implement:

e.g.

  1. use a PriorityQueue for strings representing messages/sentiments/tags/username which get upvoted/scoreupdated, unique values have changing priority
  2. use PriorityQueue, double> to order unique tuples [whether they have priority changes or not] - must keep track of extra handles somewhere
  3. use PriroityQueue to prioritize graph indexes, or database object ids, now you have to sprinkle handles through your implementation

PS

Sure, but occasionally the equality semantics that come with the type might not be the desirable one

There shall be escape hatches, like IEqualityComparer, or upconverting to a richer type.

Thanks for the feedback 🥳 Will update the proposal over the weekend, taking into account all new input, and share a new revision for another round. ETA 2020-09-20.

Priority queue proposal (v2.0)

Summary

The .NET Core community proposes to add to the system library _priority queue_ functionality, a data structure in which each element additionally has a priority associated with it. Specifically, we propose to add PriorityQueue<TElement, TPriority> to the System.Collections.Generic namespace.

Tenets

In our design, we were guided by the following tenets (unless you know better ones):

  • Wide coverage. We want to offer .NET Core customers a valuable data structure that is versatile enough to support a wide variety of use cases.
  • Learn from known mistakes. We strive to deliver priority queue functionality that would be free from the customer-facing issues present in other frameworks and languages, e.g. Java, Python, C++, Rust. We will avoid making design choices that are known to make customers unhappy and to reduce usefulness of priority queues.
  • Extreme care with 1-way door decisions. Once an API is introduced, it cannot be modified or deleted, only extended. We will carefully analyze design choices to avoid suboptimal solutions that our customers will be stuck with forever.
  • Avoid design paralysis. We accept that there may be no perfect solution. We will balance trade-offs and move forward with the delivery, to finally deliver our customers the functionality they've been awaiting for years.

Background

From the perspective of a customer

Conceptually, a priority queue is a collection of elements, where each element has an associated priority. The most important functionality of a priority queue is that it provides efficient access to the element with the highest priority in the collection, and an option to remove that element. Expected behavior may also include: 1) ability to modify the priority of an element that is already in the collection; 2) ability to merge multiple priority queues.

Computer science background

A priority queue is an abstract data structure, i.e. it is a concept with certain behavioral characteristics, as described in the previous section. Most efficient implementations of a priority queue are based on heaps. However, contrary to the general misconception, a heap is also an abstract data structure, and can be realized in various ways, each offering different benefits and disadvantages.

Most software engineers are familiar only with the array-based binary heap implementation — it is the simplest one, but unfortunately not the most efficient one. For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap. For more information on heaps, please refer to Wikipedia and this paper.

Update mechanism is the key design challenge

Our discussions have demonstrated that the most challenging area in the design, and at the same time with the greatest impact on the API, is the update mechanism. Specifically, the challenge is to determine whether and how the product we want to offer customers should support updating priorities of elements already present in the collection.

Such capability is necessary to implement e.g. Dijkstra’s shortest path algorithm or a job scheduler that needs to handle changing priorities. The update mechanism is missing in Java, which has shown to be disappointing for engineers, e.g. in these three StackOverflow questions viewed over 32k times: example, example, example. To avoid introducing API with such limited value, we believe that a fundamental requirement for the priority queue functionality we deliver would be to support the ability of updating priorities for elements already present in the collection.

To deliver the update mechanism, we must ensure that the customer can be specific about what exactly they want to update. We’ve identified two ways of delivering this: a) via handles; and b) via enforcing uniqueness of elements in the collection. Each of these comes with different benefits and costs.

Option (a): Handles. In this approach, every time an element is added to the queue, the data structure provides its unique handle. If the customer wants to use the update mechanism, they need to keep track of such handles so that they can later unambiguously specify which element they wish to update. The primary cost of this solution is that customers need to manage these pointers. However, it does not mean that there need to be any internal allocations to support handles within the priority queue — any non-array-based heap is based on nodes, where each node is automatically its own handle. As an example, see the API of the PairingHeap.Update method.

Option (b): Uniqueness. This approach puts two additional constraints on the customer: i) elements within the priority queue must conform to certain equality semantics, which brings a new class of potential user bugs; ii) two equal elements cannot be stored in the same queue. By paying this cost, we get the benefit of supporting the update mechanism without resorting to the handle approach. However, any implementation that leverages uniqueness/equality to determine the element to update will require an extra internal mapping, so that it’s done in O(1) and not in O(n).

Recommendation

We recommend to add to the system library a PriorityQueue<TElement, TPriority> class that supports the update mechanism through handles. The underlying implementation would be a pairing heap.

public class PriorityQueueNode<TElement, TPriority>
{
    public TElement Element { get; }
    public TPriority Priority { get; }
}

public class PriorityQueue<TElement, TPriority> :
    IEnumerable<PriorityQueueNode<TElement, TPriority>>
{
    public PriorityQueue();
    public PriorityQueue(IComparer<TPriority> comparer);

    public IComparer<TPriority> Comparer { get; }
    public int Count { get; }

    public bool IsEmpty { get; }
    public void Clear();
    public bool Contains(TElement element); // O(n)
    public bool TryGetNode(TElement element, out PriorityQueueNode<TElement, TPriority> node); // O(n)

    public PriorityQueueNode<TElement, TPriority> Enqueue(TElement element, TPriority priority); //O(log n)

    public PriorityQueueNode<TElement, TPriority> Peek(); // O(1)
    public bool TryPeek(out PriorityQueueNode<TElement, TPriority> node); // O(1)

    public void Dequeue(out TElement element); // O(log n)
    public bool TryDequeue(out TElement element, out TPriority priority); // O(log n)

    public void Update(PriorityQueueNode<TElement, TPriority> node, TPriority priority); // O(log n)
    public void Remove(PriorityQueueNode<TElement, TPriority> node); // O(log n)

    public void Merge(PriorityQueue<TElement, TPriority> other) // O(1)

    public IEnumerator<PriorityQueueNode<TElement, TPriority>> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();

    public struct Enumerator : IEnumerator<PriorityQueueNode<TElement, TPriority>>
    {
        public PriorityQueueNode<TElement, TPriority> Current { get; }
        object IEnumerator.Current { get; }
        public bool MoveNext() => throw new NotImplementedException();
        public void Reset() => throw new NotImplementedException();
        public void Dispose() => throw new NotImplementedException();
    }
}

Example usage

1) Customer who doesn't care about the update mechanism

var queue = new PriorityQueue<Job, double>();
queue.Enqueue(firstJob, 10);
queue.Enqueue(secondJob, 5);
queue.Enqueue(thirdJob, 40);

var withHighestPriority = queue.Peek(); // { element: secondJob, priority: 5 }

2) Customer who cares about the update mechanism

var queue = new PriorityQueue<Job, double>();
var mapping = new Dictionary<Job, PriorityQueueNode<Job, double>>();

mapping[job] = queue.Enqueue(job, priority);

queue.Update(mapping[job], newPriority);

FAQ

In what order does the priority queue enumerate elements?

In undefined order, so that the enumeration can happen in O(n), similarly as with a HashSet. No efficient implementation would provide capabilities to enumerate a heap in linear time while ensuring that elements are enumerated in order — this would require O(n log n). Because ordering on a collection can be trivially achieved with .OrderBy(x => x.Priority) and not every customer cares about enumeration with this order, we believe it is better to provide undefined enumeration order.

Appendices

Appendix A: Other languages with priority queue functionality

| Language | Type | Notes |
|:-:|:-:|:-:|
| Java | PriorityQueue | Extends the abstract class AbstractQueue and implements the interface Queue. |
| Rust | BinaryHeap | |
| Swift | CFBinaryHeap | |
| C++ | priority_queue | |
| Python | heapq | |
| Go | heap | There is a heap interface. |

Appendix B: Discoverability

Note that when discussing data structures, the term _heap_ is used 4 times more often than _priority queue_.

  • "array" AND "data structure" — 17.400.000 results
  • "stack" AND "data structure" — 12.100.000 results
  • "queue" AND "data structure" — 3.850.000 results
  • "heap" AND "data structure" — 1.830.000 results
  • "priority queue" AND "data structure" — 430.000 results
  • "trie" AND "data structure" — 335.000 results

Please review, am happy to receive feedback and keep iterating :) I feel that we are converging! 😄 Also am happy to receive questions — will add them to the FAQ!

@pgolebiowski I don't tend to use handle-based APIs (I'd expect to be wrapping this as per example 2 were I to refit any existing code) but it mostly looks good to me. I might have a go with the implementation you mention to see how it feels, but here are some comments off the bat:

  • It seems odd that TryDequeue returns a node, because it isn't really a handle at that point (I'd prefer two separate out parameters)
  • Will it be stable in that if two elements are queued with the same priority they dequeue in a predicable order? (nice to have for reproducibility; can be implemented easily enough by the consumer otherwise)
  • The parameter for Merge should be PriorityQueue'2 not PriorityQueueNode'2, and can you clarify the behaviour? I'm not familiar with the pairing heap, but presumably the two heaps overlap thereafter
  • I'm not a fan of the name Contains for the 2-parameter method: that is not a name I would guess for a TryGet style method
  • Should the class support a custom IEqualityComparer<TElement> for the purpose of Contains?
  • There doesn't seem to be a way to efficiently determine whether a node is still in the heap (not sure when I'd use this, thought)
  • It's weird that Remove returns is a bool; I'd expect it was called TryRemove or that it threw (which I'm assuming Update does if the node isn't in the heap).

@VisualMelon thanks a lot for the feedback! Will quickly resolve these, definitely agree:

  • It seems odd that TryDequeue returns a node, because it isn't really a handle at that point (I'd prefer two separate out parameters)
  • The parameter for Merge should be PriorityQueue'2 not PriorityQueueNode'2, and can you clarify the behaviour? I'm not familiar with the pairing heap, but presumably the two heaps overlap thereafter
  • It's weird that Remove returns is a bool; I'd expect it was called TryRemove or that it threw (which I'm assuming Update does if the node isn't in the heap).
  • I'm not a fan of the name Contains for the 2-parameter method: that is not a name I would guess for a TryGet style method

Clarification for these two:

  • Will it be stable in that if two elements are queued with the same priority they dequeue in a predicable order? (nice to have for reproducibility; can be implemented easily enough by the consumer otherwise)
  • There doesn't seem to be a way to efficiently determine whether a node is still in the heap (not sure when I'd use this, thought)

For the first point, if the goal is reproducibility, then the implementation will be deterministic, yes. If the goal is _if two elements which are placed into the queue with the same priority, it follows that they will come out in the same order in which they were placed into the queue_ — I'm not sure if the implementation can be adjusted to achieve this property, a quick answer would be "probably no".

For the second point, yes, heaps are not good for checking if an element exists in the collection, a customer would have to keep track of this separately to achieve this in O(1) or reuse the mapping they use for the handles, if they use the update mechanism. Otherwise, O(n).

  • Should the class support a custom IEqualityComparer<TElement> for the purpose of Contains?

Hmm... I'm starting to think that perhaps putting Contains into responsibility of this priority queue may be too much, and using the method from Linq may suffice (Contains has to be applied on enumeration anyway).

@pgolebiowski thanks for the clarifications

For the first point, if the goal is reproducibility, then the implementation will be deterministic, yes

Not so much deterministic, as guaranteed forever (e.g. even the implementation changes, the behaviour would not), so I think the answer I was after is 'no'. That's fine: the consumer can add a sequence ID to the priority if they need this, though at that point a SortedSet would do the job.

For the second point, yes, heaps are not good for checking if an element exists in the collection, a customer would have to keep track of this separately to achieve this in O(1) or reuse the mapping they use for the handles, if they use the update mechanism. Otherwise, O(n).

Does it not require a subset of the work necessary for Remove? I may not have been clear: I meant given a PriorityQueueNode, checking if it is in the heap (not a TElement).

Hmm... I'm starting to think that perhaps putting Contains into responsibility of this priority queue may be too much

I wouldn't complain if Contains wasn't there: it's also a trap for people who don't realise they should be using handles.

@pgolebiowski You seem pretty strongly in favor of handles, am I right in thinking its for efficiency reasons?

From an efficiency point of view, I suspect handles are truly the best, for both scenarios with uniqueness and without, so I'm fine with that as being the primary solution offered by the framework.

At all:
From a usability point of view, I think duplicate elements is rarely what I want, which still leaves me wondering whether it is valuable for framework to provide support for both models. But... a unique-element-friendly "PrioritySet" class would at least be easy to add later as a wrapper for the proposed PriorityQueue, e.g. in response to continued demand for a more friendly API. (if the demand exists. As I think it might!)

For the current API proposed itself, a couple thoughts/questions:

  • how about also providing overload of TryPeek(out TElement element, out TPriority priority)?
  • if you have updatable duplicate keys, I have one worry that if 'dequeue' doesn't return the priority queue node, then how would you check are removing the right exact node from your handle tracking system? Since you could have more than one copy of element with the same priority.
  • Does Remove(PriorityQueueNode) throw if the node isn't found? or return false?
  • Should there be a TryRemove() version that doesn't throw if Remove throws?
  • I'm not sure Contains() api is useful most of the time? 'Contains' seems to be in the eye of the beholder, especially for scenarios with 'duplicate' elements with distinct priorities, or other distinct features! In which case the end user probably must do their own search anyway. But at least it can be useful for the scenario with no duplicates.

@pgolebiowski Thanks for taking the time to draft the new proposal! A few comments on my side:

  • Echoing @TimLovellSmith's comment, I'm not sure either Contains() or TryGetNode() should exist at all in the API in its currently proposed form. They imply equality for TElement is significant, which is presumably one of the things a handle-based approach was trying to avoid.
  • I would probably rephrase public void Dequeue(out TElement element); as public TElement Dequeue();
  • Why does the TryDequeue() method need to return a priority?
  • Shouldn't the class also implement ICollection<T> or IReadOnlyCollection<T>?
  • What should happen if I try to update a PriorityQueueNode returned from a different PriorityQueue instance?
  • Do we want to support an efficient merge operation? AFAICT this implies we cannot use an array-based representation. How would this impact the implementation in terms of allocations?

Most software engineers are familiar only with the array-based binary heap implementation — it is the simplest one, but unfortunately not the most efficient one. For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap.

What are the trade-offs for choosing the latter approaches? Is it possible to use an array-based implementation for those?

It can't implement ICollection<T> or IReadOnlyCollection<T> when it doesn't have the right signatures for 'Add' and 'Remove' etc.

Its much closer in spirit/shape to ICollection<KeyValuePair<T,Priority>>

Its much closer in spirit/shape to ICollection<KeyValuePair<T,Priority>>

Wouldn't it be KeyValuePair<TPriority, TElement>, since the ordering is done by TPriority, which is effectively a keying mechanism?

OK, it seems like we are overall in favor of dropping the methods Contains and TryGet. Will remove them in the next revision and explain the removal reason in the FAQ.

As for the interfaces implemented — isn't IEnumerable<PriorityQueueNode<TElement, TPriority>> sufficient? What kind of functionality is missing?

On the KeyValuePair - there were a few voices that a tuple or a struct with .Element and .Priority are more desirable. I think I'm in favor of these.

Wouldn't it be KeyValuePair<TPriority, TElement>, since the ordering is done by TPriority, which is effectively a keying mechanism?

There are good arguments for both sides. On the one hand, yes, exactly what you just said. On the other hand, keys in a KVP collection are generally expected to be unique, and it's perfectly valid to have multiple elements with the same priority.

On the other hand, keys in a KVP collection are generally expected to be unique

I'd disagree with this statement - a collection of key-value pairs is just that; any uniqueness requirements are layered on top of that.

isn't IEnumerable<PriorityQueueNode<TElement, TPriority>> sufficient? What kind of functionality is missing?

I'd personally expect IReadOnlyCollection<PQN<TElement, TPriority>> to be implemented, since the provided API already mostly satisfies that interface. Plus, that would be consistent with other collection types.

Regarding interface:

```
public bool TryGetNode(TElement element, out PriorityQueueNode node); // O(n)

What's the point of it if I can just do enumerate collection and do comparison of elements? And why only Try version?

public PriorityQueueNode<TElement, TPriority> Peek(); // O(1)
public void Dequeue(out TElement element); // O(log n)
I find it a bit strange to have discrepancy between Dequeue and Peek. They do pretty much same thing expect one is removing element from queue and other is not, it's looks weird for me if one returns priority and element and other just element.

public void Remove(PriorityQueueNode<TElement, TPriority> node); // O(log n)
`Queue` doesn't have `Remove` method, why `PriorityQueue` should ?


I would also like to see constructor

public PriorityQueue(IEnumerable>collection);
```

I would also like PriorityQueue to implement IReadonlyCollection<> interface.

Jumping to other things than public API surface.

Such capability is necessary to implement e.g. Dijkstra’s shortest path algorithm or a job scheduler that needs to handle changing priorities. The update mechanism is missing in Java, which has shown to be disappointing for engineers, e.g. in these three StackOverflow questions viewed over 32k times: example, example, example. To avoid introducing API with such limited value, we believe that a fundamental requirement for the priority queue functionality we deliver would be to support the ability of updating priorities for elements already present in the collection.

I would like to disagree. Last time I wrote Dijkstra in C++ std::priority_queue was sufficient enough and it doesn't handles priority update. AFAIK common consensus for that case is to add fake element into queue with altered priority and value and monitor have we process value or not. Same can be done with job scheduler.

To be honest I'm not sure how Dijkstra would look like with current queue proposal. How I'm suppose to keep track of nodes I need update priority? With TryGetNode()? Or have another collection of nodes? Would love to see code for current proposal.

If you look into Wikipedia, there is no assumption of updating priorities for priority queue. Same for all other languages which doesn't have that functionality and got away with that. I know "strive to be better", but is there actually demand for that?

For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap. For more information on heaps, please refer to Wikipedia and this paper.

I've looked into paper and this is quote from it:

The results show that the optimal choice of implementation is strongly input-dependent. Furthermore, it shows that care must be taken to optimize for cache performance, primarily at the L1-L2 barrier. This suggests that complicated, cache-oblivious structures are unlikely to perform well compared to simpler, cache-aware structures.

From what it looks current queue proposal would be locked behind tree implementation rather than array, and something tells me what there is chance what tree nodes scattered around memory would be not as performant as array of elements.

I think having benchmarks to compare simple binary heap based on array and pairing heap would be ideal to make proper decision, before that, I don't think it's smart to lock design behind specific implementation (I'm looking on you Merge method).

Jumping to another topic, I would personally prefer to have KeyValuePair than new custom class.

  • Less API surface
  • I can do something like this: `new PriorityQueue(new Dictionary() { { 1, 1 }, { 2, 2 }, { 3, 3 }, { 4, 4 }, { 5, 5 } }); I know it's limited by uniqueness of the key. Also I think it would bring nice synergy to consume IDictionary based collections.
  • It's a struct rather than class so no NullReference exceptions
  • At some point PrioirtyQueue would need to be serialized/deserialized and I think it would be easier to do what with already existing object.

Downside is mixed feelings looking on TPriority Key but I think good documentation can solve that.
`

As a workaround for Dijkstra, adding fake elements into queue works, and total number of graph edges you process doesn't change. But the number of the temporary nodes staying resident in the heap generated by processing edges changes, and that can impact the memory usage and efficiency of enqueue and dequeue.

I was wrong about not being able to do IReadOnlyCollection, that should be fine. No Add() and Remove() in that interface, hah! (What was I thinking...)

@Ivanidzo4ka Your comments convince me even more that it would make sense to have two separate types: one a simple binary heap (i.e. with no update), and a second as described by one of the proposals:

  • Binary Heaps are simple, easy to implement, have a small API, and are known to perform well in practise in many important scenarios.
  • A fully fledged priority queue provides theoretical benefits for some scenarios (i.e. reduced space complexity, and reduced time complexity for search on densely connected graphs), and provides a natural API for more many algorithms/scenarios.

In the past, I've mostly got away with a binary heap I wrote the best part of a decade ago, and a SortedSet/Dictionary combination, and there are scenarios where I've used both types in separate roles (KSSP springs to mind).

It's a struct rather than class so no NullReference exceptions

I'd argue it's the other way around: if someone is passing default values around, I'd love them to see an NRE; however, I think we need to be clear where we expect these things to be used: nodes/handles should probably be classes, but if we are just reading/returning a pair, then I agree it should be a struct.


I'm tempted to suggest that it should be possible to update the element as well as the priority in the handles based proposal. You can achieve the same effect by just remove a handle and adding a new one, but it is a useful operation, and may have performance benefits depending on the implementation (e.g. some heaps can reduce the priority of something relatively cheaply). Such a change would make it tidier to implement many things (e.g. this somewhat nightmare inducing example based on the existing AlgoKit ParingHeap), especially those that operate in an unknown region of the state space.

Priority queue proposal (v2.1)

Summary

The .NET Core community proposes to add to the system library _priority queue_ functionality, a data structure in which each element additionally has a priority associated with it. Specifically, we propose to add PriorityQueue<TElement, TPriority> to the System.Collections.Generic namespace.

Tenets

In our design, we were guided by the following tenets (unless you know better ones):

  • Wide coverage. We want to offer .NET Core customers a valuable data structure that is versatile enough to support a wide variety of use cases.
  • Learn from known mistakes. We strive to deliver priority queue functionality that would be free from the customer-facing issues present in other frameworks and languages, e.g. Java, Python, C++, Rust. We will avoid making design choices that are known to make customers unhappy and to reduce usefulness of priority queues.
  • Extreme care with 1-way door decisions. Once an API is introduced, it cannot be modified or deleted, only extended. We will carefully analyze design choices to avoid suboptimal solutions that our customers will be stuck with forever.
  • Avoid design paralysis. We accept that there may be no perfect solution. We will balance trade-offs and move forward with the delivery, to finally deliver our customers the functionality they've been awaiting for years.

Background

From the perspective of a customer

Conceptually, a priority queue is a collection of elements, where each element has an associated priority. The most important functionality of a priority queue is that it provides efficient access to the element with the highest priority in the collection, and an option to remove that element. Expected behavior may also include: 1) ability to modify the priority of an element that is already in the collection; 2) ability to merge multiple priority queues.

Computer science background

A priority queue is an abstract data structure, i.e. it is a concept with certain behavioral characteristics, as described in the previous section. Most efficient implementations of a priority queue are based on heaps. However, contrary to the general misconception, a heap is also an abstract data structure, and can be realized in various ways, each offering different benefits and disadvantages.

Most software engineers are familiar only with the array-based binary heap implementation — it is the simplest one, but unfortunately not the most efficient one. For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap. For more information on heaps, please refer to Wikipedia and this paper.

Update mechanism is the key design challenge

Our discussions have demonstrated that the most challenging area in the design, and at the same time with the greatest impact on the API, is the update mechanism. Specifically, the challenge is to determine whether and how the product we want to offer customers should support updating priorities of elements already present in the collection.

Such capability is necessary to implement e.g. Dijkstra’s shortest path algorithm or a job scheduler that needs to handle changing priorities. The update mechanism is missing in Java, which has shown to be disappointing for engineers, e.g. in these three StackOverflow questions viewed over 32k times: example, example, example. To avoid introducing API with such limited value, we believe that a fundamental requirement for the priority queue functionality we deliver would be to support the ability of updating priorities for elements already present in the collection.

To deliver the update mechanism, we must ensure that the customer can be specific about what exactly they want to update. We’ve identified two ways of delivering this: a) via handles; and b) via enforcing uniqueness of elements in the collection. Each of these comes with different benefits and costs.

Option (a): Handles. In this approach, every time an element is added to the queue, the data structure provides its unique handle. If the customer wants to use the update mechanism, they need to keep track of such handles so that they can later unambiguously specify which element they wish to update. The primary cost of this solution is that customers need to manage these pointers. However, it does not mean that there need to be any internal allocations to support handles within the priority queue — any non-array-based heap is based on nodes, where each node is automatically its own handle. As an example, see the API of the PairingHeap.Update method.

Option (b): Uniqueness. This approach puts two additional constraints on the customer: i) elements within the priority queue must conform to certain equality semantics, which brings a new class of potential user bugs; ii) two equal elements cannot be stored in the same queue. By paying this cost, we get the benefit of supporting the update mechanism without resorting to the handle approach. However, any implementation that leverages uniqueness/equality to determine the element to update will require an extra internal mapping, so that it’s done in O(1) and not in O(n).

Recommendation

We recommend to add to the system library a PriorityQueue<TElement, TPriority> class that supports the update mechanism through handles. The underlying implementation would be a pairing heap.

public class PriorityQueueNode<TElement, TPriority>
{
    public TElement Element { get; }
    public TPriority Priority { get; }
}

public class PriorityQueue<TElement, TPriority> :
    IEnumerable<PriorityQueueNode<TElement, TPriority>>,
    IReadOnlyCollection<PriorityQueueNode<TElement, TPriority>>
{
    public PriorityQueue();
    public PriorityQueue(IComparer<TPriority> comparer);

    public IComparer<TPriority> Comparer { get; }
    public int Count { get; }

    public bool IsEmpty { get; }
    public void Clear();

    public PriorityQueueNode<TElement, TPriority> Enqueue(TElement element, TPriority priority); //O(log n)

    public PriorityQueueNode<TElement, TPriority> Peek(); // O(1)
    public bool TryPeek(out PriorityQueueNode<TElement, TPriority> node); // O(1)
    public bool TryPeek(out TElement element, out TPriority priority); // O(1)
    public bool TryPeek(out TElement element); // O(1)

    public PriorityQueueNode<TElement, TPriority> Dequeue(); // O(log n)
    public void Dequeue(out TElement element, out TPriority priority); // O(log n)
    public bool TryDequeue(out PriorityQueueNode<TElement, TPriority> node); // O(log n)
    public bool TryDequeue(out TElement element, out TPriority priority); // O(log n)
    public bool TryDequeue(out TElement element); // O(log n)

    public void Update(PriorityQueueNode<TElement, TPriority> node, TPriority priority); // O(log n)
    public void Remove(PriorityQueueNode<TElement, TPriority> node); // O(log n)

    public void Merge(PriorityQueue<TElement, TPriority> other) // O(1)

    public IEnumerator<PriorityQueueNode<TElement, TPriority>> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();

    public struct Enumerator : IEnumerator<PriorityQueueNode<TElement, TPriority>>
    {
        public PriorityQueueNode<TElement, TPriority> Current { get; }
        object IEnumerator.Current { get; }
        public bool MoveNext() => throw new NotImplementedException();
        public void Reset() => throw new NotImplementedException();
        public void Dispose() => throw new NotImplementedException();
    }
}

Example usage

1) Customer who doesn't care about the update mechanism

var queue = new PriorityQueue<Job, double>();
queue.Enqueue(firstJob, 10);
queue.Enqueue(secondJob, 5);
queue.Enqueue(thirdJob, 40);

var withHighestPriority = queue.Peek(); // { element: secondJob, priority: 5 }

2) Customer who cares about the update mechanism

var queue = new PriorityQueue<Job, double>();
var mapping = new Dictionary<Job, PriorityQueueNode<Job, double>>();

mapping[job] = queue.Enqueue(job, priority);

queue.Update(mapping[job], newPriority);

FAQ

1. In what order does the priority queue enumerate elements?

In undefined order, so that the enumeration can happen in O(n), similarly as with a HashSet. No efficient implementation would provide capabilities to enumerate a heap in linear time while ensuring that elements are enumerated in order — this would require O(n log n). Because ordering on a collection can be trivially achieved with .OrderBy(x => x.Priority) and not every customer cares about enumeration with this order, we believe it is better to provide undefined enumeration order.

2. Why is there no Contains or TryGet method?

Providing such methods has negligible value, because finding an element in a heap requires enumeration of the entire collection, meaning that any Contains or TryGet method would be a wrapper around enumeration. Further, to check if an element exists in collection, the priority queue would have to be aware of how to conduct equality checks of TElement objects, which is not something we believe should fall into the responsibility of a priority queue.

3. Why are there Dequeue and TryDequeue overloads that return PriorityQueueNode?

This is for customers who want to use the Update or Remove method and keep track of handles. While dequeuing an element from the priority queue, they would receive a handle that they could use to adjust the state of their handle tracking system.

4. What happens when the Update or Remove method receives a node from a different queue?

The priority queue will throw an exception. Each node is aware of the priority queue it belongs to, similarly as a LinkedListNode<T> is aware of the LinkedList<T> it belongs to.

Appendices

Appendix A: Other languages with priority queue functionality

| Language | Type | Notes |
|:-:|:-:|:-:|
| Java | PriorityQueue | Extends the abstract class AbstractQueue and implements the interface Queue. |
| Rust | BinaryHeap | |
| Swift | CFBinaryHeap | |
| C++ | priority_queue | |
| Python | heapq | |
| Go | heap | There is a heap interface. |

Appendix B: Discoverability

Note that when discussing data structures, the term _heap_ is used 4 times more often than _priority queue_.

  • "array" AND "data structure" — 17.400.000 results
  • "stack" AND "data structure" — 12.100.000 results
  • "queue" AND "data structure" — 3.850.000 results
  • "heap" AND "data structure" — 1.830.000 results
  • "priority queue" AND "data structure" — 430.000 results
  • "trie" AND "data structure" — 335.000 results

Thank you all for the feedback! I’ve updated the proposal to v2.1. Changelog:

  • Removed the methods Contains and TryGet.
  • Added an FAQ #2: _Why is there no Contains or TryGet method?_
  • Added the interface IReadOnlyCollection<PriorityQueueNode<TElement, TPriority>>.
  • Added an overload of bool TryPeek(out TElement element, out TPriority priority).
  • Added an overload of bool TryPeek(out TElement element).
  • Added an overload of void Dequeue(out TElement element, out TPriority priority).
  • Changed void Dequeue(out TElement element) to PriorityQueueNode<TElement, TPriority> Dequeue().
  • Added an overload of bool TryDequeue(out TElement element).
  • Added an overload of bool TryDequeue(out PriorityQueueNode<TElement, TPriority> node).
  • Added an FAQ #3: _Why are there Dequeue and TryDequeue overloads that return PriorityQueueNode?_
  • Added an FAQ #4: _What happens when the Update or Remove method receives a node from a different queue?_

Thanks for the change notes ;)

Small requests for clarification:

  • Can we qualify FAQ 4 such that it holds for elements not in the queue? (i.e. ones removed)
  • Can we add an FAQ about stability, i.e. the guarantees (if any) when dequeuing elements with the same priority (my understanding is there is no plan to make any guarantees, which is important to know for e.g. scheduling).

@pgolebiowski Regarding the proposed Merge method:

public void Merge(PriorityQueue<TElement, TPriority> other); // O(1)

Clearly such an operation would not have copy semantics, so I'm wondering whether there are would be any gotchas around making changes to this and other _after_ a merge has been performed (e.g. either instance failing to satisfy the heap property).

@eiriktsarpalis @VisualMelon — Thanks! Will address raised points, ETA 2020-10-04.

If others have any more feedback/questions/concerns/thoughts — please share 😊

Priority queue proposal (v2.2)

Summary

The .NET Core community proposes to add to the system library _priority queue_ functionality, a data structure in which each element additionally has a priority associated with it. Specifically, we propose to add PriorityQueue<TElement, TPriority> to the System.Collections.Generic namespace.

Tenets

In our design, we were guided by the following tenets (unless you know better ones):

  • Wide coverage. We want to offer .NET Core customers a valuable data structure that is versatile enough to support a wide variety of use cases.
  • Learn from known mistakes. We strive to deliver priority queue functionality that would be free from the customer-facing issues present in other frameworks and languages, e.g. Java, Python, C++, Rust. We will avoid making design choices that are known to make customers unhappy and to reduce usefulness of priority queues.
  • Extreme care with 1-way door decisions. Once an API is introduced, it cannot be modified or deleted, only extended. We will carefully analyze design choices to avoid suboptimal solutions that our customers will be stuck with forever.
  • Avoid design paralysis. We accept that there may be no perfect solution. We will balance trade-offs and move forward with the delivery, to finally deliver our customers the functionality they've been awaiting for years.

Background

From the perspective of a customer

Conceptually, a priority queue is a collection of elements, where each element has an associated priority. The most important functionality of a priority queue is that it provides efficient access to the element with the highest priority in the collection, and an option to remove that element. Expected behavior may also include: 1) ability to modify the priority of an element that is already in the collection; 2) ability to merge multiple priority queues.

Computer science background

A priority queue is an abstract data structure, i.e. it is a concept with certain behavioral characteristics, as described in the previous section. Most efficient implementations of a priority queue are based on heaps. However, contrary to the general misconception, a heap is also an abstract data structure, and can be realized in various ways, each offering different benefits and disadvantages.

Most software engineers are familiar only with the array-based binary heap implementation — it is the simplest one, but unfortunately not the most efficient one. For general random input, two examples of more efficient heap types are: quaternary heap and pairing heap. For more information on heaps, please refer to Wikipedia and this paper.

Update mechanism is the key design challenge

Our discussions have demonstrated that the most challenging area in the design, and at the same time with the greatest impact on the API, is the update mechanism. Specifically, the challenge is to determine whether and how the product we want to offer customers should support updating priorities of elements already present in the collection.

Such capability is necessary to implement e.g. Dijkstra’s shortest path algorithm or a job scheduler that needs to handle changing priorities. The update mechanism is missing in Java, which has shown to be disappointing for engineers, e.g. in these three StackOverflow questions viewed over 32k times: example, example, example. To avoid introducing API with such limited value, we believe that a fundamental requirement for the priority queue functionality we deliver would be to support the ability of updating priorities for elements already present in the collection.

To deliver the update mechanism, we must ensure that the customer can be specific about what exactly they want to update. We’ve identified two ways of delivering this: a) via handles; and b) via enforcing uniqueness of elements in the collection. Each of these comes with different benefits and costs.

Option (a): Handles. In this approach, every time an element is added to the queue, the data structure provides its unique handle. If the customer wants to use the update mechanism, they need to keep track of such handles so that they can later unambiguously specify which element they wish to update. The primary cost of this solution is that customers need to manage these pointers. However, it does not mean that there need to be any internal allocations to support handles within the priority queue — any non-array-based heap is based on nodes, where each node is automatically its own handle. As an example, see the API of the PairingHeap.Update method.

Option (b): Uniqueness. This approach puts two additional constraints on the customer: i) elements within the priority queue must conform to certain equality semantics, which brings a new class of potential user bugs; ii) two equal elements cannot be stored in the same queue. By paying this cost, we get the benefit of supporting the update mechanism without resorting to the handle approach. However, any implementation that leverages uniqueness/equality to determine the element to update will require an extra internal mapping, so that it’s done in O(1) and not in O(n).

Recommendation

We recommend to add to the system library a PriorityQueue<TElement, TPriority> class that supports the update mechanism through handles. The underlying implementation would be a pairing heap.

public class PriorityQueueNode<TElement, TPriority>
{
    public TElement Element { get; }
    public TPriority Priority { get; }
}

public class PriorityQueue<TElement, TPriority> :
    IEnumerable<PriorityQueueNode<TElement, TPriority>>,
    IReadOnlyCollection<PriorityQueueNode<TElement, TPriority>>
{
    public PriorityQueue();
    public PriorityQueue(IComparer<TPriority> comparer);

    public IComparer<TPriority> Comparer { get; }
    public int Count { get; }

    public bool IsEmpty { get; }
    public void Clear();

    public PriorityQueueNode<TElement, TPriority> Enqueue(TElement element, TPriority priority); //O(log n)

    public PriorityQueueNode<TElement, TPriority> Peek(); // O(1)
    public bool TryPeek(out PriorityQueueNode<TElement, TPriority> node); // O(1)
    public bool TryPeek(out TElement element, out TPriority priority); // O(1)
    public bool TryPeek(out TElement element); // O(1)

    public PriorityQueueNode<TElement, TPriority> Dequeue(); // O(log n)
    public void Dequeue(out TElement element, out TPriority priority); // O(log n)
    public bool TryDequeue(out PriorityQueueNode<TElement, TPriority> node); // O(log n)
    public bool TryDequeue(out TElement element, out TPriority priority); // O(log n)
    public bool TryDequeue(out TElement element); // O(log n)

    public void Update(PriorityQueueNode<TElement, TPriority> node, TPriority priority); // O(log n)
    public void Remove(PriorityQueueNode<TElement, TPriority> node); // O(log n)

    public IEnumerator<PriorityQueueNode<TElement, TPriority>> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();

    public struct Enumerator : IEnumerator<PriorityQueueNode<TElement, TPriority>>
    {
        public PriorityQueueNode<TElement, TPriority> Current { get; }
        object IEnumerator.Current { get; }
        public bool MoveNext() => throw new NotImplementedException();
        public void Reset() => throw new NotImplementedException();
        public void Dispose() => throw new NotImplementedException();
    }
}

Example usage

1) Customer who doesn't care about the update mechanism

var queue = new PriorityQueue<Job, double>();
queue.Enqueue(firstJob, 10);
queue.Enqueue(secondJob, 5);
queue.Enqueue(thirdJob, 40);

var withHighestPriority = queue.Peek(); // { element: secondJob, priority: 5 }

2) Customer who cares about the update mechanism

var queue = new PriorityQueue<Job, double>();
var mapping = new Dictionary<Job, PriorityQueueNode<Job, double>>();

mapping[job] = queue.Enqueue(job, priority);

queue.Update(mapping[job], newPriority);

FAQ

1. In what order does the priority queue enumerate elements?

In undefined order, so that the enumeration can happen in O(n), similarly as with a HashSet. No efficient implementation would provide capabilities to enumerate a heap in linear time while ensuring that elements are enumerated in order — this would require O(n log n). Because ordering on a collection can be trivially achieved with .OrderBy(x => x.Priority) and not every customer cares about enumeration with this order, we believe it is better to provide undefined enumeration order.

2. Why is there no Contains or TryGet method?

Providing such methods has negligible value, because finding an element in a heap requires enumeration of the entire collection, meaning that any Contains or TryGet method would be a wrapper around enumeration. Further, to check if an element exists in collection, the priority queue would have to be aware of how to conduct equality checks of TElement objects, which is not something we believe should fall into the responsibility of a priority queue.

3. Why are there Dequeue and TryDequeue overloads that return PriorityQueueNode?

This is for customers who want to use the Update or Remove method and keep track of handles. While dequeuing an element from the priority queue, they would receive a handle that they could use to adjust the state of their handle tracking system.

4. What happens when the Update or Remove method receives a node from a different queue?

The priority queue will throw an exception. Each node is aware of the priority queue it belongs to, similarly as a LinkedListNode<T> is aware of the LinkedList<T> it belongs to. Further, if a node has been removed from a queue, trying to invoke Update or Remove on it will also result with an exception.

5. Why is there no Merge method?

Merging two priority queues can be achieved in constant time, which makes it a tempting feature to offer customers. However, we have no data to demonstrate that there is demand for such a functionality, and we cannot justify including it in the public API. Further, the design of such a functionality is non-trivial and, given that this feature may not be needed, it could unnecessarily complicate the API surface and implementation.

Nevertheless, not including the Merge method now is a 2-way door — if in the future customers do express interest in having the merge functionality supported, it will be possible to extend the PriorityQueue type. As such, we recommend not to include the Merge method yet, and to move forward with the launch.

6. Does the collection provide a stability guarantee?

The collection will not provide a stability guarantee out of the box, i.e. if two elements are queued with the same priority, the customer will not be able to assume that they will be dequeued in a certain order. However, if a customer would like to achieve stability using our PriorityQueue, they could define a TPriority and the corresponding IComparer<TPriority> that would ensure it. Further, the data collection will be deterministic, i.e. for a given sequence of operations, it will always behave the same way, allowing for reproducibility.

Appendices

Appendix A: Other languages with priority queue functionality

| Language | Type | Notes |
|:-:|:-:|:-:|
| Java | PriorityQueue | Extends the abstract class AbstractQueue and implements the interface Queue. |
| Rust | BinaryHeap | |
| Swift | CFBinaryHeap | |
| C++ | priority_queue | |
| Python | heapq | |
| Go | heap | There is a heap interface. |

Appendix B: Discoverability

Note that when discussing data structures, the term _heap_ is used 4 times more often than _priority queue_.

  • "array" AND "data structure" — 17.400.000 results
  • "stack" AND "data structure" — 12.100.000 results
  • "queue" AND "data structure" — 3.850.000 results
  • "heap" AND "data structure" — 1.830.000 results
  • "priority queue" AND "data structure" — 430.000 results
  • "trie" AND "data structure" — 335.000 results

Changelog:

  • Removed the method void Merge(PriorityQueue<TElement, TPriority> other) // O(1).
  • Added an FAQ #5: Why is there no Merge method?
  • Modified FAQ #4 to also hold for nodes that have been removed from the priority queue.
  • Added an FAQ #6: Does the collection provide a stability guarantee?

The new FAQ looks great. I played around with coding Dijkstra against the proposed API, with handle dictionary, and it seemed basically fine.

The one small learning I had from doing this was that the current set of method names/overloads doesn't work so well for implicit typing of out variables. What I wanted to do with the C# code was TryDequeue(out var node) - but unfortunately I needed to give the explicit type of out variable as PriorityQueueNode<> otherwise the compiler didn't know if I wanted a priority queue node or an element.

    var shortestDistances = new Dictionary<int, int>();
    var queue = new PriorityQueue<int, int>();
    var handles = new Dictionary<int, PriorityQueueNode<int, int>>();
    handles[startNode] = queue.Enqueue(startNode, 0);
    while (queue.TryDequeue(out PriorityQueueNode<int, int> nextQueueNode))
    {
        int nodeToExploreFrom = nextQueueNode.Element, minDistance = nextQueueNode.Priority;
        shortestDistances.Add(nodeToExploreFrom, minDistance);
        handles[nodeToExploreFrom] = null; // so it is clearly already visited
        foreach (int nextNode in nodes)
        {
            int candidatePathDistance = minDistance + edgeDistances[nodeToExploreFrom, nextNode];
            if (handles.TryGetValue(nextNode, out var nextNodeHandle))
            {
                if (nextNodeHandle != null && candidatePathDistance < nextNodeHandle.Priority)
                {
                    queue.Update(nextNodeHandle, candidatePathDistance);
                }
                // or else... we already got there
            }
            else
            {
                handles[nextNode] = queue.Enqueue(nextNode, candidatePathDistance);
            }
        }
    }

The main unresolved design question in my view is whether or not the implementation should support priority updates. I can see three possible paths here:

  1. Require uniqueness/equality and support updates by passing the original element.
  2. Support priority updates using handles.
  3. Do not support priority updates.

These approaches are mutually exclusive and have their own sets of trade-offs, respectively:

  1. The equality based approach complicates the API contract by forcing uniqueness and requires additional book-keeping under the hood. This happens whether or not the user needs priority updates.
  2. The handle based approach would imply at least one additional allocation per enqueued element. While it does not enforce uniqueness, my impression is that for scenaria that do need updates this invariant is almost certainly implicit (e.g. notice how both examples listed above store handles in external dictionaries indexed by the elements themselves).
  3. Updates are either not supported at all, or would require a linear traversal of the heap. In case of duplicate elements, updates can be ambiguous.

Identifying common PriorityQueue applications

It's important that we identify which of the above approaches could provide the best value for most of our users. So I've been going through .NET codebases, both Microsoft internal and public, for instances of Heap/PriorityQueue implementations in order to better understand what the most common usage patterns are:

Most applications implement variations of either heap sort or job scheduling. A few instances were used for algorithms such as topological sorting or Huffman coding. A smaller number was used for calculating distances in graphs. Of the 80 PriorityQueue implementations examined, only 9 were found to have implemented some form of priority updates.

At the same time, none of the Heap/PriorityQueue implementations in either Python, Java, C++, Go, Swift, or Rust core libraries support priority updates in their APIs.

Recommendations

In light of this data, it is clear to me that .ΝΕΤ needs a baseline PriorityQueue implementation that exposes the essential API (heapify/push/peek/pop), offers certain performance guarantees (e.g. no additional allocations per enqueue) and does not enforce uniqueness constraints. This implies that the implementation _would not support O(log n) priority updates_.

We should also consider following up with a separate heap implementation that does support _O(log n)_ updates/removals, and uses an equality-based approach. Since this would be a specialized type, the uniqueness requirement shouldn't be much of an issue.

I've been working on prototyping both implementations, and will follow up with an API proposal shortly.

Thank you very much for the analysis, @eiriktsarpalis! I appreciate in particular the time to analyze the internal Microsoft codebase to find relevant use cases and support our discussion with data.

The handle based approach would imply at least one additional allocation per enqueued element.

This assumption is false, you don't need additional allocations in node-based heaps. The pairing heap is more performant for a general random input than the array-based binary heap, meaning that you would have the incentive to use this node-based heap even for a priority queue that does not support updates. You can see benchmarks in the paper I referenced earlier.

Of the 80 PriorityQueue implementations examined, only 9 were found to have implemented some form of priority updates.

Even given this small sample, that's 11-12% of all usages. Also, this may be underrepresented in certain domains like e.g. video games, where I'd expect this percentage to be higher.

In light of this data, it is clear to me that [...]

I don't think such a conclusion is clear, given that one of the key assumptions is false and it's arguable whether 11-12% of the customers is an important enough use case or not. What I'm missing in your assessment is the evaluation of the cost impact of the "data structure supporting updates" for customers who don't care about this mechanism — which to me, this cost is negligible, as they could use the data structure without being affected by the handle mechanism.

Basically:

| | 11-12% use cases | 88-89% use cases |
|:-:|:-:|:-:|
| cares about updates | yes | no |
| is negatively affected by the handles | N/A (they are desired) | no |
| is positively affected by the handles | yes | no |

To me, this is a no-brainer in favor of supporting 100% use cases, not only 88-89%.

This assumption is false, you don't need additional allocations in node-based heaps

If both the priority and the item are value types (or if both are reference types you don't own and/or can't change the base type of), can you link to an implementation that demonstrates no additional allocations being required (or just describe how it's achieved)? That'd be helpful to see. Thanks.

It would be helpful if you could elaborate more, or just say what you're trying to say. I'd need to disambiguate, there would be ping-pong, and that would turn into a lengthy discussion. Alternatively, we could arrange a call.

I'm saying we want to avoid any Enqueue operation requiring an allocation, whether on the part of the caller or the part of the implementation (ammortized internal allocation is fine, e.g. to expand an array used in the implementation). I'm trying to understand how that's possible with a node-based heap (e.g. if those node objects are exposed to the caller, that prohibits pooling by the implementation due to concerns around inappropriate reuse / aliasing). I want to be able to write:
C# pq.Enqueue(42, 84);
and have that not allocate. How do the implementations you refer to achieve that?

or just say what you're trying to say

I thought I was.

we want to avoid any Enqueue operation requiring an allocation [...] I want to be able to write: pq.Enqueue(42, 84); and have that not allocate.

Where does this desire come from? It's a nice to have side effect of a solution, not a requirement that 99.9% customers need to have satisfied. I don't see why you would choose this low-impact dimension to make design choices between solutions.

We're not making design choices based on optimizations for the 0.1% of customers if these negatively impact 12% of the customers in another dimension. "caring about no allocations" + "dealing with two value types" is an edge case.

I find the dimension of supported behavior/functionality far more important, especially when designing a general-purpose versatile data structure for a wide audience and a variety of use cases.

Where does this desire come from?

From wanting the core collection types to be usable in scenarios that care about performance. You say the node-based solution would support 100% of use cases: that is not the case if every enqueue allocates, just as List<T>, Dictionary<TKey, TValue>, HashSet<T>, and so on would become unusable in many situations if they allocated on every Add.

Why do you believe it's only "0.1%" that care about the allocation overheads of these methods? From where is that data coming?

"caring about no allocations" + "dealing with two value types" is an edge case

It is not. It's also not just "two value types". As I understand it, the proposed solution would require either a) an allocation on every enqueue regardless of the Ts involved, or b) would require the element type to derive from some known base type which in turn prohibits a large number of possible uses to avoid extra allocation.

@eiriktsarpalis
So you don't forget any options, I think there is a feasible option 4 to add to options 1, 2, and 3, in your list which is a compromise:

  1. an implementation that supports the 12% use case, while also nearly-optimizing for the other 88% by allowing updates to elements that are equatable, and only _lazily_ building the lookup table required to do those updates the first time an update method is called (and updating it on subsequence updates+removals). Therefore incurring less cost for apps which don't use the functionality.

We might still decide that because there's extra performance available to the 88% or 12% from an implementation which doesn't need an updatable data structure, or is optimized for one in the first place, its better to provide options 2 and 3, than option 4. But I thought we should not forget another option exists.

[Or I suppose you could just see this as a better option 1 and update the description of 1 to say that bookkeeping is not forced, but lazy, and correct equatable behavior is only required when updates are used...]

@stephentoub That's exactly what I had in mind about saying simply what you want to say, thank you :)

Why do you believe it's only 0.1% that care about the allocation overheads of these methods? From where is that data coming?

From intuition, i.e. the same source based on which you believe it's more important to prioritize "no additional allocations" over "ability to conduct updates". At least for the update mechanism, we have the data that 11-12% customers do need to have this behavior supported. I don't think remotely close customers would care about "no additional allocations".

In either case, you are for some reason choosing to fixate on the memory dimension, forgetting about other dimensions, e.g. raw speed, which is another trade-off for your preferred approach. An array-based implementation providing "no additional allocations" would be slower than a node-based implementation. Again, I think it's arbitrary here to prioritize memory over speed.

Let's take a step back and focus on what the customers want. We have a design choice that may or may not make the data structure unusable for 12% of the customers. I think we would need to be very careful with providing reasons for why we would choose not to support these.

An array-based implementation providing "no additional allocations" would be slower than a node-based implementation.

Please share the two C# implementations you're using to perform that comparison and the benchmarks used to come to that conclusion. Theoretical papers are certainly valuable but they are only one small piece of the puzzle. The more important thing is when the rubber meets the road, factoring in the details of the given platform and given implementations, and you're able to validate on the specific platform with the specific implementation and typical/expected data sets / usage patterns. It could very well be that your assertion is correct. It also may not be. I'd like to see the implementations / data to understand better.

Please share the two C# implementations you're using to perform that comparison and the benchmarks used to come to that conclusion

This is a valid point, the paper I quote only compares and benchmarks implementations in C++. It conducts multiple benchmarks with different data sets and usage patterns. I am quite confident this would be transferable to C#, but if you believe this is something we need to double down, I think the best course of action would be for you to ask a colleague to conduct such a study.

@pgolebiowski I would be interested in better understanding the nature of your objection. The proposal advocates for two separate types, would that not cover your requirements?

  1. an implementation that supports the 12% use case, while also nearly-optimizing for the other 88% by allowing updates to elements that are equatable, and only lazily building the lookup table required to do those updates the first time an update method is called (and updating it on subsequence updates+removals). Therefore incurring less cost for apps which don't use the functionality.

I would probably classify that as a performance optimization for option 1, however I do see a couple of issues with that particular approach:

  • Update now becomes _O(n)_, which can result in unpredictable performance depending on usage patterns.
  • The lookup table is also needed for validating uniqueness. Enqueueing the same element twice _before_ calling Update would be accepted and arguably bring the queue to an inconsistent state.

@eiriktsarpalis Its only O(n) once, and O(1) afterwards, which is O(1) amortized. And you can postpone validating uniqueness until the first update. But maybe this is overly clever. Two classes is easier to explain.

I've spent the last few days prototyping two PriorityQueue implementations: a basic implementation without update support and an implementation that supports updates using element equality. I've named the former PriorityQueue and the latter, for lack of a better name, PrioritySet. My goal is to gauge API ergonomics and compare performance.

The implementations can be found in this repo. Both classes are implemented using array-based quad heaps. The updatable implementation also uses a dictionary that maps elements to internal heap indices.

Basic PriorityQueue

namespace System.Collections.Generic
{
    public class PriorityQueue<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)>
    {
        // Constructors
        public PriorityQueue();
        public PriorityQueue(int initialCapacity);
        public PriorityQueue(IComparer<TPriority>? comparer);
        public PriorityQueue(int initialCapacity, IComparer<TPriority>? comparer);
        public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values);
        public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer);

        // Properties
        public int Count { get; }
        public IComparer<TPriority> Comparer { get; }

        // O(log(n)) push operation
        public void Enqueue(TElement element, TPriority priority);
        // O(1) peek operations
        public TElement Peek();
        public bool TryPeek(out TElement element, out TPriority priority);
        // O(log(n)) pop operations
        public TElement Dequeue();
        public bool TryDequeue(out TElement element, out TPriority priority);
        // Combined push/pop, generally more efficient than sequential Enqueue();Dequeue() calls.
        public TElement EnqueueDequeue(TElement element, TPriority priority);

        public void Clear();

        public Enumerator GetEnumerator();
        public struct Enumerator : IEnumerator<(TElement Element, TPriority Priority)>, IEnumerator;
    }
}

Here's a basic sample using the type

var queue = new PriorityQueue<string, int>();

queue.Enqueue("John", 1940);
queue.Enqueue("Paul", 1942);
queue.Enqueue("George", 1943);
queue.Enqueue("Ringo", 1940);

Assert.Equal("John", queue.Dequeue());
Assert.Equal("Ringo", queue.Dequeue());
Assert.Equal("Paul", queue.Dequeue());
Assert.Equal("George", queue.Dequeue());

Updatable PriorityQueue

namespace System.Collections.Generic
{
    public class PrioritySet<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)> where TElement : notnull
    {
        // Constructors
        public PrioritySet();
        public PrioritySet(int initialCapacity);
        public PrioritySet(IComparer<TPriority> comparer);
        public PrioritySet(int initialCapacity, IComparer<TPriority>? priorityComparer, IEqualityComparer<TElement>? elementComparer);
        public PrioritySet(IEnumerable<(TElement Element, TPriority Priority)> values);
        public PrioritySet(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer, IEqualityComparer<TElement>? elementComparer);

        // Members shared with baseline PriorityQueue implementation
        public int Count { get; }
        public IComparer<TPriority> Comparer { get; }
        public void Enqueue(TElement element, TPriority priority);
        public TElement Peek();
        public bool TryPeek(out TElement element, out TPriority priority);
        public TElement Dequeue();
        public bool TryDequeue(out TElement element, out TPriority priority);
        public TElement EnqueueDequeue(TElement element, TPriority priority);

        // Update methods and friends
        public bool Contains(TElement element); // O(1)
        public bool TryRemove(TElement element); // O(log(n))
        public bool TryUpdate(TElement element, TPriority priority); // O(log(n))
        public void EnqueueOrUpdate(TElement element, TPriority priority); // O(log(n))

        public void Clear();
        public Enumerator GetEnumerator();
        public struct Enumerator : IEnumerator<(TElement Element, TPriority Priority)>, IEnumerator;
    }
}

Performance Comparison

I wrote a simple heapsort benchmark that compares the two implementations in their most basic application. I also included a sort benchmark that uses Linq for comparison:

BenchmarkDotNet=v0.12.1, OS=ubuntu 20.04
AMD EPYC 7452, 1 CPU, 4 logical and 2 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT

| Method | Size | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------- |------ |-------------:|-----------:|-----------:|------:|--------:|--------:|--------:|--------:|----------:|
| LinqSort | 30 | 1.439 us | 0.0072 us | 0.0064 us | 1.00 | 0.00 | 0.0095 | - | - | 672 B |
| PriorityQueue | 30 | 1.450 us | 0.0085 us | 0.0079 us | 1.01 | 0.01 | - | - | - | - |
| PrioritySet | 30 | 2.778 us | 0.0217 us | 0.0192 us | 1.93 | 0.02 | - | - | - | - |
| | | | | | | | | | | |
| LinqSort | 300 | 24.727 us | 0.1032 us | 0.0915 us | 1.00 | 0.00 | 0.0305 | - | - | 3912 B |
| PriorityQueue | 300 | 29.510 us | 0.0995 us | 0.0882 us | 1.19 | 0.01 | - | - | - | - |
| PrioritySet | 300 | 47.715 us | 0.4455 us | 0.4168 us | 1.93 | 0.02 | - | - | - | - |
| | | | | | | | | | | |
| LinqSort | 3000 | 412.015 us | 1.5495 us | 1.3736 us | 1.00 | 0.00 | 0.4883 | - | - | 36312 B |
| PriorityQueue | 3000 | 491.722 us | 4.1463 us | 3.8785 us | 1.19 | 0.01 | - | - | - | - |
| PrioritySet | 3000 | 677.959 us | 3.1996 us | 2.4981 us | 1.64 | 0.01 | - | - | - | - |
| | | | | | | | | | | |
| LinqSort | 30000 | 5,223.560 us | 11.9077 us | 9.9434 us | 1.00 | 0.00 | 93.7500 | 93.7500 | 93.7500 | 360910 B |
| PriorityQueue | 30000 | 5,688.625 us | 53.0746 us | 49.6460 us | 1.09 | 0.01 | - | - | - | 2 B |
| PrioritySet | 30000 | 8,124.306 us | 39.9498 us | 37.3691 us | 1.55 | 0.01 | - | - | - | 4 B |

As can be expected, the overhead of tracking element locations adds a significant performance hit, roughly 40-50% slower compared to the baseline implementation.

I appreciate all the effort, I see it took a lot of time and energy.

  1. I really don't see though the reason for 2 almost identical data structures, where one is an inferior version of another.
  2. Further, even if you would want to have such 2 versions of a priority queue, I don't see how the "superior" version is better than Priority queue proposal (v2.2) from 20 days ago.

tl;dr:

  • This proposal is exciting!! But it doesn't yet fit my high-perf use-cases.
  • >90% of my computational geometry/motion planning performance load is PQ enqueue/dequeue because that's the dominant N^m LogN in an algorithm.
  • I'm in favor of separate PQ implementations. I usually don't need priority updates, and 2x worse perf is unacceptable.
  • PrioritySet is a confusing name and isn't discoverable vs PriorityQueue
  • Storing priority twice (once in my element, once in the queue) feels expensive. Struct copies & memory usage.
  • If computing priority were expensive, I would simply store a tuple (priority: ComputePriority(element), element) into a PQ, and my priority-getter function would simply be tuple => tuple.priority.
  • Performance should be benchmarked per-operation or alternatively on real-world use-cases (e.g. optimized multi-start multi-end search on a graph)
  • Unordered enumeration behavior is unexpected. Prefer Queue-like Dequeue()-order semantics?
  • Consider supporting Clone operation & merge operation.
  • Basic operations must be 0-alloc in steady-state usage. I will pool these queues.
  • Consider supporting EnqueueMany which performs heapify to aid pooling.

I work in high-performance search (motion-planning) & computational geometry code (e.g. sweepline algorithms) that is relevant to both robotics and games, I use a lot of custom hand-rolled priority queues. The other common use-case I have is a Top-K query where updatable priority is not useful.

Some feedback regarding the two-implementations (yes vs no update support) debate.

Naming:

  • PrioritySet implies set semantics, but the queue doesn't implement ISet.
  • You're using the UpdatablePriorityQueue name which is easier to discover if I search PriorityQueue.

Performance:

  • Priority Queue performance is nearly always my performance bottleneck (>90%) in my geometry / planning algorithms
  • Consider passing a Func or Comparison to ctor rather than copying TPriority around (expensive!). If computing priority is expensive, I will insert (priority, element) into a PQ and pass a comparison which looks at my cached priority.
  • A significant number of my algorithms do not need PQ updates. I'd consider using a built-in PQ that does not support updates, but if something has a 2x perf cost to support a feature I don't need (updating) then it is useless to me.
  • For performance analysis / tradeoffs, it would be important to know the relative wall-time cost of enqueue/dequeue per operation @eiriktsarpalis -- "tracking is 2x slower" isn't enough to evaluate whether a PQ is useful.
  • I was glad to see your constructors perform Heapify. Consider a constructor which takes an IList, List, or Array to avoid enumeration allocations.
  • Consider exposing an EnqueueMany which performs Heapify if the PQ is initially empty, since in high-perf it is common to pool collections.
  • Consider making Clear not zero array if elements do not contain references.
  • Allocations in enqueue/dequeue are unacceptable. My algorithms are zero-alloc for performance reasons, with pooled thread-local collections.

APIs:

  • Cloning a priority queue is a trivial operation with your implementations and frequently useful.

    • Related: Should enumerating a Priority Queue have queue-like semantics? I would expect a dequeue-order collection, similar to what Queue does. I expect new List(myPriorityQueue) to not mutate the priorityQueue, but to work as I just described.

  • As mentioned above, it is preferable to take in a Func<TElement, TPriority> rather than insert with priority. If computing priority is expensive, I can simply insert (priority, element) and provide a func tuple => tuple.priority
  • Merging two priority queues is sometimes useful.
  • It is strange that Peek returns a TItem but Enumeration & Enqueue have (TItem, TPriority).

That being said, for a significant number of my algorithms, my priority queue items contain their priorities, and storing that twice (once in the PQ, once in the items) sounds inefficient. This is especially the case if I am ordering by numerous keys (similar use-case to OrderBy.ThenBy.ThenBy). This API also cleans a lot of the inconsistencies where Insert takes a priority but Peek doesn't return it.

Finally, it is worth noting that I often insert indices of an array into a Priority Queue, rather than array elements themselves. This is supported by all APIs discussed so far, though. As an example, if I am processing the beginning/ends of intervals on a x-axis line, I might have priority queue events (x, isStartElseEnd, intervalId) and order by x then by isStartElseEnd. This is often because I have other data structures that map from an index to some computed data.

@pgolebiowski I took the liberty of including your proposed pairing heap implementation to the benchmarks, just so we can compare instances of all three approaches directly. Here are the results:

| Method | Size | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------- |-------- |-------------------:|-----------------:|-----------------:|-------------------:|----------:|------:|------:|-----------:|
| PriorityQueue | 10 | 774.7 ns | 3.30 ns | 3.08 ns | 773.2 ns | - | - | - | - |
| PrioritySet | 10 | 1,643.0 ns | 3.89 ns | 3.45 ns | 1,642.8 ns | - | - | - | - |
| PairingHeap | 10 | 1,660.2 ns | 14.11 ns | 12.51 ns | 1,657.2 ns | 0.0134 | - | - | 960 B |
| PriorityQueue | 50 | 6,413.0 ns | 14.95 ns | 13.99 ns | 6,409.5 ns | - | - | - | - |
| PrioritySet | 50 | 12,193.1 ns | 35.41 ns | 29.57 ns | 12,188.3 ns | - | - | - | - |
| PairingHeap | 50 | 13,955.8 ns | 193.36 ns | 180.87 ns | 13,989.2 ns | 0.0610 | - | - | 4800 B |
| PriorityQueue | 150 | 27,402.5 ns | 76.52 ns | 71.58 ns | 27,410.2 ns | - | - | - | - |
| PrioritySet | 150 | 48,485.8 ns | 160.22 ns | 149.87 ns | 48,476.3 ns | - | - | - | - |
| PairingHeap | 150 | 56,951.2 ns | 190.52 ns | 168.89 ns | 56,953.6 ns | 0.1831 | - | - | 14400 B |
| PriorityQueue | 500 | 124,933.7 ns | 429.20 ns | 380.48 ns | 124,824.4 ns | - | - | - | - |
| PrioritySet | 500 | 206,310.0 ns | 433.97 ns | 338.81 ns | 206,319.0 ns | - | - | - | - |
| PairingHeap | 500 | 229,423.9 ns | 3,213.33 ns | 2,848.53 ns | 230,398.7 ns | 0.4883 | - | - | 48000 B |
| PriorityQueue | 1000 | 284,481.8 ns | 475.91 ns | 445.16 ns | 284,445.6 ns | - | - | - | - |
| PrioritySet | 1000 | 454,989.4 ns | 3,712.11 ns | 3,472.31 ns | 455,354.0 ns | - | - | - | - |
| PairingHeap | 1000 | 459,049.3 ns | 1,706.28 ns | 1,424.82 ns | 459,364.9 ns | 0.9766 | - | - | 96000 B |
| PriorityQueue | 10000 | 3,788,802.4 ns | 11,715.81 ns | 10,958.98 ns | 3,787,811.9 ns | - | - | - | 1 B |
| PrioritySet | 10000 | 5,963,100.4 ns | 26,669.04 ns | 22,269.86 ns | 5,950,915.5 ns | - | - | - | 2 B |
| PairingHeap | 10000 | 6,789,719.0 ns | 134,453.01 ns | 265,397.13 ns | 6,918,392.9 ns | 7.8125 | - | - | 960002 B |
| PriorityQueue | 1000000 | 595,059,170.7 ns | 4,001,349.38 ns | 3,547,092.00 ns | 595,716,610.5 ns | - | - | - | 4376 B |
| PrioritySet | 1000000 | 1,592,037,780.9 ns | 13,925,896.05 ns | 12,344,944.12 ns | 1,591,051,886.5 ns | - | - | - | 288 B |
| PairingHeap | 1000000 | 1,858,670,560.7 ns | 36,405,433.20 ns | 59,815,170.76 ns | 1,838,721,629.0 ns | 1000.0000 | - | - | 96000376 B |

Key takeaways

  • ~The pairing heap implementation asymptotically performs much better than its array-backed counterparts. However it can be up to 2x slower for small heap sizes (< 50 elements), catches up at around 1000 elements, but is up to 2x faster for heaps of size 10^6~.
  • As expected, the pairing heap produces a significant amount of heap allocations.
  • The "PrioritySet" implementation is consistently slow ~the slowest of all three contenders~, so we might not want to pursue that approach after all.

~In light of the above I still believe there are valid trade-offs between the baseline array heap and the pairing heap approach~.

EDIT: updated the results after a bugfix in my benchmarks, thanks @VisualMelon

@eiriktsarpalis Your benchmark for the PairingHeap is, I think, wrong: the parameters to Add are the wrong way around. When you swap them around, it's a different story: https://gist.github.com/VisualMelon/00885fe50f7ab0f4ae5cd1307312109f

(I did exactly the same thing when I first implemented it)

Note that this doesn't mean the pairing heap is faster or slower, rather it seems to depend heavily on the distribution/order of the data supplied.

@eiriktsarpalis re: the usefulness of PrioritySet...
We shouldn't expect updatable to be anything other than slower for heapsort, since it has no priority updates in the scenario. (Also for heapsort, its likely you even want to keep duplicates, a set is just not appropriate.)

The litmus test for seeing whether PrioritySet is useful should be benchmarking algorithms which use priority updates, versus a non-updating implementation of same algorithm, enqueueing the duplicate values and ignoring duplicates when dequeueing.

Thanks @VisualMelon, I updated my results and comments after your suggested fix.

rather it seems to depend heavily on the distribution/order of the data supplied.

I believe it might have benefited from the fact that the enqueued priorities were monotonic.

The litmus test for seeing whether PrioritySet is useful should be benchmarking algorithms which use priority updates, versus a non-updating implementation of same algorithm, enqueueing the duplicate values and ignoring duplicates when dequeueing.

@TimLovellSmith my goal here was to measure performance for the most common PriorityQueue application: rather than measure the performance of updates, I wanted to see the impact for the case where updates are not needed at all. It might however make sense to produce a separate benchmark that compares pairing heap with "PrioritySet" updates.

@miyu thanks for your detailed feedback, it is very much appreciated!

@TimLovellSmith I wrote a simple benchmark that uses updates:

| Method | Size | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------ |-------- |---------------:|---------------:|---------------:|---------------:|-------:|------:|------:|-----------:|
| PrioritySet | 10 | 1.052 us | 0.0106 us | 0.0099 us | 1.055 us | - | - | - | - |
| PairingHeap | 10 | 1.055 us | 0.0042 us | 0.0035 us | 1.055 us | 0.0057 | - | - | 480 B |
| PrioritySet | 50 | 7.394 us | 0.0527 us | 0.0493 us | 7.380 us | - | - | - | - |
| PairingHeap | 50 | 8.587 us | 0.1678 us | 0.1570 us | 8.634 us | 0.0305 | - | - | 2400 B |
| PrioritySet | 150 | 27.522 us | 0.0459 us | 0.0359 us | 27.523 us | - | - | - | - |
| PairingHeap | 150 | 32.045 us | 0.1076 us | 0.1007 us | 32.019 us | 0.0610 | - | - | 7200 B |
| PrioritySet | 500 | 109.097 us | 0.6548 us | 0.6125 us | 109.162 us | - | - | - | - |
| PairingHeap | 500 | 131.647 us | 0.5401 us | 0.4510 us | 131.588 us | 0.2441 | - | - | 24000 B |
| PrioritySet | 1000 | 238.184 us | 1.0282 us | 0.9618 us | 238.457 us | - | - | - | - |
| PairingHeap | 1000 | 293.236 us | 0.9396 us | 0.8789 us | 293.257 us | 0.4883 | - | - | 48000 B |
| PrioritySet | 10000 | 3,035.982 us | 12.2952 us | 10.8994 us | 3,036.985 us | - | - | - | 1 B |
| PairingHeap | 10000 | 3,388.685 us | 16.0675 us | 38.1861 us | 3,374.565 us | - | - | - | 480002 B |
| PrioritySet | 1000000 | 841,406.888 us | 16,788.4775 us | 15,703.9522 us | 840,888.389 us | - | - | - | 288 B |
| PairingHeap | 1000000 | 989,966.501 us | 19,722.6687 us | 30,705.8191 us | 996,075.410 us | - | - | - | 48000448 B |

On a separate note, has their been discussion/feedback on the lack of stability being an issue (or non-issue) for people's usecases?

has their been discussion/feedback on the lack of stability being an issue (or non-issue) for people's usecase

None of the implementations guarantee stability, however it should be pretty straightforward for users to obtain stability by augmenting the ordinal with insertion order:

var pq = new PriorityQueue<string, (int priority, int insertionCount)>();
int insertionCount = 0;

foreach (string element in elements)
{
    int priority = 42;
    pq.Enqueue(element, (priority, insertionCount++));
}

To summarize some of my previous posts, I've been trying to identify what a popular .NET priority queue would look like, so I went through the following data:

  • Common priority queue usage patterns in .NET source code.
  • PriorityQueue implementations in core libraries of competing frameworks.
  • Benchmarks of various .NET priority queue prototypes.

Which yielded the following takeaways:

  • 90% of priority queue use cases do not require priority updates.
  • Supporting priority updates results in a more complicated API contract (requiring either handles or element uniqueness).
  • In my benchmarks, implementations that support priority updates are 2-3x slower compared to ones that don't.

Next Steps

Going forward, I propose we take the following actions for .NET 6:

  1. Introduce a System.Collections.Generic.PriorityQueue class that is simple, caters to the majority of our user requirements and is as efficient as possible. It'll use an array-backed quaternary heap and will not support priority updates. A prototype of the implementation can be found here. I will be creating a separate issue detailing the API proposal shortly.

  2. We recognize the need for heaps that support efficient priority updates, so will keep working towards introducing a specialized class that addresses this requirement. We are evaluating a couple of prototypes [1,2] each with their own sets of trade-offs. My recommendation would be to introduce this type at a later stage, since more work is needed to finalize the design.

At this moment, I would like to thank the contributors to this thread, in particular @pgolebiowski and @TimLovellSmith. Your feedback has played a huge role in guiding our design process. I hope to keep receiving your input as we iron out the design for the updatable priority queue.

Going forward, I propose we take the following actions for .NET 6: [...]

Sounds good :)

Introduce a System.Collections.Generic.PriorityQueue class that is simple, caters to the majority of our user requirements and is as efficient as possible. It'll use an array-backed quaternary heap and will not support priority updates.

If we have the decision by the codebase owners that this direction is approved and desired, can I continue leading the API design for that bit and provide the final implementation?

At this moment, I would like to thank the contributors to this thread, in particular @pgolebiowski and @TimLovellSmith. Your feedback has played a huge role in guiding our design process. I hope to keep receiving your input as we iron out the design for the updatable priority queue.

It was quite a journey :D

The API for System.Collections.Generic.PriorityQueue<TElement, TPriority> has just been approved. I've created a separate issue to continue our conversation on a potential heap implementation that supports priority updates.

I'm going to close this issue, thank you all for your contributions!

Maybe someone can do a write up on this journey! A whole 6 years for one API. :) Any chance to win a Guinness?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

v0l picture v0l  ·  3Comments

GitAntoinee picture GitAntoinee  ·  3Comments

jamesqo picture jamesqo  ·  3Comments

omajid picture omajid  ·  3Comments

aggieben picture aggieben  ·  3Comments