Vavr: Port missing methods from java.util.TreeMap

Created on 21 Apr 2016 · 33Comments · Source: vavr-io/vavr

I am really missing the java.util.TreeMap methods that would create submaps efficiently from an existing javaslang.collection.TreeMap:

- TreeMap<K, V> subMap(K fromKey, K toKey)
- TreeMap<K, V> subMap(K fromKey, boolean fromInclusive, K toKey, boolean toInclusive)
- TreeMap<K, V> headMap(K toKey)
- TreeMap<K, V> headMap(K toKey, boolean inclusive)
- TreeMap<K, V> tailMap(K fromKey)
- TreeMap<K, V> tailMap(K fromKey, boolean inclusive)

Similarly, I think it would be good to port the methods that find the higher/lower keys efficiently:

- Tuple2<K, V> higher(K key)
- K higherKey(K key)
- Tuple2<K, V> lower(K key)
- K lowerKey(K key)
- Tuple2<K, V> ceiling(K key)
- K ceilingKey(K key)
- Tuple2<K, V> floor(K key)
- K floorKey(K key)

The higher()/lower()/ceiling()/floor() methods are replacements of the higherEntry()/lowerEntry()/ceilingEntry()/floorEntry() java.util.TreeMap counterparts.

What do you think?
I would be happy to help implement this.

feature «vavr-collection»

Source

eduardmanas

👍3

Most helpful comment

Is there any reason why we used head() instead of first()?

we try to be as close as possible to Scala API

ruslansennov on 22 Apr 2016

👍2

All 33 comments

Hi Eduard,
great Idea! I'm really looking forward to your PR!
Let's call the methods *Entry (e.g. higherEntry) to maximize the recognition value / to align it to the original Java 8 method names. (It does not make a difference that we return a Tuple2 instead of an Entry.)

danieldietrich on 22 Apr 2016

That's great, let me give it a try. I'll create a NavigableMap interface similar to the java.util.

The reason why I suggested to drop "Entry" from the method names is to keep it consistent with the current naming in the javaslang collection. A lot of methods return Tuple2 (equivalent of Map.Entry) and we do not name them "Entry". Examples are get(), head(), last(), min(), max(). In the java.util.Map hierarchy, these methods would return the value and not the Map.Entry.

I get your point we need to maximise recognition with the original Java 8 method names. But also we need to balance this with building a cohesive library with consistent names.

So what do you think, "Entry" or no "Entry"? ;)

eduardmanas on 22 Apr 2016

Also, I noticed an inconsistency with the head()/tail() method names. head() returns the first item of the map, but tail() returns a TreeMap without the first element.

Also, last() returns the last item in the Map, but there is no natural counterpart called first() (it is currently called head())

What if we deprecate both head() and tail() for inconsistent, and we add the following methods to the new NavigableMap interface:

* TreeMap<K, V> headMap() // returns the head of the tree without the last item - equivalent of headMap(last(), false)
* TreeMap<K, V> tailMap() // returns the tail of the tree without the first item - equivalent of tailMap(first(), false)
* first() // Returns the first item - equivalent of min().getOrElseThrow()

Update:
On a closer look, it seems the inconsistency comes from the Traversable interface, where we use head() to get the first item and last() to get the last item. Is there any reason why we used head() instead of first()?

Similarly, it would feel more natural if head() had been the natural counterpart of the existing tail() returning Traversable<T>.

eduardmanas on 22 Apr 2016

👍1

Is there any reason why we used head() instead of first()?

we try to be as close as possible to Scala API

ruslansennov on 22 Apr 2016

👍2

head↔last (first↔last)
init↔tail (all - last)↔(all - first)

I agree that the naming is quote unfortunate, but functional programming is filled with these (e.g. List.map will return a List (not a Map), filter (will this return those that match or those that don't?) instead of find ... or simply Java's String instead of Text or something... and let's not get started on flatMap or foldRight).

Groovy dared to rename some of these -- which make a lot more sense in my opinion --, but since Javaslang is meant as a Scala alternative for Java (I think), it might be best to appeal to existing Scala users.

ps. I would vote for no *Entry, we're not following Java conventions (and not returning an Entry)

paplorinc on 22 Apr 2016

You are right. The current javaslang implementation is a port of the Traversable Scala trait. I think we should keep it as it is. It will confuse some Java programmers at the beginning (like me!), but at least it is consistent with the Scala port.

It anything, we might want to consider adding first()/firstOption() to Traversable as an alternative to head()/headOption(). It would feel more natural to people coming from a Java background.

eduardmanas on 22 Apr 2016

I prefer not to provide alternatives to give a clear path to do things. We currently have one exception: length() and size()

I've read a bit ... Daniel C. Sobral writes that a persistent version can't have back references. Therefore we need an ordering. The ordered collections of Javaslang are based on a RedBlackTree. Is there an efficient way to seach neighbors in a binary tree?

rmleon/GorillasCollection has an example of a NavigableMap interface.

danieldietrich on 22 Apr 2016

:+1: for not naming the methods *Entry

danieldietrich on 22 Apr 2016

I have two implementations to calculate the floor in RedBlackTree:

Option 1: "uglier" imperative approach and faster (but not too ugly)
Option 2: "nicer" functional approach and slower (but not too slow)

I have tested with trees of 100, 1000, 10000 and 100000 items, and Option 1 is consistently 10-15% faster. Having said that, both options have complexity O(log N), so they are both relatively fast.

Any preferences?

Option 1

        @Override
        public Option<T> floor(T value) {
            Option<T> candidate = Option.none();
            RedBlackTree<T> current = this;
            while (!current.isEmpty()) {
                final Node<T> node = (Node<T>) current;
                final int result = empty.comparator.compare(value, node.value);
                if (result == 0) {
                    return Option.of(value);
                }
                if (result < 0) {
                    current = node.left;
                }
                else {
                    candidate = Option.of(node.value);
                    current = node.right;
                }
            }
            return candidate;
        }

Option 2

        @Override
        public Option<T> floor(T value) {
            return floor(value, Option.none());
        }

        @Override
        public Option<T> floor(T value, Option<T> candidate) {
            final int result = empty.comparator.compare(value, this.value);
            if (result == 0) {
                return Option.of(value);
            }
            if (result < 0) {
                return left.floor(value, candidate);
            }
            return right.floor(value, Option.of(this.value));
        }

eduardmanas on 23 Apr 2016

If I understood it correctly, the only difference is that you did a manual tail call optimization in the first example, right?

Since it's logarithmic, you probably won't get a stack overflow, I'm personally ok with both.
I'm not sure yet, how this will return the greatest element in this set less than or equal to the given element, but will investigate once it's pushed :).

Notes / personal preferences:

Maybe change the if's to else ifs, where applicable to signal to the user that they're mutually exclusive :)
In the first case the while can be a for, if you don't mind leaving the update part empty
not sure what the empty.comparator is exactly (shouldn't there be a local comparator), but can we extract it outside the loop?
could we reorder the ifs to <, =, >
maybe try a switch (normalized by signum?), to signal that the subject of the comparison is the same?

paplorinc on 23 Apr 2016

👍1

I compared the floor method performance against java.util.TreeMap. TreeMap is about 20% faster than Option1, and about 30% faster than Option2.

I created another version that uses nulls instead of Option inside of the loop (Option 0). That option is 10% faster than Option 1, and just 10% slower than java.util.TreeMap.

The java.util.TreeMap implementation is similar to Option0/Option1. These are the main differences:

TreeMapdoes not make use of empty nodes
TreeMapcan access the parent node

I think most of that 10% performance is due to the extra processing of empty nodes in our RedBlackTrees.

I am going to proceed to implement the remaining methods using the Option 0 approach.

Option 0

``` @Override
public Option floor(T value) {
T candidate = null;
RedBlackTree current = this;
while (!current.isEmpty()) {
final Node node = (Node) current;
final int result = empty.comparator.compare(value, node.value);
if (result < 0) {
current = node.left;
}
else if (result > 0) {
candidate = node.value;
current = node.right;
}
else {
return Option.of(value);
}
}
return candidate == null ? Option.none() : Option.of(candidate);
}

I just noticed you are using JMH for Benchmarking, so here the results:

Benchmark (containerSize) Mode Cnt Score Error Units
TreeMapBenchmark.floor 10 thrpt 10 104.216 ± 17.007 ops/us
TreeMapBenchmark.floor:floorJava TreeMapBenchmark.floor:floorOption0 TreeMapBenchmark.floor:floorOption1 TreeMapBenchmark.floor:floorOption2 TreeMapBenchmark.floor 100 thrpt 10 64.840 ± 2.076 ops/us
TreeMapBenchmark.floor:floorJava TreeMapBenchmark.floor:floorOption0 TreeMapBenchmark.floor:floorOption1 TreeMapBenchmark.floor:floorOption2 TreeMapBenchmark.floor 1000 thrpt 10 45.472 ± 1.646 ops/us
TreeMapBenchmark.floor:floorJava TreeMapBenchmark.floor:floorOption0 TreeMapBenchmark.floor:floorOption1 TreeMapBenchmark.floor:floorOption2 TreeMapBenchmark.floor 10000 thrpt 10 29.929 ± 0.567 ops/us
TreeMapBenchmark.floor:floorJava TreeMapBenchmark.floor:floorOption0 TreeMapBenchmark.floor:floorOption1 TreeMapBenchmark.floor:floorOption2 10 thrpt 10 28.811 ± 6.408 ops/us
10 thrpt 10 27.011 ± 5.214 ops/us
10 thrpt 10 24.929 ± 3.590 ops/us
10 thrpt 10 23.466 ± 3.486 ops/us
100 thrpt 10 19.670 ± 0.820 ops/us
100 thrpt 10 16.900 ± 0.187 ops/us
100 thrpt 10 15.502 ± 1.287 ops/us
100 thrpt 10 12.769 ± 0.167 ops/us
1000 thrpt 10 13.127 ± 0.475 ops/us
1000 thrpt 10 11.847 ± 0.387 ops/us
1000 thrpt 10 11.131 ± 0.434 ops/us
1000 thrpt 10 9.367 ± 0.410 ops/us
10000 thrpt 10 8.421 ± 0.154 ops/us
10000 thrpt 10 7.766 ± 0.375 ops/us
10000 thrpt 10 7.116 ± 0.208 ops/us
10000 thrpt 10 6.627 ± 0.089 ops/us

```

eduardmanas on 23 Apr 2016

👍1

Hi @eduardmanas,

that sounds great! +1 for implementing Option0. Internally Scala does also prefer more optimized implementations over the most concise/purely function. It is absolutely ok to use null!

I'm not 100% sure if we really need a NavigableMap interface. Since TreeMap is the only implementation we could also place the methods within SortedMap. Is there a chance that there will be a SortedMap implementation that will not implement NavigableMap. Does there exist anywhere out there such an implementation?

Btw - if you don't mind, I would also commit the benchmarks of floorJava and floorOption0. This will be a good showcase.

danieldietrich on 23 Apr 2016

@paplorinc +1 for using else if and ordering the cases by <, == and >. /cc @eduardmanas

danieldietrich on 23 Apr 2016

@danieldietrich, I was also thinking whether there is any point having a NavigableMap interface.

Not sure why anybody would want to have a non-navigable SortedMap, as the whole point of having the map sorted is that you can navigate it. For starters there is a lot of overlap between the java.util.SortedMap and java.util.NavigableMap. SortedMap, for example, has many navigational methods such as firstKey/headMap/lastKey/subMap/tailMap that also belong to the NavigableMap interface.

This conceptual inconsistency is also clearly shown in the JDK implementation. All JDK implementation of SortedMap also implement NavigableMap, and vice-versa. Given NavigableMap extends SortedMap, if this was a clean split of functionality, implementations should just implement NavigableMap and not SortedMap

Last but not least, the Scala collections hierarchy does not have a NavigableMap interface.

So +1 to place the new methods in the javaslang `SortedMap interface.

eduardmanas on 24 Apr 2016

@eduardmanas Yes, that makes sense. Let's place it in the SortedMap interface as long there is no _real_ reason to split that interface.

danieldietrich on 24 Apr 2016

I agree with using the else if (already implemented in Option 0) ;)

As per <, ==, >, I am not so sure. I prefer <, >, ==, and here is why.

Understanding how to navigate Red-Black trees is complex stuff. Anybody looking at this code will probably have to bang his/her head against a table for a while until everything starts to make sense anyway.

When looking at the condition inside of the while, you are conceptually in a node and you are presented with three options depending whether the value is same, lower or higher than the value of the node.

The option that everybody will tackle first is the simplest case when values are the same. This case is straight forward. You found the node you were looking for, just return it. At this point, people will probably want to go for a break faced with the daunting thought of having to tackle the other two scenarios. If the Comparator returns -1, does that mean the value is lower or higher than the node? If the value is lower, do I go left or right? Do I keep track of the candidate? Have I found my value?

In other words, the < and > cases are similar in terms of complexity and reasoning, and therefore it makes sense to have them one after the other. The == case is much easier to understand and completely different in terms of reasoning. If we put == in the middle, we would be breaking this flow.

eduardmanas on 24 Apr 2016

I prefer <, >, ==, and here is why.

Go ahead and choose the solution that makes most sense. It is good to keep rules in mind but we should choose different ways to do s.th. when it makes sense. There is no 'on size fits all'.

danieldietrich on 24 Apr 2016

👍1

Quick Question:

Should min()/max() use the natural comparator or the tree comparator?

The javadoc in Traversable states min()/max() should use the natural comparator, so using the (expected) tree comparator would be breaking that contract. This seems wrong/counter-intuitive for any sorted collection.

head()/last(), on the other hand, use the tree comparator.

This is especially troublesome with descending maps. For example, if we have this tree

TreeMap<Integer, Integer> tree = TreeMap.of(Tuple.of(1, 1), Tuple.of(2, 2), Tuple.of(3, 3));

Then we get the below as expected:

tree.min()  // Option.of(Tuple.of(1, 1))
tree.max()  // Option.of(Tuple.of(3, 3))
tree.head()  // Tuple.of(1, 1)
tree.last()  // Tuple.of(3, 3)

However, when we change the comparator then we get weird things happening with min()/max():

tree.descendingMap().min()  // Option.of(Tuple.of(1, 1))
tree.descendingMap().max()  // Option.of(Tuple.of(3, 3))
tree.descendingMap().head()  // Tuple.of(3, 3) 
tree.descendingMap().last()  // Tuple.of(1, 1)

eduardmanas on 27 Apr 2016

@eduardmanas Yes, I see. Changing the contract of Traversable min/max will change the semantics but it would be more intuitive to use the underlying comparator, if present.

I think this can be seen as bug. Very nice catch! I created an additional issue: #1304

danieldietrich on 27 Apr 2016

By the way, there is a bug in the current implementation of TreeMap.last() and TreeSet.last(), as they are backed by Traversable.max(). It is broken for any TreeMap/TreeSet not using the default comparator.

I'll fix them as part of my PR.

eduardmanas on 27 Apr 2016

Nice, thank you!

danieldietrich on 27 Apr 2016

@eduardmanas TreeMap and TreeSet use the RedBlackTree as backend. I think there the min (minimum) and max (maximum) functions work correct. These should be used...

danieldietrich on 27 Apr 2016

That's right @danieldietrich! We should be using the RedBlackTree.min()/RedBlackTree.max()

Using Traversable.max() is also very inefficient as we need to traverse the whole tree to get to the last node.

I was planning to override min()/max() to use the backing RedBlackTree to make them more efficient when I realised the "little note" in the javadocs.

I'll fix #1304 as part of this PR. I think only TreeMap and TreeSet impacted. Do you know of any others?

eduardmanas on 27 Apr 2016

Great! No, there are no others.

danieldietrich on 27 Apr 2016

Would it be a capital punishment if I used a java.util.Deque instead of a javaslang Stack inside of the RedBlackTree? I am experimenting trying to improve performance, and java.util.Deque seems to help in some scenarios.

Also, are iterators meant to be thread-safe in javaslang collections?

eduardmanas on 4 May 2016

To put it into context, when using java.util.ArrayDeque the performance of the javaslang implementation goes up by 40% when iterating over a TreeMap.

I don't think it is an issue with the javaslang.collection.Stack implementation, as when using java.util.LinkedList I get similar performances. The difference I think is that java.util.ArrayDequeu uses an array as the backing structure for the stack, and hence avoids having to create so many objects and linking them while iterating.

I don't see any issues using java.util.ArrayDeque unless the iterator needs to be thread-safe. For what I can see the current iterator implementation using javaslang.collection.Stack is not thread-safe anyway, as it is possible for two threads to retrieve the same item from the iterator.

eduardmanas on 4 May 2016

used a java.util.Deque instead of a javaslang Stack inside of the RedBlackTree

If you can guarantee that the mutability won't leak out and you don't need to synchronize the object for certain operations, it should be ok. Also, the readability of the code shouldn't change too much.

Generally I tend to use mutability inside a method only, builder-pattern style.

paplorinc on 4 May 2016

Sorry for being so silent the last days. I also think that using an ArrayDequeue should be ok if the mutability does not leak out. Please keep in mind that we get thread-safety for free when using immutable, persistent data structures. We have to ensure that different versions (e.g. when adding or removing elements) of our collections are still persistent (i.e. the previous version does not change).

Iterators are inherently mutable and (in our case) not thread-safe. Here it is ok to use the fastest implementation possible.

danieldietrich on 6 May 2016

Hello guys! Is the implementation on this feature stalled? Any plans on going forward?

nfekete on 2 Feb 2017

https://github.com/javaslang/javaslang/pull/1317 addresses this issue, but it's in review for almost a year.
@eduardmanas, @danieldietrich, what's the reason for that?

paplorinc on 2 Feb 2017

@nfekete @paplorinc @eduardmanas I'm the reason for the stall - more specifically the lack of time. Eduard did a great job. I was not satisfied with the changes to the low-level data structure RedBlackTree and worked on an alternative solution. I had a 'nearly finished' version at hand that only missed one method (that could be shipped in a preliminary O(n) version). Eduard meanwhile came up with the suggestion to optimize the RBT in the way that the empty tree is represented by null. I wanted to fix that first. Now I only need time.

Here is the current state of the branch: https://github.com/danieldietrich/javaslang/blob/RedBlackTree/javaslang/src/main/java/javaslang/collection/RedBlackTree.java#L1204

If we can get rid of the Empty RBT, then we do not need AbstractTree and Node any more. There will be only one class RedBlackTree (and the DescendingTreeView aka ReverseRedBlackTreeView).

danieldietrich on 5 Feb 2017

Efficient headMap, tailMap, and subMap are the essential methods I would expect when working with a sorted map - while looking around vavr, it looks like these are not implemented?

Looking at the RedBlackTree#find implementation, it seems like with a bit of additional work, RBT could support these types of range-based operations?