I've stumbled on a bug in the implementation of RedBlackTree, which results in an occasional ClassCastException when calling methods like intersection or difference. In particular, the following test throws:
final RedBlackTree<Integer> t1 =
RedBlackTree.ofAll(Integer::compare, Vector.of(1462193440, 0,
2147483647, -2147483648, 0, 637669539, -1612766076, -1, 1795938819, 1,
0, -420800448, -2147483648, 497885405, 0, 1084073832, 1, 1439964148,
1961646330));
final RedBlackTree<Integer> t2 =
RedBlackTree.ofAll(Integer::compare, Vector.of(-1, 1, 2147483647,
-1434983536, -2147483648, -1452486079, 1365799971, 231691980,
-1780534767, -2147483648, 1448658704, 0, 1526591298));
final RedBlackTree<Integer> t3 =
RedBlackTree.ofAll(Integer::compare, Vector.of(-1188825722, -582974613));
assertThat(t1.intersection(t2).intersection(t3).isEmpty());
java.lang.ClassCastException: io.vavr.collection.RedBlackTreeModule$Empty cannot be cast to io.vavr.collection.RedBlackTreeModule$Node
at io.vavr.collection.RedBlackTreeModule$Node.joinGT(RedBlackTree.java:703)
at io.vavr.collection.RedBlackTreeModule$Node.joinGT(RedBlackTree.java:703)
at io.vavr.collection.RedBlackTreeModule$Node.join(RedBlackTree.java:692)
at io.vavr.collection.RedBlackTreeModule$Node.split(RedBlackTree.java:802)
at io.vavr.collection.RedBlackTree.intersection(RedBlackTree.java:172)
at io.vavr.collection.RedBlackTree.intersection(RedBlackTree.java:176)
Thanks for reporting!
First analysis:
Our algorithm is inspired by/based on: https://github.com/kazu-yamamoto/llrbtree/blob/master/Data/Set/RBTree.hs#L482
Given the example, in joinGT we observe:
n1: (B:0 (B:-420800448 R:-1) (B:1 R:497885405)), n2: (B:1084073832), h2: 1, n1.blackHeight: 2
n1: (B:1 R:497885405), n2: (B:1084073832), h2: 1, n1.blackHeight: 1
n1: (B:-1 R:0), n2: (B:2147483647), h2: 0, n1.blackHeight: 1
n1: (R:0), n2: (B:2147483647), h2: 0, n1.blackHeight: 1
which means that a RED node with a single value 0 and no children has black hight 1, which is suspicious. I would have expected 0.
Now the questions are
1) does the hight calculation fail somewhere?
2) if yes, where?
I'm busy tomorrow and will get my hands on it at evening.
The empty node has BLACK color so the RED node in question should have empty children and thus black-height 1.
I suspect the problem is with one of the building blocks of the intersection algorithm like join/split. Will dig up my old algorithms and data structures text book...
Thanks @roberterdin. I will dig out the property checks I once created in order to check the constraints of Red/Black Trees. If we perform operations on random Red/Black Trees, we will se if any other operation suffers from that bug.
An easy to implement fix for a 0.9.2 patch release could be to do the intersection by building a new tree using insert operations. This is only for the case we can't fix the algorithm. I also spent some hours and double-checked the original Haskell implementation.
I added an issue and asked for help checking the counter-example in the Haskell version (see linked issue above).
Here is a shorter/more readable example:
final RedBlackTree<Integer> t1 =
RedBlackTree.ofAll(Integer::compare, Vector.of(8, 14, 0, 7, 9, 3));
final RedBlackTree<Integer> t2 =
RedBlackTree.ofAll(Integer::compare, Vector.of(7, 9, 14, 6, 0, 5, 11, 10, 4, 12, 8, 13));
final RedBlackTree<Integer> t3 =
RedBlackTree.ofAll(Integer::compare, Vector.of(1, 2));
assertThat(t1.intersection(t2).intersection(t3).isEmpty());
On the other hand, this test passes correctly:
assertThat(t1.intersection(t2.intersection(t3)).isEmpty());
I played around with it a bit yesterday and noticed a few things.
1) It happens hardly ever. I ran
IntStream.iterate(0, i -> i)
.parallel()
.unordered()
.forEach(i -> {
Random r = ThreadLocalRandom.current();
List<Integer> s1 = List.fill(r.nextInt(6), () -> r.nextInt(15));
List<Integer> s2 = List.fill(r.nextInt(6), () -> r.nextInt(15));
final RedBlackTree<Integer> t1 = RedBlackTree.ofAll(Integer::compare, s1);
final RedBlackTree<Integer> t2 = RedBlackTree.ofAll(Integer::compare, s2);
try {
t1.intersection(t2);
}catch (ClassCastException ex){
System.out.println("t1: " + t1);
System.out.println("t2: " + t2);
}
});
For > 10⁹ iterations and it did not happen. I ran it again without random tree size, i.e. a tree with size 5 intersected with a tree with size 2, which is when it happens in the above example. There the error happened after 200 odd million iterations.
2) https://github.com/vavr-io/vavr/blob/master/vavr/src/main/java/io/vavr/collection/RedBlackTree.java#L639
This creates trees with incorrect black-heights. It creates Non-empty nodes with black-height 0 which should not be possible, right?
It is not necessarily the source of the error but probably also something worth investigating.
3) A ∩ B = A - (A - B), which would be slow but also an easy fix.
Thanks Robert! Point 2) is interesting. We should solve it for a neartime 0.9.2 patch release with 3).
For 1.0.0 RedBlackTree will be re-implemented with #1535. We will not have any casts between (non-empty) Node and Empty RB-Tree. This will help a lot.
Unfortunately, trading intersection for difference doesn't work since difference has the same problem:
RedBlackTree<Integer> t1 = RedBlackTree.ofAll(Integer::compare, List.of(20, 6, 8, 10, 15, -5, 14, 5, 9));
RedBlackTree<Integer> t2 = RedBlackTree.ofAll(Integer::compare, List.of(19, 0, 9));
RedBlackTree<Integer> t3 = RedBlackTree.of(Integer::compare, 0);
assertThat(!t1.difference(t2).difference(t3).isEmpty());
@roberterdin Good that you've found an example with two sets. With my generator, I have only been able to find examples involving three or more sets.
I think difference and intersection are correct. The root of the problem is probably located in tree construction/inserting new nodes. I‘ve done more tests and the coloring is suspicious. I will check that.
I think I found the issue. The wrong black-heights cause the joinLT and joinGT to fail.


The call you can see should end up in the if statement but instead ends up in the else which causes the exception. Haven't had time to look at why these heights are wrong. Just wanted to prevent you from investigating something that's not relevant.
edit:
Removing the -1 here https://github.com/vavr-io/vavr/blob/master/vavr/src/main/java/io/vavr/collection/RedBlackTree.java#L639 seems to fix the issue. Might introduce new bugs or incorrect trees though ;) (Also as far as I can tell it is there in the original code... my Haskell is non-existent though)
edit2: don't think removing the -1 is a good idea... check the provided fix instead
Please verify if https://github.com/roberterdin/vavr/commit/0a564fe3a607d6647566e7136c31ada79d0a2693 fixes it.
@roberterdin Nice, thank you! I hope I'm able to do it this evening because I'm on a family trip.
@roberterdin Great finding!! 😃
Please find my comments here: roberterdin/vavr@0a564fe
@roberterdin In addition to your solution, please add the following to RedBlackTreeTest.java directly before the line // union():
/*
* > let tree1 = fromList [8, 14, 0, 7, 9, 3]
* > let tree2 = fromList [7, 9, 14, 6, 0, 5, 11, 10, 4, 12, 8, 13]
* > tree1 `intersection` tree2
* Node B 2 (Node B 1 (Node R 1 Leaf 0 Leaf) 7 Leaf) 8 (Node B 1 (Node R 1 Leaf 9 Leaf) 14 Leaf)
* > printSet (tree1 `intersection` tree2)
* B 8 (2)
* + B 7 (1)
* + R 0 (1)
* +
* +
* +
* + B 14 (1)
* + R 9 (1)
* +
* +
* +
*/
@Test
public void shouldPassIntersectionRegression1_Issue2098() {
final RedBlackTree<Integer> tree1 = of(8, 14, 0, 7, 9, 3);
final RedBlackTree<Integer> tree2 = of(7, 9, 14, 6, 0, 5, 11, 10, 4, 12, 8, 13);
final String actual = tree1.intersection(tree2).toString();
final String expected = "(B:7 B:0 (R:9 B:8 B:14))"; // TODO: should be "(B:8 (B:7 R:0) (B:14 R:9))"
assertThat(actual).isEqualTo(expected);
}
/*
* > let tree1 = fromList [8, 14, 0, 7, 9, 3]
* > let tree2 = fromList [7, 9, 14, 6, 0, 5, 11, 10, 4, 12, 8, 13]
* > let tree3 = fromList [1, 2]
* > (tree1 `intersection` tree2) `intersection` tree3
* Leaf
* > tree1 `intersection` (tree2 `intersection` tree3)
* Leaf
*/
@Test
public void shouldPassIntersectionRegression2_Issue2098() {
final RedBlackTree<Integer> tree1 = of(8, 14, 0, 7, 9, 3);
final RedBlackTree<Integer> tree2 = of(7, 9, 14, 6, 0, 5, 11, 10, 4, 12, 8, 13);
final RedBlackTree<Integer> tree3 = of(1, 2);
final RedBlackTree<Integer> actual = tree1.intersection(tree2).intersection(tree3);
final RedBlackTree<Integer> expected = tree1.intersection(tree2.intersection(tree3));
assertThat(actual).isEqualTo(expected);
}
/*
* > let tree1 = [1462193440, 0, 2147483647, -2147483648, 0, 637669539, -1612766076, -1, 1795938819, 1, 0, -420800448, -2147483648, 497885405, 0, 1084073832, 1, 1439964148, 1961646330]
* > let tree2 = [-1, 1, 2147483647, -1434983536, -2147483648, -1452486079, 1365799971, 231691980, -1780534767, -2147483648, 1448658704, 0, 1526591298]
* > tree1 `intersection` tree2
* Node B 2 (Node B 1 (Node R 1 Leaf (-2147483648) Leaf) (-1) Leaf) 0 (Node B 1 (Node R 1 Leaf 1 Leaf) 2147483647 Leaf)
*/
@Test
public void shouldPassIntersectionRegression3_Issue2098() {
final RedBlackTree<Integer> tree1 = of(1462193440, 0, 2147483647, -2147483648, 0, 637669539, -1612766076, -1, 1795938819, 1, 0, -420800448, -2147483648, 497885405, 0, 1084073832, 1, 1439964148, 1961646330);
final RedBlackTree<Integer> tree2 = of(-1, 1, 2147483647, -1434983536, -2147483648, -1452486079, 1365799971, 231691980, -1780534767, -2147483648, 1448658704, 0, 1526591298);
final String actual = tree1.intersection(tree2).toString();
final String expected = "(B:-1 B:-2147483648 (R:1 B:0 B:2147483647))"; // TODO: should be "(B:0 (B:-1 R:-2147483648) (B:2147483647 R:1))"
assertThat(actual).isEqualTo(expected);
}
I think I finally got it. There is a + 1 missing for the black height at https://github.com/vavr-io/vavr/blob/master/vavr/src/main/java/io/vavr/collection/RedBlackTree.java#L747
Are you sure your tests are correct? I get shouldPassIntersectionRegression1: "(B:8 (B:7 R:0) (B:14 R:9))", your TODO says "(B:8 (B:7 R:0) (B:14 R:0))" but I reckon that's a typo?
@roberterdin nice! yes, thx, it is a typo. I corrected it in the comment above.
Most helpful comment
I played around with it a bit yesterday and noticed a few things.
1) It happens hardly ever. I ran
For > 10⁹ iterations and it did not happen. I ran it again without random tree size, i.e. a tree with size 5 intersected with a tree with size 2, which is when it happens in the above example. There the error happened after 200 odd million iterations.
2) https://github.com/vavr-io/vavr/blob/master/vavr/src/main/java/io/vavr/collection/RedBlackTree.java#L639
This creates trees with incorrect black-heights. It creates Non-empty nodes with black-height 0 which should not be possible, right?
It is not necessarily the source of the error but probably also something worth investigating.
3) A ∩ B = A - (A - B), which would be slow but also an easy fix.