Hi,
today I've stumbled across the sort order of arrays using a block.
I would expected that [1, 2] == [1, 2].sort { 0 } to be true.
It is true for at least Ruby and Java. In Crystal it's false.
Is it a bug or just an undefined behaviour of quicksort?
Kind regards,
Peter
I inquired about this a few days ago. Ultimately this is not a bug, but should perhaps be reconsidered.
The term you're looking for is stable sort. Indeed, many languages' built-in sorting algorithms are stable, even though this property slightly limits the performance of sorting. C++, on the other hand, made a separate stable_sort.
I think the most popular sorting method is Timsort, used in Java and Python, and it is stable. Quicksort, which Crystal uses, does not have this property.
bcardiff shared this example code https://gist.github.com/5395bfaef35ac1d14a1a
This adds the original indices of the items to the key tuples to keep the original order.
We just have a hand-made quicksort to be able to sort stuff, but we didn't actually though a lot about which algorithm to use and if we want it to be stable. So asking this here is a good chance to start the discussion :smile_cat:
(that is, because we wanted to bootstrap the compiler and it uses sort...)
It seems that after 53be65dee36a8dbcdb2d3dc1fde17202bd024acb sorting Array is stable:
$ crystal -v
Crystal 0.19.4 [7f82f79] (2016-10-07)
$ crystal eval "p [1, 2] == [1, 2].sort { 0 }"
false
$.build/crystal -v
Crystal ci (2016-11-19)
$ .build/crystal eval "p [1, 2] == [1, 2].sort { 0 }"
true
I suspect this issue is resolved then and we can close it?
/cc @c910335
Maybe it's worth checking more thoroughly, seeing as 2 different algorithms are used...
@BlaXpirit Of course, good point! %)
@BlaXpirit You are right. It's stable until a size of 16:
$ .build/crystal eval "a = (1..16).to_a; p a == a.sort { 0 }"
true
$ .build/crystal eval "a = (1..17).to_a; p a == a.sort { 0 }"
false
So, it's not stable enough ;)
@splattael
Actually, sort in Ruby is not guaranteed to be stable.
https://docs.ruby-lang.org/en/trunk/Array.html#method-i-sort
$ ruby --version
ruby 2.3.2p217 (2016-11-15 revision 56796) [x86_64-darwin16]
$ ruby -e "puts (0...7).to_a.sort { 0 }.to_s"
[3, 1, 2, 0, 4, 5, 6]
@c910335 So, Ruby's behaviour has changed in a patch level :scream:
In Ruby 2.3.1 it is stable:
$ ruby -v
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
$ ruby -e 'p (0...7).to_a.sort { 0 }'
[0, 1, 2, 3, 4, 5, 6]
The sentence The result is not guaranteed as stable. When comparison of two elements returns 0, the order of the elements is unpredictable. is new in 2.3.2.
It wasn't there in 2.3.0 https://docs.ruby-lang.org/en/2.3.0/Array.html#method-i-sort.
TBH:
¯_(ツ)_/¯
I wish it was stable ;)
On my machine Ruby 2.3.2 is still stable :open_mouth:
$ ruby -v
ruby 2.3.2p217 (2016-11-15 revision 56796) [x86_64-linux]
$ ruby -e 'p (0...7).to_a.sort { 0 }'
[0, 1, 2, 3, 4, 5, 6]
(╯°□°)╯︵ ┻━┻
Computering is hard. Let's go shopping.
I did some research which I have should done before submitting this issue.
Ruby's sort is not stable. It never was. (Maybe it was stable by accident).
Some references:
I am closing this issue.
Sorry for the repetitive noise :-(
@splattael nice work 👍
Go for example provides two sort functions, one stable and one unstable. We could do the same, so if you really need a stable sort you can do it.
https://golang.org/pkg/sort/#Sort
https://golang.org/pkg/sort/#Stable
Whoa, I always assumed ruby's sort was stable. I've even always used it as if it was stable
Even just tried it:
$ ruby -ve "puts (0...7000).to_a.sort { 0 } == (0...7000).to_a"
ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux-gnu]
true
$ crystal eval "puts (0...7000).to_a.sort { 0 } == (0...7000).to_a"
false
but in OS X
$ ruby -ve "puts (0...70).to_a.sort { 0 } == (0...70).to_a"
ruby 2.3.7p456 (2018-03-28 revision 63024) [universal.x86_64-darwin17]
false
Weird. Appears it just calls the C level qsort_r function: https://linux.die.net/man/3/qsort
Which is different on OS's.
That's a bit confusing, to end users, in my mind.
ref: https://www.ruby-forum.com/t/sort-by-is-not-stable/198761
I think java avoids confusion by sorting "natives" (like ints) using quicksort (since reordering is not detectable), and sorts class objects using timsort, so in the end everything "looks" stable, by default.
Also wonder if Timsort is faster than the workarounds that people propose on ruby-core (#with_index etc.) which don't seem very fast in some benchmarks I saw: https://bugs.ruby-lang.org/issues/1089 ("5x slower")
Most helpful comment
Go for example provides two
sortfunctions, one stable and one unstable. We could do the same, so if you really need a stable sort you can do it.https://golang.org/pkg/sort/#Sort
https://golang.org/pkg/sort/#Stable