Chapel: Improve Sort module

Created on 18 Mar 2017  路  19Comments  路  Source: chapel-lang/chapel

Next steps for the Sort module:

  • Implement timSort (suggested by @russel) (https://github.com/chapel-lang/chapel/issues/15045)
  • Implement radixSort (@mppf called dibs)
  • Utilize first class functions (once FCF support improves) as comparators
  • Support sorting arrays with rank != 1, as long as comparator specifies which rank to use
  • Utilize something like a raw() accessor for stride/offset-independent implementations (#5368)
  • Parallelism & Performance
Libraries / Modules good first issue Feature Request

All 19 comments

Hi @ben-albrecht ,
I would like to help out with this.
I was looking at timsort and radix sort. For timsort, the implementation requires binary insertion sort for sorting below a threshold. For radix sort, the implementations require counting sort.

I can find neither binary insertion or counting sort in the Sort.chpl module. I think it would be a good idea to include these as well.

Hi @xSooDx - thanks for your interest in contributing!

As the PR notes (maybe unclearly), @mppf plans to implement radixSort.

Sounds like a good plan for timSort. The insertionSort might be a good starting point for the binaryInsertionSort implementation. Also, be aware that strides and offsets in domains need to be accounted for in the Chapel implementation.

I have seen the Sort module and will follow the same structure.

I have made an interesting observation:
For binaryInsertionSort I want to use the binary search function from the Search Module. However I do not want to use the entire module.
Using use Search and use Search only binarySearch globaly, my code works.
However using use Search only binarySearch within my binaryInsertionSort procedure, I get

binaryInsertionSort.chpl:4: In function 'binaryInsertionSort':
binaryInsertionSort.chpl:17: error: unresolved call 'binarySearch([domain(1,int(64),false)] int(64), int(64), comparator=DefaultComparator, lo=int(64), hi=int(64))'

It does not show a failure on the use line.

My test code:

use Sort;
use Search only binarySearch;

proc binaryInsertionSort(Data: [?Dom] ?eltType, comparator:?rec=defaultComparator) {
  //use Search only binarySearch;  //DOES NOT WORK
  chpl_check_comparator(comparator, eltType);

  const low = Dom.low,
        high = Dom.high,
        stride = abs(Dom.stride);
  for i in low..high by stride {
    var ithVal = Data[i];
    var inserted=false;
    var found:bool;
    var loc:int;
    var j:int = i-1;
    (found,loc)=binarySearch(Data,ithVal,comparator=comparator,lo=low,hi=i);
    while(j>=loc)
    {
        Data[j+stride]=Data[j];
        j-=stride;
    }
    Data[j+stride]=ithVal;      
  }        
}
pragma "no doc"
/* Error message for multi-dimension arrays */
proc binaryInsertionSort(Data: [?Dom] ?eltType, comparator:?rec=defaultComparator)
  where Dom.rank != 1 {
    compilerError("insertionSort() requires 1-D array");
}

var a:[1..10] int = (1,5,3,0,9,7,6,4,12,11);
writeln(a);
binaryInsertionSort(a);
//insertionSort(a);
writeln(a);

Output:

1 5 3 0 9 7 6 4 12 11
0 1 3 4 5 6 7 9 11 12

Please let me know if the style and structure of code is fine.

Hi @xSooDx -- For the use problem, this is a known issue with using generic functions, but it is poorly documented. I've added an issue to track this here: #5833 -- thanks for reporting it. For now, I'd place the use Search inside binaryInsertionSort's body, so that we don't pollute the module-level namespace.

It looks like you're off to a great start. Do you think you could add this to Sort.chpl, and open a PR? It would be easier to give a code-review in that format. I can also link your PR from the issue description to make it clear that it's being worked on.

I have made the pull request #5840.

I am proceeding with timSort. There is an iter `_MergeIterator' in the Sort module, but it has a TODO to remove it. Should I use this to merge the runs of timSort, or make my own merge procedure?

Hi @xSooDx -- sounds good. Yeah, I would not rely on the existing _MergeIterator as we plan to remove/reimplement it.

You'll have to bare with us, as we're closing in our our 1.15 release. I plan to provide some feedback on #5840 sometime tomorrow. Thanks for taking the lead on this!

Does array slicing like var a1:[1..n] int = a2[base..base+n] also take care of copying the stride?

Also there are quiet a few extra functions I need to implement for timSort. Should I define them inside the timSort procedure itself or keep them outside? Because they have no use apart from inside timSort itself (even the merge function is specific to timsort).

I have updated pr #5840 with timSort.

I was experimenting with parallelising timSort (as part of a college project) and have gotten a significant performance improvement. Although it may no longer be 'timSort' as I have ignored the serial rule on merging runs (X>Y+Z and Y>Z) and perform merging in parallel.
Performance results:

Time taken to sort 131072 bytes (16384 int(64)s)
quickSort (seconds): 0.1807
mergeSort (seconds): 0.92939
timSort (seconds): 0.560166

Old performance results:

Time taken to sort 131072 bytes (16384 int(64)s)
quickSort (seconds): 0.182228
mergeSort (seconds): 0.571098
timSort (seconds): 1.89295

I though this might be of interest. I have uploaded the code on a branch here.

This has recently become the subject of StackOverflow post Python provides a nice interface using the axis argument. I keep quoting Python, sorry, but it's a good standard!

@mppf's thought here:

We need to get the Sort module into modules/standard or modules/internal.

  • no public bubbleSort quickSort .... these can go in a package / mason module
  • only sort()

    • it calls the right things internally

    • can sort radix sort

    • lower case data argument (Data and Dom)

    • argument stable to request a stable sort (default is false)

    • perhaps argument for reversing the order -- in radix sort, variable sized records, is running out of key making it < or > ?

  • isSorted() is there too
  • should sort have a parallel flag?

    • maybe not, if you can use serial and the module code can query it

Another thing to investigate is if we can have good enough performance with reindexing in the sort module - can we get the sorts to all work with 0-based indexing and stride 1?

Radix sort improvements:

  • use BitOps.clz instead of msbRadixSortClz.
  • add parallel count step

It'd be nice to also have something like argsort in NumPy to get the permutation without moving the data:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html

Hi, I am a new contributor. I would like to implement tim sort.

@RisingGeek - Great! See the comment history of this issue and https://github.com/chapel-lang/chapel/pull/5840 for background info / previous efforts.

@ben-albrecht, The main idea of Tim sort would be to divide the array into runs and then sort them using insertion sort and finally merge them by using a function similar to the merge function of merge sort. Am I right?

@RisingGeek - that sounds about right. I have opened https://github.com/chapel-lang/chapel/issues/15045 for further discussion on TimSort implementation details, since this is a meta-issue for improving Sort in general.

I will figure out a way

Was this page helpful?
0 / 5 - 0 ratings