Chapel: Build a library of iterator tools

Created on 21 Mar 2019  路  41Comments  路  Source: chapel-lang/chapel

Libraries / Modules Feature Request

Most helpful comment

@akshansh2000 You may want to make this into its own Pull Request. (also side note: If you add write the code block as ```chpl you get syntax highlighting)

iter repeat (arg, times = 0) {
    if times == 0 then
        for count in 1.. do yield arg;
    else
        for count in 1..#times do yield arg;
}

You also will want a parallel iterator. I'll do this one for you so you get the gist of how to do so in Chapel...

iter repeat (arg, times = 0, param tag : iterKind) where tag == iterKind.standalone {
    if times == 0 then
        forall count in 1.. do yield arg;
    else
        forall count in 1..#times do yield arg;
}

You'll also definitely want a leader-follower version so you can do something like this...

forall (ix, a) in zip(repeat(fixedIdx), arr) {
   a = ix;
}

In the above example, you zip over an infinite stream repeat(fixedIdx) and some finite container.

All 41 comments

This is up for grabs as a GSoC 2019 project. Some suggestions to GSoC applicants:

  • Determining the subset of iterators to implement and which ones are parellizable/distributable is left as an exercise for the applicants, and can be included as part of the application project plan / timeline.
  • Implementing one of the iterator functions, with tests, documentation, and
    performance testing in advance would be helpful in several ways:

    • It will give you an idea of how long this process takes and should give

      you a more accurate estimate for your timeline.

    • It will also show that are ramped up on interacting with Chapel's

      testing system (start_test) and chpldoc.

  • I would expect that handling some parallel and/or distributed iterators would be part of the GSoC timeline.

    • This may limit how many iterators get implemented in the summer, and that's fine. It's OK to leave some (or many) iterators out of the timeline or push them into stretch goals section.

Though not explicitly listed, this may be considered part of https://github.com/chapel-lang/chapel/issues/6329

I want to take this up as a GSoC 2019 project. I was just wondering if beginning with implementing the repeat and product itertools of python would be a good way to start. As far as I know, these tools are not available in Chapel as of now. Please let me know if I missed something, and whether to go ahead with this.

For example, the basic implementation of product for integer ranges could be given as:

iter product(x: range, y: range) {
    for i in x do for j in y do yield (i, j);
}

writeln(product(1..5, 3..5));

// (1, 3) (1, 4) (1, 5) (2, 3) (2, 4) (2, 5) (3, 3) (3, 4) (3, 5) (4, 3) (4, 4) (4, 5) (5, 3) (5, 4) (5, 5)

So what I need to do is write the tests and documentation, work on determining whether it can be made parallelizable or not, and make this process fast by reducing communication overheads, right?

Please let me know if I'm missing something.
Thank You!

Hi @akshansh2000, repeat and product seem like good starts. I expect repeat to be relatively simple, but product will be a little more challenging.

Regarding your product implementation, there are a few design goals to consider:

  • We'd want to support variadic arguments
  • We'd want our arguments to be generic so that we don't have to write an overload for every iterable type
  • We'd want to support a repeat argument similar to itertools.product
  • Note that we won't be able to support heterogenous types for our arguments, due to being unable to iterate over or index into a tuple of heterogenous types in chapel.

Beyond that, documentation, evaluating performance, and determining if the implementation should/can be parallelized would be your next steps (in any order you prefer).

Thank you, @ben-albrecht. I'm working on it. I'll keep you updated.

The basic implementation for the repeat tool is given:

iter repeat (arg, times = 0) {
    if times == 0 then
        for count in 1.. do yield arg;
    else
        for count in 1..#times do yield arg;
}

// For integer/real/complex etc..

writeln(repeat(10, 15));

// For arrays/strings

writeln(repeat([1, 4, 6, 7], 4));
writeln(repeat("Testing", 5));

// For dictionaries

var dictDomain: domain(string);
var dict: [dictDomain] int;

dict["one"] = 1;
dict["two"] = 2;
dict["three"] = 3;

writeln(repeat((dictDomain, dict), 3));

// For infinite repetition

write(repeat("infinite"));

Output:

10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
1 4 6 7 1 4 6 7 1 4 6 7 1 4 6 7
Testing Testing Testing Testing Testing
({two, three, one}, 2 3 1) ({two, three, one}, 2 3 1) ({two, three, one}, 2 3 1)
infinite infinite infinite infinite .....

Please let me know what all I need to change in this. This sure could be parallelized (only in the case of a bounded range), and I'll work on that next. The documentation and performace evaluation is further in the queue.

@akshansh2000 You may want to make this into its own Pull Request. (also side note: If you add write the code block as ```chpl you get syntax highlighting)

iter repeat (arg, times = 0) {
    if times == 0 then
        for count in 1.. do yield arg;
    else
        for count in 1..#times do yield arg;
}

You also will want a parallel iterator. I'll do this one for you so you get the gist of how to do so in Chapel...

iter repeat (arg, times = 0, param tag : iterKind) where tag == iterKind.standalone {
    if times == 0 then
        forall count in 1.. do yield arg;
    else
        forall count in 1..#times do yield arg;
}

You'll also definitely want a leader-follower version so you can do something like this...

forall (ix, a) in zip(repeat(fixedIdx), arr) {
   a = ix;
}

In the above example, you zip over an infinite stream repeat(fixedIdx) and some finite container.

Thanks for the clarification on the parallel iterator, @LouisJenkinsCS! I'll work on the leader-follower part and share my progress.

(Thanks for the syntax highlighting part as well hahaha)

Also, as for the PR, I'm supposed to make a Mason package and make a PR to the mason-registry repository, right?

Hm... I'm not sure, honestly. Previously students just submit a PR to the repository and it gets stuck somewhere (likely labeled as some kind of prototype/test), but I suppose it makes sense if @ben-albrecht would want you to implement it as a Mason package.

Alright, I'll wait for @ben-albrecht sir to reply, and work on my code until then.

Thank you!

Mason package vs package module is still an open question at this point. Though, I'm leaning towards a package module right now so that we can utilize the nightly performance testing. Once mason packages have this functionality, we could migrate this to a mason package, like we plan to do for other existing package modules (https://github.com/chapel-lang/chapel/issues/10713).

The repeat iterator is almost complete, I believe.
Please take a look at my implementation and let me know if I need to change something.

Also, I'm a bit confused here. How do I implement infinite repetition for a parallel loop?
In the case of a forall loop, it needs to know to the total number of tasks to divide the repeat task into different tasks, while in the case of a coforall loop, the program terminates at runtime due to excessive memory usage. Please help me out with this, I'm a bit lost.

Here's the code; I've commented the non working code below.

// Serial Iterator

iter repeat (arg, times = 0) {
    if times == 0 then
        for count in 1.. do yield arg;
    else
        for count in 1..#times do yield arg;
}

// Standalone Parallel Iterator

iter repeat (param tag : iterKind, arg, times = 0) where tag == iterKind.standalone {
    if times == 0 then
        coforall count in 1.. do yield arg; // parallel infinite loop creates problem
    else
        coforall count in 1..#times do yield arg;
}

// Procedure to compute chunks which are to be iterated through

proc computeChunk(r : range, myChunk, numChunks) where r.stridable == false {
    const numElems = r.length;
    const elemsPerChunk = numElems / numChunks;
    const mylow = r.low + (elemsPerChunk * myChunk);

    if (myChunk != numChunks - 1) then
        return mylow..#elemsPerChunk;
    else
        return mylow..r.high;
}

const numTasks = here.maxTaskPar; // max number of tasks supported by locale

// Leader

iter repeat (param tag : iterKind, arg, times = 0) where tag == iterKind.leader {
    if times == 0 then
        coforall task_id in 0.. do yield(0..1,); // parallel infinite loop creates problem
    else
        coforall task_id in 0..#numTasks {
            const working_iters = computeChunk(0..#times, task_id, numTasks);
            yield(working_iters,);
        }
}

// Follower

iter repeat (param tag : iterKind, arg, times = 0, followThis)
        where tag == iterKind.follower && followThis.size == 1 {
    const working_iters = followThis(1);

    for idx in working_iters do yield arg;
}

for element in repeat(1, 10) do write(element, ' '); // works - 1 1 1 1 1 1 1 1 1 1
writeln();

forall element in repeat('abc', 4) do write(element, ' '); // works - abc abc abc abc
writeln();

forall element in repeat(123) do write(element, ' '); // doesn't work, parallel infinite loop
writeln();

var arr: [1..5] int;

forall (idx, element) in zip(repeat(3, 5), arr) do element = idx; // works
writeln(arr); // - 3 3 3 3 3

forall (idx, element) in zip(repeat(2), arr) do element = idx; // doesn't work, parallel infinite loop
writeln(arr);

Thank you!

Also, I'm a bit confused here. How do I implement infinite repetition for a parallel loop?

This is not supported today in Chapel. As a result, we will not be able to pursue parallel implementations of any infinite iterators for this project.

Alright, so should I put the rest of the code into a module and create a PR?

Oh yeah, I forgot that you can't break from a forall loop. I'm wondering if its wise to return a 'killswitch' which the leader and follower will check to see if it should continue.

Note that if I make the array the 'leader', it works fine. It has to do with the leader not knowing when to stop, which is definitely a problem. TIO

Was this page helpful?
0 / 5 - 0 ratings