Chapel: permutation method in Random module overwrites the input array with permuted indices of the domain

Created on 16 Sep 2020 · 21Comments · Source: chapel-lang/chapel

Summary of Problem

Invoking the permutation method on an array overwrites the array with permuted indices of its domain. Shouldn't the method return a new permuted version of the input array? Would it be a good idea to modify the current permutation API to take an extra parameter(like inPlace=true) to determine if the input array is to be permuted inplace (like shuffle)?

Steps to Reproduce

Compile and run the below source code
Source Code:

permutations.chpl use Random; var randStream = new RandomStream(real); var randomArr : [1..5] real; randStream.fillRandom(randomArr); writeln("Array before permuting is ",randomArr); permutation(randomArr,seed=10); writeln("Array after permuting is ",randomArr);

Compile command:
chpl permutations.chpl --fast

Execution command:
./permutations.chpl

Output
Array before permuting is 0.16559 0.754886 0.615159 0.171141 0.539561
Array after permuting is 1.0 5.0 4.0 3.0 2.0

Configuration Information

chpl --version: chpl version 1.22.1
gcc --version: Apple LLVM version 10.0.1 (clang-1001.0.46.4)

Libraries / Modules Design user issue

Source

CaptainSharf

Most helpful comment

However in the case of numpy, the array is shuffled across the first dimension.

I would say that this behavior isn't appropriate for Chapel or languages with true 2D arrays, and that it makes more sense for languages in which a 2D array is really a 1D array of 1D arrays (which, I would argue, the formatting of your Python example suggests it has).

And I agree with the approach of thinking of the domain's indices as being in a 1D / ordinal space, permuting there, then transforming back to the multidimensional space. Some existing routines (like, I think, indexToOrder and orderToIndex, though I may not have those names right, and don't have time to check right now) should help with this.

bradcray on 22 Sep 2020

👍2

All 21 comments

I would tend to agree that, to me, permutation(A); intuitively seems like it would permute the elements of A rather than overwriting the existing elements with a permutation of A's indices. I might expect permutation(Dom); to return an array storing a permutation of D's indices or permutation(Dom, Arr); to store a permutation of Dom into Arr. But I also don't recall whether there was any rationale or precedent here that I'm currently forgetting. I believe these were added in https://github.com/chapel-lang/chapel/pull/2912 by @mppf, so am curious what Michael has to say.

bradcray on 16 Sep 2020

👍1

We already have shuffle(A) to rearrange the elements of A. https://chapel-lang.org/docs/modules/standard/Random.html#Random.shuffle

I think it would be fine to change permutation to permutation(Dom, Arr). One thing to keep in mind is that the domain being permuted isn't necessarily the same as the destination domain for the array. E.g. you might want a permutation of 100..<200 stored in an array that has indices 0..100.

mppf on 16 Sep 2020

Hi, @mppf , I have raisedthis PR #16417 for this issue, can you please review it when you're free?

CaptainSharf on 18 Sep 2020

Hi @bradcray,
Thanks for your feedback for my PR #16417. Here are my thoughts regarding the questions you raised

should we preserve the original implementation of permutation() and add this new version as an overload to avoid breaking existing code?
This is a good idea. I think that newer implementations of this method should be overloaded
what are the advantages of the new version taking in an array as input rather than returning a newly-created array declared over the domain argument? Or should we support both versions as overloads?
I favour the version of the method that takes a domain as input and returns a random permutation over the indices of the domain as an array. This version is consistent with the behaviour of the permutations method in other frameworks like numpy. Based on performance and other considerations do you think one version is better than the other?
in the version that takes a domain and an array, should the two be required to have the same size? Or perhaps the array should be strictly smaller than the domain?
In my current implementation, I require the size of the array to be at least as big as the size of the domain. The extra elements in the array are set to zero.

CaptainSharf on 20 Sep 2020

Here, are some of the methods for generating random permutations in a few other frameworks that I could find:

Numpy

np.random.permutation(arr) returns a random permutation of the given array without modifying arr. Support for numpy arrays with rank>1.
np.random.permutation(n) returns an array of a random permutation of numbers in [1,n]

Matlab

p = randperm(n) returns an array containing a random permutation of the integers from 1 to n without repeating elements.
p = randperm(n,k) returns an array containing k unique integers selected randomly from 1 to n.
p = randperm(s,___) generates a random permutation of integers from specified random number stream s.

Go
In rand.Perm module func Perm(n int) []int returns an array of random permutation of all numbers in [0,n)

CaptainSharf on 20 Sep 2020

👍2

@CaptainSharf - Thanks for bringing this up and providing a review of some other languages.

I think just changing the behavior of our permutation to always return a new array makes the most sense. permutation could take an array to do an out-of-place shuffle of the elements, or take a domain to generate a permutation of the domain indices. I don't think in-place index shuffling (the current permutation behavior) is that important to continue supporting.

FWIW, this would follow the convention from numpy, where permutation is the out-of-place shuffle, and shuffle is the in-place shuffle.

ben-albrecht on 21 Sep 2020

👍2

If we were to pursue an inPlace argument that was suggested in the OP, we could potentially merge both shuffle and permutation into a single function, e.g.

/* Return an array containing the shuffled elements of ``arr``. \
   If ``inPlace == true``, shuffle the elements in place and return nothing. */
proc shuffle(arr: [], param inPlace=false, ...) { }

/* Return an array containing the shuffled indices of ``dom``. */
proc shuffle(dom: domain, ...) { }

@mppf has already expressed opposition to this proposal offline on the basis of not liking arguments impacting the function signature. It's probably not worth discussing inPlace arguments further unless it resonates with others.

ben-albrecht on 21 Sep 2020

👍1

@ben-albrecht , Thanks for the feedback!!
Just for the sake of summarization, we think it would be a good idea to replace the existing implementation of permutations with two overloaded methods

proc permutation(arr:[]) that returns a new permuted version of the input array
proc permutation(dom:domain) that returns a new array with permuted indices of the domain?
Am I correct in assuming this?

CaptainSharf on 21 Sep 2020

👍1

@CaptainSharf - Yes, that is correct.

Another design element: How can we properly deprecate the existing permutation behavior? This is challenging because we are only changing the return type (from nothing to array), so it will be hard to warn only users using it incorrectly.

Whatever we decide on, we ought to update the deprecation guidelines to cover this special case for future deprecations.

ben-albrecht on 21 Sep 2020

@ben-albrecht , yes it seems to me we won't be able to deprecate the existing behaviour easily when an array is passed. There would not be an issue in introducing an overloaded version of permutations which takes domain as argument. Since shuffle randomly permutes the array, one can still accomplish the intended behaviour we're looking for permute by saving the state of the array before shuffling and restoring it later.

CaptainSharf on 21 Sep 2020

Right, for the deprecation in this case, we can just provide proc permutation(dom:domain) for a release and add proc permutation(arr:[]) in a later release.

mppf on 21 Sep 2020

👍1

To deal with the deprecation issue, I'd be inclined to name this routine permute() rather than permutation() just because permute({1..n}) or permute(A) reads much more cleanly than permutation({1..n}) or permutation(A) to me (i.e., they seem "more like an English sentence"). Given that NumPy, Matlab, and Go don't share a name for this capability, it doesn't feel like we'd be breaking with a deeply-engrained traditional routine name (and if the documentation used the phrase "permutation" someone would probably still find the routine when searching for it. I'd still deprecate permutation(), but this would permit us to support permute(A) with the new meaning immediately rather than having to delay.

Reading some of the examples and notes made me wonder whether these routines can/do work with multidimensional domains / arrays (and whether they do their permutations in a dimension-independent way, or whether a permutation would permute rows and columns without completely scrambling everything; for arrays, I'd expect the former, but would want to know what other multidimensional permutations do).

The extra elements in the array are set to zero.

This seemed surprising to me. Zero might be a valid index of a given array (e.g., permute({0..#n});), so filling entries with zero doesn't seem well-motivated and could be confusing. Of course, with the new definition, I think there's no more need for sizes to match, so this concern goes away.

Based on performance and other considerations do you think one version is better than the other?

If the user happened to have an unused array that was the right size and distribution, passing in the array would save some time, but I'm not sure that use case is common enough to warrant complicating the interface at this point. We could always add an overload that took a domain and array later on if such cases arose in practice and were causing performance issues. This allows us to kick concerns like size mismatches, domain mismatches, different distributions down the road as well.

bradcray on 21 Sep 2020

Hi, @bradcray,
The permutations() routine does not work for multi dimensional arrays in Matlab and Go. However in the case of numpy, the array is shuffled across the first dimension. For ex. in numpy

>>> x = np.arange(9).reshape(3,3)
>>> x
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> y = np.random.permutation(x)
>>> y
array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])

CaptainSharf on 22 Sep 2020

However given that, I think it is possible to shuffle the contents of the array independent of the dimension. We could use the Fischer-Yates algorithm over the multi dimensional array by mapping an N-D array index to a 1-D index.

/* shuffles the array independent of the dimension */
proc shuffleDeep(arr:[?D]){
    for i in 0..arr.size by -1{
      var flatRandIdx = randlc_bounded(D.idxType, PCGRandomStreamPrivate_rngs,seed, PCGRandomStreamPrivate_count,0, i);
      var randIdx = getNdIndices(arr,flatRandIdx);
      var currIdx = getNdIndices(arr,i);
      arr[randIdx] <=> arr[currIdx];
    }
}

/* Transforms a flat index to an array index */
proc getNdIndices(arr:[], in flatInd){
    var idx : [0..arr.domain.rank-1] int = -1;
    var div = arr.size;
    for i in 0..arr.domain.rank-1 by -1{
        div/=arr.domain.dim(i).size;

        const lo = arr.domain.dim(i).low;
        const stride = abs(arr.domain.dim(i).stride);
        const zeroInd = flatInd/div;
        const currInd = lo+(zeroInd*stride);

        idx[i] = currInd;
        flatInd = flatInd%div;
    }
    return idx;
}

CaptainSharf on 22 Sep 2020

However in the case of numpy, the array is shuffled across the first dimension.

bradcray on 22 Sep 2020

👍2

To deal with the deprecation issue, I'd be inclined to name this routine permute() rather than permutation() just because permute({1..n}) or permute(A) reads much more cleanly than permutation({1..n}) or permutation(A) to me (i.e., they seem "more like an English sentence"). Given that NumPy, Matlab, and Go don't share a name for this capability, it doesn't feel like we'd be breaking with a deeply-engrained traditional routine name (and if the documentation used the phrase "permutation" someone would probably still find the routine when searching for it. I'd still deprecate permutation(), but this would permit us to support permute(A) with the new meaning immediately rather than having to delay.

I'd be OK with permute() as the new out-of-place shuffle function. The switch from noun to verb raises broader naming convention questions for me, which I've brought up over in https://github.com/chapel-lang/chapel/issues/6698#issuecomment-696781441.

@bradcray - I agree with your take on multi-dimensional shuffling. Further discussion on this might benefit from a separate issue so that we can keep this issue/task scoped to deprecating permutation() and introducing permute() for 1D arrays initially.

@CaptainSharf - FYI you can add start your code blocks with ```chpl to enable syntax highlighting.

ben-albrecht on 22 Sep 2020

👍1

Hi,
The methods orderToIndex and indexOrder seem to be defined for a range rather than a domain. Are there built-in methods that define a mapping from a domain index to an integer and an inverse from an integer to a domain index? Do similar built-in methods exist for associative domains?

CaptainSharf on 23 Sep 2020

indexOrder seems to be defined on rectangular domains at least, though it is not documented: https://github.com/chapel-lang/chapel/issues/16400#issuecomment-697393465

I thought we had the reverse operation as well, but am not finding it with a quick look... (I think we should).

As far as I know, these are not supported on associative domains at present.

bradcray on 23 Sep 2020

I found the method indexOrder defined in the documentation for domains. There's a method dsiIndexOrder in the DefaultRectangular.chpl module, that computes the order of an index. I'll keep searching for orderToIndex... I need orderToIndex for implementing the shuffle

CaptainSharf on 23 Sep 2020

Hi @ben-albrecht @bradcray ,
This issue is dependent on #14885. Would it be a good workaround if we had a custom implementation for orderToIndex in the Random module which is then used by the methods shuffle and permute ?

CaptainSharf on 1 Oct 2020

If you have working copies of those routines, why not add them to the ChapelArray.chpl module to resolve #14885 rather than as a Random-local workaround?

bradcray on 1 Oct 2020

Was this page helpful?

0 / 5 - 0 ratings