The unravel_index op seems to no longer correctly work with 'magic' shape values, such as '-1's. The following example still works with mxnet 1.3.1, but does not on latest master (it returns all zeros in the result without throwing an error) or 1.4.0.
We have a use case for this in Sockeye.
macOs
MXNet commit hash:
pip-installed:
mxnet 1.5.0b20190111
Input data taken from Sockeye unit tests.
x = mx.nd.array([335, 620, 593, 219, 36], dtype='int32')
mx.nd.unravel_index(x, shape=(-1, 200))
With mxnet==1.5.0b20190111, the result is incorrect:
[[ 0 0 0 0 0]
[135 20 193 19 36]]
<NDArray 2x5 @cpu(0)>
With mxnet==1.3.1, the result is correct:
[[ 1 3 2 1 0]
[135 20 193 19 36]]
<NDArray 2x5 @cpu(0)>
However, if the shape parameter is fully specified (shape=(5,200)), mxnet==1.5.0b20190111 returns the correct values.
Thank you for submitting the issue! I'm labeling it so the MXNet community members can help resolve it. @mxnet-label-bot add [Python, Bug, NDArray]
Could someone test this also with the 1.4 release candidate? If this is present there as well, I'd appreciate a last-minute fix very much! :)
Hoping to get some activity on this.
I bisected the nightly builds to figure out, when this bug was introduced.
mxnet==1.3.1b20181005 is finemxnet==1.3.1b20181010 contains the bugThere is no code change in the ravel.* files or the unravel_index op itself since its addition, so I would guess that some other type of change caused this unwanted regression; maybe unravel_index uses some outdated interface?
The only commits where I could guess that they could have some effect on this:
Unfortunately I don't have a way to bisect commits directly through source compilation right now.
I would appreciate some support on this. Thanks!
I can confirm that the problem exists in 1.4.0
Pretty easy to diagnose.
The type for mshadow::index_t got changed from "unsigned" to "int64_t" between both versions. This type is used to encode dimensions of shapes internally.
In 1.3.1, a "-1" therefore was interpreted as unsigned(-1) which is basically max_int. In 1.4.0 and later it is interpreted as "-1" and that changed the behaviour of the operator.
It is debatable whether we ever wanted to explicitly allow the use of "-1" in the shape argument of the operator. At least I as the original author had this not in mind and also the documentation does not say anything about magic numbers. In fact, the only case where the operator did work (and could ever work) with magic numbers is when "-1" is specified as first coordinate. Nothing else is possible.
Two solutions:
struct unravel_index {
template<typename DType>
MSHADOW_XINLINE static void Map(int i, index_t N, index_t ndim, index_t *shape,
DType *unravelled, DType *ravelled) {
index_t idx(ravelled[i]);
#pragma unroll
for (int j = ndim; j--; ) {
index_t tmp = idx / shape[j];
unravelled[i+j*N] = idx - tmp*shape[j];
idx = tmp;
}
}
};
Oh thank you so much! This is quite hilarious - we relied on exploiting this undocumented behavior without knowing :)
For Sockeye this means that we can update to 1.4.0 immediately and make this exploit of range overflow explicit by using shape=(sys.maxsize, y) in our code instead of shape=(-1, y). We rely on this functionality because we cannot infer the first shape value (batch size) in a HybridBlock in our case. As such, I would vote for going with option 1, explicitly supporting it in the future.
PR #14356 will fix this.
Most helpful comment
PR #14356 will fix this.