Incubator-mxnet: certain sample_ ops hanging with random_normal input

Created on 23 Jan 2020 · 10Comments · Source: apache/incubator-mxnet

Description

While testing sample_ operators for opperf, I came across an issue where sample_gamma, sample_generalized_negative_binomial, and sample_negative_binomial would hang when provided with mx.nd.random_normal as input args. For example, the following command hangs:

mx.nd.sample_gamma(alpha=mx.nd.random_normal(shape=(3,)),beta=mx.nd.random_normal(shape=(3,)))

Interestingly, when the mx.nd.random_normal call is made and the output value is assigned to a variable before that variable is used in the operator call, the call to the operator does not hang.

Also, this bug appears to be specific to random_normal. random_uniform works properly in the same context.

Error Message

No error, but hangs and the process maximizes its computational resources.

Steps to reproduce

(Paste the commands you ran that produced the error.)

python3 (open a Python shell)
import mxnet as mx
mx.nd.sample_gamma(alpha=mx.nd.random_normal(shape=(3,)),beta=mx.nd.random_normal(shape=(3,)))

Sometimes, it takes multiple calls of Step 3 to trigger the bug. Just run the command from Step 3 repeatedly until the bug appears (shouldn't take more than 2-3 runs).

Environment

I found this bug on Mac and Ubuntu builds of MXNet, with large tensor disabled and enabled (respectively).

Bug Operator

Source

connorgoggins

👍1

All 10 comments

The hang you describe happened in the rejection sampling implementation of sample_gamma, which was basically an infinite while loop. I believe this hang was triggered by invalid parameters of the sampling method, in which case there would never be accepted samples.

In your example, the alpha parameter, if sampled from a standard normal, would be a negative number by chance, which could cause problems. However, if you generate alpha from U(0,1), everything would go as it should be.

Appendix:
Part of the sample_gamma's docstring

alpha : float or NDArray, optional
The shape of the gamma distribution. Should be *greater than zero*.

xidulu on 23 Jan 2020

@xidulu good point
@connorgoggins would be great to paste the outputs when u write command so as to complement your sentences.

Also, since we uncovered this use-case. Would be a great idea to improve the "Error handling" or "error messaging".

ChaiBapchya on 24 Jan 2020

@mxnet-label-bot add [good first issue]

ChaiBapchya on 24 Jan 2020

@xidulu it's strange that this restriction on alpha isn't specified anywhere in the docs [1, 2, 3].

These three operators need better documentation (to indicate to users that alpha must be in U(0,1)) and better error handling - it should fail gracefully if invalid input is provided to alpha instead of entering an infinite while loop.

connorgoggins on 24 Jan 2020

@connorgoggins
You could find the constraint in the docstring of python frontend.
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/ndarray/random.py#L340

I don't know why the website was rendered from the C++ doc instead of the python doc. 🤔

to indicate to users that alpha must be in U(0,1)

Bigger than zero actually, U(0,1) is just an example.

Currently, none of the sampling methods in mx.random performs input validity check when then input is a tensor. However, you could take a look at the DeepNumpy interface, which contains an entirely rewritten random module: mx.np.random. This module would raise corresponding exception when the input parameters break the restriction.

xidulu on 25 Jan 2020

👍1

@xidulu Good point. But since website is rendering C++ I'll update that.

Moreover, I took a brief look at np_gamma_op.* files and didn't find value check.
CheckSuccessKernel is checking for values in output right?
Where are the params checked for value?

ChaiBapchya on 25 Jan 2020

@ChaiBapchya

Oh, you are right, I forgot to implement the check_legal_kernel like https://github.com/apache/incubator-mxnet/blob/master/src/operator/numpy/random/np_normal_op.h#L131 for gamma.

I'll try to add this check.

xidulu on 25 Jan 2020

I can help with that if you want

On Fri, Jan 24, 2020, 8:31 PM Xi Wang notifications@github.com wrote:

@ChaiBapchya https://github.com/ChaiBapchya

Oh, you are right, I forgot to implement the check_legal_kernel like
https://github.com/apache/incubator-mxnet/blob/master/src/operator/numpy/random/np_normal_op.h#L131
for gamma.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/apache/incubator-mxnet/issues/17412?email_source=notifications&email_token=ACT3X66AYUIVUIHQZLMQPY3Q7O6DFA5CNFSM4KKOGD4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ4VCDA#issuecomment-578375948,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ACT3X625N3P2POZBXFPW2DLQ7O6DFANCNFSM4KKOGD4A
.