While testing sample_ operators for opperf, I came across an issue where sample_gamma, sample_generalized_negative_binomial, and sample_negative_binomial would hang when provided with mx.nd.random_normal as input args. For example, the following command hangs:
mx.nd.sample_gamma(alpha=mx.nd.random_normal(shape=(3,)),beta=mx.nd.random_normal(shape=(3,)))
Interestingly, when the mx.nd.random_normal call is made and the output value is assigned to a variable before that variable is used in the operator call, the call to the operator does not hang.
Also, this bug appears to be specific to random_normal. random_uniform works properly in the same context.
No error, but hangs and the process maximizes its computational resources.
(Paste the commands you ran that produced the error.)
python3 (open a Python shell)import mxnet as mxmx.nd.sample_gamma(alpha=mx.nd.random_normal(shape=(3,)),beta=mx.nd.random_normal(shape=(3,)))Sometimes, it takes multiple calls of Step 3 to trigger the bug. Just run the command from Step 3 repeatedly until the bug appears (shouldn't take more than 2-3 runs).
I found this bug on Mac and Ubuntu builds of MXNet, with large tensor disabled and enabled (respectively).
The hang you describe happened in the rejection sampling implementation of sample_gamma, which was basically an infinite while loop. I believe this hang was triggered by invalid parameters of the sampling method, in which case there would never be accepted samples.
In your example, the alpha parameter, if sampled from a standard normal, would be a negative number by chance, which could cause problems. However, if you generate alpha from U(0,1), everything would go as it should be.
Appendix:
Part of the sample_gamma's docstring
alpha : float or NDArray, optional
The shape of the gamma distribution. Should be *greater than zero*.
@xidulu good point
@connorgoggins would be great to paste the outputs when u write command so as to complement your sentences.
Also, since we uncovered this use-case. Would be a great idea to improve the "Error handling" or "error messaging".
@mxnet-label-bot add [good first issue]
@xidulu it's strange that this restriction on alpha isn't specified anywhere in the docs [1, 2, 3].
These three operators need better documentation (to indicate to users that alpha must be in U(0,1)) and better error handling - it should fail gracefully if invalid input is provided to alpha instead of entering an infinite while loop.
@connorgoggins
You could find the constraint in the docstring of python frontend.
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/ndarray/random.py#L340
I don't know why the website was rendered from the C++ doc instead of the python doc. 🤔
to indicate to users that alpha must be in U(0,1)
Bigger than zero actually, U(0,1) is just an example.
Currently, none of the sampling methods in mx.random performs input validity check when then input is a tensor. However, you could take a look at the DeepNumpy interface, which contains an entirely rewritten random module: mx.np.random. This module would raise corresponding exception when the input parameters break the restriction.
@xidulu Good point. But since website is rendering C++ I'll update that.
Moreover, I took a brief look at np_gamma_op.* files and didn't find value check.
CheckSuccessKernel is checking for values in output right?
Where are the params checked for value?
@ChaiBapchya
Oh, you are right, I forgot to implement the check_legal_kernel like https://github.com/apache/incubator-mxnet/blob/master/src/operator/numpy/random/np_normal_op.h#L131 for gamma.
I'll try to add this check.
I can help with that if you want
On Fri, Jan 24, 2020, 8:31 PM Xi Wang notifications@github.com wrote:
@ChaiBapchya https://github.com/ChaiBapchya
Oh, you are right, I forgot to implement the check_legal_kernel like
https://github.com/apache/incubator-mxnet/blob/master/src/operator/numpy/random/np_normal_op.h#L131
for gamma.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/apache/incubator-mxnet/issues/17412?email_source=notifications&email_token=ACT3X66AYUIVUIHQZLMQPY3Q7O6DFA5CNFSM4KKOGD4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ4VCDA#issuecomment-578375948,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ACT3X625N3P2POZBXFPW2DLQ7O6DFANCNFSM4KKOGD4A
.
@ChaiBapchya
Thx a lot! That would be great :)
this is probably not a good first issue without pointers to where the related source code is for addressing the problem.