mxnet.ndarray.op.random_pdf_poisson crashes with a floating point exception when lam has shape (0,). See the code snippet below for an example.
Floating point exception (core dumped)
To Reproduce
import mxnet
import numpy as np
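lam = mxnet.nd.array(np.random.rand(0))     # lam has shape (0,): a zero-size parameter tensor
sample = mxnet.nd.array(np.random.rand(2))  # sample has shape (2,)
mxnet.ndarray.op.random_pdf_poisson(sample=sample, lam=lam)  # process dies with SIGFPE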
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
Got 404 when trying to get the script.
Some environment information:
Here's the problem:
% DMLC_LOG_STACK_TRACE_DEPTH=150 MXNET_ENGINE_TYPE=NaiveEngine lldb python3.7 -- test_18937.py
(lldb) target create "python3.7"
Current executable set to 'python3.7' (x86_64).
(lldb) settings set -- target.run-args "test_18937.py"
(lldb) run
Process 36591 launched: '/usr/local/bin/python3.7' (x86_64)
Process 36591 stopped
* thread #2, stop reason = exec
frame #0: 0x0000000100006000 dyld`_dyld_start
dyld`_dyld_start:
-> 0x100006000 <+0>: popq %rdi
0x100006001 <+1>: pushq $0x0
0x100006003 <+3>: movq %rsp, %rbp
0x100006006 <+6>: andq $-0x10, %rsp
(lldb) cont
Process 36591 resuming
[23:22:22] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
[23:22:22] ../src/storage/storage.cc:198: Using Pooled (Naive) StorageManager for CPU
Process 36591 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0)
frame #0: 0x0000000115bca540 libmxnet.dylib`mxnet::op::PdfCaller<mshadow::cpu, float, mxnet::op::PDF_Poisson<false>, 1, false>::op(inputs=0x00007ffeefbfcb50, outputs=0x00007ffeefbfcb30, s=0x00000001272d39c9) at pdf_op.h:469
466 static void op(const std::vector<TBlob>& inputs,
467 const std::vector<TBlob>& outputs,
468 mshadow::Stream<xpu> *s) {
-> 469 CHECK_EQ(inputs[0].Size()%inputs[1].Size(), 0);
470 CHECK_EQ(inputs[0].Size()%outputs[0].Size(), 0);
471 index_t num_samples(inputs[0].Size() / inputs[1].Size());
472 mxnet_op::Kernel<LaunchExWrapper<pdf>, xpu>::LaunchEx(s, outputs[0].Size(), num_samples,
The code needs to guard against a zero-size array as the right operand of %. Here inputs[1] (lam) has size 0, so inputs[0].Size() % inputs[1].Size() is an integer division by zero. We should also add a smoke test for this op to catch such problems, similar to https://github.com/apache/incubator-mxnet/pull/18972/files.
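A minimal sketch of such a guard, written as a pure-Python model rather than the actual C++ in pdf_op.h. The helper name num_samples_per_param is hypothetical, and raising ValueError for non-empty samples paired with an empty lam (rather than, say, returning an empty result) is an assumed policy. In C++, integer % or / with a zero right operand is undefined behaviour and traps on x86, which matches the EXC_I386_DIV stop in the log; Python would raise ZeroDivisionError instead, so the model checks the divisor before using it.

```python
def num_samples_per_param(sample_size: int, param_size: int) -> int:
    """Hypothetical model of the size check in PdfCaller::op (pdf_op.h).

    Returns how many samples share each distribution parameter, checking the
    divisor before any % or / is evaluated.
    """
    if param_size == 0:
        # An empty parameter tensor (e.g. lam of shape (0,)) is only consistent
        # with an empty sample tensor; in that case there is nothing to compute.
        if sample_size != 0:
            raise ValueError(
                f"sample has {sample_size} elements but the parameter tensor is empty")
        return 0
    if sample_size % param_size != 0:
        raise ValueError(
            f"sample size {sample_size} is not a multiple of parameter size {param_size}")
    return sample_size // param_size


# Smoke-test style checks over zero-size cases (arguments are element counts).
assert num_samples_per_param(6, 3) == 2
assert num_samples_per_param(0, 0) == 0
assert num_samples_per_param(0, 3) == 0
try:
    num_samples_per_param(2, 0)  # the reported case: sample shape (2,), lam shape (0,)
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for an empty lam with non-empty samples")
```

A real smoke test would call the operator itself with these shape combinations; the model only shows which combinations need a guard.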
@xidulu Same question as https://github.com/apache/incubator-mxnet/issues/18936#issuecomment-678064250: since we are deprecating ndarray in favor of np/npx, do we need to register an alias of this op in np/npx (or is it already registered)?
@szha
As far as I know, pdf ops are not registered under npx yet, and I don't think it's that necessary because:
By the way, one possible fix for this bug is to add a zero-size check (e.g. https://github.com/apache/incubator-mxnet/blob/master/src/operator/numpy/random/np_normal_op.h#L320) before the kernel launch: https://github.com/apache/incubator-mxnet/blob/master/src/operator/random/pdf_op.h#L514
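To make the suggested early return concrete, here is a pure-Python reference for what a Poisson pdf op computes, with the zero-size check placed before the loop that stands in for the kernel launch. It is a sketch under assumptions: the function name poisson_pdf_flat is hypothetical, buffers are flat lists, lam is assumed non-negative, and lam[j] is assumed to apply to a contiguous run of len(sample) // len(lam) samples. That indexing mirrors the num_samples computation in the pdf_op.h excerpt but is not taken from the MXNet source. The PMF is the standard Poisson one, lam**k * exp(-lam) / k!, evaluated in log space via math.lgamma.

```python
import math
from typing import List, Sequence


def poisson_pdf_flat(sample: Sequence[float], lam: Sequence[float]) -> List[float]:
    """Hypothetical pure-Python reference for a Poisson pdf op on flat buffers.

    Assumes lam values are >= 0 and that lam[j] applies to the j-th contiguous
    block of len(sample) // len(lam) sample values.
    """
    out = [0.0] * len(sample)
    # Zero-size check before the "kernel launch": an empty output needs no work,
    # and returning here avoids evaluating len(sample) % len(lam) when lam is empty.
    if len(out) == 0:
        return out
    # Short-circuit keeps the % from ever seeing a zero divisor.
    if len(lam) == 0 or len(sample) % len(lam) != 0:
        raise ValueError("sample size must be a positive multiple of lam size")
    num_samples = len(sample) // len(lam)
    for i, k in enumerate(sample):  # stand-in for the per-element kernel
        mu = lam[i // num_samples]
        if mu == 0.0:
            # Degenerate distribution: all probability mass at k == 0.
            out[i] = 1.0 if k == 0 else 0.0
        else:
            # log PMF: k*log(mu) - mu - log(k!), with log(k!) = lgamma(k + 1).
            out[i] = math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
    return out
```

With this guard, empty samples return an empty result regardless of lam, while the reported case (two samples, empty lam) raises a readable error instead of trapping.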