Incubator-mxnet: [MXNet 1.5.0.rc2] Issues with asnumpy() method

Created on 2 Jul 2019 · 16Comments · Source: apache/incubator-mxnet

Hello,

I've decided to try MXNet 1.5.0.rc2.
And I have a lot of crashes due to asnumpy() calls like the following one :

phase = nd.array(np.arctan2(imag_part.asnumpy(), real_part.asnumpy()))

phase = nd.array(np.arctan2(imag_part.asnumpy(), real_part.asnumpy()))
  File "/opt/miniconda3/envs/intelpython3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/ndarray/ndarray.py", line 1996, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/opt/miniconda3/envs/intelpython3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: std::exception

The same issue occurs if I store asnumpy() result in a temporary variable. The crash seems random.

I didn't had any problems in MXNet 1.4.1

Bug

Source

Wallart

All 16 comments

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

mxnet-label-bot on 2 Jul 2019

@Wallart Hi, thanks for this issue!
could you provide the following to help us reproduce the error?

build flags or pip versions
machine type and device context(CPU GPU)
values for imag_part and real_part?

running the following commands works fine for me on cpu with mxnet mkldnn

>>> import mxnet as mx
>>> import numpy as np
>>> x = mx.nd.array([-1, +1, +1, -1])
>>> y = mx.nd.array([-1, -1, +1, +1])
>>> phase = mx.nd.array(np.arctan2(x.asnumpy(), y.asnumpy()))
>>> phase
[-2.3561945  2.3561945  0.7853982 -0.7853982]
<NDArray 4 @cpu(0)>

roywei on 2 Jul 2019

Here are my build flags https://pastebin.com/bwRX0tXN
The only difference with my older builds (1.3 / 1.4 / etc.) is the USE_GPERFTOOLS set to false.
I am running the code on a machine equipped with :
Intel i7 7700k
16 Gb of RAM
2 GTX 1080 Ti
I am using Intel Python 3.6.8 with all MKL/MKLDNN libs. Everything is installed through miniconda, inside a Docker container.
In fact I am trying to implement a tacotron2 and the issue occurs in the dataloader, the tensors are still on cpu. Their shape is (1, 513, 533) and they are full of zeros.

I have the same results as you both on cpu and gpu(0)

Wallart on 2 Jul 2019

I just discovered that crashes are not random. When I put a breakpoint on faulty lines there is no exceptions when I resume the code execution.

Same error with that type of instructions. Line is passing when a breakpoint is used
return nd.log(nd.clip(x, a_min=clip_val, a_max=x.max().asscalar())) * c

return nd.log(nd.clip(x, a_min=clip_val, a_max=x.max().asscalar())) * c
  File "/opt/miniconda3/envs/intelpython3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/ndarray/ndarray.py", line 2014, in asscalar
    return self.asnumpy()[0]
  File "/opt/miniconda3/envs/intelpython3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/ndarray/ndarray.py", line 1996, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/opt/miniconda3/envs/intelpython3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: std::exception

Wallart on 2 Jul 2019

Hi @Wallart running the two faulty lines you provided alone does not throw any errors.
Are these lines all in your dataloader code? Could you provide a minimum reproduciable code?

Maybe the probelms is when running above code with dataloader & multiple workers? I'm still not able to reprdudce your error with your two lines

roywei on 3 Jul 2019

👍1

Hi @Wallart are you able to come up with a reproduciable example? I suspect it may only occur when using dataloader with multi-workers, but i can't reproduce the error. You can also turn on DEBUG flag to get more information on the error.

roywei on 9 Jul 2019

👍1

@roywei I put the source code on Github.
The error occurs before the dataloader. If you use the dataset to plot a sample for example, you can reproduce.
https://github.com/Wallart/gluon-tacotron2/blob/master/dataset/wav_dataset.py

Wallart on 10 Jul 2019

👍1

@mxnet-label-bot add [Bug]

frankfliu on 10 Jul 2019

@Wallart your code is running fine on my machine. I'm testing on LJ-Speech Dataset

added a few lines in main to format the text files, and everything is plotted, and asnumpy() works fine. Could you try without Docker?

if __name__ == '__main__':
    logging.basicConfig()
    logging.getLogger().setLevel(logging.INFO)

    params = {
        'max_wav_value': 32768.0,  # for 16 bits files
        'sampling_rate': 22050,
        'filter_length': 1024,
        'hop_length': 256,
        'win_length': 1024,
        'n_mel_channels': 80,
        'mel_fmin': 0.0,
        'mel_fmax': 8000.0
    }
    with open('~/Downloads/LJSpeech-1.1/metadata.csv', encoding='utf-8') as f:
        for line in f:
            record = line.split('|')
            file_name = record[0] + '.txt'
            content = record[2]
            with open('~/Downloads/LJSpeech-1.1/wavs/' + file_name, 'w', encoding='utf-8') as text_output:
                text_output.write(content)

    french = WavDataset('~/Downloads/LJSpeech-1.1/wavs', text_to_sequence, **params)
    assert type(french[0]) == tuple

Screen Shot 2019-07-10 at 2 16 55 PM

roywei on 10 Jul 2019

@roywei I reproduced the exact same environment outside the Docker. Still using intelpython through miniconda, sames build flags, and the error is still here

EDIT : I uninstalled my custom build of mxnet and did a 'pip install mxnet==1.5.0' and now it's working (I don't know what flags are used for the pip release).
The newest releases of 1.5.0 might have fixed the problem or maybe my build flags are involved.

I will build the newer 1.5.0 to see if I can reproduce.

Wallart on 24 Jul 2019

Same problem. I am building MXNet with something that makes it crash

EDIT : 'pip install mxnet-cu101-mkl==1.5.0' works. I will try to rebuild without LAPACK flag

Wallart on 24 Jul 2019

You can find the build flags used for pip packages here: https://github.com/apache/incubator-mxnet/tree/master/make/pip

roywei on 24 Jul 2019

I finally found the issue. At runtime I was linking the outdated MKLDNN (v0.14) provided by Anaconda whereas MXNet is probably using v1.0+

Wallart on 6 Aug 2019

🎉1

@Wallart, the master branch uses v0.20 of MKL-DNN while the 1.5.0 release uses v0.19. Please use the self-contained MKL-DNN in MXNet. The anaconda MKL-DNN distribution is not actively maintained.

TaoLv on 6 Aug 2019

This issue came back if using mxnet 1.6 and softmax outputs

test p shape: (145, 49)
Traceback (most recent call last):
  File ".\data_process.py", line 69, in <module>
    print(p.asnumpy())
  File "C:\vnstudio\lib\site-packages\mxnet\ndarray\ndarray.py", line 2535, in asnumpy
    ctypes.c_size_t(data.size)))
  File "C:\vnstudio\lib\site-packages\mxnet\base.py", line 255, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [10:54:17] c:\jenkins\workspace\mxnet-tag\mxnet\3rdparty\mshadow\mshadow\./extension/reshape.h:32: Check failed: ishape.Size() == shape.Size() (870 vs. 7105) : reshape size must match

aGiant on 24 May 2020

Hi @aGiant , I think your issue is a different problem. Looking at the stack trace it says (870 vs. 7105) : reshape size must match, could you check your input and how you are reshaping it?
If you believe it's a bug, please open a new issue.
Thanks

roywei on 25 May 2020

Was this page helpful?

0 / 5 - 0 ratings