After upgrading to Mxnet 1.5.0, keras-mxnet (2.2.4.1) now causes a large number of models to fail with the error:
mxnet.base.MXNetError: Error in operator batchnorm8: [10:17:02] include/mxnet/./tuple.h:211: Check failed: i >= 0 && i < ndim(): index = 1 must be in range [0, -1)
Reproduced on Ubuntu 18.04/Amazon Linux 2 with keras-mxnet 2.2.4.1 and the mxnet-mkl 1.5.0 package (though I believe the same issue exists in the non-mkl variant).
----------Python Info----------
Version : 3.6.3
Compiler : GCC 4.8.2 20140120 (Red Hat 4.8.2-15)
Build : ('default', 'May 4 2018 04:22:28')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 9.0.1
Directory : /mnt/conda/envs/emrpython3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.5.0
Directory : /mnt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet
Commit Hash : 75a9e187d00a8b7ebc71412a02ed0e3ae489d91f
Library : ['/mnt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/libmxnet.so']
Build features:
โ CUDA
โ CUDNN
โ NCCL
โ CUDA_RTC
โ TENSORRT
โ CPU_SSE
โ CPU_SSE2
โ CPU_SSE3
โ CPU_SSE4_1
โ CPU_SSE4_2
โ CPU_SSE4A
โ CPU_AVX
โ CPU_AVX2
โ OPENMP
โ SSE
โ F16C
โ JEMALLOC
โ BLAS_OPEN
โ BLAS_ATLAS
โ BLAS_MKL
โ BLAS_APPLE
โ LAPACK
โ MKLDNN
โ OPENCV
โ CAFFE
โ PROFILER
โ DIST_KVSTORE
โ CXX14
โ INT64_TENSOR_SIZE
โ SIGNAL_HANDLER
โ DEBUG
----------System Info----------
Platform : Linux-4.14.62-65.117.amzn1.x86_64-x86_64-with-glibc2.3.4
system : Linux
node : ip-172-21-234-255
release : 4.14.62-65.117.amzn1.x86_64
version : #1 SMP Fri Aug 10 20:03:52 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2696.687
BogoMIPS: 4600.06
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-31
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0020 sec, LOAD: 0.5713 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1402 sec, LOAD: 0.0510 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0391 sec, LOAD: 0.3577 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0179 sec, LOAD: 0.1829 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0131 sec, LOAD: 0.1619 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0016 sec, LOAD: 0.0421 sec.
Package used (Python/R/Scala/Julia):
Python
(Paste the complete error message, including stack trace.)
[2019-08-01 10:17:03,869] {docker_operator.py:219} INFO - File "<redacted>", line 130, in <module>
[2019-08-01 10:17:03,869] {docker_operator.py:219} INFO - embedding_trainable)
[2019-08-01 10:17:03,869] {docker_operator.py:219} INFO - File "<redacted>", line 54, in get_model
[2019-08-01 10:17:03,870] {docker_operator.py:219} INFO - activation='softmax')(l_drop)
[2019-08-01 10:17:03,870] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/keras/engine/base_layer.py", line 470, in __call__
[2019-08-01 10:17:03,870] {docker_operator.py:219} INFO - output = self.call(inputs, **kwargs)
[2019-08-01 10:17:03,870] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/keras/layers/core.py", line 893, in call
[2019-08-01 10:17:03,870] {docker_operator.py:219} INFO - output = K.bias_add(output, self.bias, data_format='channels_last')
[2019-08-01 10:17:03,870] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 96, in func_wrapper
[2019-08-01 10:17:03,870] {docker_operator.py:219} INFO - train_symbol = func(*args, **kwargs)
[2019-08-01 10:17:03,871] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 3986, in bias_add
[2019-08-01 10:17:03,871] {docker_operator.py:219} INFO - x_dim = ndim(x)
[2019-08-01 10:17:03,871] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 537, in ndim
[2019-08-01 10:17:03,871] {docker_operator.py:219} INFO - shape = x.shape
[2019-08-01 10:17:03,871] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 4399, in shape
[2019-08-01 10:17:03,871] {docker_operator.py:219} INFO - return self._get_shape()
[2019-08-01 10:17:03,871] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 4408, in _get_shape
[2019-08-01 10:17:03,872] {docker_operator.py:219} INFO - _, out_shape, _ = self.symbol.infer_shape_partial()
[2019-08-01 10:17:03,872] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1152, in infer_shape_partial
[2019-08-01 10:17:03,872] {docker_operator.py:219} INFO - return self._infer_shape_impl(True, *args, **kwargs)
[2019-08-01 10:17:03,872] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1210, in _infer_shape_impl
[2019-08-01 10:17:03,872] {docker_operator.py:219} INFO - ctypes.byref(complete)))
[2019-08-01 10:17:03,872] {docker_operator.py:219} INFO - File "/opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/base.py", line 253, in check_call
[2019-08-01 10:17:03,873] {docker_operator.py:219} INFO - raise MXNetError(py_str(_LIB.MXGetLastError()))
[2019-08-01 10:17:03,873] {docker_operator.py:219} INFO - mxnet.base.MXNetError: Error in operator batchnorm8: [10:17:03] include/mxnet/./tuple.h:211: Check failed: i >= 0 && i < ndim(): index = 1 must be in range [0, -1)
[2019-08-01 10:17:03,873] {docker_operator.py:219} INFO - Stack trace:
[2019-08-01 10:17:03,873] {docker_operator.py:219} INFO - [bt] (0) /opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25b2ab) [0x7f0fc8d9a2ab]
[2019-08-01 10:17:03,873] {docker_operator.py:219} INFO - [bt] (1) /opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25c308) [0x7f0fc8d9b308]
[2019-08-01 10:17:03,873] {docker_operator.py:219} INFO - [bt] (2) /opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x7008cd) [0x7f0fc923f8cd]
[2019-08-01 10:17:03,873] {docker_operator.py:219} INFO - [bt] (3) /opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x246fd62) [0x7f0fcafaed62]
[2019-08-01 10:17:03,874] {docker_operator.py:219} INFO - [bt] (4) /opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x247262a) [0x7f0fcafb162a]
[2019-08-01 10:17:03,874] {docker_operator.py:219} INFO - [bt] (5) /opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXSymbolInferShapeEx+0x103e) [0x7f0fcaf226ce]
[2019-08-01 10:17:03,874] {docker_operator.py:219} INFO - [bt] (6) /opt/conda/envs/emrpython3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXSymbolInferShapePartialEx+0x82) [0x7f0fcaf22dc2]
[2019-08-01 10:17:03,874] {docker_operator.py:219} INFO - [bt] (7) /opt/conda/envs/emrpython3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f0f5ed92df8]
[2019-08-01 10:17:03,874] {docker_operator.py:219} INFO - [bt] (8) /opt/conda/envs/emrpython3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(ffi_call+0x178) [0x7f0f5ed91e98]
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
import numpy as np
num_features=108
model = Sequential()
model.add(
Dense(
units=40,
activation='sigmoid',
input_shape=(num_features,),
bias_initializer='zeros',
kernel_initializer='random_uniform',
),
)
model.add(
Dense(
units=1,
activation='sigmoid',
bias_initializer='zeros',
kernel_initializer='random_uniform',
),
)
num_rows=1001
labels = np.random.randint(low=0, high=2, size=(num_rows,), dtype=np.int32)
input_data = np.random.rand(num_rows,num_features)
model.compile(
loss='binary_crossentropy',
optimizer=SGD(lr=0.01, momentum=0.9)
)
model.fit(
input_data,
labels,
epochs=1,
batch_size=32,
verbose=0,
)
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug
@iamthebot Requesting a reproducible example on the same
@mxnet-label-bot add [Bug, Pending Requester Info]
cc @roywei @karan6181
@vrakesh @sandeep-krishnamurthy will update this issue with a reproducible example tomorrow or worst case Monday. MxNet is called via a wrapper in a library that is not yet open source so I'll need to put a small snippet together that reproduces the issue without added dependencies.
Thank you @iamthebot - That would be great.
@sandeep-krishnamurthy @roywei @karan6181 Updated with minimal reproducible example.
@iamthebot Hi, I was able to reproduce the bug. The latest keras-mxnet from source is fine, but 2.4.1 is not compatible with mxnet 1.5.0. We are working on keras-mxnet 2.4.2 release but blocked by another bug detected by CI. ETA of availability is around next week. Thanks again for the issue!
@roywei I have confirmed that building keras-mxnet as of commit 0785d1d allows tests to pass. I'll keep this issue open until the 2.4.2 package is released.
@iamthebot 2.2.4.2 is available now, https://github.com/awslabs/keras-apache-mxnet/releases/tag/2.2.4.2
this should be fixed.
please reopen if this issue persists.
It seems that this issue still exists in 2.2.4.2 and MXNet 1.5.1
mxnet.base.MXNetError: Error in operator batchnorm6: [10:29:46] include/mxnet/tuple.h:211: Check failed: i >= 0 && i < ndim(): index = 1 must be in range [0, -1)
@roywei gentle ping
so, gentlemen, how to solve this issue ???? I also meet this error, anyone has good solution? today the mxnet is 1.6.0 version
i can hardly know which line of my code caused the error from the error report....
terminate called after throwing an instance of 'dmlc::Error'
what(): [10:17:39] include/mxnet/./tuple.h:220: Check failed: i >= 0 && i < ndim(): index = 1 must be in range [0, -1)
Stack trace:
[bt] (0) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x307d3b) [0x7f1147037d3b]
[bt] (1) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x30a9f8) [0x7f114703a9f8]
[bt] (2) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x8d0444) [0x7f1147600444]
[bt] (3) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x345ae5d) [0x7f114a18ae5d]
@ry @mjwall @pluskid @mrkn @pjstadig @bjoern234 @iamthebot