We are working on extending MXNet support to IBM Z (s390x architecture) machines. We were able to successfully install MXNet on s390x but we encountered the following error while trying to run it.
Traceback (most recent call last):
File "scr.py", line 76, in <module>
finetune_net = resnet50_v2(pretrained=True, ctx=ctx)
File "/root/mxnet-1.5.0/mxnet/python/mxnet/gluon/model_zoo/vision/resnet.py", line 512, in resnet50_v2
return get_resnet(2, 50, **kwargs)
File "/root/mxnet-1.5.0/mxnet/python/mxnet/gluon/model_zoo/vision/resnet.py", line 391, in get_resnet
root=root), ctx=ctx)
File "/root/mxnet-1.5.0/mxnet/python/mxnet/gluon/block.py", line 384, in load_parameters
loaded = ndarray.load(filename)
File "/root/mxnet-1.5.0/mxnet/python/mxnet/ndarray/utils.py", line 175, in load
ctypes.byref(names)))
File "/root/mxnet-1.5.0/mxnet/python/mxnet/base.py", line 253, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [19:50:04] include/mxnet/./tuple.h:354: Check failed: ndim >= -1 (-906324999 vs. -1) : ndim cannot be less than -1, received -906324999
Stack trace:
[bt] (0) /opt/mxnet/lib/libmxnet.so(mxnet::Tuple<long>::SetDim(int)+0x670) [0x3ff841f6888]
[bt] (1) /opt/mxnet/lib/libmxnet.so(mxnet::TShape::TShape(int, long)+0x2a) [0x3ff841f6d12]
[bt] (2) /opt/mxnet/lib/libmxnet.so(mxnet::LegacyTShapeLoad(dmlc::Stream*, mxnet::TShape*, unsigned int)+0x4e) [0x3ff8629f0be]
[bt] (3) /opt/mxnet/lib/libmxnet.so(mxnet::NDArray::LegacyLoad(dmlc::Stream*, unsigned int)+0x68) [0x3ff862a4578]
[bt] (4) /opt/mxnet/lib/libmxnet.so(mxnet::NDArray::Load(dmlc::Stream*)+0x12e) [0x3ff862a91de]
[bt] (5) /opt/mxnet/lib/libmxnet.so(mxnet::NDArray::Load(dmlc::Stream*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x8b8) [0x3ff862ac4d8]
[bt] (6) /opt/mxnet/lib/libmxnet.so(MXNDArrayLoad+0xf4) [0x3ff8692eaa4]
[bt] (7) /usr/lib/s390x-linux-gnu/libffi.so.6(ffi_call_SYSV+0x98) [0x3ff88385760]
[bt] (8) /usr/lib/s390x-linux-gnu/libffi.so.6(ffi_call+0x72) [0x3ff88385e12]
USE_SSE=0.Using MXNet 1.5.1
----------Python Info----------
Version : 3.6.9
Compiler : GCC 8.4.0
Build : ('default', 'Apr 18 2020 01:56:04')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 9.0.1
Directory : /usr/lib/python3/dist-packages/pip
----------MXNet Info-----------
Version : 1.5.0
Directory : /root/mxnet-1.5.0/mxnet/python/mxnet
Num GPUs : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Linux-4.15.0-99-generic-s390x-with-Ubuntu-18.04-bionic
system : Linux
node : e0e720e9a63c
release : 4.15.0-99-generic
version : #100-Ubuntu SMP Wed Apr 22 20:31:47 UTC 2020
----------Hardware Info----------
machine : s390x
processor : s390x
Architecture: s390x
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Big Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s) per book: 1
Book(s) per drawer: 1
Drawer(s): 4
NUMA node(s): 1
Vendor ID: IBM/S390
Machine type: 3906
CPU dynamic MHz: 5208
CPU static MHz: 5208
BogoMIPS: 21881.00
Hypervisor: z/VM 7.1.0
Hypervisor vendor: IBM
Virtualization type: full
Dispatching mode: horizontal
L1d cache: 128K
L1i cache: 128K
L2d cache: 4096K
L2i cache: 2048K
L3 cache: 131072K
L4 cache: 688128K
NUMA node0 CPU(s): 0-3
Flags: esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx vxd vxe gs sie
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0044 sec, LOAD: 0.0622 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0037 sec, LOAD: 0.0612 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0039 sec, LOAD: 0.1054 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0040 sec, LOAD: 0.0172 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0037 sec, LOAD: 0.2370 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0038 sec, LOAD: 0.0423 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0038 sec, LOAD: 0.4736 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.0038928985595703125 sec.
The Intel x86 and AMD64 / x86-64 series of processors use the little-endian format whereas s390x architecture uses big-endian format. Current MXNet parameter serialization format is just a memory dump and can't be loaded on a system with different endianness.
For MXNet 2, we're switching to using the numpy serialization format which will prevent such issues (https://numpy.org/devdocs/reference/generated/numpy.lib.format.html)
For now, you can't use the pretrained models directly on s390x architecture but you'd need some workaround where you load the parameter on a x86 machine, call asnumpy, save via numpy and finally load the parameters on s390x machine using numpy and convert them to mxnet. Would that work for you?
@leezu Thanks for your reply.
For now, you can't use the pretrained models directly on s390x architecture but you'd need some workaround where you load the parameter on a x86 machine, call asnumpy, save via numpy and finally load the parameters on s390x machine using numpy and convert them to mxnet. Would that work for you?
I'll give it a try.
Are you sure the build succeed? Above logs are only related to the build configuration and not the actual build