Incubator-mxnet: fix operators to support large arrays.

Created on 30 Oct 2018 · 22 comments · Source: apache/incubator-mxnet

We're working on a model that requires very large NDArrays. For example, we want to create an NDArray as follows:

arr = mx.nd.random.normal(shape=(50000000, 100))

The current implementation doesn't fail with an error, but it doesn't generate the matrix correctly (only the rows at the beginning are filled).

mx.nd.zeros also fails.

It's unclear which operators support large arrays and which don't.

Labels: Bug, Operator

All 22 comments

@mxnet-label-bot [Bug, Operator]

The iterator type is int in the MXNet kernels, so they fail when the number of iterations is greater than 2^31 - 1 (namely 2,147,483,647).
50000000 * 100 = 5,000,000,000 > 2^31 - 1

Please see the code:
CPU:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/mxnet_op.h#L506

GPU:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/mxnet_op.h#L627
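
Roughly, the pattern in question looks like the sketch below (a simplified illustration; FillOp, LaunchInt, and LaunchIndex are hypothetical names, not the actual mxnet_op.h code). With an int loop bound, a shape like (50000000, 100) has to be narrowed to int, which truncates the element count.

```cpp
// Simplified sketch of the launch-loop pattern described above; the names
// here are hypothetical, not the actual mxnet_op.h code.
#include <cstdint>
#include <cstdio>

// Hypothetical element-wise kernel: write `value` into out[i].
struct FillOp {
  static inline void Map(int64_t i, float* out, float value) {
    out[i] = value;
  }
};

// Loop bound as `int`: callers must narrow the true element count,
// which truncates for shapes such as (50000000, 100).
template <typename OP, typename... Args>
void LaunchInt(int N, Args... args) {
  for (int i = 0; i < N; ++i) OP::Map(i, args...);
}

// Loop bound as a 64-bit index (what changing int to index_t amounts to
// when large tensors are enabled): the full range is covered.
template <typename OP, typename... Args>
void LaunchIndex(int64_t N, Args... args) {
  for (int64_t i = 0; i < N; ++i) OP::Map(i, args...);
}

int main() {
  const int64_t size = 50000000LL * 100;  // 5,000,000,000 elements
  std::printf("true element count: %lld\n", static_cast<long long>(size));
  // Narrowing to int wraps on typical platforms, so an int-based launch
  // would iterate over the wrong number of elements:
  std::printf("narrowed to int:    %d\n", static_cast<int>(size));
  // LaunchInt<FillOp>(static_cast<int>(size), out, 1.f);  // wrong count
  // LaunchIndex<FillOp>(size, out, 1.f);                  // covers everything
  return 0;
}
```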

Will take a look

@wkcn is right. We need to change int to index_t. I am busy with other tasks right now and can only get to this in a week. Let me know if it requires an immediate fix.

@apeforest
Will changing the iteration type to int64_t hurt performance?
In PyTorch, the iteration type is a template parameter.

I'm fixing some of the operators, but we need a systematic fix; the problem is everywhere. I'll provide a temporary fix for some of the operators.

@wkcn On CPU it shouldn't be a problem. I have heard concerns about GPUs. Potentially, we can use int64_t for CPU and int for GPU.

@zheng-da
Maybe we can use int32_t for small for-loops and int64_t for large for-loops.

@wkcn My concern is that this modification makes the code more complex. Using different int types for CPU and GPU is relatively easy; we can achieve it with a template argument.
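
For example, something along these lines (a hedged sketch with hypothetical names, not MXNet's actual Kernel API): the loop is written once over a templated index type, and a thin wrapper picks int32_t or int64_t at runtime based on the tensor size.

```cpp
// Hedged sketch of templating the index type (hypothetical names, not
// MXNet's actual Kernel API). The loop body is written once; a wrapper
// picks a 32-bit or 64-bit counter at runtime based on the size.
#include <climits>
#include <cstdint>
#include <vector>

template <typename OP, typename IndexType, typename... Args>
void LaunchLoop(IndexType N, Args... args) {
  for (IndexType i = 0; i < N; ++i) OP::Map(i, args...);
}

// Keep the cheap 32-bit counter whenever the tensor fits; fall back to
// a 64-bit counter only for large tensors.
template <typename OP, typename... Args>
void Launch(int64_t N, Args... args) {
  if (N <= INT_MAX) {
    LaunchLoop<OP>(static_cast<int32_t>(N), args...);
  } else {
    LaunchLoop<OP>(N, args...);
  }
}

// Example op: out[i] = 2 * in[i], templated on the index type.
struct DoubleOp {
  template <typename IndexType>
  static inline void Map(IndexType i, float* out, const float* in) {
    out[i] = 2.f * in[i];
  }
};

int main() {
  std::vector<float> in(1000, 1.f), out(1000, 0.f);
  Launch<DoubleOp>(static_cast<int64_t>(in.size()), out.data(), in.data());
  return 0;
}
```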

@pengzhao-intel what is the performance difference between int32 and int64 on Intel CPUs?

@apeforest I have fixed some of the operators, including all random generators, zeros, ones, full, arange, gather_nd.
https://github.com/zheng-da/incubator-mxnet/commit/2c3d9a3a491d33497c2b37897e73796a0c28e19d
But we need to do more to fix the rest of the operators.

@zheng-da Maybe size_t is better.

@zheng-da Do you plan to create a PR with your change? I will be glad to review. Also, I have created an epic (https://issues.apache.org/jira/browse/MXNET-1184) to address this support in a systematic way. Please feel free to add additional tasks to it as needed. Thanks.

@zheng-da In general, int64 gives only half the performance of int32.

Hi, I modified src/operator/mxnet_op.h to support both int32_t and int64_t as the iterator type. It may be helpful.

I also wrote a script to replace the iterator type with IndexType.

Usage:

1. Install the_silver_searcher
2. Run the command: ag "MSHADOW_XINLINE static void Map" > map.txt
3. Run python replace_index.py

However, there were some bugs in the script. :-(

@apeforest I just fixed the operators I use in my model. Could you help add tests and fix the other operators?

In my test, the performance of +/- with int32_t and int64_t appears to be about the same.

CPU: Intel i7-7500U
OS: Arch Linux x64
Compiler: g++ 8.2.1
Compile command: g++ ctype.cpp -o test -g -lpthread -std=c++11 -Wno-invalid-source-encoding
Test code: https://github.com/wkcn/c_performance

Test int8_t:  929 ms, 798 ms, 831 ms, 2024 ms
Test int16_t: 860 ms, 803 ms, 840 ms, 1950 ms
Test int32_t: 858 ms, 822 ms, 878 ms, 1947 ms
Test int64_t: 899 ms, 837 ms, 828 ms, 7345 ms
Test float:   1187 ms, 1191 ms, 1198 ms, 1199 ms
Test double:  1209 ms, 1211 ms, 1205 ms, 1205 ms
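
For reference, the kind of micro-benchmark behind these numbers looks roughly like the sketch below (my assumption of the setup; the actual c_performance test may differ). It times the same additions with an int32_t and an int64_t loop counter.

```cpp
// Sketch of a loop-counter micro-benchmark (assumed setup; the actual
// c_performance test may differ). The same additions are timed with an
// int32_t and an int64_t counter.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <type_traits>

template <typename IndexType>
long long TimeLoop(IndexType n) {
  // Accumulate into the unsigned counterpart so wraparound is well-defined;
  // volatile keeps the loop from being optimized away.
  using UIndex = typename std::make_unsigned<IndexType>::type;
  const auto start = std::chrono::steady_clock::now();
  volatile UIndex sum = 0;
  for (IndexType i = 0; i < n; ++i) {
    sum = sum + static_cast<UIndex>(i);
  }
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

int main() {
  const int64_t n = 1000000000LL;  // 1e9 iterations, fits in both counter types
  std::printf("int32_t counter: %lld ms\n", TimeLoop<int32_t>(static_cast<int32_t>(n)));
  std::printf("int64_t counter: %lld ms\n", TimeLoop<int64_t>(n));
  return 0;
}
```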

Integer operations are cheap. Even if int64 is a little more expensive, it's hard to believe it would affect overall performance by much.

Try GEMM with int32 and int64 and see how much peak GFLOPS it can achieve.

I believe the following is also a repro of this issue:

import mxnet as mx
mx.nd.eye(10240 * 5) * 2

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

This issue has been fixed. In the 1.5.0 release, users need to build MXNet from source with the compilation flag USE_INT64_TENSOR_SIZE=1. We are working to turn this flag on by default and make it available in the pip package in the next minor release. Closing this issue for now.
