Incubator-mxnet: ValueError: Too many slices such that some splits are empty

Created on 2 Jul 2016 · 6 Comments · Source: apache/incubator-mxnet

Hi,

I get this error when I try to train on 2 GPUs; I wonder what it is about?

INFO:root:Auto-select kvstore type = local_update_cpu
INFO:root:Start training with [gpu(1), gpu(2)]
Traceback (most recent call last):
  File "train_lstm.py", line 147, in <module>
    main(sys.argv)
  File "train_lstm.py", line 143, in main
    epoch_end_callback = [ mx.callback.do_checkpoint( '%s/%s' % (params_dir, expt_name) ) ]
  File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/model.py", line 788, in fit
    sym_gen=self.sym_gen)
  File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/model.py", line 192, in _train_multi_device
    logger=logger)
  File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/executor_manager.py", line 302, in __init__
    slices = _split_input_slice(train_data.batch_size, work_load_list)
  File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/executor_manager.py", line 43, in _split_input_slice
    raise ValueError('Too many slices such that some splits are empty')
ValueError: Too many slices such that some splits are empty

All 6 comments

Your batch_size is smaller than the number of GPUs you are using, so some GPUs cannot get any data.
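To see why, here is a rough sketch of how a batch gets sliced across devices; this is an approximation of the check in executor_manager for illustration only, not the exact MXNet code:

# Rough approximation of the slice check in executor_manager
# (illustrative only, not the real MXNet implementation):
def split_batch(batch_size, num_devices):
    per_device = batch_size // num_devices
    if per_device == 0:
        # fewer samples than devices: at least one device would get nothing
        raise ValueError('Too many slices such that some splits are empty')
    slices, start = [], 0
    for i in range(num_devices):
        end = batch_size if i == num_devices - 1 else start + per_device
        slices.append(slice(start, end))
        start = end
    return slices

split_batch(32, 2)   # [slice(0, 16), slice(16, 32)]
split_batch(1, 2)    # raises ValueError: some splits would be empty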

ah ok, thanks...

Thanks, my brother!

@piiswrong I run into this issue during inference. Does this mean MXNet can only use one GPU for inference because of the code below? Can MXNet run inference on multiple GPUs?

mod.forward(Batch([mx.nd.array(img)]))

It seems I need to use an mxnet DataIter to run batched inference across multiple GPUs?

I figured it out.

batch_size = 32
# Bind the module for inference only, with a fixed batch size of 32.
mod2 = mx.mod.Module(symbol=sym, label_names=None, context=mx.gpu())
mod2.bind(for_training=False, data_shapes=[('data', (batch_size,3,224,224))])
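To run inference on more than one GPU, the module can be bound to a list of devices instead. The sketch below is illustrative only, not the author's exact code: it assumes sym, arg_params, aux_params, and the Batch namedtuple are already defined as in the snippets above, and keeps the batch size a multiple of the device count so every GPU gets a slice.

# Sketch of multi-GPU inference: bind to a list of devices.
batch_size = 32
mod2 = mx.mod.Module(symbol=sym, label_names=None,
                     context=[mx.gpu(0), mx.gpu(1)])
mod2.bind(for_training=False,
          data_shapes=[('data', (batch_size, 3, 224, 224))])
mod2.set_params(arg_params, aux_params)      # parameters loaded elsewhere
mod2.forward(Batch([mx.nd.ones((batch_size, 3, 224, 224))]), is_train=False)
prob = mod2.get_outputs()[0].asnumpy()       # shape: (batch_size, num_classes)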

My batch size is equal to the number of GPUs, but I still encounter the error.
