Hi,
I get this error when I try to train on 2 GPUs; I wonder what it's about?
INFO:root:Auto-select kvstore type = local_update_cpu
INFO:root:Start training with [gpu(1), gpu(2)]
Traceback (most recent call last):
File "train_lstm.py", line 147, in <module>
main(sys.argv)
File "train_lstm.py", line 143, in main
epoch_end_callback = [ mx.callback.do_checkpoint( '%s/%s' % (params_dir, expt_name) ) ]
File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/model.py", line 788, in fit
sym_gen=self.sym_gen)
File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/model.py", line 192, in _train_multi_device
logger=logger)
File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/executor_manager.py", line 302, in __init__
slices = _split_input_slice(train_data.batch_size, work_load_list)
File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/executor_manager.py", line 43, in _split_input_slice
raise ValueError('Too many slices such that some splits are empty')
ValueError: Too many slices such that some splits are empty
Your batch_size is smaller than the number of GPUs you are using, so some GPUs cannot get any data.
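For context, the check that raises this error splits one batch proportionally across the devices, and fails if any device would receive zero samples. A minimal sketch of that logic (approximating mxnet's `_split_input_slice` in `executor_manager.py`, not the exact source):

```python
def split_input_slice(batch_size, work_load_list):
    """Divide one batch of `batch_size` samples across devices.

    `work_load_list` holds one relative workload per device (e.g. [1, 1]
    for two equally loaded GPUs). Raises if any device's slice is empty.
    """
    total = sum(work_load_list)
    # Rounded number of samples each device should receive.
    per_device = [int(round(batch_size * w / total)) for w in work_load_list]
    slices = []
    end = 0
    for n in per_device:
        begin = min(end, batch_size)
        end = min(begin + n, batch_size)
        if begin >= end:
            # This device would get no data -- the error in the traceback above.
            raise ValueError('Too many slices such that some splits are empty')
        slices.append(slice(begin, end))
    return slices
```

With `batch_size=32` and two GPUs this yields `[slice(0, 16), slice(16, 32)]`, but with `batch_size=1` and two GPUs the second slice is empty and the `ValueError` is raised.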
ah ok, thanks...
Thanks my brother!
@piiswrong I run into this issue during inference. Does the code below mean MXNet can only use one GPU for inference? Can MXNet run inference on multiple GPUs?
mod.forward(Batch([mx.nd.array(img)]))
It seems I need to use an mxnet DataIter to run batched inference across multiple GPUs?
I see now.
batch_size = 32
mod2 = mx.mod.Module(symbol=sym, label_names=None, context=mx.gpu())
mod2.bind(for_training=False, data_shapes=[('data', (batch_size,3,224,224))])
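Multi-GPU inference should work by binding the module to a list of contexts instead of a single `mx.gpu()`, keeping batch_size divisible by the number of GPUs so every device gets a non-empty slice. A hypothetical sketch (untested here; assumes `sym`, `arg_params`, `aux_params`, and a batch array `imgs` are already loaded, and that two GPUs are visible):

```python
import mxnet as mx

batch_size = 32
ctx = [mx.gpu(0), mx.gpu(1)]  # one context per GPU

mod = mx.mod.Module(symbol=sym, label_names=None, context=ctx)
mod.bind(for_training=False,
         data_shapes=[('data', (batch_size, 3, 224, 224))])
mod.set_params(arg_params, aux_params)

# Each forward call carries a full batch; MXNet splits it across the contexts.
mod.forward(mx.io.DataBatch([mx.nd.array(imgs)]), is_train=False)
out = mod.get_outputs()[0].asnumpy()
```

The point is that the batch, not the model, is what gets parallelized: each GPU holds a copy of the weights and processes `batch_size / len(ctx)` samples per call.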
My batch size is equal to the number of GPUs, but I still hit the error.