Hi,
I get this error when I try to train on 2 GPUs; I wonder what it's about?
INFO:root:Auto-select kvstore type = local_update_cpu
INFO:root:Start training with [gpu(1), gpu(2)]
Traceback (most recent call last):
File "train_lstm.py", line 147, in <module>
main(sys.argv)
File "train_lstm.py", line 143, in main
epoch_end_callback = [ mx.callback.do_checkpoint( '%s/%s' % (params_dir, expt_name) ) ]
File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/model.py", line 788, in fit
sym_gen=self.sym_gen)
File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/model.py", line 192, in _train_multi_device
logger=logger)
File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/executor_manager.py", line 302, in __init__
slices = _split_input_slice(train_data.batch_size, work_load_list)
File "/home/chuaf/miniconda3/lib/python3.5/site-packages/mxnet-0.7.0-py3.5.egg/mxnet/executor_manager.py", line 43, in _split_input_slice
raise ValueError('Too many slices such that some splits are empty')
ValueError: Too many slices such that some splits are empty
Your batch_size is smaller than the number of GPUs you are using, so some GPUs cannot get any data.
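For context, the check that raises this error splits one batch proportionally across the devices, and fails if any device would receive zero samples. A minimal sketch of that logic (approximating mxnet's `_split_input_slice` in `executor_manager.py`, not the exact source):

```python
def split_input_slice(batch_size, work_load_list):
    """Divide one batch of `batch_size` samples across devices.

    `work_load_list` holds one relative workload per device (e.g. [1, 1]
    for two equally loaded GPUs). Raises if any device's slice is empty.
    """
    total = sum(work_load_list)
    # Rounded number of samples each device should receive.
    per_device = [int(round(batch_size * w / total)) for w in work_load_list]
    slices = []
    end = 0
    for n in per_device:
        begin = min(end, batch_size)
        end = min(begin + n, batch_size)
        if begin >= end:
            # This device would get no data -- the error in the traceback above.
            raise ValueError('Too many slices such that some splits are empty')
        slices.append(slice(begin, end))
    return slices
```

With `batch_size=32` and two GPUs this yields `[slice(0, 16), slice(16, 32)]`, but with `batch_size=1` and two GPUs the second slice is empty and the `ValueError` is raised.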
ah ok, thanks...
Thanks my brother!
@piiswrong I run into this issue during inference. Does the code below mean MXNet can only use one GPU for inference? Can MXNet run inference on multiple GPUs?
mod.forward(Batch([mx.nd.array(img)]))
It seems I need to use an mxnet DataIter to run batched inference across multiple GPUs?
I see now.
batch_size = 32
mod2 = mx.mod.Module(symbol=sym, label_names=None, context=mx.gpu())
mod2.bind(for_training=False, data_shapes=[('data', (batch_size,3,224,224))])
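Multi-GPU inference should work by binding the module to a list of contexts instead of a single `mx.gpu()`, keeping batch_size divisible by the number of GPUs so every device gets a non-empty slice. A hypothetical sketch (untested here; assumes `sym`, `arg_params`, `aux_params`, and a batch array `imgs` are already loaded, and that two GPUs are visible):

```python
import mxnet as mx

batch_size = 32
ctx = [mx.gpu(0), mx.gpu(1)]  # one context per GPU

mod = mx.mod.Module(symbol=sym, label_names=None, context=ctx)
mod.bind(for_training=False,
         data_shapes=[('data', (batch_size, 3, 224, 224))])
mod.set_params(arg_params, aux_params)

# Each forward call carries a full batch; MXNet splits it across the contexts.
mod.forward(mx.io.DataBatch([mx.nd.array(imgs)]), is_train=False)
out = mod.get_outputs()[0].asnumpy()
```

The point is that the batch, not the model, is what gets parallelized: each GPU holds a copy of the weights and processes `batch_size / len(ctx)` samples per call.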
My batch size is equal to the number of GPUs, but I still hit the error.