Test environment:
NVIDIA P100 GPU
Test steps:
1. Install Chainer.
2. Get the convnet-benchmarks code:
git clone https://github.com/mitmul/convnet-benchmarks
3. Run the test cases:
3.1: case "pip install cupy==1.0.0.1"
(py2-chainer-gpu) [sys_dltest@mlt-gpu200 chainer]$ python train_imagenet.py
alexnet
('Chainer version:', '2.0.0b1')
('CuPy version:', '1.0.0.1')
('CUDA:', True)
('CUDA Version:', u'V8.0.61')
('cuDNN:', True)
('cuDNN Version:', 5110)
('Input data shape:', (128, 3, 224, 224))
('Average Forward: ', 16.15312328338623, ' ms')
('Average Backward: ', 35.27830085754395, ' ms')
('Average Total: ', 51.431424140930176, ' ms')
3.2: case "pip install cupy==2.0.0"
(py2-chainer-gpu) [sys_dltest@mlt-gpu200 chainer]$ python train_imagenet.py
alexnet
('Chainer version:', '2.0.0b1')
('CuPy version:', '2.0.0')
('CUDA:', True)
('cuDNN:', True)
('cuDNN Version:', 5110)
('Input data shape:', (128, 3, 224, 224))
('Average Forward: ', 35.381299591064455, ' ms')
('Average Backward: ', 63.26389694213867, ' ms')
('Average Total: ', 98.64519653320312, ' ms')
3.3: case "pip install cupy==2.0.0rc1"
(py2-chainer-gpu) [sys_dltest@mlt-gpu200 chainer]$ python train_imagenet.py
alexnet
('Chainer version:', '2.0.0b1')
('CuPy version:', '2.0.0rc1')
('CUDA:', True)
('cuDNN:', True)
('cuDNN Version:', 5110)
('Input data shape:', (128, 3, 224, 224))
('Average Forward: ', 35.5438117980957, ' ms')
('Average Backward: ', 63.336796569824216, ' ms')
('Average Total: ', 98.88060836791992, ' ms')
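To make the gap between cases 3.1 to 3.3 easier to read, the timings can be turned into slowdown factors relative to CuPy 1.0.0.1. The following is only a minimal sketch: the `TIMINGS_MS` layout and the `slowdown` helper are illustrative names, not part of convnet-benchmarks, and the numbers are copied from the logs above.

```python
# Timings in ms, copied from the logs of cases 3.1-3.3
# (Chainer 2.0.0b1, cuDNN 5110, input shape (128, 3, 224, 224)).
TIMINGS_MS = {
    "1.0.0.1": {
        "forward": 16.15312328338623,
        "backward": 35.27830085754395,
        "total": 51.431424140930176,
    },
    "2.0.0": {
        "forward": 35.381299591064455,
        "backward": 63.26389694213867,
        "total": 98.64519653320312,
    },
    "2.0.0rc1": {
        "forward": 35.5438117980957,
        "backward": 63.336796569824216,
        "total": 98.88060836791992,
    },
}


def slowdown(timings, baseline, candidate):
    """Return candidate/baseline time ratios per phase (hypothetical helper)."""
    return {
        phase: timings[candidate][phase] / timings[baseline][phase]
        for phase in timings[baseline]
    }


if __name__ == "__main__":
    for version in ("2.0.0", "2.0.0rc1"):
        ratios = slowdown(TIMINGS_MS, "1.0.0.1", version)
        print(version, {phase: round(r, 2) for phase, r in ratios.items()})
```

With these numbers, both 2.0.0 builds take roughly 1.9 times as long in total as 1.0.0.1 on the same machine.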
Note: when running the "cupy==2.0.0*" cases (3.2 and 3.3), you need to comment out the following lines in train_imagenet.py:
#if chainer.cuda.available:
#    cuda_v = cupy.cuda.compiler._get_nvcc_version().split()[-1].decode('utf-8')
#    print('CUDA Version:', cuda_v)
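As an alternative to commenting the lines out, the version lookup could be wrapped so that a failure only skips the "CUDA Version" line. This is a sketch under assumptions: `cuda_version_or_none` is a hypothetical helper, and the lookup is passed in as a callable so the snippet runs without CuPy installed. In train_imagenet.py the callable would be `cupy.cuda.compiler._get_nvcc_version`, the private function the script already calls. The report does not say how that call fails under CuPy 2.0.0*, so the sketch simply catches any exception.

```python
def cuda_version_or_none(get_nvcc_version):
    """Return the last token of the nvcc version output, or None on failure.

    `get_nvcc_version` is any zero-argument callable returning the nvcc
    version text as bytes or str (assumption for this sketch).
    """
    try:
        raw = get_nvcc_version()
    except Exception:
        # The lookup is unavailable or broken in this CuPy version.
        return None
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    tokens = raw.split()
    # The version is the last whitespace-separated token, e.g. "V8.0.61".
    return tokens[-1] if tokens else None


if __name__ == "__main__":
    def broken_lookup():
        raise AttributeError("lookup not available")

    sample = b"Cuda compilation tools, release 8.0, V8.0.61"
    print("CUDA Version:", cuda_version_or_none(lambda: sample))
    print("CUDA Version:", cuda_version_or_none(broken_lookup))
```

The benchmark itself is unaffected either way; only the informational version line is skipped when the lookup fails.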
Thank you for your investigation. Could you check whether the slowdown also occurs with CuPy v2.4.0, which is the latest stable version?
It seems that the convnet-benchmarks performance returns to normal after we upgrade CuPy to 3.0.0a1, while 2.0.0rc1 is still slow in the same environment:
(py2-intel-chainer) [mingxiao@mlt-gpu201 scripts]$ python train_imagenet_gpu.py
alexnet
('Chainer version:', '3.1.0')
('CuPy version:', '3.0.0a1')
('CUDA:', True)
('cuDNN:', True)
('cuDNN Version:', 6021)
('Input data shape:', (128, 3, 224, 224))
('Average Forward: ', 15.92390718460083, ' ms')
('Average Backward: ', 29.40155200958252, ' ms')
('Average Total: ', 45.32545919418335, ' ms')
(py2-intel-chainer) [mingxiao@mlt-gpu201 scripts]$ python train_imagenet_gpu.py
alexnet
('Chainer version:', '3.1.0')
('CuPy version:', '2.0.0rc1')
('CUDA:', True)
('cuDNN:', True)
('cuDNN Version:', 6021)
('Input data shape:', (128, 3, 224, 224))
('Average Forward: ', 35.80836181640625, ' ms')
('Average Backward: ', 58.9523063659668, ' ms')
('Average Total: ', 94.76066818237305, ' ms')
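Because the script prints Python 2 tuples, logs like the ones above can be parsed mechanically when comparing runs. The following is a minimal sketch, assuming the log text is available as a string; `parse_benchmark_log` and `SAMPLE` are hypothetical names, not part of convnet-benchmarks.

```python
import ast

# Shortened excerpt of the CuPy 3.0.0a1 run shown above.
SAMPLE = """(py2-intel-chainer) [mingxiao@mlt-gpu201 scripts]$ python train_imagenet_gpu.py
alexnet
('Chainer version:', '3.1.0')
('CuPy version:', '3.0.0a1')
('cuDNN Version:', 6021)
('Average Total: ', 45.32545919418335, ' ms')
"""


def parse_benchmark_log(text):
    """Collect the ('label', value, ...) tuples of one run into a dict."""
    result = {}
    for line in text.splitlines():
        try:
            item = ast.literal_eval(line.strip())
        except (ValueError, SyntaxError):
            # Shell prompts, the model name, and blank lines are not literals.
            continue
        if isinstance(item, tuple) and len(item) >= 2 and isinstance(item[0], str):
            # 'Average Total: ' becomes the key 'Average Total'.
            result[item[0].strip(" :")] = item[1]
    return result


if __name__ == "__main__":
    run = parse_benchmark_log(SAMPLE)
    print(run["CuPy version"], run["Average Total"])
```

Parsing two runs this way gives dictionaries whose "Average Forward", "Average Backward", and "Average Total" entries can be compared directly.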