In my research, I use two types of AWS EC2 instances to train my models: p2.xlarge (1 GPU) and p2.8xlarge (8 GPUs). I noticed that using multi_gpu_model() on the p2.8xlarge instances actually increases training time per epoch by roughly 50% compared to the p2.xlarge instances.
The environment I use is the Deep Learning Amazon Linux AMI. Specs can be found here: https://aws.amazon.com/marketplace/pp/B077GF11NF?qid=1516817149793&sr=0-10&ref_=srh_res_product_title
My code for the multi-GPU instantiation is as follows:
with tf.device('/cpu:0'):
    base_model = ...  # build the model on the CPU so its weights live in host memory

model = multi_gpu_model(base_model, gpus=8)
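For reference, here is a fuller sketch of the pattern I am following, along the lines of the Keras docs example for multi_gpu_model. The Xception model, random data, and batch size below are placeholders rather than my actual setup:

import numpy as np
import tensorflow as tf
from keras.applications import Xception
from keras.utils import multi_gpu_model

# Instantiate the base model on the CPU; the replicas are placed on the GPUs.
with tf.device('/cpu:0'):
    base_model = Xception(weights=None, input_shape=(299, 299, 3), classes=10)

# Replicate the model across 8 GPUs; gradients are merged on the CPU.
parallel_model = multi_gpu_model(base_model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Placeholder random data: batch_size=256 means 32 samples per GPU per step.
x = np.random.random((1024, 299, 299, 3))
y = np.random.random((1024, 10))
parallel_model.fit(x, y, epochs=1, batch_size=256)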
Has anyone encountered this issue before? Is it specific to how the AMI was set up?
Most helpful comment
You can find more details in #9204 and #9502.