I encounter a strange issue: when I increase the GPU count from 1 to 2 to 3 to 4 (my max GPU count), the training time per epoch significantly increases, even when I select the GPUs via CUDA_VISIBLE_DEVICES with the GPU ids. All GPUs are activated and utilized during training (so they are active).
The GPUs have 12 GB of memory each. I tried both 2 images per GPU (BATCH_SIZE=8) and 1 image per GPU (BATCH_SIZE=4). Strangely, the fastest training configuration is now GPU_COUNT=1 and IMAGES_PER_GPU=1 (?).
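For reference, a minimal sketch of the kind of config I'm using (the class name and GPU ids are illustrative; the repo's Config derives BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU):

```python
import os
# Select the GPUs before TensorFlow initializes (ids are illustrative).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

from mrcnn.config import Config

class CocoTrainConfig(Config):  # hypothetical name
    NAME = "coco"
    GPU_COUNT = 4           # varied from 1 to 4
    IMAGES_PER_GPU = 2      # also tried 1
    STEPS_PER_EPOCH = 1000  # steps ("cycles") per epoch
```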
Roughly, it takes 25 minutes per epoch (COCO dataset training with 1000 steps per epoch) with 1 GPU, 40 minutes with 2 GPUs, 50 minutes with 3 GPUs, and 1h10min with 4 GPUs.
Is it some issue in the parallel_model.py code (maybe in combination with the Windows OS)? There is a clear trend: the higher the GPU_COUNT, the longer the training time.
The specifications:
I tried searching the issue list, but I cannot find a similar issue. I don't understand the problem :-(
Similar to #589 (see multi-GPU comment).
Thanks for your response. Nevertheless, it doesn't work. I copied both your version of model.py and parallel_model.py into my mrcnn folder, but the issue remains :-(
Update: even when I create an Anaconda env with all the libraries mentioned in #710, it still doesn't work.
I came here to mention @ericj974's post; I will try the changes myself and see how it goes.
@schmidje, were you able to reproduce the error?
I just installed this repository on Linux (Ubuntu 16.04) and somehow multi-GPU training is working properly now... Maybe some issue with the Windows 10 OS?
Remaining specs (on Ubuntu 16.04):
GPU driver: 384.130
tensorflow-gpu: 1.8
Keras: 2.1.6
I did not yet find the time to test the changes proposed by @ericj974. In your case, did a fresh checkout of the repo fix the issue, or did you apply the proposed changes?
I am on Linux (RHEL), BTW.
thx
I have a dual-boot computer with Ubuntu 16.04 and Windows 10. Switching to Ubuntu 16.04 solved the issue, so based on that I think there might be some problem in the Windows-related code/integration :-(
I think we may have found the explanation in #875.
@pieterbl86 A few points that might shed some light on the topic.
Following Keras convention, an epoch doesn't always mean a full pass through the dataset. Rather, the STEPS_PER_EPOCH config setting allows you to control the number of steps per epoch. You can use small epochs to get more frequent updates in TensorBoard, or you can set it such that it corresponds with a full pass through the dataset.
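For example, to make an epoch correspond to one full pass (a sketch; `dataset` stands for a prepared mrcnn Dataset instance, and BATCH_SIZE is the product defined next):

```python
# One full pass through the dataset per "epoch":
config.STEPS_PER_EPOCH = len(dataset.image_ids) // config.BATCH_SIZE
```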
BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU.
Therefore, images per epoch = STEPS_PER_EPOCH * IMAGES_PER_GPU * GPU_COUNT.
When you increase the GPU_COUNT, you're also increasing the number of images you're training on per epoch. So it's normal for training an epoch to take longer. You're effectively training a larger epoch.
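Plugging in the STEPS_PER_EPOCH=1000 reported above makes this concrete:

```python
STEPS_PER_EPOCH = 1000   # as reported above
IMAGES_PER_GPU = 1

for gpu_count in (1, 2, 3, 4):
    images_per_epoch = STEPS_PER_EPOCH * IMAGES_PER_GPU * gpu_count
    print(gpu_count, "GPU(s):", images_per_epoch, "images per epoch")

# 1 GPU(s): 1000 images per epoch
# 2 GPU(s): 2000 images per epoch
# 3 GPU(s): 3000 images per epoch
# 4 GPU(s): 4000 images per epoch  -> 4x the work per "epoch"
```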
On Windows, the effect is more obvious. Keras has a feature that allows loading data in parallel on multiple CPU threads, but this doesn't work on Windows due to the way Python threads are implemented. So, most likely, the bottleneck in Windows is in data loading.
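To illustrate (a sketch, not the repo's exact model.py code; `keras_model`, `train_generator`, `config`, and `epochs` are assumed names, while `workers` and `use_multiprocessing` are real parameters of `fit_generator` in Keras 2.1.x):

```python
import os
import multiprocessing

# Keras can prefetch batches from the data generator in parallel workers,
# but that mechanism doesn't work reliably on Windows, so data loading
# effectively falls back to the main thread there.
if os.name == "nt":  # Windows
    workers, use_multiprocessing = 0, False
else:
    workers, use_multiprocessing = multiprocessing.cpu_count(), True

keras_model.fit_generator(
    train_generator,                          # assumed data generator
    steps_per_epoch=config.STEPS_PER_EPOCH,
    epochs=epochs,
    workers=workers,
    use_multiprocessing=use_multiprocessing,
)
```

With `workers=0`, every batch is generated synchronously between GPU steps, which would explain why adding GPUs makes each (larger) epoch disproportionately slower on Windows.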