During training, GPU utilization is relatively low: one card is fully occupied for a while, then usage suddenly drops. The training data is VOC; does the dataset need to be changed?
@xunkaixin @thtrieu I have the same question. I tried it on an 8-GPU AWS instance and only one GPU is being used, at close to 100%, while the rest sit at 0%. I think the training file has to be changed to include tf.device(gpus)
As a workaround: Tensorflow / Keras can support multiple GPUs. Thus, you could try to convert it into a pb file and then continue training in Tensorflow where multiple cards are supported.
+1
Would changing the ConfigProto in net/build.py to set device_count to the number of desired GPUs be enough? (e.g. cfg['device_count'] = {'GPU': 2})
Most examples of multi-GPU have an iteration over the GPU names and do something like:
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        ...
There is no logic in darkflow that uses tf.device from what I can see though.
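For reference, device_count in ConfigProto is a map from device type to a maximum count, so it needs dict syntax ({'GPU': 2}); the comma form {'GPU', 2} is a Python set. Below is a minimal sketch of building the keyword arguments. make_session_kwargs is a made-up helper name, not part of darkflow, and as far as I know this setting only caps how many GPUs TensorFlow may use; it does not by itself place any ops on a second card.

```python
def make_session_kwargs(n_gpus):
    """Build keyword arguments intended for tf.ConfigProto(**kwargs).

    Hypothetical helper for illustration; not part of darkflow.
    """
    if n_gpus < 0:
        raise ValueError("n_gpus must be >= 0")
    return {
        # Mapping of device type -> maximum number of devices to use.
        "device_count": {"GPU": n_gpus},
        # Let TensorFlow fall back to another device if an op has no GPU kernel.
        "allow_soft_placement": True,
    }


kwargs = make_session_kwargs(2)
print(kwargs["device_count"])      # a dict, as the protobuf map field expects
print(type({"GPU", 2}).__name__)   # the comma form is a set, not a mapping
```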
+1
So guys, does that mean that for now we can't manually assign which GPU darkflow should use, because there is no tf.device anywhere in darkflow?
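One way to choose the card without touching darkflow is the CUDA_VISIBLE_DEVICES environment variable, which is read by the CUDA runtime rather than by darkflow. A small sketch, assuming it runs before TensorFlow is imported; restrict_visible_gpus is a hypothetical helper name:

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given physical GPU indices to CUDA.

    Must be called before TensorFlow (or any CUDA library) initializes.
    Hypothetical helper for illustration.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Make only physical GPU 1 visible; inside the process it then appears as /gpu:0.
restrict_visible_gpus([1])
# import tensorflow as tf  # import only after the variable is set
```

This picks which GPU is used, but training still runs on a single card.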
It seems like simply changing the device_count in the configuration doesn't utilize the 2nd GPU (although it seems to recognize it exists).
My output when starting training with the change (setting device_count=2):
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 9491:00:00.0
Total memory: 11.92GiB
Free memory: 11.86GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x3f20ad0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID ab90:00:00.0
Total memory: 11.92GiB
Free memory: 11.86GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 9491:00:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: ab90:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 11.92G (12798197760 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 11.92G (12798197760 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Finished in 10.286311626434326s
Enter training ...
cfg/bcard.cfg parsing /home/mjohnst/business_cards_train/
Parsing for ['businesscard']
[====================>]100% b64.xml
Statistics:
businesscard: 158
Dataset size: 81
Result saved to net/yolo/bcard.parsed
Dataset of 81 instance(s)
Training statistics:
Learning rate : 0.0001
Batch size : 64
Epoch number : 50000000
Backup every : 20000
step 1 - loss 11.029691696166992 - moving ave loss 11.029691696166992
...
But when I look at nvidia-smi dmon I only get GPU 0 being utilized, not GPU 1.
I'll let it run for a while longer to see if GPU 1 gets utilized later (might grow GPU usage into the 2nd GPU), but I don't think that will happen (didn't pass allow growth in this run)
@xunkaixin
I tried making several changes to the code, but I've run into issues because of my lack of experience with TensorFlow's control flow.
I referred to this script while trying to do it.
The idea is to create independent GPU towers that each train on their own batch at the same time, then average the gradients to update the network. This increases the effective training batch size and makes the whole training process faster.
The changes are in build.py and help.py.
First, in build.py, I added get_available_gpus() to list the available GPUs, and added build_train_mutigpu_op = help.build_train_mutigpu_op to the TFNet class. I commented out the original if self.FLAGS.train: self.build_train_op() and replaced it with if self.FLAGS.train: self.build_train_mutigpu_op()
Changes in build.py:
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_ports = device_lib.list_local_devices()
    return [x.name for x in local_device_ports if x.device_type == 'GPU']

class TFNet(object):
    _TRAINER = dict({
        .........
    })
    # imported methods
    ......
    build_train_op = help.build_train_op
    build_train_mutigpu_op = help.build_train_mutigpu_op
    load_from_ckpt = help.load_from_ckpt
    ......
    def setup_meta_ops(self):
        cfg = dict({
            'allow_soft_placement': False,
            'log_device_placement': True
        })
        .......
        self.usable_gpu = get_available_gpus()# For mutiple gpu usage
        if self.FLAGS.train: self.build_train_mutigpu_op()# For mutiple gpu usage
        #if self.FLAGS.train: self.build_train_op()
        .......
And I believe the scripts in help.py are the critical scripts that need to be modified. I added following scripts:
def build_train_mutigpu_op(self):
    optimizer = self._TRAINER[self.FLAGS.trainer](self.FLAGS.lr)
    tower_grads = []
    for d in self.usable_gpu:
        with tf.device(d):
            with tf.name_scope("Tower"+str(self.usable_gpu.index(d))) as scope:
                loss = self.framework.tower_loss(scope)
                #loss = self.framework.loss(self.out,scope) #original Scripts
                # Reuse variables for the next tower.
                tf.get_variable_scope().reuse_variables()
                # Get the gradients of this batch to this tower
                grads = optimizer.compute_gradients(loss)
                # Track up the tower_grads for sum
                tower_grads.append(grads)
    # Use mean gradient for updating the network
    ave_grads = average_gradients(tower_grads)
    self.train_op = optimizer.apply_gradients(ave_grads)
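The snippet above calls average_gradients, which is not shown. It is presumably modeled on the helper of the same name in TensorFlow's CIFAR-10 multi-GPU example, which averages each variable's gradient over all towers. Here is a framework-free sketch of that logic, with gradients as plain lists of floats and variables as string names (both are stand-ins chosen for illustration, not TensorFlow objects):

```python
def average_gradients(tower_grads):
    """Average gradients across towers.

    tower_grads: one list per tower, each a list of (gradient, variable)
    pairs in the same variable order. Gradients are plain lists of floats.
    Returns a single list of (mean_gradient, variable) pairs.
    """
    averaged = []
    # zip(*tower_grads) groups the pairs that belong to the same variable.
    for pairs in zip(*tower_grads):
        grads = [g for g, _ in pairs]
        n = len(grads)
        mean = [sum(vals) / n for vals in zip(*grads)]
        # Towers share variables, so take the name from the first tower.
        averaged.append((mean, pairs[0][1]))
    return averaged


tower0 = [([1.0, 2.0], "w"), ([0.5], "b")]
tower1 = [([3.0, 4.0], "w"), ([1.5], "b")]
print(average_gradients([tower0, tower1]))
```

A real version would do the same per-variable mean with tensor ops and would have to skip variables whose gradient is None.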
I realized that the loss is currently declared for the whole class, which makes it hard to separate per tower; I believe a new loss function is needed. Since the original self.framework.loss(self.out,scope) does the prediction itself, it seems unusable for multi-GPU processing. However, I don't know how to correctly define the tower_loss function with name_scope, or how to split the batch into separate mini-batches for each tower. Can anyone help?
Does anyone know how to run it on multiple GPU cards? It is so slow when training on just one GPU, and the batch size can't be above 20. :(
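On the mini-batch question, one common approach is to cut each batch into near-equal shards and feed one shard to each tower. A plain-Python sketch follows; split_batch is a hypothetical helper, and wiring the shards into per-tower inputs in darkflow is not shown:

```python
def split_batch(batch, n_towers):
    """Split a list of samples into n_towers contiguous, near-equal shards."""
    if n_towers <= 0:
        raise ValueError("n_towers must be positive")
    base, extra = divmod(len(batch), n_towers)
    shards, start = [], 0
    for i in range(n_towers):
        # The first `extra` shards get one additional sample.
        size = base + (1 if i < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


shards = split_batch(list(range(64)), 3)
print([len(s) for s in shards])
```

Choosing a batch size that is a multiple of the number of GPUs keeps the shards equal, so a plain mean of the tower gradients weights every sample the same.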
Why is this issue closed?
@evilmtv it's not closed, only #525 is closed (confused me as well for a second)
Seems this still isn't solved unfortunately.
Not sure, but to try to make it clear: readme.md says --gpu takes a value from 0.0 to 1.0. Here 1.0 means one hundred percent GPU use, and for example --gpu 0.4 means 40% on GPU and 60% on CPU, if I understand it right. Because of this, it seems only one GPU can be used.
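As far as net/build.py appears to work, --gpu is not a GPU/CPU split: the value is capped at 1.0 and handed to TensorFlow as the per-process GPU memory fraction, and 0 means CPU only. Treat that as an assumption about the darkflow version in question. The sketch below mirrors the logic without importing TensorFlow; gpu_flag_to_cfg is a made-up name:

```python
def gpu_flag_to_cfg(gpu):
    """Translate a --gpu value into session settings (illustrative only)."""
    utility = min(gpu, 1.0)
    if utility > 0.0:
        return {
            # Fraction of each visible GPU's memory the process may allocate.
            "per_process_gpu_memory_fraction": utility,
            "allow_soft_placement": True,
        }
    # No GPU requested: hide all GPUs from the session.
    return {"device_count": {"GPU": 0}}


print(gpu_flag_to_cfg(0.4))
print(gpu_flag_to_cfg(0.0))
```

If that reading is right, the flag controls memory reservation only and says nothing about how many cards do the computation.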
Hi,
Any idea what changes are required to do this? I tried what @kuochiyoug said earlier but couldn't figure out how to split the loss function so that the total loss for a single tower will be calculated. Any help will be appreciated since I'm new to tensorflow and neural networks in general.
Would changing the ConfigProto in net/build.py to set device_count to the number of desired GPUs be enough? (e.g. cfg['device_count'] = {'GPU': 2}) Most multi-GPU examples iterate over the GPU names and do something like:
for d in ['/gpu:2', '/gpu:3']: with tf.device(d): ...
There is no logic in darkflow that uses tf.device from what I can see.
If I have 3 GPU cards, which of these is correct?
----------
else:
    self.say('Running entirely on CPU')
    cfg['device_count'] = {'GPU': 0}
    cfg['device_count'] = {'GPU': 1}
    cfg['device_count'] = {'GPU': 2}
----------
else:
    self.say('Running entirely on CPU')
    cfg['device_count'] = {'GPU': 3}
thanks
I am also looking for an answer to this. How do we update the library?