Darkflow: Does it support multiple GPU cards?

Created on 20 Feb 2017 · 15 Comments · Source: thtrieu/darkflow

When training, GPU usage is relatively low: one card is fully occupied for a while, then its usage suddenly drops. The training data is VOC. Does the dataset need to be changed?

help wanted

xunkaixin

Most helpful comment

@xunkaixin
I tried making several changes to the code, but ran into some issues due to my lack of experience with TensorFlow control flow.

I referred to this script and tried to follow it.
The idea is to create an independent GPU tower for each card, so that every tower trains on its own batch at the same time, and then to average the gradients from all towers to update the network. This increases the effective training batch size and should make the whole training process faster.

The changes are made in build.py and help.py.

First, I focused on build.py. I added get_available_gpus() so the program can find the available GPUs, and I added build_train_mutigpu_op = help.build_train_mutigpu_op to the TFNet class. I then commented out the original if self.FLAGS.train: self.build_train_op() and replaced it with if self.FLAGS.train: self.build_train_mutigpu_op().

Changes in build.py:

from tensorflow.python.client import device_lib

def get_available_gpus():
    # Return the device names of all GPUs visible to TensorFlow.
    local_device_ports = device_lib.list_local_devices()
    return [x.name for x in local_device_ports if x.device_type == 'GPU']

class TFNet(object):
    _TRAINER = dict({
        .........
    })
    # imported methods
    ......
    build_train_op = help.build_train_op
    build_train_mutigpu_op = help.build_train_mutigpu_op
    load_from_ckpt = help.load_from_ckpt

    ......

    def setup_meta_ops(self):
        cfg = dict({
            'allow_soft_placement': False,
            'log_device_placement': True
        })

        .......
        self.usable_gpu = get_available_gpus()  # for multiple GPU usage
        if self.FLAGS.train: self.build_train_mutigpu_op()  # for multiple GPU usage
        #if self.FLAGS.train: self.build_train_op()

        .......

I believe the code in help.py is the critical part that needs to be modified. I added the following:

def build_train_mutigpu_op(self):
    optimizer = self._TRAINER[self.FLAGS.trainer](self.FLAGS.lr)
    tower_grads = []
    for i, d in enumerate(self.usable_gpu):
        with tf.device(d):
            with tf.name_scope("Tower" + str(i)) as scope:
                # tower_loss does not exist yet; it still has to be written
                loss = self.framework.tower_loss(scope)
                #loss = self.framework.loss(self.out,scope) # original script

                # Reuse variables for the next tower.
                tf.get_variable_scope().reuse_variables()

                # Compute the gradients of this tower's batch
                grads = optimizer.compute_gradients(loss)

                # Keep track of every tower's gradients for averaging
                tower_grads.append(grads)

    # Use the mean gradient to update the network
    # (average_gradients is not defined in this snippet)
    ave_grads = average_gradients(tower_grads)
    self.train_op = optimizer.apply_gradients(ave_grads)
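
average_gradients is called above but not shown. Here is a minimal sketch of what such a helper does, written in plain Python so it runs without TensorFlow. It assumes each tower contributes a list of (gradient, variable) pairs in the same variable order, which is the shape optimizer.compute_gradients returns. The plain lists of floats and the string names "w" and "b" are hypothetical stand-ins for tensors and variables; a real TensorFlow version would reduce tensors instead.

```python
def average_gradients(tower_grads):
    """Average per-variable gradients across towers.

    tower_grads: one list per tower, each a list of (gradient, variable)
    pairs in the same variable order. Gradients are plain lists of floats
    standing in for tensors (an assumption of this sketch).
    """
    averaged = []
    # zip(*tower_grads) groups the pairs that belong to the same variable.
    for pairs in zip(*tower_grads):
        grads = [g for g, _ in pairs]
        variable = pairs[0][1]  # the variable is shared by all towers
        # Element-wise mean over the towers' gradients for this variable.
        mean = [sum(vals) / len(vals) for vals in zip(*grads)]
        averaged.append((mean, variable))
    return averaged


# Two hypothetical towers, each with gradients for variables "w" and "b".
tower_0 = [([1.0, 2.0], "w"), ([0.5], "b")]
tower_1 = [([3.0, 4.0], "w"), ([1.5], "b")]
result = average_gradients([tower_0, tower_1])
print(result)
```

The returned list has the same (gradient, variable) layout as its inputs, so it can be passed on to an apply step in the same way as a single tower's gradients.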

I realized that the loss is currently declared on the whole class, which makes it hard to separate per tower, so I believe a new loss function is needed. Since the original self.framework.loss(self.out,scope) does the prediction itself, it seems unusable for multi-GPU processing. However, I don't know how to correctly define the loss as a tower_loss under a name_scope, or how to split each batch into mini-batches and feed them separately to each tower. Can anyone help?
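
For the batch-splitting part of the question, here is a minimal sketch in plain Python. It only shows how one batch could be cut into near-equal contiguous shards, one per tower. The function name split_batch and the integer "samples" are hypothetical, and feeding each shard to its own tower's input is left out because it depends on how tower_loss ends up being defined.

```python
def split_batch(batch, num_towers):
    """Split one batch into num_towers contiguous, near-equal shards."""
    if num_towers < 1:
        raise ValueError("num_towers must be at least 1")
    base, extra = divmod(len(batch), num_towers)
    shards, start = [], 0
    for i in range(num_towers):
        # The first `extra` shards take one additional sample each.
        size = base + (1 if i < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


# A hypothetical batch of 64 samples split across 3 GPUs.
batch = list(range(64))
shards = split_batch(batch, 3)
print([len(s) for s in shards])
```

If the shard sizes differ, a plain mean over the tower gradients weights the towers slightly unevenly, so picking a batch size divisible by the number of GPUs keeps the averaged gradient equal to the full-batch mean.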

kuochiyoug on 23 May 2017

👍4

All 15 comments

@xunkaixin @thtrieu I have the same question. I tried it on an 8-GPU AWS instance, and only one GPU is used at close to 100% while the rest sit at 0%. I think the training code has to be changed to include tf.device(gpus).

miladf2 on 25 Feb 2017

As a workaround: TensorFlow / Keras support multiple GPUs, so you could try converting the model into a .pb file and then continue training in TensorFlow, where multiple cards are supported.

kevinkit on 13 Mar 2017

👍1

baicunko on 12 Apr 2017

Would changing the ConfigProto in net/build.py to set device_count to the desired number of GPUs be enough? (e.g. cfg['device_count'] = {'GPU': 2})

Most examples of multi-GPU have an iteration over the GPU names and do something like:

for d in ['/gpu:2', '/gpu:3']:
  with tf.device(d):
    ...

As far as I can see, though, there is no logic in darkflow that uses tf.device.

mjohnst on 3 May 2017

👍2

limjoe on 3 May 2017

👍1

So guys, does that mean that for now we can't manually assign which GPU darkflow should use, because there is no tf.device anywhere in darkflow?

borasy on 4 May 2017

It seems like simply changing the device_count in the configuration doesn't utilize the 2nd GPU (although it seems to recognize it exists).

My output when starting training with the change (setting device_count=2):

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 9491:00:00.0
Total memory: 11.92GiB
Free memory: 11.86GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x3f20ad0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID ab90:00:00.0
Total memory: 11.92GiB
Free memory: 11.86GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   N Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 9491:00:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: ab90:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 11.92G (12798197760 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 11.92G (12798197760 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Finished in 10.286311626434326s

Enter training ...

cfg/bcard.cfg parsing /home/mjohnst/business_cards_train/
Parsing for ['businesscard'] 
[====================>]100%  b64.xml
Statistics:
businesscard: 158
Dataset size: 81
Result saved to net/yolo/bcard.parsed
Dataset of 81 instance(s)
Training statistics: 
    Learning rate : 0.0001
    Batch size    : 64
    Epoch number  : 50000000
    Backup every  : 20000
step 1 - loss 11.029691696166992 - moving ave loss 11.029691696166992
...

But when I look at nvidia-smi dmon I only get GPU 0 being utilized, not GPU 1.
I'll let it run for a while longer to see if GPU 1 gets utilized later (might grow GPU usage into the 2nd GPU), but I don't think that will happen (didn't pass allow growth in this run)

mjohnst on 8 May 2017

Has anyone figured out how to run it on multiple GPU cards? It is so slow when training on just one GPU, and the batch size can't go above 20. :(

zhiqizhang on 12 Sep 2017

Why is this issue closed?

evilmtv on 7 Mar 2018

@evilmtv it's not closed, only #525 is closed (confused me as well for a second)

Seems this still isn't solved unfortunately.

rclough on 8 Mar 2018

I'm not sure, but to try to make it clear: readme.md says --gpu takes a value from 0.0 to 1.0. Here 1.0 means one hundred percent GPU use, and, for example, --gpu 0.4 means 40% on the GPU and 60% on the CPU, if I understand it right. Because of this, it seems only one GPU can be used.

melonetern on 26 Mar 2018

Hi,
Any idea what changes are required to do this? I tried what @kuochiyoug said earlier but couldn't figure out how to split the loss function so that the total loss for a single tower will be calculated. Any help will be appreciated since I'm new to tensorflow and neural networks in general.

chinacat567 on 1 Jan 2019

Would changing the ConfigProto in net/build.py to set device_count to the desired number of GPUs be enough? (e.g. cfg['device_count'] = {'GPU': 2})

Most multi-GPU examples iterate over the GPU names and do something like:

for d in ['/gpu:2', '/gpu:3']:
  with tf.device(d):
    ...

As far as I can see, there is no logic in darkflow that uses tf.device.

If I have 3 GPU cards, which of these is correct?
----------------------------------
else:
    self.say('Running entirely on CPU')
    cfg['device_count'] = {'GPU': 0}
    cfg['device_count'] = {'GPU': 1}
    cfg['device_count'] = {'GPU': 2}

----------------------------------
else:
    self.say('Running entirely on CPU')
    cfg['device_count'] = {'GPU': 3}

Thanks