Darkflow: Out of Memory

Created on 3 Feb 2017 · 8 Comments · Source: thtrieu/darkflow

Hi @thtrieu,

What is the minimum GPU requirement?
I am using a GTX 1080, and even with batch = 1 I get an out-of-memory error.

darkflow$ ./flow --model cfg/tiny-yolo-voc.cfg --load bin/tiny-yolo-voc.weights --train --gpu 1.0
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Parsing ./cfg/tiny-yolo-voc.cfg
Parsing cfg/tiny-yolo-voc.cfg
Loading bin/tiny-yolo-voc.weights ...
Successfully identified 63471556 bytes
Finished in 0.00466394424438s


pribadihcr


TensorFlow is asking for a block of about 8 GB on your GPU. Memory is used by variables, intermediate computations, gradients, and moving averages. I suggest using --trainer sgd to avoid the additional memory needed for the gradients' moving averages.

thtrieu on 3 Feb 2017
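To make the optimizer point above concrete, here is a rough sketch of the memory that scales with the parameter count. It assumes float32 values, one gradient buffer per parameter, and per-parameter slot counts that follow TensorFlow 1.x's built-in optimizers; none of this is taken from darkflow's code, and the function name param_memory_bytes is hypothetical. Activations, which grow with --batch, are deliberately left out.

```python
# Rough, hedged estimate of GPU memory that scales with the parameter count.
# Assumptions (not taken from darkflow): float32 values, one gradient buffer
# per parameter, and the slot counts below, which follow TensorFlow 1.x's
# built-in optimizers. Activations, which grow with --batch, are NOT included.
BYTES_PER_FLOAT32 = 4

# Extra per-parameter buffers ("slots") each optimizer keeps between steps.
OPTIMIZER_SLOTS = {
    "sgd": 0,       # plain gradient descent: no extra state
    "momentum": 1,  # one velocity buffer
    "rmsprop": 2,   # TF 1.x RMSProp keeps "rms" and "momentum" slots
    "adam": 2,      # first- and second-moment estimates
}


def param_memory_bytes(num_params: int, trainer: str) -> int:
    """Bytes for weights + gradients + optimizer slots (activations excluded)."""
    buffers = 2 + OPTIMIZER_SLOTS[trainer]  # weights, gradients, then slots
    return num_params * buffers * BYTES_PER_FLOAT32


if __name__ == "__main__":
    # Approximation: treat the 63471556-byte tiny-yolo-voc.weights file from the
    # log as raw float32 values (this ignores the small file header).
    num_params = 63471556 // BYTES_PER_FLOAT32
    for trainer in ("sgd", "rmsprop", "adam"):
        mib = param_memory_bytes(num_params, trainer) / 2**20
        print(f"{trainer:>8}: ~{mib:.0f} MiB")
```

Under these assumptions, switching from a two-slot optimizer to SGD saves two float32 buffers per parameter; for tiny-yolo-voc that is on the order of a hundred MiB, while activation memory (not counted here) scales with --batch.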



@pribadihcr how did you solve your issue?

rainmat on 28 Feb 2017

@thtrieu: Setting --trainer sgd does not solve the problem; it throws a different error instead:

 optimizer = self._TRAINER[self.FLAGS.trainer](self.FLAGS.lr)
KeyError: 'sgd'

nitish11 on 10 May 2017
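The traceback above shows the --trainer value being used directly as a key into a dictionary of optimizer constructors, which is then called with the learning rate, so any name that is not registered fails with KeyError. A minimal sketch of that lookup pattern with a clearer error message follows; the names TRAINERS and make_optimizer are hypothetical, and the lambdas are placeholders, not darkflow's actual optimizers.

```python
# Hypothetical registry mapping --trainer names to optimizer factories.
# The lambdas are placeholders standing in for real optimizer constructors.
TRAINERS = {
    "rmsprop": lambda lr: ("RMSProp", lr),
    "adam": lambda lr: ("Adam", lr),
}


def make_optimizer(name: str, lr: float):
    """Look up a trainer by name, failing with a readable message."""
    try:
        factory = TRAINERS[name]
    except KeyError:
        valid = ", ".join(sorted(TRAINERS))
        raise ValueError(
            f"unknown trainer {name!r}; choose one of: {valid}"
        ) from None
    return factory(lr)
```

With this pattern, a name such as 'sgd' is accepted only once it has an entry in the dictionary.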

I'm sorry, this should be fixed in https://github.com/thtrieu/darkflow/commit/af77e3b2ce21b308c2016decccb212b4efb63b3c

thtrieu on 19 May 2017

@thtrieu: Thanks for the commit, but I am still facing the same issue:

GPU mode with 1.0 usage
cfg/yolo.cfg loss hyper-parameters:
    H       = 13
    W       = 13
    box     = 5
    classes = 3
    scales  = [1.0, 5.0, 1.0, 1.0]
Building cfg/yolo.cfg loss
Building cfg/yolo.cfg train op
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:04:00.0
Total memory: 1.95GiB
Free memory: 1.93GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:04:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1.95G (2096431104 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

run.sh: line 2: 14186 Segmentation fault      (core dumped) ./flow --model cfg/yolo.cfg --train --dataset training_data_baggage_3_claases/train_images/ --annotation training_data_baggage_3_claases/annotations/ --batch 2 --epoch 100 --savepb --gpu 1.0 --trainer sgd

Is there a minimum GPU memory requirement for this?

nitish11 on 22 May 2017

It worked when I changed the --gpu input argument:

./flow --model cfg/yolo.cfg --train --dataset training_data_baggage_3_claases/train_images/ --annotation training_data_baggage_3_claases/annotations/ --batch 2 --epoch 100 --savepb --gpu 1.0 --trainer sgd

nitish11 on 22 May 2017

What was your original command, and what did you change? It might be useful for later users.

thtrieu on 22 May 2017

The working command is:

./flow --model cfg/yolo.cfg --train --dataset training_data_baggage_3_claases/train_images/ --annotation training_data_baggage_3_claases/annotations/ --batch 2 --epoch 100 --savepb --gpu 0.8 --trainer sgd

nitish11 on 22 May 2017
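One plausible reading of the GTX 750 Ti log, assuming darkflow passes the --gpu value to TensorFlow as a per-process GPU memory fraction that is applied to the card's total memory (this matches the "GPU mode with 1.0 usage" line and the size of the failed request): with 1.0 the up-front request is 2096431104 bytes, the full 1.95 GiB, which is more than the 1.93 GiB reported free, while 0.8 fits. A sketch of that arithmetic using the log's numbers; requested_bytes and fits are hypothetical helper names.

```python
# Numbers copied from the GTX 750 Ti log earlier in this thread.
TOTAL_BYTES = 2096431104        # "failed to allocate 1.95G (2096431104 bytes)"
FREE_BYTES = int(1.93 * 2**30)  # "Free memory: 1.93GiB" (rounded in the log)


def requested_bytes(fraction: float, total: int = TOTAL_BYTES) -> int:
    """Up-front request, assuming the fraction is applied to total memory."""
    return int(min(fraction, 1.0) * total)


def fits(fraction: float, free: int = FREE_BYTES) -> bool:
    """True if the up-front request is no larger than the free memory."""
    return requested_bytes(fraction) <= free


for fraction in (1.0, 0.9, 0.8):
    print(fraction, requested_bytes(fraction), fits(fraction))
```

Under this assumption the threshold is roughly free / total (just under 0.99 here, given the rounded log values); the model and batch still have to fit inside whatever fraction is chosen.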
