Darkflow: CUDA_ERROR_OUT_OF_MEMORY and possible workaround

Created on 23 Feb 2017 · 6 comments · Source: thtrieu/darkflow

Thank you for sharing such nice work!
I am aware of the issue and am in a similar situation: I am using a GTX 1080 and have tried batch=1. However, I do not see any option to use the SGD optimizer in the available code.

How can I solve this so that I can still train on my GPU?

```
self._TRAINER
Out[2]:
{'adadelta': tensorflow.python.training.adadelta.AdadeltaOptimizer,
 'adagrad': tensorflow.python.training.adagrad.AdagradOptimizer,
 'adagradDA': tensorflow.python.training.adagrad_da.AdagradDAOptimizer,
 'adam': tensorflow.python.training.adam.AdamOptimizer,
 'ftrl': tensorflow.python.training.ftrl.FtrlOptimizer,
 'momentum': tensorflow.python.training.momentum.MomentumOptimizer,
 'rmsprop': tensorflow.python.training.rmsprop.RMSPropOptimizer}
```

For momentum and adagradDA I get:

```
Traceback (most recent call last):
  File "./flow", line 42, in <module>
    tfnet = TFNet(FLAGS)
  File "/home/dem/mydarkflow/net/build.py", line 51, in __init__
    self.setup_meta_ops()
  File "/home/dem/mydarkflow/net/build.py", line 94, in setup_meta_ops
    if self.FLAGS.train: self.build_train_op()
  File "/home/dem/mydarkflow/net/help.py", line 17, in build_train_op
    optimizer = self._TRAINER[self.FLAGS.trainer](self.FLAGS.lr)
TypeError: __init__() takes at least 3 arguments (2 given)
```
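The TypeError comes from the constructor signatures rather than from memory: in TensorFlow 1.x, tf.train.MomentumOptimizer(learning_rate, momentum, ...) also requires momentum, and tf.train.AdagradDAOptimizer(learning_rate, global_step, ...) also requires global_step, so calling the class with only self.FLAGS.lr passes too few arguments (Python 2 counts self, hence "at least 3 arguments (2 given)"). Below is a minimal pure-Python sketch of one way to detect this before constructing the optimizer. The *StandIn classes are hypothetical stand-ins that only mimic the required parameters of those TF 1.x constructors, and build_optimizer / extra_args are illustrative names, not darkflow code.

```python
import inspect


# Hypothetical stand-ins: they mimic only the *required* parameters of the
# TF 1.x optimizer constructors and are NOT the real TensorFlow classes.
class GradientDescentOptimizerStandIn:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate


class MomentumOptimizerStandIn:
    def __init__(self, learning_rate, momentum, use_nesterov=False):
        self.learning_rate = learning_rate
        self.momentum = momentum


class AdagradDAOptimizerStandIn:
    def __init__(self, learning_rate, global_step, l1_regularization_strength=0.0):
        self.learning_rate = learning_rate
        self.global_step = global_step


def missing_required_args(cls, supplied=("learning_rate",)):
    """Names of required constructor parameters not covered by `supplied`."""
    return [
        p.name
        for p in inspect.signature(cls).parameters.values()
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
        and p.name not in supplied
    ]


def build_optimizer(registry, name, lr, extra_args=None):
    """Instantiate registry[name], failing early with a readable message
    instead of the bare TypeError seen in the traceback."""
    cls = registry[name]
    extra_args = extra_args or {}
    missing = [a for a in missing_required_args(cls) if a not in extra_args]
    if missing:
        raise ValueError(f"trainer {name!r} also needs: {', '.join(missing)}")
    return cls(lr, **extra_args)


registry = {
    "sgd": GradientDescentOptimizerStandIn,  # learning rate alone is enough
    "momentum": MomentumOptimizerStandIn,
    "adagradDA": AdagradDAOptimizerStandIn,
}

sgd = build_optimizer(registry, "sgd", 1e-5)
mom = build_optimizer(registry, "momentum", 1e-5, {"momentum": 0.9})
```

In darkflow itself the analogous options would be to pass the extra argument where help.py builds the optimizer, or to register a trainer that needs only the learning rate (in TF 1.x, tf.train.GradientDescentOptimizer); both are suggestions, not code taken from the repository.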

For rmsprop, ftrl, adam, adagrad, and adadelta I get CUDA_ERROR_OUT_OF_MEMORY:

````
Building net ...
Source | Train? | Layer description | Output size
-------+--------+----------------------------------+---------------
| | input | (?, 448, 448, 3)
Load | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 448, 448, 16)
Load | Yep! | maxp 2x2p0_2 | (?, 224, 224, 16)
Load | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 224, 224, 32)
Load | Yep! | maxp 2x2p0_2 | (?, 112, 112, 32)
Load | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 112, 112, 64)
Load | Yep! | maxp 2x2p0_2 | (?, 56, 56, 64)
Load | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 56, 56, 128)
Load | Yep! | maxp 2x2p0_2 | (?, 28, 28, 128)
Load | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 28, 28, 256)
Load | Yep! | maxp 2x2p0_2 | (?, 14, 14, 256)
Load | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 14, 14, 512)
Load | Yep! | maxp 2x2p0_2 | (?, 7, 7, 512)
Load | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 7, 7, 1024)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 7, 7, 256)
Load | Yep! | flat | (?, 12544)
Init | Yep! | full 12544 x 735 linear | (?, 735)
-------+--------+----------------------------------+---------------
GPU mode with 1.0 usage
cfg/v1.1/tiny-yolov1-5c.cfg loss hyper-parameters:
side = 7
box = 2
classes = 5
scales = [1.0, 1.0, 0.5, 5.0]
Building cfg/v1.1/tiny-yolov1-5c.cfg loss
Building cfg/v1.1/tiny-yolov1-5c.cfg train op
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8225
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.55GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 7.92G (8504279040 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Finished in 2.2762658596s

Enter training ...
````
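The last row of the table follows from the loss hyper-parameters printed below it (side = 7, box = 2, classes = 5). Assuming the standard YOLOv1 output layout, where each of the side × side cells predicts classes class scores plus box boxes of (confidence, x, y, w, h), the fully connected layer needs side·side·(classes + 5·box) outputs, and its input width is the flattened (7, 7, 256) feature map. A small check, using only numbers from the log:

```python
def yolo_v1_output_size(side, box, classes):
    """Width of the final layer: side*side cells, each with `classes`
    class scores and `box` boxes of (confidence, x, y, w, h)."""
    return side * side * (classes + 5 * box)


def flat_size(h, w, c):
    """Number of values after flattening an (h, w, c) feature map."""
    return h * w * c


# side = 7, box = 2, classes = 5 and the (?, 7, 7, 256) conv output come from the log.
fc_in = flat_size(7, 7, 256)
fc_out = yolo_v1_output_size(side=7, box=2, classes=5)
fc_params = fc_in * fc_out + fc_out  # dense weights plus one bias per output

print(f"full {fc_in} x {fc_out}")  # prints: full 12544 x 735
```

The dense layer alone holds about 9.2 million parameters, which is why it dominates this model's weight count.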

help wanted


rainmat

Most helpful comment

This seems to be a GPU memory utilization problem. Suggestions: use a smaller percentage (e.g. --gpu 0.5); use SGD instead of Adam/RMSProp, since these optimizers require additional variables; using a smaller batch_size may help too.

This issue pops up in several other applications/repos, and in the official TensorFlow repo as well. I don't think I can do anything about it at the moment...

thtrieu on 19 May 2017
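To see how the batch size and the optimizer choice each contribute, here is a rough back-of-the-envelope sketch for the tiny-yolov1-5c network in the table above. Assumptions: float32 values (4 bytes each); only the layer outputs listed in the table are counted (flat is skipped because it is a reshape); only the 3x3 conv kernels plus the final dense layer's weights and bias are counted as parameters. Gradients, batch-norm intermediates and cuDNN workspaces are ignored, so real usage is higher. The slot counts are those of the TF 1.x optimizers: Adam keeps m and v, non-centered RMSProp keeps rms and momentum, Momentum keeps one accumulator, and plain SGD keeps none.

```python
import math

BYTES_PER_FLOAT32 = 4

# Per-image layer outputs (h, w, c) copied from the darkflow table for
# tiny-yolov1-5c; "flat" is omitted because it is only a reshape.
LAYER_OUTPUTS = [
    (448, 448, 3),                     # input
    (448, 448, 16), (224, 224, 16),
    (224, 224, 32), (112, 112, 32),
    (112, 112, 64), (56, 56, 64),
    (56, 56, 128), (28, 28, 128),
    (28, 28, 256), (14, 14, 256),
    (14, 14, 512), (7, 7, 512),
    (7, 7, 1024), (7, 7, 256),
    (735,),                            # full 12544 x 735
]

# Trainable weights: 3x3 conv kernels (in_c, out_c) plus the final dense layer.
CONV_CHANNELS = [(3, 16), (16, 32), (32, 64), (64, 128),
                 (128, 256), (256, 512), (512, 1024), (1024, 256)]
N_PARAMS = sum(3 * 3 * cin * cout for cin, cout in CONV_CHANNELS) + 12544 * 735 + 735

# Extra per-variable slot variables kept by the TF 1.x optimizers.
OPTIMIZER_SLOTS = {"sgd": 0, "momentum": 1, "rmsprop": 2, "adam": 2}


def activation_bytes(batch_size):
    """Forward layer outputs only; grows linearly with the batch size."""
    per_image = sum(math.prod(shape) for shape in LAYER_OUTPUTS)
    return batch_size * per_image * BYTES_PER_FLOAT32


def optimizer_bytes(trainer):
    """Weights plus optimizer slot variables; independent of the batch size."""
    return (1 + OPTIMIZER_SLOTS[trainer]) * N_PARAMS * BYTES_PER_FLOAT32


def gib(n_bytes):
    return n_bytes / 2**30


for batch in (1, 48):
    print(f"batch {batch:>2}: activations ~ {gib(activation_bytes(batch)):.2f} GiB")
for trainer in OPTIMIZER_SLOTS:
    print(f"{trainer:>8}: weights + slots ~ {gib(optimizer_bytes(trainer)):.2f} GiB")
```

The activation term scales with batch_size while the optimizer term scales with the number of slots, so the two suggestions act on different parts of the memory budget.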


All 6 comments

I tried to restrict the amount of GPU memory TensorFlow is allowed to allocate by setting --gpu 0.9 (per_process_gpu_memory_fraction).

Why does --gpu 1.0 cause the original issue I posted, while any value below 1.0 produces the warnings below instead, regardless of the GPU usage percentage I set?
Why doesn't the allocation of a ~2.45 GB chunk fail at every training step (for example, between step 2 and step 3)?


```
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8225
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.55GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
Finished in 2.17069816589s

Enter training ...
Dataset of 13063 instance(s)
Training statistics:
    Learning rate : 1e-05
    Batch size    : 48
    Epoch number  : 1000
    Backup every  : 2000
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.43GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.43GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
step 1 - loss 26.9644641876 - moving ave loss 26.9644641876
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.43GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.43GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
step 2 - loss 25.0753860474 - moving ave loss 26.7755563736
step 3 - loss 26.65924263 - moving ave loss 26.7639249992
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.44GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.44GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
step 4 - loss 25.394159317 - moving ave loss 26.626948431
```

rainmat on 23 Feb 2017
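A plausible reading of the first question, worked out only from the numbers in the two logs: with --gpu 1.0 the failed request of 8504279040 bytes is exactly the card's total memory (7.92 GiB), which is more than the 7.55 GiB reported free, whereas a fraction of 0.9 asks for about 7.13 GiB, which fits. The sketch below only does this arithmetic; it assumes TensorFlow reserves fraction × total memory up front (as the 7.92G request suggests) and does not model the allocator itself. On this card the largest fraction that fits would be roughly 7.55 / 7.92 ≈ 0.95.

```python
GIB = 2**30

# Numbers copied from the logs in this thread.
TOTAL_BYTES = 8504279040  # "failed to allocate 7.92G (8504279040 bytes)"
FREE_GIB = 7.55           # "Free memory: 7.55GiB"


def requested_gib(fraction, total_bytes=TOTAL_BYTES):
    """Memory requested up front, assuming TF reserves fraction * total."""
    return fraction * total_bytes / GIB


def fits(fraction):
    """True if the up-front request is no larger than the free memory."""
    return requested_gib(fraction) <= FREE_GIB


for fraction in (1.0, 0.9, 0.5):
    status = "fits" if fits(fraction) else "exceeds free memory"
    print(f"--gpu {fraction}: {requested_gib(fraction):.2f} GiB, {status}")
# --gpu 1.0: 7.92 GiB, exceeds free memory
# --gpu 0.9: 7.13 GiB, fits
# --gpu 0.5: 3.96 GiB, fits
```

The 2.43 GiB warnings in the second log are a separate matter: as the log itself states, the caller treats them as non-fatal, so training continues.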

christopher5106 on 1 Mar 2017

lababidi on 31 Mar 2017

The "gpu" option sets the fraction of GPU memory that will be used, so you may try setting it to a value less than 1.
Good luck!

ZhuoZheng on 18 May 2017

This issue pops up in several other applications/repos, and in the official TensorFlow repo as well. I don't think I can do anything about it at the moment...

thtrieu on 19 May 2017


Switching to SGD worked for me, thanks!