Thank you for sharing such nice work!
I am aware of the issue and, similarly, I am using a GTX 1080 and tried batch=1. However, I do not see any option to use an SGD optimizer in the available code.
How can I solve this so that I can still train on my GPU?
self._TRAINER
Out[2]:
{'adadelta': tensorflow.python.training.adadelta.AdadeltaOptimizer,
'adagrad': tensorflow.python.training.adagrad.AdagradOptimizer,
'adagradDA': tensorflow.python.training.adagrad_da.AdagradDAOptimizer,
'adam': tensorflow.python.training.adam.AdamOptimizer,
'ftrl': tensorflow.python.training.ftrl.FtrlOptimizer,
'momentum': tensorflow.python.training.momentum.MomentumOptimizer,
'rmsprop': tensorflow.python.training.rmsprop.RMSPropOptimizer}
For momentum and adagradDA I get:
Traceback (most recent call last):
File "./flow", line 42, in <module>
tfnet = TFNet(FLAGS)
File "/home/dem/mydarkflow/net/build.py", line 51, in __init__
self.setup_meta_ops()
File "/home/dem/mydarkflow/net/build.py", line 94, in setup_meta_ops
if self.FLAGS.train: self.build_train_op()
File "/home/dem/mydarkflow/net/help.py", line 17, in build_train_op
optimizer = self._TRAINER[self.FLAGS.trainer](self.FLAGS.lr)
TypeError: __init__() takes at least 3 arguments (2 given)
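For what it's worth, this TypeError is consistent with the TensorFlow 1.x signatures: `MomentumOptimizer` needs a `momentum` argument and `AdagradDAOptimizer` needs a `global_step` argument in addition to the learning rate, while `build_train_op` passes only `self.FLAGS.lr`. Below is a minimal sketch of one possible workaround, using stand-in classes so it runs without TensorFlow. The names `FakeSGD`, `FakeMomentum`, `TRAINERS`, `EXTRA_ARGS` and `build_optimizer` are hypothetical and not part of darkflow, and the momentum value 0.9 is only an assumed default.

```python
class FakeSGD(object):
    """Stand-in for an optimizer that needs only a learning rate."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate


class FakeMomentum(object):
    """Stand-in for an optimizer with a second required argument."""

    def __init__(self, learning_rate, momentum):
        self.learning_rate = learning_rate
        self.momentum = momentum


# Hypothetical registry, playing the role of self._TRAINER.
TRAINERS = {'sgd': FakeSGD, 'momentum': FakeMomentum}

# Hypothetical per-trainer extra arguments; 0.9 is an assumed value.
EXTRA_ARGS = {'momentum': {'momentum': 0.9}}


def build_optimizer(name, lr):
    """Instantiate a trainer, supplying any extra required arguments."""
    trainer_cls = TRAINERS[name]
    extra = EXTRA_ARGS.get(name, {})
    return trainer_cls(lr, **extra)


if __name__ == "__main__":
    optimizer = build_optimizer('momentum', 1e-5)
    print("momentum = {}".format(optimizer.momentum))
```

Calling `FakeMomentum(1e-5)` directly fails the same way as the traceback above, because the second required argument is missing.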
For rmsprop, ftrl, adam, adagrad, and adadelta I get CUDA_ERROR_OUT_OF_MEMORY:
```
Building net ...
Source | Train? | Layer description                | Output size
-------+--------+----------------------------------+---------------
       |        | input                            | (?, 448, 448, 3)
 Load  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 448, 448, 16)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 224, 224, 16)
 Load  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 224, 224, 32)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 112, 112, 32)
 Load  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 112, 112, 64)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 56, 56, 64)
 Load  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 56, 56, 128)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 28, 28, 128)
 Load  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 28, 28, 256)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 14, 14, 256)
 Load  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 14, 14, 512)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 7, 7, 512)
 Load  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 7, 7, 1024)
 Init  |  Yep!  | conv 3x3p1_1 +bnorm leaky        | (?, 7, 7, 256)
 Load  |  Yep!  | flat                             | (?, 12544)
 Init  |  Yep!  | full 12544 x 735 linear          | (?, 735)
-------+--------+----------------------------------+---------------
GPU mode with 1.0 usage
cfg/v1.1/tiny-yolov1-5c.cfg loss hyper-parameters:
side = 7
box = 2
classes = 5
scales = [1.0, 1.0, 0.5, 5.0]
Building cfg/v1.1/tiny-yolov1-5c.cfg loss
Building cfg/v1.1/tiny-yolov1-5c.cfg train op
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8225
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.55GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 7.92G (8504279040 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Finished in 2.2762658596s
Enter training ...
```
I tried to restrict the amount of GPU memory that can be allocated by setting --gpu 0.9 (per_process_gpu_memory_fraction).
--gpu 1.0 causes the original issue I posted, but any value <1.0 results in the output shown in this post, regardless of the GPU usage percentage I set:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8225
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.55GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
Finished in 2.17069816589s
Enter training ...
Dataset of 13063 instance(s)
Training statistics:
Learning rate : 1e-05
Batch size : 48
Epoch number : 1000
Backup every : 2000
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.43GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.43GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
step 1 - loss 26.9644641876 - moving ave loss 26.9644641876
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.43GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.43GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
step 2 - loss 25.0753860474 - moving ave loss 26.7755563736
step 3 - loss 26.65924263 - moving ave loss 26.7639249992
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.44GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.44GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
step 4 - loss 25.394159317 - moving ave loss 26.626948431
+1
+1
The "gpu" property sets the fraction of GPU memory that will be used, so you could try setting it to a value less than 1.
Good luck!
This looks like a GPU memory utilization problem. Suggestions: use a smaller percentage (e.g. --gpu 0.5); use SGD instead of Adam/RMSProp, since those optimizers require additional variables; using a smaller batch_size may help too.
This issue pops up in several other applications/repos and in the official TensorFlow repo as well. I don't think I can do anything about it at the moment...
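To get a rough feel for how these suggestions scale, here is a back-of-the-envelope sketch in plain Python. It only sums the float32 layer outputs listed in the table from the log earlier in this thread, so it is a lower bound: gradients, batch-norm intermediates and cuDNN workspace are ignored. The function names are hypothetical, and the slot counts (2 extra variables per parameter for Adam, 0 for plain SGD) are assumptions about the optimizer implementations.

```python
# Per-sample output shapes copied from the layer table in the training log
# earlier in this thread (the leading "?" batch dimension is dropped).
LAYER_OUTPUT_SHAPES = [
    (448, 448, 3), (448, 448, 16), (224, 224, 16), (224, 224, 32),
    (112, 112, 32), (112, 112, 64), (56, 56, 64), (56, 56, 128),
    (28, 28, 128), (28, 28, 256), (14, 14, 256), (14, 14, 512),
    (7, 7, 512), (7, 7, 1024), (7, 7, 256), (12544,), (735,),
]

BYTES_PER_FLOAT32 = 4


def num_elements(shape):
    """Number of scalars in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def activation_bytes(batch_size):
    """Lower bound on forward activation memory for one batch, in bytes."""
    per_sample = sum(num_elements(s) for s in LAYER_OUTPUT_SHAPES)
    return batch_size * per_sample * BYTES_PER_FLOAT32


def optimizer_state_bytes(num_params, slots_per_param):
    """Extra memory for per-parameter optimizer variables, in bytes."""
    return num_params * slots_per_param * BYTES_PER_FLOAT32


if __name__ == "__main__":
    for batch in (1, 8, 48):
        mib = activation_bytes(batch) / float(2 ** 20)
        print("batch {}: at least {:.1f} MiB of activations".format(batch, mib))
    # Weights of the final "full 12544 x 735" layer only, as an example.
    fc_params = 12544 * 735
    extra = optimizer_state_bytes(fc_params, 2) / float(2 ** 20)
    print("2 slots on the full layer: {:.1f} MiB extra".format(extra))
```

In this estimate activation memory grows linearly with batch size, so lowering the batch size is the first thing to try, while switching to SGD removes the per-parameter optimizer variables.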
Switching to sgd worked for me, tks!