Darkflow: CUDA_ERROR_OUT_OF_MEMORY

Created on 10 Apr 2017  路  2Comments  路  Source: thtrieu/darkflow

I'm not sure if the error here was CUDA_ERROR_OUT_OF_MEMORY or the AssertionError: Cannot capture source

# ./flow --model cfg/yolo.cfg --load bin/yolo.weights --demo samples/video_1.avi --gpu 1.0
Parsing ./cfg/yolo.cfg
Parsing cfg/yolo.cfg
Loading bin/yolo.weights ...
Successfully identified 269862452 bytes
Finished in 0.017832040786743164s
Model has a coco model name, loading coco labels.

Building net ...
Source | Train? | Layer description                | Output size
-------+--------+----------------------------------+---------------
       |        | input                            | (?, 416, 416, 3)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 416, 416, 32)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 208, 208, 32)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 208, 208, 64)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 104, 104, 64)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 104, 104, 128)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 104, 104, 64)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 104, 104, 128)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 52, 52, 128)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 52, 52, 256)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 52, 52, 128)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 52, 52, 256)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 26, 26, 256)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 26, 26, 512)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 26, 26, 256)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 26, 26, 512)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 26, 26, 256)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 26, 26, 512)
 Load  |  Yep!  | maxp 2x2p0_2                     | (?, 13, 13, 512)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 13, 13, 1024)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 13, 13, 512)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 13, 13, 1024)
 Load  |  Yep!  | conv 1x1p0_1  +bnorm  leaky      | (?, 13, 13, 512)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 13, 13, 1024)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 13, 13, 1024)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 13, 13, 1024)
 Load  |  Yep!  | concat [16]                      | (?, 26, 26, 512)
 Load  |  Yep!  | local flatten 2x2                | (?, 13, 13, 2048)
 Load  |  Yep!  | concat [26, 24]                  | (?, 13, 13, 3072)
 Load  |  Yep!  | conv 3x3p1_1  +bnorm  leaky      | (?, 13, 13, 1024)
 Load  |  Yep!  | conv 1x1p0_1    linear           | (?, 13, 13, 425)
-------+--------+----------------------------------+---------------
GPU mode with 1.0 usage
2017-04-10 00:36:20.905741: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-10 00:36:20.905804: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-10 00:36:20.905832: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-04-10 00:36:20.959927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-04-10 00:36:20.960233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
2017-04-10 00:36:20.960289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-04-10 00:36:20.960319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-04-10 00:36:20.960349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
2017-04-10 00:36:20.966911: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to allocate 3.94G (4232052736 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Finished in 2.0513877868652344s

Press [ESC] to quit demo
Traceback (most recent call last):
  File "./flow", line 52, in <module>
    tfnet.camera(FLAGS.demo, FLAGS.saveVideo)
  File "/darkflow/darkflow/net/help.py", line 78, in camera
    'Cannot capture source'
AssertionError: Cannot capture source

I'm running on

# nvidia-smi
Mon Apr 10 00:38:56 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   36C    P8    17W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Now looking at SF (see here), it seems that there is an option to force TensorFlow to allocate a specified percentage of the GPU available memory, like:

# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

This will prevent TF to try to allocate all available memory on the GPU, thus causing the CUDA_ERROR_OUT_OF_MEMORY issue - see here.

I didn't find where to do this TF setup within the code...

Most helpful comment

./flow --model cfg/yolo.cfg --load bin/yolo.weights --demo samples/video_1.avi --gpu .5

you sometimes get that error when trying to use 100% of the gpu, try using 50%

All 2 comments

./flow --model cfg/yolo.cfg --load bin/yolo.weights --demo samples/video_1.avi --gpu .5

you sometimes get that error when trying to use 100% of the gpu, try using 50%

@venuktan thank you! the CUDA_ERROR_OUT_OF_MEMORY is solved even if another error comes out.

Was this page helpful?
0 / 5 - 0 ratings