Models: Colab Notebook to Train EfficientDet in the TensorFlow 2 Object Detection API

Created on 16 Jul 2020 · 12 comments · Source: tensorflow/models

Prerequisites

  • [x] I checked to make sure that this issue has not been filed already.

1. The entire URL of the documentation with the issue

The tutorial documentation was useful but did not generalize well to EfficientDet. To write this tutorial, I drew from the following resources:

2. Describe the issue

Using the above resources, I wrote a tutorial to train EfficientDet in Google Colab with the TensorFlow 2 Object Detection API.

You can run this tutorial by changing just one line for your custom dataset import. I hope this tutorial allows newcomers to the repository to quickly get up and running with TensorFlow 2 for object detection!

In the tutorial, I cover how to:

  • Acquire Labeled Object Detection Data
  • Install TensorFlow 2 Object Detection Dependencies
  • Download Custom TensorFlow 2 Object Detection Dataset
  • Write Custom TensorFlow 2 Object Detection Training Configuration
  • Train Custom TensorFlow 2 Object Detection Model
  • Export Custom TensorFlow 2 Object Detection Weights
  • Use Trained TensorFlow 2 Object Detection For Inference on Test Images
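The configuration-writing step above can be sketched in plain Python: the notebook's general approach is to patch the downloaded pipeline `.config` text with regular expressions. The config text, class count, batch size, and checkpoint path below are hypothetical placeholders, not values taken from the notebook.

```python
import re

# Hypothetical example config text; in the notebook this would be read
# from the pipeline .config file shipped with the chosen EfficientDet model.
config_text = """
model { ssd { num_classes: 90 } }
train_config {
  batch_size: 128
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
}
"""

num_classes = 3   # number of classes in your custom dataset
batch_size = 16   # smaller batch to fit Colab GPU memory
ckpt = "/content/models/research/deploy/efficientdet_d0_coco17_tpu-32/checkpoint/ckpt-0"

# Rewrite the fields that must change for a custom dataset.
config_text = re.sub(r"num_classes: \d+", f"num_classes: {num_classes}", config_text)
config_text = re.sub(r"batch_size: \d+", f"batch_size: {batch_size}", config_text)
config_text = re.sub(r'fine_tune_checkpoint: ".*?"',
                     f'fine_tune_checkpoint: "{ckpt}"', config_text)

print(config_text)
```

The same pattern extends to the `train_input_reader`/`eval_input_reader` TFRecord and label-map paths.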

Most helpful comment

@AarnavSawant that means you ran out of VRAM on GPU. Reducing either the Batch Size or the size of images should help.

All 12 comments

Thank you very much, that was really helpful!

Hi there,
Firstly, I wanted to thank you for your notebook. It was very helpful! I tried using your notebook with my dataset, though, and am getting a RAM error in Colab. I ran all of your cells step by step and only changed the curl link to my own curl link and the corresponding train/test paths in the config file. Here is the stack trace I am getting:

```
Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main_tf2.py", line 106, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main_tf2.py", line 103, in main
    use_tpu=FLAGS.use_tpu)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py", line 622, in train_loop
    loss = _dist_train_step(train_input_iter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 611, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[16,49104,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Loss/Loss/huber_loss/Minimum (defined at /local/lib/python3.6/dist-packages/object_detection/core/losses.py:176) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Func/Loss/regularization_loss/write_summary/summary_cond/then/_20/input/_95/_364]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted:  OOM when allocating tensor with shape[16,49104,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Loss/Loss/huber_loss/Minimum (defined at /local/lib/python3.6/dist-packages/object_detection/core/losses.py:176) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored. [Op:__inference__dist_train_step_87648]

Function call stack:
_dist_train_step -> _dist_train_step
```

Could you give some insight on what could be causing this error? Is it a problem with my Colab environment or am I missing something in the notebook? Thanks!

@AarnavSawant that means you ran out of VRAM on GPU. Reducing either the Batch Size or the size of images should help.
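Concretely, both knobs live in the pipeline `.config` file. A sketch of the relevant fragments (field names follow the TF2 Object Detection API pipeline proto; the exact values here are illustrative, not recommendations):

```
train_config {
  batch_size: 4   # reduce from 16 until the OOM disappears
  ...
}

model {
  ssd {
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 384   # a smaller input resolution also cuts memory use
        max_dimension: 384
        pad_to_max_dimension: true
      }
    }
  }
}
```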

Hi @Jacobsolawetz, your tutorial is amazing, thanks a lot! I was wondering, how can I get the COCO evaluation metrics?

@tazu786 thank you - I'm still working on that! The code to "eval continuously" is commented out in there, which should give us the COCO metrics, but I didn't have luck with it at first.
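For anyone else looking: in the TF2 Object Detection API, evaluation is driven by the same `model_main_tf2.py` script - passing `--checkpoint_dir` switches it into evaluation mode, where it reports COCO detection metrics for checkpoints it finds in that directory. The paths below are hypothetical Colab paths, not ones from the notebook:

```shell
# Hypothetical paths -- adjust to wherever your config and checkpoints live.
PIPELINE_CONFIG="/content/models/research/deploy/pipeline_file.config"
MODEL_DIR="/content/training"

# With --checkpoint_dir set, model_main_tf2.py evaluates checkpoints in
# MODEL_DIR (COCO metrics) instead of training.
EVAL_CMD="python model_main_tf2.py \
  --pipeline_config_path=${PIPELINE_CONFIG} \
  --model_dir=${MODEL_DIR} \
  --checkpoint_dir=${MODEL_DIR}"

echo "${EVAL_CMD}"
```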

@Jacobsolawetz yes I have the same problem. However, the training with my custom dataset somehow works. I tried to get some qualitative results from the trained model and they are pretty decent. Anyone else luckier than us?

@Jacobsolawetz I tried to apply this suggestion https://github.com/tensorflow/models/issues/8856#issuecomment-664753607 to model_main_tf2.py, but without success. Any news?

@tazu786 no luck yet! Glad to hear the qualitative results at least look good... As soon as I see a fix, I will put it in there.

Any updates from the team on this issue with eval in training? I'm willing to take a look into the code and try to solve it, but only if this is not already being resolved by someone from the team...

@Jacobsolawetz I tried to run the Colab tutorial, however it throws the error below. Maybe they removed something, as I was able to run the demo successfully before.

```
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/initializers/__init__.py in populate_deserializable_objects()
     83   v2_objs = {}
     84   base_cls = initializers_v2.Initializer
---> 85   generic_utils.populate_dict_with_module_objects(
     86       v2_objs,
     87       [initializers_v2],

AttributeError: module 'tensorflow.python.keras.utils.generic_utils' has no attribute 'populate_dict_with_module_objects'
```

The error happens when I run this:

```python
import matplotlib
import matplotlib.pyplot as plt

import os
import random
import io
import imageio
import glob
import scipy.misc
import numpy as np
from six import BytesIO
from PIL import Image, ImageDraw, ImageFont
from IPython.display import display, Javascript
from IPython.display import Image as IPyImage

import tensorflow as tf

from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.utils import colab_utils
from object_detection.builders import model_builder
%matplotlib inline
```


Update:
I solved the problem by updating to the latest TF version (`!pip install -U --pre tensorflow_gpu`) and removing the edit to the tf_utils.py file.

Hello @Jacobsolawetz, did you finish the evaluation code for EfficientDet?

Hello, I can't get a custom EfficientDet training run to work due to memory issues. Any help on this? #9141
