Models: Colab Notebook to Train EfficientDet in the TensorFlow 2 Object Detection API

Created on 16 Jul 2020 · 12 comments · Source: tensorflow/models

Prerequisites

  • [x] I checked to make sure that this issue has not been filed already.

1. The entire URL of the documentation with the issue

The tutorial documentation was useful but did not generalize well to EfficientDet. To write this tutorial, I drew from the following resources:

2. Describe the issue

Using the above resources, I wrote a tutorial to train EfficientDet in Google Colab with the TensorFlow 2 Object Detection API.

You can run this tutorial by changing just one line for your custom dataset import. I hope this tutorial allows newcomers to the repository to quickly get up and running with TensorFlow 2 for object detection!

In the tutorial, I cover how to:

  • Acquire Labeled Object Detection Data
  • Install TensorFlow 2 Object Detection Dependencies
  • Download Custom TensorFlow 2 Object Detection Dataset
  • Write Custom TensorFlow 2 Object Detection Training Configuration
  • Train Custom TensorFlow 2 Object Detection Model
  • Export Custom TensorFlow 2 Object Detection Weights
  • Use Trained TensorFlow 2 Object Detection For Inference on Test Images
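The configuration-writing step above can be sketched in plain Python: the notebook's general approach is to patch the downloaded pipeline `.config` text with regular expressions. The config text, class count, batch size, and checkpoint path below are hypothetical placeholders, not values taken from the notebook.

```python
import re

# Hypothetical example config text; in the notebook this would be read
# from the pipeline .config file shipped with the chosen EfficientDet model.
config_text = """
model { ssd { num_classes: 90 } }
train_config {
  batch_size: 128
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
}
"""

num_classes = 3   # number of classes in your custom dataset
batch_size = 16   # smaller batch to fit Colab GPU memory
ckpt = "/content/models/research/deploy/efficientdet_d0_coco17_tpu-32/checkpoint/ckpt-0"

# Rewrite the fields that must change for a custom dataset.
config_text = re.sub(r"num_classes: \d+", f"num_classes: {num_classes}", config_text)
config_text = re.sub(r"batch_size: \d+", f"batch_size: {batch_size}", config_text)
config_text = re.sub(r'fine_tune_checkpoint: ".*?"',
                     f'fine_tune_checkpoint: "{ckpt}"', config_text)

print(config_text)
```

The same pattern extends to the `train_input_reader`/`eval_input_reader` TFRecord and label-map paths.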

Most helpful comment

@AarnavSawant that means you ran out of VRAM on GPU. Reducing either the Batch Size or the size of images should help.

All 12 comments

Thank you very much, that was really helpful!

Hi there,
Firstly, I wanted to thank you for your notebook. It was very helpful! I tried using your notebook with my dataset, though, and am getting a RAM error in Colab. I ran all of your cells step by step and only changed the curl link to my own curl link and the corresponding train/test paths in the config file. Here is the stack trace I am getting:

```
Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main_tf2.py", line 106, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main_tf2.py", line 103, in main
    use_tpu=FLAGS.use_tpu)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py", line 622, in train_loop
    loss = _dist_train_step(train_input_iter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 611, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[16,49104,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Loss/Loss/huber_loss/Minimum (defined at /local/lib/python3.6/dist-packages/object_detection/core/losses.py:176) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Func/Loss/regularization_loss/write_summary/summary_cond/then/_20/input/_95/_364]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted:  OOM when allocating tensor with shape[16,49104,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Loss/Loss/huber_loss/Minimum (defined at /local/lib/python3.6/dist-packages/object_detection/core/losses.py:176) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored. [Op:__inference__dist_train_step_87648]

Function call stack:
_dist_train_step -> _dist_train_step
```

Could you give some insight on what could be causing this error? Is it a problem with my Colab environment or am I missing something in the notebook? Thanks!

@AarnavSawant that means you ran out of VRAM on GPU. Reducing either the Batch Size or the size of images should help.
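Concretely, both knobs live in the pipeline `.config` file. A sketch of the relevant fragments (field names follow the TF2 Object Detection API pipeline proto; the exact values here are illustrative, not recommendations):

```
train_config {
  batch_size: 4   # reduce from 16 until the OOM disappears
  ...
}

model {
  ssd {
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 384   # a smaller input resolution also cuts memory use
        max_dimension: 384
        pad_to_max_dimension: true
      }
    }
  }
}
```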

Hi @Jacobsolawetz, your tutorial is amazing, thanks a lot! I was wondering, how can I get the COCO evaluation metrics?

@tazu786 thank you - I'm still working on that! The code to "eval continuously" is commented out in there, which should give us the COCO metrics, but I didn't have luck with it at first.
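For anyone else looking: in the TF2 Object Detection API, evaluation is driven by the same `model_main_tf2.py` script - passing `--checkpoint_dir` switches it into evaluation mode, where it reports COCO detection metrics for checkpoints it finds in that directory. The paths below are hypothetical Colab paths, not ones from the notebook:

```shell
# Hypothetical paths -- adjust to wherever your config and checkpoints live.
PIPELINE_CONFIG="/content/models/research/deploy/pipeline_file.config"
MODEL_DIR="/content/training"

# With --checkpoint_dir set, model_main_tf2.py evaluates checkpoints in
# MODEL_DIR (COCO metrics) instead of training.
EVAL_CMD="python model_main_tf2.py \
  --pipeline_config_path=${PIPELINE_CONFIG} \
  --model_dir=${MODEL_DIR} \
  --checkpoint_dir=${MODEL_DIR}"

echo "${EVAL_CMD}"
```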

@Jacobsolawetz yes I have the same problem. However, the training with my custom dataset somehow works. I tried to get some qualitative results from the trained model and they are pretty decent. Anyone else luckier than us?

@Jacobsolawetz I tried to apply this suggestion https://github.com/tensorflow/models/issues/8856#issuecomment-664753607 to model_main_tf2.py, but without success. Any news?

@tazu786 no luck yet! Glad to hear the qualitative results at least look good... As soon as I see a fix, I will put it in there.

Any updates from the team on this issue with eval in training? I'm willing to take a look into the code and try to solve it, but only if this is not already being resolved by someone from the team...

@Jacobsolawetz I tried to run the Colab tutorial, however it throws the error below. Maybe they removed something, as I was able to run the demo successfully before.

```
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/initializers/__init__.py in populate_deserializable_objects()
     83   v2_objs = {}
     84   base_cls = initializers_v2.Initializer
---> 85   generic_utils.populate_dict_with_module_objects(
     86       v2_objs,
     87       [initializers_v2],

AttributeError: module 'tensorflow.python.keras.utils.generic_utils' has no attribute 'populate_dict_with_module_objects'
```

The error happens when I run this:

```python
import matplotlib
import matplotlib.pyplot as plt

import os
import random
import io
import imageio
import glob
import scipy.misc
import numpy as np
from six import BytesIO
from PIL import Image, ImageDraw, ImageFont
from IPython.display import display, Javascript
from IPython.display import Image as IPyImage

import tensorflow as tf

from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.utils import colab_utils
from object_detection.builders import model_builder
%matplotlib inline
```


Update:
I solved the problem by updating to the latest TF version (`!pip install -U --pre tensorflow_gpu`) and removing the edit to the tf_utils.py file.

Hello @Jacobsolawetz, did you finish the evaluation code for EfficientDet?

Hello, I can't get a custom EfficientDet training run to work due to memory issues. Any help on this? #9141
