In TF1.x we were able to run evaluation automatically during a training on a single GPU. Is it possible to achieve the same?
@turowicz
Can you please elaborate on the issue and the context? Thanks!
@ravikyram object_detection/model_main.py runs both training and evaluation as a single command, so you can leave it running over the weekend and come back on Monday to see the results. In contrast, model_main_tf2.py requires us to stop the training, run the evaluation manually and then restart the training, which makes it impossible to leave the process unattended for long periods of time.
@turowicz As a temporary workaround, you can run a second evaluation-only process in parallel via (in a separate shell):
CUDA_VISIBLE_DEVICES=-1 python object_detection/model_main_tf2.py --checkpoint_dir <same path as model_dir> --model_dir <the model_dir you passed in the training process> --pipeline_config_path <path to the pipeline.config file you're training with>
This will run on the CPU only, monitor checkpoint_dir (by default for up to 1 hour after the last checkpoint; have a look at all the flags of model_main_tf2.py) and run an evaluation every time a new checkpoint is generated. The benefit of this over the V1 approach is that training never stops. Of course, if you need to run the evaluation on the GPU, this will require further tweaking (especially if you only have one GPU and/or don't want to permanently dedicate a device to evaluation).
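For reference, a minimal sketch of the two parallel invocations; the paths are placeholders you would replace with your own, and --eval_timeout is only shown to make the default waiting time explicit:

# Shell 1: training (uses the GPU)
python object_detection/model_main_tf2.py \
  --model_dir /path/to/model_dir \
  --pipeline_config_path /path/to/pipeline.config

# Shell 2: continuous evaluation on CPU, watching the same model_dir
CUDA_VISIBLE_DEVICES=-1 python object_detection/model_main_tf2.py \
  --model_dir /path/to/model_dir \
  --checkpoint_dir /path/to/model_dir \
  --pipeline_config_path /path/to/pipeline.config \
  --eval_timeout 3600   # seconds to keep waiting for a new checkpoint (the default)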
Hey @GPhilo !
I've been searching for some information on this topic for some time.
Is this the only possible way to evaluate my model during training?
If so, can I change how often a checkpoint is generated so my model is evaluated more often?
@ItsMeTheBee
Is this the only possible way to evaluate my model during training?
Definitely not (you could implement your own evaluation loop, for example), but it's by far the most practical.
If so, can I change how often a checkpoint is generated so my model is evaluated more often?
Yep, there are flags you can pass to model_main_tf2 to configure how often checkpoints are created. IIRC there's also a setting on the evaluation side for the minimum time between evaluations, so make sure you don't generate checkpoints more often than that (or remember to lower the minimum time; I seem to remember the default is 10 minutes, which is already quite low for all the object detection models I can think of at the moment).
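As an illustration, something along these lines should make checkpoints (and therefore evaluations) more frequent; the step count is just an example value and the paths are placeholders:

# Training process: write a checkpoint every 500 steps instead of the default 1000
python object_detection/model_main_tf2.py \
  --model_dir /path/to/model_dir \
  --pipeline_config_path /path/to/pipeline.config \
  --checkpoint_every_n 500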
Thanks @GPhilo this should do it.