Models: Evaluation during training

Created on 7 Jul 2017 · 10 Comments · Source: tensorflow/models

Hi,

Is there a way to run model evaluation during training? It seems that I need to stop the training process and run evaluation on the checkpoints in the train directory to get any metrics. It would be more elegant to run evaluation automatically after every X training steps.

help wanted feature

All 10 comments

I just ran the eval in a separate terminal window, without having to cancel the training.

Thanks for the tip. I am aware that it is possible to run evaluation in a different process if there is enough memory available. My point is that periodic model evaluation during training is common practice, and it would be great to see how it can be implemented in this pipeline.

Is there a particular model you have in mind? And is there a particular set of TF libraries that you're using to help you with training?

In general, might this be a better discussion for StackOverflow? For example: https://stackoverflow.com/questions/41217953/tensorflow-evaluate-while-training-with-queues
https://stackoverflow.com/questions/42407483/tensorflow-supervisor-for-both-training-and-evaluating-operations

For training I'm using the standard SSD MobileNet v1 configuration for object detection, with a custom training and validation dataset.

As for evaluation during training, I was thinking of something along the lines of the training procedure in https://github.com/davidsandberg/facenet/blob/master/src/train_softmax.py

Thanks for the links. It seems that the preferred approach is to run evaluation in a separate process.
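
For readers who want to try the separate-process approach, here is a minimal sketch of an evaluator that polls the train directory and evaluates each new checkpoint once. It is an illustration under assumptions, not code from this repository: the `model.ckpt-<step>.index` naming assumes the default V2 checkpoint format and a `model.ckpt` prefix, and `evaluate` is a hypothetical callable standing in for whatever restores the checkpoint and computes metrics. In real code, `tf.train.latest_checkpoint(train_dir)` is the usual way to find the newest checkpoint.

```python
import os
import re
import time

# Assumption: checkpoints are saved with the prefix "model.ckpt" in the V2
# format, which writes one index file per checkpoint ("model.ckpt-2000.index").
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_step(train_dir):
    """Return the highest global step that has a checkpoint in train_dir, or None."""
    steps = []
    for name in os.listdir(train_dir):
        match = _CKPT_INDEX.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


def watch_and_evaluate(train_dir, evaluate, poll_seconds=60, max_polls=None):
    """Poll train_dir and call evaluate(step) once per newly seen checkpoint.

    `evaluate` is a hypothetical callable supplied by the caller. With
    max_polls=None the loop runs until the process is stopped.
    """
    last_seen = None
    polls = 0
    while max_polls is None or polls < max_polls:
        step = latest_checkpoint_step(train_dir)
        if step is not None and step != last_seen:
            evaluate(step)
            last_seen = step
        polls += 1
        time.sleep(poll_seconds)
```

Because the evaluator only reads files that the trainer writes, the two processes need no other coordination; the trade-off is that metrics lag behind training by up to one polling interval.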

@antoniosimunovic have you solved the memory problem then? Or found another solution?

Currently I'm running evaluation on a separate device in a different process, as recommended.

You'd need to modify the code to do evaluation during training. Many people run evaluation as a separate process because it reduces the total training time by not putting an evaluation step in the middle of the training loop, but it requires another device or enough memory to run on the same device. You might try Stack Overflow to see if anyone has a patch to do this, but if you extend the model to allow this option, we'd be happy to accept it.
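
To make the trade-off concrete, here is a rough sketch of what evaluation inside the training loop looks like. It is not the object detection pipeline's code: `train_step` and `evaluate` are hypothetical callables standing in for one optimizer step and one pass over the validation set.

```python
def train_with_periodic_eval(train_step, evaluate, num_steps, eval_every):
    """Run num_steps training steps, evaluating after every eval_every steps.

    train_step(step) and evaluate(step) are hypothetical callables supplied
    by the caller. Returns a list of (step, metric) pairs.
    """
    if eval_every <= 0:
        raise ValueError("eval_every must be positive")
    history = []
    for step in range(1, num_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            # Training is paused while evaluation runs; this pause is the
            # extra wall-clock cost that a separate eval process avoids.
            history.append((step, evaluate(step)))
    return history
```

With `num_steps=10` and `eval_every=4`, evaluation runs after steps 4 and 8 only.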

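You can run eval.py on the CPU by adding the following at the top of the script, before TensorFlow is imported:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # hide all GPUs from this process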
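
If you would rather not edit eval.py, one alternative is to launch it as a child process with the same variable set in its environment. The sketch below is only an illustration: `cpu_only_env` and `run_on_cpu` are hypothetical helper names, and the arguments that eval.py needs are left for you to fill in.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all GPUs hidden from CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_on_cpu(script, script_args=()):
    """Run a Python script in a child process that cannot see any GPU.

    Returns the child's exit code. The parent's environment is not changed,
    so a training process started from the same shell keeps its GPU.
    """
    cmd = [sys.executable, script] + list(script_args)
    return subprocess.run(cmd, env=cpu_only_env()).returncode
```

For example, `run_on_cpu("eval.py", your_eval_args)` runs evaluation on the CPU, where `your_eval_args` is a placeholder for the list of flags you normally pass.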

How can I train the model on the GPU and evaluate it on the CPU when train.py and eval.py are in the same PyCharm project? I have tried "with tf.device('/gpu:0')" in train.py and "with tf.device('/cpu:0')" in eval.py, but it does not work.
It seems that my environment does not support tensorflow-CPU and tensorflow-GPU at the same time.

Hi there,
We are checking to see if you still need help with this, as it seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help with this issue any more, please consider closing it.
