Hi,
is there a way to run model evaluation during training? It seems that I currently need to stop the training process and run evaluation on the checkpoints in the train directory to get any metrics. It would be more elegant to run the evaluation automatically after every X training steps.
I just ran the eval on a separate terminal window without having to cancel the training
Thanks for the tip, I am aware that it is possible to run evaluation in a different process if there is enough memory available. My point is that periodic model evaluation during training is common practice and it would be great to see how it can be implemented in this pipeline.
Is there a particular model you have in mind? And is there a particular set of TF libraries that you're using to help you with training?
In general, might this be a better discussion for StackOverflow? For example: https://stackoverflow.com/questions/41217953/tensorflow-evaluate-while-training-with-queues
https://stackoverflow.com/questions/42407483/tensorflow-supervisor-for-both-training-and-evaluating-operations
For training I'm using the standard MobileNet v1 SSD configuration for object detection on a custom training and validation dataset.
As for evaluation during training, I was thinking of something along the lines of the training procedure at https://github.com/davidsandberg/facenet/blob/master/src/train_softmax.py
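For reference, the pattern in that facenet script boils down to calling an evaluation routine from inside the training loop every N steps. A minimal framework-agnostic sketch of that idea (the `train_step` and `evaluate` functions here are placeholders, not part of any TF pipeline):

```python
# Hypothetical sketch of periodic in-loop evaluation.
# `train_step` and `evaluate` are stand-ins for the real training
# and metric code in a given pipeline.

def train_step(step):
    # Stand-in for one optimization step; returns a dummy loss.
    return 1.0 / (step + 1)

def evaluate(step):
    # Stand-in for computing eval metrics at the current checkpoint.
    return {"step": step, "accuracy": min(0.99, step / 1000.0)}

def train(total_steps, eval_every):
    # Interleave evaluation with training every `eval_every` steps.
    history = []
    for step in range(1, total_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            history.append(evaluate(step))
    return history
```

The trade-off, as noted elsewhere in this thread, is that each in-loop evaluation pauses training, so total wall-clock training time goes up.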
Thanks for the links, it seems that the preferred approach is to run evaluation in a separate process.
@antoniosimunovic have you solved the memory problem then? Or found another solution?
Currently I'm running evaluation on a separate device in a different process, as recommended.
You'd need to modify the code to do evaluation. Many people do evaluation as separate processes because it reduces the total training time by not putting an evaluation step in the middle of the training loop, but it requires another device or enough memory to run on the same device. You might try stackoverflow to see if anyone has a patch to do this, but if you extend the model to allow this option, we'd be happy to accept it.
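To illustrate the separate-process approach, here is a small sketch of launching evaluation without blocking training. The command below is a placeholder; in practice it would be something like `["python", "eval.py", "--checkpoint_dir", ...]` pointed at the training directory (those flags are an assumption, not taken from this repo):

```python
# Sketch: run evaluation as a non-blocking child process so the
# training process keeps going. The command here is a placeholder
# that just prints a message, standing in for a real eval.py invocation.
import subprocess
import sys

def launch_eval(cmd):
    # Popen returns immediately; training can continue while eval runs.
    return subprocess.Popen(cmd, stdout=subprocess.PIPE)

proc = launch_eval([sys.executable, "-c", "print('eval finished')"])
# ... training would continue here ...
out, _ = proc.communicate()  # collect eval output when convenient
```

The caveat from above still applies: the second process needs its own device, or enough free memory to share one.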
you can run eval.py on the CPU by adding

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
How can I train the model using the GPU and evaluate it using the CPU, when train.py and eval.py are in the same PyCharm project? I have tried "with tf.device('/gpu:0')" in train.py and "with tf.device('/cpu:0')" in eval.py, but it does not work.
It seems that my environment does not support TensorFlow on CPU and GPU at the same time.
Hi There,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.