My use case involves _dockerizing_ a training script and launching many instances of the image on a GPU cluster, varying the hyperparameters for model optimization. Currently I collect the results by writing them to a mounted drive, but it would be nice to have an optional parameter for passing a callback that writes the training logs and results to a stream or database connection of my choice.
There are other ML libraries that include such a feature.
Sounds interesting - could you point us to some examples?
trainer.fit(
    train_data=train_dataloader,
    test_data=test_dataloader,
    epochs=MAX_EPOCHS,
    callbacks=[Monitor()],  # send the metrics to the monitor
    verbose=0)
From there, I imagine it's a matter of redirecting whatever is currently written to buffers (stdout or otherwise) to the write method of the callback class.
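To make the idea concrete, here is a rough sketch of how such a callback could work. It is not this library's API: the `fit` function is a toy stand-in for a real training loop, and the `write` method name, the `Monitor` constructor argument, and the record layout are all assumptions made for illustration.

```python
import io
import json


class Monitor:
    """Hypothetical callback: forwards each metrics record to a writable stream."""

    def __init__(self, stream):
        # Any object with a .write(str) method works: a file, a socket wrapper, etc.
        self.stream = stream

    def write(self, record):
        # One JSON object per line so consumers can parse the log incrementally.
        self.stream.write(json.dumps(record) + "\n")


def fit(train_step, epochs, callbacks=()):
    """Toy training loop (an assumption, not the real trainer).

    Calls train_step once per epoch and hands the metrics to every callback
    instead of printing them to stdout.
    """
    for epoch in range(1, epochs + 1):
        loss = train_step(epoch)
        record = {"epoch": epoch, "loss": loss}
        for callback in callbacks:
            callback.write(record)


# Usage: collect the metrics in an in-memory buffer instead of a mounted drive.
buffer = io.StringIO()
fit(lambda epoch: 1.0 / epoch, epochs=3, callbacks=[Monitor(buffer)])
```

Swapping the `io.StringIO` buffer for a file handle or a thin wrapper around a database connection would leave the training loop itself unchanged, which is the main appeal of the callback approach.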
Yes, good idea. I'll add a feature tag to this, but cannot say when we can get around to implementing this. We want to refactor the training method first to make it more extensible so at the same time we might be able to work in a callback mechanism like you propose!
@alanakbik I'm also interested in this feature. I assume that the recent refactorings include the changes you were referring to.
I would be happy to implement a callback mechanism. Please let me know if help is needed.