Pytorch-lightning: Why is there no training_epoch_end?

Created on 6 Mar 2020 · 9 comments · Source: PyTorchLightning/pytorch-lightning

🚀 Feature

If I want to calculate and log average statistics for the training epoch, there seems to be no option to define a training_epoch_end in the LightningModule, as there is for validation_epoch_end and test_epoch_end.

Motivation

Having this function seems very intuitive. I know the on_epoch_end hook exists, but the "outputs" object with the training history for that epoch is not available there.

Pitch

The same behavior as validation_epoch_end and test_epoch_end, but for training.

Sorry if something like this already exists; I just started using PL (the master version).
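For concreteness, the requested hook could aggregate per-batch outputs like this. This is a minimal pure-Python sketch: in Lightning the hook would be a method on the LightningModule, the outputs would contain torch.Tensor values, and the 'avg_train_loss' key is a hypothetical metric name.

```python
def training_epoch_end(outputs):
    """Sketch of the proposed hook.

    `outputs` is a list of dicts, one per training_step call in the
    epoch; here each dict is assumed to carry a scalar 'loss'.
    """
    avg_loss = sum(o["loss"] for o in outputs) / len(outputs)
    # Mirror the validation_epoch_end convention: return a 'log' dict
    # for the logger to pick up.
    return {"log": {"avg_train_loss": avg_loss}}
```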

Labels: enhancement, help wanted, let's do it!

Most helpful comment

It is now here thanks to @jbschiratti ! #1357

All 9 comments

Hi! Thanks for your contribution! Great first issue!

We didn't get around to it for this release, but feel free to PR it!
We do need it.

Do you think more people would want a list of every full batch output (so the results of each training_step / training_step_end if implemented) or the accumulated batch outputs?

Do you think more people would want a list of every full batch output (so the results of each training_step / training_step_end if implemented) or the accumulated batch outputs?

I think it could work the same way as the others: in my understanding they return a list of dicts, with each dict corresponding to the output of one batch (the return of training_step, or of training_step_end if it exists).

The only gotcha we need to watch out for is that all of the collected outputs must be detached so they don't keep the gradient graphs in memory. I would suggest writing a method that recursively traverses the output dictionary and creates a new one with the same elements but all detached; then we can apply it to each output before collecting it. We will also need some good tests to make sure there aren't any leaks :)

The message here suggests using training_epoch_end; however, it is not called...

Is training_epoch_end available in the latest release (0.7.2-dev)? The docs seem to suggest so, but I am not able to get it to work (I tried logging by returning a 'log' dict).

The message here suggests using training_epoch_end; however, it is not called...

Yes, this is quite confusing... And training_epoch_end is referred to here in the documentation as well.

It is now here thanks to @jbschiratti ! #1357

