Pytorch-lightning: Why is there no training_epoch_end?

Created on 6 Mar 2020 · 9 comments · Source: PyTorchLightning/pytorch-lightning

🚀 Feature

If I want to calculate and log average statistics for the training epoch, there seems to be no option to define a training_epoch_end in the LightningModule, as there is for validation_epoch_end and test_epoch_end.

Motivation

Having this function seems very intuitive. I know the on_epoch_end hook exists, but the "outputs" object with the training history for that epoch is not available there.

Pitch

The same behavior as validation_epoch_end and test_epoch_end, but for training.

Sorry if something like this already exists; I just started using PL (the master version).
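For concreteness, the requested hook could aggregate per-batch outputs like this. This is a minimal pure-Python sketch: in Lightning the hook would be a method on the LightningModule, the outputs would contain torch.Tensor values, and the 'avg_train_loss' key is a hypothetical metric name.

```python
def training_epoch_end(outputs):
    """Sketch of the proposed hook.

    `outputs` is a list of dicts, one per training_step call in the
    epoch; here each dict is assumed to carry a scalar 'loss'.
    """
    avg_loss = sum(o["loss"] for o in outputs) / len(outputs)
    # Mirror the validation_epoch_end convention: return a 'log' dict
    # for the logger to pick up.
    return {"log": {"avg_train_loss": avg_loss}}
```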

Labels: enhancement, help wanted, let's do it!

Most helpful comment

It is now here thanks to @jbschiratti ! #1357

All 9 comments

Hi! Thanks for your contribution! Great first issue!

We didn't get around to it for this release, but feel free to PR it!
We do need it.

Do you think more people would want a list of every full batch output (so the results of each training_step / training_step_end if implemented) or the accumulated batch outputs?

Do you think more people would want a list of every full batch output (so the results of each training_step / training_step_end if implemented) or the accumulated batch outputs?

I think it could work the same way as the others: in my understanding they return a list of dicts, with each dict corresponding to the output of one batch (the return of training_step, or of training_step_end if it exists).

The only gotcha we need to watch out for is that all of the collected outputs must be detached so they don't keep the gradient graphs in memory. I would suggest writing a method that recursively traverses the output dictionary and creates a new one with the same elements but all detached; then we can apply it to each output before collecting it. We will also need some good tests to make sure there aren't any leaks :)

The message here suggests using training_epoch_end; however, it is not called...

Is training_epoch_end available in the latest release (0.7.2-dev)? The docs seem to suggest so, but I am not able to get it to work (I tried logging by returning a 'log' dict).

The message here suggests using training_epoch_end; however, it is not called...

Yes, this is quite confusing... And training_epoch_end is referred to here in the documentation as well.

It is now here thanks to @jbschiratti ! #1357

