When using the new structured Result API, it is no longer possible to force PL to report the epoch to loggers as the step value instead of the global step (i.e. the elapsed number of training steps).
This is confusing when viewing metrics in a tool that uses the step count as the x-axis by default (e.g. TensorBoard's scalars view). Intuitively, one would expect .log(..., on_epoch=True, on_step=False) to count epochs, not steps.
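To make the setup concrete, here is a minimal sketch of the kind of logging call being described, assuming the 0.9-era TrainResult API (compute_loss is a hypothetical helper):

```python
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper

        # One might expect the epoch-aggregated value to be plotted against
        # epochs, but the logger still receives the global step as the x-axis.
        result = pl.TrainResult(minimize=loss)
        result.log("train_loss", loss, on_step=False, on_epoch=True)
        return result
```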
It is possible to obtain this behaviour by overriding both training_epoch_end _and_ validation_epoch_end and returning an EvalResult with a "step" metric. Unfortunately, this adds back a fair amount of boilerplate and loses many of the nice metric aggregation features PL offers when *_epoch_end is not implemented.
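A rough sketch of that workaround for the validation loop, again assuming the 0.9-era EvalResult API (the exact hook contract varied across 0.9.x releases, so treat this as illustrative; compute_loss is a hypothetical helper):

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        return self.compute_loss(batch)  # hypothetical helper

    def validation_epoch_end(self, outputs):
        # Aggregate manually and log a 'step' metric so the logger uses the
        # epoch index as the x-axis instead of the global step. Overriding
        # this hook gives up the automatic epoch-level aggregation PL would
        # otherwise perform.
        avg_loss = torch.stack(outputs).mean()
        result = pl.EvalResult(checkpoint_on=avg_loss)
        result.log("val_loss", avg_loss)
        result.log("step", self.current_epoch)
        return result
```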
Either (a) allow overriding the step for a given result, or (b) default that step to current_epoch when on_epoch=True and on_step=False.
Option (a) could look like an EvalResult with .log('step', self.current_epoch).
Original discussion: https://forums.pytorchlightning.ai/t/is-there-a-way-to-only-log-on-epoch-end-using-the-new-result-apis/74/3
Hi! Thanks for your contribution, great first issue!
Any ideas for the case where on_epoch=True and on_step=True?
Edit:
Looking through the code, it would be simple enough to make on_epoch=True log using epochs, but that wouldn't let it keep using steps when on_step is also True. Perhaps they should be made mutually exclusive? I can't think of a situation where I would want to log both on the same graph, even when both use steps for the x-axis.
Edit 2:
Never mind, they are logged to two different keys, so there should be no conflict. It would be a simple one-line change.
Is this behavior really desired? When graphing, don't you want everything to be on the same scale?
If this change happens, metrics logged per epoch won't be visually comparable with metrics logged per step,
which makes it really hard to compare apples to apples.
I don't think we should make this change.
@PyTorchLightning/core-contributors ?
As per the Discourse discussion, logs could be epoch-only xor step-only, so everything would still be on the same scale (modulo whatever ephemeral logs show up on the progress bar). However, it's currently not possible to log _everything_ on epoch without overriding *_epoch_end.
This changed with the new API for logging. Please upgrade to 1.0.2. Feel free to reopen if needed.
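For reference, a minimal sketch of the 1.0-style self.log call that replaces the Result objects; with it, a metric can be aggregated and logged per epoch without overriding *_epoch_end (compute_loss is a hypothetical helper):

```python
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper

        # self.log handles the epoch-level reduction; no validation_epoch_end
        # override is needed just to get an epoch aggregate.
        self.log("val_loss", loss, on_step=False, on_epoch=True)
        return loss
```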