When using the new structured Result API, it is no longer possible to force PL to report the epoch to loggers as the step value instead of the global step (i.e. the elapsed number of training steps).
This is confusing when viewing metrics in a tool that uses the step count as the x-axis by default (e.g. TensorBoard's scalars view). Intuitively, one would expect .log(..., on_epoch=True, on_step=False) to count epochs, not steps.
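To make the setup concrete, here is a minimal sketch of the kind of logging call being described, assuming the 0.9-era TrainResult API (compute_loss is a hypothetical helper):

```python
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper

        # One might expect the epoch-aggregated value to be plotted against
        # epochs, but the logger still receives the global step as the x-axis.
        result = pl.TrainResult(minimize=loss)
        result.log("train_loss", loss, on_step=False, on_epoch=True)
        return result
```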
It is possible to obtain this behaviour by overriding both training_epoch_end _and_ validation_epoch_end and returning an EvalResult with a "step" metric. Unfortunately, this adds back a fair amount of boilerplate and loses many of the nice metric aggregation features PL offers when *_epoch_end is not implemented.
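A rough sketch of that workaround for the validation loop, again assuming the 0.9-era EvalResult API (the exact hook contract varied across 0.9.x releases, so treat this as illustrative; compute_loss is a hypothetical helper):

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        return self.compute_loss(batch)  # hypothetical helper

    def validation_epoch_end(self, outputs):
        # Aggregate manually and log a 'step' metric so the logger uses the
        # epoch index as the x-axis instead of the global step. Overriding
        # this hook gives up the automatic epoch-level aggregation PL would
        # otherwise perform.
        avg_loss = torch.stack(outputs).mean()
        result = pl.EvalResult(checkpoint_on=avg_loss)
        result.log("val_loss", avg_loss)
        result.log("step", self.current_epoch)
        return result
```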
Either (a) allow overriding the step for a given result, or (b) default that step to current_epoch when on_epoch=True and on_step=False.
Option (a) could look like an EvalResult with .log('step', self.current_epoch).
Original discussion: https://forums.pytorchlightning.ai/t/is-there-a-way-to-only-log-on-epoch-end-using-the-new-result-apis/74/3
Hi! Thanks for your contribution, great first issue!
Any ideas for the case where on_epoch=True and on_step=True?
Edit:
Looking through the code, it would be simple enough to make on_epoch=True log using epochs, but that wouldn't let it keep using steps when on_step is also True. Perhaps they should be made mutually exclusive? I can't think of a situation where I would want to log both on the same graph, even when both use steps for the x-axis.
Edit 2:
Never mind, they are logged to two different keys, so there should be no conflict. It would be a simple one-line change.
Is this behavior really desired? When graphing, don't you want everything to be on the same scale?
If this change happens, metrics logged per epoch won't be visually comparable with metrics logged per step,
which makes it really hard to compare apples to apples.
I don't think we should make this change.
@PyTorchLightning/core-contributors ?
As per the Discourse discussion, logs could be epoch-only xor step-only, so everything would still be on the same scale (modulo whatever ephemeral logs show up on the progress bar). However, it's currently not possible to log _everything_ on epoch without overriding *_epoch_end.
This changed with the new API for logging. Please upgrade to 1.0.2. Feel free to reopen if needed.
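For reference, a minimal sketch of the 1.0-style self.log call that replaces the Result objects; with it, a metric can be aggregated and logged per epoch without overriding *_epoch_end (compute_loss is a hypothetical helper):

```python
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper

        # self.log handles the epoch-level reduction; no validation_epoch_end
        # override is needed just to get an epoch aggregate.
        self.log("val_loss", loss, on_step=False, on_epoch=True)
        return loss
```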