Pytorch-lightning: validation_epoch_end not logging validation_step EvalResult values

Created on 27 Aug 2020  ·  12 Comments  ·  Source: PyTorchLightning/pytorch-lightning

πŸ› Bug

When overriding validation_epoch_end, the EvalResult values from validation_step are not logged.

For my experiments I keep track of several metrics that I only log to TensorBoard at the end of each validation epoch. For most metrics I can specify EvalResult().log(on_epoch=True), but one of the metrics I can only calculate at the end of the epoch. If I use validation_epoch_end to calculate this metric, the results from validation_step are not logged.

To Reproduce

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        result = pl.EvalResult(checkpoint_on=loss)
        result.log('val/loss', loss, on_step=False, on_epoch=True)
        result.log('val/y_hat', y_hat, logger=False, on_step=False, on_epoch=False)
        return result

    def validation_epoch_end(self, outputs):
        y_hat = outputs['val/y_hat']
        example = y_hat.exp().mean()

        result = pl.EvalResult()
        result.log('val/epoch_end_metric', example)
        return result

The code above writes 'val/epoch_end_metric' to TensorBoard, but does nothing with the result from validation_step.

As a result, the checkpoint metric specified in validation_step is also lost. Warnings are printed to the screen, yet at the end of fit a message still claims the latest checkpoint is being saved (which does not actually happen).

...

  | Name | Type   | Params
--------------------------------
0 | l1   | Linear | 7 K   

Epoch 0:  50%|██████▌      | 5/10 [00:00<00:00, 549.90it/s, loss=2.454, v_num=2]
Validating: 0it [00:00, ?it/s] ---/pytorch_lightning/utilities/distributed.py:37: RuntimeWarning: The metric you returned None must be a `torch.Tensor` instance, checkpoint not saved HINT: what is the value of loss in validation_epoch_end()?
  warnings.warn(*args, **kwargs)
---/pytorch_lightning/utilities/distributed.py:37: RuntimeWarning: Can save best model only with loss available, skipping.
  warnings.warn(*args, **kwargs)

...

Epoch 4: 100%|████████████| 10/10 [00:00<00:00, 911.31it/s, loss=1.847, v_num=2]
                              Saving latest checkpoint..
Epoch 4: 100%|████████████| 10/10 [00:00<00:00, 761.27it/s, loss=1.847, v_num=2]

Expected behavior

Results from validation_step and validation_epoch_end should both be logged according to what is specified. The .log(..., logger, on_step, on_epoch) method provides enough granularity to specify how each metric should be processed for both validation methods.

I have tried returning [result, outputs] from validation_epoch_end to work around this, but the metrics in the EvalResult are assumed to be reduced to scalars when validation_epoch_end is overridden, and are therefore not handled correctly.

Environment

  • PyTorch-Lightning Version: 0.9.0
  • PyTorch Version: 1.6.0
  • OS: linux
  • How you installed PyTorch: pip
  • Python version: 3.6.9
Labels: API / design, ResultObj, bug / fix, help wanted

Most helpful comment

@williamFalcon please take a look!

All 12 comments

Hi! Thanks for your contribution, great first issue!

Any workarounds or existing solutions are most welcome.

Click for full example code

```
import os

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms

import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.TrainResult(loss)
        result.log('train/loss', loss, on_step=False, on_epoch=True)
        return result

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        result = pl.EvalResult(checkpoint_on=loss)
        result.log('val/loss', loss, on_step=False, on_epoch=True)
        result.log('val/y_hat', y_hat, logger=False, on_step=False, on_epoch=False)
        return result

    def validation_epoch_end(self, outputs):
        y_hat = outputs['val/y_hat']
        example = y_hat.exp().mean()

        result = pl.EvalResult()
        result.log('val/epoch_end_metric', example)
        return result

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)


train_loader = DataLoader(MNIST(os.getcwd(), download=True, transform=transforms.ToTensor()))
trainer = pl.Trainer(max_epochs=5, limit_train_batches=5, limit_val_batches=5)
model = LitModel()

trainer.fit(model, train_loader, train_loader)
```

Hi @gerardsn, if you want to do simple things (like .exp().mean() in your example) at the end of the epoch, you could try passing your own reduction function as reduce_fx. The reduce_fx is applied at the end of the epoch.
Here's the modified example from your comment:

import os
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms

import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.TrainResult(loss)
        result.log('train/loss', loss, on_step=False, on_epoch=True)
        return result

    def exp_mean(self, y_hat):
        return torch.exp(torch.mean(y_hat))

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        result = pl.EvalResult(checkpoint_on=loss)
        result.log('val/loss', loss, on_step=False, on_epoch=True, prog_bar=True)
        result.log('val/epoch_end_metric', y_hat, on_step=False, on_epoch=True, prog_bar=True, reduce_fx=self.exp_mean)
        return result

    # def validation_epoch_end(self, outputs):
    #     y_hat = outputs['val/y_hat']
    #     example = y_hat.exp().mean()

    #     result = pl.EvalResult()
    #     result.log('val/epoch_end_metric', example)
    #     return result

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)


train_loader = DataLoader(MNIST(os.getcwd(), download=True, transform=transforms.ToTensor()))
trainer = pl.Trainer(max_epochs=5, limit_train_batches=5, limit_val_batches=5)
model = LitModel()

trainer.fit(model, train_loader, train_loader)

@gerardsn If you implement validation_epoch_end, it is always assumed that you want to do your own reduction, so it will pass all results in there before logging. I'm not sure, @williamFalcon is it a bug or is it expected behaviour?

@awaelchli I think this is expected behaviour, but it is not very user-friendly. I am trying to log 5 metrics in validation_step, and only 1 in validation_epoch_end. In the current setup I would have to loop over the metrics again at the end of each epoch and reduce them all myself. Since the input to validation_epoch_end is essentially an EvalResult, it would be nice to just keep logging metrics onto it and have them reduced behind the scenes.
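Concretely, that manual reduction would look something like the sketch below (a rough sketch only: it assumes the gathered outputs expose the stacked per-step values by key, as in the reproduction above, and the extra metric names are placeholders):

```
def validation_epoch_end(self, outputs):
    # once this hook is overridden, every step-level metric has to be reduced by hand
    result = pl.EvalResult(checkpoint_on=outputs['val/loss'].mean())
    for name in ['val/loss', 'val/metric_2', 'val/metric_3', 'val/metric_4', 'val/metric_5']:
        result.log(name, outputs[name].mean())
    # the one metric that genuinely needs the full epoch
    result.log('val/epoch_end_metric', outputs['val/y_hat'].exp().mean())
    return result
```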

you don't need to do this...

log step only

log_dict(the four metrics, on_step=True, on_epoch=False)

log epoch only

log(5th metric, on_step=False, on_epoch=True)

fyi... the default in validation_step is on_step=False, on_epoch=True
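In EvalResult terms, that suggestion would look roughly like this (a minimal sketch; the metric names and values are placeholders, and it assumes log_dict is available on EvalResult in 0.9, as the comment implies):

```
def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self(x)
    loss = F.cross_entropy(y_hat, y)

    result = pl.EvalResult(checkpoint_on=loss)
    # the four step-level metrics: logged every step, never reduced per epoch
    result.log_dict({'val/m1': loss, 'val/m2': loss, 'val/m3': loss, 'val/m4': loss},
                    on_step=True, on_epoch=False)
    # the fifth metric: reduced (mean by default) and logged once per epoch
    result.log('val/m5', loss, on_step=False, on_epoch=True)
    return result
```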

@williamFalcon @awaelchli I think this would be a good solution in many cases and it is one of the options I considered. Unfortunately it didn't work well for me. One of the metrics I'm tracking is the Dice score, and since I work with 3D medical data I can only process a very small batch on each step. With such a small batch the metric value varies significantly from step to step, making it difficult to interpret meaningfully.
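As a hypothetical illustration of why per-step logging does not work well here (not from the original thread): the mean of per-batch Dice scores is not the same as the Dice score over the whole epoch, and with very small batches the per-batch values swing wildly:

```
import torch

def dice(pred, target, eps=1e-6):
    # soft Dice over whatever voxels are passed in
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

# two tiny "batches": the first has almost no foreground, the second is all foreground
preds   = [torch.tensor([0., 0., 1.]), torch.tensor([1., 1., 1., 1.])]
targets = [torch.tensor([0., 0., 0.]), torch.tensor([1., 1., 1., 1.])]

per_step = torch.stack([dice(p, t) for p, t in zip(preds, targets)])
print(per_step.mean())                             # mean of per-step Dice: ~0.50
print(dice(torch.cat(preds), torch.cat(targets)))  # Dice over the full epoch: ~0.89
```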

I have currently patched the issue by adding

if using_eval_result:
    eval_results_step = self.__auto_reduce_result_objs(outputs)

at the start of __run_eval_epoch_end() in the evaluation_loop, and replacing the return statement `return eval_results` with `return eval_results + eval_results_step`. This isn't ideal, since the callback_metrics are sensitive to the order in which the results are returned, but it works for my case.

Digging through this part of the loop, it doesn't seem too difficult to pass the result from validation_step to the validation_epoch_end call and keep logging to the same Result. Replacing the default reduce_fx=torch.mean in the on_validation_epoch_end callback with no reduction should prevent issues when trying to reduce scalar results added at the end of the epoch. However, I don't know how this affects other parts of the loop, or whether it can easily be extended to validation_step_end, but I think it would be a valuable change that increases user-friendliness and makes usage a bit more intuitive.

happy to think about this after the refactors this week!

I have submitted a PR with minimal code changes that achieves behaviour similar to what is described below. However, the PR proposes an oversimplified solution that may cause other issues. As I have only addressed the issue in the evaluation loop, it makes the behaviour of the training and evaluation loops diverge.

@williamFalcon I've dug a bit deeper and I think it is possible to add this behaviour without creating breaking changes through the following proposals.

  1. Add an option for setting default reduce_fx and tbptt_reduce_fx functions on Result(). This provides some more flexibility for users and allows changing the default on epoch end from torch.mean to nothing (I am not sure whether 'nothing' should be a no_op / Identity function, or whether the reduction method should skip reduction when the reduction function is set to None; see the sketch after this list).

  2. The first few lines of Result.reduce_on_epoch_end() can be replaced by Result.gather(). I propose to take these lines out and assume the input to reduce_on_epoch_end() is the output of gather(). By separating the two methods it is possible to gather all outputs into a single Result object before the call to validation_epoch_end(). (Another option is to check whether the input to reduce_on_epoch_end() is a list (the currently expected input is a list of Results) to keep the current behavior, and apply the proposed behavior if the input is a Result object.) This object is then passed into validation_epoch_end(), where other results can be logged onto it, and everything is reduced at the end. Required changes for this are:

    • Update batch_sizes meta data in Result.gather() such that it is available for reduction.
    • Remove "gather lines" from Result.reduce_on_epoch_end() and change expected input to be a Result object. (Or determine behavior depending on input)
    • In the training_loop's __auto_reduce_results_on_epoch_end, add a call to gather() and pass the results to the reduce_on_epoch_end() call. (Optional if the latter's behaviour is input dependent.)
    • The validation_loop needs several changes:

      • In __run_eval_epoch_end:

      • remove all __gather_epoch_end_eval_results() calls and call it once at the start (if using_eval_result) to produce a list of gathered results per dataloader.

      • change the default reduce_fx and tbptt_reduce_fx for new log entries to no reduction.

      • Reduce all results after call to overridden epoch_end methods.

      • Simplify __gather_epoch_end_eval_results() to remove redundant parts.

      • Update __auto_reduce_result_objs() to reflect changed behavior of Result.reduce_on_epoch_end().

        (Ideally I think the checkpoint_on and early_stop_on reduction should also move to Result.reduce_on_epoch_end(), with the reduction function changed to a proper batch-weighted mean, but I am not sure how this affects other areas, or how to handle the case where these values are set at epoch end (since a batch-weighted mean will fail in that case).)

  3. (Not required for the issue at hand.) Is the Result object's meta.{name}.value used anywhere? Currently Result.gather() and Result.reduce_on_epoch_end() do not update this value and retain only the value present in the first batch of the epoch. This can be very confusing. If this key is not used, I propose getting rid of it. (It is updated in Result.padded_gather(), but I am not sure if it is actually used anywhere.)
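A minimal sketch of the two variants mentioned in proposal 1 (illustrative only; this is not the actual Result API, and the names are hypothetical):

```
import torch

def no_op_reduce(values):
    # option a: an identity "reduction" that passes the gathered values through unchanged
    return values

def reduce_on_epoch_end(gathered, reduce_fx=torch.mean):
    # option b: treat reduce_fx=None as "skip reduction entirely"
    if reduce_fx is None:
        return gathered
    return reduce_fx(gathered)
```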

The majority of the proposed changes are for the Result object, and none of the suggestions creates breaking changes as far as I can tell. It simplifies the overall code a bit, adds functionality, and shouldn't add overhead, since I am still performing the same operations on the data, albeit in a slightly different order. However, I have only tested this on a single GPU, so I do not know if it works in distributed setups, and I have not checked if this works for overriding training_epoch_end. Let me know what you think.

@williamFalcon please take a look!

@williamFalcon please see if this can be closed

yes, please use the new API which fixes this!
