In training_step and validation_step I am logging the losses (train_loss and val_loss) and the metrics (train_mrr and val_mrr), both to the logger and to the progress bar:
def training_step(self, batch, batch_idx):
    x1, x2 = batch["x1"], batch["x2"]
    r1, r2 = self(x1, x2)
    train_loss = self.loss_fn(r1, r2)
    train_mrr = self.mrr(r1, r2)
    result = TrainResult(minimize=train_loss)
    result.log('train_loss', train_loss, prog_bar=True)
    result.log('train_mrr', train_mrr, prog_bar=True)
    return result

def validation_step(self, batch, batch_idx):
    x1, x2 = batch["x1"], batch["x2"]
    r1, r2 = self(x1, x2)
    val_loss = self.loss_fn(r1, r2)
    val_mrr = self.mrr(r1, r2)
    result = EvalResult(checkpoint_on=val_loss)
    # logging
    result.log('val_loss', val_loss, prog_bar=True)
    result.log('val_mrr', val_mrr, prog_bar=True)
    return result
However, the progress bar also shows a loss whose value differs from the losses mentioned above.
Epoch 1: 69%|██████▊ | 49804/72642 [3:55:49<1:48:08, 3.52it/s, loss=0.532, v_num=1, train_loss=0.255, train_mrr=0.927, val_loss=0.518, val_mrr=0.891]
So, is the loss printed in the progress bar the current batch loss, while train_loss is actually the mean reduction over the past train_loss values?
The loss printed in the logger and in the progress bar is going to be different because of row_log_interval.
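For reference, this is roughly where that interval is set (a sketch against the older Trainer API this issue uses, where the argument was called row_log_interval; newer releases renamed it to log_every_n_steps):

    import pytorch_lightning as pl

    # Metrics are written to the logger only every `row_log_interval` training
    # steps, while the progress bar is refreshed on its own schedule, so the
    # two displayed values can come from different batches.
    trainer = pl.Trainer(
        max_epochs=10,
        row_log_interval=50,
    )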
loss and train_loss should be the same batch loss since you don't pass on_epoch=True, but I'm not sure why they differ.
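If the goal were an epoch-level train_loss instead of the per-batch value, something like the following should do it (a sketch using the same old Result API as above; I'm assuming its log() accepts the on_step/on_epoch flags described in the docs):

    def training_step(self, batch, batch_idx):
        x1, x2 = batch["x1"], batch["x2"]
        r1, r2 = self(x1, x2)
        train_loss = self.loss_fn(r1, r2)
        result = TrainResult(minimize=train_loss)
        # on_epoch=True accumulates the per-batch values and logs their mean
        # at the end of the epoch instead of the raw batch loss.
        result.log('train_loss', train_loss, prog_bar=True,
                   on_step=False, on_epoch=True)
        return result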
They are different because the progress bar loss is a smoothed value over a window (of length 20). From the docs (bottom):
The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.
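To make the smoothing concrete: the progress-bar loss behaves roughly like a running mean over the last 20 batch losses (a plain-Python sketch of the idea, not Lightning's actual implementation):

    from collections import deque

    window = deque(maxlen=20)  # keep only the 20 most recent batch losses

    def progress_bar_loss(batch_loss):
        """Smoothed value shown next to `loss` in the progress bar."""
        window.append(batch_loss)
        return sum(window) / len(window)

    # The smoothed value lags behind the latest batch loss:
    for step, batch_loss in enumerate([0.9, 0.7, 0.5, 0.3, 0.255]):
        print(step, batch_loss, round(progress_bar_loss(batch_loss), 3))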
Closing as resolved; feel free to reopen if needed.