In training_step and validation_step I am logging the losses (train_loss and val_loss) and the metrics (train_mrr and val_mrr), both to the logger and to the progress bar:
def training_step(self, batch, batch_idx):
    x1, x2 = batch["x1"], batch["x2"]
    r1, r2 = self(x1, x2)
    train_loss = self.loss_fn(r1, r2)
    train_mrr = self.mrr(r1, r2)
    result = TrainResult(minimize=train_loss)
    result.log('train_loss', train_loss, prog_bar=True)
    result.log('train_mrr', train_mrr, prog_bar=True)
    return result

def validation_step(self, batch, batch_idx):
    x1, x2 = batch["x1"], batch["x2"]
    r1, r2 = self(x1, x2)
    val_loss = self.loss_fn(r1, r2)
    val_mrr = self.mrr(r1, r2)
    result = EvalResult(checkpoint_on=val_loss)
    # logging
    result.log('val_loss', val_loss, prog_bar=True)
    result.log('val_mrr', val_mrr, prog_bar=True)
    return result
However, the progress bar also shows a loss whose value differs from the losses mentioned above.
Epoch 1: 69%|██████▊ | 49804/72642 [3:55:49<1:48:08, 3.52it/s, loss=0.532, v_num=1, train_loss=0.255, train_mrr=0.927, val_loss=0.518, val_mrr=0.891]
So, is the loss printed in the progress bar the current batch loss, while train_loss is actually the mean reduction over the past train_loss values?
The loss printed in the logger and in the progress bar is going to be different because of row_log_interval.
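For reference, this is roughly where that interval is set (a sketch against the older Trainer API this issue uses, where the argument was called row_log_interval; newer releases renamed it to log_every_n_steps):

    import pytorch_lightning as pl

    # Metrics are written to the logger only every `row_log_interval` training
    # steps, while the progress bar is refreshed on its own schedule, so the
    # two displayed values can come from different batches.
    trainer = pl.Trainer(
        max_epochs=10,
        row_log_interval=50,
    )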
loss and train_loss should be the same batch loss since you don't pass on_epoch=True, but I'm not sure why they differ.
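If the goal were an epoch-level train_loss instead of the per-batch value, something like the following should do it (a sketch using the same old Result API as above; I'm assuming its log() accepts the on_step/on_epoch flags described in the docs):

    def training_step(self, batch, batch_idx):
        x1, x2 = batch["x1"], batch["x2"]
        r1, r2 = self(x1, x2)
        train_loss = self.loss_fn(r1, r2)
        result = TrainResult(minimize=train_loss)
        # on_epoch=True accumulates the per-batch values and logs their mean
        # at the end of the epoch instead of the raw batch loss.
        result.log('train_loss', train_loss, prog_bar=True,
                   on_step=False, on_epoch=True)
        return result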
They are different because the progress bar loss is a smoothed value over a window (of length 20). From the docs (bottom):
The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.
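To make the smoothing concrete: the progress-bar loss behaves roughly like a running mean over the last 20 batch losses (a plain-Python sketch of the idea, not Lightning's actual implementation):

    from collections import deque

    window = deque(maxlen=20)  # keep only the 20 most recent batch losses

    def progress_bar_loss(batch_loss):
        """Smoothed value shown next to `loss` in the progress bar."""
        window.append(batch_loss)
        return sum(window) / len(window)

    # The smoothed value lags behind the latest batch loss:
    for step, batch_loss in enumerate([0.9, 0.7, 0.5, 0.3, 0.255]):
        print(step, batch_loss, round(progress_bar_loss(batch_loss), 3))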
Closing as resolved; feel free to reopen if needed.