In my research project, the training_step, test_step, and validation_step methods are implemented as follows:
def training_step(self, batch, batch_idx):
    x1, x2 = batch["x1"], batch["x2"]
    predict = self(x1, x2)
    loss = self.loss_fn(predict, self.train_target)
    figure, mrr_metric = self.mrr(predict)  # custom metric
    return {'loss': loss, 'progress_bar': {'test_mrr': mrr_metric}}
But I couldn't find a way to log either the loss value or the mrr_metric with a logger like TensorBoard.
It would also be useful to save the figure generated by the metric to the logger as well.
Currently, the trainer is configured as:
tb_logger = pl_loggers.TensorBoardLogger(cfg.logs.path)
model = JointEncoder(config=cfg)
trainer = Trainer(
    max_epochs=cfg.train.max_epochs,
    gpus=1,
    logger=tb_logger,
)
trainer.fit(model)
However, after training, I was not able to see even the loss curve in TensorBoard.
Any direction on this?
Thank you in advance.
Try: https://pytorch-lightning.readthedocs.io/en/stable/experiment_reporting.html#log-metrics
def training_step(self, batch, batch_idx):
    # ... computation
    log = {'test_mrr': mrr_metric, 'train_loss': loss}
    return {'loss': loss, 'log': log}
Similarly for val/test.
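For example, a minimal sketch of the validation side under that same dict-based API; val_target here is just a stand-in for however you obtain your labels, not part of the original code:

def validation_step(self, batch, batch_idx):
    x1, x2 = batch["x1"], batch["x2"]
    predict = self(x1, x2)
    loss = self.loss_fn(predict, self.val_target)  # val_target is illustrative
    return {'val_loss': loss}

def validation_epoch_end(self, outputs):
    # average the per-batch losses returned by validation_step
    # (assumes `import torch` at the top of the module)
    avg_loss = torch.stack([o['val_loss'] for o in outputs]).mean()
    return {'val_loss': avg_loss, 'log': {'val_loss': avg_loss}}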
Thanks @rohitgr7, it helped a lot.
I realized that I can add images to the log using self.logger.experiment.add_image ....
But how could I do this only at the end of each epoch?
Override this in LightningModule?
https://github.com/PyTorchLightning/pytorch-lightning/blob/3f2c1022ab0e7027822a2acd66debc8afa90d5d6/pytorch_lightning/core/hooks.py#L112-L116
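For instance, a minimal sketch of overriding that hook to write an image; the tensor here is just a placeholder, but add_image is the regular TensorBoard SummaryWriter method:

def on_epoch_end(self):
    # runs once per epoch; self.logger.experiment is the SummaryWriter
    img = torch.rand(3, 64, 64)  # placeholder CHW image tensor
    self.logger.experiment.add_image('example_image', img, self.current_epoch)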
I also thought about overriding that hook. However, I still need data (from the training or validation set) to generate the image, and inside the def on_epoch_end(self) method I don't know how to obtain it.
If you want to log at the end of training, use training_epoch_end().
I usually log an image of my outputs during validation_epoch_end() in practice.
Here's a little snippet of what I had been doing (although this may change with the new Train/EvalResult objects):
def validation_epoch_end(self, outputs):
    img_batch = outputs[-1]['step_dict']['viz']['x_hat']
    img_batch = img_batch.view(img_batch.shape[0], 1, 32, 32)
    grid = torchvision.utils.make_grid(img_batch)
    self.logger.experiment.add_image('x_hat', grid, self.current_epoch)
outputs is a list of all the returns from validation_step() in this case.
step_dict is what I return at the end of each *_step and contains an entry called viz to remind me what I'm visualizing.
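For context, the matching validation_step could look roughly like this; the reconstruction loss and the step_dict/viz naming are just my convention, not anything Lightning requires:

def validation_step(self, batch, batch_idx):
    x, _ = batch
    x_hat = self(x)  # e.g. a reconstruction
    loss = F.mse_loss(x_hat, x)  # assumes `import torch.nn.functional as F`
    step_dict = {'val_loss': loss, 'viz': {'x_hat': x_hat}}
    return {'step_dict': step_dict}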
If I understand correctly, you return batch data from each validation_step call, and it is kept in memory through outputs. Is that right? If so, this could be prohibitive given the hardware limitations I have at my disposal.
Yep, if you look at this line in the eval loop, this is what it always does. That's a good point though, I have been keeping my validation sets pretty small so it hasn't been an issue. I guess ideally, one would only retain and visualize the very last batch of the validation epoch.
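A sketch of that idea, caching only the most recent batch on the module instead of relying on outputs; the attribute name _last_x_hat is mine, and the loss is illustrative (assumes `import torch`, `import torch.nn.functional as F`, and `import torchvision`):

def validation_step(self, batch, batch_idx):
    x, _ = batch
    x_hat = self(x)
    # overwrite each step so only the final batch of the epoch survives
    self._last_x_hat = x_hat.detach().cpu()
    return {'val_loss': F.mse_loss(x_hat, x)}

def validation_epoch_end(self, outputs):
    avg_loss = torch.stack([o['val_loss'] for o in outputs]).mean()
    grid = torchvision.utils.make_grid(self._last_x_hat.view(-1, 1, 32, 32))
    self.logger.experiment.add_image('x_hat', grid, self.current_epoch)
    return {'val_loss': avg_loss, 'log': {'val_loss': avg_loss}}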
Asking related, but more specific question in #2728.
Great! I've already subscribed to know the best practice on this.