In my research project, the training_step, test_step, and validation_step methods are implemented as follows:
def training_step(self, batch, batch_idx):
    x1, x2 = batch["x1"], batch["x2"]
    predict = self(x1, x2)
    loss = self.loss_fn(predict, self.train_target)
    figure, mrr_metric = self.mrr(predict)  # custom metric
    return {'loss': loss, 'progress_bar': {'test_mrr': mrr_metric}}
But I couldn't find a way to log either the loss value or the mrr_metric with a logger like TensorBoard.
It would also be useful to save the figure generated by the metric to the logger as well.
Currently, the trainer is configured as:
tb_logger = pl_loggers.TensorBoardLogger(cfg.logs.path)
model = JointEncoder(config=cfg)
trainer = Trainer(
    max_epochs=cfg.train.max_epochs,
    gpus=1,
    logger=tb_logger,
)
trainer.fit(model)
However, after training, I was not able to see even the loss curve in TensorBoard.
Any direction on this?
Thank you in advance.
Try: https://pytorch-lightning.readthedocs.io/en/stable/experiment_reporting.html#log-metrics
def training_step(self, batch, batch_idx):
    # ... computation
    log = {'test_mrr': mrr_metric, 'train_loss': loss}
    return {'loss': loss, 'log': log}
Similarly for val/test.
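For example, a minimal sketch of the validation side under that same dict-based API; val_target here is just a stand-in for however you obtain your labels, not part of the original code:

def validation_step(self, batch, batch_idx):
    x1, x2 = batch["x1"], batch["x2"]
    predict = self(x1, x2)
    loss = self.loss_fn(predict, self.val_target)  # val_target is illustrative
    return {'val_loss': loss}

def validation_epoch_end(self, outputs):
    # average the per-batch losses returned by validation_step
    # (assumes `import torch` at the top of the module)
    avg_loss = torch.stack([o['val_loss'] for o in outputs]).mean()
    return {'val_loss': avg_loss, 'log': {'val_loss': avg_loss}}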
Thanks @rohitgr7, it helped a lot.
I realized that I can add images to the log using self.logger.experiment.add_image ....
But how could I do this only at the end of each epoch?
Override this in LightningModule?
https://github.com/PyTorchLightning/pytorch-lightning/blob/3f2c1022ab0e7027822a2acd66debc8afa90d5d6/pytorch_lightning/core/hooks.py#L112-L116
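For instance, a minimal sketch of overriding that hook to write an image; the tensor here is just a placeholder, but add_image is the regular TensorBoard SummaryWriter method:

def on_epoch_end(self):
    # runs once per epoch; self.logger.experiment is the SummaryWriter
    img = torch.rand(3, 64, 64)  # placeholder CHW image tensor
    self.logger.experiment.add_image('example_image', img, self.current_epoch)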
I also thought about overriding that hook. However, I still need data (from the training or validation set) to generate the image, and inside the def on_epoch_end(self) method I don't know how to obtain it.
If you want to log at the end of training, use training_epoch_end().
I usually log an image of my outputs during validation_epoch_end() in practice.
Here's a little snippet of what I had been doing (although this may change with the new Train/EvalResult objects):
def validation_epoch_end(self, outputs):
    img_batch = outputs[-1]['step_dict']['viz']['x_hat']
    img_batch = img_batch.view(img_batch.shape[0], 1, 32, 32)
    grid = torchvision.utils.make_grid(img_batch)
    self.logger.experiment.add_image('x_hat', grid, self.current_epoch)
outputs is a list of all the returns from validation_step() in this case.
step_dict is what I return at the end of each *_step and contains an entry called viz to remind me what I'm visualizing.
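For context, the matching validation_step could look roughly like this; the reconstruction loss and the step_dict/viz naming are just my convention, not anything Lightning requires:

def validation_step(self, batch, batch_idx):
    x, _ = batch
    x_hat = self(x)  # e.g. a reconstruction
    loss = F.mse_loss(x_hat, x)  # assumes `import torch.nn.functional as F`
    step_dict = {'val_loss': loss, 'viz': {'x_hat': x_hat}}
    return {'step_dict': step_dict}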
If I understand correctly, you return batch data from each validation_step call, and it is kept in memory through outputs. Is that right? If so, this could be prohibitive given the hardware limitations I have at my disposal.
Yep, if you look at this line in the eval loop, this is what it always does. That's a good point though, I have been keeping my validation sets pretty small so it hasn't been an issue. I guess ideally, one would only retain and visualize the very last batch of the validation epoch.
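A sketch of that idea, caching only the most recent batch on the module instead of relying on outputs; the attribute name _last_x_hat is mine, and the loss is illustrative (assumes `import torch`, `import torch.nn.functional as F`, and `import torchvision`):

def validation_step(self, batch, batch_idx):
    x, _ = batch
    x_hat = self(x)
    # overwrite each step so only the final batch of the epoch survives
    self._last_x_hat = x_hat.detach().cpu()
    return {'val_loss': F.mse_loss(x_hat, x)}

def validation_epoch_end(self, outputs):
    avg_loss = torch.stack([o['val_loss'] for o in outputs]).mean()
    grid = torchvision.utils.make_grid(self._last_x_hat.view(-1, 1, 32, 32))
    self.logger.experiment.add_image('x_hat', grid, self.current_epoch)
    return {'val_loss': avg_loss, 'log': {'val_loss': avg_loss}}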
Asking related, but more specific question in #2728.
Great! I've already subscribed to know the best practice on this.