What should we consider best practice for visualizing images, embeddings, etc. in TensorBoard when using pl.TrainResult/EvalResult objects?
In light of #2651 and related PRs, what's the right way to do this?
Let's say we have a dataset of images, and we want to visualize a single batch of reconstructions once per epoch. I typically do this in validation_epoch_end(), using logger.experiment.add_image().
Let's say my code now looks like this:
def validation_step(self, batch, batch_idx):
    step_dict = self._step(batch)
    result = pl.EvalResult(early_stop_on=step_dict['loss'],
                           checkpoint_on=step_dict['loss'])
    # logging
    result.log('avg_val_loss', step_dict['loss'],
               on_epoch=True, reduce_fx=torch.mean)
    return result, step_dict
which works fine and is definitely much cleaner than the original method of returning multiple logging dicts. 😁
I now want to do something at the end of the validation loop so I specify:
def validation_epoch_end(self, outputs):
    # each element of `outputs` is a (EvalResult, step_dict) tuple
    img_batch = outputs[-1][1]['viz']['x_hat']
    img_batch = img_batch.view(img_batch.shape[0], 1, 32, 32)
    grid = torchvision.utils.make_grid(img_batch)
    self.logger.experiment.add_image('x_hat', grid, self.current_epoch)
    avg_val_loss = torch.stack([x['avg_val_loss'] for x, _ in outputs]).mean()
    return avg_val_loss
outputs in this case is a list of tuples where the first element is the EvalResult for each val step, and the second element contains step_dict which includes all losses and reconstructed x_hats for each val step.
Is there a better way? One potential downside to this is that outputs can eat up a significant chunk of memory if you're not careful.
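For what it's worth, one way to keep that footprint down (just a sketch layered on the code above, not something the Result API does for you) is to detach the reconstructions, move them to CPU, and only carry a handful of samples into outputs:

def validation_step(self, batch, batch_idx):
    step_dict = self._step(batch)
    result = pl.EvalResult(early_stop_on=step_dict['loss'],
                           checkpoint_on=step_dict['loss'])
    result.log('avg_val_loss', step_dict['loss'],
               on_epoch=True, reduce_fx=torch.mean)
    # keep only a few detached, CPU-resident reconstructions per step so that
    # `outputs` doesn't hold the full validation set (or its autograd graph) in memory
    extras = {'viz': {'x_hat': step_dict['viz']['x_hat'][:8].detach().cpu()}}
    return result, extras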
One other detail I'll add: it's unclear which of these now controls logging 'avg_val_loss', given that this used to be the job of validation_epoch_end() and its magic 'avg_val_loss' key in the returned dict.
Using the new way outlined above, I'm still getting this error:
RuntimeWarning: The metric you returned None must be a `torch.Tensor` instance, checkpoint not saved HINT: what is the value of loss in validation_epoch_end()?
warnings.warn(*args, **kwargs)
Looks like using
return {'loss': avg_val_loss}
at the end of validation_epoch_end() fixes that warning, but when combined with EvalResult, I don't really understand why it should be necessary to return that at all for checkpointing.
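Putting those pieces together, the tail end of validation_epoch_end() would presumably look something like this (same names as the example above):

def validation_epoch_end(self, outputs):
    # ... image logging as above ...
    avg_val_loss = torch.stack([x['avg_val_loss'] for x, _ in outputs]).mean()
    # returning the tensor under the 'loss' key is what silences the checkpoint warning
    return {'loss': avg_val_loss}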
It would be great to have an example, possibly in Colab, showing good practices for logging the loss and other metrics, as well as samples (e.g. images, as in a VAE) across train/val/test.
I have this all set up for a VAE, but I want to make sure I'm following best practices with the latest updates. Once we come to a consensus on this, I can provide a colab link. 😁 There's also the bolts repo, which we could update!
I'm trying to log some figures once per epoch in validation_epoch_end() but I'm struggling to find a good practice.
In addition to model outputs, I also need to return some input data in (effectively your) step_dict for visualisation.
I don't want to accumulate a list in the validation loop as I only want one set of images per epoch. Any advice?
(I work with 3D medical images so memory is definitely a concern!)
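One pattern that avoids accumulating anything (a sketch reusing the names from the earlier example, not an official recommendation) is to log the grid directly from validation_step for a single batch and return only the EvalResult:

def validation_step(self, batch, batch_idx):
    step_dict = self._step(batch)
    if batch_idx == 0:  # one grid per epoch, nothing kept around afterwards
        # reshape if needed, e.g. .view(-1, 1, 32, 32), before building the grid
        grid = torchvision.utils.make_grid(step_dict['viz']['x_hat'])
        self.logger.experiment.add_image('val_x_hat', grid, self.current_epoch)
    result = pl.EvalResult(checkpoint_on=step_dict['loss'])
    result.log('avg_val_loss', step_dict['loss'],
               on_epoch=True, reduce_fx=torch.mean)
    return result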
I have this same question... what are the best practices for logging images? My usual wandb.log call no longer seems to work now that logging goes through WandbLogger. I read the Train/EvalResults page, but the documentation seems sparse here.
@joshclancy You should still be able to import wandb and log images separately.
Otherwise you also have access to wandb at self.logger.experiment inside your LightningModule.
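For example, either of these should work from inside a LightningModule method when the Trainer was given a WandbLogger (the 'reconstructions' key and img_batch are just placeholders):

import wandb

# option 1: call wandb directly
wandb.log({'reconstructions': [wandb.Image(img) for img in img_batch]})

# option 2: go through the attached logger; for WandbLogger,
# self.logger.experiment is the underlying wandb run object
self.logger.experiment.log({'reconstructions': [wandb.Image(img) for img in img_batch]})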
Maybe this should work in validation and testing: