Given the new metrics in 1.0.0 and later (which I really like!), I have three accuracy metrics for training, validation and test initialized in the __init__ method. Do I need to reset them at the end of the training and validation epochs, given that they will be used multiple times?
Depends on how you are using the metrics. In general, if the .compute() method is called, the internal state is reset. This means that if you call .compute() at the end of the epoch you should be fine. If you are using metrics in combination with self.log, then setting on_epoch=True will also internally call .compute() at the end of the epoch.
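For reference, a minimal sketch of the two patterns described above; the module, layer, and metric names here are placeholders (not from this thread), and in practice you would pick one pattern or the other:

import torch
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 10)
        self.train_acc = pl.metrics.Accuracy()

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        # Pattern A: hand the metric object to self.log; with on_epoch=True,
        # Lightning calls .compute() (which, as described above, resets the
        # internal state) at the end of the epoch.
        self.train_acc(logits.argmax(dim=1), y)
        self.log('train_acc', self.train_acc, on_step=False, on_epoch=True)
        return torch.nn.functional.cross_entropy(logits, y)

    def training_epoch_end(self, outputs):
        # Pattern B: call .compute() yourself at the end of the epoch;
        # as described above, this also resets the internal state.
        epoch_acc = self.train_acc.compute()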
@SkafteNicki Thanks for the information. This has been very useful.
I'd like to get the validation accuracy over the entire validation set, and I have seen some strange results with DDP. I want to make sure I am doing the right thing with metrics under DDP. Below is my pseudo-code:
def __init__(self):
    super().__init__()
    # accumulate across batches; compute only at the end of the epoch
    self.val_acc = Accuracy(compute_on_step=False)

def validation_step(self, batch, batch_idx):
    input, y = batch[0], batch[1]
    logits = self(input)
    _, pred = torch.max(logits, dim=1)
    self.val_acc.update(pred, y)
    self.log('val_acc', self.val_acc, on_step=False, on_epoch=True)
Could someone tell me if this implementation will give me the validation accuracy over the entire validation set? Thanks.
Yes that is the correct way.
Could you explain the strange results you are seeing in ddp mode?
I am facing similar issues with ddp. When computing F1 using Fbeta, the results with ddp are not the same as with dp (on a single machine). To compare, I use the same saved checkpoint. In my case the metric is computed in test_step_end:
def test_step_end(self, test_step_outputs):
    self.fbeta_test(test_step_outputs['y_hat'], test_step_outputs['y'])
without logging. And then in test_epoch_end:
def test_epoch_end(self, outputs: list):
    fbeta_test = self.fbeta_test.compute()
So I am not sure whether ddp is just averaging the metric across ranks instead of computing the metric over the whole dataset. It is worth saying that when the metric is computed in test_epoch_end, the value of fbeta_test is the same across ranks. With dp the result is .5670 and with ddp it returns .6154. Running it on a single gpu with no dp returns the same value as dp (.5670), which I assume is the correct one.
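To illustrate why that would matter, here is a small standalone sketch with made-up numbers and a hand-rolled binary F1 (not Lightning's Fbeta): averaging a per-rank F1 generally differs from computing F1 once over the pooled predictions, because F1 is a ratio of counts rather than a mean over samples.

import torch

def f1_binary(preds, target):
    # plain binary F1 from true/false positive and false negative counts
    tp = ((preds == 1) & (target == 1)).sum().float()
    fp = ((preds == 1) & (target == 0)).sum().float()
    fn = ((preds == 0) & (target == 1)).sum().float()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# pretend these are the predictions/targets seen by two DDP ranks
preds_0, target_0 = torch.tensor([1, 0, 0, 0]), torch.tensor([1, 1, 0, 0])
preds_1, target_1 = torch.tensor([1, 1, 1, 0]), torch.tensor([0, 1, 1, 0])

rank_average = (f1_binary(preds_0, target_0) + f1_binary(preds_1, target_1)) / 2
pooled = f1_binary(torch.cat([preds_0, preds_1]), torch.cat([target_0, target_1]))
print(rank_average, pooled)  # ~0.733 vs 0.750 -- not the same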
@SkafteNicki I am trying a new model with PyTorch Lightning and also with the new metrics in 1.0.3. The strange metric results in DDP may or may not be related to the new metrics. That's why I'd like to make sure I am doing the right thing with the new metrics. This helps me debug any issue that might be related to the new model. Thanks.
Thanks @LittlePea13 and @junwen-austin for both getting back to me.
It seems to me that there may be a problem with how metrics are aggregated in ddp mode. I will try to identify the issue and get back to you.
No problem @SkafteNicki. Let me know if I can help in any way, I tried to debug a bit but didn't get very far, I am quite new to ddp. And thanks for your work :)
@LittlePea13 could you try setting dist_sync_on_step to True and see if it solves the problem?
If so, then it has something to do with how the metric states are buffered.
@SkafteNicki I just tried and the result was still the same (wrong) by declaring the metric as:
self.fbeta_test = Fbeta(num_classes=1, dist_sync_on_step=True)
Overriding on_epoch_end allows computing metrics over all the data, but there is no way to log those results (or even use them for model selection), because log_train_epoch_end_metrics is called right after the loop through the batches and before on_epoch_end.
https://github.com/PyTorchLightning/pytorch-lightning/blob/b50dd12332bf83209d9535c8516486edc1a6b252/pytorch_lightning/trainer/training_loop.py#L608-L613
@hoanghng instead of using on_epoch_end could you use training_epoch_end? That should work with logging.
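Something along these lines should do it (just a sketch; the metric attribute and key names are illustrative):

def training_epoch_end(self, outputs):
    # logging works from this hook, unlike on_epoch_end
    # (see the ordering discussed above)
    epoch_acc = self.train_acc.compute()
    self.log('train_acc_epoch', epoch_acc)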
@SkafteNicki Thanks a lot. It works like a charm.
@SkafteNicki @hoanghng I feel like I missed something. Are you referring to metrics in ddp? I am using the *_epoch_end methods but still see different results with ddp.
@LittlePea13 this was just a question about which model methods actually support logging.
Metrics are not reset automatically on epoch end in DDP for me.
In the LitModule's __init__() I initialize an MSE metric:
self.train_mse = pl.metrics.MeanSquaredError()
Then in training step:
mse = self.train_mse(pred, target)  # calling the metric returns the batch value while also updating the accumulated state
self.log('train_mse', self.train_mse, on_step=False, on_epoch=True)
In addition I print the mse of each batch manually.
Comparing the two, it seems that the logged MSE is an average over all epochs so far.
PL v1.0.7
pytorch v1.6.0
@itsikad I can confirm that it is not being reset correctly.
Could you open a new issue where you reproduce this using the BoringModel?
@SkafteNicki is the reset not correct in DDP mode for all class metrics, or just for this MSE? If it is for all of them, could one manually reset the metrics after the compute() call at the end of the epoch to fix it for the time being? Thanks.
@SkafteNicki Done #4806
@junwen-austin I have really not investigated this enough yet, so I don't know how deep the rabbit hole goes. I suspect it is the same for all other metrics. Until solved, just call self.metric.reset() in training_epoch_end(). Let's keep further discussion in the new issue.
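For anyone hitting this before a fix lands, the suggested workaround would look roughly like this (the metric attribute name is illustrative):

def training_epoch_end(self, outputs):
    epoch_mse = self.train_mse.compute()
    self.log('train_mse_epoch', epoch_mse)
    # manual workaround until the automatic reset is fixed (see #4806)
    self.train_mse.reset()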
@itsikad @SkafteNicki I took itsikad's colab notebook and made the following changes:
def training_step(self, batch, batch_idx):
    output = self.layer(batch)
    loss = self.loss(batch, output)
    self.metric.update(output, torch.ones_like(output))
    return {"loss": loss, "batch_mse": loss}

def training_epoch_end(self, outputs) -> None:
    avg_mse = self.metric.compute()
    self.log('mse', avg_mse, on_step=False, on_epoch=True)
    print(f'Sum squared error: {avg_mse}, Total samples: {self.metric.total}')
and now it works. Here is the link: https://colab.research.google.com/drive/1-NJKZ1hiXVCCirN7xsLmswf-zEyxVQIU?usp=sharing
Essentially, what I did is update the metric in training_step and then, at the end of the epoch, call compute() explicitly to get the value of the metric and pass that value to self.log. This might be a temporary solution.
@junwen-austin
It works since you added an explicit call to .compute() (as I mentioned in #4806). However, according to the documentation, that should be unnecessary with self.log(..., on_epoch=True).
Edit: missed the last part of your reply, indeed a possible temp solution.