Pytorch-lightning: Where is EarlyStopping searching for metrics?

Created on 8 Jan 2020 · 8 Comments · Source: PyTorchLightning/pytorch-lightning

Where does EarlyStopping search for metrics?

Code

    def validation_end(self, outputs):
        ...
        metrics = {
            'val_acc': val_acc,
            'val_loss': val_loss
        }
        ...
        output = OrderedDict({
            'val_acc':  torch.tensor(metrics['val_acc']),
            'val_loss': torch.tensor(metrics['val_loss']),
            'progress_bar': metrics,
            'log': metrics
        })
        return output
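
Roughly, the early stopping itself is wired up like this (just a sketch of my setup; the early_stop_callback Trainer argument and the EarlyStopping parameters reflect the 0.5.x-era API, and MyModel stands in for my LightningModule):

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import EarlyStopping

    # Stop training once validation accuracy stops improving.
    early_stop = EarlyStopping(
        monitor='val_acc',  # name looked up in the trainer's callback metrics
        mode='max',
        patience=3,
    )

    trainer = Trainer(early_stop_callback=early_stop)
    trainer.fit(MyModel())  # MyModel is a placeholder LightningModule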

If I attempt to early stop on val_acc, I get the following error:

RuntimeWarning: Early stopping conditioned on metric 'val_acc' which is not available. Available metrics are: loss,train_loss

From what I could find, the metrics mentioned (loss, train_loss) come from training_step.

I guess I'm doing something wrong, could anyone point me in the correct direction?

  • OS: Ubuntu
  • Packaging: pip
  • Version: 0.5.3.2

Update #1: the same code works with version 0.5.1. Bug in 0.5.3?

Update #2:
I found this line in trainer/training_loop.py:

self.callback_metrics = {k: v for d in all_callback_metrics for k, v in d.items()}

From what I see, before this line is executed, self.callback_metrics contains val_acc. After this line runs, the values that were put into callback_metrics after validation are gone, so EarlyStopping can't find them. Can anyone confirm this is an issue?
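
To make this concrete, here is a small standalone sketch of what that dict rebuild does (the metric names mirror my run, the values are made up):

    # Before the rebuild: validation has already stored its metrics.
    callback_metrics = {'val_acc': 0.91, 'val_loss': 0.34}

    # all_callback_metrics only carries the training-step metrics at this point.
    all_callback_metrics = [{'loss': 0.52, 'train_loss': 0.52}]

    # The line from trainer/training_loop.py rebuilds the dict from scratch,
    # so anything not present in all_callback_metrics is dropped.
    callback_metrics = {k: v for d in all_callback_metrics for k, v in d.items()}

    print(callback_metrics)  # {'loss': 0.52, 'train_loss': 0.52} -- val_acc is gone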

question

All 8 comments

If I understand correctly, it is a known issue. Please look at #490; #492 fixes this in master.

Hi @kuynzereb, thanks! This was indeed the issue.
Kind of an embarrassing question: what is the best way to get these fixes from master?
Install like so?

pip install git+https://github.com/williamFalcon/pytorch-lightning.git@master --upgrade

Well, I don't really know either, but it looks like you are right :)

pip install https://github.com/PyTorchLightning/pytorch-lightning/archive/master.zip -U

I tried installing with both William's and Borda's methods, but I still get an error saying it can't find the val_loss metric. My validation_step is defined as

    def validation_step(self, batch, batch_nb):
        x, y = batch
        y_hat, _ = self.forward(x)

        return {'val_loss': loss(y_hat, y)}

and as I understand it, that should be enough. Any ideas what might be wrong?

You should also define validation_end(), not only validation_step().
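
Something along these lines should be enough for EarlyStopping to see val_loss (just a sketch; F.cross_entropy here is an assumed placeholder for whatever loss your model actually uses):

    import torch
    import torch.nn.functional as F

    def validation_step(self, batch, batch_nb):
        x, y = batch
        y_hat, _ = self.forward(x)
        # per-batch validation loss (placeholder loss function)
        return {'val_loss': F.cross_entropy(y_hat, y)}

    def validation_end(self, outputs):
        # Aggregate the per-batch values and return 'val_loss' so the
        # trainer can expose it to EarlyStopping (and the progress bar).
        avg_loss = torch.stack([o['val_loss'] for o in outputs]).mean()
        return {'val_loss': avg_loss, 'progress_bar': {'val_loss': avg_loss}}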

That worked, thank you!

I must say that from the documentation it was not clear to me that it is necessary to define validation_end() as well.

The outputs here are strictly for the progress bar. If you don't need to display anything, don't return anything.

As far as I know I am not doing anything progress-bar related, so I should not have to define validation_end(). What have I misunderstood?
