Describe the bug
When I train my network, which has validation steps defined similar to the doc example
def validation_step(self, batch, batch_nb):
x = torch.squeeze(batch['x'], dim=0).float()
y = torch.squeeze(batch['y'], dim=0).long()
output = self.forward(x)
return {'batch_val_loss': self.loss(output, y),
'batch_val_acc': accuracy(output, y)}
def validation_end(self, outputs):
avg_loss = torch.stack([x['batch_val_loss'] for x in outputs]).mean()
avg_acc = torch.stack([x['batch_val_acc'] for x in outputs]).mean()
return {'val_loss': avg_loss, 'val_acc': avg_acc}
with my cusotm EarlyStopCallback
early_stop_callback = EarlyStopping(monitor='val_loss', patience=5)
tt_logger = TestTubeLogger(
save_dir=log_dir,
name="default",
debug=False,
create_git_tag=False
)
trainer = Trainer(logger=tt_logger,
row_log_interval=10,
checkpoint_callback=checkpoint_callback,
early_stop_callback=early_stop_callback,
gradient_clip_val=0.5,
gpus=gpus,
check_val_every_n_epoch=1,
max_nb_epochs=99999,
train_percent_check=train_frac,
log_save_interval=100,
)
the program cannot see my validation metrics:
Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,epoch,batch_nb,v_nb <class 'RuntimeWarning'>
In a previous release running on Windows (now I am on macOS), this behaviour was not happening. But in the previous version, TestTubeLogger was not present
Desktop (please complete the following information):
the early stop metrics come from “progress_bar” entry:
So, add a key “progress_bar” and val_loss in there
I’ll update the docs
Thank you this fixed my issue
def validation_end(self, outputs):
avg_loss = torch.stack([x['batch_val_loss'] for x in outputs]).mean()
avg_acc = torch.stack([x['batch_val_acc'] for x in outputs]).mean()
return {
'val_loss': avg_loss,
'val_acc': avg_acc,
'progress_bar':{'val_loss': avg_loss, 'val_acc': avg_acc }}
Most helpful comment
Thank you this fixed my issue