Pytorch-lightning: TrainResult.log dosen't work as log_dit

Created on 25 Sep 2020 · 4Comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

When using TrainResult as the return of LightningModule.training_step(),TrainRsult.log() can not add metrics to Trainer.logged_metrics and this make Checkpoint.format_checkpoint_name() works not well.

To Reproduce

class Resnet18(pl.LightningModule):
    def __init__(self, input_dim=40, numclass=1211, learning_rate=0.1, batch_size=128, num_workers=3,
                 **kwargs):
        super(Resnet18, self).__init__()
        self.save_hyperparameters()
        self.example_input_array = torch.rand((1, 200, input_dim))

        self.net = resnet18(num_classes=numclass)

    def forward(self, x):
        """
        input: size (batch, seq_len, input_features)
        outpu: size (batch, new_seq_len, output_features)
        """
        x = torch.unsqueeze(x, 1)
        x = torch.cat([x, x, x], 1)
        x = self.net(x)
        return x

    def train_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor()])
        trainloader = Dataset(path,
                                         transform=transform)
        trainloader = DataLoader(trainloader,
                                  batch_size=self.hparams.batch_size,
                                  shuffle=True,
                                  num_workers=self.hparams.num_workers,
                                  pin_memory=True)
        return trainloader

    def loss(self, input, target):
        return F.cross_entropy(input, target)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.TrainResult(minimize=loss, checkpoint_on=loss)
        result.log(train_loss,loss) # dose not work well
        print(result.get_epoch_log_metrics()) # will print None and checkpoint file can't get the metric
        result.log_dict({'train_loss': loss}) # work
        print(result.get_epoch_log_metrics()) # will print train_loss and checkpoint file correctly get the metric
        return result

if __name__ == '__main__':
    ckpt = ModelCheckpoint(filepath=osp.join("save/ckpt", "{epoch:03d}-{train_loss:.2f}"),
                           monitor='checkpoint_on',
                           mode='min',
                           save_top_k=-1,
                           verbose=True,
                           save_weights_only=True)
    callbacks = [ckpt]

    # model
    model = Resnet18()

    # training
    trainer = pl.Trainer(gpus=args.gpus,
                         max_epochs=2,
                         profiler=True,
                         checkpoint_callback=ckpt,
                         early_stop_callback=False,
                         # callbacks=callbacks,
                         limit_train_batches=0.01, )
    trainer.fit(model)

Environment

How you installed PyTorch (conda):
- CUDA:
  - GPU:
    
    - TITAN Xp
  - available: True
  - version: 9.2
- Packages:
  - numpy: 1.19.1
  - pyTorch_debug: False
  - pyTorch_version: 1.6.0
  - pytorch-lightning: 0.9.0
  - tqdm: 4.48.2
- System:
  - OS: Linux
  - architecture:
    
    - 64bit
    
    -
  - processor: x86_64
  - python: 3.7.9
  - version: #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018

ResultObj bug / fix help wanted

Source

2531919747

Most helpful comment

Here

result.log(train_loss, loss)

should be

result.log('train_loss', loss)

rohitgr7 on 26 Sep 2020

👍2

All 4 comments

Hi! thanks for your contribution!, great first issue!

github-actions[bot] on 25 Sep 2020

Here

result.log(train_loss, loss)

should be

result.log('train_loss', loss)

rohitgr7 on 26 Sep 2020

👍2

@2531919747 can you confirm that @rohitgr7's suggestion works?

awaelchli on 28 Sep 2020

Here

result.log(train_loss, loss)

should be

result.log('train_loss', loss)

Thank you , I found the default value of "on_epoch" in result.log() is False but "on_step" is True, and I used get_epoch_log_metrics() get the 'train_loss' correctly.

2531919747 on 30 Sep 2020

Was this page helpful?

0 / 5 - 0 ratings