Pytorch-lightning: TrainResult.log dosen't work as log_dit

Created on 25 Sep 2020  路  4Comments  路  Source: PyTorchLightning/pytorch-lightning

馃悰 Bug

When using TrainResult as the return of LightningModule.training_step(),TrainRsult.log() can not add metrics to Trainer.logged_metrics and this make Checkpoint.format_checkpoint_name() works not well.

To Reproduce

class Resnet18(pl.LightningModule):
    def __init__(self, input_dim=40, numclass=1211, learning_rate=0.1, batch_size=128, num_workers=3,
                 **kwargs):
        super(Resnet18, self).__init__()
        self.save_hyperparameters()
        self.example_input_array = torch.rand((1, 200, input_dim))

        self.net = resnet18(num_classes=numclass)

    def forward(self, x):
        """
        input: size (batch, seq_len, input_features)
        outpu: size (batch, new_seq_len, output_features)
        """
        x = torch.unsqueeze(x, 1)
        x = torch.cat([x, x, x], 1)
        x = self.net(x)
        return x

    def train_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor()])
        trainloader = Dataset(path,
                                         transform=transform)
        trainloader = DataLoader(trainloader,
                                  batch_size=self.hparams.batch_size,
                                  shuffle=True,
                                  num_workers=self.hparams.num_workers,
                                  pin_memory=True)
        return trainloader

    def loss(self, input, target):
        return F.cross_entropy(input, target)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.TrainResult(minimize=loss, checkpoint_on=loss)
        result.log(train_loss,loss) # dose not work well
        print(result.get_epoch_log_metrics()) # will print None and checkpoint file can't get the metric
        result.log_dict({'train_loss': loss}) # work
        print(result.get_epoch_log_metrics()) # will print train_loss and checkpoint file correctly get the metric
        return result

if __name__ == '__main__':
    ckpt = ModelCheckpoint(filepath=osp.join("save/ckpt", "{epoch:03d}-{train_loss:.2f}"),
                           monitor='checkpoint_on',
                           mode='min',
                           save_top_k=-1,
                           verbose=True,
                           save_weights_only=True)
    callbacks = [ckpt]

    # model
    model = Resnet18()

    # training
    trainer = pl.Trainer(gpus=args.gpus,
                         max_epochs=2,
                         profiler=True,
                         checkpoint_callback=ckpt,
                         early_stop_callback=False,
                         # callbacks=callbacks,
                         limit_train_batches=0.01, )
    trainer.fit(model)

Environment

  • How you installed PyTorch (conda):

    • CUDA:



      • GPU:


        - TITAN Xp


      • available: True


      • version: 9.2



    • Packages:



      • numpy: 1.19.1


      • pyTorch_debug: False


      • pyTorch_version: 1.6.0


      • pytorch-lightning: 0.9.0


      • tqdm: 4.48.2



    • System:



      • OS: Linux


      • architecture:


        - 64bit


        -


      • processor: x86_64


      • python: 3.7.9


      • version: #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018



ResultObj bug / fix help wanted

Most helpful comment

Here

result.log(train_loss, loss)

should be

result.log('train_loss', loss)

All 4 comments

Hi! thanks for your contribution!, great first issue!

Here

result.log(train_loss, loss)

should be

result.log('train_loss', loss)

@2531919747 can you confirm that @rohitgr7's suggestion works?

Here

result.log(train_loss, loss)

should be

result.log('train_loss', loss)

Thank you , I found the default value of "on_epoch" in result.log() is False but "on_step" is True, and I used get_epoch_log_metrics() get the 'train_loss' correctly.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

versatran01 picture versatran01  路  3Comments

anthonytec2 picture anthonytec2  路  3Comments

williamFalcon picture williamFalcon  路  3Comments

maxime-louis picture maxime-louis  路  3Comments

chuong98 picture chuong98  路  3Comments