Pytorch-lightning: Test metrics not logging to Comet after training

Created on 28 Jan 2020 · 10 Comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

When testing a model with Trainer.test, metrics are not logged to Comet if the model was previously trained using Trainer.fit. Metrics logged during training are sent correctly.

Code sample

    comet_logger = CometLogger()
    trainer = Trainer(logger=comet_logger)
    model = get_model()

    trainer.fit(model) # Metrics are logged to Comet
    trainer.test(model) # No metrics are logged to Comet

Expected behavior

Test metrics should also be logged to Comet.

Environment

- PyTorch version: 1.3.0
Is debug build: No
CUDA used to build PyTorch: 10.1.243

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.168
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 418.67
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.1

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] pytorch-lightning==0.6.0
[pip3] torch==1.3.0
[pip3] torchvision==0.4.1
[conda] Could not collect

Additional context

I believe the issue is caused by logger.finalize("success") being called at the end of the training routine. This in turn calls experiment.end() inside the logger, and the Comet Experiment object does not expect to send any more information after that.
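
Roughly, that chain looks like this (a paraphrase of the behavior just described, not the exact 0.6.0 source):

    class CometLogger(LightningLoggerBase):
        ...
        def finalize(self, status):
            # Invoked by the Trainer when fit() finishes; this ends the
            # underlying Comet Experiment, so any metrics logged afterwards
            # (e.g. from trainer.test) never reach the Comet servers.
            self.experiment.end()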

An alternative is to create another Trainer object with another logger, but then the metrics are logged to a different Comet experiment than the original one. This can be solved using the ExistingExperiment object from the Comet SDK, but that approach seems a little hacky, and CometLogger currently doesn't support this kind of experiment.
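
For context, resuming a finished run with the plain Comet SDK would look roughly like this (the API key and metric values are placeholders):

    from comet_ml import Experiment, ExistingExperiment

    exp = Experiment(api_key="...")  # original run
    key = exp.get_key()              # remember the experiment key
    exp.end()                        # experiment is now closed

    # Later: reattach to the same Comet experiment and keep logging
    resumed = ExistingExperiment(api_key="...", previous_experiment=key)
    resumed.log_metric("test_loss", 0.123)
    resumed.end()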

Labels: bug / fix

All 10 comments

Did you find a solution?
Mind submitting a PR?
@fdelrio89

I did solve the issue, but in a kind of hacky way. It's not that elegant, but it works for me, and I haven't had time to think of a better solution.

I solved it by getting the experiment key and creating another logger and trainer with it.

    comet_logger = CometLogger()
    trainer = Trainer(logger=comet_logger)
    model = get_model()

    trainer.fit(model)  # finalize() ends the Comet experiment here

    # Re-create the logger pointing at the same Comet experiment
    experiment_key = comet_logger.experiment.get_key()
    comet_logger = CometLogger(experiment_key=experiment_key)
    trainer = Trainer(logger=comet_logger)

    trainer.test(model)  # test metrics now reach the original experiment

For this to work, I had to modify the CometLogger class to accept an experiment_key argument and create a CometExistingExperiment from the Comet SDK when this parameter is present.

    from comet_ml import Experiment as CometExperiment
    from comet_ml import ExistingExperiment as CometExistingExperiment

    class CometLogger(LightningLoggerBase):
        ...

        @property
        def experiment(self):
            ...

            if self.mode == "online":
                if self.experiment_key is None:
                    # No key given: start a brand-new Comet experiment
                    self._experiment = CometExperiment(
                        api_key=self.api_key,
                        workspace=self.workspace,
                        project_name=self.project_name,
                        **self._kwargs
                    )
                else:
                    # A key was given: reattach to the existing experiment
                    self._experiment = CometExistingExperiment(
                        api_key=self.api_key,
                        workspace=self.workspace,
                        project_name=self.project_name,
                        previous_experiment=self.experiment_key,
                        **self._kwargs
                    )
            else:
                ...

            return self._experiment

I can happily do the PR if this solution is acceptable for you guys, but I think a better solution can be achieved; I haven't had the time to think about it. @williamFalcon

@williamFalcon Any progress on this Issue? I am facing the same problem.

@fdelrio89 Since the logger object is available for the lifetime of the trainer, maybe you can refactor to store the experiment_key directly in the logger object itself, instead of having to re-instantiate the logger.
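
A rough sketch of that idea (attribute names such as _experiment_key are hypothetical; an actual implementation may differ):

    class CometLogger(LightningLoggerBase):
        ...

        @property
        def experiment(self):
            if self._experiment is not None:
                return self._experiment

            if self._experiment_key is None:
                # First access: create a fresh experiment and store its key
                self._experiment = CometExperiment(
                    api_key=self.api_key,
                    workspace=self.workspace,
                    project_name=self.project_name,
                    **self._kwargs
                )
                self._experiment_key = self._experiment.get_key()
            else:
                # The previous experiment was ended by finalize(); resume it
                self._experiment = CometExistingExperiment(
                    api_key=self.api_key,
                    previous_experiment=self._experiment_key,
                    **self._kwargs
                )
            return self._experiment

        def finalize(self, status):
            self.experiment.end()
            self._experiment = None  # next access resumes via the stored key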

@xssChauhan good idea, I just submitted a PR (https://github.com/PyTorchLightning/pytorch-lightning/pull/892) considering this. Thanks!

I assume that it was fixed by #892.
If you have some other problems, feel free to reopen or create a new issue... :robot:

Actually I'm still facing the problem.

@dvirginz are you using the latest master? Could you provide a minimal example?

You are right, sorry.
After building from source it works.

I should probably open a new issue, but it happens with the Weights & Biases logger too. I haven't had the time to delve deep into it yet.
