Pytorch-lightning: Test metrics not logging to Comet after training

Created on 28 Jan 2020 · 10 Comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

When testing a model with Trainer.test, metrics are not logged to Comet if the model was previously trained using Trainer.fit. Metrics logged during training are sent correctly.

Code sample

    comet_logger = CometLogger()
    trainer = Trainer(logger=comet_logger)
    model = get_model()

    trainer.fit(model) # Metrics are logged to Comet
    trainer.test(model) # No metrics are logged to Comet

Expected behavior

Test metrics should also be logged to Comet.

Environment

- PyTorch version: 1.3.0
Is debug build: No
CUDA used to build PyTorch: 10.1.243

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.168
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 418.67
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.1

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] pytorch-lightning==0.6.0
[pip3] torch==1.3.0
[pip3] torchvision==0.4.1
[conda] Could not collect

Additional context

I believe the issue is caused by logger.finalize("success") being called at the end of the training routine. This in turn calls experiment.end() inside the logger, and the Comet Experiment object does not expect to send any more information after that.
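
Roughly, that chain looks like this (a paraphrase of the behavior just described, not the exact 0.6.0 source):

    class CometLogger(LightningLoggerBase):
        ...
        def finalize(self, status):
            # Invoked by the Trainer when fit() finishes; this ends the
            # underlying Comet Experiment, so any metrics logged afterwards
            # (e.g. from trainer.test) never reach the Comet servers.
            self.experiment.end()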

An alternative is to create another Trainer object with another logger, but then the metrics are logged to a different Comet experiment than the original one. This can be solved using the ExistingExperiment object from the Comet SDK, but that approach seems a little hacky, and CometLogger currently doesn't support this kind of experiment.
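
For context, resuming a finished run with the plain Comet SDK would look roughly like this (the API key and metric values are placeholders):

    from comet_ml import Experiment, ExistingExperiment

    exp = Experiment(api_key="...")  # original run
    key = exp.get_key()              # remember the experiment key
    exp.end()                        # experiment is now closed

    # Later: reattach to the same Comet experiment and keep logging
    resumed = ExistingExperiment(api_key="...", previous_experiment=key)
    resumed.log_metric("test_loss", 0.123)
    resumed.end()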

Labels: bug / fix

All 10 comments

Did you find a solution?
Mind submitting a PR?
@fdelrio89

I did solve the issue, but in a kind of hacky way. It's not that elegant, but it works for me, and I haven't had time to think of a better solution.

I solved it by getting the experiment key and creating another logger and trainer with it.

    comet_logger = CometLogger()
    trainer = Trainer(logger=comet_logger)
    model = get_model()

    trainer.fit(model)  # finalize() ends the Comet experiment here

    # Re-create the logger pointing at the same Comet experiment
    experiment_key = comet_logger.experiment.get_key()
    comet_logger = CometLogger(experiment_key=experiment_key)
    trainer = Trainer(logger=comet_logger)

    trainer.test(model)  # test metrics now reach the original experiment

For this to work, I had to modify the CometLogger class to accept an experiment_key argument and create a CometExistingExperiment from the Comet SDK when this parameter is present.

    from comet_ml import Experiment as CometExperiment
    from comet_ml import ExistingExperiment as CometExistingExperiment

    class CometLogger(LightningLoggerBase):
        ...

        @property
        def experiment(self):
            ...

            if self.mode == "online":
                if self.experiment_key is None:
                    # No key given: start a brand-new Comet experiment
                    self._experiment = CometExperiment(
                        api_key=self.api_key,
                        workspace=self.workspace,
                        project_name=self.project_name,
                        **self._kwargs
                    )
                else:
                    # A key was given: reattach to the existing experiment
                    self._experiment = CometExistingExperiment(
                        api_key=self.api_key,
                        workspace=self.workspace,
                        project_name=self.project_name,
                        previous_experiment=self.experiment_key,
                        **self._kwargs
                    )
            else:
                ...

            return self._experiment

I can happily do the PR if this solution is acceptable for you guys, but I think a better solution can be achieved; I haven't had the time to think about it. @williamFalcon

@williamFalcon Any progress on this Issue? I am facing the same problem.

@fdelrio89 Since the logger object is available for the lifetime of the trainer, maybe you can refactor to store the experiment_key directly in the logger object itself, instead of having to re-instantiate the logger.
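
A rough sketch of that idea (attribute names such as _experiment_key are hypothetical; an actual implementation may differ):

    class CometLogger(LightningLoggerBase):
        ...

        @property
        def experiment(self):
            if self._experiment is not None:
                return self._experiment

            if self._experiment_key is None:
                # First access: create a fresh experiment and store its key
                self._experiment = CometExperiment(
                    api_key=self.api_key,
                    workspace=self.workspace,
                    project_name=self.project_name,
                    **self._kwargs
                )
                self._experiment_key = self._experiment.get_key()
            else:
                # The previous experiment was ended by finalize(); resume it
                self._experiment = CometExistingExperiment(
                    api_key=self.api_key,
                    previous_experiment=self._experiment_key,
                    **self._kwargs
                )
            return self._experiment

        def finalize(self, status):
            self.experiment.end()
            self._experiment = None  # next access resumes via the stored key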

@xssChauhan good idea, I just submitted a PR (https://github.com/PyTorchLightning/pytorch-lightning/pull/892) considering this. Thanks!

I assume that it was fixed by #892.
If you have some other problems, feel free to reopen or create a new issue... :robot:

Actually I'm still facing the problem.

@dvirginz are you using the latest master? Could you provide a minimal example?

You are right, sorry.
After building from source it works.

I should probably open a new issue, but it happens with the Weights & Biases logger too. I haven't had the time to delve deep into it yet.
