I use pytorch lightning + wandb logger. I do not know how to extract history of training (training losses, validation losses...) from pytorch lightning or from the logger.
In the docs, the logger has a property (experiment) returning a wandb run object
https://pytorch-lightning.readthedocs.io/en/latest/loggers.html?highlight=wandb#weights-and-biases
this Run object looks like the run object described here
https://docs.wandb.com/library/reference/wandb_api#run
but it is missing some elements (no member called state) and some functionality. For example, the history member is not callable, so I cannot obtain information about the training history.
I don't understand. The logger.experiment is what wandb.init returns, the Run object. How are you trying to access these attributes?
I have something like this to declare the logger and the trainer (WandbLogger is imported from pytorch_lightning.loggers)
wandb_logger = WandbLogger(...)
trainer = pl.Trainer(
    ...,
    logger=wandb_logger,
)
Then, after training, I try to get information from the Run object, which is wandb_logger.experiment, but I fail to obtain what I want (history):
>>> wandb_logger.experiment
<wandb.sdk.wandb_run.Run object at 0x7f08576174f0>
>>> wandb_logger.experiment.history
<wandb.sdk.wandb_history.History object at 0x7f0857617580>
>>> wandb_logger.experiment.history()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'History' object is not callable
Alternatively, I can also access the same Run object by importing wandb and accessing wandb.run:
>>> import wandb
>>> wandb.run
<wandb.sdk.wandb_run.Run object at 0x7f08576174f0>
Please note that some public attributes are missing (created_at, history_keys, state...; compare with https://docs.wandb.com/library/reference/wandb_api#run):
>>> [m for m in dir(wandb_logger.experiment) if not m.startswith('_')]
['config', 'dir', 'entity', 'finish', 'get_url', 'history', 'id', 'join', 'log', 'log_artifact', 'name', 'notes', 'path', 'project_name', 'restore', 'resumed', 'save', 'start_time', 'starting_step', 'stderr_redirector', 'stdout_redirector', 'step', 'summary', 'tags', 'url', 'use_artifact', 'watch']
This is happening on multi gpu, yes?
No, I do not think so. I have a single GPU and I pass the gpus=1 option to my trainer.
Hello, I'm experiencing something similar, and I am using multi-GPU.
run = trainer.logger.experiment
print(f"Ending run: {str(run.id)}")
prints
Ending run: <bound method DummyExperiment.nop of <pytorch_lightning.loggers.base.DummyExperiment object at 0x7fb784f186d0>>
The full callback code is below
class WandbArtifactCallback(pl.Callback):

    def on_train_end(self, trainer, pl_module):
        run = trainer.logger.experiment
        print(f"Ending run: {str(run.id)}")
        artifact = wandb.Artifact(f"{str(run.id)}_model", type="model")
        for path, val_loss in trainer.checkpoint_callback.best_k_models.items():
            print(f"Adding artifact: {path}")
            artifact.add_file(path)
        run.log_artifact(artifact)

    def on_keyboard_interrupt(self, trainer, pl_module):
        self.on_train_end(trainer, pl_module)
For multi-GPU (@kyoungrok0517's case) I have an answer. In multi-GPU training, the logger only runs in the process attached to GPU 0. This is to avoid problems with file I/O, bottlenecks, and multiprocessing in general. When you access self.logger.experiment, we return the real object only in process 0; in all other processes you get a dummy experiment. This way your code does not break if you do things like self.logger.experiment.log_images(...): on processes with rank > 0 it simply becomes a no-op.
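This also explains why run.id printed a bound method above. As a minimal, hypothetical sketch (illustrative only, not the actual pytorch_lightning implementation), a dummy experiment can be built so that every attribute access resolves to a no-op method:

```python
class DummyExperiment:
    """Illustrative no-op stand-in for the real wandb run object."""

    def nop(self, *args, **kwargs):
        # accepts anything, does nothing
        pass

    def __getattr__(self, name):
        # called for any attribute not found normally, so
        # run.log, run.id, run.anything all resolve to nop
        return self.nop


run = DummyExperiment()
run.log({"loss": 0.1})  # silently does nothing, as on rank > 0
print(str(run.id))      # a bound nop method, matching the output above
```

Because attribute access returns the no-op method itself, str(run.id) yields something like "<bound method DummyExperiment.nop of ...>" instead of a run id.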
In your case, you want to create an artifact, so I suggest you do this on process 0 only, like this:
class WandbArtifactCallback(pl.Callback):

    def on_train_end(self, trainer, pl_module):
        if trainer.global_rank > 0:  # <------------------ add this
            return
        # ... your custom logger code

    def on_keyboard_interrupt(self, trainer, pl_module):
        self.on_train_end(trainer, pl_module)
For your case @fjhheras, not sure yet what's happening. I will try to reproduce.
Now I figured why the outcome was inconsistent. Thanks!
This isn't a pytorch lightning issue, it is just a quirk of the wandb api. There are two different run objects: wandb.wandb_run.Run is returned by wandb.init and handles the logging, while wandb.apis.public.Run is for reading data after the run is complete.
Here is a repro of your problem without pytorch lightning.
import wandb

api = wandb.Api()
run = wandb.init()
for i in range(4):
    run.log({"i": i})
run_path = run.path

print("type(run): ", type(run))
print("run.history: ", type(run.history))

read_access_run = api.run(run_path)
print("type(read_access_run): ", type(read_access_run))
print("type(read_access_run.history()): ", type(read_access_run.history()))
This outputs:
type(run): <class 'wandb.wandb_run.Run'>
run.history: <class 'wandb.history.History'>
type(read_access_run): <class 'wandb.apis.public.Run'>
type(read_access_run.history()): <class 'pandas.core.frame.DataFrame'>
All you need to do is take the path from the experiment on the logger and pass it to api.run, and that will create the run object you are looking for.
run_path = wandb_logger.experiment.path
read_access_run = api.run(run_path)
@Tim-Chard Thanks for the explanation. I don't follow completely but would like to confirm if we need to do anything in our wandb wrapper.
@kyoungrok0517 @fjhheras did this suggestion from Tim work for you?
@awaelchli Essentially there are two Run classes in the wandb api that look similar (they both have history, for example) but are for different things. The issue was caused by trying to use one in place of the other.
So, to confirm, no changes are required in the wrapper to address this issue.
@awaelchli Yes, @Tim-Chard's explanation makes sense and what he proposes works. Thank you!
Unfortunately, @Tim-Chard's proposal only works when I am online (wandb enabled). As it is purely a wandb issue now, I opened a question there (https://github.com/wandb/client/issues/1308).
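If the public API is unavailable (e.g. offline), one workaround is to keep your own record of logged metrics alongside the logger. This is a hedged, framework-light sketch; MetricHistory is a hypothetical helper, not part of wandb or pytorch_lightning:

```python
class MetricHistory:
    """Hypothetical helper that records logged metrics locally, so the
    training history is available even without the wandb public API."""

    def __init__(self):
        self.rows = []

    def log(self, metrics, step=None):
        # store a copy so later mutation of `metrics` cannot corrupt history
        row = dict(metrics)
        row["step"] = step if step is not None else len(self.rows)
        self.rows.append(row)


history = MetricHistory()
for step, loss in enumerate([0.9, 0.5, 0.3]):
    history.log({"train_loss": loss}, step=step)

print(history.rows[-1])  # {'train_loss': 0.3, 'step': 2}
```

You would call history.log(...) wherever you already call self.log(...) or logger.experiment.log(...), and read history.rows back after training.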