Detectron2: Weights & Biases support

Created on 31 Jan 2020 · 5 Comments · Source: facebookresearch/detectron2

🚀 Feature

Add support for the wandb library.

Motivation

W&B is one of the most advanced and feature-rich experiment tracking tools. It would be great to have support in detectron2.
Great example: https://app.wandb.ai/syllogismos/keras-retinanet

Pitch

I implemented it in my own project as a wandb_writer.py module:

import wandb
from detectron2.utils.events import (
    EventWriter,
    get_event_storage,
)


class WAndBWriter(EventWriter):
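    """
    Write all scalars and visualization images to wandb.
    Assumes wandb.init() has been called before training starts.
    """

    def __init__(self, window_size: int = 20):
        self._window_size = window_size

    def write(self):
        storage = get_event_storage()
        # Log the smoothed value of every scalar at the current iteration.
        for k, v in storage.latest_with_smoothing_hint(self._window_size).items():
            wandb.log({k: v}, step=storage.iter)

        if len(storage.vis_data) >= 1:
            for img_name, img, _ in storage.vis_data:
                # Assumes img is a (C, H, W) numpy array; wandb.Image expects (H, W, C).
                # Logged at the current iteration because wandb ignores steps
                # lower than the last one logged.
                wandb.log({img_name: wandb.Image(img.transpose(1, 2, 0))}, step=storage.iter)
            storage.clear_images()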

    def close(self):
        pass

and added this class as one of the writers here:
https://github.com/facebookresearch/detectron2/blob/master/tools/plain_train_net.py#L136

If someone finds this feature useful as I do, I would be happy to send a PR with W&B writer and tests for it.

cc: @cvphelps

enhancement

truskovskiyk

❤2 👍2

All 5 comments

I guess syncing TensorBoard in wandb could do the same thing, thanks to the TensorboardXWriter. But it doesn't track the AP/AR during training as in your example.

kenhktsui on 31 Jan 2020

I suppose wandb will be enabled by just adding import wandb; wandb.init(sync_tensorboard=True) to your code. It doesn't look like we need to do anything for it.

AP curves will be available if you set TEST.EVAL_PERIOD.

ppwwyyxx on 31 Jan 2020

👍5

I have followed the suggestion by @ppwwyyxx and added import wandb; wandb.init(sync_tensorboard=True) to my Colab training session. However, I get the following error repeatedly:
wandb: ERROR Unable to log event [Errno 95] Operation not supported: '/content/gdrive/.shortcut-targets-by-id/1zWeNeHzrg6jXlkkpPMjpgtrJArsDu-Ri/DL-Panoramic-Images/Data/output/events.out.tfevents.1598531893.f62daae8a623.103.2' -> '/content/gdrive/.shortcut-targets-by-id/1zWeNeHzrg6jXlkkpPMjpgtrJArsDu-Ri/DL-Panoramic-Images/Data/wandb/run-20200827_123503-21i52v19/events.out.tfevents.1598531893.f62daae8a623.103.2'

In wandb, the run is registered and I can see hardware-related plots, but nothing else is logged. Could you help me figure out what's going wrong?

Csanad98 on 27 Aug 2020

There's a similar issue on Windows because it's not possible to create the symlinks due to permissions. Running in admin mode fixes that.

markuzo on 8 Sep 2020
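
The symlink limitation described in the two comments above can be checked before starting a run. The following is a minimal sketch, assuming the Errno 95 error comes from a failed symlink creation in the target directory; the helper name supports_symlinks is hypothetical and not part of wandb or detectron2.

```python
import os
import tempfile


def supports_symlinks(directory: str) -> bool:
    """Return True if a symlink can be created inside `directory`."""
    # Create a throwaway target file, then try to link to it.
    fd, target = tempfile.mkstemp(dir=directory)
    os.close(fd)
    link = target + ".symlink-probe"
    try:
        os.symlink(target, link)
    except (OSError, NotImplementedError):
        # For example Errno 95 on some mounted drives, or missing
        # privileges on Windows.
        return False
    else:
        os.remove(link)
        return True
    finally:
        # Always clean up the probe target file.
        os.remove(target)


if __name__ == "__main__":
    # Probe the system temp directory; pass the wandb or output directory instead.
    print(supports_symlinks(tempfile.gettempdir()))
```

If this returns False for the directory where wandb writes its run files, the "Unable to log event" error above is a plausible outcome.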

Just to check, is there a simple way to also reflect the relevant configuration bits to wandb? (and possibly have wandb also override these for things like sweeps?)

(Will search and then play with this briefly, and edit this comment if I have an update)

EDIT 1: I see this demo:
https://github.com/wandb/artifacts-examples/tree/509330bce03d9b3c0ff1c45bd4bd54c7e4865e01/detectron2
But it's a bit more complex than just relying on tensorboard hooks, so I may pass on this for now. It also doesn't seem to permit overriding the cfg.

EDIT 2: Just did my own thing for now. Effectively, looks like this:

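import wandb
import yaml
from detectron2.config import CfgNode


def cfg_node_to_dict(cfg):
    """
    We need this because ``yacs`` over-encapsulates this logic:
    https://github.com/rbgirshick/yacs/blob/32d5e4ac/yacs/config.py#L188-L204
    """
    # Round-trip through YAML to turn the CfgNode into plain dicts and lists.
    return yaml.safe_load(cfg.dump())


cfg = make_my_default_cfg_node()
...
# Send the flattened config to wandb, then read it back so that any values
# wandb overrides (e.g. in a sweep) are merged into the cfg.
raw_cfg = cfg_node_to_dict(cfg)
raw_cfg_flat = flatten_config(raw_cfg)
wandb.init(..., config=raw_cfg_flat, sync_tensorboard=True)
raw_cfg = unflatten_config(dict(wandb.config.user_items()))
cfg.merge_from_other_cfg(CfgNode(raw_cfg))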

The flatten / unflatten logic here could follow the same approach as yacs, but I already had helpers written elsewhere:
https://github.com/wandb/client/issues/982#issuecomment-652766322
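
For readers who do not want to follow the link, here is one way the two helpers could look. This is only a sketch under assumptions, not the code from the linked comment: it assumes nested dicts with string keys, a "." separator, and no key that itself contains the separator. The config values in the example are made up.

```python
from typing import Any, Dict


def flatten_config(cfg: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Turn {"A": {"B": 1}} into {"A.B": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts and prefix their keys.
            for sub_key, sub_value in flatten_config(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            # Leaves (including lists and empty dicts) are stored as-is.
            flat[key] = value
    return flat


def unflatten_config(flat: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Inverse of flatten_config: turn {"A.B": 1} into {"A": {"B": 1}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(sep)
        node = nested
        for part in parents:
            # Create intermediate dicts on the way down.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


if __name__ == "__main__":
    raw = {"SOLVER": {"BASE_LR": 0.001}, "TEST": {"EVAL_PERIOD": 500}}
    flat = flatten_config(raw)
    print(flat)
    print(unflatten_config(flat) == raw)
```

Flat, dotted keys are convenient here because each config entry then shows up as a single named parameter in wandb.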