Adding support for wandb library
W&B is one of the most advanced and feature-rich experiment tracking tools. It would be great to have support in detectron2.
Great example: https://app.wandb.ai/syllogismos/keras-retinanet
I implemented it in my own project as a wandb_writer.py module:
import numpy as np
import wandb
from detectron2.utils.events import EventWriter, get_event_storage


class WAndBWriter(EventWriter):
    """
    Write all scalars (and any stored images) to W&B.
    """

    def __init__(self, window_size: int = 20):
        self._window_size = window_size

    def write(self):
        storage = get_event_storage()
        step = storage.iter
        for k, v in storage.latest_with_smoothing_hint(self._window_size).items():
            wandb.log({k: v}, step=step)

        if len(storage.vis_data) >= 1:
            for img_name, img, _ in storage.vis_data:
                # Images are stored as [C, H, W]; wandb.Image expects [H, W, C].
                # Logged at the current step because W&B steps must not decrease.
                hwc = np.asarray(img).transpose(1, 2, 0)
                wandb.log({img_name: wandb.Image(hwc)}, step=step)
            storage.clear_images()

    def close(self):
        pass
and then added this class as one of the writers here:
https://github.com/facebookresearch/detectron2/blob/master/tools/plain_train_net.py#L136
If someone finds this feature useful as I do, I would be happy to send a PR with W&B writer and tests for it.
cc: @cvphelps
I guess syncing TensorBoard to wandb could do the same thing, thanks to the TensorboardXWriter. But it doesn't track the AP/AR during training as in your example.
I suppose wandb will be enabled by just adding import wandb; wandb.init(sync_tensorboard=True) to your code. It doesn't look like we need to do anything for it.
AP curves will be available if you set TEST.EVAL_PERIOD.
I have followed the suggestion by @ppwwyyxx and added import wandb; wandb.init(sync_tensorboard=True) to my Colab training session. However, I get the following error repeatedly:
wandb: ERROR Unable to log event [Errno 95] Operation not supported: '/content/gdrive/.shortcut-targets-by-id/1zWeNeHzrg6jXlkkpPMjpgtrJArsDu-Ri/DL-Panoramic-Images/Data/output/events.out.tfevents.1598531893.f62daae8a623.103.2' -> '/content/gdrive/.shortcut-targets-by-id/1zWeNeHzrg6jXlkkpPMjpgtrJArsDu-Ri/DL-Panoramic-Images/Data/wandb/run-20200827_123503-21i52v19/events.out.tfevents.1598531893.f62daae8a623.103.2'
wandb registers the run and I can see the hardware-related plots, but nothing else is logged. Could you help me figure out what's going wrong?
There's a similar issue on Windows because it's not possible to create the symlinks due to permissions. Running in admin mode fixes that.
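If symlink creation really is what fails here (the Windows comment above suggests so, but that is an assumption for the Colab case), a quick probe can tell you whether a given folder supports symlinks before you point wandb at it. The following is a minimal sketch using only the standard library; `supports_symlinks` and `drive_dir` are hypothetical names, not part of wandb or detectron2.

```python
import os
import tempfile


def supports_symlinks(directory: str) -> bool:
    """Return True if a symlink can be created inside `directory`."""
    try:
        with tempfile.TemporaryDirectory(dir=directory) as tmp:
            target = os.path.join(tmp, "target.txt")
            link = os.path.join(tmp, "link.txt")
            with open(target, "w") as f:
                f.write("probe")
            os.symlink(target, link)
            return os.path.islink(link)
    except (OSError, NotImplementedError):
        # Covers a missing directory, missing privileges, and
        # filesystems that reject symlinks (e.g. Errno 95).
        return False


# Hypothetical path; replace with your own mounted folder.
drive_dir = "/content/gdrive/My Drive/wandb"

# Fall back to a local temp directory when the folder cannot hold symlinks.
run_dir = drive_dir if supports_symlinks(drive_dir) else tempfile.gettempdir()
print(f"wandb files would go to: {run_dir}")

# Then, assuming the `dir` argument of wandb.init suits your setup:
# wandb.init(sync_tensorboard=True, dir=run_dir)
```

Keeping the wandb run directory on a local filesystem may avoid the error, though I have not verified that on Colab with Google Drive.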
Just to check, is there a simple way to also reflect the relevant configuration bits to wandb? (and possibly have wandb also override these for things like sweeps?)
(Will search and then play with this briefly, and edit this comment if I have an update)
EDIT 1: I see this demo:
https://github.com/wandb/artifacts-examples/tree/509330bce03d9b3c0ff1c45bd4bd54c7e4865e01/detectron2
But it's a bit more complex than just relying on tensorboard hooks, so I may pass on this for now. It also doesn't seem to permit overriding the cfg.
EDIT 2: Just did my own thing for now. Effectively, looks like this:
import wandb
import yaml
from detectron2.config import CfgNode


def cfg_node_to_dict(cfg):
    """
    We need this because ``yacs`` over-encapsulates this logic:
    https://github.com/rbgirshick/yacs/blob/32d5e4ac/yacs/config.py#L188-L204
    """
    raw_cfg = yaml.safe_load(cfg.dump())
    return raw_cfg


cfg = make_my_default_cfg_node()
...
raw_cfg = cfg_node_to_dict(cfg)
raw_cfg_flat = flatten_config(raw_cfg)
wandb.init(..., config=raw_cfg_flat, sync_tensorboard=True)
# Read the config back so that values overridden by wandb (e.g. in a sweep) take effect.
raw_cfg = unflatten_config(dict(wandb.config.user_items()))
cfg.merge_from_other_cfg(CfgNode(raw_cfg))
The flatten / unflatten logic here could be the same as from yacs, but I already had stuff written elsewhere:
https://github.com/wandb/client/issues/982#issuecomment-652766322
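For readers who don't want to follow the link, here is a rough sketch of what such flatten / unflatten helpers could look like. This is not the code from the linked comment; it is a hypothetical minimal version that assumes a "." separator and that no config key itself contains a dot.

```python
from typing import Any, Dict

SEP = "."  # assumed separator; keys must not contain it


def flatten_config(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn {"SOLVER": {"BASE_LR": 0.01}} into {"SOLVER.BASE_LR": 0.01}."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}{SEP}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts; empty dicts stay as leaf values.
            flat.update(flatten_config(value, full_key))
        else:
            flat[full_key] = value
    return flat


def unflatten_config(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Inverse of flatten_config: rebuild the nested dict from dotted keys."""
    nested: Dict[str, Any] = {}
    for full_key, value in flat.items():
        *parents, leaf = full_key.split(SEP)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


example = {"SOLVER": {"BASE_LR": 0.01, "STEPS": [1, 2]}, "TEST": {"EVAL_PERIOD": 100}}
flat = flatten_config(example)
print(flat)
print(unflatten_config(flat) == example)
```

Flat keys like SOLVER.BASE_LR are convenient because each one shows up as a single hyperparameter in wandb, and the round trip gives back a nested dict that can be wrapped in a CfgNode.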