Pytorch-lightning: Error using Hydra

Created on 29 May 2020 · 10 comments · Source: PyTorchLightning/pytorch-lightning

Hi all,

Thanks a lot for the awesome library. I'm trying to use Hydra with pytorch-lightning, on the latest release of pytorch-lightning.
However, I get the following error after my training step:

...
  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 241, in on_validation_end
    self._do_check_save(filepath, current, epoch)
  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 275, in _do_check_save
    self._save_model(filepath)
  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 142, in _save_model
    self.save_function(filepath)
  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_io.py", line 260, in save_checkpoint
    checkpoint = self.dump_checkpoint()
  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_io.py", line 353, in dump_checkpoint
    raise ValueError(
ValueError: ('The acceptable hparams type is dict or argparse.Namespace,', ' not DictConfig')
Exception ignored in: <function tqdm.__del__ at 0x7f1f9e379a60>

It says that DictConfig is not supported for saving, but I saw in a pull request that this problem had already been fixed.
Can you point me in the right direction on how to fix this?

Code

import hydra
import pytorch_lightning as pl


@hydra.main("config/config.yaml")
def main(cfg=None):
    wrap_tb_logger()  # user-defined helper (not shown)
    # Instantiate the LightningModule described in the config, passing the full config.
    model = hydra.utils.instantiate(cfg.model, cfg)

    trainer = pl.Trainer(
        gpus=list(cfg.gpus),
        max_epochs=cfg.epochs,
        train_percent_check=0.4
    )

    trainer.fit(model)


if __name__ == "__main__":
    main()
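The checkpoint code at the time accepted only dict or argparse.Namespace for hparams, so the usual workaround was to convert the DictConfig to a plain container before storing it on the model. A minimal sketch, assuming the LightningModule assigns whatever it receives to self.hparams (the helper name below is illustrative, not from the original report):

from omegaconf import OmegaConf

def config_to_hparams(cfg):
    # OmegaConf.to_container turns a DictConfig into nested plain dicts/lists,
    # which is one of the hparams types the checkpoint callback accepts.
    return OmegaConf.to_container(cfg, resolve=True)

The model would then do self.hparams = config_to_hparams(cfg) instead of holding on to the DictConfig directly.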

What's your environment?

  • OS: Linux
  • Packaging: pip
  • Version: 0.7.1
Labels: bug / fix, question

All 10 comments

Hi! Thanks for your contribution, great first issue!

@Borda can you add this fix?

@williamFalcon @Borda I worked around this problem by converting hparams to a flat dictionary after initializing my model.

import omegaconf


def _to_dot_dict(cfg):
    # Flatten a nested DictConfig into a plain dict with dot-delimited keys,
    # keeping only scalar leaf values (str, int, float, bool).
    res = {}
    for k, v in cfg.items():
        if isinstance(v, omegaconf.DictConfig):
            res.update(
                {k + "." + subk: subv for subk, subv in _to_dot_dict(v).items()}
            )
        elif isinstance(v, (str, int, float, bool)):
            res[k] = v

    return res

I admit this is only a workaround while waiting for the fix to be released!
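For context, a hypothetical way to apply that helper after instantiating the model (the hparams assignment below is an assumption about where it was hooked in, not quoted from the thread):

# Replace the DictConfig hparams with the flattened plain dict so that
# ModelCheckpoint can serialize it.
model = hydra.utils.instantiate(cfg.model, cfg)
model.hparams = _to_dot_dict(cfg)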

@pvnieo check master?
I pushed a fix last night.

@williamFalcon Yes, thank you, I just tested with the latest dev version and it's working!
However, I get this warning, and I don't know where it's coming from:

UserWarning: you called `module.module_arguments` without calling self.auto_collect_arguments()

we can just disable it haha. submit a PR?

Sorry, submit a PR for what?

disabling the warning
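While waiting for that, a possible stopgap is to filter the warning in the training script with Python's standard warnings module (the message pattern below is derived from the warning text shown above):

import warnings

# Silence only the auto_collect_arguments UserWarning; other warnings
# still surface normally.
warnings.filterwarnings(
    "ignore",
    message=r".*module_arguments.*",
    category=UserWarning,
)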

OK.
I just discovered that when I kill my program using Ctrl+C, the process isn't killed and it continues training (I don't know if it's related to the latest dev version, but I didn't have this problem before; previously it said it was trying "to kill the process gracefully" or something like that).
I found in a Stack Overflow question that this may be related to using multiple threads, where the interrupt isn't handled well.
Is this a known issue?

this is fixed in #2029
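For anyone on an older release without that fix, a blunt stopgap is to register a SIGINT handler that exits immediately instead of waiting for the graceful-shutdown path. A minimal sketch, assuming it runs at the top of the training script:

import os
import signal

def _hard_exit(signum, frame):
    # Bypass graceful teardown entirely; only useful as a last resort when
    # worker threads keep the process alive after Ctrl+C.
    os._exit(1)

signal.signal(signal.SIGINT, _hard_exit)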
