Ignite: Improve State and Engine docs

Created on 31 Aug 2020  路  12Comments  路  Source: pytorch/ignite

馃殌 Feature

Currently, if we execute twice trainer.run we can fall into the following situation:

# ...
trainer.run(train_loader, max_epochs=5)

# reset model weights etc and rerun
trainer.run(train_loader, max_epochs=2)
> ValueError: Argument max_epochs should be larger than the start epoch defined in the state: 2 vs 5

But if we would like to restart the training from the begining, we have to reset trainer.state as

trainer.run(train_loader, max_epochs=5)

# reset model weights etc and rerun
trainer.state.max_epochs = None
trainer.run(train_loader, max_epochs=2)

Let's improve two things:
1) let's provide State.restart() method that just sets state.max_epochs = None
2) Add a note into Engine.run docstring saying if we would like to restart the training from zero, we have to either call trainer.state.restart() or trainer.state.max_epochs = None.
3) Improve the message "ValueError: Argument max_epochs should be larger than the start epoch defined in the state: 2 vs 5" by adding something like "Please, call state.restart() before calling engine.run() if would like to restart from zero."

For Hacktoberfest contributors, feel free to ask questions for details if any and say that you would like to tackle the issue.
Please, take a look at CONTRIBUTING guide.

Hacktoberfest enhancement good first issue help wanted

Most helpful comment

Just a remark, having a restart() method lets user call it when he wants, even if it's not the good time. Why do not have such feature in the trainer ? No way to misuse it. What do you think ?
@vfdev-5 @fco-dv

All 12 comments

@vfdev-5 I could give some help on this one!

@fco-dv thanks, go ahead !

Just a remark, having a restart() method lets user call it when he wants, even if it's not the good time. Why do not have such feature in the trainer ? No way to misuse it. What do you think ?
@vfdev-5 @fco-dv

@sdesrozis I agree that exposing such method can lead to a potential misuse and bugs. Adding Engine.restart method can have the same problems.
I let you discuss what could be an appropriate way to handle the actual problem.

@sdesrozis I agree, there's maybe a safer way to do that. When you said " even if it's not the good time " , can you figure out a use case ?

For instance

@trainer.on(Events.ITERATION_COMPLETED)
def _():
    trainer.state.restart()

By writing that, I realized that it could be a very nice feature to be able to restart or invalidate epochs if something wrong happened during learning. For example
epoch 10 started
=> save weights and others parameters
=> do something (maybe change lr, or what you want)
run several epochs
=> if something wrong, restart to epoch 10 and rerun with new parametrization

That is out of the scope but if the api is too restrictive (as I thought), no way to do that. So let's code the State.restart() and see.

Sorry for the noise...

@sdesrozis I think calling trainer.state.restart() during the training can be dangerous and lead to unhandled/buggy situations. And it is a bit different from

@trainer.on(Events.ITERATION_COMPLETED)
def _():
    trainer.state.max_epochs = None

I think we shouldn't provide restart-like methods...

If we would like to devalidate some epoch etc. We can simply set the number as required (even it is very dynamic as approach):

@trainer.on(Events.EPOCH_COMPLETED)
def _():
    if some_condition():
        revert_n_epochs()
        trainer.state.epoch -= n

I think we shouldn't provide restart-like methods...

Ok that was my first felling too...

Maybe something like

trainer.run(train_loader, max_epochs=2, restart=True)

@vfdev-5 @fco-dv Thoughts ?

@sdesrozis looks better for sure

I'm not fan of introducing new arg like restart ...

@vfdev-5 I understand.
@fco-dv Do you have a better idea ? Otherwise, let's implement trainer.state.restart() and we will see.

@sdesrozis for the moment I don't

Was this page helpful?
0 / 5 - 0 ratings