Ignite: Event description TERMINATE_SINGLE_EPOCH not matching its actual behavior

Created on 28 Sep 2020 · 5 Comments · Source: pytorch/ignite

🐛 Bug description

From the documentation, it seems that the TERMINATE_SINGLE_EPOCH event is only fired if engine.terminate_epoch is called.
However, I discovered that the event is also fired if I call engine.terminate inside an epoch, as can be seen in the code linked below.

Event description:
https://github.com/pytorch/ignite/blob/51d3e3e54ee042b5ef1b73a489aabe4b75968411/ignite/engine/events.py#L161-L162

Current implementation:
https://github.com/pytorch/ignite/blob/51d3e3e54ee042b5ef1b73a489aabe4b75968411/ignite/engine/engine.py#L817-L821

IMO, the implementation is fine and only the documentation should be updated.

I think the other option, not firing TERMINATE_SINGLE_EPOCH when engine.terminate is called, would make handling the different types of termination more complicated.
For example, assume you have to do some post-processing after each epoch, and it doesn't matter whether the epoch ends with engine.terminate, engine.terminate_epoch, or a StopIteration from the data loader.
You would then have to attach the same processing function three times: to EPOCH_COMPLETED, TERMINATE_SINGLE_EPOCH and TERMINATE.

At the moment, one has to attach the post-processing function to both EPOCH_COMPLETED and TERMINATE_SINGLE_EPOCH to also catch terminations triggered by calling a termination method.
I didn't expect EPOCH_COMPLETED not to be fired when I call any termination method. Is this documented somewhere? I haven't found it.

Thanks in advance :)

EDIT: Sorry, I missed that the signature of the function called on EPOCH_COMPLETED is different from the one for TERMINATE_SINGLE_EPOCH.
EDIT2: Fixed event name TERMINATE_SINGLE_EPOCH

EDIT3: I think, now I got it:

If engine.terminate is called, TERMINATE_SINGLE_EPOCH and TERMINATE are called but not EPOCH_COMPLETED.
If engine.terminate_epoch is called, TERMINATE_SINGLE_EPOCH and EPOCH_COMPLETED are called but not TERMINATE.
If the epoch completes without termination, EPOCH_COMPLETED is called, but neither TERMINATE_SINGLE_EPOCH nor TERMINATE.
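The three cases above can be reproduced with a toy run loop. This is a minimal sketch, not ignite's actual `Engine.run`: the class `MiniEngine`, the helper `fired_events`, the flag names and the plain-string event names are all hypothetical, chosen only to mirror the behaviour reported here for Ignite 0.4.2. Both termination methods leave the iteration loop through the same branch (which fires TERMINATE_SINGLE_EPOCH), and only a fatal stop replaces EPOCH_COMPLETED with TERMINATE.

```python
from typing import Callable, List, Optional


class MiniEngine:
    """Toy model of the termination flow described above (hypothetical, not ignite's Engine)."""

    def __init__(self, max_epochs: int, epoch_length: int) -> None:
        self.max_epochs = max_epochs
        self.epoch_length = epoch_length
        self.should_terminate = False
        self.should_terminate_single_epoch = False
        self.fired: List[str] = []

    def terminate(self) -> None:
        self.should_terminate = True

    def terminate_epoch(self) -> None:
        self.should_terminate_single_epoch = True

    def run(self, process: Callable[["MiniEngine", int], None]) -> List[str]:
        for _epoch in range(self.max_epochs):
            for iteration in range(1, self.epoch_length + 1):
                process(self, iteration)
                if self.should_terminate or self.should_terminate_single_epoch:
                    # Both termination methods leave the iteration loop via this branch.
                    self.fired.append("TERMINATE_SINGLE_EPOCH")
                    self.should_terminate_single_epoch = False
                    break
            if self.should_terminate:
                # Fatal stop: TERMINATE is fired instead of EPOCH_COMPLETED.
                self.fired.append("TERMINATE")
                break
            self.fired.append("EPOCH_COMPLETED")
        return self.fired


def fired_events(mode: Optional[str]) -> List[str]:
    """Run one 3-iteration epoch, optionally terminating at iteration 2."""

    def process(engine: MiniEngine, iteration: int) -> None:
        if iteration == 2 and mode == "terminate":
            engine.terminate()
        elif iteration == 2 and mode == "terminate_epoch":
            engine.terminate_epoch()

    return MiniEngine(max_epochs=1, epoch_length=3).run(process)


for mode in (None, "terminate_epoch", "terminate"):
    print(mode, fired_events(mode))
```

Intersecting the three resulting event lists gives an empty set, which is exactly the "no common event" problem.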

There is no common event that is fired in all three cases and that could be used if all cases should be treated equally.
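One possible workaround, sketched here as an assumption rather than an existing ignite feature: attach the same function to all three events (e.g. with three `add_event_handler` calls) and guard it so it runs at most once per epoch, keyed on the epoch counter. The decorator `once_per_epoch` and the stub engine are hypothetical; the sketch assumes the epoch counter (`engine.state.epoch` in ignite) does not change between the events fired for the same epoch.

```python
import functools
from types import SimpleNamespace
from typing import Callable


def once_per_epoch(fn: Callable) -> Callable:
    """Wrap `fn` so further calls within the same epoch are ignored (hypothetical helper)."""
    last_epoch = None

    @functools.wraps(fn)
    def wrapper(engine):
        nonlocal last_epoch
        if engine.state.epoch == last_epoch:
            return  # already handled by another event in this epoch
        last_epoch = engine.state.epoch
        fn(engine)

    return wrapper


processed = []


@once_per_epoch
def post_process(engine):
    processed.append(engine.state.epoch)


# Stub engine: only `state.epoch` is needed for this illustration.
engine = SimpleNamespace(state=SimpleNamespace(epoch=1))
# Epoch 1 ends with engine.terminate_epoch(): TERMINATE_SINGLE_EPOCH, then EPOCH_COMPLETED.
post_process(engine)
post_process(engine)
# Epoch 2 ends with engine.terminate(): TERMINATE_SINGLE_EPOCH, then TERMINATE.
engine.state.epoch = 2
post_process(engine)
post_process(engine)
print(processed)
```

The handler body runs once for each epoch even though it is triggered twice per epoch; the trade-off is a bit of hidden state inside the wrapper.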

Environment

PyTorch Version (e.g., 1.4): 1.5.1
Ignite Version (e.g., 0.3.0): 0.4.2
OS (e.g., Linux): Windows
How you installed Ignite (conda, pip, source): pip
Python version: 3.7.7
Any other relevant information: -

Labels: Hacktoberfest, docs, enhancement, good first issue, help wanted


alxlampe



@alxlampe thanks for the issue! Yes, I agree that the docs should clearly state the behaviour.

If engine.terminate is called, TERMINATE_SINGLE_EPOCH and TERMINATE are called but not EPOCH_COMPLETED.

From a user's point of view, do you think we should not trigger TERMINATE_SINGLE_EPOCH in this case?

vfdev-5 on 28 Sep 2020

@vfdev-5 I don't fully understand the purpose of engine.terminate and engine.terminate_epoch. Are they called when something goes wrong, or when some threshold/metric is reached? Can you link an example?

My intuition about engine.terminate and engine.terminate_epoch was that they just give control to break out of the iteration loop (engine.terminate_epoch) and out of the epoch loop (engine.terminate).
I would have expected something like this:
| method\event | EPOCH_COMPLETED | TERMINATE_SINGLE_EPOCH | TERMINATE |
| ---------------- | ------------------------ | ----------------------------- | -------------- |
| no termination | x | - | - |
| engine.terminate_epoch | x | x | - |
| engine.terminate | x | - | x |

With this logic, if I want to compute something at the end of each epoch, I only have to add an event handler for EPOCH_COMPLETED, regardless of which termination occurred.

The current implementation is:
| method\event | EPOCH_COMPLETED | TERMINATE_SINGLE_EPOCH | TERMINATE |
| ---------------- | ------------------------ | ----------------------------- | -------------- |
| no termination | x | - | - |
| engine.terminate_epoch | x | x | - |
| engine.terminate | - | x | x |

For my use case, with the current implementation I would have to attach my function to two events, EPOCH_COMPLETED and TERMINATE, to get it executed exactly once.

@vfdev-5, I think your proposal would look like this:
| method\event | EPOCH_COMPLETED | TERMINATE_SINGLE_EPOCH | TERMINATE |
| ---------------- | ------------------------ | ----------------------------- | -------------- |
| no termination | x | - | - |
| engine.terminate_epoch | x | x | - |
| engine.terminate | - | - | x |

For my use case, this implementation would not let me catch all cases by simply attaching my function to all three events while still executing it exactly once, because after engine.terminate_epoch both EPOCH_COMPLETED and TERMINATE_SINGLE_EPOCH are fired.
So this case is like the one above: I would have to attach my function to EPOCH_COMPLETED and TERMINATE.
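The "exactly once" argument can be checked mechanically. A minimal sketch: the three tables are transcribed into Python sets (the names `DESIGNS` and `exactly_once_subsets` are purely illustrative), and every combination of events is tested to see whether a handler attached to it would run exactly once in each termination case.

```python
from itertools import combinations

EC, TSE, T = "EPOCH_COMPLETED", "TERMINATE_SINGLE_EPOCH", "TERMINATE"

# Events fired per termination method, transcribed from the three tables above.
DESIGNS = {
    "expected": {"none": {EC}, "terminate_epoch": {EC, TSE}, "terminate": {EC, T}},
    "current": {"none": {EC}, "terminate_epoch": {EC, TSE}, "terminate": {TSE, T}},
    "proposal": {"none": {EC}, "terminate_epoch": {EC, TSE}, "terminate": {T}},
}


def exactly_once_subsets(table):
    """Return every tuple of events whose handler would run exactly once in each case."""
    events = [EC, TSE, T]
    result = []
    for size in range(1, len(events) + 1):
        for subset in combinations(events, size):
            # A handler attached to `subset` runs once per fired event in that subset.
            if all(len(fired & set(subset)) == 1 for fired in table.values()):
                result.append(subset)
    return result


for name, table in DESIGNS.items():
    print(name, exactly_once_subsets(table))
```

Under these tables, only the expected design lets a single EPOCH_COMPLETED handler cover every case; the current implementation and the proposal both require EPOCH_COMPLETED plus TERMINATE.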

Looking at the different logics in the tables, the one I would have expected (the first table) seems the most reasonable to me. But as I said, I don't know the use cases where the current implementation is advantageous. It would be nice if you could provide one :)

alxlampe on 28 Sep 2020


@alxlampe thanks for the tables, they make the behaviour much easier to see, and maybe we should add such tables to our docs!

Basic use cases of engine.terminate and engine.terminate_epoch are the following:

engine.terminate is called when a NaN is encountered and training cannot proceed without manually fixing things (see the TerminateOnNan handler). This is a sort of fatal engine stop.
engine.terminate_epoch is used in RL, where an epoch is an episode, i.e. the steps played in the environment until it fails (see our RL examples). This is a friendlier way to stop the current epoch and move on to the next one.
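The two use cases can be sketched as plain handler functions; in ignite such handlers would typically be attached to `Events.ITERATION_COMPLETED`. This is a simplified illustration, not the code of ignite's TerminateOnNan: it only checks a float output for NaN, and the `done` attribute on the state and the `StubEngine` class are hypothetical stand-ins for what a user's process function and a real engine would provide.

```python
import math
from types import SimpleNamespace


def terminate_on_nan(engine):
    """Fatal stop: end the whole run if the last output is NaN (simplified illustration)."""
    output = engine.state.output
    if isinstance(output, float) and math.isnan(output):
        engine.terminate()


def end_episode_if_done(engine):
    """Friendly stop: end the current epoch (= episode) when the environment fails.

    `engine.state.done` is a hypothetical flag set by the user's process function.
    """
    if engine.state.done:
        engine.terminate_epoch()


class StubEngine:
    """Hypothetical stand-in that only records which termination method was called."""

    def __init__(self, output, done):
        self.state = SimpleNamespace(output=output, done=done)
        self.calls = []

    def terminate(self):
        self.calls.append("terminate")

    def terminate_epoch(self):
        self.calls.append("terminate_epoch")


nan_engine = StubEngine(output=float("nan"), done=False)
rl_engine = StubEngine(output=0.5, done=True)
for stub in (nan_engine, rl_engine):
    terminate_on_nan(stub)
    end_episode_if_done(stub)
print(nan_engine.calls, rl_engine.calls)
```

The NaN check ends the whole run, while the episode check only ends the current epoch, which is the distinction between the two termination methods described above.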

Events like TERMINATE and TERMINATE_SINGLE_EPOCH are new and were introduced for better support of various actions inside the engine.

Looking at the tables, I think both types of implementations can be argued for engine.terminate...

vfdev-5 on 28 Sep 2020

Hi, I'm interested in picking this up. Should the docs only be updated with "If engine.terminate is called, TERMINATE_SINGLE_EPOCH and TERMINATE are called but not EPOCH_COMPLETED.", or should the tables be included as well?