The display of epoch's number mismatches between the progress bar and the checkpoint indicator. I wonder this mismatch could confuse users.
I think that to change checkpoint and metrics.csv causes a serious problem.
So progress bar should be changed in my opinion.
What do you think about it?
Epoch 32: 100%|ββββββββββ| 331/331 [00:05<00:00, 88.73batch/s, batch_idx=17, loss=1.148, train_batch_loss=1.02, v_num=0, val_loss=1.05]
AINFO:root:
Epoch 00031: val_loss reached 1.04545 (best 1.04545), saving model to /dummy/version_0/checkpoints/_ckpt_epoch_31.ckpt as top 1
{'loss': 1.022357702255249, 'train_batch_loss': 1.022357702255249, 'val_loss': 1.0454469919204712}
Epoch 33: 5%|β | 18/331 [00:00<00:05, 61.06batch/s, batch_idx=17, loss=1.073, train_batch_loss=1.31, v_num=0, val_loss=1.05]
pip install git+https://github.com/williamFalcon/pytorch-lightning.git@master --upgradeIndeed, the inconsistency of zero-based/one-based epoch is very confusing.
I think we should use zero-based only.
@matthew-z i agree it should be zero based. want to submit a PR?
@onkyo14taro @matthew-z
@williamFalcon I'll try this weekend.
@matthew-z @neggert @williamFalcon @Borda
I found a 'one epoch'-based API in GradientAccumulationScheduler while I was fixing the code.
If we force the API and display to be zero-based, it will break backward compatibility.
However, I think that mixing 0 and 1 will confuse users than changing the API, it should be zero based.
I think it would be better we warn with FutureWarning in version 0.6.x, and then change the API and display to be zero-based in version 0.7.x.
What do you think about it?
I can't quite wrap my head around this. If you make it zero based, then when the progress bar shows
Epoch 1: 50/100%
it will no longer mean that the first epoch is in progress, but actually it is the second epoch 50% completed? Is this really what you want? Why should it be this way?
@awaelchli
I agree with you, but I think zero-based would be better than one-based.
Zero-based implementations are:
GradientAccumulationScheduler (_console display, API_)EarlyStopping (_console display_)One-based implementations are:
ModelCheckpoint (_console display_)I think what of most influensive in these items are "metrics.csv" and "_ckpt_epoch_{0-based epoch number}.ckpt."
That's because we usually use the result file rather than console display.
So I think that zero-based implementations have less confusion than one-based when breaking backward compatibility in order to unify representation of epoch numbers.
agreed. letβs do zero-based
@williamFalcon Is it ok that FutureWarning is thrown when Progress bar, GradientAccumulationScheduler or EarlyStopping are used in version 0.6.x , and then the API and display renew in version 0.7.0?
we are using DeprecatedWarning :]