Describe the bug
When adding a "progress_bar" key to the validation_end output, the progress bar doesn't behave as expected and prints one line per iteration, eg:
80%|8| 3014/3750 [00:23<00:01, 516.63it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
82%|8| 3066/3750 [00:23<00:01, 517.40it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
83%|8| 3118/3750 [00:23<00:01, 516.65it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
85%|8| 3170/3750 [00:23<00:01, 517.42it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
86%|8| 3222/3750 [00:23<00:01, 517.59it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
87%|8| 3274/3750 [00:23<00:00, 518.00it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
89%|8| 3326/3750 [00:23<00:00, 518.16it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
90%|9| 3378/3750 [00:23<00:00, 518.45it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
91%|9| 3430/3750 [00:23<00:00, 518.36it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
93%|9| 3482/3750 [00:23<00:00, 518.02it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
94%|9| 3534/3750 [00:24<00:00, 517.26it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
96%|9| 3586/3750 [00:24<00:00, 517.68it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
97%|9| 3638/3750 [00:24<00:00, 518.08it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
98%|9| 3690/3750 [00:24<00:00, 518.18it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
100%|9| 3742/3750 [00:24<00:00, 518.23it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
100%|#| 3750/3750 [00:24<00:00, 518.23it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_loss=1.16]
save callback...
100%|#| 3750/3750 [00:24<00:00, 152.16it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_loss=1.16]
To Reproduce
Steps to reproduce the behavior:
if __name__ == "__main__":
model = CoolModel()
# most basic trainer, uses good defaults
default_save_path = '/tmp/checkpoints/'
trainer = pl.Trainer(default_save_path=default_save_path,
show_progress_bar=True)
trainer.fit(model)
def validation_end(self, outputs):
avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
tqdm_dict = {'val_loss': avg_loss}
return {
'progress_bar': tqdm_dict,
'log': {'val_loss': avg_loss},
}
def training_step(self, batch, batch_nb):
x, y = batch
y_hat = self.forward(x)
loss = F.cross_entropy(y_hat, y)
output = {
'loss': loss, # required
'progress_bar': {'training_loss': loss}, # optional (MUST ALL BE TENSORS)
}
return output
Note that both steps 2 and 3 are necessary to reproduce the issue; each one on its own runs as expected.
Expected behavior
A progress bar on a single line.
Additional context
Actually, I ran into this issue after trying to add EarlyStopping, which asked for val_loss, which I found out had to be added via the progress_bar metrics... That was quite unexpected to me (I would have expected it under "log" or as a direct key?).
I checked whether the tqdm version was the cause by upgrading from tqdm==4.35.0 to tqdm==4.36.1, to no avail.
@annemariet thanks for finding this. Are you using this in a Jupyter notebook? That might be the issue.
But on a usability note, we'll move the early stopping to use keys not in progress_bar or log. Good point!
Hi, thanks for your quick reply. I'm running this from command line.
@annemariet this can happen if you resize your terminal window during training. This is a tqdm bug, not a PL bug.
Re the early stopping, I just sent a fix yesterday where any of the keys NOT in "progress_bar" or "log" will be used for all callbacks. This is on master now.
I can reopen this if you are still having issues.
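To make that concrete, here is a minimal sketch of the pattern the fix enables, assuming the behavior described above (top-level keys outside "progress_bar" and "log" are made available to callbacks). Import paths and Trainer argument names may differ between versions, CoolModel is the hypothetical model from the repro above, and only the relevant pieces are shown.

import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

# inside your LightningModule (e.g. CoolModel):
def validation_end(self, outputs):
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    return {
        'val_loss': avg_loss,                    # top-level key -> available to callbacks
        'progress_bar': {'val_loss': avg_loss},  # shown in the tqdm bar
        'log': {'val_loss': avg_loss},           # sent to the logger
    }

# EarlyStopping can then monitor the top-level key directly:
early_stop = EarlyStopping(monitor='val_loss', patience=3)
trainer = pl.Trainer(early_stop_callback=early_stop, show_progress_bar=True)
trainer.fit(CoolModel())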
This actually still happens in Spyder, but works fine in terminal.
pytorch-lightning==0.5.3.2
spyder==3.3.6
spyder-kernels==0.5.2
tqdm==4.38.0
ipython==7.9.0
Sorry, although I searched for it, I had not seen that it was already discussed here. I think it's still an important open issue.
https://github.com/PyTorchLightning/pytorch-lightning/issues/721
I am using a Jupyter notebook and this happens there. Is there a fix for it in Jupyter notebook? I like to develop there before moving to the command line.
@sudarshan85 it is an issue with tqdm, not Lightning; we cannot do much about it. Try upgrading...
@Borda nevertheless, we could think of a way to disable the validation progress bar individually, or otherwise give some flexibility.
I'm curious whether something like fastprogress could be included in Lightning. There is also tqdm_notebook. I wonder whether this could be passed into Lightning for use as the progress bar.
One option is to use from tqdm.auto import tqdm; this way it will use IPython widgets when in a notebook.
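For context, a minimal standalone sketch of what tqdm.auto does (plain tqdm code, not Lightning code): it picks the widget-based bar when running in a notebook with ipywidgets available, and the regular terminal bar otherwise.

from tqdm.auto import tqdm  # selects the notebook widget bar or the terminal bar automatically
import time

for _ in tqdm(range(100), desc="demo"):
    time.sleep(0.01)  # stand-in for real work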
Is it just that, importing another tqdm class? Would you consider making a PR?
Sure, it's PR #752. There will be some edge cases where someone is in a notebook environment but doesn't have their widgets set up. In that case they will get a warning message about what to do, and their progress bar won't show.
So we could potentially add a Trainer parameter to override this, which may help those people.
FYI: this is how tqdm does the notebook detection.
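For reference, a rough approximation of that detection (not tqdm's actual source; the real code also handles Spyder, missing ipywidgets, and other shells):

def in_notebook():
    # Heuristic: a Jupyter kernel runs IPython's ZMQInteractiveShell,
    # while IPython in a terminal runs TerminalInteractiveShell.
    try:
        from IPython import get_ipython
        shell = get_ipython()
        return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"
    except ImportError:
        return False

if in_notebook():
    from tqdm.notebook import tqdm  # widget-based bar
else:
    from tqdm import tqdm           # plain terminal bar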
I will close this in favour of #765, so please let's continue the discussion there... :robot: