Pytorch-lightning: Validation loss in progress bar printed line by line

Created on 8 Oct 2019 · 14 Comments · Source: PyTorchLightning/pytorch-lightning

Common bugs: checked.

Describe the bug
When adding a "progress_bar" key to the validation_end output, the progress bar doesn't behave as expected and prints one line per iteration, e.g.:

80%|8| 3014/3750 [00:23<00:01, 516.63it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
82%|8| 3066/3750 [00:23<00:01, 517.40it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
83%|8| 3118/3750 [00:23<00:01, 516.65it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
85%|8| 3170/3750 [00:23<00:01, 517.42it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
86%|8| 3222/3750 [00:23<00:01, 517.59it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
87%|8| 3274/3750 [00:23<00:00, 518.00it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
89%|8| 3326/3750 [00:23<00:00, 518.16it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
90%|9| 3378/3750 [00:23<00:00, 518.45it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
91%|9| 3430/3750 [00:23<00:00, 518.36it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
93%|9| 3482/3750 [00:23<00:00, 518.02it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
94%|9| 3534/3750 [00:24<00:00, 517.26it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
96%|9| 3586/3750 [00:24<00:00, 517.68it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
97%|9| 3638/3750 [00:24<00:00, 518.08it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_ 
98%|9| 3690/3750 [00:24<00:00, 518.18it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
100%|9| 3742/3750 [00:24<00:00, 518.23it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_
100%|#| 3750/3750 [00:24<00:00, 518.23it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_loss=1.16]
save callback...
100%|#| 3750/3750 [00:24<00:00, 152.16it/s, batch_nb=1874, epoch=14, gpu=0, loss=1.070, training_loss=0.792, val_loss=1.16]

To Reproduce
Steps to reproduce the behavior:

  1. Take the minimal MNIST example from the docs (https://williamfalcon.github.io/pytorch-lightning/LightningModule/RequiredTrainerInterface/)
     with some code to run it:
    # assumes CoolModel is defined as in the linked docs
    import pytorch_lightning as pl

    if __name__ == "__main__":
        model = CoolModel()

        # most basic trainer, uses good defaults
        default_save_path = '/tmp/checkpoints/'
        trainer = pl.Trainer(default_save_path=default_save_path,
                             show_progress_bar=True)
        trainer.fit(model)
  2. Change the validation_end method to:
    def validation_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        tqdm_dict = {'val_loss': avg_loss}

        return {
                'progress_bar': tqdm_dict,
                'log': {'val_loss': avg_loss},
        }
  3. Change training_step to:
    def training_step(self, batch, batch_nb):
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        output = {
            'loss': loss,  # required
            'progress_bar': {'training_loss': loss},  # optional (MUST ALL BE TENSORS)
        }
        return output
  4. Run the script and observe the broken progress bar output at validation time.

Note that both steps 2 and 3 are necessary to reproduce the issue; each on its own runs as expected.
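
For convenience, here is a self-contained sketch combining the steps above. It assumes the 0.5.x-era API used in this report (validation_end, batch_nb, @pl.data_loader, show_progress_bar), and the CoolModel internals are reconstructed from the linked docs rather than copied from this issue, so treat it as illustrative:

    import torch
    import torch.nn.functional as F
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import transforms
    from torchvision.datasets import MNIST
    import pytorch_lightning as pl

    class CoolModel(pl.LightningModule):
        def __init__(self):
            super(CoolModel, self).__init__()
            self.l1 = nn.Linear(28 * 28, 10)

        def forward(self, x):
            return torch.relu(self.l1(x.view(x.size(0), -1)))

        def training_step(self, batch, batch_nb):
            x, y = batch
            loss = F.cross_entropy(self.forward(x), y)
            # step 3: extra progress_bar metrics from training_step
            return {'loss': loss, 'progress_bar': {'training_loss': loss}}

        def validation_step(self, batch, batch_nb):
            x, y = batch
            return {'val_loss': F.cross_entropy(self.forward(x), y)}

        def validation_end(self, outputs):
            avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
            # step 2: progress_bar metrics from validation_end
            return {'progress_bar': {'val_loss': avg_loss},
                    'log': {'val_loss': avg_loss}}

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=0.02)

        @pl.data_loader
        def train_dataloader(self):
            return DataLoader(MNIST('/tmp/mnist', train=True, download=True,
                                    transform=transforms.ToTensor()), batch_size=32)

        @pl.data_loader
        def val_dataloader(self):
            return DataLoader(MNIST('/tmp/mnist', train=False, download=True,
                                    transform=transforms.ToTensor()), batch_size=32)

    if __name__ == "__main__":
        trainer = pl.Trainer(default_save_path='/tmp/checkpoints/',
                             show_progress_bar=True)
        trainer.fit(CoolModel())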

Expected behavior
A progress bar on a single line.

Desktop:

  • OS: Linux
  • Versions:
    pytorch-lightning==0.5.1.3
    torch==1.2.0

Additional context
I actually ran into this issue after trying to add EarlyStopping, which asked for val_loss; it turned out val_loss had to be exposed via the progress_bar metrics, which was quite unexpected to me (I would have expected it in "log" or as a direct key).
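
For future readers, a minimal sketch of the early-stopping setup in question, assuming the 0.5.x-era import path and Trainer argument (reconstructed, not copied from this report):

    import pytorch_lightning as pl
    from pytorch_lightning.callbacks import EarlyStopping

    # EarlyStopping monitors 'val_loss'; at the time, that key was looked up
    # among the metrics returned via the progress_bar/log dictionaries.
    early_stop = EarlyStopping(monitor='val_loss', patience=3, mode='min')
    trainer = pl.Trainer(early_stop_callback=early_stop,
                         show_progress_bar=True)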

Labels: bug / fix, enhancement, help wanted

Most helpful comment

One option is to use from tqdm.auto import tqdm; this way it will use IPython widgets when in a notebook.
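
As a concrete illustration of that suggestion (a minimal sketch, not taken from the thread):

    import time
    from tqdm.auto import tqdm  # widget bar in Jupyter, plain text bar in a terminal

    for _ in tqdm(range(100), desc='demo'):
        time.sleep(0.01)  # stand-in for real work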

All 14 comments

I just checked whether it was down to the tqdm version by upgrading from tqdm==4.35.0 to tqdm==4.36.1, to no avail.

@annemariet thanks for finding this. Are you using this in a Jupyter notebook? That might be the issue.

But on a usability note, we'll move the early stopping to use keys not in progress_bar or log. Good point!

Hi, thanks for your quick reply. I'm running this from the command line.

@annemariet this can happen if you resize your terminal window during training. This is a tqdm bug, not a PL bug.

Re the early stopping, I just sent a fix yesterday: any keys NOT in "progress_bar" or "log" will be used for all callbacks. This is on master now.

I can reopen this if you are still having issues
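
To illustrate the fix described above, a hedged sketch of the post-fix behaviour on master, assuming top-level keys in the validation_end result are now forwarded to callbacks:

    def validation_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        return {
            'val_loss': avg_loss,                    # top-level: visible to callbacks such as EarlyStopping
            'progress_bar': {'val_loss': avg_loss},  # shown in the progress bar
            'log': {'val_loss': avg_loss},           # sent to the logger
        }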

This actually still happens in Spyder, but works fine in a terminal.

pytorch-lightning==0.5.3.2
spyder==3.3.6
spyder-kernels==0.5.2
tqdm==4.38.0
ipython==7.9.0

Sorry, although I searched for it, I had not seen that it was already discussed here. I think it's still an important open issue.

https://github.com/PyTorchLightning/pytorch-lightning/issues/721

I am using a Jupyter notebook and this happens there. Is there a fix for it in Jupyter? I like to develop there before moving to the command line.

@sudarshan85 it is an issue with tqdm, not Lightning; we cannot do much about it. Try upgrading...

@Borda nevertheless, we could think of a solution to disable the validation progress bar individually, or otherwise give some flexibility.

I'm curious whether something like fastprogress could be included in Lightning. There is also tqdm_notebook. I wonder whether this could be passed into Lightning for use as the progress bar.

One option is to use from tqdm.auto import tqdm; this way it will use IPython widgets when in a notebook.

Is it just that, importing another tqdm class? Would you consider making a PR?

Sure, it's PR #752. There will be some edge cases where someone is in a notebook environment but doesn't have their widgets set up. In that case they will get a warning message about what to do, and their progress bar won't show.

So we could potentially add a Trainer parameter to override this, which may help those people.

FYI: this is how tqdm does the notebook detection
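
That detection boils down to checking for a running IPython kernel; roughly (a paraphrased sketch, not tqdm's exact source):

    try:
        from IPython import get_ipython
        ipy = get_ipython()
        # a notebook kernel registers IPKernelApp in its config
        if ipy is None or 'IPKernelApp' not in ipy.config:
            raise ImportError("not running in a notebook kernel")
        from tqdm.notebook import tqdm  # widget-based bar
    except ImportError:
        from tqdm import tqdm  # plain text bar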

I will close this in favour of #765, so please let's continue the discussion there... :robot:
