Pytorch-lightning: Tqdm progress bar error

Created on 21 Jan 2020 · 7 Comments · Source: PyTorchLightning/pytorch-lightning

When running one epoch with a train and a val dataloader, as soon as validation starts the progress bar creates a new line for each iteration. I see this bug in PyCharm as well as in Kaggle kernels. Below is a typical example: 80% of the epoch runs smoothly, then as soon as validation starts, a new line is printed for each tqdm iteration.

Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Epoch 1: 80%|████████ | 1216/1520 [09:01<02:08, 2.36batch/s, batch_nb=1215, gpu=0, loss=0.649, train_loss=0.616, v_nb=0]
Validating: 0%| | 0/304 [00:00<?, ?batch/s]
Epoch 1: 80%|████████ | 1217/1520 [09:01<01:44, 2.90batch/s, batch_nb=1215, gpu=0, loss=0.649, train_loss=0.616, v_nb=0]
Epoch 1: 80%|████████ | 1218/1520 [09:02<01:26, 3.48batch/s, batch_nb=1215, gpu=0, loss=0.649, train_loss=0.616, v_nb=0]
Epoch 1: 80%|████████ | 1219/1520 [09:02<01:14, 4.05batch/s, batch_nb=1215, gpu=0, loss=0.649, train_loss=0.616, v_nb=0]
Epoch 1: 80%|████████ | 1220/1520 [09:02<01:05, 4.58batch/s, batch_nb=1215, gpu=0, loss=0.649, train_loss=0.616, v_nb=0]
Epoch 1: 80%|████████ | 1221/1520 [09:02<00:59, 5.04batch/s, batch_nb=1215, gpu=0, loss=0.649, train_loss=0.616, v_nb=0]
Epoch 1: 80%|████████ | 1222/1520 [09:02<00:54, 5.42batch/s, batch_nb=1215, gpu=0, loss=0.649, train_loss=0.616, v_nb=0]
Epoch 1: 80%|████████ | 1223/1520 [09:02<00:51, 5.72batch/s, batch_nb=1215, gpu=0, loss=0.649, train_loss=0.616, v_nb=0]

Environment

PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti

Nvidia driver version: 418.87.00
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.16.4
[pip] pytorch-lightning==0.5.3.2
[pip] pytorchcv==0.0.50
[pip] torch==1.2.0
[pip] torchaudio==0.3.0
[pip] torched==0.11
[pip] torchfile==0.1.0
[pip] torchvision==0.4.0
[conda] pytorch-lightning 0.5.3.2 pypi_0 pypi
[conda] pytorchcv 0.0.50 pypi_0 pypi
[conda] torch 1.2.0 pypi_0 pypi
[conda] torchaudio 0.3.0 pypi_0 pypi
[conda] torched 0.11 pypi_0 pypi
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchvision 0.4.0 pypi_0 pypi

Additional context

Labels: bug / fix, duplicate, help wanted

All 7 comments

Indeed, this looks quite annoying in such an otherwise amazing CLI user experience!
It happens to me only on Colab though; everything works as expected locally :/

I'm not very familiar with the code base, but it seems that the tqdm progress bar for test/val is created with a slightly different set of parameters than the one for train.

A quick search lands on a similar issue on Stack Overflow that suggests initializing tqdm with position=0 and leave=True.

I do not exactly understand how that is supposed to fix the issue, but since the tqdm docs say leave defaults to True, I suspect it has something to do with the initial position value.
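
For illustration, a minimal sketch of that workaround (not pytorch-lightning's actual code): `position` pins each bar to a fixed terminal row, so a nested validation bar does not push the epoch bar onto a new line.

```python
from tqdm import tqdm

# Outer bar stays on row 0 and persists; inner bars reuse row 1 and
# are cleared when their loop finishes.
for epoch in tqdm(range(3), desc="Epoch", position=0, leave=True):
    for batch in tqdm(range(100), desc="Train", position=1, leave=False):
        pass  # training step would go here
    for batch in tqdm(range(30), desc="Validating", position=1, leave=False):
        pass  # validation step would go here
```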

I think this is a tqdm issue, since I've seen it across a variety of code that uses tqdm. I've mostly seen it when my terminal isn't wide enough to fit the progress bar plus all of the printed quantities.

I am afraid that we cannot do much about tqdm. I have experienced the bar printing on a new line even in other projects, and it typically happens when another process moves the stderr cursor or prints anything else to stdout.
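
To illustrate that failure mode: tqdm redraws its line in place with a carriage return, so any other write that moves the cursor strands the bar, and the next refresh starts on a new line. A minimal sketch using tqdm's interleaving-safe `tqdm.write()`:

```python
from tqdm import tqdm

for i in tqdm(range(100), desc="Epoch 1"):
    if i % 25 == 0:
        # print(f"step {i}")     # a plain print moves the cursor and strands the bar
        tqdm.write(f"step {i}")  # clears the bar, prints the message, redraws the bar
```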

I am not sure this is tqdm, as I don't use it (I use my own progress bar). The circumstances are slightly different: things work fine, and then suddenly the progress bar (train, val, or test) creates a new line on each call. My current theory is that this is an Ubuntu terminal problem, but I have yet to prove it.

btw - happy to donate my progress bar code. It simply draws the moving bar and takes a lead-in and a lead-out string to print before and after. Unicode terminals only, and only tested on Ubuntu.
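
For reference, a hypothetical sketch of such a bar (illustrative only, not the commenter's actual code): it redraws a single line with a carriage return and wraps the iterable as a generator.

```python
import sys

# Redraw one line with "\r"; uses Unicode block characters, so it
# needs a Unicode-capable terminal. leadin/leadout are printed before
# and after the bar, as the comment above describes.
def progress(iterable, total, leadin="", leadout="", width=40):
    for i, item in enumerate(iterable, 1):
        filled = width * i // total
        bar = "█" * filled + "·" * (width - filled)
        sys.stdout.write(f"\r{leadin}|{bar}| {i}/{total}")
        sys.stdout.flush()
        yield item
    sys.stdout.write(f" {leadout}\n")

# usage:
# for batch in progress(range(304), total=304, leadin="Validating ", leadout="done"):
#     pass
```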

btw, probably similar to #330

#765 also seems relevant.

I will close this in favour of #765, so please let's continue the discussion there... :robot:
