In the following dictionary from the training log output:
{"epoch": 27, "update": 26.267, "loss": "8.206", "nll_loss": "7.049", "ppl": "132.47", "wps": "1195.4", "ups": "1.62", "wpb": "738.1", "bsz": "46.4", "num_updates": "33700", "lr": "0.00017226", "gnorm": "1.833", "clip": "1", "train_wall": "61", "wall": "30542"}
I assume the following from looking at the code and other issues:
bsz = batch size
gnorm = L2 norm of the gradients
clip = gradient clipping threshold
train_wall = time taken for one training step
wall = total time spent training, validating, and saving checkpoints (so far)
wps = ?
ups = ?
wpb = ?
Thanks!
Most helpful comment
wps - words per second (token throughput of training)
ups - updates per second (optimizer steps per second)
wpb - words per batch (average tokens per batch)
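
The three throughput fields are all derived from the same two running counters (words processed and optimizer updates) plus elapsed wall time. A minimal sketch of how such a meter could be kept, using illustrative names rather than fairseq's actual internals:

```python
import time

class ThroughputMeter:
    """Hypothetical sketch of a fairseq-style throughput meter.

    Tracks total words and update count so that wps, ups, and wpb
    can be derived on demand. Not fairseq's actual API.
    """

    def __init__(self):
        self.words = 0    # total target tokens processed so far
        self.updates = 0  # total optimizer steps so far
        self.start = time.perf_counter()

    def step(self, words_in_batch):
        """Record one training update over a batch of `words_in_batch` tokens."""
        self.words += words_in_batch
        self.updates += 1

    def stats(self):
        elapsed = time.perf_counter() - self.start
        return {
            "wps": self.words / elapsed,       # words per second
            "ups": self.updates / elapsed,     # updates per second
            "wpb": self.words / self.updates,  # average words per batch
        }

meter = ThroughputMeter()
for n in (700, 750, 760):
    meter.step(n)
print(round(meter.stats()["wpb"], 1))  # average words per batch over 3 updates
```

Note that `wpb` is an average over batches, which is why the logged value (e.g. `738.1`) is fractional even though each individual batch contains a whole number of words.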