Hi,
I am having trouble reproducing the speech recognition results. With the default settings, the model stagnates at 25% train accuracy. By switching to a different optimizer, increasing the batch size, and tuning the learning rate, I was able to reach 8% WER, but that is still far from the claimed 5%, which is supposedly reachable without any tuning.
Could you please provide additional info about your configuration (the model and number of GPUs, the total batch size), or even better: logs and/or model checkpoints?
Thank you.
@okhonko
Hi,
I'm seeing similar results on 1 GPU with a different dataset. Could you share the parameters you used to improve the results?
Thank you
Hi, I was having similar issues but was able to do better with the default settings on one GPU by simulating a larger batch size with --update-freq 16
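For anyone unfamiliar with the flag: --update-freq N implements gradient accumulation, i.e. gradients from N small batches are summed before a single optimizer step, so one step behaves roughly like one step on a batch N times larger. A toy pure-Python sketch of the idea (not fairseq code; real trainers also normalize by token count):

```python
def train_with_update_freq(batch_grads, update_freq, lr=0.1):
    """Accumulate gradients over `update_freq` batches before each step.

    `batch_grads` is a list of per-batch gradients (toy scalars here).
    Returns the final weight and the number of optimizer steps taken.
    """
    weight = 0.0
    accumulated = 0.0
    steps = 0
    for i, grad in enumerate(batch_grads, start=1):
        accumulated += grad
        if i % update_freq == 0:
            # One optimizer step per `update_freq` batches, equivalent
            # to a single step on the concatenated large batch.
            weight -= lr * accumulated
            accumulated = 0.0
            steps += 1
    return weight, steps

# 16 small batches with update_freq=16 -> exactly one optimizer step
w, steps = train_with_update_freq([1.0] * 16, update_freq=16)
print(steps)  # 1
```

This is why the flag can stand in for multi-GPU training: on memory-limited hardware you trade wall-clock time for the larger effective batch.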
@alexbie98 I actually used this parameter when training on 1 GPU, and it didn't help. Can you elaborate on "do better"? Did you replicate the paper's WER?
@carlosep93 My parameters were: --optimizer adam --lr 5e-4 --fp16 --memory-efficient-fp16 --warmup-updates 2500 --update-freq 4
I also changed the batching logic to pack as much data onto each GPU as possible, resulting in an average batch size of 670 across all 8 GPUs. Only after that did it start training properly.
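The actual batching change isn't shown here, but for illustration, a greedy token-budget packing scheme (a hypothetical helper, not the real fairseq batcher, which also accounts for padding) might look like:

```python
def pack_batches(utterance_lengths, max_tokens):
    """Greedily pack utterances into batches so that each batch
    holds as many utterances as fit under `max_tokens` total tokens.

    Returns a list of batches, each a list of utterance indices.
    """
    batches, current, current_tokens = [], [], 0
    for idx, length in enumerate(utterance_lengths):
        if current and current_tokens + length > max_tokens:
            batches.append(current)  # budget exceeded: flush batch
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += length
    if current:
        batches.append(current)      # flush the last partial batch
    return batches

# Ten utterances of 100 tokens each, with a 300-token budget per batch
print(pack_batches([100] * 10, max_tokens=300))
# -> [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Packing to the budget keeps GPU utilization high and makes the per-step batch size much larger than naive fixed-count batching.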
Right now it's at 96% train accuracy / 91.7% valid accuracy after training for 5 days (epoch 31). I have not yet matched the reported WER; I'm getting 9.9 with the current checkpoint. The loss/accuracy plateaus for a while before dropping quite low.
Wow, that looks nice! What batch size do you have? Also, could you share the accuracy plot?
https://i.imgur.com/dKadcXq.png
The effective batch size is 80k. My training command is the same as the one in the repo with --update-freq 16
Thanks for providing the plot! Are you sure about 80k? I think the whole LibriSpeech train set has around 200k utterances, which would mean only about 3 batches per epoch in your case.
Sorry, 80k tokens*. I'm using the default command's --max-tokens 5000 with --update-freq 16; the average number of sentences per batch is around 60.
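To make the arithmetic explicit: the effective batch size in tokens is max-tokens × update-freq (× the number of GPUs when training on more than one), which is where the 80k figure comes from on a single GPU:

```python
max_tokens = 5000   # per-GPU token budget per batch (--max-tokens)
update_freq = 16    # gradient accumulation steps (--update-freq)
num_gpus = 1

effective_tokens = max_tokens * update_freq * num_gpus
print(effective_tokens)  # 80000 tokens per optimizer step

# With ~60 sentences per 5000-token batch (as observed above),
# each optimizer step covers roughly 60 * 16 = 960 utterances.
sentences_per_step = 60 * update_freq
print(sentences_per_step)  # 960
```

So at ~960 utterances per step, a ~200k-utterance train set gives a couple of hundred optimizer steps per epoch, not 3 — the earlier estimate assumed 80k sentences rather than 80k tokens.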