Hi,
I am having trouble reproducing the speech recognition results. With the default settings, the model stagnates at 25% train accuracy. By switching to a different optimizer, increasing the batch size, and tuning the learning rate, I was able to reach 8% WER, but that is still far from the claimed 5%, which is supposedly reachable without any tuning.
Could you please provide additional info about your configuration (the model and number of GPUs, the total batch size), or even better: logs and/or model checkpoints?
Thank you.
@okhonko
Hi,
I'm seeing similar results on 1 GPU with a different dataset. Could you share the parameters you used to improve the results?
Thank you
Hi, I was having similar issues but was able to do better with the default settings on one GPU by simulating a larger batch size with --update-freq 16
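For anyone unfamiliar with the flag: --update-freq N implements gradient accumulation, i.e. gradients from N small batches are summed before a single optimizer step, so one step behaves roughly like one step on a batch N times larger. A toy pure-Python sketch of the idea (not fairseq code; real trainers also normalize by token count):

```python
def train_with_update_freq(batch_grads, update_freq, lr=0.1):
    """Accumulate gradients over `update_freq` batches before each step.

    `batch_grads` is a list of per-batch gradients (toy scalars here).
    Returns the final weight and the number of optimizer steps taken.
    """
    weight = 0.0
    accumulated = 0.0
    steps = 0
    for i, grad in enumerate(batch_grads, start=1):
        accumulated += grad
        if i % update_freq == 0:
            # One optimizer step per `update_freq` batches, equivalent
            # to a single step on the concatenated large batch.
            weight -= lr * accumulated
            accumulated = 0.0
            steps += 1
    return weight, steps

# 16 small batches with update_freq=16 -> exactly one optimizer step
w, steps = train_with_update_freq([1.0] * 16, update_freq=16)
print(steps)  # 1
```

This is why the flag can stand in for multi-GPU training: on memory-limited hardware you trade wall-clock time for the larger effective batch.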
@alexbie98 I actually used this parameter when training on 1 GPU, and it didn't help. Can you elaborate on "do better"? Did you replicate the paper's WER?
@carlosep93 My parameters were: --optimizer adam --lr 5e-4 --fp16 --memory-efficient-fp16 --warmup-updates 2500 --update-freq 4
I also changed the batching logic to pack as much data onto each GPU as possible, resulting in an average batch size of 670 across all 8 GPUs. Only after that did it start training properly.
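The actual batching change isn't shown here, but for illustration, a greedy token-budget packing scheme (a hypothetical helper, not the real fairseq batcher, which also accounts for padding) might look like:

```python
def pack_batches(utterance_lengths, max_tokens):
    """Greedily pack utterances into batches so that each batch
    holds as many utterances as fit under `max_tokens` total tokens.

    Returns a list of batches, each a list of utterance indices.
    """
    batches, current, current_tokens = [], [], 0
    for idx, length in enumerate(utterance_lengths):
        if current and current_tokens + length > max_tokens:
            batches.append(current)  # budget exceeded: flush batch
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += length
    if current:
        batches.append(current)      # flush the last partial batch
    return batches

# Ten utterances of 100 tokens each, with a 300-token budget per batch
print(pack_batches([100] * 10, max_tokens=300))
# -> [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Packing to the budget keeps GPU utilization high and makes the per-step batch size much larger than naive fixed-count batching.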
Right now it's at 96% train accuracy / 91.7% valid accuracy after training for 5 days (epoch 31). I have not yet matched the reported WER; I'm getting 9.9 with the current checkpoint. The loss/accuracy plateaus for a while before dropping quite low.
Wow, that looks nice! What batch size do you have? Also, could you share the accuracy plot?
https://i.imgur.com/dKadcXq.png
The effective batch size is 80k. My training command is the same as the one in the repo with --update-freq 16
Thanks for providing the plot! Are you sure about 80k? I think the whole LibriSpeech train set has around 200k utterances, which would mean only about 3 batches per epoch in your case.
Sorry, 80k tokens*. I'm using the default command's --max-tokens 5000 with --update-freq 16; the average number of sentences per batch is around 60.
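To make the arithmetic explicit: the effective batch size in tokens is max-tokens × update-freq (× the number of GPUs when training on more than one), which is where the 80k figure comes from on a single GPU:

```python
max_tokens = 5000   # per-GPU token budget per batch (--max-tokens)
update_freq = 16    # gradient accumulation steps (--update-freq)
num_gpus = 1

effective_tokens = max_tokens * update_freq * num_gpus
print(effective_tokens)  # 80000 tokens per optimizer step

# With ~60 sentences per 5000-token batch (as observed above),
# each optimizer step covers roughly 60 * 16 = 960 utterances.
sentences_per_step = 60 * update_freq
print(sentences_per_step)  # 960
```

So at ~960 utterances per step, a ~200k-utterance train set gives a couple of hundred optimizer steps per epoch, not 3 — the earlier estimate assumed 80k sentences rather than 80k tokens.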