Wav2letter: Why are Dropout layers dropped while converting from Flashlight to FBGEMM?

Created on 9 Nov 2020 · 3 Comments · Source: flashlight/wav2letter

Question

Hi,

I noticed that the Flashlight=>FBGEMM converter for streaming convnets drops the Dropout layers, and I could not find any reasoning for that in the documentation. The FBGEMM model provided on the Inference Wiki is derived from this architecture, which has Dropout layers. Moreover, the inference model doesn't seem to suffer from these missing layers, as it outputs good-quality transcriptions.

Is it because the Dropout probability is quite low (0.1), so adding it back during inference would have made little difference, hence the decision to drop the layers altogether?

Please let me know since I am thinking about training my own model with a higher dropout rate.

Thanks!

abhinavkulkarni

Most helpful comment

For example, it is common to scale the weights of the layers by p_keep during inference (they are dropped with probability 1 - p_keep during training).

Yes, that is one way to perform dropout. In wav2letter, we scale the output during the training phase itself, so that we don't have to do this for inference. See https://github.com/facebookresearch/flashlight/blob/master/flashlight/fl/autograd/Functions.cpp#L1139

vineelpratap on 9 Nov 2020

👍2
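
To make the answer above concrete, here is a minimal sketch of "scale during training" dropout (often called inverted dropout). It is plain Python written for illustration, not the Flashlight implementation; the function name `inverted_dropout` and its parameters are assumptions. It suggests why the layer can simply be removed at inference: the expected training-time output already matches the unscaled input.

```python
import random


def inverted_dropout(x, p_drop, training, rng):
    """Illustrative inverted dropout (hypothetical helper, not Flashlight's API).

    During training, each element is zeroed with probability p_drop and the
    survivors are scaled by 1 / (1 - p_drop). At inference it is the identity.
    """
    if not training or p_drop == 0.0:
        return list(x)  # identity, so the layer can be dropped from the graph
    p_keep = 1.0 - p_drop
    return [xi / p_keep if rng.random() < p_keep else 0.0 for xi in x]


rng = random.Random(0)
x = [1.0] * 100_000

train_out = inverted_dropout(x, p_drop=0.1, training=True, rng=rng)
eval_out = inverted_dropout(x, p_drop=0.1, training=False, rng=rng)

train_mean = sum(train_out) / len(train_out)
eval_mean = sum(eval_out) / len(eval_out)

# The training-time mean stays close to the inference-time mean, so no
# weight rescaling is needed when the layer is removed.
print(f"train mean ~ {train_mean:.3f}, eval mean = {eval_mean:.3f}")
```

With this scheme, the chosen dropout rate only changes how much is masked during training; the exported model is unaffected either way.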

All 3 comments

Hi, the 'dropout' layer is only used during the training phase, and we remove it during inference.

vineelpratap on 9 Nov 2020

Hey @vineelpratap,

I was rather wondering why the weights of the layers that Dropout is applied to aren't scaled accordingly at inference time.

For example, it is common to scale the weights of the layers by p_keep during inference (they are dropped with probability 1 - p_keep during training).

Thanks!

abhinavkulkarni on 9 Nov 2020
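
For readers unfamiliar with the scheme described in the comment above, here is a rough sketch of classic dropout, where nothing is rescaled during training and the weights of the following linear layer are multiplied by p_keep at inference. It is plain Python for illustration only; the helper names `dropped_dot` and `inference_dot` and the toy weights are assumptions, not code from wav2letter or Flashlight.

```python
import random


def dropped_dot(weights, x, p_keep, rng):
    """Training-time dot product: each input is kept with probability p_keep
    and is NOT rescaled (classic dropout)."""
    return sum(w * xi for w, xi in zip(weights, x) if rng.random() < p_keep)


def inference_dot(weights, x, p_keep):
    """Inference-time dot product: no masking, weights scaled by p_keep."""
    scaled = [w * p_keep for w in weights]
    return sum(w * xi for w, xi in zip(scaled, x))


rng = random.Random(0)
p_keep = 0.9
weights = [0.5, -1.0, 2.0, 0.25]  # toy values chosen for illustration
x = [1.0, 2.0, 3.0, 4.0]

trials = 50_000
avg_train = sum(dropped_dot(weights, x, p_keep, rng) for _ in range(trials)) / trials
infer = inference_dot(weights, x, p_keep)

# The average training-time output approaches the inference-time output
# computed with the scaled weights.
print(f"average training output ~ {avg_train:.3f}, inference output = {infer:.3f}")
```

In this variant the exported model does depend on the dropout rate, because the scaling has to be applied to the weights at conversion time.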