Wav2letter: Why are Dropout layers dropped while converting from Flashlight to FBGEMM?

Created on 9 Nov 2020 · 3 Comments · Source: flashlight/wav2letter

Question

Hi,

I noticed that the Flashlight=>FBGEMM converter for streaming convnets drops the Dropout layers and could not find any reasoning for that in the documentation. The FBGEMM model provided on the Inference Wiki is derived from this architecture that has Dropout layers. Moreover, the inference model doesn't seem to suffer from these missing layers as it outputs good quality transcriptions.

Is it that the Dropout probability is quite low (0.1), so adding the layers back during inference would have made little difference, hence the decision to drop them altogether?

Please let me know since I am thinking about training my own model with a higher dropout rate.

Thanks!

All 3 comments

Hi, the Dropout layer is only used during the training phase, so we remove it for inference.

Hey @vineelpratap,

I was rather wondering why the weights of the layers that Dropout is applied to aren't scaled accordingly at inference time.

For example, it is common to scale the weights of those layers by pkeep during inference (units are dropped with probability 1-pkeep during training).

Thanks!

"For example, it is common to scale the weights of those layers by pkeep during inference (units are dropped with probability 1-pkeep during training)."

Yes, that is one way to perform dropout. In wav2letter, we scale the output during the training phase itself, so that we don't have to do this for inference. See https://github.com/facebookresearch/flashlight/blob/master/flashlight/fl/autograd/Functions.cpp#L1139
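
The scheme described above is often called inverted dropout. Here is a minimal sketch of the idea in plain Python, not Flashlight's actual C++ code. The function name `inverted_dropout` and its parameters are hypothetical and chosen only for illustration. Surviving activations are scaled by 1/pkeep during training, so each output's expected value equals its input. At inference the layer is the identity, so it can be removed from the model.

```python
import random


def inverted_dropout(xs, drop_prob, training, rng=random):
    """Illustrative inverted dropout on a list of floats (hypothetical helper)."""
    if not training or drop_prob == 0.0:
        # Inference: the layer is the identity, so it can be dropped entirely.
        return list(xs)
    keep_prob = 1.0 - drop_prob
    # Keep each element with probability keep_prob and scale the survivors by
    # 1 / keep_prob; dropped elements become 0. The expectation stays at x.
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in xs]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
xs = [1.0] * 100_000

train_out = inverted_dropout(xs, drop_prob=0.1, training=True, rng=rng)
train_mean = sum(train_out) / len(train_out)  # close to 1.0 in expectation

infer_out = inverted_dropout(xs, drop_prob=0.1, training=False)  # unchanged
```

Because the rescaling already happens during training, a higher dropout rate does not by itself require any weight adjustment when the layer is removed for inference.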
