Hi,
I noticed that the Flashlight=>FBGEMM converter for streaming convnets drops the Dropout layers, and I could not find any explanation for that in the documentation. The FBGEMM model provided on the Inference Wiki is derived from this architecture, which has Dropout layers. Moreover, the inference model doesn't seem to suffer from the missing layers, as it outputs good-quality transcriptions.
Is it that the Dropout probability is quite low (0.1), so adding the layers back during inference would have made little difference, hence the decision to drop them altogether?
Please let me know since I am thinking about training my own model with a higher dropout rate.
Thanks!
Hi, the 'dropout' layer is only used during the training phase, so we remove it during inference.
Hey @vineelpratap,
I was rather wondering why the weights of the layers that Dropout is applied to aren't scaled accordingly at inference time.
For example, it is common to scale the weights of those layers by pkeep during inference (units are dropped with probability 1-pkeep during training).
Thanks!
> For example, it is common to scale the weights of those layers by pkeep during inference (units are dropped with probability 1-pkeep during training).
Yes, that is one way to perform dropout. In wav2letter, we scale the output (by 1/pkeep) during the training phase itself, so that we don't have to do this for inference. See https://github.com/facebookresearch/flashlight/blob/master/flashlight/fl/autograd/Functions.cpp#L1139
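To make the two conventions concrete, here is a minimal sketch in plain Python. It is not Flashlight's code; the function names `inverted_dropout` and `standard_dropout` are hypothetical and chosen only for illustration. The first rescales the surviving values by 1/pkeep during training, which is the approach described above. The second leaves training outputs unscaled and multiplies by pkeep at inference instead.

```python
import random


def inverted_dropout(values, drop_prob, training, rng=random):
    """Dropout that rescales during training; inference is the identity."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # Nothing to do at inference, so the layer can simply be removed.
        return list(values)
    keep_prob = 1.0 - drop_prob
    # Keep each value with probability keep_prob and scale it by 1/keep_prob,
    # so the expected output equals the input.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def standard_dropout(values, drop_prob, training, rng=random):
    """Dropout without training-time rescaling; inference scales by keep_prob."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    if not training:
        # Compensate at inference for the units that were dropped in training.
        return [v * keep_prob for v in values]
    return [v if rng.random() < keep_prob else 0.0 for v in values]
```

With the inverted form, the expected value of each output during training already matches the input, so a converter can drop the layer at inference without touching any weights.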