Hi there, I have been going through the docs trying to find a way to apply transfer learning. My question is: is there any option to freeze some parts of the AM models? I saw that there is a "fork" mode for training (which I imagine is used for full fine-tuning), but I can't find any flag to specify the trained model binary.
Hi,
We don't support it off the shelf, but it is possible to do fine-tuning in wav2letter++ with some code changes. Just add the following function:
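```cpp
// nLayersForFinetuning: number of layers (with parameters), counted backwards
// from the last layer, that should be fine-tuned. A negative value trains the
// whole network.
void setTrainForFinetuning(std::shared_ptr<fl::Module> ntwrk, int nLayersForFinetuning) {
  if (nLayersForFinetuning < 0) {
    ntwrk->train();
    return;
  }
  // The per-layer logic below only works for a flat fl::Sequential network.
  auto seq = std::dynamic_pointer_cast<fl::Sequential>(ntwrk);
  if (!seq) {
    throw std::runtime_error(
        "setTrainForFinetuning: network is not an fl::Sequential");
  }
  int processedLastLayers = 0;
  // Walk from the output layer towards the input.
  for (int i = static_cast<int>(seq->modules().size()) - 1; i >= 0; --i) {
    auto module = seq->module(i);
    if (processedLastLayers < nLayersForFinetuning && module->params().size() > 0) {
      // One of the last nLayersForFinetuning parameterized layers: train it.
      processedLastLayers++;
      module->train();
    } else {
      // eval() puts the layer in inference mode; in flashlight this also
      // stops gradient computation for its parameters, freezing them.
      module->eval();
    }
  }
}
```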
Then replace ntwrk->train() in Train.cpp with setTrainForFinetuning(ntwrk, nLayersForFinetuning), where nLayersForFinetuning is the number of trailing layers you want to train, and run the training in fork mode.
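To see which layers end up trainable, here is a minimal standalone sketch of the same selection logic. It does not use flashlight: ToyModule, toySetTrainForFinetuning and trainableLayers are hypothetical stand-ins that only track whether a layer has parameters and whether it is in train or eval mode.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for fl::Module: it only records whether the layer has
// parameters and whether it is in train mode (trainable) or eval mode (frozen).
struct ToyModule {
  std::string name;
  bool hasParams;
  bool training = true;

  ToyModule(std::string n, bool p) : name(std::move(n)), hasParams(p) {}
  void train() { training = true; }
  void eval() { training = false; }
};

// Same selection logic as setTrainForFinetuning, applied to a flat list of
// toy layers instead of an fl::Sequential.
void toySetTrainForFinetuning(std::vector<ToyModule>& layers, int nLayersForFinetuning) {
  if (nLayersForFinetuning < 0) {  // negative: train the whole network
    for (auto& layer : layers) {
      layer.train();
    }
    return;
  }
  int processedLastLayers = 0;
  for (int i = static_cast<int>(layers.size()) - 1; i >= 0; --i) {
    if (processedLastLayers < nLayersForFinetuning && layers[i].hasParams) {
      ++processedLastLayers;
      layers[i].train();
    } else {
      layers[i].eval();  // frozen, including parameter-free layers
    }
  }
}

// Names of the layers left in train mode, in network order.
std::vector<std::string> trainableLayers(const std::vector<ToyModule>& layers) {
  std::vector<std::string> names;
  for (const auto& layer : layers) {
    if (layer.training) {
      names.push_back(layer.name);
    }
  }
  return names;
}
```

For a toy network conv1, glu1, conv2, glu2, linear, calling toySetTrainForFinetuning(net, 2) leaves only conv2 and linear trainable: the parameter-free GLU layers are not counted and are put in eval mode. With the real function, this also means a Dropout layer among the last layers would be switched to eval mode and stop dropping units.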
Hope it helps !
Great, thanks for the help! I imagine I will also need to recompile the code, right? On the other hand, is it possible to perform fine-tuning using fork mode (I mean using a pre-trained model and changing the output dim to the token size in my problem)?
I imagine I will also need to recompile the code right?
Yes. Make sure you are using the latest code, make the code changes mentioned above, and recompile.
changing the output dim to the token size in my problem
Yes, it is possible! But it would need some more code changes for your specific use case. If you can share the architecture, the model being fine-tuned, and details on the tokens you want to change, I can give some code pointers.
That would be very nice, thanks. I'm using the original conv_glu architecture because of the relatively small number of parameters for my AM, and I'm using a wordpiece tokenizer with 9996 different tokens. I was planning to use that architecture with the output dim of the final linear layer changed, but I'm not sure whether fork mode will work for using the pretrained weights as a starting point.
For running the fork model,

Note that this is a rough way to do it. You might have to adapt it to your specific use case.
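The idea of reusing the pretrained weights while changing only the output dimension can be sketched as: keep every pretrained layer except the last one, and append a freshly initialized linear layer with one output per token. This is a standalone sketch with hypothetical stand-in types (Layer, Linear, Sequential, replaceOutputLayer), not flashlight code; in the real code the new layer would be an fl::Linear, and loading the forked model depends on your wav2letter version, so it is left out.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for fl::Module, fl::Linear and fl::Sequential.
struct Layer {
  virtual ~Layer() = default;  // polymorphic so dynamic_pointer_cast works
};

struct Linear : Layer {
  int inDim;
  int outDim;
  Linear(int in, int out) : inDim(in), outDim(out) {}
};

struct Sequential {
  std::vector<std::shared_ptr<Layer>> modules;
};

// Returns a network that shares every pretrained layer except the last one,
// which is replaced by a new Linear layer with numTokens outputs.
Sequential replaceOutputLayer(const Sequential& pretrained, int numTokens) {
  if (pretrained.modules.empty()) {
    throw std::invalid_argument("replaceOutputLayer: empty network");
  }
  auto last = std::dynamic_pointer_cast<Linear>(pretrained.modules.back());
  if (!last) {
    throw std::runtime_error("replaceOutputLayer: last module is not Linear");
  }
  Sequential out;
  // Reuse (share) the pretrained layers, dropping only the old output layer.
  out.modules.assign(pretrained.modules.begin(), pretrained.modules.end() - 1);
  // New output layer: same input size as before, one output per token.
  out.modules.push_back(std::make_shared<Linear>(last->inDim, numTokens));
  return out;
}
```

If the criterion has its own token-dependent parameters (ASG's transition matrix, for example), those would also have to be re-created for the new token set rather than loaded from the pretrained model.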
FWIW, I wouldn't recommend using the conv_glu model with a token set of size 9996, for two reasons:
Thanks for the recommendations ;)
So you recommend splitting words at the character level if I want to use GLU with ASG, right?
yes.
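As an illustration of how token count interacts with ASG (one consideration, not necessarily the reasons mentioned above): ASG learns a score for every ordered pair of tokens, so its transition matrix has N x N entries, and its normalization scores every transition at each frame, so the cost grows roughly with N squared per frame. The sketch below just does that arithmetic; the 28-token character set and the 1024-dimensional input to the output layer are hypothetical values.

```cpp
#include <cassert>
#include <cstdint>

// ASG keeps a full N x N transition matrix (a score for every ordered pair
// of tokens), so its size grows quadratically with the token count.
std::int64_t asgTransitionEntries(std::int64_t numTokens) {
  assert(numTokens > 0);
  return numTokens * numTokens;
}

// Weights plus biases of a fully connected output layer inDim -> numTokens.
std::int64_t outputLayerParams(std::int64_t inDim, std::int64_t numTokens) {
  assert(inDim > 0 && numTokens > 0);
  return inDim * numTokens + numTokens;
}
```

With 9996 word pieces the transition matrix alone has 99,920,016 entries, versus 784 for a 28-token character set, while the output layer grows only linearly (10,245,900 parameters for a 1024-dimensional input and 9996 tokens).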
Hi, we are trying to do transfer learning from the librivox SOTA recipe using TDS Seq2Seq (as described here: https://github.com/facebookresearch/wav2letter/tree/master/recipes/models/sota/2019), with 4 warmup epochs and a step size of 60 epochs, on our private dataset with the seq2seq criterion. We created the lexicon, tokens, and word pieces in the usual way given by the recipe.
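To make the warmup and step-size settings concrete, here is a minimal sketch of a generic schedule, assuming linear warmup over warmupEpochs followed by multiplying the rate by a hypothetical factor gamma once every stepSize epochs. wav2letter's own flags may define these differently (for example, counting warmup in updates rather than epochs), so check the flag documentation for your version.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical schedule (epoch is 0-based): linear warmup for warmupEpochs,
// then the rate is multiplied by gamma once every stepSize epochs.
double learningRate(double baseLr, int epoch, int warmupEpochs, int stepSize,
                    double gamma) {
  assert(epoch >= 0 && stepSize > 0);
  double lr = baseLr;
  if (epoch < warmupEpochs) {
    // Ramp up linearly: 1/warmupEpochs, 2/warmupEpochs, ..., 1.
    lr *= static_cast<double>(epoch + 1) / warmupEpochs;
  }
  // Integer division: number of completed decay steps.
  lr *= std::pow(gamma, epoch / stepSize);
  return lr;
}
```

With warmupEpochs = 4 and stepSize = 60, the rate reaches its full value at epoch index 3 and is not decayed before epoch 60, so it stays constant through epochs 7-8.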
However, even after 7-8 epochs, the train WER stays high (around 100 to 105) and doesn't change. What could be keeping the WER from coming down at all?
Also, are there some recommended config settings for better transfer learning?
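A train WER of 100 to 105 is possible because WER = (substitutions + deletions + insertions) / N, where N is the number of reference words, so insertions can push it above 100%; a value around 100 means there are roughly as many errors as reference words, i.e. the hypotheses essentially do not match the references yet. The following standalone sketch (not wav2letter's own WER code) computes WER from the word-level edit distance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// WER in percent: word-level Levenshtein distance between reference and
// hypothesis (substitutions + deletions + insertions), divided by the number
// of reference words.
double wordErrorRate(const std::vector<std::string>& ref,
                     const std::vector<std::string>& hyp) {
  assert(!ref.empty());
  // dp[j] holds the edit distance between ref[0..i) and hyp[0..j).
  std::vector<std::size_t> dp(hyp.size() + 1);
  for (std::size_t j = 0; j <= hyp.size(); ++j) dp[j] = j;  // j insertions
  for (std::size_t i = 1; i <= ref.size(); ++i) {
    std::size_t diag = dp[0];  // distance for (i - 1, j - 1)
    dp[0] = i;                 // i deletions
    for (std::size_t j = 1; j <= hyp.size(); ++j) {
      std::size_t up = dp[j];  // distance for (i - 1, j)
      std::size_t cost = (ref[i - 1] == hyp[j - 1]) ? 0 : 1;
      dp[j] = std::min({up + 1, dp[j - 1] + 1, diag + cost});
      diag = up;
    }
  }
  return 100.0 * static_cast<double>(dp[hyp.size()]) /
         static_cast<double>(ref.size());
}
```

For example, a 2-word reference against a 3-word hypothesis that matches nothing gives 2 substitutions plus 1 insertion, i.e. a WER of 150%.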