That's a good question, this is currently under exploration. Training with noisy mined data is somewhat complex specially for low resource languages but we are exploring this direction.
Unfortunately I can't give any timelines here as there are many unknowns but if we find this direction to be fruitful, we will definitely open-source the code and model like other projects.
Most helpful comment
That's a good question, this is currently under exploration. Training with noisy mined data is somewhat complex specially for low resource languages but we are exploring this direction.
Unfortunately I can't give any timelines here as there are many unknowns but if we find this direction to be fruitful, we will definitely open-source the code and model like other projects.