The paper "Unsupervised Cross-lingual Representation Learning for Speech Recognition" describes training the wav2vec2 model on multilingual data (Mozilla CommonVoice and other datasets), rather than only the English data used for the wav2vec2 models already available in fairseq. I notice that several of the authors overlap with this project (including @alexeib), and the paper mentions that the model was developed in fairseq. Would you consider making the pretrained multilingual models public?
Thank you for releasing the wav2vec2 models. I've tried finetuning them on other languages, and as expected from the cross-language experiments with monolingual models described in Table 1 of the XLSR paper, performance is worse on languages other than English, especially on lower-resource ones. A multilingual pretrained model would be very valuable to a wider community, in addition to the already valuable English models.
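For reference, here is a minimal sketch of how I load a released English checkpoint in fairseq before finetuning, assuming a local checkpoint path (`/path/to/wav2vec_small.pt` is a placeholder). It uses fairseq's standard `checkpoint_utils.load_model_ensemble_and_task` loader and runs the pretrained encoder in feature-extraction mode as a sanity check; the exact forward arguments may differ slightly between fairseq versions:

```python
import torch
import fairseq

# Placeholder path to a pretrained wav2vec2 checkpoint; adjust as needed.
ckpt_path = "/path/to/wav2vec_small.pt"

# fairseq's standard loader returns (models, config, task).
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]
model.eval()

# One second of dummy 16 kHz audio, batch size 1.
wav = torch.randn(1, 16000)

with torch.no_grad():
    # features_only=True returns contextualized encoder representations
    # without computing the pretraining (contrastive) objective.
    out = model(wav, features_only=True, mask=False)

print(out["x"].shape)  # (batch, frames, hidden_dim)
```

If the sanity check passes, the same checkpoint path can be passed to the finetuning configs under `examples/wav2vec` to train a CTC head on the target language.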
this should be coming soon
Can't wait to use that model!