Fairseq: how do I use the features extracted from wav2vec in wav2letter++?

Created on 27 Sep 2019 · 11 Comments · Source: pytorch/fairseq

Hi, I trained wav2vec and got the model parameters. Now, how do I use the xx.pt checkpoint to train wav2letter? I want to see the ASR results.

question

Most helpful comment

Can anybody help a bit here? We are kind of stuck! After extracting the embeddings from the downstream data, how do we now provide them to wav2letter++? There is no documentation available for that.
@alexeib @myleott

All 11 comments

I saved the features in Kaldi data format and trained my own ASR model rather than a wav2letter++ model.
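
For readers who want to follow this Kaldi route, here is a minimal sketch of dumping per-utterance feature matrices in Kaldi's text archive format (the utterance IDs and matrix shapes are made up for illustration; real wav2vec context features would be frames x 512):

```python
import numpy as np

def write_kaldi_text_ark(path, feats):
    """Write {utt_id: (frames, dims) ndarray} as a Kaldi text archive (ark,t)."""
    with open(path, "w") as f:
        for utt_id, mat in feats.items():
            f.write(f"{utt_id}  [\n")
            for i, row in enumerate(mat):
                vals = " ".join(f"{v:.6f}" for v in row)
                # Kaldi closes the matrix with " ]" on the last row's line
                suffix = " ]" if i == len(mat) - 1 else ""
                f.write(f"  {vals}{suffix}\n")

# Two fake feature matrices standing in for wav2vec context vectors
feats = {
    "utt1": np.random.randn(4, 8).astype(np.float32),
    "utt2": np.random.randn(6, 8).astype(np.float32),
}
write_kaldi_text_ark("feats.ark.txt", feats)
```

On the Kaldi side, copy-feats ark,t:feats.ark.txt ark:feats.ark converts this text archive into the binary form that the training tools expect.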

@leixiaoning can you provide some details about this please? thank you.

@leixiaoning did you figure it out? I too am stuck at the same point. Two questions, in fact:

  1. What are the "task waves" in PYTHONPATH=/path/to/fairseq python scripts/wav2vec_featurize.py --input /path/to/task/waves --output /path/to/output --model /model/path/checkpoint_best.pt --split train valid test ?
  2. How are the train, valid, and test splits fed to wav2letter++, and what is their output format?
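
On question 2: wav2letter++ is typically driven by per-split "list" files describing each sample. A minimal sketch of generating one such list, assuming the commonly documented layout of one line per sample (sample id, input path, duration in milliseconds, transcript); the ids, paths, and durations below are made up, and the exact format should be checked against the wav2letter data-preparation docs for your version:

```python
def write_w2l_list(path, samples):
    """samples: iterable of (sample_id, input_path, duration_ms, transcript)."""
    with open(path, "w") as f:
        for sid, fpath, dur, text in samples:
            # one whitespace-separated sample per line
            f.write(f"{sid} {fpath} {dur} {text}\n")

# Hypothetical examples; paths would point at extracted wav2vec features
samples = [
    ("train_0001", "/data/feats/train_0001.h5", 2530.0, "hello world"),
    ("train_0002", "/data/feats/train_0002.h5", 1810.0, "good morning"),
]
write_w2l_list("train.lst", samples)
```

A matching valid.lst and test.lst would be written the same way, one file per --split name used at feature-extraction time.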

I tried to train a speech model (DeepSpeech2) on LibriSpeech using the context representations (C) extracted from the pre-trained wav2vec model provided in the repo, but the model is not converging even after several epochs.
@alexeib any help on this?
Did you change the model architecture to make it work, or did you achieve state-of-the-art results just by replacing the spectrogram with the context representations, using the same architecture described in the DeepSpeech2 or wav2letter paper?

Hi @rajeevbaalwan!
Can you please share how you incorporated these embeddings into the DeepSpeech2 model?
I have been struggling with it for a long time.
Thanks in advance!

I need advice on this topic as well. Thanks.

@rajeevbaalwan @alexeib
Can anybody elaborate on this, please?

@leixiaoning @marcosmacedo check the issues of wav2letter: https://github.com/facebookresearch/wav2letter/issues/436
@maltium has a fork that accepts HDF5 as input: https://github.com/maltium/wav2letter/tree/feature/loading-from-hdf5
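
For that fork, the extracted features would need to end up in HDF5 files. A minimal sketch of the h5py mechanics, using made-up utterance IDs and one dataset per utterance; the exact layout the fork expects (one file per utterance vs. one dataset per utterance) should be checked against its README:

```python
import h5py
import numpy as np

# Fake context features for two utterances (frames x 512 in the real case)
feats = {
    "utt1": np.random.randn(120, 512).astype(np.float32),
    "utt2": np.random.randn(95, 512).astype(np.float32),
}

# Write each utterance's feature matrix as a named dataset
with h5py.File("wav2vec_feats.h5", "w") as f:
    for utt_id, mat in feats.items():
        f.create_dataset(utt_id, data=mat)
```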

Sorry, I just saw this. We just replaced the spectrogram features in wav2letter with the wav2vec ones. You can extract the features as shown in the examples doc and feed them into any ASR system you'd like, and it will work (e.g. we have also tried bi-LSTMs).
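
To make the bi-LSTM alternative concrete, here is a minimal PyTorch sketch of an acoustic model that consumes wav2vec context vectors (512-dim frames) in place of spectrogram frames. The layer sizes and vocabulary size are assumptions for illustration, not the configuration the fairseq authors used:

```python
import torch
import torch.nn as nn

class BiLSTMCTC(nn.Module):
    """Bi-LSTM acoustic model over 512-dim wav2vec context features."""
    def __init__(self, feat_dim=512, hidden=320, vocab=29):  # vocab size is a made-up assumption
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=3,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, vocab)  # includes the CTC blank symbol

    def forward(self, x):           # x: (batch, frames, feat_dim)
        out, _ = self.lstm(x)
        return self.proj(out)       # (batch, frames, vocab) logits for CTC loss

model = BiLSTMCTC()
feats = torch.randn(2, 100, 512)    # fake wav2vec context features
logits = model(feats)
```

Training would pair these per-frame logits with torch.nn.CTCLoss against character targets, exactly as with spectrogram inputs; only the input feature dimension changes.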

@alexeib could you share your wav2letter hyperparameters and learning rate, please?
