Fairseq: Wav2vec Decoding with KenLM

Created on 25 Sep 2020 · 8 Comments · Source: pytorch/fairseq

When I try to evaluate the wav2vec 2.0 model with KenLM, I get a segmentation fault. By debugging, I found the location where the error occurs.

In examples/speech_recognition/w2l_decoder.py, lines 145-153 (the W2lKenLMDecoder class):

for i, (word, spellings) in enumerate(self.lexicon.items()):
    word_idx = self.word_dict.get_index(word)
    _, score = self.lm.score(start_state, word_idx)
    for spelling in spellings:
        spelling_idxs = [tgt_dict.index(token) for token in spelling]
        assert (
            tgt_dict.unk() not in spelling_idxs
        ), f"{spelling} {spelling_idxs}"
        self.trie.insert(spelling_idxs, word_idx, score)
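A mismatch between the lexicon's spellings and the target dictionary (the `--labels ltr` token set) is a common way for this loop to fail. A minimal sketch of the same check, using a toy dictionary and lexicon (hypothetical data, not fairseq's real `Dictionary` class):

```python
# Toy stand-ins for fairseq's tgt_dict and the lexicon file.
UNK = 0
tgt_dict = {"<unk>": UNK, "c": 4, "a": 5, "t": 6, "|": 7}

lexicon = {
    "cat": [["c", "a", "t", "|"]],
    "qat": [["q", "a", "t", "|"]],  # "q" is missing from tgt_dict
}

def check_lexicon(lexicon, tgt_dict, unk=UNK):
    """Return words whose spellings map to <unk>; these are the entries
    that would trip the assertion in W2lKenLMDecoder (or crash a decoder
    that skips the check)."""
    bad = []
    for word, spellings in lexicon.items():
        for spelling in spellings:
            idxs = [tgt_dict.get(tok, unk) for tok in spelling]
            if unk in idxs:
                bad.append(word)
                break
    return bad

print(check_lexicon(lexicon, tgt_dict))  # -> ['qat']
```

Running a check like this over the full lexicon before decoding can tell you quickly whether the lexicon and the model's label set actually match.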

I downloaded the lexicon from https://dl.fbaipublicfiles.com/fairseq/wav2vec/librispeech_lexicon.lst
and the KenLM model from https://dl.fbaipublicfiles.com/wav2letter/lexicon_free/librispeech/models/lm/lm_librispeech_kenlm_word_4g_200kvocab.bin

I don't know what I did wrong. Please let me know.

Thank You !!

  • Command
python examples/speech_recognition/infer.py $MANIFEST_PATH --task audio_pretraining \
--nbest 1 --path /data/project/rw/kaki/model/wav2vec/wav2vec2_vox_960h.pt --gen-subset test-other \
--results-path /root/ --w2l-decoder kenlm \
--lm-model /data/project/rw/kaki/model/wav2vec/lm_librispeech_kenlm_word_4g_200kvocab.bin  \
--lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter --lexicon /data/project/rw/kaki/model/wav2vec/librispeech_lexicon.lst
Labels: bug, needs triage

All 8 comments

Looks like the lexicon is correct (from #2502), but the LM model that I finally managed to use with that lexicon is just the standard 4-gram one from OpenSLR: http://www.openslr.org/resources/11/4-gram.arpa.gz

So you need to download it, convert it to a trie with the KenLM binary (./build_binary trie 4-gram.arpa.gz 4-gram.bin) and launch with --lm-model 4-gram.bin --lexicon librispeech_lexicon.lst --w2l-decoder kenlm

Hopefully this will be stated clearly in the README one day.
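Before launching infer.py, it can be worth sanity-checking that `--lm-model` actually points at a converted KenLM binary rather than the gzipped or plain-text ARPA file, since passing the wrong format is a plausible source of crashes. A rough sketch (the KenLM magic string below is an assumption based on the kenlm source; verify it against your build):

```python
# Assumed header constants: KenLM binary files are believed to start with
# this sanity string, and plain ARPA files start with "\data\".
KENLM_MAGIC = b"mmap lm http://kheafield.com/code"

def lm_format(path):
    """Best-effort guess at the LM file format from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(KENLM_MAGIC):
        return "kenlm-binary"
    if head[:2] == b"\x1f\x8b":  # gzip magic, e.g. 4-gram.arpa.gz as-is
        return "gzip"
    if head.lstrip().startswith(b"\\data\\"):
        return "arpa-text"
    return "unknown"
```

If this reports "gzip" or "arpa-text" for the file you pass to `--lm-model`, run build_binary on it first.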

@nosyrev Thank you!! I don't understand what ./build_binary is. Can you explain more specifically?

@sooftware Sure, it's one of the KenLM binaries. You just need to build https://github.com/kpu/kenlm ; it will be in the build/bin folder afterwards.

@nosyrev Thank you!! I'll try it.

@nosyrev I encounter a segmentation fault. I use the following command:

python examples/speech_recognition/infer.py $MANIFEST_PATH --task audio_pretraining \
--nbest 1 --path wav2vec2_vox_960h.pt --gen-subset test-other --results-path $RESULT_PATH \
--w2l-decoder kenlm --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter --lm-model 4-gram.bin --lexicon librispeech_lexicon.lst

Did I do anything wrong?

@sooftware Unfortunately I have no idea; I haven't experienced a segfault myself (fortunately). Maybe it's a memory issue and the model just doesn't fit? Is it failing during loading or during processing? Will it work with a single file? If I needed to debug this myself, I'd go through those stages first, but that's obvious anyway. Sorry, can't help more here.
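Those stages can be isolated mechanically by running each one in a child process and checking whether it died from SIGSEGV. A minimal sketch (the stage commands are placeholders; substitute e.g. a script that only loads the LM, then one that decodes a single file):

```python
import signal
import subprocess
import sys

def run_stage(cmd):
    """Run one debugging stage as a child process and classify the result.
    A negative return code is the signal that killed the child, so
    -SIGSEGV means this stage is where the crash happens."""
    proc = subprocess.run(cmd)
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return "ok" if proc.returncode == 0 else f"exit {proc.returncode}"

# Placeholder stage: a child that merely starts Python and exits cleanly.
# Real stages would be e.g. a load-LM-only script, then infer.py with a
# one-file manifest, then the full test-other run.
print(run_stage([sys.executable, "-c", "pass"]))  # -> ok
```

Whichever stage first reports "segfault" is the one to look at more closely (LM loading vs. decoding vs. dataset size).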

@nosyrev Thank you. I'll debug T.T


I think the problem is related to KenLM. Try watching the video here, where Tilman Kemp shows step by step how to build a language model with KenLM.

Look at it starting from 19:54.

Even though the video is about DeepSpeech, which uses KenLM for its language model, I think it applies to what you're building if it also uses KenLM; the whole procedure Tilman Kemp shows belongs to KenLM, not DeepSpeech.

Hope it helps you. Cheers
