I'm trying to do the 'Evaluating a CTC model' step.
Running the command below:
PYTHONPATH=/path/fairseq/ python3 examples/speech_recognition/infer.py /path/audio_file/wav2vec/ --task audio_pretraining \
--nbest 1 --path /path/model_exportdir1/checkpoint_best.pt --gen-subset valid --results-path /path/audio_file/wav2vec/tmp/am/ --w2l-decoder kenlm \
--lm-model /path/audio_file/wav2vec/lm_librispeech_word_transformer.pt --lexicon=/path/audio_file/wav2vec/librispeech_lexicon.lst --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter
Getting the errors below:
INFO:fairseq.data.audio.raw_audio_dataset:loaded 2674, skipped 0 samples
INFO:__main__:| /path/audio_file/wav2vec/ train 2674 examples
INFO:__main__:| decoding with criterion ctc
INFO:__main__:| loading model(s) from /path/model_exportdir1/checkpoint_best.pt
Loading the LM will be faster if you build a binary file.
Reading /path/audio_file/wav2vec/lm_librispeech_word_transformer.pt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Traceback (most recent call last):
File "examples/speech_recognition/infer.py", line 428, in <module>
cli_main()
File "examples/speech_recognition/infer.py", line 424, in cli_main
main(args)
File "examples/speech_recognition/infer.py", line 300, in main
generator = build_generator(args)
File "examples/speech_recognition/infer.py", line 292, in build_generator
return W2lKenLMDecoder(args, task.target_dictionary)
File "/path/fairseq/examples/speech_recognition/w2l_decoder.py", line 141, in __init__
self.lm = KenLM(args.kenlm_model, self.word_dict)
RuntimeError
Can't figure out what I'm missing here.
If you want to use a transformer LM, then you need to set the decoder type to "fairseqlm", not "kenlm".
OK, so as you suggested, I tried both ways:
PYTHONPATH=/path/fairseq/ python3 examples/speech_recognition/infer.py /path/audio_file/wav2vec/ --task audio_pretraining \
--nbest 1 --path /path/model_exportdir1/checkpoint_best.pt --gen-subset valid --results-path /path/audio_file/wav2vec/tmp/am/ --w2l-decoder fairseqlm \
--lm-model /path/audio_file/wav2vec/lm_librispeech_word_transformer.pt --lexicon=/path/audio_file/wav2vec/librispeech_lexicon.lst --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter
Getting this error -->
File "/path/fairseq/fairseq/file_io.py", line 51, in open
newline=newline,
FileNotFoundError: [Errno 2] No such file or directory: '/path/audio_file/wav2vec/dict.txt'
What is this 'dict.txt' file the model is looking for?
PYTHONPATH=/path/fairseq/ python3 examples/speech_recognition/infer.py /path/audio_file/wav2vec/ --task audio_pretraining \
--nbest 1 --path /path/model_exportdir1/checkpoint_best.pt --gen-subset valid --results-path /path/audio_file/wav2vec/tmp/am/ --w2l-decoder kenlm \
--lexicon=/path/audio_file/wav2vec/librispeech_lexicon.lst --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter
Getting this error -->
line 141, in __init__
self.lm = KenLM(args.kenlm_model, self.word_dict)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. wav2letter._decoder.KenLM(path: str, usr_token_dict: wav2letter._common.Dictionary)
Can you suggest something on the above? Thank you.
For 1., you need to download dict.txt for the transformer LM (it is on the wav2letter GitHub page). Don't forget to uppercase the dictionary; we use uppercase targets while they use lowercase.
For 2., you need to specify a KenLM model. You can download the official LibriSpeech model and use that. Here is the link: http://www.openslr.org/11/
You will need to binarize it (read the KenLM docs).
If you don't use a language model, then use --w2l-decoder viterbi.
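As a side note, the uppercasing step mentioned above can be done with a short script. This is just a sketch, assuming a fairseq-style dict.txt with one "word count" pair per line; the file names are placeholders, not the actual paths from this thread:

```python
def uppercase_dict(in_path: str, out_path: str) -> None:
    """Uppercase every word in a fairseq-style dict.txt ("word count" per line)."""
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            # Split off the first field (the word); keep the count untouched.
            word, _, count = line.rstrip("\n").partition(" ")
            fout.write(f"{word.upper()} {count}\n" if count else f"{word.upper()}\n")
```

This only touches the word column, so any frequency counts in the second column survive unchanged.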
So the error below is coming now:
File "/path/fairseq/fairseq/models/transformer_lm.py", line 134, in build_model
if args.decoder_layers_to_keep:
AttributeError: 'Namespace' object has no attribute 'decoder_layers_to_keep'
I tried doing that, but the output I'm getting is absurd and I can't make sense of it.
Files created -->
hypo.units-checkpoint_best.pt-valid.txt
hypo.word-checkpoint_best.pt-valid.txt
ref.units-checkpoint_best.pt-valid.txt
ref.word-checkpoint_best.pt-valid.txt
An example of how these files look -->
cat hypo.units-checkpoint_best.pt-valid.txt --> F | F N Y C N N N H
N | N R N G
cat hypo.word-checkpoint_best.pt-valid.txt --> FNYCNNNHN NRNG
cat ref.units-checkpoint_best.pt-valid.txt --> I N | M A R Y | I T | S E E M S
cat ref.word-checkpoint_best.pt-valid.txt --> IN MARY IT SEEMS TO ME
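For what it's worth, the mismatch between the hypo.word and ref.word transcripts above can be quantified with word error rate (WER). This is a minimal illustrative sketch, not the scoring script fairseq itself uses:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance over whitespace-split tokens, normalized by reference length."""
    r, h = ref.split(), hyp.split()
    # prev[j] holds the edit distance between the first i-1 ref words and first j hyp words.
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, 1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (rw != hw))  # substitution (free if words match)
        prev = cur
    return prev[len(h)] / max(len(r), 1)
```

On output like the above, where the hypothesis shares no words with the reference, this returns 1.0 (100% WER), which is consistent with the model producing essentially random transcripts.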
I don't really understand what's happening here.
Hi @alexeib, can you please look into this issue? I'm really stuck here. Sorry for bothering you again and again, but I keep hitting errors every time I try to run this model.
Re: Viterbi decoding - it seems like either your fine-tuning didn't work well, your dictionary is not aligned, or your input audio is in a different format than what the pre-trained models were trained on.
the language model issue has been fixed now
for 1. you need to download dict.txt for the transformer lm (it is on the wav2letter github page).
Can you share the specific link for this? I'm getting confused by so many instructions and downloadable files.
the language model issue has been fixed now
I tried the CTC decoding step once again and am getting the error below:
INFO:fairseq.tasks.language_modeling:dictionary: 221456 types
Traceback (most recent call last):
File "examples/speech_recognition/infer.py", line 428, in <module>
cli_main()
File "examples/speech_recognition/infer.py", line 424, in cli_main
main(args)
File "examples/speech_recognition/infer.py", line 300, in main
generator = build_generator(args)
File "examples/speech_recognition/infer.py", line 296, in build_generator
return W2lFairseqLMDecoder(args, task.target_dictionary)
File "/path/fairseq/examples/speech_recognition/w2l_decoder.py", line 384, in __init__
_, score = self.lm.score(start_state, word_idx, no_cache=True)
File "/path/fairseq/examples/speech_recognition/w2l_decoder.py", line 307, in score
outstate = state.child(token_index)
AttributeError: 'object' object has no attribute 'child'
@MrityunjoyS I was having the same AttributeError: 'object' object has no attribute 'child' that you are describing and managed to fix it. For me, the issue was that w2l_decoder.py tries to import LexiconFreeDecoder from wav2letter, but the wav2letter Python bindings don't currently support LexiconFreeDecoder. This caused the try/except at the top of the script to treat 'state' as a plain object, and as such 'state' has no attribute 'child'.
According to the wav2letter repo, you can rebuild the Python bindings with LexiconFreeDecoder (see this issue). Personally, I just commented out that import statement at the top of w2l_decoder.py, as I'm not going to use it anyway. I hope that helps you too!
Thanks for finding the root cause, @maxdh.
I'll put a try/except around that import.
No worries!
So there is already an except clause around the import; the problem for me was what the except clause contains, namely the statement on line 44:
LMState = object
The result is that the LMState import from wav2letter.decoder is not used, so the start() function in w2l_decoder.py just returns a plain object (LMState), causing the following line (line 309) to fail, as a plain object has no attributes at all:
outstate = state.child(token_index)
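To make this failure mode concrete, here is a simplified sketch of the guarded-import pattern involved (the helper name load_lmstate is made up for illustration; the real code in w2l_decoder.py does this at module level):

```python
def load_lmstate():
    """Mirror the guarded import in w2l_decoder.py: fall back to a bare class if the bindings are missing."""
    try:
        from wav2letter.decoder import LMState  # real binding, if the package was built with it
        return LMState
    except ImportError:
        return object  # fallback stub: a plain class with no child() method

LMState = load_lmstate()
state = LMState()
# If the fallback was taken, a decoder call such as state.child(token_index)
# raises AttributeError: 'object' object has no attribute 'child'.
print(hasattr(state, "child"))
```

This is why commenting out (or successfully satisfying) the failing import fixes the AttributeError: the name LMState then refers to the real binding instead of the bare object stub.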