I'm trying to do the 'Evaluating a CTC model' step.
Running the command below:
PYTHONPATH=/path/fairseq/ python3 examples/speech_recognition/infer.py /path/audio_file/wav2vec/ --task audio_pretraining \
--nbest 1 --path /path/model_exportdir1/checkpoint_best.pt --gen-subset valid --results-path /path/audio_file/wav2vec/tmp/am/ --w2l-decoder kenlm \
--lm-model /path/audio_file/wav2vec/lm_librispeech_word_transformer.pt --lexicon=/path/audio_file/wav2vec/librispeech_lexicon.lst --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter
Getting the errors below:
INFO:fairseq.data.audio.raw_audio_dataset:loaded 2674, skipped 0 samples
INFO:__main__:| /path/audio_file/wav2vec/ train 2674 examples
INFO:__main__:| decoding with criterion ctc
INFO:__main__:| loading model(s) from /path/model_exportdir1/checkpoint_best.pt
Loading the LM will be faster if you build a binary file.
Reading /path/audio_file/wav2vec/lm_librispeech_word_transformer.pt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Traceback (most recent call last):
File "examples/speech_recognition/infer.py", line 428, in <module>
cli_main()
File "examples/speech_recognition/infer.py", line 424, in cli_main
main(args)
File "examples/speech_recognition/infer.py", line 300, in main
generator = build_generator(args)
File "examples/speech_recognition/infer.py", line 292, in build_generator
return W2lKenLMDecoder(args, task.target_dictionary)
File "/path/fairseq/examples/speech_recognition/w2l_decoder.py", line 141, in __init__
self.lm = KenLM(args.kenlm_model, self.word_dict)
RuntimeError
Can't figure out what I'm missing here.
If you want to use a transformer LM, then you need to set the decoder type to "fairseqlm", not "kenlm".
OK, so as you suggested, I tried both ways:
PYTHONPATH=/path/fairseq/ python3 examples/speech_recognition/infer.py /path/audio_file/wav2vec/ --task audio_pretraining \
--nbest 1 --path /path/model_exportdir1/checkpoint_best.pt --gen-subset valid --results-path /path/audio_file/wav2vec/tmp/am/ --w2l-decoder fairseqlm \
--lm-model /path/audio_file/wav2vec/lm_librispeech_word_transformer.pt --lexicon=/path/audio_file/wav2vec/librispeech_lexicon.lst --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter
Getting this error -->
File "/path/fairseq/fairseq/file_io.py", line 51, in open
newline=newline,
FileNotFoundError: [Errno 2] No such file or directory: '/path/audio_file/wav2vec/dict.txt'
What is this 'dict.txt' file the model is looking for?
PYTHONPATH=/path/fairseq/ python3 examples/speech_recognition/infer.py /path/audio_file/wav2vec/ --task audio_pretraining \
--nbest 1 --path /path/model_exportdir1/checkpoint_best.pt --gen-subset valid --results-path /path/audio_file/wav2vec/tmp/am/ --w2l-decoder kenlm \
--lexicon=/path/audio_file/wav2vec/librispeech_lexicon.lst --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter
Getting this error -->
line 141, in __init__
self.lm = KenLM(args.kenlm_model, self.word_dict)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. wav2letter._decoder.KenLM(path: str, usr_token_dict: wav2letter._common.Dictionary)
Can you suggest something on the above? Thank you.
For 1., you need to download dict.txt for the transformer LM (it is on the wav2letter GitHub page). Don't forget to uppercase the dictionary; we use uppercase targets while they use lowercase.
For 2., you need to specify a KenLM model. You can download the official LibriSpeech model and use that. Here is the link: http://www.openslr.org/11/
You will need to binarize it (read the KenLM docs).
If you don't use a language model, then use --w2l-decoder viterbi.
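As a side note, the uppercasing step mentioned above can be done with a short script. This is just a sketch, assuming a fairseq-style dict.txt with one "word count" pair per line; the file names are placeholders, not the actual paths from this thread:

```python
def uppercase_dict(in_path: str, out_path: str) -> None:
    """Uppercase every word in a fairseq-style dict.txt ("word count" per line)."""
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            # Split off the first field (the word); keep the count untouched.
            word, _, count = line.rstrip("\n").partition(" ")
            fout.write(f"{word.upper()} {count}\n" if count else f"{word.upper()}\n")
```

This only touches the word column, so any frequency counts in the second column survive unchanged.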
So the error below is coming now:
File "/path/fairseq/fairseq/models/transformer_lm.py", line 134, in build_model
if args.decoder_layers_to_keep:
AttributeError: 'Namespace' object has no attribute 'decoder_layers_to_keep'
I tried doing that, but the output I'm getting is absurd and I can't make sense of it.
Files created -->
hypo.units-checkpoint_best.pt-valid.txt
hypo.word-checkpoint_best.pt-valid.txt
ref.units-checkpoint_best.pt-valid.txt
ref.word-checkpoint_best.pt-valid.txt
An example of how these files look -->
cat hypo.units-checkpoint_best.pt-valid.txt --> F | F N Y C N N N H
N | N R N G
cat hypo.word-checkpoint_best.pt-valid.txt --> FNYCNNNHN NRNG
cat ref.units-checkpoint_best.pt-valid.txt --> I N | M A R Y | I T | S E E M S
cat ref.word-checkpoint_best.pt-valid.txt --> IN MARY IT SEEMS TO ME
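For what it's worth, the mismatch between the hypo.word and ref.word transcripts above can be quantified with word error rate (WER). This is a minimal illustrative sketch, not the scoring script fairseq itself uses:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance over whitespace-split tokens, normalized by reference length."""
    r, h = ref.split(), hyp.split()
    # prev[j] holds the edit distance between the first i-1 ref words and first j hyp words.
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, 1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (rw != hw))  # substitution (free if words match)
        prev = cur
    return prev[len(h)] / max(len(r), 1)
```

On output like the above, where the hypothesis shares no words with the reference, this returns 1.0 (100% WER), which is consistent with the model producing essentially random transcripts.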
I don't really understand what's happening here.
Hi @alexeib, can you please look into this issue? I'm really stuck here. Sorry for bothering you again and again, but I keep hitting errors every time I try to run this model.
Re: Viterbi decoding - it seems like either your fine-tuning didn't work well, your dictionary is not aligned, or your input audio is in a different format than what the pre-trained models were trained on.
the language model issue has been fixed now
for 1. you need to download dict.txt for the transformer lm (it is on the wav2letter github page).
Can you share the specific link for this? I'm getting confused by so many instructions and downloadable files.
the language model issue has been fixed now
I tried the CTC decoding step once again and am getting the error below:
INFO:fairseq.tasks.language_modeling:dictionary: 221456 types
Traceback (most recent call last):
File "examples/speech_recognition/infer.py", line 428, in <module>
cli_main()
File "examples/speech_recognition/infer.py", line 424, in cli_main
main(args)
File "examples/speech_recognition/infer.py", line 300, in main
generator = build_generator(args)
File "examples/speech_recognition/infer.py", line 296, in build_generator
return W2lFairseqLMDecoder(args, task.target_dictionary)
File "/path/fairseq/examples/speech_recognition/w2l_decoder.py", line 384, in __init__
_, score = self.lm.score(start_state, word_idx, no_cache=True)
File "/path/fairseq/examples/speech_recognition/w2l_decoder.py", line 307, in score
outstate = state.child(token_index)
AttributeError: 'object' object has no attribute 'child'
@MrityunjoyS I was having the same AttributeError: 'object' object has no attribute 'child' that you are describing and managed to fix it. For me, the issue was that w2l_decoder.py tries to import LexiconFreeDecoder from wav2letter, but the wav2letter Python bindings don't currently support LexiconFreeDecoder. This caused the try/except at the top of the script to treat 'state' as a plain object, and as such 'state' has no attribute 'child'.
According to the wav2letter repo, you can rebuild the Python bindings with LexiconFreeDecoder (see this issue). Personally, I just commented out that import statement at the top of w2l_decoder.py, as I'm not going to use it anyway. I hope that helps you too!
Thanks for finding the root cause, @maxdh.
I'll put a try/except around that import.
No worries!
So there is already an except clause around the import; the problem for me was what the except clause contains, namely the statement on line 44:
LMState = object
The result is that the LMState import from wav2letter.decoder is not used, so the start() function in w2l_decoder.py just returns a plain object (LMState), causing the following line (line 309) to fail, as a plain object has no attributes at all:
outstate = state.child(token_index)
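To make this failure mode concrete, here is a simplified sketch of the guarded-import pattern involved (the helper name load_lmstate is made up for illustration; the real code in w2l_decoder.py does this at module level):

```python
def load_lmstate():
    """Mirror the guarded import in w2l_decoder.py: fall back to a bare class if the bindings are missing."""
    try:
        from wav2letter.decoder import LMState  # real binding, if the package was built with it
        return LMState
    except ImportError:
        return object  # fallback stub: a plain class with no child() method

LMState = load_lmstate()
state = LMState()
# If the fallback was taken, a decoder call such as state.child(token_index)
# raises AttributeError: 'object' object has no attribute 'child'.
print(hasattr(state, "child"))
```

This is why commenting out (or successfully satisfying) the failing import fixes the AttributeError: the name LMState then refers to the real binding instead of the bare object stub.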