Wav2letter: Inference using trained TDS seq2seq model.

Created on 24 Jun 2020 · 7 Comments · Source: flashlight/wav2letter

I have trained a seq2seq AM on Hindi Devanagari data and a KenLM language model on a Devanagari corpus. The decoding results are satisfactory.
I want to run inference with the inference Docker image and simple_streaming_asr_example, but that requires a feature_extractor.bin file. I found a script in tools that produces this file and serializes the acoustic model. Is there a way to use the inference Docker image with my acoustic model?

inference

gopesh97

👍3

Most helpful comment

@vineelpratap thanks for taking the time.
I have already trained the acoustic model with this training configuration:

--datadir=/home/data/
--runname=quiet_10lakh_model_1
--rundir=/home/training/
--tokensdir=/home/am/
--listdata=true
--train=/lists/train.lst
--valid=/lists/dev.lst
--input=wav
--arch=network.arch
--archdir=/home/
--lexicon=/home/am/devnagari-train+dev-unigram-10000-nbest10.lexicon
--tokens=devnagari-train-all-unigram-10000.tokens
--criterion=seq2seq
--lr=0.05
--lrcrit=0.05
--momentum=0.0
--stepsize=40
--gamma=0.5
--maxgradnorm=15
--mfsc=true
--dataorder=output_spiral
--inputbinsize=25
--filterbanks=80
--attention=keyvalue
--encoderdim=512
--attnWindow=softPretrain
--softwstd=4
--trainWithWindow=true
--pretrainWindow=3
--maxdecoderoutputlen=120
--usewordpiece=true
--wordseparator=_
--sampletarget=0.01
--target=ltr
--batchsize=16
--labelsmooth=0.05
--nthread=3
--memstepsize=4194304
--eostoken=true
--pcttraineval=1
--pctteacherforcing=99
--iter=200
--enable_distributed=true
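
For reference, a flags file like the one above can be read into a dictionary to pull out the settings that matter when moving to inference. This is only a minimal sketch: `parse_flags` is a hypothetical helper, not part of wav2letter, and the choice of `criterion`, `mfsc`, and `filterbanks` as the settings to inspect is an assumption about which ones the feature extraction and model conversion would need to agree with.

```python
def parse_flags(text):
    """Parse gflags-style lines such as --key=value into a dict of strings."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()  # also drops trailing whitespace after the value
        if not line.startswith("--"):
            continue  # skip blank lines and anything that is not a flag
        key, sep, value = line[2:].partition("=")
        # Assumption: a bare flag with no "=value" is treated as boolean true.
        flags[key] = value if sep else "true"
    return flags


# A few lines copied from the training configuration above.
example = """
--criterion=seq2seq
--mfsc=true
--filterbanks=80
--usewordpiece=true
"""

flags = parse_flags(example)
# Collect the criterion and the audio front-end settings in one place.
summary = {name: flags.get(name) for name in ("criterion", "mfsc", "filterbanks")}
print(summary)
```

Values stay as strings here; convert them (for example `int(flags["filterbanks"])`) only where a number is actually needed.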

I trained the acoustic model. Now I want to use it for inference as described here:
https://github.com/facebookresearch/wav2letter/wiki/Inference-Run-Examples#quickly-run-streaming-asr-examples-using-docker

AFAIK, I need to place the language model, acoustic model, lexicon, tokens, feature_extractor.bin, and decoding configuration JSON files inside /root/host/model. I am also under the impression that I first need to serialize the acoustic model for the inference Docker image to work with it. Currently, providing these files as they are inside /root/host/model does not work. Also, how do I generate feature_extractor.bin?
I also don't want to train another AM using the recipe given above, because it already took me more than 6 weeks to get to this point.
If that is not possible in the current scenario, can you please suggest an alternative way of using the inference Docker image with my trained AM?
Thanks.
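
As a rough illustration of the file set described above, the sketch below checks a model directory for the expected files before the Docker example is started. The file names are an assumption (they follow what I believe the downloadable example model uses) and `missing_files` is a hypothetical helper, so adjust both to your own setup. Note that it only checks that the files exist, not that the acoustic model has been serialized in the format the inference binaries expect.

```python
import os
import tempfile

# ASSUMPTION: names modeled on the downloadable example model; yours may differ.
ASSUMED_FILES = (
    "acoustic_model.bin",
    "feature_extractor.bin",
    "language_model.bin",
    "lexicon.txt",
    "tokens.txt",
    "decoder_options.json",
)


def missing_files(model_dir, names=ASSUMED_FILES):
    """Return the names from `names` that are not regular files in model_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(model_dir, n))]


# Demonstration on a throwaway directory that holds only two of the files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("lexicon.txt", "tokens.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    print("missing:", missing_files(demo_dir))
```

On a real setup the call would be something like `missing_files("/root/host/model")`, run inside the container.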

gopesh97 on 30 Jun 2020

👍3

All 7 comments

cc @vineelpratap @avidov

tlikhomanenko on 25 Jun 2020

Is there a way I can serialize the TDS seq2seq acoustic model on my own?

gopesh97 on 29 Jun 2020

Hi, are you looking to run in streaming mode using wav2letter@anywhere? In that case, you would need to follow the recipe here: https://github.com/facebookresearch/wav2letter/tree/master/recipes/models/streaming_convnets

If you just want to run inference, you might also want to take a look at Decode.cpp file.

vineelpratap on 30 Jun 2020

I was also trying to port https://github.com/facebookresearch/wav2letter/tree/master/recipes/models/sota/2019 for inference, since these models are available. I ran into the exact same situation as @gopesh97, where inference requires a feature_extractor.bin file. I'm just a beginner with ML and CNNs, so it might just be my ignorance at play. I also read in the wiki that feature extraction is done automagically using flags, i.e. --MTC. So the feature extraction somehow needs to be dumped into a file so that an inference setup can use it.

hovanessb on 10 Jul 2020

@hovanessb
If you're using their model, you can get the required files from here:
https://github.com/facebookresearch/wav2letter/wiki/Inference-Run-Examples#download-the-example-trained-models-from-aws-s3
You can also build tools, then get the required files and serialize your AM as described here: https://github.com/facebookresearch/wav2letter/tree/master/tools#using-the-pipeline-1

gopesh97 on 10 Jul 2020

@gopesh97 Thanks, that was exactly what I was looking for.

hovanessb on 10 Jul 2020
