Wav2letter: AM not saving while running streaming_convnets

Created on 31 Mar 2020 · 23 comments · Source: flashlight/wav2letter

Hi,
I have followed the readme.md in the streaming_convnets folder and training runs successfully, but it doesn't save the model after every epoch, nor does it print any model stats after each epoch the way it normally does (I tried the recipe in tutorials/1-libri).
Each epoch finishes almost instantly, and the logs look something like this. Can you tell me what I'm doing wrong?
I faced a similar issue with the SOTA ResNet example recipe: after editing the paths in the cfg file, it doesn't throw any error, but it "trains" instantly without saving any model (.bin file).
I0331 09:08:02.674365 30153 Train.cpp:538] Epoch 999998 started!
I0331 09:08:02.674376 30153 Train.cpp:531] Shuffling trainset
I0331 09:08:02.674386 30153 Train.cpp:538] Epoch 999999 started!
I0331 09:08:02.674399 30153 Train.cpp:531] Shuffling trainset
I0331 09:08:02.674408 30153 Train.cpp:538] Epoch 1000000 started!
I0331 09:08:02.674418 30153 Train.cpp:689] Finished training
Thanks

All 23 comments

What did you use as reportiters? How big is your dataset? If reportiters is set to something far too big for your dataset, it will behave like above. To check whether that is the case, set reportiters to 0.

Hey @joazoa, thanks for your reply. Yup, the default was set to 1000; after setting the reportiters flag to 0, I can now see the train stats at every epoch.
My dataset is 100 hours of LibriSpeech.
But the main issue remains: each epoch finishes in less than a second, and no AM gets saved.
I0404 08:21:13.663045 24958 Train.cpp:340] epoch: 1 | nupdates: 0 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:00:00 | bch(ms): 0.00 | smp(ms): 0.00 | fwd(ms): 0.00 | crit-fwd(ms): 0.00 | bwd(ms): 0.00 | optim(ms): 0.00 | loss: 0.00000 | train-TER: 0.00 | train-WER: 0.00 | /home/developer2/w2l/lists/dev-clean.lst-loss: 0.00000 | /home/developer2/w2l/lists/dev-clean.lst-TER: 0.00 | /home/developer2/w2l/lists/dev-clean.lst-WER: 0.00 | avg-isz: 000 | avg-tsz: 000 | max-tsz: 000 | hrs: 0.00 | thrpt(sec/sec): n/a
I0404 08:21:14.954645 24958 Train.cpp:555] Shuffling trainset
No data gets loaded; it says hrs: 0.00.
I'm unable to figure out why. Would you say there's an issue with the .lst file? I used the same lst for preparing the data, using the
python3 ../../utilities/prepare_librispeech_wp_and_official_lexicon.py --data_dst [...] --model_dst [...] --nbest 10 --wp 10000
command given in their readme.md.

Did your whole training set get filtered? Are your lexicon/tokens maybe broken?

I just followed the readme.md and used whatever tokens/lexicon the data prep code generated. I'll look into this more, but if you have any leads on identifying broken tokens/lexicon, please let me know.
Thanks

I0410 19:11:42.050197 10276 Train.cpp:250] [Network Params: 93568814]
I0410 19:11:42.050231 10276 Train.cpp:251] [Criterion] ConnectionistTemporalClassificationCriterion
I0410 19:11:42.050297 10276 Train.cpp:259] [Network Optimizer] SGD
I0410 19:11:42.050307 10276 Train.cpp:260] [Criterion Optimizer] SGD
I0410 19:11:55.881227 10276 W2lListFilesDataset.cpp:141] 527188 files found.
I0410 19:11:55.905663 10276 Utils.cpp:102] Filtered 527188/527188 samples
I0410 19:11:55.905714 10276 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 0
I0410 19:11:58.983441 10276 W2lListFilesDataset.cpp:141] 116208 files found.
I0410 19:11:58.988852 10276 Utils.cpp:102] Filtered 116208/116208 samples
I0410 19:11:58.988893 10276 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 0
I0410 19:11:58.989362 10276 Train.cpp:557] Shuffling trainset
I0410 19:11:58.989430 10276 Train.cpp:564] Epoch 1 started!
I0410 19:11:59.232162 10276 Train.cpp:342] epoch: 1 | nupdates: 0 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:00:00 | bch(ms): 0.00 | smp(ms): 0.00 | fwd(ms): 0.00 | crit-fwd(ms): 0.00 | bwd(ms): 0.00 | optim(ms): 0.00 | loss: 0.00000 | train-TER: 0.00 | train-WER: 0.00 | dev-clean-loss: 0.00000 | dev-clean-TER: 0.00 | dev-clean-WER: 0.00 | avg-isz: 000 | avg-tsz: 000 | max-tsz: 000 | hrs: 0.00 | thrpt(sec/sec): n/a
Hey, I have verified that there seem to be no issues with the tokens.txt and lexicon.txt produced by the data prep code, but it still doesn't save the AM.
I have some doubts about this particular line:
I0410 19:11:55.881227 10276 W2lListFilesDataset.cpp:141] 527188 files found.
But
I0410 19:11:55.905714 10276 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 0
reports total batches as zero. Can someone help me understand this? I have used the same lst file for other models and it works fine, but not for streaming convnets.
Thanks

I0410 19:11:58.988852 10276 Utils.cpp:102] Filtered 116208/116208 samples

You need to check the duration of your samples in the lst, as well as the final token length (after the lexicon lookup), against the min/max isz and tsz flags.

Hey @lunixbochs,
What do you mean by final token length here? My isz and tsz flags are as follows:
--minisz=200
--mintsz=2
--maxisz=33000
as given in the default recipe, https://github.com/facebookresearch/wav2letter/blob/master/recipes/models/streaming_convnets/librispeech/train_am_500ms_future_context.cfg
Thanks

@adamchant What sample durations did you specify in the list file?

Can you post the head of your list file?

Hey @tlikhomanenko, I am using the sox function (sox.file_info.duration(audio_path)) to get the sample duration. One of the lines in my lst file looks like this:
eng_asr_audio_102 /home/developer/speech_corpus/english_asr_corpus/train/audio_files/eng_asr_audio_102.wav 4.32 this is one of the most pristine and beautiful parts of the world

The value needs to be in ms: instead of 4.32, it should probably say 4320.

Hey @joazoa, is that specific to this model? I ran the convglu example with the same lst, and it seems to work there.

All your files are filtered. You set a minimum input size of 200, so this line with 4.32 is filtered out as too small.

Yep, in the list file you should put duration in milliseconds as pointed by @joazoa.

--minisz=200
--mintsz=2
--maxisz=33000

With these flags, all audio shorter than 200 milliseconds (the value we read from the list file) will be filtered out. The convglu recipe had no such additional filtering (minisz=0), so all your files were used even though their durations were in seconds.
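A minimal sketch of the size filter described above, useful for checking a list file before training. It assumes list lines of the form `id path duration transcript`, that the duration column is compared directly against `minisz`/`maxisz`, and that the target length (tokens after the lexicon lookup) is compared against `mintsz`/`maxtsz`. `SizeFlags`, `parse_lst_line`, `is_filtered`, the `maxtsz` placeholder value, the inclusive bounds, and the example target length are all hypothetical; this is not wav2letter's own code.

```python
from dataclasses import dataclass


@dataclass
class SizeFlags:
    # minisz, maxisz and mintsz mirror the recipe values quoted in this thread.
    # maxtsz is a hypothetical placeholder: the quoted recipe excerpt does not set it.
    minisz: float = 200
    maxisz: float = 33000
    mintsz: int = 2
    maxtsz: int = 10**9


def parse_lst_line(line):
    """Split 'id path duration transcript...' into (id, path, duration, transcript)."""
    sample_id, path, duration, *words = line.split()
    return sample_id, path, float(duration), " ".join(words)


def is_filtered(duration, target_len, flags):
    """Return True if the sample would be dropped by the size flags (inclusive bounds assumed)."""
    input_ok = flags.minisz <= duration <= flags.maxisz
    target_ok = flags.mintsz <= target_len <= flags.maxtsz
    return not (input_ok and target_ok)


if __name__ == "__main__":
    line = ("eng_asr_audio_102 /home/developer/speech_corpus/english_asr_corpus/train/"
            "audio_files/eng_asr_audio_102.wav 4.32 this is one of the most pristine "
            "and beautiful parts of the world")
    _, _, duration, transcript = parse_lst_line(line)
    flags = SizeFlags()
    target_len = 40  # hypothetical token count for this transcript
    print(is_filtered(duration, target_len, flags))         # True: 4.32 < minisz
    print(is_filtered(duration * 1000, target_len, flags))  # False: 4320 ms is kept
```

Running this check over every line of the train and dev lists shows quickly whether nearly everything would be dropped, as in the `Filtered 527188/527188 samples` log above.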

Just fix your list file by multiplying the sox output by 1000.
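The same fix can be applied to an existing list file instead of regenerating it. A minimal sketch, assuming whitespace-separated `id path duration transcript` lines with the duration in seconds; the script and function names are hypothetical. When generating a new list, multiplying `sox.file_info.duration(audio_path)` by 1000 achieves the same thing.

```python
import sys


def seconds_to_ms_line(line):
    """Rewrite the duration column (third field) of one list line from seconds to milliseconds."""
    # strip() first: with maxsplit, str.split keeps trailing whitespace in the last field.
    fields = line.strip().split(maxsplit=3)
    sample_id, path, duration_s = fields[0], fields[1], fields[2]
    transcript = fields[3] if len(fields) > 3 else ""
    duration_ms = float(duration_s) * 1000.0
    return f"{sample_id} {path} {duration_ms:.2f} {transcript}".rstrip()


def convert_file(src, dst):
    """Convert every non-empty line of src and write the result to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(seconds_to_ms_line(line) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 lst_seconds_to_ms.py train.lst train_ms.lst
    convert_file(sys.argv[1], sys.argv[2])
```

The sketch keeps two decimals (4.32 becomes 4320.00); rounding to whole milliseconds would serve equally well for the `minisz`/`maxisz` comparison.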

Oh, yes, thanks a lot for this. I apologize for not going through the flags documentation before posting this issue. Thanks again!

Hey, I have one small issue with training streaming_convnets. I followed the data prep code, and the tokens are the unigram subwords of the vocabulary. When I try to train, I get messages like
Skipping unknown entry :
Falling back to using letters as targets for the unknown word: ,
and this keeps repeating for almost all the words.
I was getting a "not in dictionary" error when using the subword token list, so I appended all the words to tokens.txt. Do I have to make corresponding changes to the lexicon as well, so that it contains these new (word) tokens? And is this the right way to go about it?
Thanks

The lexicon file is the one that is missing entries.
When you add words to the lexicon, though, you may be missing tokens for those new words.
It's best to regenerate both; I can recommend the wpiece tool from @lunixbochs's wav2train for that.
It will work with the warnings as well, so it's not really an issue, but if there are a lot of warnings, it will probably slow down training.
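Before regenerating, it can help to see which transcript words are actually missing from the lexicon. A minimal sketch, assuming whitespace-separated list lines (`id path duration transcript`) and a lexicon whose first column is the word; the function names and the example word pieces are hypothetical.

```python
from collections import Counter


def lexicon_words(lexicon_lines):
    """Collect the words (first column) of a lexicon with lines like 'word<TAB>tok tok ...'."""
    return {line.split()[0] for line in lexicon_lines if line.strip()}


def missing_words(lst_lines, known_words):
    """Count transcript words (fourth column onward) that have no lexicon entry."""
    counts = Counter()
    for line in lst_lines:
        for word in line.split()[3:]:
            if word not in known_words:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    lexicon = ["report\t_report", "the\t_the"]  # made-up word-piece spellings
    lst = ["id1 /a.wav 1200 the report's summary", "id2 /b.wav 900 the report"]
    print(missing_words(lst, lexicon_words(lexicon)).most_common())
    # [("report's", 1), ('summary', 1)]
```

On real data, reading both files with `open(...)` and printing `most_common(20)` shows whether the warnings come from a handful of frequent words or from a systematic mismatch between transcripts and lexicon.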

Hey @joazoa, I recreated the tokens and lexicon files using the wav2train toolkit, but I'm getting similar warnings:
Skipping unknown entry :
Falling back to using letters as targets for the unknown word:
I ignored the warnings to see how long an epoch takes.
My dataset is around 720 hours of train and 80-odd hours of test,
and this is the log I got:
epoch: 1 | nupdates: 1000 | lr: 0.050000 | lrcriterion: 0.000000 | runtime: 00:07:10 | bch(ms): 430.76 | smp(ms): 204.57 | fwd(ms): 72.98 | crit-fwd(ms): 9.22 | bwd(ms): 125.23 | optim(ms): 26.05 | loss: 40.11172 | train-TER: 154.52 | train-WER: 137.07 | dev-clean-loss: 26.06268 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 495 | avg-tsz: 015 | max-tsz: 028 | hrs: 11.01 | thrpt(sec/sec): 92.05
But why does it say hrs: 11.01? It also trained much faster than I expected (in around one and a half hours). Is it skipping most of the audio files because of the warning?
I had previously used the lexicon (simple space-separated characters) and tokens (a-z) files from the tutorials experiment, and that took 6-7 hours per epoch. This is the log I was getting in that scenario:
epoch: 10 | nupdates: 658990 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 07:32:55 | bch(ms): 412.38 | smp(ms): 221.96 | fwd(ms): 55.30 | crit-fwd(ms): 1.31 | bwd(ms): 113.29 | optim(ms): 21.56 | loss: 12.38181 | train-TER: 47.41 | train-WER: 73.07 | dev-clean-loss: 9.22206 | dev-clean-TER: 27.70 | dev-clean-WER: 54.54 | avg-isz: 494 | avg-tsz: 069 | max-tsz: 218 | hrs: 724.65 | thrpt(sec/sec): 96.00
Thanks

No, it's not filtering because of the unknown words; it's just using letters instead of word pieces for them. It would still work.
Is this really a full epoch? If so, something is wrong: it shows only 11 hours, and I would expect it to say about 720h there. It only used 11 hours of the training set for this interval.
Is your reportiters set to 0?
The other option is that the audio gets filtered, but you would see that in a line like this: I0410 19:11:58.988852 10276 Utils.cpp:102] Filtered 116208/116208 samples

Nope, my reportiters was at 1000; I have rerun it with reportiters = 0.
Also, the audio filter log at the start of the epoch looks like this:
I0416 11:31:09.399603 18374 W2lListFilesDataset.cpp:141] 116208 files found.
I0416 11:31:09.407313 18374 Utils.cpp:102] Filtered 252/116208 samples
I0416 11:31:09.481977 18374 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 14495
I0416 11:31:09.487402 18374 Train.cpp:557] Shuffling trainset
I0416 11:31:09.527410 18374 Train.cpp:564] Epoch 1 started

I got a bunch of
Falling back to using letters as targets for the unknown word: report's
Falling back to using letters as targets for the unknown word: report's
warnings before that. Do they have something to do with this line?
I0416 11:31:09.407313 18374 Utils.cpp:102] Filtered 252/116208 samples

No, this just excluded 252 of the 116208 files.

Also, will I get good results if I use the lexicon (simple space-separated characters) and tokens (a-z) files from the tutorials experiment? It seems to train normally without the warnings, but the WER is decreasing way too slowly.

I cannot predict the results; I haven't tried it with only letters.

Hi all,

About "Falling back to using letters as targets for the unknown word:": this just means that the word is absent from the lexicon, so w2l doesn't know how to convert it into a sequence of tokens, and we use its letter sequence instead. All letters must be in your token set, otherwise there will be an exception; with word pieces, all letters are usually included in the token set. Training will proceed normally; you can think of it as a mix of letters and word pieces (a letter is also a word piece).
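The fallback described above can be sketched roughly as follows. This is a simplified illustration, not wav2letter's implementation: it assumes the lexicon is a dict from word to a list of spellings and uses only the first spelling, it ignores word-separator handling, and the example word pieces are made up.

```python
def word_to_targets(word, lexicon, tokens):
    """Map one transcript word to target tokens, falling back to its letters.

    lexicon: dict mapping word -> list of spellings (each a list of tokens).
    tokens: set of all valid tokens (word pieces and single letters).
    """
    if word in lexicon:
        return list(lexicon[word][0])  # simplification: take the first spelling
    print(f"Falling back to using letters as targets for the unknown word: {word}")
    letters = list(word)
    missing = [ch for ch in letters if ch not in tokens]
    if missing:
        # Every letter must itself be a token, otherwise there is no valid target.
        raise ValueError(f"letters {missing} of {word!r} are not in the token set")
    return letters


if __name__ == "__main__":
    tokens = set("abcdefghijklmnopqrstuvwxyz'") | {"_re", "port"}  # made-up word pieces
    lexicon = {"report": [["_re", "port"]]}
    print(word_to_targets("report", lexicon, tokens))    # ['_re', 'port']
    print(word_to_targets("report's", lexicon, tokens))  # ['r', 'e', 'p', 'o', 'r', 't', "'", 's']
```

Falling back to letters generally produces more target tokens per word than a word-piece spelling, which is worth keeping in mind for the `mintsz`/`maxtsz` check mentioned earlier in the thread.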

From our practice and recent papers, word pieces work better than letters. But if you have a small amount of data (say, < 100h), letters could be better than word pieces, because word pieces can work poorly on small data.
